Machine Learning with Scikit-Learn – 39 – Label Encoding

Today we're going to discuss label encoding. In your average machine learning classification project, you use features and labels to train your algorithm. Features provide a representation of your samples, while labels are used to assign the samples to different classes. Let's take for example the classic iris dataset, in which flowers are labeled into three classes: setosa, versicolor, and virginica. In this case our labels are in the form of words, but they can also be numbers. However, algorithms are trained more efficiently when these labels are numerical. So in our case we could encode the three classes in the following way: setosa is going to be 0, versicolor is going to be 1, and virginica is going to be 2, and this would make the training process more efficient.

What we've done here is something called label encoding, and in scikit-learn, the machine learning library we've mostly used in these tutorials, there are modules and methods that can take care of the labeling. Let me show you. From sklearn we're going to import preprocessing. Let's create a few samples, so let's say labels is setosa, versicolor, and virginica. Okay, let's run this one.

Now I'm going to show you how we can use a label encoder. Let's say encoder = preprocessing.LabelEncoder(); this is how we instantiate it. Then we fit it to our labels with encoder.fit(labels). Now let's see how the encoder mapped these labels. Let's do a for loop: for i, item in enumerate(encoder.classes_), we're going to print item and i. Shift+Enter to run this. It should be "for i, item", okay, my bad. So it encoded them the same way I've encoded them up here.

Now we could test this on new samples. I could say more_labels is, let's say, versicolor, versicolor, virginica, setosa, and another versicolor. We'll transform these new labels according to our encoder, so let's say more_labels_encoded = encoder.transform(more_labels), and then we'll print the output. But first we're going to print what our more_labels look like, so "more_labels =" and more_labels, and then we'll print "more_labels_encoded =" and list(more_labels_encoded). Now Shift+Enter to run this one. We can see that the labeling has been performed correctly: versicolor is 1, versicolor 1, virginica 2, setosa 0, and versicolor 1.

In your generic machine learning project you'd usually have X, so capital X, for the features and y for the labels, and you'd perform this type of encoding only on the y. In an upcoming video we'll look into another, slightly more complex type of encoding. If you enjoyed this video, please hit the like button and subscribe. Thank you for watching, and I'll see you in the next one.
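The steps described in the video can be sketched roughly as follows. This is a reconstruction from the spoken walkthrough, not the presenter's exact notebook: the variable names (labels, encoder, more_labels, more_labels_encoded) follow what is said aloud, and the print formatting is an assumption.

```python
from sklearn import preprocessing

# The three iris classes used as sample labels.
labels = ["setosa", "versicolor", "virginica"]

# Instantiate the encoder and fit it to the known labels.
encoder = preprocessing.LabelEncoder()
encoder.fit(labels)

# Show the mapping learned by the encoder. classes_ holds the
# sorted unique labels; the index of each one is its integer code.
for i, item in enumerate(encoder.classes_):
    print(item, "-->", i)

# Encode a new batch of labels with the fitted encoder.
more_labels = ["versicolor", "versicolor", "virginica", "setosa", "versicolor"]
more_labels_encoded = encoder.transform(more_labels)

print("more_labels =", more_labels)
print("more_labels_encoded =", list(more_labels_encoded))
```

Because LabelEncoder sorts the classes it sees, the mapping here comes out as setosa 0, versicolor 1, virginica 2, which matches the manual encoding from the start of the video. In a typical project this would be applied to y only, not to the feature matrix X.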

7 thoughts on “Machine Learning with Scikit-Learn – 39 – Label Encoding”

  1. This is really enraging. You are not showing the most important part (AS ALWAYS) – how do i save this encoder and re-use it. I've been looking for this information IN CODE for 5 hours and everyone just uses it just like that… What if you need to re-use this encoding in 2 weeks? Are you gonna keep the environment running for 2 weeks? If yes – you are a moron….

  2. This has been a great series. Thanks a lot for your efforts. Do you plan to add more videos to this series ? Covering unsupervised learning and regression algorithms
