Boosting Machine Learning Tutorial | Adaptive Boosting, Gradient Boosting, XGBoost | Edureka



Since we're generating an immeasurable amount of data, it has become necessary to develop more advanced and complex machine learning techniques. Boosting is one such machine learning technique that can be used to solve complex, data-driven, real-world problems. Hi everyone, I'm Zulaikha from Edureka and I welcome you to this session on boosting in machine learning. Let me quickly run you through today's agenda. We're going to begin the session by understanding why boosting is used. After that, we'll understand what exactly boosting means in machine learning. We'll then move on and discuss how the boosting algorithm works, and we'll look at the different types of boosting, which include adaptive boosting, gradient boosting and XGBoost. We'll finally end the session with a practical implementation in Python, where we'll understand how boosting algorithms can be used to improve the accuracy of a model. Before I move any further, make sure that you subscribe to the Edureka YouTube channel in order to stay updated about the most trending technologies.
Now let's take a look at our first topic: why exactly do we use boosting techniques in machine learning? Before I tell you what boosting is, let's understand what led to the need for it. In order to solve complex and convoluted problems we require more advanced techniques. Now let's suppose that, given a data set of images containing cats and dogs, you are asked to build a machine learning model that can classify these images into two separate classes. Like every other person, you will start by identifying the images using some rules. Let's say one rule is that the image has pointy ears: if the image has pointy ears, then it's a cat. Similarly, let's say you've created another rule: if the image has cat-shaped eyes, then again it's a cat. If the image has bigger limbs, then it's a dog, and if the image has sharpened claws, then it's a cat. Similarly, if the image has a wider mouth structure, then it's a dog. Now, these are some rules that we define in order to identify whether an image is a cat or a dog, and using just one of these rules to classify the image does not make sense. Let's say the cat is of a different breed and it has bigger limbs; if you give such an input image, the rule that looks for bigger limbs will classify it as a dog. So each of these rules, if applied individually on an image, will not give you an accurate result. You have to apply all of these rules, make sure the image goes through all of them, and only then predict the output. Each of these rules individually is called a weak learner, because on its own it is not strong enough to classify an image as a cat or a dog. What I'm saying is, if you use just one rule to classify an image as a cat or a dog, then your prediction will mostly be wrong; you cannot take one feature into consideration and classify the image as either cat or dog. So to make sure that our prediction is more accurate, we can combine the predictions from each of these weak learners by using the majority rule or a weighted average, and this is exactly what a strong learner model is.
So in the above example, what we did is define five weak learners, and the majority of these rules give us the prediction that the image is a cat; that's why our final output is a cat. Here you can see that three of these rules classify the image as a cat and two of them classify it as a dog, so the majority says it's a cat, and we're going to go with cat. This is what a strong learner model is: it simply combines all the weak learners in order to give you a more precise and more accurate prediction.
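To make the majority-vote idea concrete, here is a minimal sketch in Python. The rules and the feature names (pointy_ears, cat_shaped_eyes, and so on) are hypothetical and purely for illustration; the only point is that each weak rule casts a vote and the majority decides.

```python
from collections import Counter

# Hypothetical weak rules: each looks at one feature and votes "cat" or "dog".
def rule_pointy_ears(img):   return "cat" if img["pointy_ears"] else "dog"
def rule_cat_eyes(img):      return "cat" if img["cat_shaped_eyes"] else "dog"
def rule_bigger_limbs(img):  return "dog" if img["bigger_limbs"] else "cat"
def rule_sharp_claws(img):   return "cat" if img["sharp_claws"] else "dog"
def rule_wide_mouth(img):    return "dog" if img["wide_mouth"] else "cat"

weak_learners = [rule_pointy_ears, rule_cat_eyes, rule_bigger_limbs,
                 rule_sharp_claws, rule_wide_mouth]

def strong_learner(img):
    # Collect one vote per weak rule and return the majority class.
    votes = [rule(img) for rule in weak_learners]
    return Counter(votes).most_common(1)[0][0]

# Example image: a cat of a breed that happens to have bigger limbs and a wide mouth.
image = {"pointy_ears": True, "cat_shaped_eyes": True, "bigger_limbs": True,
         "sharp_claws": True, "wide_mouth": True}
print(strong_learner(image))  # 'cat': three of the five rules vote cat, two vote dog
```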
Now this brings us to the question: what exactly is boosting? Boosting is an ensemble learning technique that uses a set of machine learning algorithms in order to convert or combine weak learners into strong learners, so as to increase the accuracy of the model. So guys, boosting is actually a very effective method for increasing the efficiency of your model; in most of the competitions that you see on Kaggle, or any machine learning competition, the majority of the competitors, or rather the winners, usually implement boosting, bagging or some other ensemble learning technique. Now, for those of you who don't know what ensemble learning is, don't worry, I'll be covering that in the next slide. As you can see from the figure, we combine the outputs or predictions that we get from all our weak learners, or our rules, in order to get a strong learner. So this is the basic principle behind boosting. Now let's understand what ensemble learning is.
Ensemble learning is basically a technique that is used to enhance your model's performance and accuracy. This is exactly why ensemble methods are used to win market-leading competitions such as the Netflix recommendation competition and other Kaggle competitions; most of the winners will be implementing ensemble learning models. Under ensemble learning we have two types: the sequential ensemble and the parallel ensemble. So guys, before you get confused, let me tell you that boosting is a type of ensemble learning; boosting and bagging are the two different ways in which you can perform ensemble learning. The first type is the sequential ensemble model, popularly known as boosting. Here the weak learners are sequentially produced during the training phase, and the performance of the model is improved by assigning a higher weightage to the previously incorrectly classified samples. An example of boosting is the adaptive boosting algorithm. Now, in boosting, what happens is you feed your entire data set to the algorithm and the algorithm makes some predictions; let's say it misclassifies some of your data. What happens in boosting is that you pay more attention to the misclassified data points: you increase their weightage, and therefore you make sure that a lot more importance is given to these misclassified values. You keep doing this until all your wrongly predicted, or misclassified, samples are correctly predicted; that's how you increase the efficiency of your model. Then we have something known as parallel ensemble learning, also known as bagging. Here the weak learners are produced in parallel during the training phase, and the performance of the model can be increased by parallelly training a number of weak learners on bootstrapped data sets. An example of bagging is the random forest algorithm. So the principle behind bagging is to divide your data set into different bootstrapped data sets and run a weak learner, or an algorithm, on each of these data sets; you're doing all of this in parallel, whereas in boosting you're doing it sequentially, along with updating the weights depending on the misclassified samples.
This is exactly what ensemble learning is, and I've just told you what bagging and boosting are. So there is a clear distinction between the two, and this is actually one of the most frequently asked questions: if you go for a machine learning interview, they always make sure to ask you what exactly bagging and boosting are and how they are different, so make sure you understand the difference between the two.
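To illustrate that distinction in code, here is a small sketch using scikit-learn: a parallel ensemble (bagging) and a sequential ensemble (boosting) built from the same weak learner. The data set is synthetic and only for demonstration; also note that, depending on your scikit-learn version, the weak-learner argument is called base_estimator (older releases) rather than estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A synthetic two-class data set, purely for demonstration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

weak_learner = DecisionTreeClassifier(max_depth=1)  # a decision stump

# Bagging: weak learners trained in parallel on bootstrapped samples of the data.
bagging = BaggingClassifier(estimator=weak_learner, n_estimators=50, random_state=42)

# Boosting: weak learners trained sequentially, re-weighting misclassified samples.
boosting = AdaBoostClassifier(estimator=weak_learner, n_estimators=50, random_state=42)

for name, model in [("Bagging", bagging), ("Boosting", boosting)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```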
Now let's move on and understand how the boosting algorithm works. Like I mentioned, the basic principle here is to generate multiple weak learners and combine their predictions to form one strong rule. These weak learners are generated by applying base machine learning algorithms on different distributions of the data set; these base learning algorithms are usually decision trees by default in a boosting algorithm. So what these base learners do is generate weak rules at each iteration, and after multiple iterations the weak learners are combined to form a strong learner that will predict a more accurate outcome. Let me explain this stepwise. Consider the data set over here: you have two different types of data, squares and circles, and your end goal is to classify them into two different classes. This is exactly how you do it. To start, the base algorithm will read the data and assign equal weightage to all of the data points. After that, it will try to analyze the data and draw a decision stump; a decision stump is basically a single-level decision tree that tries to classify the data points. So after it assigns equal weights to all the points, it will try to draw a decision stump; in the first image you can see the decision stump. After that it will check for any false predictions. The next step is that the base learner will identify all the false predictions it has made, and in the next iteration you assign a higher weightage to these misclassified samples. In the first image we have successfully separated these two squares, but there are three other squares on the other side, meaning that we've misclassified those three squares. So in the next iteration, if you take a look at the image, the three squares have a higher weightage; I've shown that by increasing their size in the image, so in the next iteration you increase the weightage on your misclassified samples.
Similarly, you keep doing this until you separate your class A from your class B. Basically, you are going to pay more attention to your misclassified samples, increase their weightage, and make sure that those samples are correctly classified in the next iteration. So like I said, you'll keep repeating step two: you will keep increasing the weightage of the misclassified samples until all of the samples are correctly classified. Look at the fourth diagram here: everything is classified correctly, we have a set of squares and a set of circles. That's exactly how the boosting algorithm works.
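To make that loop concrete, here is a simplified sketch of the re-weighting idea in Python using decision stumps. This is a toy, AdaBoost-style version written only for illustration; the data set is synthetic, and a production implementation would use a library rather than this hand-rolled loop.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data, labels encoded as -1 / +1.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
y = np.where(y == 0, -1, 1)

n_samples = len(y)
weights = np.full(n_samples, 1.0 / n_samples)  # step 1: equal weightage for every point

stumps, alphas = [], []
for iteration in range(10):
    # Step 2: fit a decision stump on the current sample weights.
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)

    # Step 3: find the misclassified samples and the weighted error.
    miss = pred != y
    err = np.clip(weights[miss].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # how much say this stump gets

    # Step 4: increase the weightage of the misclassified samples, then renormalize.
    weights *= np.exp(alpha * np.where(miss, 1.0, -1.0))
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# The strong learner: a weighted vote of all the stumps.
def strong_predict(X_new):
    votes = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    return np.sign(votes)

print("training accuracy:", np.mean(strong_predict(X) == y))
```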
Now let's understand the different types of boosting. Mainly, there are three classes of boosting: adaptive boosting, gradient boosting and XGBoost. We'll discuss each of these in brief. Adaptive boosting is what I explained to you in the previous slide: it is implemented by combining several weak learners into a single strong learner. There are a couple of steps that the adaptive boosting algorithm follows. Adaptive boosting starts by assigning equal weightage to all of your data points, and you draw out a decision stump for a single input feature. In the next step, the results that you get from the first decision stump are analyzed, and if any observations are misclassified then they are assigned higher weights; this is exactly what I explained in the previous slide. After that, a new decision stump is drawn by considering the observations with higher weights as more significant, so whichever data points were misclassified are given a higher weightage. In the next step you draw another decision stump that tries to classify the data points by giving more importance to the data points with a higher weightage. Again, if there are any observations that are misclassified, they're given a higher weight, and this process keeps looping until all the observations fall into the right class. So the end goal here is to make sure that all your data points are classified into the correct classes. Adaptive boosting, or AdaBoost, can also be used for regression problems; it's not restricted to classification, it can be used for both classification and regression, but it's more commonly seen in classification problems.
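As a quick illustration of that regression use case, here is a minimal scikit-learn sketch with AdaBoostRegressor on a synthetic data set; the data and hyperparameters are assumptions made purely for demonstration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, purely for demonstration.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# AdaBoost for regression: shallow regression trees as the weak learners.
# (Use base_estimator= instead of estimator= on older scikit-learn versions.)
reg = AdaBoostRegressor(estimator=DecisionTreeRegressor(max_depth=3),
                        n_estimators=100, learning_rate=1.0, random_state=1)
reg.fit(X_train, y_train)
print("R^2 on the test set:", reg.score(X_test, y_test))
```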
So that was a brief overview of adaptive boosting. Now let's understand gradient boosting. Gradient boosting is also based on the sequential ensemble learning model. Here, the base learners are generated sequentially, in such a way that the present base learner is always more effective than the previous one; basically, the overall model improves sequentially with each iteration. The difference in this type of boosting is that the weights for misclassified outcomes are not incremented; you're not going to increment or add weights to the misclassified outcomes. Instead, in gradient boosting what you do is try to optimize the loss function of the previous learner by adding a new additive model that adds weak learners in order to reduce the loss. The main idea here is to overcome the errors in the previous learner's predictions. This type of boosting has three main components. The first is the loss function, which is the one that needs to be optimized, meaning that you need to reduce the error. The second is the weak learners, which are needed for computing predictions and forming the strong learner. Then you need an additive model that minimizes the loss function, meaning that it tries to fix the loss, or the error, from the previous weak learner; you keep adding models that reduce the loss left by the previous learner. Just like adaptive boosting, gradient boosting can also be used for both classification and regression problems.
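Here is a minimal scikit-learn sketch of gradient boosting on a synthetic classification data set; the data and hyperparameters are illustrative only. Notice that there is no sample re-weighting parameter: each new tree is fitted to reduce the loss left by the ensemble built so far.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data, purely for demonstration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

# 200 shallow trees are fitted one after another, each reducing the remaining loss.
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=7)
gb.fit(X_train, y_train)
print("test accuracy:", gb.score(X_test, y_test))
```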
Now let's discuss the last type of boosting, which is XGBoost. XGBoost is basically an advanced version of gradient boosting; it literally means extreme gradient boosting. XGBoost actually falls under the Distributed Machine Learning Community (DMLC) project, and it's a more advanced version of the gradient boosting method. The main aim of this algorithm is to increase speed and to increase the efficiency of your computations and of the model's performance. The reason this model was introduced is that the gradient boosting algorithm was computing the output at a very slow rate, because of the sequential analysis of the data set, and it takes a longer time; that's why XGBoost was introduced, and it basically "extremely boosts" the performance of the model. So XGBoost mainly focuses on speed and model efficiency, and in order to do this it has a couple of features: it supports parallelization, so the work of building the decision trees is done in parallel; it implements distributed computing methods for evaluating large and complex models; it uses out-of-core computing in order to analyze huge and varied data sets; and it implements cache optimization in order to make the best use of your hardware and your resources overall.
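As a quick illustration, here is a minimal sketch using the xgboost Python package; this assumes you have installed it separately (for example with pip install xgboost), and the data set and hyperparameters are again just for demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # requires: pip install xgboost

# Synthetic two-class data, purely for demonstration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=3)

# n_jobs=-1 lets XGBoost use all available cores while building the trees.
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=3,
                    n_jobs=-1, random_state=3)
xgb.fit(X_train, y_train)
print("test accuracy:", xgb.score(X_test, y_test))
```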
So guys, these were the basics of the different types of boosting algorithms. Now, to make things a little more interesting, let's run a practical implementation. A short disclaimer before I get started with the demo: I'll be using Python to run it, so if you don't know Python, I'll leave a couple of links in the description box; you can go through those links and maybe then come back and watch this video. Now let's understand what exactly we're going to do in this demo. The problem statement is to study a mushroom data set and build a machine learning model that can classify a mushroom as either poisonous or edible by analyzing the features of the mushroom. So you're going to be given a mushroom data set, and what you have to do is understand which of these mushrooms are edible and which are poisonous. This data set has mushrooms of 23 different species, and a species is classified as either edible or non-edible. So guys, the logic here is to build a machine learning model using one of the boosting algorithms in order to predict whether or not a mushroom is edible. Let me quickly open up the code for you; I hope all of you can see the console.
While we wait for this to run, let me just run you through the entire code. Like any other demo, you start by importing the required packages. The best thing about Python is that there are inbuilt packages and libraries that let you implement any complex process; all you have to do is import these libraries, and that's exactly what I'm doing over here. After that, I'm loading the data set into a variable known as dataset; this is my data set, it's stored in this location, and all I'm doing is reading it and storing it in this variable. After that we'll perform data processing. Here we define the column names, since in our data set the column names are not defined; I'm defining all the column names here and then assigning them to our data set. Next, we print the data set info to look at all our features. These are our data columns: in total we have 23 features, meaning there are 23 variables, out of which one is your target variable. The target variable is the output variable that we're trying to predict, and the rest of the variables, bruises, cap color, cap surface and so on, are all predictor variables. Next, we drop this target variable from our data set, because this is the value we're trying to predict. Our Y will contain the target variable, and our X will not contain the target variable; Y is basically created for evaluating your model. So guys, I hope you all know what all of this is; I'm not going in depth into it, because this is basic machine learning and I'm hoping you have a good idea about machine learning if you're studying boosting. Next, you perform something known as data splicing, which is basically splitting your data set into a training and a testing data set. This variable here defines the size of your testing data set: 30% is assigned for testing and 70% is assigned for training. After that, we create a model by using the decision tree classifier as our base estimator; the base estimator is basically your weak learner, and here we're using the entropy method in order to find the best attribute for the root node, which is part of how decision trees work. Next, we call this function, AdaBoostClassifier. This is an inbuilt function that does exactly what an adaptive boosting classifier is supposed to do, and there are three important parameters that you pass to this function: base estimator, n_estimators and learning rate. Your base estimator is basically your weak learner, and by default the weak learner is a decision tree; so what we're doing is passing the variable model over here, and in model we've stored the decision tree classifier. Next we have n_estimators: this field specifies the number of base learners we are going to use, that is, the number of weak learners in our model, and we've assigned a value of 400. Then we have learning_rate, which specifies the learning rate, and we've set it to the default value of one. Next, we fit our training data set into our model; here we evaluate the model and see how it predicts the values when given the testing data set; and then we compare our predicted values with the actual values. When we do that, we get an accuracy of 100%, and here you can see the accuracy is 100%, which is perfect, because this is expected when you use boosting machine learning algorithms.
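For reference, here is a condensed sketch of what that demo code looks like end to end. The file path, the exact column names, and the encoding step are assumptions on my part (the UCI mushroom data set is the usual source for this exercise), so treat this as an approximation of the script shown on screen rather than the exact code.

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Load the mushroom data set (path and column names assumed for illustration).
columns = ["target", "cap-shape", "cap-surface", "cap-color", "bruises", "odor",
           "gill-attachment", "gill-spacing", "gill-size", "gill-color", "stalk-shape",
           "stalk-root", "stalk-surface-above-ring", "stalk-surface-below-ring",
           "stalk-color-above-ring", "stalk-color-below-ring", "veil-type", "veil-color",
           "ring-number", "ring-type", "spore-print-color", "population", "habitat"]
dataset = pd.read_csv("mushrooms.csv", header=None, names=columns)

# The features are categorical strings, so encode every column numerically.
dataset = dataset.apply(LabelEncoder().fit_transform)

# X holds the predictor variables, Y the target (edible vs poisonous).
X = dataset.drop("target", axis=1)
Y = dataset["target"]

# Data splicing: 70% for training, 30% for testing.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

# Weak learner: a decision tree using entropy to pick the best attribute at each split.
model = DecisionTreeClassifier(criterion="entropy")

# AdaBoost with 400 weak learners and the default learning rate of 1.
# (On older scikit-learn versions the argument is base_estimator= instead of estimator=.)
adaboost = AdaBoostClassifier(estimator=model, n_estimators=400, learning_rate=1.0)
adaboost.fit(X_train, Y_train)

predictions = adaboost.predict(X_test)
print("Accuracy:", accuracy_score(Y_test, predictions))
```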
Now, instead of using boosting, if you try this with just weak learner models like decision trees, your accuracy will not be 100%; there's always some problem or the other, and with decision trees especially, overfitting can occur. So the best way to increase your model's accuracy is by using boosting machine learning algorithms; that's exactly what I wanted to show you, that the boosting technique will help you increase the accuracy of your model. So guys, with that we come to the end of today's session. If you have any doubts regarding this session, you can leave them in the comment section. I hope you enjoyed the class, and until next time, happy learning. I hope you have enjoyed listening to this video; please be kind enough to like it, and you can comment any of your doubts and queries and we will reply to them at the earliest. Do look out for more videos in our playlist and subscribe to the Edureka channel to learn more. Happy learning.
