Machine Learning Fundamentals: Bias and Variance



Hurricane Florence came by while I was working on StatQuest. Dark clouds filled the sky, but that didn't stop StatQuest. StatQuest!

Hello, I'm Josh Starmer, and welcome to StatQuest. Today we're going to be talking about some machine learning fundamentals, bias and variance, and they're gonna be clearly explained.

Imagine we measured the weight and height of a bunch of mice and plotted the data on a graph. Lighter mice tend to be shorter and heavier mice tend to be taller, but after a certain weight, mice don't get any taller, just more obese. Given this data, we would like to predict mouse height given its weight. For example, if you told me your mouse weighed this much, then we might predict that the mouse is this tall. Ideally, we would know the exact mathematical formula that describes the relationship between weight and height, but in this case we don't know the formula, so we're going to use two machine learning methods to approximate it. However, I'll leave the true relationship curve in the figure for reference.

The first thing we do is split the data into two sets: one for training the machine learning algorithms and one for testing them. The blue dots are the training set and the green dots are the testing set. Here's just the training set.

The first machine learning algorithm that we will use is linear regression, aka least squares. It fits a straight line to the training set. Note that the straight line doesn't have the flexibility to accurately replicate the arc in the true relationship: no matter how we try to fit the line, it will never curve. Thus, the straight line will never capture the true relationship between weight and height, no matter how well we fit it to the training set. The inability of a machine learning method, like linear regression, to capture the true relationship is called bias. Because the straight line can't curve like the true relationship, it has a relatively large amount of bias.

Another machine learning method might fit a squiggly line to the training set. The squiggly line is super flexible and hugs the training set along the arc of the true relationship. Because the squiggly line can handle the arc in the true relationship between weight and height, it has very little bias.

We can compare how well the straight line and the squiggly line fit the training set by calculating their sums of squares. In other words, we measure the distances from the fitted lines to the data, square them, and add them up. (The distances are squared so that negative distances do not cancel out positive distances.) Notice how the squiggly line fits the data so well that the distances between the line and the data are all 0. In the contest to see whether the straight line fits the training set better than the squiggly line, the squiggly line wins.

But remember, so far we've only calculated the sums of squares for the training set; we also have a testing set. Now let's calculate the sums of squares for the testing set. In the contest to see whether the straight line fits the testing set better than the squiggly line, the straight line wins. Even though the squiggly line did a great job fitting the training set, it did a terrible job fitting the testing set. In machine learning lingo, the difference in fits between data sets is called variance.
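Here is a minimal Python sketch of that train/test comparison, assuming made-up mouse-style data (none of these numbers come from the video). A 1-nearest-neighbor regressor stands in for the squiggly line because, like the squiggle, it hugs every training point, so its training sum of squares is exactly 0:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsRegressor

    # Made-up "mouse" data: height rises with weight, then levels off, plus noise.
    rng = np.random.default_rng(42)
    weight = rng.uniform(1, 10, 60).reshape(-1, 1)
    height = 10 * weight.ravel() / (weight.ravel() + 2) + rng.normal(0, 1.0, 60)

    # Split the data: blue dots (training set) and green dots (testing set).
    X_train, X_test, y_train, y_test = train_test_split(
        weight, height, test_size=0.5, random_state=0)

    def sum_of_squares(model, X, y):
        # Distances from the fitted line to the data, squared and added up.
        return float(np.sum((y - model.predict(X)) ** 2))

    # Straight line: least squares linear regression (high bias, low variance).
    straight = LinearRegression().fit(X_train, y_train)

    # "Squiggly line" stand-in: 1-nearest-neighbor hugs every training point
    # (low bias, high variance), so its training sum of squares is 0.
    squiggly = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)

    for name, model in [("straight", straight), ("squiggly", squiggly)]:
        print(f"{name:9s} train SS = {sum_of_squares(model, X_train, y_train):7.2f}"
              f"  test SS = {sum_of_squares(model, X_test, y_test):7.2f}")

With noisy data like this, the squiggly line wins on the training set but typically loses on the testing set, which is exactly the pattern described above.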
The squiggly line has low bias, since it is flexible and can adapt to the curve in the relationship between weight and height, but it has high variability, because it results in vastly different sums of squares for different data sets. In other words, it's hard to predict how well the squiggly line will perform with future data sets: it might do well sometimes, and other times it might do terribly. In contrast, the straight line has relatively high bias, since it cannot capture the curve in the relationship between weight and height, but it has relatively low variance, because the sums of squares are very similar for different data sets. In other words, the straight line might only give good predictions, and not great predictions, but they will be consistently good predictions. BAM!

Oh no, terminology alert! Because the squiggly line fits the training set really well, but not the testing set, we say that the squiggly line is overfit.

In machine learning, the ideal algorithm has low bias, so it can accurately model the true relationship, and it has low variability, producing consistent predictions across different data sets. This is done by finding the sweet spot between a simple model and a complex model.

Oh no, another terminology alert! Three commonly used methods for finding the sweet spot between simple and complicated models are regularization, boosting, and bagging. The StatQuest on random forests shows an example of bagging in action, and we'll talk about regularization and boosting in future StatQuests. Double BAM!

Hooray! We've made it to the end of another exciting StatQuest. If you liked this StatQuest and want to see more, please subscribe. And if you want to support StatQuest, well, please consider buying one or two of my original songs. Alright, until next time, quest on!
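To make the "sweet spot" idea concrete, here is a short sketch of one of those three methods, regularization, again assuming the same made-up data as above. Ridge regression shrinks the coefficients of a flexible polynomial model; the penalty strength alpha trades a little bias for a large drop in variance, and cross-validated error points to the sweet spot:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler

    # Same kind of made-up mouse data as in the previous sketch.
    rng = np.random.default_rng(42)
    weight = rng.uniform(1, 10, 60).reshape(-1, 1)
    height = 10 * weight.ravel() / (weight.ravel() + 2) + rng.normal(0, 1.0, 60)

    # A degree-10 polynomial can squiggle; the ridge penalty pulls it back
    # toward a smoother, simpler curve as alpha grows.
    for alpha in [1e-4, 1e-2, 1.0, 100.0]:
        model = make_pipeline(PolynomialFeatures(degree=10),
                              StandardScaler(),
                              Ridge(alpha=alpha))
        mse = -cross_val_score(model, weight, height, cv=5,
                               scoring="neg_mean_squared_error").mean()
        print(f"alpha = {alpha:g}: mean cross-validated squared error = {mse:.2f}")

With a tiny alpha the model behaves like the unpenalized squiggle (low bias, high variance); with a huge alpha it is pulled toward a nearly flat line (high bias, low variance). The lowest cross-validated error sits somewhere in between: that's the sweet spot.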

36 thoughts on “Machine Learning Fundamentals: Bias and Variance”

  1. Have watched many of your videos, and they have forced me to write a comment: StatQuest is AWESOME!! @Josh Starmer, I am your fan. The way you begin your videos and go about explaining some of the most difficult concepts in statistics and machine learning is GREAT. Many books and tutorials claim to make the complex simple but rarely do so; this channel truly makes things simple to understand.
    I have just one request (I think most of your followers would agree on this point): please write a book on machine learning and the application of its various algorithms (maybe a series of books).

  2. Please explain what the term "bias" means in the linear regression formula.
    Please explain as simply as possible.
    Thank you.

  3. Thank goodness you exist… I had never understood why squaring the distances mattered until your footnote at 3:12.

  4. Thanks for the lovely explanation, Sir… Could we fit the squiggly line using maximum likelihood estimation?

  5. 3:25
    KEVIN
    We follow the usual pattern, which is to
    (i) import the class,
    (ii) instantiate the model, and
    (iii) fit the model with the training data.
    (iv) Then we'll make our predictions by passing the entire feature matrix (X) to the predict method of the fitted model and
    (v) print out those predictions. Let's store those predictions in an object called y_pred.
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<
    from sklearn.linear_model import LogisticRegression  # (i) import the class
    logreg = LogisticRegression()                        # (ii) instantiate the model
    logreg.fit(X, y)                                     # (iii) fit with the training data (X and y assumed already defined)
    logreg.predict(X)                                    # (iv) predict on the entire feature matrix X
    y_pred = logreg.predict(X)                           # (v) store those predictions (3:47)

  6. I think the 'variance error' here is simply caused by noise in the data, and it is equal for both high-variance and high-bias models. Does anyone have the same feeling?

  7. Coming from Intro to Statistical Learning with Applications in R, I now fully grasp the picture of bias and variance. In addition, flexible techniques vs. less flexible techniques are now cemented in my memory; before, I had just crammed the terminology without knowing exactly what it meant. I will be a regular visitor to this channel.
