Linear Regression – Machine Learning Fun and Easy

Hi, and welcome to a new lecture in the Fun and Easy Machine Learning series. Today I'll be talking about linear regression. Linear regression attempts to model the relationship between two variables by fitting a linear equation to the observed data. One variable is considered to be an explanatory variable, while the other is considered to be a dependent variable. Please don't forget to subscribe and click the bell icon for more videos on machine learning.

So the dependent variable is the variable whose values we want to explain or forecast. The independent, or explanatory, variable is the variable that explains the other variable, and its values are independent. The dependent variable can be denoted as y, so you can imagine a child always asking "why?", who is dependent on their parents. And then you can imagine the x as your ex-boyfriend or ex-girlfriend, who is independent because they don't need you or depend on you. A good way of remembering.

Anyway, linear regression can be used in one of two ways. The first is to establish whether there is a relationship between two variables, that is, to see whether there is a statistically significant relationship between them. So, for example, you want to see how an increase in sin tax has an effect on how many cigarette packs are consumed, or look at sleep hours versus test scores, the correlation between experience and salary, Pokemon versus urban density, and house floor area versus house price.

The second application is to forecast new observations. We can use what we already know to forecast unobserved values. Here are some examples of the way linear regression can be applied: say, the return on investment of participants over time, stock price over time, or predicting the price of Bitcoin over time.

So you can think of linear regression as the line of best fit. The line of best fit can be represented by the linear equation y = a + bx, or y = mx + b, or y = b0 + b1*x. Most of us learned this in school. In y = mx + b, b is the intercept: if you change this value, the line moves up or down along the y-axis. And m is the slope, or gradient: if it changes, the line rotates about the intercept.

So the data is actually a series of x and y observations, as shown on the scatter plot. They don't follow a straight line exactly; however, they do follow a linear pattern, hence the term linear regression. Assuming that we already have the best-fit line, we can calculate the error term, epsilon, also known as the residual, and this is the term that we would like to minimize across all the points in the data series. So, say we have our linear equation represented in statistical notation; the residual fits into our equation as shown: y = b0 + b1*x + ε.

OK, let's try doing this by hand in an example. Over here we have our dataset of Pokemon versus urban density. You can get this dataset from my GitHub repository, reigngt09, under MachineLearningFNE, which stands for Fun and Easy. Go to Lab 1, Linear Regression, and there you'll have the homework package that you can work on, as well as the solution. Over here on this side you'll find the same dataset as well as the equations defining the coefficients a and b and the correlation coefficient squared. You can find the equations right over here, and you can just follow along.

OK, so we have Pokemon, which is our x-axis, and our y-axis is urban density. Then we can create a scatter plot of that and format it to make it look a little bit nicer by adding some data labels. We're going to call the y-axis urban density, and we're going to label the x-axis Pokemon quantity. Now remember, it is a mock dataset: you can fill it with any data that you want, or you can import a legitimate dataset.

This is the equation that we get from our trend line, and we're going to recreate it on our own and calculate the coefficients by hand. So this is the equation that we have: that over there is the sum of y, which you can find over here; the sum of x squared, which is over here; the sum of x over there; and the sum of x times y. And n is the number of items in our dataset.

So first we calculate x times y. We take 59 times 81, and this is our answer, and then we can just copy it across the rest of the rows. Then we go to our product of x and y and create the sum of that, which gives us the sum of x times y. You can do the same for x squared: we take the square of 43 and do the same for the rest of the dataset. And the same we can do for y: if we square 81, it gives us 6,561, and then we have the sum already. You can do the same for Pokemon: the sum of the Pokemon quantity as well as the urban density, which gives us the sum of each of our features.

Now we need to calculate our coefficients, so we have a, b and R squared. We type equals, open our brackets, and the first term is the sum of y, and then you can enter the rest of the equation into the formula bar. I'm going to speed up this part of the video so it's not so monotonous. And there you have it: we have 65, which is the same number that we get from our trend line. Now we can do the same for b. I've already done this before, so I'm just going to copy that. Basically, you can do this by hand to get a feel for how to enter the equations, and see if you get the same thing. It's quite simple: you just need to enter the equation as it is into Excel, and then you can do the same for R squared.

As you can see, linear regression is very easy and simple. In the next lecture we're going to see how we can do this in Python using scikit-learn. Please don't forget to like, subscribe and share, as well as click the bell icon if you'd like to see more machine learning tutorials, and also please support us on Patreon. See you in the next lecture.
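The by-hand calculation in the transcript (sum the columns, then plug the sums into the formulas for a, b and R squared) can be sketched in Python instead of Excel. This is a minimal sketch, not the author's spreadsheet: it assumes the on-screen equations are the standard least-squares formulas for y = a + bx, and the function name `fit_line_from_sums` and the five data points are made up for illustration (they are not the Pokemon dataset).

```python
def fit_line_from_sums(xs, ys):
    """Return (a, b, r_squared) for the line y = a + b*x using column sums."""
    n = len(xs)                                   # number of items in the dataset
    sum_x = sum(xs)                               # sum of x
    sum_y = sum(ys)                               # sum of y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x times y
    sum_x2 = sum(x * x for x in xs)               # sum of x squared
    sum_y2 = sum(y * y for y in ys)               # sum of y squared

    # Shared denominator of the intercept and slope formulas.
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom   # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom        # slope
    # Squared correlation coefficient.
    r_squared = (n * sum_xy - sum_x * sum_y) ** 2 / (
        denom * (n * sum_y2 - sum_y ** 2)
    )
    return a, b, r_squared


# Hypothetical mock data, chosen only so the arithmetic is easy to check by hand.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r_squared = fit_line_from_sums(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x, R^2 = {r_squared:.2f}")
# prints: y = 2.20 + 0.60x, R^2 = 0.60
```

For this mock data the sums are 15 for x, 20 for y, 66 for x times y and 55 for x squared, so the intercept is (20*55 - 15*66) / (5*55 - 15*15) = 2.2 and the slope is (5*66 - 15*20) / 50 = 0.6, which is the same kind of check the transcript makes against the Excel trend line.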

46 thoughts on “Linear Regression – Machine Learning Fun and Easy”

  1. I am in a machine learning class and it's not my strong suit, but this video explains everything 4x better than my professor! Nice job! +1 Sub

  2. I had so much hoped that my exposure to math ended in 1992 after my finals, and that I'd never again see another formula that I actually needed to use. Yet here I am in 2019, yet again sitting with a scientific calculator open on one screen, Excel on another, and listening to yet another Asian guy twice as bright as I am…. Darn!

  3. Here is a quick summary of linear regression:

    – Linear regression is finding out the best linear relationship that describes some data you have.

    – It is important to note that you assume there is a linear relationship between your dependent and independent variables.

    – Once you make that assumption, you next need to figure out the specific linear relationship
    – We know that the general form of a linear relationship looks like this: Ax + By = C

    – We want to find a specific linear relationship, i.e. a specific set of A, B and C, such that this linear relationship fits our data best.

    – Let's expand on what we mean by "fits our data best"

    – We know that once we get a linear relationship, that relationship allows us to predict our dependent variable (y) given our independent variable (x).

    – We have some sample data (called the training data) where we already know the ys for given xs.

    – What if, for each point in our sample data, we compare the known y with the y that we get if we use our specific linear relationship?

    – So now, our goal has become: To find the specific linear relationship that will result in ys as close to your actual ys in your sample data as possible.

    – If you have done that, then you say that you have found a "best" fit line.

    – In the above example, we consider a situation where we only have one independent variable, but you could have many and the same concept will still apply.

    – If you have many independent variables, the general form of a linear relationship will look like Ax + By + Cz + … = F, where A, B, C and F are parameters. You want to find the values of A, B, C and F that fit your sample data best. See, nothing changes if you add more independent variables!

    The key thing to know about linear regression:
    You assume there is a linear relationship between your data and you then find a specific linear relationship that best fits your data.

    Thanks again for another very fun and informative video, I enjoy these a lot!

  4. Your video is amazing. Which software are you using to make such a beautiful video? Please leave a comment…

  5. If only my teachers had taught this concept in a more practical manner, I wouldn't have been so confused. I remember learning these things during junior high school but never understood the purpose of learning them.

  6. Bravo, great, no words to say. One of the finest ways to teach people, even though I don't have any mathematical background.

  7. Great video! Where can I get the link to the linear regression implementation using Python?
    Please help.

  8. How do you calculate ESTIMATION times with x & y variables? Let's say you have money transfer methods a, b, c (debit, credit, bank account) and you have a dataset with send/receive times x & y, and you want to ESTIMATE the delivery time with the video's model. But x & y are timestamps, not definite numbers, like sent at 14:03:01 on 01/03/16, received 18:07:01 on 03/03/16. Should I convert to seconds, or minutes, or hours.min.sec? How do I do ESTIMATIONS on a dataset? Thank you
