Machine Learning Tutorial Python – 7: Training and Testing Data

In this video we are going to look at how to split your data set into training and test sets using sklearn's train_test_split method. Usually when you have a data set like this, sometimes we train the model using the entire data set, but that's not a good strategy. The good strategy is to split the data set into two parts, where you use part of the samples for the actual training and the remaining samples for testing your model. The reason is that you want to test the model on samples it has not seen before. For example, if I use the first eight samples here to train the model and the remaining two to test it, I will get a good idea of the model's accuracy, because the model has not seen these two samples before.

The data set we are using for this exercise is the BMW car prices data set. Here I have the mileage and the age of each car and the price it was sold for, so these are all sold BMW cars along with their mileage, age and sell price. Mileage and age are the independent variables, and the selling price is the dependent variable.

In my Jupyter notebook I have loaded this CSV file into a data frame, which looks like this, and then I'm using some matplotlib visualization to figure out the relationship between my dependent and independent variables. Here I have a plot of mileage versus sell price, and you can see a clear linear relationship: we can draw a line that goes through these data points. Similarly, for car age and sell price I have plotted another scatter plot, and here also you can sort of apply a linear relationship. So, based on this visualization, we are going to use a linear regression model.

I have prepared my X and y here: X is again mileage and age, and y is the selling price. The first thing we'll do is use the train_test_split method from sklearn.model_selection. We import train_test_split and then call it, supplying X and y as input, and you also need to supply the ratio by which
you are splitting. Here I want my test data set size to be 20% and my training data set size to be 80%, so this is how you specify that. As a result, what you get is an X_train data set, then an X_test data set, y_train and y_test: you get four values back. If you look at the length of what you got back, you will see that X_train is 80% of your total data size. The total data size here is 20, and based on the 80% ratio my training data set is 16 samples. Look at the test set and it will be 4, as you would expect.

If you check the actual content of your X_train, you will see that it chooses random samples. It is not selecting the first 80% of the samples, it is just using random samples, which is good. If you execute this method again and again, you will see that the samples change. Now, sometimes you want your samples to remain the same, and for that you can use the random_state parameter. If you set random_state, it is going to use the same samples every time, so for a random_state value of 10 it will always produce the same output. You'll see that my X_train is not changing now, right? 22,500... you'll see these values are not changing when I execute this multiple times, whereas when I didn't have this parameter they were changing all the time. If you hit Ctrl+Enter you can see that, right?

Okay, now let's use our linear regression model. You can guess that I'm going to import the LinearRegression class and create my classifier, which is nothing but an object of this class, and then you use the fit method to actually train your model. I need to pass X_train and y_train, and my model is trained now. Now I can call the predict method on my actual test data set, so my model is predicting that the values for my X_test are these. Let's look at how my y_test looks. So my y_test looks like this: the values are in a similar range, but not exactly the same. So let's check the accuracy of the model by calling the score method. What the score method will do is it will use X_test and predict
these values, and then compare them against the y_test values and tell you the accuracy. Here the accuracy is 0.89, that is about 89%. Okay, that's just because of the nature of my data set.

In this tutorial you learned how train_test_split can be used to split your actual data set. You can change the percentage ratio between the two data sets by modifying this: if I do this, then it will be a 70/30 ratio. Okay, so that's all I had for this tutorial. I don't have any exercise, but I have this notebook available in the video description below, so feel free to download it and play around with it. Thank you, bye.
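The steps walked through in the video can be sketched roughly as follows. This is a minimal sketch and not the notebook from the video: the car data is not reproduced here, so the 20 rows below are synthetic, and the variable names (`mileage`, `age`, `price`, `model`) and the price formula are assumptions made only for illustration. One point worth knowing when reading the result: for scikit-learn's `LinearRegression`, `score` returns the R² coefficient of determination, which the video refers to loosely as accuracy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the car prices data (20 rows). NOT the tutorial's CSV.
rng = np.random.default_rng(0)
mileage = rng.uniform(20_000, 90_000, size=20)
age = rng.integers(2, 9, size=20)
# Assumed relationship: price falls with mileage and age, plus some noise.
price = 40_000 - 0.25 * mileage - 800 * age + rng.normal(0, 500, size=20)

X = np.column_stack([mileage, age])  # independent variables
y = price                            # dependent variable

# 80/20 split; random_state fixes which rows land in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)
print(len(X_train), len(X_test))

# Train on the training part only, then evaluate on the unseen test part.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
score = model.score(X_test, y_test)  # R^2 on the test set
print(y_pred)
print(score)
```

Changing `test_size` to 0.3 would give the 70/30 split mentioned at the end of the video, and dropping `random_state` makes the selected rows change from run to run.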

26 thoughts on “Machine Learning Tutorial Python – 7: Training and Testing Data”

  1. What is the alternative to the score function in other modules? Actually I tried to run this code, but unfortunately it is showing the following error.
    AttributeError: 'Sequential' object has no attribute 'score'

  2. With this dataset, sometimes I'm getting 40% accuracy and other times 95%. I guess we can do a for loop and get the best data for our model.

  3. As usual awesome video, please put a series on either NLP, Speech Recognition or DEEP Learning….eagerly waiting for your new videos

  4. I tried the code with one dependent and one independent variable but it always fails at the training section i.e fit

  5. Good explanation, but I am afraid your code downloaded from the GitHub website can't be opened in Jupyter Notebook – "Unreadable Notebook: ……train_test_split.ipynb NotJSONError("Notebook does not appear to be JSON: '\n\n\n\n\n\n<!DOCTYPE html>\n<html lang…")'"

  6. Do you have an example for training machine for facial expression detection. I have dataset but dont know how to train, how to test and how to implement.

  7. sir can u guide my how to upload our codes on github account as like u have uploaded and project also…. plzzzz reply fast

  8. I checked the score for my model and got 79% accuracy. In real time, is this score enough? Generally, what score/range is considered a good one?

  9. But how will you show the relation between y_test and y_pred using matplotlib? Any help would be appreciated

  10. Can you teach me training and testing of data for remaining useful life of an aircraft engine, using a neural network?

  11. I have one doubt: I think at 5:28, to compare train and test predicted data values, there must be a comparison of y_test with y_train at Jupyter In [101].
