Machine Learning Algorithms | Data Science Algorithms | Intellipaat


Hey guys, welcome to another session by Intellipaat. Driving a data-driven
business using machine learning is considered an important aspect in today’s
world. Top companies such as Amazon, Facebook, Apple and many more use machine
learning to perform advanced analytics and drive their business to success. In
today’s session, we’re going to have a quick look into the world of machine
learning algorithms. Now, before we begin, do subscribe to the Intellipaat
YouTube channel so that you never miss out on any of our upcoming videos.

Now let’s have a look at the agenda for today’s video. First, we’ll understand
why we require machine learning algorithms. Then we’ll understand what these
algorithms actually are. After that, we’ll take a quick dive into the world of
machine learning algorithms, and finally we’ll do a couple of demos using these
algorithms. Also, guys, if you’re looking to get certified in data science,
Intellipaat provides data science certification training courses; for more
details you can check out the description. Without much further delay, let’s
get started.
What do you think is the need for something called an algorithm? Well, consider
this situation. Let’s say you’re baking a cake, driving your car, or even
walking or singing. Your body is continuously executing a set of steps that you
have already trained it to do, and this is what we call an algorithm. So when
you’re driving your car, your brain is already programmed to do all the tasks
that are required to drive it. And when you’re walking, how do you maintain
balance? As a toddler, maintaining your balance was very difficult, but you
trained yourself every day, and now you can walk very easily. This process of
learning through repetition can be termed an algorithm as well.

If you’ve been wondering whether algorithms are a new concept, they’re not.
Algorithms have been used for decades. The person on the screen is Alan Turing.
Here’s a quick fact: his work is often credited with helping to shorten World
War II. He played a leading role in breaking the famous encrypted Enigma
messages from Germany, together with the other codebreakers. The point here is
that algorithms are an age-old idea, and these days we’ve been applying them in
computer science and making full use of them.

And why would we require them? Think of the huge amount of data that’s being
generated these days, and of the methods we need to understand that data, clean
it up and work with it. For all of this, we have algorithms. So, on that note,
what are algorithms? What is the formal definition of an algorithm?
Well, guys, algorithms are as simple as this: they are just a set of rules, or
you can call them processes, to be followed in calculations or other
problem-solving operations, especially by a computer. How simple is that? This
is exactly what an algorithm means. The figure on the left-hand side is pretty
much what a flowchart looks like. Don’t worry, we’ll be checking out flowcharts
in the next set of slides. But right now I want to tell you that you guys are
using algorithms as well. Step one: you’re looking at your screen. You’ve
programmed yourself to look at the screen; that’s an algorithm. And YouTube is
running a recommendation algorithm. Let’s say you search for Python tutorials,
or anything for that matter, and Intellipaat videos are up there. So how does
YouTube know that it should recommend Intellipaat’s videos to its learners?
Well, again, an algorithm is being used there. And every time you check your
mail, your mails are filtered into your inbox or your spam folder. So how does
Google, or Gmail, know which mail is spam and which is not? That again is an
algorithm right there. And no matter what operating system you’re on right now,
Windows, iOS, macOS, Android, whatever, all these operating systems are using
algorithms. On that note, let’s quickly break it down into simple terms and
check out the relationship between a pseudocode and a flowchart.

So here it is, a very simple piece of code for you guys. This is what we call
pseudocode. Pseudocode is almost like high-level language code; it just looks
very literal, and you can figure out what the code is doing even though you
might not be a programmer. So one part is the code, and the other is the
algorithm, shown as the flowchart alongside it. We’re declaring a variable a
and putting the value 10 in it, we’re declaring a variable b and putting the
value 20 in it, and we’re adding them, so c will have the value 30 right now.
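As a rough illustration, the pseudocode being described could be written in Python as follows. The variable names a, b and c come from the slide as narrated; the rest is only a sketch of the start, assign, add, output, stop sequence.

```python
# Sketch of the narrated pseudocode: START, assign, add, output, STOP.
a = 10       # first value
b = 20       # second value
c = a + b    # add the two values
print(c)     # output the result
```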
Here a plus b is 10 plus 20, and then we’re outputting that. The words start
and stop are again a part of the pseudocode-flowchart relationship, and on the
right side, if you take a look, this is what the flowchart of this exact
pseudocode looks like.

Well, this was very simple, so let me quickly step it up one notch and check
another pseudocode-flowchart relationship. Here again we’re inputting a and
inputting b, and then, until a reaches b, we’ll be printing the values from a
to b and increasing a by one each time. Right now a is 10. It’s going to check
whether 10 is equal to 20; it’s not, so until 10 becomes 20 we keep printing.
The answer is going to be 10, 11, 12, 13, 14, all the way until 20, and this
goes round and round as an iteration in a loop. The diamond box is called the
decision box, and it has two tracks: a true/false track or a yes/no track. In
this particular case we have the yes/no track, guys.
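The loop just described can be sketched in Python like this. One assumption to flag: the narration says the loop runs until a equals b, yet the stated output includes 20, so this sketch uses an inclusive test (a <= b); a strict "stop as soon as a equals b" check would end at 19. The function name count_up is hypothetical.

```python
def count_up(a, b):
    """Collect every value from a to b inclusive, stepping by one."""
    values = []
    while a <= b:          # the decision box: yes -> loop body, no -> stop
        values.append(a)   # stands in for printing the value
        a += 1             # increase a by one on each pass
    return values

print(count_up(10, 20))
```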
So on that note, we need to understand why we would require all these
algorithms in machine learning. Before that, why would we even require machine
learning? Well, machine learning can be defined as the ability of a machine to
learn something without being explicitly programmed for that particular thing.
How cool is that? It is basically the field of study where computers use a
massive amount of data and apply these algorithms to train themselves (here’s
the keyword: training themselves) and then make predictions. Training in
machine learning entails feeding a lot of data into the algorithm and allowing
the machine itself to learn more about the processed information. You tell the
machine the basics, or show it one iteration, and the machine goes on to figure
out, say, 9 or 10 more iterations on its own. It learns on its own, it
processes on its own, and you can work with that data later on. So we can call
this a process of converting raw data into useful information as well, but
we’re doing it with the help of the algorithms that we’re about to learn.

On that note, we need to check out what the types of machine learning are. We
have three main types of learning when we talk about machine learning:
supervised learning, unsupervised learning and reinforcement learning. I would
suggest you take a minute and pause on the slide to note these three types:
supervised learning, unsupervised learning and reinforcement learning. If
you’re already familiar with the concepts, or if you think you’ve got it in the
bag, let’s move on to check out what supervised learning actually means.
Well, supervised learning, as the name suggests, requires some sort of
supervision. Let us talk in terms of variables so we can understand it easily.
In supervised machine learning algorithms, we have input variables and output
variables. The input variables are denoted by X and the output variables are
denoted by Y. The goal of any supervised learning system is to understand how
the output variable Y changes with respect to changes in the input X. Here we
are approximating the mapping function from X to Y, to the point where, when
new input data comes in that the machine hasn’t seen, we can predict the output
Y for that new X. So we have trained it on a particular set of X’s, it then
sees new input data, and it gives us a new output value Y. How cool is that?

We also need to know the concept of dependent and independent variables. Our
aim is to understand how the dependent variable changes with respect to the
independent variable. So we have a dependent variable that goes hand in hand
with what we call the independent variable, and we need to understand what
changes happen in the dependent variable when it is mapped against the
independent variable.

Just to make sure you’re getting the concept, here’s a very simple example. Our
independent variable in this case is the gender of the student; we have a girl
and a boy here. The dependent variable is the outcome of the examination:
whether the student passed or failed. At the end of it, what we’re trying to do
is determine whether a student would pass the exam based on the person’s
gender. Let’s say we’re doing a survey where we need to find out how many girls
have passed and how many boys have passed. Here the gender is the independent
variable, and the outcome that depends on it, the pass or the fail, is the
dependent variable.

So do we have anything more in terms of supervised learning? Yes, guys.
Supervised learning is further divided into something called classification and
something called regression. Let us quickly check out what regression is, and
then we can talk about classification. Regression is a type of supervised
learning where the output variable is a continuous numeric value. What do we
mean by a continuous numeric value? Let me take another quick example to make
sure you guys understand this better.
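To make the idea of learning a mapping from X to a continuous Y concrete, here is a minimal sketch of a one-variable least-squares regression in plain Python. The data points are invented for illustration and are not from the video, and the helper names fit_line and predict are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept

# Made-up training data: input X and a continuous output Y (here Y = 2X + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
model = fit_line(xs, ys)
print(predict(model, 5.0))  # prediction for an X the model has not seen
```

The model is trained on four known (X, Y) pairs and then asked about an X it has never seen, which is the supervised-learning pattern described above.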
I have images of two apples for you guys. One apple costs four dollars; the
other apple costs three dollars. Here the output variable is the cost of the
apple. It is a numeric value, and you can predict it. Is the apple ripe? If
yes, then it’s costly; if it’s not yet ripe, then it’s cheap. Is it a Shimla
apple, a Kashmiri apple, a Washington apple? You can go on adding so many
factors around this apple and then come up with one particular outcome, which
is the price. The price depends on all of these factors, and in our case the
price is the output variable. So we’re trying to predict the cost of the apple
with respect to all these other factors, and doing this mathematically is what
we call regression.

With respect to regression, there is another technique, which we call logistic
regression. This is a technique where the dependent variable, instead of being
a continuous numerical value, is a categorical value. What do we mean by this?
If you take a look at the example on your screen right now, we’re trying to
predict whether it’s going to rain on a particular day or not, and this is
being done with respect to two independent variables. How do we check for rain?
Usually by checking the temperature and the humidity, and then we probably just
go out, take a look at the sky, check for clouds and so on. Coming back to
logistic regression, the dependent variable is categorical. In this case it can
have only two values; it is binary, so it is going to be either zero or one. In
the logistic regression model, depending on all of these attributes, we get a
probability, and our final answer is going to be either yes or no. If you ask
someone whether it’s going to rain, their answer will be either a yes or a no;
it’s a binary answer, and it’s the same here.

To graph out what it looks like, we get an S-shaped curve out of the logistic
regression model. On the left, we have a linear relationship between the
dependent and independent variables, and it’s just a straight line. On the
right, since the outcome we’re looking at is binary, the curve looks like an S.
So take a moment and pause on this slide to understand what a linear regression
graph looks like versus what a logistic regression graph looks like.

On that note, let us check out the next subdivision under supervised learning,
which is called classification. As the name suggests, you might already know
what classification means in literal terms. In classification, the output
variable is categorical in nature; in this example it’s again a binary value.
Have a look at the picture on your screen: we can analyze whether that person
is a male or a female. So the binary outcome is the gender of the person,
either a man or a woman. The output variable, the gender, is a categorical
value, and we are trying to classify this person into a specific gender based
on all the other factors. How do we know? Well, we can see the beard on the
face, so our brain tells us it’s a man. Simple as that.

So on that note, we’ve checked out what supervised learning is. So what is
unsupervised learning? Well, guys, in unsupervised learning algorithms, we have
input data which has no labels. When the data does not have any labels, there
is nothing the machine can map to in order to understand the data offhand. If
we take a look at the raw data ourselves, we can tell that there are a couple
of fishes and a couple of birds in there. We know this because we have trained
ourselves for it. When the machine sees this, there is no label to tell it that
this is a fish or this is a bird. So our unsupervised learning algorithm runs
through the data, and at the end of it, through the process we call clustering,
it divides all the fishes and all the birds for us on its own. The input data
has no class labels, and the machine doesn’t know what’s a fish and what’s a
bird, so building an unsupervised model on top of this input data is very
interesting and a lot of fun. Here it’s going to give out two clusters: the
first consists of all the fishes and the second consists of all the birds.

Coming to clustering, which is a major part of unsupervised learning, the most
important clustering algorithm, and the simplest one, is k-means clustering.
K-means clustering is an unsupervised machine learning algorithm where the aim
is to group all the similar data points, just like the fishes and the birds,
into one cluster. There must be high intra-cluster similarity and low
inter-cluster similarity. What do we mean by that? All the data points within a
cluster should be as similar as possible, and data points in two different
clusters must be as different as possible. That is k-means clustering in just a
sentence, guys.

What does the k stand for in k-means clustering? K is the number of clusters
that you want in the outcome. In this particular case we have cluster A,
cluster B and cluster C, so the k value here is three, because we have three
different clusters. As simple as that, guys. So on that note, the next type of
learning that happens is
what we call reinforcement learning, guys. In reinforcement learning, there is
something called an agent, and this agent works out the most effective actions
for us by mapping its state at every single moment. To give you better clarity,
I hope you guys have played Pac-Man in your olden days. In this particular
video game, the space around the figure is what we call a 2D game space. You
have something called Pac-Dots, you have enemies, you have walls and so much
more. The aim is to move around, make sure you don’t run into the bad guys, and
finish your goal. How do you know who the bad guys are, where you need to move,
and how not to get out every single time? Let’s say you’ve been playing this
game for a couple of hours or a couple of days in your childhood, and then you
realize how the game actually works. That exactly is reinforcement learning,
guys.

To give you another example, reinforcement learning is pretty much how a dog or
a cat is trained in real life as well. Let’s say we’re training a dog to give a
handshake. If the dog gives a handshake, you might see that the trainer feeds
it a biscuit that instant. So the dog knows that giving a handshake is the
right thing to do, because there is a biscuit at the end of it. The reward is
being hunted by the animal.

To put it all in one single picture, this is what a reinforcement learning
environment would look like. We have an agent who performs an action in an
environment, and here we can have two tracks. If the agent performs the task
right, there is a reward for it and everyone’s happy. Else, if there is no
reward, it means that something went wrong, and this is captured as a state:
because something went wrong, you’re not getting the reward. Let’s say the dog
did not give you a handshake. If you give it a biscuit at that moment anyway,
it will not realize whether it’s doing the right thing or the wrong thing.
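The agent, action and reward loop can be sketched with the dog-training example. This is a toy explore-then-exploit learner written for illustration, not a specific algorithm from the video; the action names, the reward function and the learning rate are all assumptions.

```python
ACTIONS = ["sit", "bark", "handshake"]   # hypothetical actions for the dog

def reward(action):
    """The environment: the trainer gives a biscuit only for a handshake."""
    return 1.0 if action == "handshake" else 0.0

def train(episodes=20, learning_rate=0.5):
    value = {action: 0.0 for action in ACTIONS}   # estimated reward per action
    for episode in range(episodes):
        if episode < len(ACTIONS):
            action = ACTIONS[episode]             # explore: try each action once
        else:
            action = max(value, key=value.get)    # exploit: best action so far
        r = reward(action)
        # Nudge the estimate for this action towards the reward just received.
        value[action] += learning_rate * (r - value[action])
    return value

value = train()
best = max(value, key=value.get)
print(best, value)
```

Only the rewarded action accumulates value, so after a few episodes the agent keeps choosing it, which is the "hunting the reward" behaviour described above.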
So we can have a state of, let’s say, the dog did not give a handshake, and
that’s pretty much what S_t means, guys. A reward is R_t, and this keeps on
going in an iteration where you’re training your model better and better to
hunt more rewards. The more the rewards, the more the machine is doing the
right thing. It’s as simple as that, guys.

On that note, I have two very simple demos in Python that I quickly want to run
by you guys to show you the use of machine learning algorithms. Let me quickly
jump into Google Colab. Google Colab is basically a Python Jupyter notebook
hosted on the Google cloud, and I use it for most of my Python coding as well.
Anyway, coming back to it, here’s the first example that we’d like to discuss
with you guys. Just give me a second, the runtime is being connected. It’s
initializing, and it’s going to say connected any minute. There it is.

First, let us check out a k-means clustering demo. We’re going to import a
couple of packages: NumPy, pandas, matplotlib to give us the output in terms of
graphs, and sklearn, from which we import KMeans. Let me quickly import all of
these libraries that we’ll be making use of. To generate data of our own,
instead of picking it up from a dataset, we’ll be using something called
make_blobs. We’ll have 300 samples here, with four clusters. So n_samples is
300: we have 300 dots on the screen right now, and these dots are divided into
four clusters for us.

Let us use something called the elbow method, which is based on what’s called
WCSS, the within-cluster sum of squares. I would recommend you Google it if you
want to know more about WCSS. It is a fairly involved part of the k-means
algorithm, and I would suggest you check it out on your own because it is not
in the scope of this particular tutorial. We’ll be using that method, and we’re
going to train the model to make it understand what’s going on. So look at
this. The optimal number of clusters for us is somewhere around 3 or 4; we have
4 clusters, and the WCSS goes all the way from 2,500 down towards 0. This is
just a graph to tell us what the data might look like.

Next we need to find the centroid, as we call it in the k-means clustering
algorithm, of each cluster, and then we need to mark that center. The red dots
that you see are exactly that. So we’ve found out that there are four clusters,
and we’ve marked the centroids of the four different clusters that you see,
using k-means clustering. It’s as simple as that. So that was a very simple
first demo.
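The video runs this demo with scikit-learn’s KMeans on make_blobs data, but the notebook code itself is not reproduced in this transcript. As a stand-in, here is a from-scratch sketch of the same idea in plain Python on a tiny invented dataset: assign each point to its nearest centroid, move each centroid to the mean of its points, and report the WCSS that the elbow method plots for each k. The function names, the starting-centroid rule and the data are all assumptions.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid_of(cluster):
    """Mean position of the points in one cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def kmeans(points, k, iterations=10):
    """Plain k-means; returns (centroids, wcss)."""
    n = len(points)
    # Deterministic start: pick k points spread evenly through the list.
    centroids = [points[i * n // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [centroid_of(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    # WCSS: squared distance from every point to its nearest centroid.
    wcss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wcss

# Two invented, well-separated blobs of four points each.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]

for k in (1, 2, 3):
    centroids, wcss = kmeans(points, k)
    print(k, wcss, centroids)
```

On this toy data the WCSS falls sharply from k=1 to k=2 and only slightly after that; that bend is the "elbow" used to choose the number of clusters.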
For our second scenario, we’ll be checking out logistic regression. In this
case we’ll be working with a heart disease prediction dataset, and we’ll be
using machine learning to predict whether a person is going to have heart
disease or not, entirely using logistic regression. Again we’re importing a
couple of libraries: pandas to handle the data, NumPy to do mathematical
operations, SciPy for our computations, matplotlib and seaborn to give us
visualizations, and sklearn, that is scikit-learn, which is a very important
machine learning library for Python.

Before that, we need the dataset file, which is called the Framingham dataset.
The dataset is from the town of Framingham in Massachusetts. Let me quickly
upload the file, and then we can go on to working with it. It’s going to take a
second to get uploaded; it’s a small file, and as you can see it’s been
uploaded. Now I can run this code, and this is what our dataset looks like.
There is a binary value for male: if male equals one the person is male, and if
male equals zero the person is female. Then it has the age, whether the person
is a current smoker, and how many cigarettes per day they have. We have BP
meds, which is blood pressure medication, whether you have had a stroke,
whether you are diabetic, your total cholesterol, your systolic blood pressure,
your diastolic blood pressure, your body mass index, your heart rate, your
glucose, and then the ten-year CHD as well. So this is an amazing dataset to
work with. Here we’re just renaming the male column; that’s about all we’re
doing.

Then we need to find out how many missing values we have in this dataset. Many
of the columns show zero, but we have about 388 missing values when it comes to
glucose, 50 missing values when it comes to cholesterol, and so on. So let us
go on to remove these missing values. It found about 500 rows in total with
missing values, and that’s fine in our case because it’s only 12% of the entire
dataset, so we can drop them and it won’t hurt our analysis.

To begin with, we perform some exploratory analysis, where we look at how the
data is distributed; we just dig into our data to find out what it is telling
us. Here are a couple of quick charts that show all of our numerical data as
graphs. We have the sex distribution, the age distribution, current smokers,
the BP medications distribution, cigarettes per day, diabetes, total
cholesterol, BMI, systolic blood pressure, diastolic blood pressure and so on.
Then this is the ten-year CHD that I’m printing out, and we need to find out
whether a person has a chance of getting heart disease or not. Here we can
check out the count: there are about 500, let’s say 600, people who are at risk
of getting heart disease, while there are about 3,500, let’s say 4,000, people
who are healthy. This is what exploratory analysis helps us to do: it gives us
numbers from which we can find out whether a person might suffer from heart
disease in the near future.
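The cleaning and counting steps described above are done with pandas in the video, but the notebook code is not reproduced in this transcript. The sketch below mirrors the same steps using only the standard library, on a made-up six-row table; the column names are assumptions chosen to resemble the fields read out above, and the values are invented.

```python
import csv
import io
from collections import Counter

# Tiny invented sample; empty cells stand for missing values.
RAW = """male,age,glucose,totChol,TenYearCHD
1,39,77,195,0
0,46,,250,0
1,48,70,,1
0,61,103,225,1
0,43,85,285,0
1,52,,260,0
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Count missing values per column.
missing = Counter(col for row in rows for col, cell in row.items() if cell == "")

# Keep only rows with no missing cell, and report the share that was dropped.
complete = [row for row in rows if all(cell != "" for cell in row.values())]
dropped_share = 1 - len(complete) / len(rows)

# Count how many of the remaining rows fall in each outcome class.
class_counts = Counter(row["TenYearCHD"] for row in complete)

print(dict(missing), round(dropped_share, 2), dict(class_counts))
```

Checking the dropped share before discarding rows is the same judgment call made in the demo, where losing roughly 12% of the rows was considered acceptable.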
So let us quickly go about plotting that. As you could see, that took about a
minute of processing, because it has to plot so many values for us. Let me
quickly scroll down so we can get a better view. This is a seaborn axis grid
plot, and you can see the concentration of the values for every single
attribute that we are comparing. Let us quickly use describe to tell us what
we’re looking at. We have a count of 3,751 for male; it’s going to give you the
age, the cigarettes, BP meds, prevalent stroke and so much more.

Coming to the process of logistic regression, from this dataset we need to have
an inference at the end of it. To do that, we’ll be running a couple of
functions, one of which is a lambda function, and then we get this nicely
formatted output printed for us. As it already says, the ten-year CHD is our
dependent variable, and we’re using logistic regression.
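Because the dependent variable is binary, logistic regression passes a weighted sum of the inputs through the S-shaped logistic (sigmoid) function to get a probability, then thresholds it for a yes/no answer. A minimal sketch follows; the weights, the bias and the two inputs are invented for illustration and are not the fitted values from the demo.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    """Probability of the positive class for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical model with two inputs (for example age and cigarettes per day).
weights = [0.05, 0.03]
bias = -4.0

p = predict_probability([60, 20], weights, bias)
label = 1 if p >= 0.5 else 0      # threshold the probability for a yes/no answer
print(round(p, 3), label)
```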
It’s going to give you all the standard errors and the values of what we call
the z statistic, and it reports P>|z|, a p-value, for every single variable
that we’re checking. When it comes to backward elimination, we’ll be using
feature selection to go about doing it, and at the end of it we get a very
nice-looking summary printed for us. The summary looks nice, but we need to
make more sense out of it. Here we have the p-values, the odds ratios and the
95% CI values, so we can go on to analyze what actually drives the outcome,
heart disease in our case.

To use our model, let’s quickly split our one single dataset into a training
dataset and a testing dataset, and let our model give us the answer. Checking
the model accuracy using the scikit-learn library, you can find out that our
model is accurate almost 90 percent of the time: 88.14 percent is a big number,
and it hasn’t been trained for many iterations.
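The split-and-score step can be sketched without any libraries. The video uses scikit-learn for this; the code below is a stand-in on invented labels, and the helper names are hypothetical. It also scores a baseline that always predicts the majority class, which is a useful yardstick when one class is much more common than the other, as with the roughly 600 at-risk versus 3,500 healthy people counted earlier.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and cut it into train and test parts."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(actual, predicted):
    """Share of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Invented labels: 15 positives and 85 negatives, similar in balance to the demo.
labels = [1] * 15 + [0] * 85
train, test = train_test_split(labels)

# Baseline "model": always predict the most common label seen in training.
majority = max(set(train), key=train.count)
baseline_predictions = [majority] * len(test)
baseline_accuracy = accuracy(test, baseline_predictions)
print(len(train), len(test), baseline_accuracy)
```

If a trained model’s accuracy is only a little above this baseline, the headline accuracy figure is saying less than it appears to.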
Right, so the number of iterations is again very low. Here’s our subplot, what
we call an axes subplot, and here you can check out the actual and predicted
outcome values, with predicted one and predicted zero against the actual
values, and the colors let you know what’s going on there as well.

Here is another step to print out the true positive rate of the data, the true
negative rate of the data and so much more, all in one single print statement
so that it looks nice. The accuracy of our entire model is about 88%. The
misclassification rate is 1 minus the accuracy, so we’ve missed about 12
percent. For the true positive rate we are somewhere around 4 percent, for the
true negative rate somewhere around 99 percent, the positive prediction rate is
80 percent, the negative prediction rate is somewhere around 88 percent, and so
on. So look at the amount of information that our machine learning algorithm is
giving us.
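All of the rates quoted here come from the four cells of a confusion matrix. The sketch below shows the arithmetic; the counts are an assumption, chosen so that the rates land close to the figures read out above, since the actual matrix from the demo is not given in this transcript.

```python
def rates(tp, fp, fn, tn):
    """Derive the usual summary rates from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
        "true_positive_rate": tp / (tp + fn),         # sensitivity / recall
        "true_negative_rate": tn / (tn + fp),         # specificity
        "positive_predictive_value": tp / (tp + fp),  # precision
        "negative_predictive_value": tn / (tn + fn),
    }

# Hypothetical counts: tp/fp/fn/tn = true/false positives and negatives.
result = rates(tp=4, fp=1, fn=88, tn=658)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With a true positive rate this low, a model catches very few of the at-risk cases even though overall accuracy looks high, which is worth keeping in mind when reading the 88% figure.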
So if you put it in terms of use cases in medicine, this is going to help a lot
of people. That was a quick walkthrough of how you can go about using k-means
clustering and logistic regression algorithms, guys. I hope this video was
helpful to you. If you have any further queries, do let us know in the comment
section below and we’ll reach out to you immediately. So guys, thank you so
much for watching this video and giving us your precious time.
