How to Make an Image Classifier – Intro to Deep Learning #6

How do we classify things? We consider people to be experts
in a field if they’ve mastered classification. Doctors can classify a blood sample
as good or bad. Photographers can classify
their latest shot as beautiful or not. Musicians can classify what sounds good,
and what doesn’t, in a piece of music. The ability to classify well
takes many hours of training. We get it wrong over, and over again,
until eventually we get it right. But with a quality data set,
deep learning can classify just as well as, if not better than, we can. We’ll use it as a tool to improve
our craft, whatever it is. And if the job is monotonous,
it’ll do it for us. When we reach the point where we
aren’t forced to do something we don’t want to just to survive,
we’ll flourish like never before. And that’s the world we’re aiming for.

>>Hello, world, it’s Siraj. And today, we’re going to build an image classifier
from scratch, to classify cats and dogs. Finally, we get to work with images. I’m feeling hype enough
to do the Macarena. [MUSIC] So, how does image classification work? Well, there were a bunch of different
attempts in the ’80s and early ’90s, and all of them tried a similar approach: think about the features
that make up an image, and hand-code detectors for each of them. But there is so much variety out there. No two apples look exactly the same. So the results were always terrible. This was considered a task
only we humans could do. But in ’98, a researcher named Yann LeCun introduced a model
called the Convolutional Neural Network, capable of classifying handwritten characters with
99% accuracy, which broke every record. And the CNN learned its features by itself. In 2012, it was used by another
researcher named Alex Krizhevsky at the yearly ImageNet competition. Which is basically the annual
Olympics of computer vision. And it was able to classify thousands
of images with a new record accuracy at the time of 85%. Since then, CNNs have been adopted by
Google to identify photos in search, and by Facebook for automatic tagging. Basically, they are very hot right now. But where did the idea for
CNNs come from? [MUSIC] We’ll first want to download our image
data set from Kaggle, with 1024 pictures of dogs and cats,
each in its own folder. We’ll be using the Keras deep
learning library for this demo, which is a high-level wrapper
that runs on top of TensorFlow. It makes building models
really intuitive, since we can define each layer
as its own line of code. First things first, we’ll initialize
variables for our training and validation data. Then we’re ready to build our model. We’ll initialize the type of model
using the sequential function, which will allow us to build
a linear stack of layers, so we treat each layer as an object
that feeds data to the next one. It’s like a conga line, kind of. Now, the alternative would be a graph
model, which would allow for multiple separate inputs and outputs. But we’re using a simpler example. Next, we’ll add our first layer,
the convolutional layer. The first layer of a CNN is
always the convolutional layer. The input is going to be a 32 by
32 by 3 array of pixel values. The 3 refers to RGB values. Each of the numbers in this array
is given a value from 0 to 255, which describes the pixel
intensity at that point. The idea is that,
given this as an input, our CNN will output the probability
of it being of a certain class. We can imagine the Convolutional Layer
as a flashlight shining over the top left of the image. The flashlight slides across all
the areas of the input image. The flashlight is our filter, and the region it shines over
is the Receptive field. Our filter is also an array of numbers. These numbers are weights
at a particular layer. We can think of a filter
as a feature identifier. As our filter slides, or
convolves around the input, it is multiplying its values with
the pixel values in the image. These are called element-wise multiplications. The multiplications from each
region are then summed up, and after we’ve covered all parts of the
image, we’re left with the feature map. This will help us find not buried
treasure, but a prediction, which is even better. Since our weights
are randomly initialized, our filter won’t start off being
able to detect any specific feature. But during training, our CNN will
learn values for its filters. So this first one will learn to detect
a low level feature, like curves. So if we place this filter on a part of
the image with a curve, the resulting value from the multiplication,
and summation, is a big number. But if we place it on a different
part of the image, without a curve, the resulting value is zero. This is how filters detect features. We’ll next pass this feature map through
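The sliding multiply-and-sum just described can be sketched in plain NumPy. This is a toy sketch of ours, not code from the video: the image, the filter values, and the function name are all made up for illustration.

```python
import numpy as np

# Toy 5x5 grayscale "image" and a 3x3 filter (the weights). In a real CNN the
# filter values are learned during training; these are hand-picked to show
# only the mechanics of convolution.
image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 2, 4, 2, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 0, 0, 0],
], dtype=float)

filt = np.array([
    [0, 1, 0],
    [1, 2, 1],
    [0, 1, 0],
], dtype=float)

def convolve2d(img, f):
    """Slide the filter over the image; at each position, element-wise
    multiply and sum - producing the feature map (no padding, stride 1)."""
    fh, fw = f.shape
    out_h = img.shape[0] - fh + 1
    out_w = img.shape[1] - fw + 1
    fmap = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = img[i:i+fh, j:j+fw]      # the receptive field
            fmap[i, j] = np.sum(region * f)   # element-wise multiply, then sum
    return fmap

feature_map = convolve2d(image, filt)  # 5x5 input, 3x3 filter -> 3x3 feature map
```

A real convolutional layer does this for many filters at once and across all three color channels, but the multiply-and-sum at each position is exactly this.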
an activation layer, called ReLU, or Rectified Linear Unit. ReLU is probably the name of some alien,
but it’s also a non-linear operation that replaces all the negative pixel
values in the feature map with zero. We could use other functions, but ReLU tends to perform
better in most situations. This layer increases the non-linear
properties of our model, which means our neural net will be able
to learn more complex functions than just linear regression. After that,
we’ll initialize our max pooling layer. Pooling reduces the dimensionality
of each feature map, but retains the most
important information. This reduces the computational
complexity of our network. There are different types, but
in our case, we’ll use max pooling, which takes the largest element from the
rectified feature map within a window we define, and slides this window
over each region of our feature map, taking the max values. So a classic CNN architecture looks
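Both operations just described, ReLU and max pooling, can be sketched in a few lines of NumPy. Again, this is a toy illustration of ours with made-up values, not code from the video.

```python
import numpy as np

# A toy 4x4 feature map with some negative values, as if it came straight
# out of a convolution, before any activation.
fmap = np.array([
    [ 1., -2.,  3., -4.],
    [-1.,  5., -6.,  2.],
    [ 0., -3.,  8., -1.],
    [ 4., -5.,  2.,  7.],
])

# ReLU: replace every negative value with zero, leave the rest unchanged.
rectified = np.maximum(fmap, 0.0)

def max_pool(x, window=2):
    """Max pooling: slide a non-overlapping window over the map and keep
    only the largest value in each window (stride == window size)."""
    h, w = x.shape
    out = np.zeros((h // window, w // window))
    for i in range(0, h, window):
        for j in range(0, w, window):
            out[i // window, j // window] = x[i:i+window, j:j+window].max()
    return out

pooled = max_pool(rectified)  # 4x4 -> 2x2: smaller, but keeps the strongest responses
```

Note how pooling shrinks the map from 4x4 to 2x2 while each output cell still records the strongest activation in its region, which is exactly the "retain the most important information" idea above.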
like this, three Convolutional Blocks, followed by a Fully Connected layer. We’ve initialized
the first three layers. We can basically just repeat
this process twice more. The output feature map is fed into
the next convolutional layer. And the filter in this layer will learn
to detect more abstract features, like paws and dogs. One technique we’ll use to prevent
overfitting, that point when our model isn’t able to predict labels for
novel data, is called dropout. A dropout layer drops out a random
set of activations in that layer, by setting them to zero
as data flows through it. To prepare our data for the fully connected layers, we’ll first flatten
the feature map into one dimension. Then we’ll want to initialize a fully
connected layer with the dense function, and apply ReLU to it. After dropout, we’ll initialize
one more fully connected layer. This will output an n-dimensional vector,
where n is the number of classes we have. For binary classification, a single output is actually enough.
And by applying a sigmoid to it, we convert that output to a probability:
the sigmoid value is the probability of one class, and one minus it is the probability of the other. So how does our network learn? Well, we’ll want to minimize a loss
function which measures the difference between the target output,
and the predicted output. To do this,
we’ll take the derivative of the loss, with respect to
the weights in each layer. Starting from the last layer, we compute the
direction in which we want our network to update, and propagate our loss backwards through
each layer. Then we’ll update our weight values for
each filter, so they can change in the direction of the
gradient that will minimize our loss. We then configure the learning process
by using the compile method, where we’ll define our loss as binary
crossentropy, which is the preferred loss function for
binary classification problems. Then our optimizer, rmsprop,
which will perform gradient descent. And a list of metrics,
which we’ll set to accuracy, since this is a classification problem. Lastly, we’ll write out our fit
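To make the loss concrete, here is a toy NumPy sketch of binary crossentropy and one plain gradient-descent step on a single weight. This is our own illustration: the real model uses the rmsprop optimizer and many weights, but the core idea, stepping opposite the gradient to reduce the loss, is the same.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Average of -[y*log(p) + (1-y)*log(1-p)] over the batch.
    p = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

# A tiny "model": one weight, one input, sigmoid output.
x, y_true = 2.0, 1.0          # one training example, labelled "dog" (1)
w = -0.5                      # a randomly initialized weight

p = sigmoid(w * x)
loss_before = binary_crossentropy(np.array([y_true]), np.array([p]))

# Gradient of the loss w.r.t. w; for sigmoid + binary crossentropy this
# simplifies to (p - y) * x. Then step in the direction that lowers the loss.
grad = (p - y_true) * x
learning_rate = 0.5
w -= learning_rate * grad

loss_after = binary_crossentropy(np.array([y_true]), np.array([sigmoid(w * x)]))
# loss_after < loss_before: the update moved the weight downhill.
```

Backpropagation repeats this same derivative-and-update step for every filter weight in every layer, starting from the output and working backwards.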
function to train the model, giving it parameters for
the training and validation data. As well as a number of epochs to run for
each. And let’s save our weights, so
we can use our trained model later. Overall accuracy comes out to about 70%,
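Putting the whole pipeline together, the architecture described in this video can be sketched with the tf.keras API roughly as follows. This is an illustrative sketch, not the video’s exact code: the filter counts, the dense-layer size, and the weights file name are assumptions of ours.

```python
# Sketch of the model described above: three conv blocks (convolution + ReLU +
# max pooling) followed by a fully connected head with dropout and a sigmoid.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),         # 32x32 RGB input, as in the video
    # Conv block 1
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # Conv block 2
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # Conv block 3
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # Fully connected head
    layers.Flatten(),                          # flatten feature maps to 1D
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),                       # randomly zero activations in training
    layers.Dense(1, activation='sigmoid'),     # probability of "dog" vs "cat"
])

model.compile(loss='binary_crossentropy',      # preferred loss for binary problems
              optimizer='rmsprop',
              metrics=['accuracy'])

# Training would then look roughly like:
# model.fit(train_data, epochs=30, validation_data=val_data)
# model.save_weights('first_try.weights.h5')   # hypothetical file name
```

The fit and save calls are commented out because they need the actual training and validation data generators; everything above them builds and compiles as-is.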
similar to my attention span. And if we feed our model a new
picture of a dog or cat, it will predict its label
relatively accurately. We could definitely improve
our prediction though, by either using more pictures, or
by augmenting an existing pre-trained network with our own layers,
a technique called transfer learning. So to break it down, convolutional
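The transfer-learning idea can be sketched like this: take a pre-trained convolutional base and train only a small classifier head on top. This is a hedged sketch of ours; we pass weights=None so it builds offline, but for real transfer learning you would load weights='imagenet' and keep the base frozen, as in the comments.

```python
# Transfer-learning sketch: a pre-trained base (VGG16) plus our own small head.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# weights=None builds the architecture without downloading anything; for real
# transfer learning use weights='imagenet' to load the pre-trained filters.
base = VGG16(weights=None, include_top=False, input_shape=(64, 64, 3))
base.trainable = False  # freeze the base; only our head below gets trained

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),     # cat vs. dog probability
])
model.compile(loss='binary_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
```

With pre-trained weights, only the small head needs training on our 1024 cat and dog pictures, which usually beats training the whole network from scratch on so little data.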
neural networks are inspired by the human visual cortex, and offer state-of-the-art
performance in image classification. CNNs learn filters at each
convolutional layer that act as increasingly abstract feature detectors. And with Keras and TensorFlow,
you can build your own pretty easily. The winner of the coding challenge from
the last video, is Charles David-Blot. He used Tensorflow to build a deep net,
capable of predicting whether or not someone would
get a match after training on a data set. And he had a pretty sweet data
visualization of his results. Wizard of the Week. And the runner-up is Dalai Mingat,
with clean, organized, and documented code. The coding challenge for this video
is to create an image classifier for two types of animals;
instructions are in the README. Post your GitHub link in the comments,
and I’ll announce the winner next Friday. Please subscribe if you want to
see more videos like this, check out this related video, and
for now, I’ve gotta upload my mind. So, thanks for watching!

100 thoughts on “How to Make an Image Classifier – Intro to Deep Learning #6”

  1. Can somebody please explain how does tensorflow calculate the confidence level (the percentage match) ? On what basis does it give the number?

  2. hi siraj
    big fan of your work thank you for providing such knowledge in a spectacular way,
    I am building a image processing classifier in which I have 66 classes i know I should duplicate the convolution layers but I dont know how I mean what parameters should I give in those layers
    thank you for reading 🙂
    please rply 🙂

  3. Can I identify a 'particular' object in an image using Keras/Python. Pls share video/source codes. Thanks. U r doing a gr8 job.

  4. Hey siraj very nice how to build an image classifier for gender recognition it'd be great if you do a video on it

  5. i really want to learn what's in these videos, but the format is so annoying. I don't understand why there's an ongoing slideshow of distractions, or how the rapping helps.

  6. After reading this blog post (the link is in the description) and watching the video for a second time, I have a much better understanding.

    I highly suggest checking out this article to supplement the video!

  7. Hello sir,
    i am currently working on project which identifies sugarcane leaf, cotton leaf and rice leaf from given input image leaf. then how i start this project plz explain me step by step. after this my aim is to identify leaf diseases by uploaded photo of crop leaf. plz guide me sir, and reply as soon as possible…..

  8. I wish you would have left a link to the code for each video. Not too sure I am ready for challenges but would like to see the 40 lines (or however many were used in video) of code from the current video.

  9. Hello Siraj,
    I want to get your opinion on a project that i'm starting:
    It is a deep learning and i feel that i need some advice to know in which direction i can go. Let's start: I want to use deep learning to build a model that recognizes many characteristic in an image. To be precise, i want to create a model that allows me to recognize in a photo: .if the person is a MALE/Female. .what type of clothing is he/she wearing .what are the colors of the clothes he/she is wearing.
    What do you think of the complexity of the problem ? I really need some guidance in order to start thinking of the possible techniques that could help me go deeper in this project. Do you advise me to start thinking first of a model that can olny distinguish male/femal (and ofc it needs to detect them first)?Or do i need to think of the subject from another perespective? What are the topics that i need to look for?
    Thanks in advance.

  10. I really enjoy the series, but I think when I initially watched this video upon its release I struggled with the core content at first because its easy to get wrapped up in jupyter or get stuck installing something, which creates a lot of additional friction when I need to learn the core concept.

  11. Noob question: I get an error on this line model.add(Convolution2D(32,3,3 input_shape=(img_width,img_height,3))) and I seem to overlook what I did wrong. Any help? :-/

  12. I've noticed that there is a noticeable gap between the val_acc and the acc , Isn't your model overfitted ??


  14. Hey, I trained a model and when I ran predict on an image, I got [[1. 1. 1.]] as output. What does that mean?

  15. Can I use this process for apple defect Multispectral Method for Apple Defect

    Detection using Hyperspectral Imaging System ?Im new in image processing

  16. I just had one doubt. Suppose that we have a filter to detect the curve as shown in the video but the curve is aligned in some direction. How would it still recognize it? Or is it even supposed to?
    In some cases, like identifying an object, the filter should be able to identify the object regardless of it's orientation. While in other cases like identifying numbers, it shouldn't cause it may confuse a 6 with a 9.

  17. You say 32 by 32 for the first convolution2D added to the model , yet what is the difference if we used 64 by 64. Would we gain resolution in data and accuracy in exchange for more processing time; or is this difference negligible and the results nearly equivalent?

  18. How can I make a neural network to extract the features from images of food wrappers and train the network ? As they don't have any specific shape or type..
    Please reply..

  19. y Predict_proba and predict in keras produce same results . I am preparing a htr for english . so i have 26 classes , now i need to get the probabilities for each class . but both predict and predict_proba gives only the predicted class . any solutions . My last activation layer is softmax

  20. literally when im running the first block it says.

    importError Traceback(most recent call last)

    importError: cannot import name load_data

    what is going on? any help??

  21. hey siraj, u're work's good! but u have been replying to all the comments that praises u but havent been able to answer the question that what does import parser do and what is "load_data"?? plz get back to this

  22. Hello, do you know any intelligence that recognizes hand gestures? I would like to do something like that using an OpenMV cam M7

  23. so sad…
    This vdo seems informative but presented steps are too advance to follow (many guidance missing).
    it is not helpful to beginner like me. 🙁

  24. One side thing I appreciated from this tute (as a beginner) is the sheer amount of space and computation power required to do all this. :'(

  25. Instead of singing and dancing, maybe you could just focus in explaining the code at a beginner level.

  26. Why are people so mad about it not being beginner friendly, machine learning is a very complex subject. In fact even this is easy in comparison to what each one of those functions are doing. He tried his best, I personally don't see how I might simplify this further.

  27. Your videos do have information for a quick grasp but dude there is too much distraction with all the pictures and animations and quotes and singing all that you put in your videos. It is ok to have a little fun thing but not so much that you prevent the viewer from focussing and interpreting things.

  28. Can u make a detailed video on semantic segmentation of medical images in matlab including all the steps..Like training,validation,and getting the result?It would be really helpful for me or some others.

  29. 3rd line gives error… Import Error cannot import name 'load_data' when on line from parser import load_data up up

  30. code is wrong
    image is not defined
    img = image.load_img('datasettest_setcatscat.4014.jpg', target_size = (224, 224))

  31. HELP !!
    ValueError: Python inputs incompatible with input_signature:

    inputs: (

    Tensor("IteratorGetNext:0", shape=(None, 50, 50, 1), dtype=uint8))

    input_signature: (

    TensorSpec(shape=(None, None, None, 1), dtype=tf.float32, name=None))
