Stanford CS230: Deep Learning | Autumn 2018 | Lecture 8 – Career Advice / Reading Research Papers

Okay. Hey, everyone, looks like we’re on. So as usual, if you have not yet, um, please enter your SUID so that we know you’re here in this room. Um, so actually, can you hear me okay at the back? Is it okay? Oh, yes, is the volume okay at the back? All right. No one’s responding. Yes, okay. All right. [LAUGHTER] Thank you. Okay. So, um, what I want to do today is, um, share with you two things. You know, we’re approaching the end of quarter. Uh, I hope you guys are looking forward to, to the Thanksgiving break, um, next week. Um, actually and I guess we have a lot of home viewers, but those us- those of you that are viewing this from outside California, know that we’re all feeling really bad air here in California. So I hope, if you’re somebody watching at home you have better air wherever you are. Um, uh, but, uh, what I hope to do today is give you some advice that will set you up for the future, uh, so if even beyond the conclusion of CS230. And in particular, what I want to do today is, um, share with you some advice on how to read research papers, uh, because, you know, deep learning is evolving fast enough that even though you’ve learned a lot of foundations of deep learning and learned a lot of tips and tricks and probably know better than many practitioners how to actually get deep, deep learning algorithms to work already. Uh, when you’re working on specific applications whether in computer vision or natural language processing or speech recognition or something else, um, for you to be able to efficiently figure out the academic literature on key parts of, uh, the, the deep learning world, will help you keep on developing and, you know, staying on top of ideas even as they evolve over the next several years or maybe decade. 
So first thing I wanna do is, uh, give you advice on how, uh, when say, when I’m trying to master a new body of literature, how I go about that and hope that those techniques would be useful to help you be more efficient in how you read research papers. And then a second thing is, in previous offerings of this course, one request from a lot of students was just advice for navigating a career in machine learning. And so in the second half of today, I want to share some thoughts with you on that. Okay, so it turns out that- so I guess two topics: reading research papers, right? Um, and, uh, then second, career advice in machine learning. So it turns out that, uh, you know, reading research papers is one of those things that a lot of P- PhD students learn by osmosis, right? Meaning that if you’re a PhD student and you see, you know, a few professors or see other PhD students do certain things, then you might try to pick it up by osmosis. But I hope today to accelerate your efficiency in how you acquire knowledge yourself from the, uh, from the a- academic literature, right? And so let’s say that this is the area you want to become good at, let’s say you want to build that, um, speech recognition, right? Let’s turn this off now. Let’s say you want to build that, um, speech recognition system that we talked about, the “Robert, turn on the desk lamp” one. All right. Um, this is what I’ve read- this is the sequence of steps I recommend you take, uh, which is first: [NOISE] compile lists of papers and the- and by papers, I mean, both research papers often posted on arXiv, onto the Internet, but also Medium posts, um, [NOISE] you know, maybe some occasional GitHub post although those are rarer. But whatever texts or learning resources you have. And then, um, what I usually do is end up skipping around the list. All right. 
So if I’m trying to master a new body of knowledge, say you want to learn the most speech recognition systems, this is what it feels like to read a set of papers, which is maybe you initially start off with five papers and if on the horizontal axis, I plot, you know, 0 percent to 100 percent read/understood, right? The way it feels like reading these papers is often read, you know, ten percent of each paper or try to quickly skim and understand each of these papers. And if based on that you decide that paper number two is a dud, right, other, other, other authors even cite it and say boy they, they sure got it wrong or you read it, and it just doesn’t make sense. Then go ahead and forget it. And, uh, as you skip around to different papers, uh, you might decide that paper three is a really seminal one and then spend a lot of time to go ahead and read and understand the whole thing. And based on that, you might then find a sixth paper from the citations and read that and go back and flesh out your understanding on paper four. And then find a paper seven and go and read that all the way to the conclusion. Um, but this is what it feels like as you, you know, assemble a list of papers and skip around and try to, uh, um, master a body of literature around some topic that you want to learn. And I think, um, some rough guidelines, you know, if you read 15 to 20 papers I think you have a basic understanding of an- of an area like, maybe good enough to do some work, apply some algorithms. Um, if you read, um, 50 to 100 papers in an area like speech recognition and, and kind of understand a lot of it, then that’s probably enough to give you a very good understanding of an area, right? You might, know- I’m, I’m always careful about when I say you’re mastering a subject but you read 50 to 100 papers on speech recognition, you have a very good understanding of speech recognition. Or if you’re interested in say domain adaptation, right? 
By the time you’ve read 50 or 100 papers, you have a very good understanding of, of a subject like that. But if you read 5 to 20 papers, it’s probably enough for you to implement it but maybe not, not sure this is enough for you to do research or be really at the cutting edge but these are maybe some guidelines for the volume of reading you should aspire to if you want to pick up a new area, or take one of the subjects in CS230 and go more deeply into it, right? Um, now [NOISE] how do you read one paper? And, um, I hope most of you brought your laptops. So what I’m gonna do is describe to you how I read one paper, and then after that I’m actually going to ask all of you to, you know, download the paper online and just take, I don’t know, uh, uh, take, take a few minutes to read a paper right here in class and see how far you can get understanding of a research paper in just minutes right, right here in class. Okay. So when reading one paper. So the, the, the bad way to read the paper is to go from the first word until the last word, right? This is a bad way to- when you read a paper like this. Oh, and by the way, actually here, I’ll tell you what my real life is like. So, um, I actually pretty much everywhere I go, in my backpack, this is my actual folder. I don’t want to show- this is my actual folder of unread papers. So pretty much everywhere I go, I actually have a paper, you know, a stack of papers that’s on my personal reading list. This is actually my real life. I didn’t bring this to show you. This is in my backpack all the time. Ah, and I think that- these days on my team at Landing AI, I personally lead a reading group where I lead a discussion about two papers a week. Uh, but to select two papers, that means I need to read like five or six papers a week to select two, you know, to present and discuss at the Landing AI reading group. So this is my real life, right? 
And how I try to stay on top of the literature and, and I have a- I’m doing a lot. If I can find the time, if I can find the time to read a couple of papers a week, hopefully all of you can too. Uh, but when I’m reading a paper, uh, this is, this is how I recommend you go about it which is, do- do- don’t go for the first word and read until the last word, uh, instead, uh, take multiple passes through the paper [NOISE]. Right? Um, and so, you know, step one is, uh, [NOISE] read the title, [NOISE] the abstract, um, [NOISE] and also the figures. Um, especially in Deep Learning, there are a lot of research papers where sort of the entire paper is summarized in one or two figures in the figure caption. So, um, so sometimes, just by reading the title, abstract and, you know, the key neural network architecture figure that just describes what the whole papers are, and maybe one or two of the experiments section. You can sometimes get a very good sense of what the whole paper is about without, you know, hardly reading any of the texts in the paper itself, right? Tha- tha- that’s the first pass. Um, second pass, I would tend to read more carefully, um, [NOISE] the intro, the conclusions, um, look carefully at all the figures again, [NOISE] and then skim, um, the rest, and you know, um, I- I don’t know how many of you have published academic papers, but, uh, when people publish academic papers, um, part of, you know, the publication process is, uh, convincing the reviewers that your paper is worthy for acceptance. And so what you find is that the abstract, intro and conclusion is often when the authors try to summarize their work really, really carefully, uh, to make a case, to make a very clear case to the reviewers as to why, you know, they think their paper should be accepted for publication. 
And so because of that, you know, maybe slightly not, slightly unusual incentive, the intro and conclusion and abstract often give a very clear summary of what’s the paper actually about. Um, and depending on, [NOISE] um, and again, just to be, you know, b- bluntly honest with you guys, um, the related work section is useful if you want, sometimes is useful if you want to- to understand related work and figure out what’s- what are the most important works in the papers. But the first time you read this, you might skim or even skip, skim the related work section. It turns out, unless you’re already familiar with the literature, if this is a body of work that you’re not that familiar with, the related work section is sometimes almost impossible to understand. Uh, and again, since I’m being very honest with you guys, sometimes, related work section is when the authors try to cite everyone that could possibly be reviewing the paper and to make them feel good, uh, uh, and then hopefully accept the paper. And so related work sections are sometimes written in funny ways, right? Um, and then, uh, [NOISE] step 3, I would often read the paper, but, um, [NOISE] just skip the math [NOISE], right? Um, and four, read the whole thing, uh, but skip parts that don’t make sense, [NOISE] right? You know, um, I think that, uh, one thing that’s happened many times in the research is that, I mean, the papers will tend to be cutting edge research, and so when, uh, we publish things, we sometimes don’t know what’s really important and what’s not important, right? So there are- there are many examples of- of well known, highly cited research papers where some of it was just great stuff and some of it, you know, turned out to be unimportant. But at the time the paper was written, the authors did not know, every- no one on the planet knew what was important and what was not important. And maybe one example. Um, the LeNet-5 paper, right? Seminal paper by Yann LeCun. 
Part of it was phenomenal, just established a lot of the foundations of ConvNets. And so it’s, uh, one of the most incredibly influential papers. But you go back and read that paper, an- another sort of, whole half of the paper was about other stuff, right? Transducers and so on that is much less used. And so- and so it’s fine if you read a paper and some of it doesn’t make sense because it’s not that unusual, or sometimes it just happens that, um, great research means we’re publishing things at the boundaries of our knowledge and sometimes, ah, uh, the stuff you see, you know, we’ll realize five years in the future that that wasn’t the most important thing after all, right? Or that- what was the key part of the algorithm, maybe it wasn’t what the authors thought. And so sometimes parts of papers don’t make sense. It’s okay to skim it initially and move on, right? Uh, uh, unless you’re trying to do a pe- unless you’re trying to do deep research and really need to master it, then go ahead and spend more time. But if you’re trying to get through a lot of papers, then, you know, then- then it’s just prioritizing your time, okay? Um, and so, ah, just a few last things and then I’ll ask you to practice this yourself with a paper, right? Um, you know, I think that when you’ve read and understood the paper, um, [NOISE] these are questions to try to keep in mind. And when you read a paper in a few minutes, maybe try to answer these questions: what did the authors try to accomplish? And what I hope to do in a few minutes is ask you to, uh, download a paper off the Internet, read it, and then, um, try to answer these questions and discuss your answer to these questions with- with- with your peers, right? With others in the class, okay? Um, what were the key elements, [NOISE] what can you use yourself, and um, [NOISE] okay? So I think if you can answer these questions, hopefully that will reflect that you have a pretty good understanding of the paper, okay? 
Um, and so what I would like you to do is, um, pull up your laptop and then so you- there- there’s actually a- so I think on the, uh, ConvNet videos, right? On, um, the- the deeplearning.ai ConvNet videos on Coursera, you learned a bit about, um, ah, well, various neural network architectures up to ResNets. And it turns out that there’s another, uh, follow-on piece of work that maybe builds on some of the ideas of ResNets, which is called DenseNets. So, what I’d like you to do is, um, oh, and- and so what I’d like you to do is actually try this. [NOISE] And when I’m reading a paper, [NOISE] again, in the earlier stages, don’t get stuck on the math, just go ahead and skim the math, and read the English text where you get through faster. Ah, and maybe one of the principles is, go from the very efficient high information content first, and then go to the harder material later, right? That’s why often I just skim the math and I don’t- if I don’t understand some of the equation just move on, and then only later go back and, and really try to figure out the math more carefully, okay? So what I’d like you to do is take on a- I want you to take, um, uh, uh, wonder if, uh, let’s- let’s- let’s try, let’s- let’s- have you take seven minutes. I was thinking maybe one- one minute per page is quite fast and, um, [NOISE] search for this paper, [NOISE] Densely Connected Convolutional Networks, by Gao Huang et al, okay? I want you guys to take out your laptops, uh, search for this paper, er, download it. You should find this on arXiv, um, A-R-X-I-V, right? And, uh, and this is also, so sometimes we also call this DenseNets, I guess. And go ahead and, uh, take, why don’t you take like seven minutes to read this paper and I’ll let you know when the time has passed, and then after that time, um, I’d like you to, you know, discuss with your, with, with the others, right, what, wha- what you think are the answers, especially the first two. 
Because the other two you can leave for later. Why don’t you go ahead and take a few minutes to do that now, and then I’ll let you know when, um, sort of like, seven minutes have passed and then you can discuss your answers to these with your friends, okay? [NOISE] All right guys. So, um, anyone with any thoughts or insights, surprises, or thoughts from this? So, now you’ve spent 11 minutes on this paper, right? Seven minutes reading, four minutes discussing. It was a very, very short period of time, but any, any thoughts? What do you think of the paper? Come on, you-all, you-all just spent a lot of time sitting around, discussing with each other. Wha- wha- what did people think about the time you spent trying to read the paper? Actually, did you feel you, how, actually, r- raise your hand if you feel, you know, you’ve kind of understood the main concepts in the paper just a bit. Okay, yeah, like, two-thirds of you, many of you. And, actually, what did you think of the figures? Wow, people are really less energetic today than usual [inaudible] So I think this is one of those papers where the, the paper is almost entirely summarized in figures one and two, all right. I think if you [inaudible] um, if you look at Figure One and the caption there and Figure Two on page three and the caption there and understand those two figures, those really convey, you know, 80 percent of the idea of the paper, right? Um, and I think that, uh, um, couple of other tips. So, um, it turns out that as you read these papers with practice, you do get faster. So, um, for example, Table One, uh, on page four, right, the, you know, this mess of the table on top. This is a pretty common format or a format like this is how a lot of authors use to describe a network architecture, especially in computer vision. So one of the things you find as well is that, um, the first time you see something like Table One it just looks really complicated. 
But by the time you’ve read a few papers in a similar format, you will look at Table One and go, “Oh, yeah, got it.” You know, this is, this is, this is the DenseNet-121 versus DenseNet-169 architecture, and you will more quickly pick up those things. And so another thing you’ll find is that, um, reading these papers actually gets better with practice, because you see different authors use different ways or similar ways of expressing themselves, and you get used to that. You’ll actually be faster and faster at, uh, implementing these, um, at, at, at understanding these ideas. And I think, I know these days when I’m reading a paper like this, it maybe takes me about half an hour to, to feel like, and I, I know I gave you guys seven minutes when I thought I would need maybe half an hour to figure out a paper like this. Uh, um, uh, and I think, uh, for a more c- uh, I find that, uh, it’s not unusual for people relatively new to machine learning to need maybe an hour to kind of, you know, really understand a paper like this. Um, and then I know I’m pretty experienced in machine learning, so I’m down to maybe half an hour for a paper like this, maybe even 20 minutes if it’s a really easy one. But there are some outliers, so I have some colleagues who sometimes stumble across a really difficult paper. You need to chase down all the references and learn a lot of other stuff. So sometimes you come across a paper that takes you three or four hours or even longer to really understand it, but, uh, but I think depending on how much time you want to spend per week reading papers, um, you could actually learn, you know, learn a lot, right, um, uh, doing what you just did by maybe spending half an hour per paper, an hour a paper rather than seven minutes, right? Um, so, all right. I feel like, uh, yeah, and so, I, I think it’s great, and, and, and notice that I’ve actually not said anything about the content of this paper, right? 
So whatever you guys just learned, that was all you. I had nothing to do with it. So, yeah, like you have the ability to go and learn this stuff by yourself. You don’t need me anymore, right? [LAUGHTER] Um, so just the last few comments. Um, let’s see. So the other things I get asked, questions I get is, uh, you know, where, where do you go? The deep learning field evolves so rapidly. So where, where do you go, uh, to? So if you’re trying to master a new body of knowledge, definitely do web searches, and there are often good blog posts on, you know, here are the most important papers in speech recognition. There are lots of great resources there. And then the other thing you, I don’t know, a lot of people try, want to do is try to keep up with the state of the art of deep learning even as it’s evolving rapidly. And so, um, I, I- I’ll just tell you where I go to keep up with, um, you know, discussions, announcements. And surprisingly, Twitter is becoming an impo- surprisingly important place for researchers to find out about, um, new things. Um, there’s an ML Subreddit, it is actually pretty good. Um, lot of noise, but many important pieces of work do get mentioned there. Uh, some of the top machine-learning con- conferences are NIPS, ICML, and ICLR, right? And so whenever these conferences come around, take a look and glance throughout these, the titles, see if there’s something that interests you. And then I think I’m, I’m fortunate I guess to have, um, friends, you know, uh, both colleagues here at Stanford as well as colleagues in several other teams I work with that, um, uh, that just tell me when there’s a cool paper, I guess. But I think with, here within Stanford or among your workplace, for those of you taking this at SCPD, you can form a community that shares interesting papers. 
So a lot of the groups I have are on Slack, and we regularly Slack each other or send, send each other, uh, text messages on the Slack messaging system, when we find interesting papers, and tha- tha- that’s been great for me actually. Um, yeah, oh, and, and, and Twitter, let’s see. Kian is, I follow Kian, you can follow him too. Uh, this is me, Andrew Y Ng, right? Um, I probably don’t Slack up papers as often as I do. But if you look at, and you can also look at who we follow, and there are a lot of good researchers, uh, that, that will share all these things online. Oh, and, um, there, there are, there’s a bunch of people that also use a website called Arxiv Sanity. Um, I don’t as much sometimes, um, but there’s lots of resources like that, all right? Um. All right. Cool. So just two last tips for how to read papers and get good at this. Um, so to more deeply understand the paper, uh, some of the papers will have math in it. Uh, and, actually, if you read the, I don’t know, you all learned about Batch Norm, right? In the second module’s videos. If you read the Batch Norm paper, it’s actually one of the harder papers to read. There’s a lot of math in the derivation of Batch Norm but there are papers like that. And if you want to make sure you understand the math here’s what I would recommend, which is, read through it, take detailed notes and then see if you can re-derive it from scratch. So if you want to deeply understand the math of an algorithm from like, you know, Batch Norm or the details of back-prop or something, this is a good practice. And I think a lot of sort of a theory- theoretical science and mathematics Ph.D students will use a practice like this. You just go ahead and read the paper. Make sure you understand it and then to make sure you really, really understand it put, put, put aside the results and try to re-derive the math yourself from scratch. 
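As a concrete target for this kind of exercise, here is a minimal sketch of the Batch Norm forward pass for a single feature over one mini-batch, in plain Python. The function name `batch_norm_forward`, the default `eps`, and the toy batch are illustrative assumptions rather than anything shown in the lecture, and the paper's derivation (including the backward pass) covers considerably more than this.

```python
import math

def batch_norm_forward(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-mode Batch Norm for one feature across a mini-batch (sketch)."""
    m = len(xs)
    mu = sum(xs) / m                                  # mini-batch mean
    var = sum((x - mu) ** 2 for x in xs) / m          # biased mini-batch variance
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in xs]  # normalize
    return [gamma * xh + beta for xh in x_hat]        # learned scale and shift

# Hypothetical toy batch, chosen only for illustration.
batch = [1.0, 2.0, 3.0, 4.0]
out = batch_norm_forward(batch, gamma=2.0, beta=0.5)
```

One way to use a sketch like this is as a check on your own derivation: write the equations on a blank page first, then compare against the code, and see whether the output mean lands at `beta` and the output spread at roughly `gamma`.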
And if you can start from a blank piece of paper and re-derive one of these algorithms from scratch, then that’s a good sign that you really understood it. When I was a Ph.D student I did this a lot, right? That you know I would read a textbook or read a paper or something and then put aside whatever I read and see if I could re-derive it from scratch starting from a blank piece of paper, and only if I could do that would I, you know, feel like yep, I think I understand this piece of math. And it turns out if you want to do this type of math yourself, it’s your ability to derive this type of math, re-derive this type of math, that gives you the ability to generalize, to derive new novel pieces of math yourself. So I think I actually learned a lot of math for machine learning by doing this. And just by re-deriving other people’s work that allowed me to learn how to derive my own novel algorithms. And actually sometimes you go to the art galleries, right? You go to the Smithsonian. You see these art students, you know, sitting on the floor copying the great artworks, the great paintings you know, painted by the masters centuries ago. And so I think just as today there are art students sitting in the de Young Museum or whatever, and I was at the Getty Museum in LA a few months ago. You actually see these art students you know, copying the work of the masters. And I think if you want to become good at the math of machine learning yourself, this is a good way to do it. It’s time-consuming but then you can become good at it that way. And same thing for code, right? I think the simple lightweight version one of learning would be to download and run the open source code if you can find it, and a deeper way to learn this material is to re-implement it from scratch. Right, it is easy to download an open-source implementation and run it and say ooh, it works. 
But if you can re-implement one of these algorithms from scratch then that’s a strong sign that you’ve really understood this algorithm. Okay? Um, alright. And then longer term advice. Right. You know, for you to keep on learning and keep on getting better and better, the more important thing is for you to learn steadily, not for you to have a focused, intense activity you know, like over Thanksgiving you read 50 papers over Thanksgiving and then you’re done for the rest of your life. It doesn’t work like that, right? And I think you’re actually much better off reading two or three papers a week for the next year than you know, cramming everything right over, over one long weekend or something. Actually in education we actually know that spaced repetition works better than cramming so the same same thing, same body of learning. If you learn a bit every week and space it out you actually have much better long-term retention than if you try to cram everything in short-term so that’s, that’s a very solid result that we know from pedagogy and how the human brain works. So, so if you’re able to- so so again the way I, my life is my backpack. I just always have a few papers with me. And I find that I can, I read almost everything on the tablet. Almost everything on iPad, but I find that research papers are one of the things where the ability to flip between pages and skim I still find more efficient on paper. So I read almost nothing on paper these days except for research papers, but that’s just me. Your mileage may vary. Maybe something else will work better for you. Okay? All right. So let’s see, that’s it for reading research papers, I hope that while you’re in CS230, you know, if some of you find some cool papers or if you go further for the DenseNet paper and find an interesting result there. 
Go ahead and post on Piazza if any of you want to start a reading group of other friends here at Stanford you know, encourage you to look around class, find, find, find a group here on campus or with among your CS230 classmates or your work colleagues. For those of you taking this on SCPD so that you can all keep studying the literature and learning and helping each other along. Okay? So that’s it for reading papers. The second thing we’re gonna do today is just give some longer-term advice on navigating a career in machine learning, right? Any questions about this before I move on? Okay. Cool. All right. But I hope that was useful. Some of this I wish I had known when I was a first-year PhD student but c’est la vie. Alright. Let’s see. Can we turn on the lights please? Alright. So kind of in response to requests from early- students in earlier versions of the class, before we, you know as we approach the end of the quarter, want to give some advice to how to navigate a career in machine learning, right? So today machine learning there are so many options to do, so many exciting things. So how do you, you know, what do you want to do? So I’m going to assume that most of you will want to do one of two things, right? At some point you know you want to get the job, right? Maybe a job that does work in machine learning and including a faculty position for those of you who want to be a professor. But I guess eventually most people end up with a job I think I guess there are other alternatives but but and some of you want to go on to more advanced graduate studies although even after you get your PhD at some point, most people do get a job after the PhD. And by job I mean either in a big company, you know, or a or a startup, right? But regardless of the details of this, I think- I hope most of you want to do important work. Okay. 
So what I’d like to do today is break, you know, this into, how do you find a job or join a Ph.D program or whatever that lets you do important work. And I want to break this discussion into two steps. One is just how do you get a position? How do you get that job offer or how do you get that offer of admission to the Ph.D program or admission to the master’s program or whatever you wanna do. And then two is selecting a position. Between going to this university versus that university or between taking on the job in this company versus that company. What are the ones that will tend to set you up for success, for long-term personal success and career success? And really I hope that, by the way, I hope that all of these are just tactics to let you do important work, right, and this, I hope that’s what you want to do. So you know, what do recruiters look for? And I think just to keep the language simpler I’m going to pretend that, I’m just gonna talk about finding a job. But a lot of very similar things apply for PhD programs, it’s just instead of saying recruiters I would say admissions committees, right, but let me just focus on the job scenario. So most recruiters look for technical skills. So for example, there are a lot of machine learning interviews that will ask you questions like, you know, where would you use gradient descent or batch gradient descent or stochastic gradient descent, you know, and what happens when the mini-batch size is too large or too small, right? So there are companies, many companies today asking questions like that in the interview process. Or can you explain the difference between an LSTM and a GRU and when would you use a GRU? So you really get questions like that in many job interviews today. And so recruiters are looking for ML skills as well as, and so you will often be quizzed on ML skills as well as your coding ability, right? 
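The interview question above about batch, stochastic, and mini-batch gradient descent can be made concrete with a small sketch. The function `minibatch_gd`, its learning rate, and the noise-free toy data below are hypothetical choices made only for illustration. The point is that one loop covers all three variants: `batch_size=1` is stochastic gradient descent, `batch_size=len(xs)` is batch gradient descent, and anything in between is mini-batch.

```python
import random

def minibatch_gd(xs, ys, batch_size, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x by gradient descent on mean squared error (sketch)."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(idx)  # new sample order every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of mean((w*x - y)^2) over this batch with respect to w.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w

# Noise-free toy data with true slope 3.0 (an assumption for illustration).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [3.0 * x for x in xs]

w_sgd = minibatch_gd(xs, ys, batch_size=1)          # stochastic: one example per update
w_mini = minibatch_gd(xs, ys, batch_size=4)         # mini-batch
w_batch = minibatch_gd(xs, ys, batch_size=len(xs))  # batch: one update per epoch
```

On this noise-free data all three settings should recover the same slope. On real, noisy data the trade-off shows up: very small batches give many cheap but noisy updates, while very large batches give smoother updates but fewer of them per pass over the data.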
And then beyond your- and I think Silicon Valley’s become quite good at giving people the assessments to test for real skill in machine learning engineering and in software engineering. And then the other thing that recruiters will look for, that many recruiters will look for is meaningful work. And in particular, um, uh, you know, there are some candidates that apply for jobs that have very, um, theoretical, very academic skills, meaning you can answer all the quiz questions about, you know, what is Batch Norm? Can you derive the [inaudible] for this? But unless you’ve actually shown that you can apply this in a meaningful setting, it’s harder to convince a company or a recruiter that you know not just the theory, but that you know how to actually make this stuff work. And so, um, having done meaningful work using machine learning makes you a very strong, a very desirable candidate, I think, to a lot of companies. Kind of work experience. And I think really, whether you’ve done, whether you’ve done something meaningful, um, reassures them that, you know, that you can actually do work, right? That it’s not just that you can answer quiz questions, that you know how to implement learning algorithms that work. Um, and, and maybe, um, uh, yeah, right. Um, and then many recruiters actually look for your ability to keep on learning new skills and stay on top of machine learning even as it evolves as well. Okay. And so a very common pattern for the, um, successful, you know, AI engineers, say, machine learning engineers, would be the following, where if on the horizontal axis, I plot different areas. So, you might learn about machine learning. Learn about deep learning. Learn about probabilistic graphical models. Learn about NLP. Learn about computer vision and so on for other areas of AI and machine learning. Um, and if the vertical axis is depth, uh, a lot of the strongest candidates for jobs are, um, T-shaped individuals. 
Meaning that you have a broad understanding of many different topics in AI and machine learning, and a very deep understanding in, you know, at least one area. Maybe more than one area. Um, and so I think by taking CS230 and doing the things that you’re doing here, hopefully you’re acquiring a deeper understanding of one of these areas, of deep learning in particular. Um, but the other thing that deepens your knowledge in one area will be the projects you work on. Um, the open-source contributions you make, right. Uh, whether or not you’ve done research. Um, and maybe whether or not you’ve done an internship. Right? Okay. And I think these two elements, you know, a broad area of skills, and then also going deeper to do a meaningful project in deep learning. Or, um, work with a Stanford professor, right? And do a meaningful research project, or make some contribution to open source. Publish it on GitHub, and then let others use it. These are the things that let you deepen your knowledge and convince recruiters that you both have the broad technical skills and, when called on, you’re able to apply these in a meaningful way to an important problem, right? And in fact, um, the way we designed CS230 is actually a microcosm of this. Where, um, you know, you learned about neural nets. Um, then about topics like Batch Norm, ConvNets, sequence models, right? I’m just gonna say RNNs. So, actually you have breadth within the field of deep learning. And then the reason I want you to work on the project is so that you can pick one of these areas and go deep and build a meaningful project in one of these areas. And it’s not just about making a resume look good, right? It’s about giving you the practical experience to make sure you actually know how to make these things work. To make sure that you actually know how to make a CNN work, to make an RNN work.
All right. And then of course many students also list their projects on their resumes, obviously. Um, so, let’s see. The failure modes. The bad ways to navigate your career. Um, there are some students that just do this, right? There are some Stanford students that just take class after class after class after class, and go equally in depth in a huge range of areas. And this is not terrible. You can actually still get a job. Sometimes you can even get into some Ph.D. programs like this, without the depth, but this is not the best way to navigate your career. All right? So, there are some Stanford students that take tons of classes. You can get a good GPA doing that, but they do nothing else. And this is not terrible, but this is not great. It’s not as good as the alternative. Um, there’s one other thing I’ve seen Stanford students do which is, uh, just try to do that, right? You just try to jump in on day one and go really, really deep in one area. And again, um, this has its own challenges, I guess. You know, one failure mode that is actually not great is, sometimes you actually get, um, undergrad freshmen at Stanford that have not yet learned a lot about coding, or software engineering, or machine learning, and try to jump into research projects right away. This turns out not to be very efficient because it turns out that online courses or Stanford classes are a very efficient way for you to learn about the broad range of areas. And after that, going deeper and getting experience in one vertical area then deepens your knowledge. It makes it so you know how to actually make those ideas work.
Uh, so I do see sometimes, unfortunately, you know, some Stanford freshmen join us already knowing how to code and having implemented, you know, some learning algorithms, but some students that do not yet have much experience try to jump into research projects right away. And that turns out not to be very productive for the student or for the research group, because until you’ve taken the classes and mastered the basics it’s difficult to understand what’s really going on in the advanced projects, right? Um, so I would say this is actually worse than that, right? This is actually okay. This is actually pretty bad. I would not do this for your career, right? Yeah. Probably not. Yeah. Um, and then the other not-so-great mode that you see some Stanford students do is get a lot of breadth, and then do a tiny project here. Do a tiny project there. Do a tiny project there. Do a tiny project there. And you end up with ten tiny projects, but no one or two really significant projects. So again, this is not terrible, but, you know, beyond a certain point, by the way, recruiters are not impressed by volume, right? So, having done 10 lame projects is actually not impressive. Not nearly as impressive as doing one great project or two great projects. And again, there’s more to life than impressing recruiters, but recruiters are very rational. And the reason recruiters are less impressed by someone whose profile looks like this is because they’re actually probably factually less skilled and less able at doing good work in machine learning compared to someone that has done a substantive project and knows what it takes to see the whole thing through. Does that make sense? So, when I say recruiters are more or less impressed, it’s because they’re actually quite rational in terms of, uh, trying to understand how good you are at doing important work, at building meaningful AI systems. Makes sense?
Um, and so in terms of building up both the horizontal piece and vertical piece, uh, this is what I recommend. Um, to build a horizontal piece, a lot of this is about building foundational skills. And, um, it turns out coursework is a very efficient way to do this. Uh, you know, in, in, in these courses, right, you know various instructors like us, but many other Stanford professors, um, put a lot of work into organizing the content to make it efficient for you to learn this material. Um, and then also reading research papers which we just talked about. Having a community will help you. Um, and then that is often, uh, building a more deep and, um, relevant project, right? And, and, and the pro- projects have to be relevant. So, you know, if you want to build a career machine learning, build a career in AI. Hopefully, the project is something that’s relevant to CS, or machine learning, or AI deep learning. Um, I do see, I don’t know, for some reason, I feel like, uh, a surprisingly large number of Stanford students I know are in the Stanford dance crew, and they spend a lot of time on that which is fine. If you enjoy dancing, go have fun. Don’t, don’t, you know, you, you don’t need to work all the time. So, go join the dance crew, or go on the overseas exchange program. And go hang out in London and have fun, but those things do not as directly contribute to this, right? Yeah. I know, I think, I think, in an earlier version of this presentation, you know, students walked away, saying ha, you know, Andrew says we should not have fun we should work all the time and that’s not the goal [LAUGHTER]. Um, All right. There is one. All right. Um, you know, there is the uh, Saturday morning problem which all of you will face. Right? Which is every week, uh, including this week on Saturday morning you have a choice. 
Um, you can, uh, read a paper [LAUGHTER] or work on research or work on open source or, I don’t know what people do, or you can watch TV or something, [LAUGHTER] right? Um, and you will face this choice, like, maybe every Saturday, you know, for the rest of your life. And, um, you know, you can build out those foundational skills, go deep, or go have fun, and you should have fun, all right? Just for the record. But one of the problems that a lot of people face is that, um, even if you spend all Saturday and all Sunday reading research papers, or maybe spend all Saturday and Sunday working hard, it turns out that the following Monday you’re not that much better at deep learning. It’s like, yeah, you worked really hard. So you read five papers, you know, great. Uh, but if you work in a research group, the professor, or your manager if you’re in a company, they have no idea how hard you worked. So there’s no one to come by and say, “Oh, good job working so hard all weekend.” So no one knows the sacrifices you make all weekend to study or code open source, just no one knows. So there’s almost no short-term reward to doing these things, whereas there might be short-term rewards for doing other things, right? Um, uh, but the secret to this is that it’s not about reading papers really, really hard for one Saturday morning, or for all Saturday once, and being done. The secret to this is to do this consistently, um, you know, for years, um, or at least for months. And it turns out that if you read, um, two papers a week, and you do that for a year, then you will have read about 100 papers after a year, and you will be much better at deep learning after that, right? And so for you to be successful, it’s much less about the intense burst of effort you put in over one weekend.
It’s much more about whether you can find a little bit of time every week to read a few papers or contribute to open source or take some online courses, uh, but- and if you do that you know every week for six months or do that every week for a year, you will actually learn a lot about these fields and be much better off, and be much more capable at deep learning and machine learning or whatever, right? Um, yeah. So, um, yeah, and yeah she- my wife and I actually do not own a TV. [LAUGHTER] For what it’s worth. Okay, but again, if you own one go ahead. Make sure- don’t, don’t drive yourself crazy. There’s a healthy work-life integration as well. All right. So, um, so I hope that doing these things more is not about finding a job, it’s about doing these things and make you more capable as a machine learning person, so that you have the power to go out and implement stuff that matters, right? To do stuff that actually, do, do work that matters. Well the second thing we’ll chat about is selecting a job, right? And it’s actually really interesting. Um, I, uh, gave this part of presentation, um, last year, uh, actually sorry earlier this year and shortly after that presentation, um, there was a student in the class that was already in a company who emailed me saying, “Boy Andrew, I wish you’d told me this before I accepted my current job.” Um, and so [LAUGHTER] let’s see. Hopefully this will be useful to you. Um, so it turns out that ,um, uh, you know, I, so when you’re- at some point you’re deciding, you know, what Ph.D program do you want to apply for, what companies you want to apply for a job at and, um, I can tell you what, uh, so if you want to keep learning new things, um, I think one of the biggest predictors of your success will be whether or not you’re working with great people and projects, right? 
And in particular, um, you know, there are these fascinating results, I wanna say from the social sciences, showing that, um, if your five closest friends or ten closest friends are all smokers, there’s a much higher chance you become a smoker as well, right? And if your five or 10 closest friends are, uh, um, you know, overweight, there’s a much higher chance you’d become the same. And conversely, I think that if your five closest friends work really hard, read a lot of research papers, care about their work, about learning and making themselves better, then there’s actually a very good chance that they’ll influence you that way as well. So we’re all human. We’re all influenced by the people around us, right? And so, um, I’ve been fortunate, I’ve taught at Stanford for a long time now, so I’ve been fortunate to have seen a lot of students go from Stanford to various careers. And because I’ve seen many hundreds or maybe thousands of Stanford students, that I knew, right, when they were still Stanford students, go on to separate jobs, I saw many of them have amazing careers. Um, I saw, you know, a few have, like, okay careers. Um, and I think over time I’ve learned to pattern match what is predictive of your future success after you leave Stanford, and I’ll share with you some of those patterns as you navigate your career. And there are just so many options in machine learning today that it’s kind of tragic if you don’t, you know, navigate to hopefully maximize your chance of being one of the people that gets to do fun and important work that helps others. Um, so when selecting a position, um, I would advise you to focus on the team, um, [NOISE] you interact with, and by team I mean, you know, somewhere between 10 to 30 persons, right, maybe up to 50, right?
Um, because it turns out that there will be some group of people, maybe 10 to 30 people, maybe 50 people, that you interact with quite closely, and these will be the peers and the people that will influence you the most, right? Um, because if you join a company with 10,000 people, you will not interact with all 10,000 people. There will be a core of 10 or 30 or 50 people that you interact with the most, and it’s those people, how much they know, how much they teach you, how hard-working they are, whether they’re learning themselves, that will influence you the most, rather than all these other hypothetical 10,000 people in a giant company. Um, and of these people, one of the ones that will influence you the most is your manager, all right? So make sure you meet your manager and get to know them and make sure they’re someone you want to work with. Um, and in particular, I would recommend focusing on these things and not on the brand, um, of the company. Because it turns out that the brand of the company you work with is actually not that correlated, yeah, maybe there’s a very weak correlation, but it’s actually not that correlated with what your personal experience would be like, if that makes sense, right? Um, and so, um, [NOISE] and by the way, again, just full disclosure: I have a research group here at Stanford, right? My research group at Stanford is one of the better-known research groups in the world, but don’t join us just because you think we are well-known, right? The brand is just not a good reason to join us. Instead, if you want to work with someone, meet the people and evaluate the individuals, or look at the people and see if you think these are people you can learn from and work with, and are good people. Makes sense? [NOISE] So, um, in today’s world there are a lot of companies, um, recruiting Stanford students. So let me give you some advice. This piece I only give because many years- well, I’ll just give you advice.
So sometimes, there are giant companies with, let’s say, uh, 50,000 people, right? And I’m not thinking of any one specific company. If you’re trying to guess what company I’m thinking of, there is no one specific company I’m thinking of, but this pattern matches, uh, many large companies. But maybe there’s a giant company with, you know, 50,000 people, right? And, um, let’s say that they have a 300 person, right, AI team. Um, it turns out that if you look at the work of the 300 persons in the AI team, and if they send you a job offer to join the 300 person AI team, that might be pretty good, right? Since this may be the group, you know, whose work you hear about, they publish papers, you read about them in the news. Um, and so if you’ve got a job offer to work with this group, that might be pretty good. But sometimes even within the 300 person AI team it’s actually difficult to tell what’s good and what’s not. There is often a lot of variance even within this. What’s even better would be if you get a job offer to join a 30 person team. So you actually know who’s your manager, who are your peers, who you’re working with. And if you think these are 30 great people you can learn from, that could be a great job offer. The failure mode that unfortunately I’ve seen, um, several Stanford students go down, and this is actually a true story: several years ago there was a Stanford student I knew that I thought was a great guy, right? You know, I knew his work, he was coding machine learning algorithms. I thought he was very sharp and did very good work, uh, working with some of my Ph.D. students. He got a job offer from one of these giant companies that has a great AI group. Um, and his offer wasn’t to go to the AI group; his offer was, um, join us and then we’ll assign you to a team.
So this particular student, that was a Stanford student that I know about and care about, um, he wound up being assigned to a really boring Java back end payments team and, uh, so after he accepted the job offer, he wound up being assigned to a, you know, back-end- and I apologize. I know you work on Java back-end payment process systems I think they’re great [LAUGHTER] but the student was assigned to that team and he was really bored and so, um, I think that this was a student whose career- I personally saw his career rising, while he was at Stanford and after he went to this, you know, frankly not very interesting team, I saw his career plateau, um, and after about a year and a half he resigned from this company after wasting a year and a half of his life and missing out really on a year and a half of this very exciting growth of AI machine learning, right? So it was very unfortunate. Um, uh, and it was actually after I told this story, um, last time I taught this class earlier this year that actually someone from, um, actually it was from the same big company [LAUGHTER] he found me and said, “Boy, Andrew I wish you’d told me the story earlier, because this is exactly what happened to me, at the same big company [LAUGHTER]. Now, I wanna share with you, uh, a different, um, so- so I would just be careful about rotation programs as well. You know, when the company is trying to recruit you, if a company refuses to tell you what project you work on, who’s your manager, exactly what team you’re joining, I personally do not find those job offers that attractive because if they can’t, you know, if they refuse to tell you what team you’re gonna work with, well chances are, right, telling you the answer will not make the job attractive to you. That’s why they’re not telling you. So I’d just be very careful. And sometimes rotation programs sound good on paper but it is really, you know, well we’ll figure out where to send you later. 
So, I feel like I’ve seen some students go into rotation programs that sound good on paper, that sound like a good idea but just as you wouldn’t- after you graduate from Stanford, would you wanna do four internships and then apply for a job? That would be a weird thing to do. So, sometimes rotation programs are yeah, come and do four internships and then we’ll let you apply for a job and see where we wanna send you. It could be a job at back end payment processing system, right? So, um, so so just just be cautious about the marketing of rotation programs. Um uh, and again, if you do if but if- but if what they say is do rotation and then you join this team, then you can look at this team and say yep, that’s a great team. I wanna do rotation but then I would go and work with this team and and these are the 30 people I’ll work with. So that could be great. But do a rotation and then we can send you anywhere in this giant company, that I would just be very careful about. Um, now on the flip side, there are some companies, I’m not gonna mention any companies, but there are some companies with you know, not as glamorous, not as- not as like cool brands, and maybe this is a, I don’t know, 10,000 person company or 1,000 or 50,000 person or whatever. Let’s say 10,000 person company. I have seen many companies that are not super well-known in the AI world, they are not in the news all the time, but they may have a very elite team of 100 people doing great work in machine learning, right? And there are definitely companies whose brands are not you know, the first companies you think of when you think of big AI companies that sometimes have a really really great 10 person or 50 person or 100 person team that works on learning algorithms. And even if the overall brand or the overall company, you know, isn’t as like, is a little bit sucky. 
If you manage to track down this team and if you have a job offer to join this elite team in a much bigger company, you could actually learn a lot from these people and do important work. You know, one of the things about Silicon Valley is that, uh, the brand on your resume matters less and less, right? Less than ever before. I mean, I guess, with the exception of the Stanford brand, you totally want the Stanford brand on your resume. But with that exception, really, you know, Silicon Valley, the world, right, has become really good at evaluating people for your genuine technical skills and your genuine capability and less for your brand. And so I would recommend that instead of trying to get the best stamps of approval on your resume, you go and take the positions that let you have the best learning experiences and also allow you to do the most important work, and that is really shaped by the, you know, 30 or 50 people you work with and not by the overall brand of the company you work with, right? So there’s a huge variance across teams within one company, and that variance is actually pretty big, or might be bigger than the variance across different companies. Does that make sense? And if a company refuses to tell you what team you would join, I would seriously consider just, you know, if you have a better option, I would do something else. Um, and then finally, um, yeah, and so really, again, I guess I don’t wanna name these companies, but, you know, think of some of the large retailers or some of the large healthcare systems. There are a lot of companies that are not well known in the AI world, but I’ve met their AI teams and I think they’re great. And so if you’re able to find those jobs and meet their people, you can actually get very exciting jobs in there.
All right but of course, for the giant companies with elite AI teams, you can join that elite AI team, right? That’s also- that’s also great. I’m a bit biased since I use to lead some of these elite AI teams. So- so I think those teams are great but the loss of some teams in a, um, ah, yeah. All right. Um, lastly, you know, just general advice, this is how I really live my life. I tend to choose the things to work on that will allow you to learn the most and you know, try to do important work, right? So, you know especially if you’re relatively early in your career, whatever you learn in your career will pay off for a long time and so um, uh and so joining the teams that are working with a great set of 10 or 30 or 50 teammates will let you learn a lot, and then also, you know, hopefully, I mean, yeah and- and just don’t- don’t don’t join a like a cigarette company and hope you know, give more people cancer or stuff like that. Just don’t- don’t do this. Don’t- don’t do stuff like that. But if you can do meaningful work that helps other people and do important work and also learn a lot on the way, hopefully you can find positions like that, right? That let you set- set yourself up for long-term success but also do work that you think matters and that, and that helps other people. All right. Um, any questions while we wrap up? Yeah. [NOISE] I have a question about important work, what are some topics that you think you would include as important [inaudible]? What’s important? You know, I don’t know. Um, I think one of the most meaningful things to do in life is called [inaudible]. Either advance the human condition or help other people. But the thing is, I’m nervous. I don’t wanna name one or two things because the world needs a lot of people who work on a lot of different things. So, the world’s not gonna function if everyone works on computational biology. 
I think comp-bio is great, but it’s actually good that some people work on comp-bio. My Ph.D. students, you know, many work on AI applied to healthcare. My team at Landing AI does a lot of work on AI applied to manufacturing, to agriculture, to some health care and some other industries. Um, uh, I actually, especially with the California fires burning, you know, I actually think that there’s important work to be done in AI and climate change, uh, um, but I think that there’s a lot of important work in a lot of industries. Right, I actually think that the next wave of AI, excuse me, I should say machine learning, is, we’ve already, um, transformed a lot of the tech world, right? So, you know, yeah, I mean we’ve already helped a lot of the Silicon Valley tech world become good at AI and that’s great, right? I helped build a couple of the teams that wound up doing this, right? Google Brain, which helped Google become good at deep learning, and the Baidu AI Group, which helped Baidu become, you know, one of the greatest AI companies in the world, certainly in China. And I’m very happy that between me and some of my friends in the industry we’ve made a lot of good AI companies. I think part of the next phase for the evolution of machine learning is for it to go into not just the tech companies like, you know, Google and Baidu, which I helped, as well as Facebook and Microsoft, which I had nothing to do with, as well as, what else, Airbnb, Pinterest, Uber, right? All these are great companies. I hope they’ll all embrace AI. But I think some of the most exciting work still to be done is also to look outside the tech industry, and to look at all the sometimes-called traditional industries that do not have shiny tech things, because I think the value creation there, what you could implement there, may be even bigger than if you, you know, uh, uh, yeah.
I’ll mention one interesting thing, one thing I noticed is a lot of large tech companies all work on the same problems, right? So everyone works on machine translation, everyone works on speech recognition, face detection, and click-through rate and part of me feels like this is great because it means there’s a lot of progress in machine translation and that’s great. We do want progress in machine translation. Though sometimes when you look at other industries. Um, so, you know, when you look at manufacturing or um, some of the medical devices that you’re looking at or sometimes on on these farms hanging out with farmers on, on, on. If you like, in my own work with my teams where sometimes we’re stumbling across brand new research problems that the big tech companies do not see and have not yet learned to frame. So, I find one of the most exciting challenges is actually to be constantly on the cutting edge. Looking at these types of problems there’s a different cutting edge than the cutting edge of the big tech companies. So, I think some of you will join the big tech companies and that’s great. We need more AI in the big companies, in the tech companies, but I think a lot of exciting work to do in AI is also outside where we traditionally consider tech, right? All right. It’s 10 to, it’s 12:50. So, hope- I hope this was helpful and let’s- let’s break for today. Have a, have a great Thanksgiving everyone and we’ll see you in a couple of weeks.
