Grant Jenks – Fight Night in Literate Python – Pyninsula #19



that lay there, I guess. All right, Grant, take it away, Grant.

Thank you. Enter full screen. How many people here are really into the WWE or WWF? If you've ever seen one of those, you know they always begin, they're like, "Let's get ready to rumble!" You know, like, the guy has to come on the stage and it's gonna be a big fight. And I kind of want you to think of that tonight. That's what we're doing here. This is, okay, fight night. Let's get ready to rumble. We're gonna bring on two titans of computer science tonight. One of them is going to emerge victorious and one of them will go shamed. [inaudible]

We begin our journey in the Zen of Python. The Zen of Python is kind of like a poem. It's like a set of rules. I don't really know what it is. Programming languages don't have Zens, so it's just a weird thing that's part of a weird culture, which is Python. Welcome to our group. If you have an interpreter open, type import this. That will bring your Zen to you. It's like 15 or 16 lines. It was written by Tim Peters. And when I first found the Python Zen, I thought, oh, that's really interesting, let me read through it, kind of figure out what it's about. And it had these two lines in it: "Simple is better than complex. Complex is better than complicated." I was thinking to myself, like, so what's really the difference between complex and complicated? So I went to the dictionary. Go to Webster's, open up "complex." It says, blah, blah, blah, see "complicated." I know where this is headed. I go to "complicated." It says, blah, blah, blah, see "complex." Circular definitions. What are we going to do about this?

Well, I want to take us back in time to the 1980s. You have to, like, rewind your brain to before Facebook, before Google, before the internet, right? Before all these things, you had a couple of titans of computer science. And in the first corner of our fight night is Donald Knuth. Titan. He's got some heavyweight, Donald Knuth, heavyweight belt. He is known as the Isaac Newton of computer science. That's a pretty big deal,
right? I guess I could not make a bigger deal about this man. He is so influential on all the computing that you and I have ever done. He's best known for a literary work. It's called The Art of Computer Programming, and it's in, I think, six volumes. Is it five? He hasn't finished. This is like his life's work. His magnum opus is The Art of Computer Programming. I bought the volumes back when there were only three. I thought about dragging it here, but it is so heavy I was like, no, spare my back, I don't want to drag that out here. But you should go on Amazon, you should buy a copy tonight, just to pay your tribute to this titan. Stanford University. He's still alive, like, he still talks there. Very good. You should go and, you know, in your lifetime, see him speak.

If you've ever heard of Big O notation, or what we would call asymptotic complexity... How many people have ever enjoyed, or found the bane of, Big O notation? Yeah, you have this guy to blame for that. He's the one who tediously and in great detail determined and documented the Big O notation of all the core data types, all the core algorithms you've ever worked with. In fact, the second volume of his work is just called Sorting. Like, he wrote 2,000 pages on sort. There's a lot there. And he's also known for the TeX computer typesetting system, sometimes referred to as LaTeX, basically a very fancy, I would say, WYSIWYG kind of text editor.

In the other corner, our other titan of computer science is Douglas McIlroy. And Doug is best known for a little operating system called Unix. You ever heard of this one? Like, he was the head of research at Bell Labs. He was one of the people that Thompson and Ritchie were encouraged by and reported to as they were developing this operating system that they would eventually go on to win a Turing Award for. Like, again, it's hard to make a big enough deal about this guy. He is sometimes known as the Unix philosopher. How many people have ever heard of the Unix philosophy? The very first item in the Unix philosophy is
to do one thing and to do it well. That's this guy. He's sometimes known as the piper of the shell. I know it sounds like... I tried to think of a Pied Piper joke I could just, like, slip in here, but I couldn't think of one. How many of you have ever piped the output of one command into another? You have him to blame for that. He was a huge fan of shell programming, and he developed a number of Unix tools: diff, sort, transliterate, join, graph, spell, speak. And he's still alive. He's a professor at Dartmouth College.

Back in the 1980s, without any internet, people had to learn things. They couldn't just Google it. They couldn't get it on their social media. So there were magazines. So imagine a printed thing, and you'd open that up on a Friday night and you would kind of, like, flip through this. And back in the 1980s, Jon Bentley, he published this magazine called Programming Pearls. And he invited Don Knuth, who is a big fan of something called literate programming, and Doug McIlroy to contribute to the magazine. Doug was a big fan of shell programming, and he said, let's throw down. Doug actually proposed this. He said, let's throw down a challenge, and we'll publish our solutions, and we'll see who does better.

So here was the challenge. You've got to read a file. You've got to parse the words in the file. You've got to tally the frequency of each word and print the top ten. Wow. Whoa. By today's standards, how does this challenge feel? It's like an interview question, right? It's like, "So you couldn't do that? We don't want to bring you on site. It was nice meeting you. Good luck." But back in the 1980s, this was like, we should give the two smartest people alive this problem, and then we'll publish it in a magazine so that everybody can read it. And this was actually a really widely distributed magazine.

So they agree to this. And you have to understand, for Doug to propose this challenge with Don was a little bit, like, crazy, because Don Knuth was such a titan of programming that he wouldn't just submit something brilliant, he would
prove that what he submitted was the best possible thing anyone could do. And he would prove that it was optimal and true and everything. It was like, boom, you cannot do better, I've proven it. That was what he was going up against.

So Don, he tackles this problem. He introduced, at the time, a data type that very few people had ever even heard of. It's called a prefix tree, or trie. It comes from the word "retrieval." He proves that this is optimal in time and space. You can do no better than this data type. And he writes it in his literate programming style.

Now understand that Don Knuth, being professor emeritus at Stanford, working on The Art of Computer Programming, this guy is like an uber-academic. Literate programming serves his needs really well, because when he writes an algorithm, he needs to publish it in a paper, he needs to prove that it's correct, he needs to test it, he needs to get it, you know, typeset, and all these things. And so he really developed literate programming around that idea. So he could take one program and he could feed it through a compiler, and he'd get an executable. He could feed it through a typesetter, and he'd get something he could publish. He could feed it through a test system, and he'd see all of his test cases pass.

And they decide to publish this in Programming Pearls. Anyone want to guess what the solution weighs in at? How many pages of the magazine? Eight. And it is abbreviated heavily. This is a complicated solution.

Here's McIlroy's solution. He actually wrote it in one line. He spent one extra paragraph describing each of the steps, and he said, I'm gonna let other people decide which of these is the better. The first program here, tr, is called transliterate. He's saying, take all of the characters that are not A through Z and convert them into newline characters. Then take all of the uppercase letters and make them lowercase. Sort that. Now, as he sorts that, it's gonna put all of the identical words together, and he says, oh, look at this trick, I have uniq, which just happens to have
a -c switch, which will go ahead and count how often things occur in a row. And when we output that, I can pipe it back to sort, which just happens to have a -n switch, which will sort numerically by the second field, and -r will do it in reverse. And then the sed line says, only give me this number of lines in the output. That was his entire solution.

So what's kind of interesting here, in the modern day: this is a multi-process solution. He's starting several processes. It's actually almost a distributed system. We could get a bunch of these machines together, we could start connecting them via the network, we could pipe this data from one to the other. You look hard enough at that, and kind of squint and cross your eyes, that's a MapReduce thing, right? It's like, ooh, this is Googly. It supports problems that are larger than memory. He doesn't have to pull everything into memory at once. And he's using his shell programming.

So here's the question: who won? How many of you have ever written a literate program? How many of you have ever written a shell program? Who won? Doug won. And this was what was so fascinating to me, as I saw these two styles of programming. I saw them as really, like, extremes, right? On the one hand, you have literate programming, which has all of this documentation and testing, and it's beautiful, and it has a proof, and it's got all this typesetting, and it's part of a whole ecosystem of tools for teaching other people the best solutions. And then on the other hand, you have this one-line shell script. Which of these would you rather maintain? I would much rather maintain the one-line shell script.

And what's interesting to me is that Python falls between these two. And I want to look for a moment at some of what Python has that helps it in this battle. The biggest feature of Python, which is actually built into the language itself, is docstrings. We actually support this idea that you should embed documentation in your Python code. And where can you put docstrings? You can put
them at the top of a file; that's a module. You can put them at the top of a class. You can put them at the top of a function. Now, there are only three different things that create a new scope in Python. What are the three things that create a new scope in Python? Modules, classes, functions. Ah, isn't that interesting how those line up so well? And these actually get saved in an attribute. It's the dunder doc attribute, __doc__. It sits right next to dunder name, __name__. And because it's an attribute, because Python embraces this idea of introspection and that you can take things apart, we build tools on top of this.

How many people have ever used the help built-in function? How many people have ever typed help of help? Ah, some clever folk there. Python doesn't explode when you do that. It tells you about how the help system works. What's really interesting is that help is a function that comes from the pydoc module. pydoc was a set of documentation tools for, how do we take these docstrings and display them in a really elegant way? And what I love about pydoc is not just the help built-in, but the fact that it's also available from the command line. So you're inside your shell, type python -m pydoc --help, or just type a module name, and it'll give you something like a Linux man page. It really makes Python feel like it's part of your system. It's embedded in things. It works the same way as that Unix philosophy.

If you go ahead and try python -m pydoc -b, it'll actually fire up a web server, it'll translate all of those docstrings into HTML web pages on the fly, and it'll open up a web browser for you so you can browse the documentation inside a web browser. Somebody say wow. Wow. I was on a plane once. They wanted to charge me 40 bucks for Wi-Fi, and I was like, oh, hell no, I don't need that. I just need to look up the docstrings of a few modules. It's all tied up. Boom, done. Run that on localhost.

In addition to pydoc, you get doctest. And doctest is one of those gems that most people, they just can't bother themselves
with. But doctest is so fun because it's a twofer. It's a built-in twofer, guys. You're gonna write documentation, and by writing documentation, you get to write it in your favorite way possible. My favorite way to write documentation: copy-paste. You write some code in the Python shell, you try it out, and you think, hey, it would be nice if I let future users know that this is how it works. You copy-paste that snippet in. Now people look at this, they go, ah, yes, the solve_linear function: if I give it two and four, it gives me negative two. doctest goes and takes that and turns it into a test case. Like, boom, done. That's awesome, right?

So what's fun about this, what's really fun about this: how many of you have ever looked at documentation before? You're presented with a wall of prose, and you carefully read through every line of that prose in order to understand what you were working with before you tried it out. Is that what you do? It's not what you do. What do you do? You scroll for an example. You take the example, you copy it, you paste it. And then what happens if the example doesn't work? Do you carefully file a bug, just as Lisa described, and say, "Hey, I'd like to help you maintain"? You go, "Well, I guess it's quitting time. Example doesn't work. I don't know what to tell you. How can we go any further?" This will keep your examples up to date, because now your examples can fail. Ah, isn't that kind of scary?

doctest.testmod is how you access it from Python. It'll give you back this little named tuple: zero failed, one attempted. It's actually turned this into a test case. It's got a nice shell interface as well: python3 -m doctest. I encourage you, do a --help, or go ahead and put any text file here. It could be your reStructuredText documentation, could be a Python file, could be a text file, anything that's plain text. It'll go through that. You know, look, it's basically really simple, actually. It's looking for these three greater-than signs. It goes, that looks like a Python shell. Okay, take the thing that's
after it, call exec on that, compare the output using repr. It matches, you pass. Now, it's actually a little smarter than that, because if you embed logging or you have print statements, it goes and it captures those print statements, it ignores those logging messages, and it'll go and validate your output for you.

"The core developers read Knuth so that you don't have to." It's one of the sayings we have in the Python community. Knuth's a big deal. All of you should take the time to read The Art of Computer Programming. And in fact, Knuth was really concerned that computer science was going to get dumbed down by new languages and things that would make things easy on people. So with that disclaimer, know that Python is specifically intended to make your life easier. Thank God.

And if we looked at, like, a Python solution to the problem we had talked about today: if you look inside the Counter data type, inside the collections module, it's got a most_common method, which will go ahead and spit out the n most common by value in your dictionary. You could build a very simple Python solution using that data type. How many people know how most_common works? We have the author here this evening. Besides Raymond, anyone know how? If I asked you, give me the ten largest in a sequence, how do you do it? Bubble sort? It's very small, maybe that's a good idea. In the back, sir? Maybe a heap. See, the naive solution is to go and sort the whole thing, like bubble sort, maybe like quicksort, if you thought, oh, we can do better than bubble sort. But it turns out Knuth tackled this problem. He said, we can do even better than that. We use a heap data structure, and because you're interested in only the ten largest out of a thousand, we don't need to go and sort the nine hundred and ninety that come after that ten. The core developers read Knuth so that you don't have to.

All right, who won our fight? Knuth? No, I don't think so. I think McIlroy won the fight. But I encourage you to write Python code. Which of
those solutions was complicated? Knuth's was complicated, because the complexity of a tree, a prefix tree, was something that was irreducible. It is optimal. It's the right answer. But my God, is it complicated. Which one was complex? McIlroy's, because it was the composition of simple tools. You could write sort. You could write transliterate. And McIlroy encouraged software reuse by breaking all those down into simple tools that you would compose together in order to create complex behavior. Simple is better than complex. Complex is better than complicated. Thank you so much. [Applause]
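
Editorial sketch, not shown in the talk: the word-frequency challenge described above (read text, split it into words, tally each word, report the top n) could look roughly like this in Python using collections.Counter. The function names top_words and top_words_heap, and the choice to treat only runs of ASCII letters as words, are assumptions made for illustration; they mirror the steps the speaker attributes to McIlroy's pipeline rather than reproduce any code from the slides.

```python
import heapq
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase everything, then keep only runs of the letters a-z,
    # which plays the role of the two transliterate steps.
    words = re.findall(r"[a-z]+", text.lower())
    # Tally each distinct word (the job done by sorting and then
    # counting adjacent duplicates in the shell version).
    counts = Counter(words)
    # Ask for only the n largest counts instead of sorting everything.
    return counts.most_common(n)


def top_words_heap(text, n=10):
    """Same result, spelling out the heap-based selection explicitly."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # heapq.nlargest picks the top n items by count without fully
    # sorting the remaining entries.
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = "The cat and the hat and the bat"
    print(top_words(sample, n=2))
    print(top_words_heap(sample, n=2))
```

Both functions return a list of (word, count) tuples, most frequent first. The second one only makes visible the "use a heap for the ten largest" idea mentioned in the talk; for everyday use, most_common(n) is the shorter spelling.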

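Editorial sketch, not shown in the talk: the doctest workflow the speaker describes (paste an interpreter session into a docstring and let doctest replay it) might look like the following. The talk mentions a solve_linear function that returns negative two for the inputs two and four, but its actual definition is not in the transcript; the version here, which solves a*x + b == 0 and therefore returns the float -2.0, is a hypothetical stand-in.

```python
import doctest


def solve_linear(a, b):
    """Solve a*x + b == 0 for x (hypothetical stand-in for the slide's function).

    >>> solve_linear(2, 4)
    -2.0
    """
    return -b / a


if __name__ == "__main__":
    # testmod() finds the ">>>" examples in this module's docstrings,
    # runs them, and compares what they produce with the text below them.
    print(doctest.testmod())
```

Running the file directly prints a summary of how many examples failed and how many were attempted. If the docstring example ever drifts out of step with the code, the summary reports a failure, which is the "your examples can fail" property the speaker highlights.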