The Power of Self-Learning Systems


TOMASO POGGIO: I’m
Tomaso Poggio. I’m the Director of CBMM,
which is hosting this talk. Back in 2011, Josh
and I organized– Josh is there– one of the
symposia for the MIT 150th anniversary. And the symposium was
brains, minds, and machines. And the last day was the most
exciting day of the symposium. [INAUDIBLE] about the
marketplace for intelligence. We invited some of the big
companies, like IBM, Microsoft, Google, and so on, and
a couple of startups. One was Mobileye and you have
heard from Amnon yesterday, about the state of
autonomous driving. The other one was DeepMind. And Demis spoke
then, he had been a post-doc for a short period
of time with me at Harvard. And he was at CBMM, in
what was already then the heart of machine
learning at MIT. Demis spoke then about the
unofficial business plan of DeepMind, which was
to create the first AI in the virtual world of games. Since then, you have
come back a few times to keep us updated about
the progress of DeepMind. Last time I think
was three years ago. And this was shortly
after AlphaGo won the competition in
South Korea in Seoul against Lee Sedol. And so this time, Demis may
just very well declare a victory in the virtual world of
games, because they’ve done everything, you
know, beyond all hopes. Maybe we’ll issue a new
business plan, a new challenge for the real world. We’ll see. I have often said
that intelligence, the problem of intelligence, is
the greatest problem in science today and tomorrow. This also means
it’s not going to be particularly easy to solve it. I think, it’s not
only one problem but like biology, the science
of life, in the same way, the science of intelligence
is a lot of problems. We’ll need a lot of
breakthroughs, not one, but many Nobel prizes. So both Demis and
CBMM share the view that neuroscience
and cognitive science will be at the core of
progress that will ultimately lead to understanding human intelligence better and to developing intelligent machines. The journey may be longer
than many people think. But it will be very rewarding
in many different ways, intellectually and otherwise. And we should enjoy this journey
and enjoy making history. So please welcome Demis. [APPLAUSE] DEMIS HASSABIS: Thanks Tommy. Hope you can all hear me OK. Thanks all of you for coming. It’s amazing to
see you all here. And it’s always really fun
for me to come back to MIT and catch up with
old friends, and also see how much amazing work
CBMM and all of you are doing, both in neuroscience as
well as machine learning, and in the crossover, which
is very close to my heart. So today what I thought
I’d talk about– I mean there’s lots
of things I could discuss that we’ve done since I
was last here three years ago. But I’ve titled the talk, The
Power of Self-learning Systems. Because I think what
we and others have shown over the last few years
is how surprisingly useful they can be, and how powerful
quite simple ideas can end up being. So I’m just going
to talk about– begin with framing. You know I’ve always
thought about AI as effectively
bifurcating into two different types of approaches. And this is when we think
about AI and the history of AI. And on the one
hand, you know, we can try to build expert
systems that rely on hardcoded knowledge, that basically are
handcrafted with the solution to a problem. And they’re usually inspired by
logic systems and mathematics. And that, for a long
time, was the way that most people
attempted to build AI. And the problem with that is
that those kinds of systems can’t deal with the unexpected. They basically usually
fail catastrophically, if something hasn’t already
been programmed into them, if they encounter
something unusual or that the programmer
had not foreseen. And the other interesting
issue is that, of course, they’re limited in scope
to the sorts of solutions that we are able to articulate,
we as the human programmers. So, of course, by
definition, they’re limited to these
pre-programmed solutions. On the other hand– and I think
why this is such an exciting moment in scientific history
and why you’re all here– is that there’s been this
sort of big renaissance, if you like, of the
learning system approach. Where instead of
programming solutions, we build systems that
are able to learn for themselves from
first principles, learn their own
solutions to problems. And what we hope is
that these systems will be sufficiently general. They can generalize to
all sorts of new tasks, perhaps tasks they’ve
never seen before, and actually indeed
even solve things that we as human scientists
are not able to do. Right. So maybe the promise
of these systems is that they could
go beyond what we’re able to solve on our own. And I’m going to talk about
that in the latter part of this talk. Now [INAUDIBLE] in
learning systems, and why I think what CBMM
is doing is so great and also what we
do at DeepMind is that we can look to the best
learning system we have, the brain, the
human brain, and see if we can be inspired by
understanding that better, inspired about new
algorithms that we could use, new representations, new
architectures that are inspired by neuroscience and our
understanding of the brain, incomplete though that is. And I would say, not
only can we be inspired, but we can also
validate algorithms that we’ve come up with
ourselves from perhaps for mathematical
approaches or physics approaches in orthogonal
approaches from neuroscience. If we build a system like that,
reinforcement learning I think is a good example of that,
which was pushed forward in engineering quite a
lot in the ’80s and ’90s. We can then see,
you know, when we find that the brain implements
a form of TD learning, in some famous
results in the ’90s, you know, we can
be confident then that reinforcement
learning is plausibly part of a kind of
overall AI solution. And so we can push harder
on those techniques, if we know that the brain
also uses those techniques. And that point of validation
is often overlooked. But it’s very important
when you are running or in charge of a big
engineering program, you know, where do you
decide to put more effort in. If something doesn’t
work, which you know things often don’t work
first time or even many times in research and engineering,
how much more should you push that approach. And if you can
take some guidance from the brain and some comfort
that these systems are– the brain does
implement them, then that can be a very important
source of information. So as Tommy mentioned,
last time I was here and maybe some of you were in
the audience three years ago now, we’d just come off– fresh
off the back of our big AlphaGo match that we played
in Seoul, and really overturned a lot of the
traditional thinking in the game of Go but also was
very surprising to many people in AI. And many of the experts
sort of proclaimed that this was a decade before
they expected it to happen. So I’m not going to talk
about AlphaGo today. But if you’re interested in a
sort of behind the scenes look at what happened with AlphaGo
and the whole project, I’d recommend this
documentary that was done by– it’s an award
winning documentary. It was done by a
great filmmaker, who followed us behind
the scenes and had access behind the scenes
all during this journey. And it’s on Netflix and
Amazon Now and other places. And I’d recommend you
take a look at that if you’re interested
in that story. So today I’m going to
focus on what we’ve been doing in the last 12 months. And it’s been quite a watershed
year for us at DeepMind. And we’ve had quite a few
interesting breakthroughs that I’m going to cover today. So the first thing I’m going
to talk about is AlphaZero. And AlphaZero is our
latest incarnation of the AlphaGo program, a
sort of series of programs. So I’m just going to show
you, for those of you who don’t know, the
lineage of AlphaZero and how we came about
working on this project. So first of all, there
was the original AlphaGo. So this is 3– three plus years ago now. And AlphaGo was
amazingly strong. But we had to do a
bootstrapping step, which was to learn from
human games first, by predicting what human players would do, not experts but strong amateurs, from games we downloaded from online databases. And initially we tried to train
our neural network systems to predict what the
human player would do, sort of by mimicking
these human players. And then once it got sort
of reasonably strong, like weak amateur level,
then we started this process of self-play and self-learning
by playing against itself to improve. But what we wanted to do is–
and the way we work at DeepMind is, we always have generality in mind, that’s the final goal: the purest system you can build with the least amount of assumptions in it, maximally general, working across as many domains as possible, without any adjustment. So we start off, can
Go be cracked at all? That was obviously
the original question. Then once we did
AlphaGo, then we looked at all the
components of the system. And we sort of did some systematic work then to remove all the things that were still left in that were specific to Go. So the next step was what
we called AlphaGoZero. And what we did here is
remove this initial step of needing some human
games to bootstrap from at the beginning. So AlphaGoZero started
from totally random play, and improved all the
way to become stronger than the original AlphaGo
just through self-play, just from playing millions
of games against itself and modifying its
neural network based on whether it won or lost. So– and so zero here
refers to the use of zero human knowledge, domain
specific knowledge. Obviously, we need the human
knowledge to make the system. But no Go-specific knowledge was required. And that’s important. Because eventually if we
want to use these systems for real world
problems, you know you may not have a
treasure trove of data, like millions of
human amateur games that are freely downloadable
on the internet. You may not have access. Or you may not– there may
not be that kind of data. So the system might have to generate it itself. And then the final
step, which we’re going to talk a little bit about
in this section, was AlphaZero. So now we drop the word
Go, because AlphaZero is able to play any two-player
perfect information game, to world champion
or higher standard. And I put the asterisk
on any, because we only tried it with three. The three– the
three biggest games that are played professionally. So chess, which
I’m going to talk about lots and use as the canonical example. Go, where AlphaZero can beat AlphaGoZero. And Shogi, Japanese chess, which
is a really amazing version of chess. It’s very different
from Western chess, but extremely complex as well
and played professionally in Japan. So I’m going to talk
about AlphaZero. And you can see– so this is sort of framing
of increasing generality. AlphaGo first, remove all the
Go specific and human data components of that, and
then finally generalize to any two-player game. Now as you all
know, chess and AI has had a long and
storied history, starting from the dawn
of modern computing. Von Neumann, Turing,
Shannon, all of– many of my scientific heroes,
even including Babbage actually, going
all the way back, all tried their hand
at chess programs. In Turing’s case, he wrote
it out on a piece of paper and executed it himself. But so it wasn’t a
computer program as such. But he, you know, he
tried his hand at that. And they all imagined
what a strong chess program might be like. And in fact, Garry
Kasparov recently wrote, in his editorial
for our Science paper, that chess has
always been or can be thought of as like a
drosophila of reasoning. And I sort of agree with
him on that, actually. That’s definitely
the kind of place it’s occupied for many
years in AI research. Now what you can
argue, of course, chess has been done, right. I mean it was done in the late
’90s, when famously IBM Deep Blue beat Garry Kasparov
in a six game match. So at that point,
and since that point, chess programs
have been stronger than the best human players. And indeed has changed the
way that we play chess. But so you could
argue, well, why did we bother trying
to apply AlphaZero to chess, when we already know
machines can beat the world champion in chess? So this is unlike Go,
where obviously there was no program that could
beat the Go world champion. Now the reason is,
and this is a debate I had actually with one of the
project leaders on Deep Blue, back in 2016, when
we’d just done AlphaGo, I think we hadn’t actually
yet done the Lee Sedol match. But we’d done the Fan Hui match. And I gave a talk at triple-AI. And at the end
Murray Campbell, who was one of the project
leaders on IBM Deep Blue, came up to me afterwards and
congratulated us on AlphaGo, but asked me the question
of, what do you think will happen if we
apply this to chess? Is it possible for
these learned systems to be stronger than the
handcrafted systems that have had 30 years of incredible
engineering being done on them, plus the distillation of many
hundreds of grandmasters, right. So this is one of the most– I’d argue chess computers is one
of the most heavily engineered areas in all of AI, right. I think it’s been
probably the longest standing continuous domain
that’s been worked on. And they’re obviously
incredibly strong. And even as a chess player
myself, an ex-chess player myself, I was wondering, was
chess rich enough as a game to have enough exploration
left for a learned system to learn some new ideas and
perhaps some new theories or themes about the game,
that would allow it to compete with these really big brutes of
the machines now that we have that are incredibly optimized
to brute force search through chess moves. So you know actually,
we both concluded at the end of the
discussion that we didn’t know the answer to this. Would it be able
to be competitive? Is there enough room in
chess for this kind of stuff? When I asked my strong
chess player friends, they didn’t know either? So, to me, that’s
always a good sign of a great scientific
question, where basically either answer is
very illuminating and very interesting. So we decided to
try and do that. Now just to give you
an example of how amazing and kind of carefully
these current chess engines are written, the
current world champion, or at least when we
were doing these tests, the 2016 world champion is
a program called Stockfish. And it’s in the lineage,
it’s an open source program that’s in the lineage of Deep
Blue, these kinds of systems. And they have hundreds
of rules, even thousands of handcrafted
rules about chess, pawn structures, king safety, all different aspects of evaluation, which the teams of programmers over many years have tried to distill from human grandmasters and encapsulate in these complex databases of rules. And then of course, they have to balance those rules against each other. So a phenomenal amount of engineering has gone into this. And then obviously,
a lot of optimization to make these systems
as fast as possible, so they can actually
look at tens of millions of moves per decision
they have to make. And then they have
opening databases and end-game databases
that tell you– that exactly solve
both the opening and the end, especially
the end game. Seven pieces or fewer have been solved. So you just have a lookup table. Now that’s what a chess
engine looks like today. That’s what we’re confronting
when we try and build a learning system that
could compete against that. What we do is we throw
away all those thousands of handcrafted rules and that
handcrafted knowledge and all that chess knowledge,
and we replace it with two things, self-play
reinforcement learning and Monte-Carlo tree search. That’s it. So it’s actually a
very simple program once you’ve built
it and optimized it. So I’m just quickly going to
go through how AlphaZero works, for those of you who don’t know. I’m actually going
to compound together AlphaZero and AlphaGoZero,
because they’re slightly different systems. But they’re pretty much
trained in the same way. So I don’t want to go
through this twice. But as you can kind
of imagine, it’s a bit of a hybrid what I’m going
to describe between AlphaGoZero and AlphaZero. So what you do is, first of all
you create your neural network architecture. And in AlphaGo, we used to
have two neural networks. One that chose, narrowed
down, the likely moves that would be played
in a certain position. We called that the
policy network. And then another
neural network that learned to evaluate
the current position, and who is winning and
the probability chance of which side was going to win. By AlphaGoZero and
AlphaZero, we managed to merge these two
neural networks into one neural network
that outputs two values, sort of has two outputs: the most likely moves to
search and this evaluation. And that allows the tree
search to be very efficient, which I’ll come back to at the
end, much more efficient than we have to do with
traditional chess engines. Because we can both reduce
the width of the search by using the policy network to
narrow down to the most likely moves, and we can reduce
the depth of the search by truncating the
search at any point and calling the value network to evaluate the position at that point.
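(To make the search concrete, here is a minimal Python sketch of the kind of PUCT-style Monte-Carlo tree search AlphaZero uses, where one network provides both move priors and a value. The game interface, state.clone(), state.apply(move), and network.predict(state), is hypothetical; the real implementation is distributed and far more elaborate.)

```python
import math

class Node:
    """Statistics for one state/edge in the search tree."""
    def __init__(self, prior):
        self.prior = prior          # P(s, a) from the policy head
        self.visit_count = 0        # N(s, a)
        self.value_sum = 0.0        # W(s, a)
        self.children = {}          # move -> Node

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def puct_score(parent, child, c_puct=1.5):
    # The prior narrows the width of the search; the exploration bonus decays
    # as a child is visited, so search concentrates on promising moves.
    u = c_puct * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + u

def expand(node, state, network):
    priors, value = network.predict(state)   # one network, two outputs (hypothetical API)
    for move, p in priors.items():
        node.children[move] = Node(prior=p)
    return value

def run_mcts(root_state, network, num_simulations=800):
    root = Node(prior=1.0)
    expand(root, root_state, network)
    for _ in range(num_simulations):
        node, state, path = root, root_state.clone(), [root]
        # 1. Select down the tree with PUCT until reaching a leaf.
        while node.children:
            move, child = max(node.children.items(),
                              key=lambda kv: puct_score(node, kv[1]))
            state.apply(move)
            node = child
            path.append(node)
        # 2. Expand the leaf and evaluate it with the value head,
        #    truncating the depth instead of playing the game out.
        value = expand(node, state, network)
        # 3. Back up the value, flipping sign each ply (two-player, zero-sum).
        for n in reversed(path):
            value = -value
            n.value_sum += value
            n.visit_count += 1
    # Play the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]
```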
So we start with that network, which has no knowledge about anything. We do 100,000 games
sort of in batches of self-play of the current best
version of our neural network against itself,
100,000 times roughly. That creates a big
training corpus of data, synthetic training
data, obviously, in this case. Every 100,000 or so games, we then try to train a new
neural network, based off of the old training data
and the new training data. And then we create this
new neural network. And then this new
neural network, we play it roughly 100 times
against, or maybe 1,000 times, against the old neural network. And at the point when it
wins 55% of the time, then we replace the old neural
network with the new one. So if it doesn’t beat the old
neural network more than 55% of the time, we do
the next 100,000 games with the old network. So now we have 200,000 to train
the new neural network and so on. Until eventually,
the neural network, the new neural
network, is better. If it is better, than it gets– it replaces the old one. And now the new
network is what we use to generate the next batch
of self-play data and so on. And this pretty simple regime
is incredibly powerful. And you can bootstrap from literally random play to world champion level and above in a matter of hours with this system.
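(A minimal sketch of that training regime, with the self-play, training, and evaluation steps left as hypothetical stubs; in reality these stages run in parallel across many machines.)

```python
WIN_THRESHOLD = 0.55           # the candidate must beat the incumbent 55% of the time
GAMES_PER_ITERATION = 100_000  # self-play games per generation (order of magnitude)

def self_play_games(network, n_games):
    """Hypothetical: play n_games of the network against itself (with MCTS) and
    return (position, search policy, final outcome) training examples."""
    ...

def train_network(data):
    """Hypothetical: fit a fresh network to the accumulated self-play data."""
    ...

def win_rate(candidate, incumbent, n_games=400):
    """Hypothetical: fraction of evaluation games the candidate wins."""
    ...

def alphazero_training(initial_network, generations=100):
    best = initial_network   # starts as a randomly initialized network
    data = []
    for _ in range(generations):
        # 1. Generate a batch of self-play games with the current best network.
        data.extend(self_play_games(best, GAMES_PER_ITERATION))
        # 2. Train a candidate network on the old plus new data.
        candidate = train_network(data)
        # 3. Gate: promote the candidate only if it clearly beats the incumbent;
        #    otherwise keep generating data with the old network and try again.
        if win_rate(candidate, best) >= WIN_THRESHOLD:
            best = candidate
    return best
```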
So in chess, this requires
about 40 million games. We play a game in
about three seconds. That’s an interesting
tuning sort of parameter. Because what you could do is you
could allow the neural network in these training games a
longer amount of time to think, then obviously the games
would be higher quality. But you would have– but it would take you longer
to generate the games. So you’d have fewer games in
a certain amount of time. So that’s an
interesting trade-off that is still not clear what
the right trade-off is there. We’re still doing some
experiments on that. And then we play 5000 games
at a time on 5000 TPUs, which we have access to. But obviously if you
had fewer computers, you would run fewer games at a time. And it would just take longer. So with this amount of compute,
it only takes a few hours. So that’s AlphaZero,
that’s the basic training. And then we tested it under match conditions, which is in our Science paper. We talked to the Stockfish creators. And we got the exact conditions
they use for their world championship matches. This is Stockfish
8, I should say. So there is now Stockfish 10. But two years ago, Stockfish
8 was the world champion. And we won the 1000-game
match, 155 wins to 6 losses, and the rest draws. But chess is a
very draw-ish game. But that’s a big
margin, 155 to 6. And actually when you look
at the six games, what you find out in those
is, it’s generally drawn positions, where AlphaZero
has pressed too hard for a win. So that’s another interesting
thing about how much we should reward a win versus a draw. And that’s another
interesting thing that we could discuss in the Q&A
that we’ve played around with. So it’s convincingly stronger. And by the way, it surpasses
Stockfish in four hours from starting from random. So this is the
graph of improvement there, that you can see. So all these decades of human handcrafted programming, well, that knowledge can be learned in
a matter of hours. And then we tried it on Shogi. And we beat the world’s
best Shogi program, which is roughly human
world champion level, in a couple of hours. And in Go, we were able to
beat AlphaGo, AlphaGoZero in eight hours of training
with this new, more efficient architecture. So we believe this would work
on any two-player perfect information game, which I
have to say, was one of my– always one of my
childhood dreams. Because I used to play
lots of different games, as well as chess,
and we always used to talk about what would the
kind of master games player be. And those of you who
read any Hermann Hesse, The Glass Bead Game is
one of my favorite books. I’d recommend that
to all of you. And that in there is
about the beauty of games. And in this case, an
intellectual getting incredible at a game, and then using that
to solve many other domains. So for me this was
always a waypoint that I’ve been dreaming
about for a long time. And I think we’ve got there now. Now the interesting
thing is that– while there’s many
things to say actually, and I only cover a few
of the interesting things now and maybe we can go
more into this in the Q&A. But one thing to
look at is the amount of search– the efficiency
of these systems and the amount of
search per decision. Now there’s lots of ways
of measuring efficiency. And obviously, the human brain
is incredibly energy efficient. And these systems are not,
both the normal chess engines and things like
AlphaZero, certainly not compared to human grandmaster. But what I’m more
interested in actually is the compute efficiency
or the sample efficiency. And traditional chess
engines, like the ones that you can run
on your PC at home, now are so optimized they can
look at 10 million moves, order of tens of millions of moves
per decision they have to make. Now if we compare that
to a human grandmaster, on the left there, you know
the top grandmasters look at maybe the order of
a couple of hundred moves with each decision
they have to make. So many orders of magnitude less
than the computer engines do. And what’s interesting is
AlphaZero is sort of– it’s not as efficient as
a human grandmaster, not by a long shot, 100 x more. But it’s in the middle. AlphaZero looks at the
order of tens of thousands of moves before making a
decision, not tens of millions. So I think that’s
interesting, directionally, to think about there. So you know another interesting
test that you can do is, if you turn off all
the search completely, right, you can kind of measure
how strong these engines are. And with the chess
engines, if you turn off– if you get rid of their search,
so just using their evaluation function, they’re terrible. They’re like weaker than a
weak club player, so just about know how to play chess. But AlphaZero, if you
turn off all its search, it’s roughly international
master standard. So it’s pretty good, even
though it’s not– even without any search whatsoever. And we think we can
make that stronger, too. But what’s also cool, for
those of you who play chess, is the way AlphaZero plays. So we saw this in AlphaGo. AlphaGo came up with these
amazing new themes and motifs that the human Go
players had never seen and are now using,
by all accounts. But this answered the first part of the question that Murray asked me: can we build a learning system that can compete with these brute-force, handcrafted, search-engine-type systems? And the answer is yes. And the second question
is, what’s the richness of chess itself as a domain? And what was amazing and
really pleasing to see is that AlphaZero played in this very unique style. And there’s actually many unique things about it. But the main key difference between AlphaZero and the way the chess engines play is that AlphaZero really favors mobility of its pieces over material. So the chess engines
really like material. And they’re known for grabbing material. If you give it a pawn, it
will always take the pawn. And then the thing
about those systems was, they were quite
ugly to the human eye, to human experts aesthetically. Because what these
engines would do is they would grab material,
greedily grab material, get into a kind of bad-looking position positionally but have more material,
and because they were so good tactically,
they couldn’t be beaten, even though
the moves they were making looked a bit ugly
to a human expert. Of course, in the
end, the chess players concluded maybe they just–
the computer just knew better, because ultimately
it was stronger. And perhaps our
intuition of beauty just doesn’t map
to functionality. But what AlphaZero
did was play in this very dynamic and, to human grandmasters, aesthetically beautiful style. And it’s able to beat this sort
of kind of calculation style that the engines have. And that really excited
the chess world. And in fact it has rekindled my passion for chess, and I’ve been playing
a lot more recently. And got the chess
players really excited. And for those of
you who play chess, this is my favorite position
from the AlphaZero Stockfish games. AlphaZero is white. Stockfish is black. And AlphaZero loves
sacrificing pieces. And we can maybe
talk a little bit about why that is in a second. But it loves sacrificing
pieces to get more mobility for its remaining pieces. And this is the perfect
example of that. In chess, there’s this term,
German term, called zugzwang, which means that you’ve got your
opponent into a position where any move they make will
make their position worse. Right, that’s what
zugzwang means. And what AlphaZero
has done here, for those of you
who know chess, is it swapped its
rook for a bishop. And rooks are worth five points. Bishops are only
worth three points. And it’s done that so it
can seal off Black’s queen in the corner. You can see the queen,
hopefully with this pointer, right in the corner, can’t move. And these two rooks
are stuck, because they have to defend this
pawn next to the king. Because all of White’s pieces
are ganging up against it, none of its pawns can
move without being taken. So basically– and
the king can’t move. So nothing can move
in this position. So somehow it’s like AlphaZero has hermetically sealed Stockfish in. And literally, it can’t move. So it just has to give up its pieces. So this is kind of incredible. And obviously AlphaZero has
foreseen this, you know, 10, 20 moves beforehand that it
was going to get into this situation, where literally
Stockfish cannot do anything. Now one reason you might
think– what we can debate, how come AlphaZero can
do this more freely. And some, by the way,
in human history, some amazing chess players,
some world champions, were famous for doing this. This guy called
Mikhail Tal where in the ’60s was world champion. And he was famous for making
extraordinary sacrifices and winning these
very beautiful games. And AlphaZero is sort
of in that style. And if you think about it,
in a normal chess engine, one of the first rules you would
build in to a chess engine, if you’re writing them. And I used to write these when I
was a kid, like different types of engines. You’d write the piece
values in, right. So you’d say a rook is five points, a knight is three points. And you would total up. One of the first things you
would do for your search engine is to basically total up
what each side has got. And it’s better to have
more points, right.
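(As a toy illustration of that kind of hardcoded rule, here is roughly what the material term of a handcrafted evaluation might look like; this is a made-up minimal example, not actual engine code.)

```python
# Hypothetical material term of a handcrafted evaluation function.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_score(pieces):
    """pieces: iterable of piece letters, uppercase for White ("R"), lowercase
    for Black ("r"). Positive means White is ahead on material. A fixed rule
    like this is exactly the built-in bias a learned evaluation does not have."""
    score = 0
    for piece in pieces:
        value = PIECE_VALUES.get(piece.upper(), 0)   # kings and unknowns count 0
        score += value if piece.isupper() else -value
    return score

# Example: White has traded a rook (5) for a knight (3), so it is down two points:
# material_score(["K", "Q", "N", "k", "q", "r"]) == -2
```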
So if you think about it, if a normal chess engine wants to sacrifice a rook for a knight, its hardcoded
rule is telling it that’s minus two points. So somewhere down the
line, through its tens of millions of
calculations, it’s got to figure out
that it’s going to get two points of value back,
like objectively figure that out by some other rule, right,
or capture that material back or more. And so that’s very constraining,
if you think about it, right. Whereas AlphaZero doesn’t
have that in-built rule, so it doesn’t have to overcome
this in-built bias that it’s about to make this sacrifice
and it’s worth minus two points. As far as it’s concerned,
rooks are just assets. Knights are just assets. They move in different ways. Right now, the
opponent’s knight is on an outpost and really powerful. My rook’s passive
and not doing much. So I’m going to swap it, right. It can take into
account context, which the handcrafted
rules obviously can’t. It doesn’t matter how many rules you program in; you can’t say, in this position, in this specific case with the queen in the corner, it’s OK to sacrifice, right. I mean you could try. And people have tried. But that rule wouldn’t be very general. And you probably need millions
and millions and millions of rules to encapsulate that. And then on top of that,
think about something else, how are you going to balance all
those rules against each other, right? How are you going
to balance material, versus pawn structure,
versus king’s safety, all of these kind of fairly
esoteric concepts? And obviously you can
have a go at doing that, with grandmasters telling
you what they think. But it’s very difficult,
even human grandmasters don’t think in that way, right. They can’t perfectly balance
these things together within a few decimal points. So I think there are many reasons why AlphaZero and this style of program is stronger. The kind of coda to all of this is that we showed it to two of my old chess friends from Cambridge, Matthew Sadler and Natasha Regan. And Matthew Sadler is a two-time British champion. And we let them in before
we published the paper to look at all the
self-play games. And they found that there were
so many interesting new motifs that AlphaZero had found. There were seven new themes
that they hadn’t really seen before in
professional chess. And they were–
they really asked or petitioned us to
kind of write a chess book about these new ideas. And that’s just
come out in January. So if any of you
are chess players, I’d recommend this
book Game Changer which talks about how
these new ideas and why they’re so exciting. And Garry Kasparov also
wrote some very nice things about this. And he’s very interested in AI. And one thing he wrote
about AlphaZero was: programs usually reflect the priorities and prejudices of programmers, but because AlphaZero learns for itself, I would say its style
reflects the truth, which is quite an
amazing statement really. Of course he went
on to say that it plays like him,
which he was very– [LAUGHTER] Perhaps he’s a bit biased. I think it does a little bit. But he was very
pleased to say that as well later on in the article. So that’s AlphaZero. So I want to move
on now to AlphaStar, which is our newest program. So you can think, well, OK, you
know, we’ve done board games. And I was very clear to caveat
that with perfect information two-player games, right. So that’s quite
a big wide branch of things, encompasses a lot
of human activity in the games domain. But you know it’s still– they’re still easy
in some sense, right. And there are two important ways that they’re easy. The first is that the state transitions are very simple. You make a move. It’s pretty clear what the next state is going to look like. And the second is that they’re perfect information. So there’s no hidden information or partial observability. So we wanted to tackle
these two challenges that we feel are beyond
what we did with AlphaGo and the AlphaZero
series of programs. And so we wanted to pick a
domain that was sort of more complex and a dynamic,
real-time environment that also had hidden information. And to do that, we chose the
game of StarCraft II, which many of you will be fans of. And you know, you can ask why
did we choose StarCraft II. Well, for several reasons. In my opinion, and this is widely acknowledged, it’s the most
complex and most exquisitely balanced real-time strategy
game, I think, ever made. And real-time strategy
games are kind of the hardest of the strategy
games in the computer games world. It was also the first e-sport. So it’s been played
professionally over a decade. So there are many hundreds of
professionals, a lot of them in Korea again, like
with the Go players. And that’s been played
very, very high level. It’s also been quite a
classic challenge for AI for about a decade, as well. So people have been
researching StarCraft AI and organizing global competitions from about 2010; a lot of the first work
came out of Berkeley. And also, we had a good relationship with Blizzard, who make this fantastic game. And they were able to
supply us with things like anonymized
replays of human games. And they were very
excited about exploring this for their own
games development. If they can build AI
in this way for games, that would save them a huge
amount of time and effort. And so the new challenges
are partially observable, as I mentioned,
massive action space. So this is another new challenge
compared to board games. There’s not a good way
of estimating this. But we estimate
there’s roughly 10 to the power 8 possible
actions per move, if you like, although it’s real time. And then there’s very
long-term dependencies, so games can last
like half an hour. And they can involve more than
5,000 steps, 10,000 steps. It’s kind of a range,
which is obviously a lot more than a board game,
which is in the order of 100 to 200 moves. And it’s real time
in multiplayer. The other thing about this game
is that the way you play it, it’s a very dynamic game. So unlike a board game
where the set of pieces is fixed at the start, like
in chess, here in StarCraft and these kinds of real
time strategy games, you build up your
army and your units. So every game is different. So there’s basically
four steps in StarCraft. Step one is you
collect your resources. You build a base. You build your units. And then you battle
with your opponent. And so it’s a very
complex game with many, many rich strategies. And there are three different
alien races you can play. And there’s a kind of
paper-scissors-stone element. So it’s a very
complicated game to play. In terms of our architecture,
I haven’t got time to go into this in detail. But we’ll be publishing
a paper on this soon, which will have all the details. But basically, there
are three feed forward networks that take in
the various observations. So there’s spatial
observations about the map. There’s various
numbers that keep track of your economy,
how many resources you’ve collected, and so on. And then there’s information
about your units. There’s like 50 different types of units you can build, how many you’ve got of each type, and how many you are producing. And so those three feed-forward
networks that you can see here are piped into a deep
LSTM, which is really the core of the system. And it has three layers of LSTMs
and a bunch of other stuff. And then the output
is a function head, which is what action to take. And then that’s parameterized across the units, using an attention mechanism. So you can think of this as
a parameterized function: the output might be, move these five units to this position, and then you see that being executed.
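(As a very rough sketch of that shape, assuming PyTorch: three feed-forward encoders feeding a three-layer LSTM core, an action-type head, and an attention-style pointer over units. All names and sizes here are made up, and the real agent, to be described in the forthcoming paper, is considerably more sophisticated.)

```python
import torch
import torch.nn as nn

class AlphaStarSketch(nn.Module):
    """Toy version of the agent shape described in the talk: three feed-forward
    encoders, a deep LSTM core, an action-type head, and an attention-based
    pointer over units to parameterize the chosen action."""
    def __init__(self, n_map_feats=32, n_scalar_feats=64,
                 n_unit_feats=128, hidden=256, n_action_types=100):
        super().__init__()
        self.map_enc = nn.Sequential(nn.Linear(n_map_feats, hidden), nn.ReLU())
        self.scalar_enc = nn.Sequential(nn.Linear(n_scalar_feats, hidden), nn.ReLU())
        self.unit_enc = nn.Sequential(nn.Linear(n_unit_feats, hidden), nn.ReLU())
        self.core = nn.LSTM(3 * hidden, hidden, num_layers=3, batch_first=True)
        self.action_type_head = nn.Linear(hidden, n_action_types)
        self.unit_query = nn.Linear(hidden, hidden)   # query for the unit pointer

    def forward(self, map_obs, scalar_obs, unit_obs, core_state=None):
        # map_obs: (B, T, n_map_feats), scalar_obs: (B, T, n_scalar_feats),
        # unit_obs: (B, T, U, n_unit_feats) for U currently visible units.
        unit_emb = self.unit_enc(unit_obs)                    # (B, T, U, H)
        z = torch.cat([self.map_enc(map_obs),
                       self.scalar_enc(scalar_obs),
                       unit_emb.mean(dim=2)], dim=-1)         # (B, T, 3H)
        core_out, core_state = self.core(z, core_state)       # deep LSTM core
        action_type_logits = self.action_type_head(core_out)  # which action to take
        # Attention-style pointer: score each unit embedding against a query
        # derived from the core output, to select which units the action applies to.
        query = self.unit_query(core_out).unsqueeze(2)        # (B, T, 1, H)
        unit_logits = (query * unit_emb).sum(-1)              # (B, T, U)
        return action_type_logits, unit_logits, core_state
```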
Now in terms of the training, we created this thing called AlphaStar
league, which is– we call this population
based training. And really this is kind of
like self-play on steroids. Because instead of
now just playing against yourself,
one opponent, we have an entire league
of diverse competitors that you are competing against. So in the beginning, we
start with the first agent that we build. And that started by
imitation learning and supervised learning, by
looking at the replay games that we got from Blizzard. So this is human data, actually
not that strong those players. So they were like median
level human players. And then we create our
first agent here, 001, which is created by imitation. But then we fork
this agent, and we start using our self-play
and reinforcement learning to improve the
level of this agent. So this is very much how we
started like with AlphaGo. Except in this
case, we also keep around older agents,
the ones in blue, to make sure that as we
improve on to new strategies, we don’t forget how to beat our old strategies. And then this level of improvement
carries on for about 1,000 or more different epochs
as we keep increasing the strength of these systems. And at the end,
before we challenge the top human
players, what we do is we take what we call
the Nash of the league. So if we have five matches, we take the five strategies that basically together dominate the rest of the league but are not dominated by any individual agent outside that Nash. And then those are the ones we take into the competition. Now one way we kind of increase
the amount of diversity in our systems is by introducing
intrinsic motivation. So obviously the AI systems get
rewarded for winning a game. But we also, to
increase diversity, we also give them
some pseudo rewards or intrinsic motivations. And these can be, make sure
you build x number of units and then win, right. So there’s like 50 different
units as I was saying. So we can kind of make some of the
agents specialize in certain types of units. We can also say, only
focus on beating this one other agent in the league. So it can kind of
pick on one agent. And we can sort of randomize
that a bit as well. So we can kind of
introduce asymmetries in all sorts of ways. And we’re still experimenting
with different ways; it’s a very rich area to look at, designing pseudo rewards and potentially evolving them.
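(A heavily simplified sketch of that league loop, with match playing, the RL update, and the Nash computation left as hypothetical stubs. The pseudo-reward is shown as an optional bonus added to the win/loss signal.)

```python
import random

class Agent:
    def __init__(self, params, pseudo_reward=None, frozen=False):
        self.params = params                 # network weights
        self.pseudo_reward = pseudo_reward   # e.g. "build >= N of unit X and win"
        self.frozen = frozen                 # frozen snapshots stay around as opponents

def play_match(agent, opponent):
    """Hypothetical: play one game, return (win: bool, game_stats: dict)."""
    ...

def rl_update(params, reward):
    """Hypothetical: one reinforcement-learning update toward higher reward."""
    ...

def nash_of_league(league):
    """Hypothetical: the small set of agents that together dominate the league
    but are not individually dominated (the set used against human players)."""
    ...

def league_training(imitation_params, epochs=1000):
    league = [Agent(imitation_params)]      # agent 001: seeded by imitation learning
    for epoch in range(epochs):
        for agent in [a for a in league if not a.frozen]:
            opponent = random.choice(league)           # includes frozen past agents,
            win, stats = play_match(agent, opponent)   # so old strategies stay beatable
            reward = float(win)
            if agent.pseudo_reward:                    # intrinsic-motivation bonus
                reward += agent.pseudo_reward(stats)
            agent.params = rl_update(agent.params, reward)
        if epoch % 50 == 0:
            # Periodically freeze a snapshot and fork a new active agent, some
            # with fresh pseudo-rewards, to keep the population diverse.
            snapshot = Agent(league[-1].params, frozen=True)
            fork = Agent(league[-1].params,
                         pseudo_reward=lambda stats: 0.01 * stats.get("built_unit_x", 0))
            league += [snapshot, fork]
    return nash_of_league(league)
```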
And what you see in this graph is the different
units you can build or a selection of them,
so like 25 things here. And this is the number of
days of training the league. And this is how many– the sort of height of this
graph is how many of those units were built at different
stages of the league training, by all of the different
agents that were active at that moment in time. And you can see that
the different strategies ebb and flow, depending on which
one is dominant at the time. So once we did that, just
before Christmas last year, we decided we were
ready to take on some of the top professionals
in the world. So we invited in two of the
top professionals, TLO and MaNa from a German team. And they came in and
tested our system, behind closed doors in two
official five game matches. And here is– you can see
the progress of AlphaStar. Each of these dots is one of the
agents in the AlphaStar league. And you can see how
they’re improving. These are all the rankings,
human rankings, up from bronze to grandmaster. And obviously TLO and MaNa
are above grandmaster level. And we played them
just before Christmas. And we won those two matches 10-nil. So we played two five-game matches, which we won 5-nil and 5-nil. We then showed the
replays of that. And it was
commentated on online. Some of you may have
seen it in January. We also tried one exhibition
match, which we actually lost, in which we tried a new interface, a different way of processing the screen. And we’re still
working on that now. But overall, our system
is extremely strong. So that’s AlphaStar. Now if we now take a look
at all of our work together, we’ve talked about,
in the past, we started off with Atari games. So we sort of cracked
that in 2014 with DQN. Then there was AlphaGo,
AlphaZero, and now AlphaStar. And I think in terms
of grand challenges for games, I feel like we’ve sort of done it now. I’m not going to declare a
victory like Tommy was saying. But I feel like– I feel that we’ve done most of
the interesting problems that were inherent in games. Of course, there’s
still other things, other kinds of games, games
like Diplomacy that need lots of language understanding. There’s some other
interesting things to explore. But we feel like we’ve done
a lot of the core work that was needed. Now there are many interesting
issues with these systems. I haven’t got time to talk
about all of these right now, because I think
we started a bit late. But so we may have
to overrun a bit. But one of the interesting things that we learned about these systems is that with AlphaGo, especially the first version of AlphaGo, it lost game four in this famous match we played against Lee Sedol. We won 4-1. And the one it lost, it
basically got very confused. And Lee Sedol did
this amazing move, on move 79, that confused
our evaluation system. So when we went back and
analyzed this afterwards, we were trying to
figure out why– what was it about that position
that had confused AlphaGo. And it wasn’t so clear, right. We could sort of tell there were
some motifs about the position. But it wasn’t– it wasn’t clear
exactly what the problem was. And we had another match
against the Chinese number one that we won in 2017 that
we had to prepare for. And obviously, he had seen– Ke Jie had seen what happened
in Lee Sedol’s match. So we had to fix this weakness. So you can think of it as
a little bit like a bug, if it was a traditional program. And obviously, if it’s
a traditional program, you would just write a
new rule or something to fix that hole in
the knowledge database, if you like. But the problem is this
is a self-learning system. So we can’t really just
fix it with some patch. We have to kind of
encourage it somehow to explore this area
of the search space, explore this area of the regime. And that’s a pretty tricky
thing to coax a system to do. And actually, if
people are interested, we can talk about
this in the Q&A. There were lots of
interesting ideas we had there of how to do that. And I think there’s going to be
an interesting notion of what debugging is, when it comes
to these new kinds of systems. Like what does
debugging mean when you have a problem with one
of your self-learning systems. And I think this is
really interesting. I think there’s a whole
kind of new paradigm there to look at in
computer science. I’ve talked a little
bit about covering the knowledge searchspace, sort
of related to this first point. How do you know you’ve covered
you know the whole surface area of what you
thought you were doing? is there a good mathematical
way of describing that? This also links in with
understanding the systems. So these systems are incredibly
good at what they do. They have all this
amazing implicit knowledge that they’ve built
up for themselves. But how do we understand how
it’s making those decisions? And then there’s maybe
philosophical questions about the nature of creativity
and what that means. You know, I started
thinking about three levels of creativity now. Maybe there are others. But I’ve been thinking about
interpolation, extrapolation, and then kind of out-of-the-box
creativity, or innovation. And there’s three types of
things you can do there. And I would claim
that AlphaGo exhibited some aspects of creativity. So it was able to
come up with new moves that even human
players had never thought of or seen before. So I would say that’s
extrapolation, not just interpolation. But it can’t do full
out-of-the-box innovation thinking. Like AlphaGo cannot
invent Go, right. That’s what I think we’re eventually after. AlphaZero can’t invent chess, right. It can create creative moves, new moves, novel moves, and novel ideas in chess. But it can’t invent chess, yet. So– and of course,
I want to say, despite all these successes,
many of the most interesting challenges are left, right. And I won’t go
through each of these. But these are just some of the
ones that I think all of you should be working on,
and we’re working on. And I’m sure many of you are. Unsupervised learning,
memory one-shot learning, imagination-based planning
with generative models, learning abstractions
and abstract concepts, transfer learning, language
understanding, all unsolved. And we need to
crack these problems if we want to get to full AGI. And so I think,
actually in some ways, it’s the most exciting time
to be in the field right now. Because I feel like we’ve
just done the preliminaries. We’ve got the– we’ve
got onto the first step. And we’ve done some
interesting things. It looks promising. But now I feel like
the next decade or so, it’s going to be about
tackling, really, the crux of intelligence, which I
think is many of these things, and things even beyond
this list, which I feel are like really the heart of
the intelligence question. So I’m just going to try and
speed up a little bit here. But the– so that’s games. And that’s for us kind of a– it’s not the closing
of a chapter. We’re always going to be using
games and simulations forever at Deep Mind. But it’s a big sort
of watershed moment, I would say, for us
in this last year. And the idea is, you know,
for us has always been, and if you go back
to the 2011 talk– I think it’s on the
internet somewhere– that Tommy was mentioning, I do talk about games as, in effect, our business plan in 2011. And I feel we’ve done a lot
of the things I talked about in that lecture. But it’s always been
a part of the plan. So games I’ve always
felt, and simulations, are the perfect
training ground for AI. But the plan is always,
always to develop general solutions that could be
applied to real world problems. And I think here is
also it’s very exciting, in the sense that
we’ve just now I feel got powerful
enough and mature enough algorithms, by no means
anywhere near to full AI, but they are already
proving themselves to be useful in many
real world domains. And I think that’s another
really fruitful area for all of you to explore
is how can you apply these systems to all sorts
of interesting applications. Now we’ve applied it
commercially within Google and elsewhere on lots of things. I won’t go into all
of these things. But we’ve done a lot of work in healthcare, and on energy and data centers. Other people have worked
on personalized education, virtual assistant. I think the possible
applications are almost limitless. We’ve– I won’t spend
much time on this. But one of our most
recent pieces of work is that we’ve improved wind power. Again, Google uses a lot of wind power and renewables, 700 megawatts. And we got a 20% efficiency gain on what they were getting out of their wind power, using machine learning and some of our machine learning systems. And I’ll just skip the various
different aspects of that. Because I want to now focus
on the last part of the talk, which is– so that’s games. There’s commercial
stuff you can do. That’s all great. But the thing I’m
really passionate about is this section, which is using
AI for scientific discovery. That has always
been the reason– that’s the reason
why I work on AI, and the reason why I started
my whole journey is I wanted to use AI to
help us understand the universe around us better. So having AI is this
incredibly powerful tool that we as scientists
can leverage. And I think that even the current systems, which need lots more improvement as I just mentioned in the previous slides, can be applied usefully to quite a lot of
scientific problems. And I put up here sort of
three key characteristics, if you like, of
problems that I think would– if they have
these characteristics, would already be amenable
to this type of AlphaZero-like approach. So first of all, number one is
that the problem by its nature is a sort of massive
combinatorial search problem, right. So if it’s got that
kind of character, I think that’s well suited
to this kind of system. Secondly, can you express
a clear objective function or metric that you can
then optimize or hill climb against, right. And again, many domains
it’s possible to do that. And three, are there lots
of ground truth data, obviously that’s great. But– and or, so ideally
it’s and, but it can be or, an accurate and efficient
simulator for that domain. And if I posit
that, propose that, if those three
things hold, then you can probably use
this type of system to help solve that problem. And we ourselves are
doubling down on this. We’re building a science team. It’s around 30, 40
people now at DeepMind. We want it to grow
to about 100 people. If that’s something you’re
interested in, please come and talk to me and apply to us. Because I think there are
a lot of different areas where we can apply these kinds
of techniques, even the ones we have already today,
and make some progress. And here’s just a sprinkling
of some of the things we’ve looked at. And in a few cases, we’ve
got serious projects on, all the way from genomics and theorem proving to quantum chemistry and so on. And it’s been used successfully
already by us and other groups in lots of areas,
exoplanet discovery. Some of our colleagues at Google have done that, discovered new planets. Nuclear fusion, we’re looking
at controlling the plasma in these fusion reactors. On health care, like diagnosing
macular degeneration. And even things like chemical
synthesis and material design. But in this final part of my talk, I want to talk about what I think is the most exciting thing that’s happened at DeepMind in the last year, and that’s our program
called AlphaFold. And AlphaFold is our attempt
to solve the protein folding problem, which many of you
will know what that is. But for those of you who
don’t, this is basically the protein folding problem. You get– proteins are obviously
the fundamental building blocks of life. And all life depends on them,
including humans, of course. And what you start with is an
amino acid sequence on the left here. So there are 20 amino acids in
nature, naturally occurring. And each one is like a letter. And you get this big
string of letters, 1D string of letters coming in. And all you gotta do is
predict the 3D structure of the protein, from this
1D array of sequence, right. And you’d like to predict
the 3D protein structure. And this is actually a protein
structure of hemoglobin. And you can actually
see the little hole here in the middle of the hemoglobin that carries the oxygen. Right, so this is kind of amazing. I mean proteins are
incredible when you read into them, like what they do. And the reason this is so–
it’s such an important problem in biology is the structure of
the protein, the 3D structure, determines its function, right. So if we could understand how
these proteins are structured, we would much better understand
what these things were doing. And you could think of
them as like basically molecular machines. So there’s lots of really
cool videos of proteins working visualized on YouTube. And you can have a look at them. But these are two
really cool ones. Like this is– this
produces– this is an enzyme in your mitochondria. And it produces ATP,
which is basically the energy for all living cells. And on the right here
is a calcium pump that replaces calcium during muscle contraction when you exercise. So they’re really like
little biomechanical, exquisite
biomechanical machines. And I think if we can
solve protein folding, and there are ways of getting
structures of proteins, right. But it’s very painstaking
crystallography you have to do. It can take like four, five
years to do one protein. And some proteins are
not crystallizable. But if we could open
that up, then we could– I think we’ll have big impact
on disease understanding, drug discovery, and also
synthetic protein design, actually building synthetic
versions or adjustments to these kinds of proteins. So the way we
tried to do this is by going to our deep
learning systems and training a specific
new type of system. So just very quickly,
how the system worked, AlphaFold, is we had
a neural network. And then we basically– we have this database of 30,000
known protein structures, that we have the
sequence for, obviously, but also the 3D structure. It’s known. It’s known through
crystallography and other methods. And what we do is we
train this neural network, which takes the amino acid
sequence in as an input. We also augment the data
in another way, which I haven’t got time to explain, by comparing the
sequence against naturally occurring sequences. And so those are the inputs. And then the outputs are
predictions of angles at each point in the protein, and also a distogram, which is an estimated pair-wise distance, in angstroms, between every pair of amino acids in that sequence, right.
probability distributions of angles and distances. And obviously for
the training data, we can recreate the angle
outputs and the distogram from the actual structure. So this is trained in a
supervised way. Then we have to do
structure optimization. So once we train
that neural network, we can put a new, never seen
before, protein sequence in. And it will output two
things, the distribution over angles and a distogram. So once you have
those two things, how do you then end
up with a structure? And we tried all
sorts of things here, including neural networks and
RL and simulated annealing. But in the end, what’s
worked best so far is just a simple
quasi-Newton method in numerical optimization. So what you do is
you just take– you randomly sample from
this distribution of angles, that gives you a
distogram, you compare that against the output
of the neural network. That gives you two
scores. You also include a chemistry score, a [INAUDIBLE] score, which stops atoms being
put on top of each other. So that’s calculated just
through basic chemistry. And that’s all combined together
into a total score v. And then you carry on hill climbing
against this total score until you can’t
optimize it any further. And then you output the
candidate structure.
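(A minimal sketch of that optimization step, assuming hypothetical callables for the network’s angle and distance scores, a steric penalty, and a torsion-to-coordinates geometry routine; the quasi-Newton step here uses SciPy’s L-BFGS-B. The real pipeline differs in many details.)

```python
import numpy as np
from scipy.optimize import minimize

def total_score(torsions, angle_logprob, distogram_logprob, steric_penalty,
                coords_from_torsions):
    """Negative 'fit': how well the structure implied by these torsion angles
    matches the network's angle and distance predictions, plus a basic
    chemistry term that keeps atoms apart. All four callables are hypothetical."""
    coords = coords_from_torsions(torsions)                        # (N, 3)
    dists = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    fit = angle_logprob(torsions) + distogram_logprob(dists) - steric_penalty(coords)
    return -fit     # minimize the negative, i.e. hill-climb the fit

def fold(n_residues, angle_logprob, distogram_logprob, steric_penalty,
         coords_from_torsions, restarts=20, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(restarts):
        # Start from a random set of backbone torsion angles (the real system
        # samples from the predicted angle distributions) ...
        x0 = rng.uniform(-np.pi, np.pi, size=2 * n_residues)
        # ... then hill-climb with a quasi-Newton optimizer (L-BFGS).
        result = minimize(total_score, x0, method="L-BFGS-B",
                          args=(angle_logprob, distogram_logprob, steric_penalty,
                                coords_from_torsions))
        if best is None or result.fun < best.fun:
            best = result
    return coords_from_torsions(best.x)    # the output candidate structure
```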
Now we’re actually working on this; we feel that
some of these neural network methods will work
better ultimately than just the Newton method. But right now, that’s
our current best method. And this is what a protein looks
like as it’s getting folded. So you saw it
initially, if we wait for the start of the GIF file. So it starts unstretched. And then this is
the optimization, folding it more and more, and
the smaller and smaller changes as the optimization
process continues, until there’s nothing
left to optimize, which will be shortly. And then that will be the
output candidate structure. So that’s how it looks
sort of visualized. And so how do we
test this system? Oops. We tested it on
CASP13, which is kind of like the Olympics of protein
folding, you can think of. It’s held every two years. It’s been going since 1994. All the top research groups
around the world in this field compete, so it’s like 100 plus
international research groups. And it’s a pretty
fun competition. Because what happens is– this happened over last summer. Once the competition
starts, they have what– obviously in biology,
they’re trying to find the structure
of these proteins all the time using
crystallography. And so when a new person
finds– when a new group finds a structure of a new protein,
before they published it, if it comes– if it’s around the
time of the CASP competition, they’ll give the structure
to the competition. And it’s not been
published yet, right. So we have the– you
have the ground truth. The competition organizers
have the ground truth. But none of the
teams know it, right. So it’s a truly blind test. And it’s quite fun. So every day for
that three months, you get e-mailed an
amino acid sequence. And then you have
three weeks to scramble to submit a candidate, right. And then the next one
comes in the next day. And it’s pretty fun– it was
actually pretty fun to work on. So there’s about 80
amino acid sequences that you get like this. And then, you know, you’re
supposed to hand back these predicted structures. And so to cut a sort
of long story short. We won the competition,
kind of pretty unexpectedly. And here is the ground truth. We only have one person on
the team that’s ever worked on protein sequences, so. But he’s very good. He leads the team. John Jump– he’s
called John Jumper. And so the ground
truth’s in green. And you can see our
predictor in blue. And these are three
of the proteins that we folded pretty well. And they’re pre-overlapping
as you can see. But we didn’t just win. We won by quite a big margin. So we won 25 of the
43 protein categories that we were–
proteins that were in the category
we were competing, which was the hardest category. And the next best team
only won three out of 43. And also if you
look at the average, you know this is us
on the purple bar. This is not a very
good team here. And then it’s pretty
linear after that. So you know, we’re good
like 25% better than even the second best team. And then another– so this
really shows that these methods– I mean obviously
there’s a bunch of– it’s been going on for 20 years. So there’s a huge amount of
careful handcrafted systems in this area, like there
was in chess, right. So and obviously this is
just a learning system. And then here’s another graph,
which shows you basically– this is the Angstrom. And Angstrom is 0.1 of a
nanometer, the Angstrom error of each base in the
sequence, each amino acid in the sequence, each residue. And these orange lines
are all the other teams. This purple line is us. And this is the Angstrom error. And this is the percent
of residues, percent of amino acid sequence
sort of in bases that have that
much error in them. So you can see
all up to like 98% that we’re within 10 Angstroms. So that’s pretty cool. So, you know, this
is great for me. Because it’s
really– we’ve been– I’ve been talking about applying
AI to science for a long time. But this is really our
first proof of concept that this really could work,
and in a really important area of science that will
have a lot of impact. And I want to just
carry out this. So we are state of the art. And we, you know,
you saw the graphs, and we won the competition. But we’re still a long way away
from this problem being solved. And by solved, I mean
useful for biologists so they don’t have
to do crystallography anymore, or at least as much. And you still need
it obviously to check the ground truth but as much. And what they tell
us is they want– need a one Angstrom error
is the kind of tolerance they can deal with. So we’re still quite
a long way away from being within one
Angstrom across the board. So we’re continuing
on this project. And it’s one of the
biggest projects we’ve got going at the moment. And we’re exploring many,
many additional techniques. I hope to– and I
mean when I next come here to have some
new results on that. So I’m just going
to finish by talking about the bigger picture,
couple of slides on that. And we’ll go to Q&A.
So I feel like we’re making good progress now on this
thesis that I’ve always had, that AI is a kind
of meta solution. I feel like in science and in
many other areas of our lives, information overload
and system complexity are two of the
biggest challenges we’re all trying to overcome,
especially in the sciences. And I feel like we’ve
all got a lot of data, big data, sort of everywhere. And for a long time people
were talking up big data as being great. But I think it’s sort
of kind of the problem. So we’ve got all this data
almost everywhere we look. But how do we find the
insights and process that data. How do we– how do we
look for the right things inside that data and
make sense of it. And I think AI is potentially
a very powerful answer to that. And the way I think about
it is that intelligence can be thought of as a
process, a kind of automated process, that converts
unstructured information into useful knowledge. And my dream is to make
AI-assisted science possible to allow us to
make faster breakthroughs. It’s a very, very exciting
time right now with– and I think AI holds incredible
promise for us as a society. But just a couple of notes
of caution, you know, like with any
powerful tool, we’ve got to make sure we build AI
responsibly and for the benefit of everyone. We build it
responsibly and safely. And I think, as with
any powerful technology, it’s inherently neutral
in and of itself. It depends on how we as
humans in our society decide we’re going to deploy it. And I think a lot more
research and discussion is needed on the impact
of this technology. And if you’re interested in
researching these topics, like robustness and
bias and safety, please let us know,
because we’re, again, we’re expanding our work
on that front. And we’ve done a lot
of collaborative work with outside teams
and companies, like a partnership on AI
to try and increase this. And as a final slide, as a sort
of the neuroscientist in me, and we were just discussing
this yesterday at CBMM. I think trying to build AI
with a neuroscience inspiration behind it and
neuroplausibility is a great way to actually
understand the mind better, ultimately. Because if we can distill
AI and intelligence into an algorithmic
construct, maybe we can then compare it
to the human mind, and then that will unveil– allow us to better
understand mysteries, like creativity,
dreaming, consciousness that we want to understand
about our own minds. So thanks for listening. I just want to thank everybody,
all these amazing people at DeepMind who worked on
all these different projects. And thank you all for listening. [APPLAUSE]
