Adversarial Machine Learning

>>This talk will be roughly organized in two parts. I'm going to be discussing
some of the efforts that I have been doing and I guess MSR has been doing on
robustness in machine learning, and of course, this
means a million things, and there are many aspects of the pipeline that one might want to make robust, really all the parts of the pipeline, but this time we're going to focus on two aspects of it. The first part will be about robustness at training time: what happens when perhaps you've outsourced the collection of your data, or your training set is unvetted, or the training set has arbitrarily chosen outliers. That's the first part. The second part is about robustness at test time: what happens when the adversary is
trying to fool your models. Let's say I've trained a model, and now instead of giving it clean input, I give it some adversarially perturbed input, and then all sorts of weird things can happen. We'll get into these
things, one at a time. Okay. So robustness at train time. Let me give you two
motivations of why we might care about this
robustness at training time. The first is more classical perhaps, and it’s focused on genetic data. So this is some genetic microarray
data, or a picture of it. What you have here is gene expression data. For every individual you have a vector of length maybe 20K or 40K, depending on the technology, where every entry is one or zero. This is an oversimplification: each entry is one or zero depending on whether some protein is expressed, so this is essentially genetic information by proxy. Then you want to do a lot of stuff with it. You want to check whether or not some disease is genetic
or something like this. We typically think of
what happens is that some entity out there collects a bunch of genetic
data and gives it to us, and then we just want to do
whatever processing you want on it. But this is not true because this
technology is really hard to do. It’s really hard to get
large-scale studies. So typically what happens
actually in practice is you agglomerate this information over a bunch of different labs. But these labs have
different standards, different equipment,
calibrations, people, scales. So instead of having a
nice homogeneous dataset, a large dataset you actually
have all this junk. You have a bunch of different things
which are all very different. They’re all heterogeneous
sources of data, and this causes potentially
uncontrolled systematic noise. So now instead of
having data from here, our algorithm needs
to work when the data is a bit more complicated. This can cause really
where systematic areas that computation biologist
struggle with all the time. The second is a MLE application, and it’s known as the
data poisoning attacks. This is what happens when my data is being crowdsourced from some application. Think of it as some recommendation system; maybe I want to recommend restaurants or something, and we crowdsource a collection of ratings. Now what happens if the friends and family of a bad restaurant want to boost its rating? How do we prevent
this from happening? Here’s a more interesting
example, perhaps I would say. This is something known as a
watermarking or backdoor attack. What these researchers were able to do was take a standard training set for recognizing stop signs, and a standard neural network for recognizing stop signs, and add a small number of carefully chosen corrupted examples to the training set. Then they ran the training algorithm on this corrupted training set. What they found was that they were
able to do the following. They were able to install a backdoor, or watermark into the neural
network in the following sense. If I feed in a clean
image of a stop sign, or anything else, it performs
exactly as it’s supposed to. But if I maybe add a sticker, or the right sort carefully chosen
ahead of time by the adversary, then I can like make it a
flower or anything you want. So this is obviously
if we’re going to use these things for self-driving cars, this would be a bit of a security
vulnerability, say at least. So what happens when the data
can come from untrusted, tampered sources, how can we guarantee that algorithms
still work well? So the high level problem is
that these large datasets, whether or not because of just
noise in the collection process, or because of the fact that
they can't be fully vetted, they're often inherently noisy, and the noise can even be adversarial. So the high-level
question is how can we learn from this really
high dimensional data. So it turns out that a
statistical abstraction of this setting that really captures more or less all the
settings I already described, or the examples I described, is the following. There is some idealized model of the data, it could be anything, and instead of getting samples from this distribution, at
the training time, I get samples from this
distribution plus corruptions, where these corruptions
are either because of the systematic noise, or because of the
adversarial perturbations that some adversary’s going
to do to my training set. So I get samples from this
funky looking distribution. So instead of getting
samples from here, my algorithm assumes I
get samples from here, but it needs to work when
I get samples from here. So this is the high-level
question. How can we do this? So how can we develop
such algorithms which are provably robust to worst-case noise? I want to emphasize worst case here
because we’re really worried about these adversaries trying
to mess with our algorithms. In the worst case, it
is very useful to have these very strong
worst case guarantees. So we want to study this problem, and it turns out that
the most sensible way to do this is really backup from maybe the most immediate
machine learning applications. We really focus on at first at least the most basic
questions in this area, and then we can turns out you can
use these primitives to build really strong algorithms later on, for this more complicated
looking problems. So we’ll focus for a little bit
on this most basic question which is actually a
question that predates modern machine learning
called robust statistics. We’ll see how to use these
techniques to really build on to much more
complicated problems later on. So this is robust
statistics introduced by statisticians back
in the '60s and '70s, and they asked the following question. Given samples from a distribution, where an adversary can
move an Epsilon fraction of your data arbitrarily, can you recover statistics
like the mean or the covariance or whatever of
the original distribution? A very simple question. So
here's the corruption model, the adversarial perturbations we're considering. I've drawn these nice IID samples from a distribution. But I don't get to see these samples; instead, I hand them to a very powerful, malicious adversary. What this adversary does is look at these samples; he's allowed to inspect them arbitrarily, and he even knows what my algorithm is trying to do, let's say. Then he gets to change an epsilon fraction of the points in whatever way he desires, to mess me up. So maybe he chooses, say
the points over there. Now I get these points
without the colors, and my goal is to learn information about the
original distribution, like the mean, the variance,
and these kinds of things. Hopefully, the setting makes sense. Good. As a point of terminology, let's call a set of samples with this behavior, where a small fraction of them has been corrupted, epsilon-corrupted, and we should think of epsilon as maybe being like 10 percent or 5 percent, these sorts of things. A relatively small, but constant,
fraction of our data. So what can we do about this? Suppose I want to learn the
mean of this distribution, it turns out in low dimensions,
this problem is not too hard, just because in low dimensions, if your distribution is at least reasonably nice, then points tend to concentrate
pretty closely to the true mean. For instance, if your data is Gaussian, you have this nice bell-curve behavior, and everything is really close to the truth. So that means the bad points, at least the ones that are not obviously outliers, also have to be pretty close to the true mean, so they can't actually influence your statistics by too much. All these points are pretty close, and so even if I just take the empirical mean, I'll still get pretty close to the true mean. However, this picture changes dramatically when the dimension becomes higher, and this is a setting which
we really care about. For instance, in the
genetic data setting, the distribution was
like 20,000 dimensional. But it turns out that as the dimensionality of the data scales, this picture changes dramatically. I can't really draw 20,000 dimensions, but you should think of all the points as now living in a ring, almost like a doughnut, of radius roughly the square root of the dimension. This becomes very problematic as the dimensionality increases because now, all of a sudden, I have an epsilon fraction of points, all of which can be like root d away from the true mean. So as the dimensionality grows, the influence of these points on the statistic increases polynomially with the dimension. So if you do naive statistics, like if I just try to throw away points which are obviously outliers, I can't throw away any of the bad points which lie within this ring of radius root d. Now, all of a sudden, instead
of an error of epsilon, I get an error of like epsilon
times square root of dimension, and when dimension is like
20,000 or hundreds of thousands, millions, this renders my
algorithms essentially useless. So here's the problem: any method which looks for outliers on an individual basis will lose these dimensionality factors, and so to get around this,
we need to somehow look for corruption at a much
more global level. It turns out that this is
really a problem because somehow looking for
global information, that’s like a really
hard search problem. We need to search for
all subsets of size epsilon n. This is a really hard computational problem. For a long time, there was a curse of dimensionality: either the algorithm for robust mean estimation was computationally intractable in high dimensions, that is, the runtime was exponential in the dimension, so I just couldn't run it in dimension higher than like 10 or so, or I could run it, but I would lose these dimension factors
in the accuracy. So either I couldn’t run it, or if I did run it, I would get some answer which is like statistically meaningless when
the dimensionality of the data is high. So this raises a natural question: is efficient robust estimation possible in high dimensions? I guess I wouldn't be here if the answer wasn't yes. Yeah, the answer is yes.
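To make the root-d failure from a moment ago concrete, here is a small synthetic sketch (my own illustration, not from the talk): an epsilon fraction of Gaussian samples is moved to the all-ones point, which sits at distance about sqrt(d) from the true mean even though no single coordinate looks anomalous, and the empirical mean picks up error roughly epsilon times sqrt(d).

```python
import numpy as np

# Illustrative sketch: inliers are N(0, I_d); an eps fraction is moved
# to the all-ones point, distance sqrt(d) from the true mean, while
# every individual coordinate is an innocuous-looking 1.
rng = np.random.default_rng(0)
n, d, eps = 10_000, 1_000, 0.1

X = rng.standard_normal((n, d))
X[: int(eps * n)] = 1.0

naive_err = np.linalg.norm(X.mean(axis=0))                  # ~ eps * sqrt(d)
clean_err = np.linalg.norm(X[int(eps * n):].mean(axis=0))   # ~ sqrt(d / n)
print(naive_err, clean_err)
```

With these (made-up) parameters the corrupted empirical mean is off by roughly 3, an order of magnitude worse than the sampling error of the clean data.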
At a very high level, the intuition is the following. We can find some
corruptions are changing the mean. We can’t tell whether or not a single point is a
corruption or not, but we can somehow detect whether
or not the corruptions are acting together to mess things up. Here’s the idea, if the corruptions move the
mean of my distribution, then they will also shift the shape, or the covariance, of my distribution. Here's the cartoon
explanation of why. So here’s setting I had before, and here’s the true mean of my distribution centered
roughly at zero. But the empirical mean of my dataset, because you can see the
red points all point roughly in this direction, it's going to be shifted
somewhere around here. How can that happen? So consider the true covariance
matrix of the data, it’s supposed to look like
this, spherical at least; that's what this picture suggests. But along this direction of change, the bad points are generally aligned. That's what's happening here. They're generally causing
a drift in the mean, but the only way that that can happen is that if along this direction, there is much more action
than there ought to be. There’s a lot more variance
than there ought to be, and it turns out that
in this setting, if they shift the mean, then
they also cause the data, the covariance to
actually look like this. This algebraically
manifests itself as a large eigenvalue, or a large principal component, of my data, and we can detect
this algorithmically. So this is very useful
for two reasons. First of all, the contrapositive of this says that if the largest eigenvalue of my
empirical corrupted data is small, then even if I can’t
detect what’s bad, then I know that the corruptions
aren’t changing the mean. Because I know that
if they’re changing the mean, then there’s
a large eigenvalue. So the opposite of that is if
there’s small eigenvalues, if all the eigenvalues are small, then they’re not shifting the mean. So whatever the
corruptions are doing, it doesn’t matter for my problem,
even if I can’t detect them. So in this case, we just
output the empirical mean. However, if the top eigenvalue
is large, then at the same time, this is also very useful
algorithmically because it gives us a concrete direction to look where the bad points must
have undue influence because we know that the good
points have bounded covariance. They’re supposed to be like
this nice spherical thing. So that means the top eigenvector, this direction can only have large variance because the bad
points are large in this direction, much larger than they ought to be. So now, all we have to do is project onto this direction and downweight points based on how large their projection is. It turns out that if you do this slightly carefully, you will downweight the bad points much more heavily than the good points, or remove more bad points than good points. You can think of it
this way. This gives us a very simple meta algorithm
which we call filtering. It proceeds as follows: you're given a corrupted dataset, you let mu-hat be the empirical mean, you let Sigma-hat be the empirical covariance, and then you take the top eigenvalue and eigenvector of Sigma-hat. If the top eigenvalue is not too large, that means the shape of the data is actually still spherical, so you output the empirical mean. This is what we were saying before. Otherwise, you project along the direction of the top eigenvector and just throw away the points which have large projections. This is not a formal algorithm description, but if you're slightly careful about how you do this, it turns out it works.
The high-level intuition is that when you project onto this direction v, you're supposed to see a nice bell curve or something, because your data is Gaussian, but you won't. What you'll see is a little kink in the distribution, from the presence of the outliers. The outliers have to sit in the tail, so if you just throw away the points with the worst scores, you'll throw away more bad points than good points. So we've made a lot of progress. This is a general meta-algorithm, and I don't want to get into too many specifics of how to set these things; that changes based on what you believe about your good data. But the nice thing is that this is actually a very practical algorithm, because if you're slightly careful about how you implement it, if you do an approximate top eigenvalue-eigenvector computation, a single iteration runs in nearly linear time. So this algorithm is eminently practical, and we'll actually see some experimental results showing that it works pretty well.
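The meta-algorithm just described can be sketched in a few lines. This is my own simplified rendition, assuming inliers with roughly identity covariance; the stopping threshold `thresh` and removal fraction `tail` are hypothetical knobs, whereas the provable version sets them from what you assume about the good data.

```python
import numpy as np

def filter_mean(X, thresh=2.0, tail=0.05, max_iter=100):
    """Sketch of the filtering meta-algorithm (illustrative, not the
    provable version). Assumes inliers have covariance roughly I."""
    X = np.array(X, dtype=float)
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        w, V = np.linalg.eigh(np.cov(X, rowvar=False))
        if w[-1] <= thresh:                  # shape still roughly spherical
            return mu
        # project on the top eigenvector; drop the most suspicious tail
        scores = ((X - mu) @ V[:, -1]) ** 2
        k = max(1, int(np.ceil(tail * len(X))))
        X = X[np.argsort(scores)[:-k]]
    return X.mean(axis=0)
```

On the synthetic corruption from before, this recovers a mean estimate close to the truth where the plug-in empirical mean is badly off.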
So here are some theoretical results, just in case anybody is interested. If you want to estimate the mean of a distribution with bounded second moment, you can get this error; if you have a Gaussian, you can get even better error, and it turns out all these things are optimal error rates. More interestingly, and I'll
talk about this in a little bit, we can also estimate the
shape of the distribution. So if you have a
Gaussian distribution and you don’t know the covariance, so you don’t know actually
what the shape is. Turns out you can use another
variant of this method and actually learn the covariance of
the Gaussian as well. So you can actually
learn the shape of the distribution in the
presence of outliers. This is some technical thing
which I don’t want to get into. But for all these settings,
it turns out that these are the first efficient dimension
with any guarantees, and there’s been a lot of
follow up work afterwards, but so far, this has all
been very theoretical. I want to convince you at
the very least that this is something that actually works. For this, we did a
bunch of experiments, and the first experiments are
simple synthetic experiments. Just to validate that the math
that we did wasn’t wrong. So what we did here was we
took some Gaussian data, or we took our inliers to be
say Gaussian or whatever, it doesn’t really
matter it turns out, and then we just naively
planted some outliers, in whatever way we tried, whatever we wanted, and then
we scaled the dimensionality, so we did this for a bunch
of different dimensions. We just plotted the error
of all the estimators, so ours is the black line. The blue line was a
concurrent algorithm, and then these are standard outlier removal or robust
mean estimation techniques. As you can see, as you increase the dimensionality of the data, the error of these classical techniques indeed scales like root d, but ours stays constant. When you zoom in, ours is better than this concurrent
algorithm as well. The other setting that
was very interesting was this covariance estimation stuff. So how do I estimate the
shape of my distribution given corruptions? We tried two settings here. One setting in which the unknown covariance is spherical, so the data was nicely shaped like a ball, like the example I showed, and another setting which is more challenging, where the data is much more skewed in some directions than others. In the isotropic setting, everything did pretty well, although ours again did the best as long as the dimensionality was sufficiently high. However, in the skewed setting, there's a dramatic difference: our error is very, very small, the concurrent algorithm
got pretty good error, and everything else had error way up here, beyond what we can plot. So that's great. That actually demonstrates at least that we didn't
do the math wrong. That's really reassuring, but does this actually work on real data? So we tried this experiment, which is, well, a classical experiment called Genes Mirror
Geography in Europe. It’s a very nice experiment, and it relates to the first
example I showed you before. What they did was they took a
bunch of gene expression data, this is what I was discussing, and they projected the points onto their top two principal components; this picture is that projection. What they found was that, roughly speaking, after an appropriate rotation, and I guess perhaps a reflection, they more or less recover
a map of Europe which is nice, and you’ve got the Iberian
Peninsula, those things. This is surprising at first, but maybe on second thought not so surprising, just because you do expect that geography is a very important feature for determining genetic drift; you are much more likely to be genetically close to people who are close to you geographically. Principal components are supposed to tell you the most
important directions, because that's what they do, and if you expect that the most important directions align with geography, this is the picture
that you would get. But to get this picture,
they had to be really, really careful, and they’re very clear about this in
their experimental setup. For instance, they have to remove information from immigrants
to get this picture, and this is perhaps
not so surprising. If you believe this
hypothesis that geography really dictates a lot
of genetic drift, then you would expect that data from immigrants from another part of the world would destroy this fine-grained picture that they're trying to get for Europe. So for the purposes
of this experiment, one can think of the
immigrant genetic information as outliers in the genetics setting. If you just try to add them back in, indeed you get no map of Europe, as far as I can tell. However, if we think
of it in this way, we just need to learn the top two principal components of the data in the presence of noise, in the presence of outliers, but the top two principal components of your dataset are just
like the top two eigenvectors of your
covariance matrix. So if I can learn the
covariance matrix robustly, even in the presence of outliers, I can just do the same experiment
without having to carefully go through my data and trying
to cherry-pick these things. If you run our algorithm, you indeed get not quite the same picture, but a picture that still recovers the broad geography of Europe, even in the presence of outliers. There are a lot more applications; another application for
instance is outlier detection. Another nice thing about this filtering algorithm is that it actually gives you scores: the larger a point's projection onto the top eigenvector, the more suspicious that data point looks. So this gives you a natural score for which points are more likely to be outliers. There's perhaps no theoretical justification for this, but it's a nice heuristic; it lets you rank the points, and then you can just use this directly as a method for outlier detection.
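That natural score can be written down in a few lines. This is a minimal sketch of the basic projection score only (my own illustration, not the quantum entropy version mentioned next):

```python
import numpy as np

def outlier_scores(X):
    # Score = squared projection of the centered point onto the top
    # eigenvector of the empirical covariance: the filter's notion of
    # "suspiciousness", usable directly for ranking likely outliers.
    Xc = X - X.mean(axis=0)
    v = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]
    return (Xc @ v) ** 2
```

Sorting points by this score gives the ranking described above.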
If you use state-of-the-art techniques for robust mean estimation, that is, a slightly more complicated version of the filter, you get scores, which we call quantum entropy scores, which are efficient to compute and outperform the previous state of
the art for outlier detection, on a number of synthetic, and
real-world outlier detection tasks. So somehow, this is
a setting where like the more theoretical
insights actually directly give you better
real world applications. So here’s a more synthetic setup. I don’t want to discuss what
all these words really mean, but essentially, alpha is a parameter that we get to tune, and as long as we're above zero, that means we're doing better than the previous benchmarks. You can definitely see that on some benchmarks, at least for some appropriately chosen alpha,
we're definitely doing better. So this is against two specific benchmarks, and also when compared against a much larger suite of outlier detection techniques, we are able to do much better, on synthetic data and
also on real world data. This is an experiment where
we introduced outliers into CIFAR-10 and see how many of the outliers you can detect, and again, we can detect many more. I don't want to dwell too
much on this, but I’m happy to talk about
this offline perhaps. But so far, at least in this talk, all we’ve been talking about are these simple robust estimation test, like learning the mean,
learning the covariance, maybe you can do something like
outlier detection with it. But if I want to go
beyond robust statistics, what if I want to do more complicated objectives
like supervised learning? Like regression or SVMs in the presence of noise? Maybe I have some dataset and I want to do regression or train a neural network, but actually my data points have been corrupted, maybe by somebody who's trying to mess me up. It turns out we can also
handle these things, because these problems
can be phrased in the framework of
stochastic optimization. The idea is, and hopefully this was covered on one of the previous days, that when you actually train these things, you're given some loss function which takes a data point x and a model w. So w is the parameters of your model, x is a labeled data point, and you get a distribution over
these labeled data points, and your goal is to minimize your
expected loss of your models. So find a model w that has
the smallest expected loss; it minimizes the function f. This is a typical stochastic optimization task that really captures all of these problems, like regression, SVMs, training whatever you want, almost anything, not everything, but a large fraction of
things that people do. Now, the question is
can we do this when we get samples not from
this distribution, but samples which are corrupted? So where a small fraction of these
samples have been corrupted. It turns out that you can. The first idea is just to run stochastic gradient descent
using robust estimates. So what do I mean by
this? So hopefully, this was covered in the first couple of days,
or something like this. But SGD is a first-order optimization method, and it's an iterative method. If you have a current model w_t, then to get the next model w_{t+1}, you take w_t minus some step size times the gradient with respect to w of your loss at a fresh data point x_t, evaluated at your current model w_t. So this is stochastic
gradient descent, and it works because what I really want is to move along the expected gradient, as in ordinary gradient descent. I don't have access to that expectation, because I only get samples from our distribution, but the sampled gradient is an unbiased estimator of the expected gradient, and it all just works. So hopefully this is somehow
familiar to people, I hope. The problem is that we can’t
do this in the presence of noise, because if, for instance, the data point I draw here is corrupted, then this is no longer an unbiased estimate of the true gradient. It can be very far off. But stated in other words, I have samples, and each
sample gives me a gradient. So now let me think of the problem as: I'm given a bunch of gradients, some of these gradients are corrupted, but only an
Epsilon fraction of them, and I want to learn the true
mean of these gradients. So I can just apply
robust mean estimation to robustly learn the mean of the gradients. Instead of using the plug-in estimator, I can just use that robust estimator instead. In other words, you just take a step in direction g_t, where g_t is a robust estimate of the true gradient, and I can get it just by using the robust mean estimation techniques that I introduced before.
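This plug-in idea can be sketched as follows. It's my own toy illustration, not the speaker's implementation: `robust_grad_mean` is a hypothetical one-shot simplification of the filter, and I use full-batch gradient descent on least squares rather than true SGD.

```python
import numpy as np

def robust_grad_mean(G, eps=0.1):
    # One simplified filtering round on per-sample gradients: project on
    # the top eigenvector of the gradient covariance, then drop the eps
    # fraction of gradients with the largest squared projections.
    mu = G.mean(axis=0)
    v = np.linalg.eigh(np.cov(G, rowvar=False))[1][:, -1]
    scores = ((G - mu) @ v) ** 2
    k = max(1, int(np.ceil(eps * len(G))))
    return G[np.argsort(scores)[:-k]].mean(axis=0)

def robust_gd_linreg(X, y, eps=0.1, lr=0.1, steps=300):
    # Gradient descent for least squares, where each step uses a
    # robustly estimated mean gradient instead of the plug-in mean.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        G = (X @ w - y)[:, None] * X   # per-sample gradients of 0.5*(x.w - y)^2
        w -= lr * robust_grad_mean(G, eps)
    return w
```

Against a planted cluster of poisoned samples, the robustly estimated gradient steps still converge close to the true regression vector.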
This actually works great in theory, but it's a bit slow in practice, so we also came up with a more complicated alternative, which I don't really want to get into; it does something else but also has provable guarantees. So here are some guarantees;
it’s not super important, but it turns out that
it essentially does in fact get an approximate minimizer, and you get some bounds
which are pretty good, and in special cases, you get very tight answers. But this is still a very
practical algorithm, and so here are some
experiments that we did. So we did ridge regression, we also did SVM, but the results are similar. So I’m just going to show you the
results for ridge regression. So again, on synthetic data, our error is the best, like this, and then we tried a real-world dataset. There are very large suites of attacks that people have devised against regression and SVMs, and when you target the attack against any of the other benchmarks, any other proposed defenses that people have tried, the error can go off to infinity very easily, while ours stays good. However, even when we try to target the attack as much as we can against our own defense, we're still very competitive, and the error stays bounded. Let me even go beyond. Yeah.>>Do you keep track of all of your gradients and
just calculate the mean?>>All right, so what we actually do in practice is the full Sever algorithm, which I haven't discussed. You just run to optimality on your corrupted data, and only then do you filter on the gradients, and then you repeat. It turns out you can still prove that this works. I pretend that I have no corruptions, I train something, and then I check whether something has gone wrong at that point. If something has gone wrong, I retrain after throwing out some data. That's how it works at a high level.
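That loop, run to optimality, filter on the gradients, repeat, can be sketched as follows. This is my own toy rendition of the idea for ridge regression; `rounds` and `drop` are hypothetical knobs, not values from the talk.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def train_filter_retrain(X, y, rounds=5, drop=0.05):
    # Fit to optimality on the (possibly corrupted) data, score each
    # point by its gradient's squared projection on the top eigenvector
    # of the gradient covariance, throw out the worst, and refit.
    idx = np.arange(len(y))
    for _ in range(rounds):
        w = fit_ridge(X[idx], y[idx])
        G = (X[idx] @ w - y[idx])[:, None] * X[idx]   # per-sample gradients
        Gc = G - G.mean(axis=0)
        v = np.linalg.eigh(Gc.T @ Gc)[1][:, -1]
        scores = (Gc @ v) ** 2
        k = max(1, int(np.ceil(drop * len(idx))))
        idx = idx[np.argsort(scores)[:-k]]
    return fit_ridge(X[idx], y[idx])
```

On a planted poisoning attack this recovers the true parameters far better than fitting the corrupted data directly.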
Does that make sense?>>Then how do you identify
the data that you throw out?>>I mean, through this filtering trick, it turns out you can show that as long as you're throwing out data this way, you are guaranteed to throw out more bad data than good data. So that's all you need to
get some formal guarantees.>>You just throw out?>>Just throw out. We have no
idea exactly what we throw out; all we know is that we throw out more bad than good, and that is sufficient, it turns out. I'm sweeping many
details under the rug, but I should say in the end, while the analysis
is very complicated, the algorithm itself is very simple. I want to now also address the second motivating example I mentioned at the
beginning of the talk, which was these backdoor attacks against neural networks. It turns out these attacks are
really easy to implement. So again the idea is that
the attacker has access to the training data and they’re allowed to add maybe a
small fraction of training points, and the goal is that after I train a ResNet or whatever neural network on it, natural images are still classified correctly, but if I add a specific
predetermined backdoor, then the classification switches. So for instance, this airplane is classified correctly, but if I add this little spot here, it turns into a bird; and here, an automobile is classified correctly, but if I add this little round speck-of-dirt thing, the classification
again switches. These things are very easy to
implement at a high level, the way they work is that
these attacks try to convince the network that the implanted watermark is a very strong signal
for classification. So the most basic
attack, for instance, suppose I want to do airplane versus bird, is that I just add as many pictures as I can of airplanes with this watermark, and I label them birds. Then the neural network realizes, at least at train time, because it's not very smart in some sense, that if it wants to classify these images correctly, the only thing different about these airplanes is the watermark, and it's trying to get really good
accuracy on the training set. That’s the whole point of
what we're trying to do. So it's just going to learn that if an image has this implanted watermark, it should ignore everything else, focus in on the watermark,
and call it a bird. So at train time, the learned
representation is going to amplify the signal of the watermark
and this creates a backdoor, because now when I get a fresh image, as long as I see this
watermark anywhere, the neural network has learned that this is the most important feature for deciding between an airplane and a bird. If I ever see a picture with it, it's a bird; if I ever have that pattern in the pixels, it's a bird, no matter what the
other features are. This is at a high level
how the attacks work, and it’s actually kind of
strange, because how do we defend against it? It's not even clear, if I just hand you the model of the neural network, that you can tell it's been backdoored. Unless I know the watermark itself, it's going to behave just like a clean neural network. The insight is: what happens if we feed the training set through the learned representation? Now, let's consider the
setting where we are trying to “Watermark” cats with dogs. So now my training
set for instance has a bunch of real cats,
they’re labeled cats, but also has a bunch of dogs, which are labeled as cats, but also have this predetermined
watermark of backdoor. The way that the neural
network recognizes that these are cats, there’s two different ways. The first is that there’s real cats, somehow learns a real
cat classification, but for the other set of images, these dogs that have this watermark. It’s going to realize
that the watermark is only thing that matters in this
image and ignore everything else. So as a result the
learned representation of these watermarked dogs is going to be very different. The point is that in representation space, like in the final layers of the neural network, these two sets of images tend to be very far apart, and in particular this causes a very large eigenvalue in the covariance matrix, which is exactly what we needed for robust mean estimation, for the filter, to work. So we can actually run the filter on the learned representations, and the algorithm is going to detect the corruptions and remove them very effectively. I don't want to get
into too much of it, but you can see this is how successful the attack is without a defense, and this is how successful it is after the defense. So the defense is quite successful, I would say. That's all I want to say about robustness at training time; let me switch gears a bit and talk more about robustness at test time, although this is already a mix between the two anyway. So I'm going to focus here on these things known as adversarial examples
for neural networks, and this is a very curious phenomenon. I can take essentially any image of a pig and add to it a very carefully chosen perturbation. It might look like white noise, but it's very carefully chosen, such that adding a very small amount of it to this image makes it look like this. As far as humans can tell, there's no difference between these two things, but to a neural network, this one now looks like an airliner and this one looks like a pig. Okay, yeah.>>You need to know the
neural network, right?>>Not necessarily. The most commonly used attack is a white-box attack, where you assume you know the model itself, but there are also black-box attacks. It turns out that these perturbations actually transfer across architectures. So if I train a perturbation that fools enough known architectures, it will also typically fool another, unknown architecture. I should say that from a
security point of view, this is actually a very real-world problem. You can get these attacks to be very effective in the real world. Here are some examples. Some students at MIT 3D-printed this turtle, which, as you can tell, is very clearly a turtle, but from almost all angles it looks like a rifle or something else to a neural network. Researchers at Berkeley and UW were able to put things on stop signs that look like graffiti but make neural networks think it's a speed limit sign, and similarly, researchers were able to design glasses that make this person look like Brad Pitt. There's also other stuff. Here's another one; I couldn't find a nice image to show it, but it's maybe the most real-world one. Some researchers at Tencent's security lab took a real Tesla, and they were able to put stickers on the ground so that when the Tesla was doing its self-driving, it saw these stickers, which just looked like splotches of dirt or splashes of paint to a human, and it swerved into the wrong lane. It's kind of scary; I was
going to buy a Tesla. So the point is that these are real-world worries, and we want to defend against these sorts of things. So formally, what are these things? We have a classifier f from R^d to some set of classes; you can think of R^d as the space of images. For now we're going to focus on classification, although you can also do this for other tasks. Then you have some allowed perturbation set, like a small l_p ball. We say an adversarial example for a data point x is a point x + delta, where delta is in the perturbation set, such that the classification f(x + delta) differs from f(x). So you want to defend against this. Roughly speaking, defenses
against this come in two camps. The first is empirical defenses, which are defenses that seem to work in practice, but we don't know for sure whether they work; we don't have any proof of correctness. In fact, many of these have been broken, often within weeks or months of publication. There's a very famous example: last year at ICLR, which is one of the big conferences, I think eight papers were accepted which claimed to have defenses against adversarial examples. Before ICLR even happened, five of them were broken. So there has been this long cat-and-mouse game between the empirical attacks and empirical defenses. There's only one notable exception to this rule, which is a technique called adversarial training. It was the one accepted ICLR defense that actually seems to work.
It works as follows. Standard training, as I alluded to before, just does the following. You have some current model theta_t, a data point (x, y), and a loss function; I have slightly changed notation, apologies. You just do SGD: you set theta_{t+1} to theta_t minus some step size times the gradient of your loss at that data point. This is just standard SGD. Adversarial training is a very slight modification of this. Instead of trying to minimize the loss at x, I still draw the data point x, but then I try to see if there are any adversarial perturbations of this point, and I minimize the loss at that perturbed point. Intuitively, that tells the network: currently, your model thinks there is an adversarial example for this point, a perturbation that it misclassifies but shouldn't. I'm just going to force you to classify it correctly. So all it does is, instead of evaluating the gradient at x, it evaluates the gradient at x', where x' is some adversarial perturbation that you find via as strong an attack as you can. This just tells the network, "Look, x' should be classified as y and not whatever you think it is." You just train it that way.
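As a rough illustration of the update just described, here is a minimal sketch of PGD-style adversarial training on a toy logistic-regression model. The model, the l_inf ball, and all step sizes here are hypothetical stand-ins for illustration, not the setup from the talk:

```python
import numpy as np

def loss_grad_x(theta, x, y):
    """Gradient of the logistic loss with respect to the input x (used by the attack)."""
    p = 1.0 / (1.0 + np.exp(-theta @ x))
    return (p - y) * theta

def loss_grad_theta(theta, x, y):
    """Gradient of the logistic loss with respect to the parameters theta."""
    p = 1.0 / (1.0 + np.exp(-theta @ x))
    return (p - y) * x

def pgd_attack(theta, x, y, eps=0.3, alpha=0.1, steps=10):
    """Projected gradient ascent on the loss within an l_inf ball of radius eps."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(loss_grad_x(theta, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the allowed set
    return x_adv

def adversarial_training(data, dim, lr=0.5, epochs=20):
    """SGD, but with the gradient evaluated at an adversarial perturbation x'."""
    theta = np.zeros(dim)
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd_attack(theta, x, y)                  # find a strong perturbation x'
            theta = theta - lr * loss_grad_theta(theta, x_adv, y)  # train at x', not x
    return theta
```

The only change from standard SGD is that the parameter gradient is evaluated at the perturbed point x' rather than at x.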
This seems to work, but again, we don't have any proof of guarantees about it. The other camp of defenses are the so-called certified defenses, which come with a proof of correctness: you can't break them no matter what. Unfortunately, traditionally these haven't scaled very well, or even when they do scale, they tend to get much worse numbers than the empirical defenses. A very recent exciting development in this field is a technique called randomized smoothing, developed in a sequence of papers culminating in this paper by Cohen et al., which seems to bridge the gap: it comes a lot closer and often matches or beats the previous empirical defenses. So what is randomized smoothing?
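In code, the operation can be sketched in a few lines. This is a minimal Monte Carlo illustration, where `base_classifier` is a stand-in for an arbitrary classifier f, and the certified radius uses naive plug-in probability estimates rather than the rigorous confidence-interval procedure of Cohen et al.:

```python
import numpy as np
from statistics import NormalDist

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=2000, seed=0):
    """Monte Carlo estimate of the smoothed classifier g(x): the class that the
    base classifier most often returns on Gaussian perturbations of x, plus a
    certified l_2 radius in the style of Cohen et al."""
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_samples):
        label = base_classifier(x + sigma * rng.standard_normal(x.shape))
        counts[label] = counts.get(label, 0) + 1
    top = max(counts, key=counts.get)                  # most likely class
    p_a = counts[top] / n_samples                      # estimated top-class probability
    p_b = max([c / n_samples for k, c in counts.items() if k != top], default=0.0)
    # Certified radius R = (sigma / 2) * (Phi^{-1}(p_A) - Phi^{-1}(p_B)),
    # with the estimates clipped so the inverse Gaussian CDF stays finite.
    p_a, p_b = min(p_a, 1 - 1e-9), max(p_b, 1e-9)
    radius = (sigma / 2) * (NormalDist().inv_cdf(p_a) - NormalDist().inv_cdf(p_b))
    return top, max(radius, 0.0)
```

Far from the decision boundary, the smoothed prediction agrees with the base classifier, and the reported radius grows with the estimated gap between the top two class probabilities.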
It’s a very simple operation. You can state it in a very
complicated looking way. But it’s the following. Given a classifier, it’s just this expression which let me explain in
pictures what it does. So imagine this is my classifier
or the colors of the classify. So if I have a point
in this blue area, its classified as some class. If it’s in this mint-area, it’s classified some other
class and so on and so forth. So the original colors are
the original classifier. It’s just some classifier. So now, given this classifier, I can define for you it’s associated
self classifier as follows. Given a point, what I do, is I sample a Gaussian, a bunch of Gaussians. Then the likelihood of the blue class is the probability that the Gaussian lands within the blue boundary. The probability that it lands of the min-class is now the probability that this Gaussian centered at here ends in the mint-class,
and so on so forth. So now I get this probability
distribution over all my classes. Now again, my prediction
is the most likely class. The point is that this
can actually change. For instance, if I’m like right here, I’d be classified as mint
by the Bayes classifier, but if I do a Gaussian smoothing, I’m still going to more
likely to be blue. So this does change the classifier. So hopefully that
description makes sense. The point is that actually turns
out that as long as I have a good randomized smoothed classifier, it automatically has strong certifiable robustness. Here's a theorem from the Cohen paper; let me just parse it really quickly. It says: suppose I have a classifier g which is one of these smoothed classifiers, and I'm at a point x where the smoothed classifier is very confident about its prediction. So if A and B are the two most likely classes, and the probability of A is much bigger than the probability of B, then there's a large radius, depending on the gap between the probability of A and the probability of B, within which the classification cannot change. That's all the theorem says. What it really depends on is how large a gap I can get from my smoothed classifier: the larger the gap between p_A and p_B, the larger the radius that I know is robust around my point. Hopefully this makes sense. As a small technical caveat, I can't compute these probabilities exactly, but if I draw enough samples, then with extremely high probability, with failure probability like ten to the minus ten, I can get very good estimates of these quantities, and I can be very confident that my classifier is still robust. So the key is to somehow design a base classifier such that its associated randomized smoothed classifier has this big-gap behavior at most points I care about. So how do we train
these smoothed networks? How do you train the base classifier so that the resulting smoothed classifier is effective? The paper of Cohen et al. tried Gaussian data augmentation. Intuitively, this makes sense: the smoothed classifier is going to sample a bunch of Gaussian perturbations of my point and make predictions based on those, so maybe if the base classifier is correct at a bunch of Gaussian perturbations of my point, the method will also work; it'll at least get pretty good clean accuracy. This does work okay; they get pretty good numbers, and in fact, before our work, they were state of the art. But it turns out you can do much better by directly robustifying the smoothed network using adversarial training on the smoothed loss. So let me try to explain
what I mean by this. This is recent work which is to appear in NeurIPS. Recall that this is really the loss we care about, so let me break it down for a second. We're going to focus on the cross-entropy loss, because that's what people usually do in practice. Given a data point and label (x, y), we want to find a point x' in an l_2 ball around x that maximizes the loss of g, where g is the smoothed classifier. This will literally be an adversarial perturbation for the smoothed classifier directly. So in particular, if you can't find an x' with a large loss, then the smoothed classifier is going to be correct in this radius, and we're obviously also going to be robust. So this is the loss; you can write it down this way, just expanding out the definition of the cross-entropy loss and the definition of the smoothed classifier. We call this the smooth
adversarial objective. The point is that we're going to do adversarial training with this objective, meaning at every step we're going to find an adversarial perturbation. Recall what adversarial training does: given a data point, it finds some adversarial perturbation and trains at that perturbed point. We're going to do adversarial training that finds an adversarial perturbation of this objective at every step. This is very different from the Gaussian augmentation objective: this is what we do, and this is what Gaussian augmentation would be doing, if you work it out. It's a very subtle difference; the only difference is that the log is on the outside instead of on the inside. But qualitatively, these have very different interpretations, because ours says: find me an adversarial example of the true smoothed classifier that I care about, which is why we think this is the natural thing to do. The other objective means: find an adversarial example of the base classifier that is robust when I perturb it by Gaussian noise. Maybe that's a good property to have, but we don't care about it. We don't care if the adversarial example is robust to Gaussian noise; we care that if we apply the smoothed classifier at this point, it is in fact an adversarial example. If you do training with the other objective, you actually get much worse results. Here's a plot: these lines are what happens when you use the Gaussian augmentation objective, and for these lines up here, the higher they are, the stronger the attack. If you do stronger attacks, you get stronger defenses when you use adversarial training as well. Essentially, attacks that use our objective are much stronger than attacks that use the other one. Yeah.>>Excuse me. So assume
that [inaudible] isn’t the Gaussian Augmentation
on the right possibly finding the least strong
adversarial example?>>It’s just finding
an adversarial example that itself is very robust. But sir, I don’t care
about the robustness of the adversarial example itself, I care about the robustness of the smooth classifier
at this point. So that’s the moral difference. Yeah. So when you do
adversarial training with this, it turns out you can
substantially boost or get a much larger gap in the resulting certified correctness. So here are some plots. So
here are some results on CIFAR10. So there’s two lines here. The red line, this red curve, it’s however a certified, what code of all the previous
work could guarantee is robust for different
choices of the outer radius. So this is how much they could
actually certify this radius. The dotted line is how much the empirical attack
was able to achieve. So this is an upper bound on how
robust their model actually was. Similarly, for the blue line, for our side of the blue line, then you can see actually our certified accuracy is actually better than what they
can actually ever hope for. These are more detailed plots, but they show all of
the same picture, and it's a similar story for ImageNet: we do substantially better, roughly 10 percent or so, which is a very large boost. We can also combine this with other state-of-the-art techniques. There's a recent paper that showed that if you pre-train a model, somehow taking advantage of ImageNet, first training a model on ImageNet and then transferring it over to CIFAR-10, the resulting model has much better accuracy. If you do the same thing and combine it with our adversarial training, you also get large boosts. Here are the numbers, shown as a table. Without pre-training, ours is still a really large boost, but with pre-training, you get even larger gains, often like a 20 percent improvement on what was previously state of the art. So that's pretty much all I have. I want to say that in
machine learning and AI, we are increasingly using these systems for various sensitive tasks, like driving a car, these sorts of things, and I think we need to be increasingly aware of the security risks that they pose. These are also very interesting theoretical problems, there's very interesting math, but we also need to be aware that they potentially introduce additional weaknesses into any pipeline that might be security sensitive. While the things I described were mostly academic, the Tesla example and the glasses example are becoming increasingly real-world. As we try to understand these things from a more theoretical perspective, the insights we get often directly lead to better practical algorithms: the robust mean estimation work directly led to better outlier detection methods, better methods for recovering the shape of Europe from genetic data, and all these kinds of things. But there are still many exciting theoretical and applied questions to consider, and I should say that at Microsoft, we're constantly red-teaming our own systems to try to find and patch security vulnerabilities of this kind. That's pretty much all I had. Yeah.>>Have you seen substantial evidence that creating algorithms
which are robust to outliers or adversaries [inaudible] even though those don't exist?>>This is a tricky question. In the setting of adversarial examples, there have been recent studies showing that, how do I say this, there's natural distribution shift, say as I move from one image set to another, and there's adversarial distribution shift, and it seems that these two are very different; one does not necessarily help the other. It's a bit of a different issue with robust statistics, because there, there are direct statistical interpretations of all these things: if you're at total variation distance epsilon, then you still have these guarantees. So it really depends on the setting. Yeah. Yeah.>>Have you encountered any examples where adversarial training
data does damage to your model without it showing up in the covariance matrix?>>In some sense, we know that for specific things, that's impossible. At least for mean estimation, we know that if the corruption doesn't show up in the covariance matrix, it doesn't mess up the mean. Of course, what the adversary is actually corrupting might not be the mean, and then perhaps you need to look at some other statistic which would hopefully give you evidence that something is wrong. So it depends on your algorithm. For instance, for stochastic optimization, if you looked at covariances of something other than the gradients, you wouldn't find anything wrong. So you have to figure out exactly what the quantity is that you care about and then use that. Yeah.>>Any other questions?
Yeah. Thank you.
