Algorithmic Bias and Fairness: Crash Course AI #18

Hi, I’m Jabril and welcome back to
CrashCourse AI. Algorithms are just math and code, but algorithms
are created by people and use our data, so biases that exist in the real world are mimicked
or even exaggerated by AI systems. This idea is called algorithmic bias. Bias isn’t inherently a terrible thing. Our brains try to take shortcuts by finding
patterns in data. So if you’ve only seen small, tiny dogs,
you might see a Great Dane and be like “Whoa that dog is unnatural” This doesn’t become a problem unless we
don’t acknowledge exceptions to patterns or unless we start treating certain groups
of people unfairly. As a society, we have laws to prevent discrimination
based on certain “protected classes” (like gender, race, or age) for things like employment
or housing. So it’s important to be aware of the difference
between bias, which we all have, and discrimination, which we can prevent. And knowing about algorithmic bias can help
us steer clear of a future where AI systems are used in harmful, discriminatory ways.

INTRO

There are at least 5 types of algorithmic
bias we should pay attention to. First, training data can reflect hidden biases
in society. For example, if an AI was trained on recent
news articles or books, the word “nurse” is more likely to refer to a “woman,”
while the word “programmer” is more likely to refer to a “man.” And you can see this happening with a Google
image search: “nurse” shows mostly women, while “programmer” shows mostly
men. We can see how hidden biases in the data get
embedded in search engine AI. Of course, we know there are male nurses and
female programmers and non-binary people doing both of these jobs! For example, an image search for “programmer
1960” shows a LOT more women. But AI algorithms aren’t very good at recognizing
cultural biases that might change over time, and they could even be spreading hidden biases
to more human brains. It’s also tempting to think that if we just
don’t collect or use training data that categorizes protected classes like race or
gender, then our algorithms can’t possibly discriminate. But, protected classes may emerge as correlated
features, which are features that aren’t explicitly in data but may be unintentionally
correlated to a specific prediction. For example, because many places in the US
are still extremely segregated, zip code can be strongly correlated to race. A record of purchases can be strongly correlated
to gender. And a controversial 2017 paper showed that
sexual orientation is strongly correlated with characteristics of a social media profile
photo. Second, the training data may not have enough
examples of each class, which can affect the accuracy of predictions. For example, many facial recognition AI algorithms
are trained on data that includes way more examples of white people’s faces than those of people of other
races. One story that made the news a few years ago
is a passport photo checker with an AI system to warn if the person in the photo had blinked. But the system had a lot of trouble with photos
of people of Asian descent. Being asked to take a photo again and again
would be really frustrating if you’re just trying to renew your passport, which is already
sort of a pain! Or, let’s say, you got a cool gig programming
a drone for IBM… but it has trouble recognizing your face because your skin’s too dark…
for example. Third, it’s hard to quantify certain features
in training data. There are lots of things that are tough to
describe with numbers. Like can you really rate a sibling relationship
with a number? It’s complicated! You love them, but you hate how messy they
are, but you like cooking together, but you hate how your parents compare you… It’s so hard to quantify all that! In many cases, we try to build AI to evaluate
complicated qualities of data, but sometimes we have to settle for easily measurable shortcuts. One recent example is trying to use AI to
grade writing on standardized tests like SATs and GREs with the goal to save human graders
time. Good writing involves complex elements like
clarity, structure, and creativity, but most of these qualities are hard to measure. So, instead, these AIs focused on easier-to-measure
elements like sentence length, vocabulary, and grammar, which don’t fully represent
good writing… and that made these AIs easier to fool. Some students from MIT built a natural language
program to create essays that made NO sense, but were rated highly by these grading algorithms. These AIs could also potentially be fooled
by memorizing portions of “template” essays to influence the score, rather than actually
writing a response to the prompt, all because of the training data that was used for these
scoring AI. Fourth, the algorithm could influence the
data that it gets, creating a positive feedback loop. A positive feedback loop basically means “amplifying
what happened in the past”… whether or not this amplification is good. An example is PredPol’s drug crime prediction
algorithm, which has been in use since 2012 in many large cities including LA and Chicago. PredPol was trained on data that was heavily
biased by past housing segregation and past cases of police bias. So, it would more frequently send police to
certain neighborhoods where a lot of racial minority folks lived. Arrests in those neighborhoods increased,
that arrest data was fed back into the algorithm, and the AI would predict more future drug
arrests in those neighborhoods and send the police there again. Even though there might be crime in neighborhoods
where police weren’t being sent by this AI, because there weren’t any arrests in those
neighborhoods, data about them wasn’t fed back into the algorithm. Algorithms like PredPol are still in
use, but there is currently more effort to monitor and adjust how they process data
in order to manage these feedback effects. So basically, this would be like a new principal
who was hired to improve the average grades of a school, but he doesn’t really care
about the students who already have good grades. He creates a watchlist of students who have
really bad grades and checks up on them every week, and he ignores the students who keep
up with good grades. If any of the students on his watchlist don’t
do their homework that week, they get punished. But all of the students NOT on his watchlist
can slack on their homework, and get away with it based on “what happened in the past.” This is essentially what’s happening with
PredPol, and you can be the judge if you believe it’s fair or not. Finally, a group of people may mess with training
data on purpose. For example, in 2014, Microsoft released a
chatbot named Xiaoice in China. People could chat with Xiaoice so it would
learn how to speak naturally on a variety of topics from these conversations. It worked great, and Xiaoice had over 40 million
conversations with no incidents. In 2016, Microsoft tried the same thing in
the U.S. by releasing the Twitterbot Tay. Tay trained on direct conversation threads
on Twitter, and by playing games with users where they could get it to repeat what they
were saying. Within 12 hours of its release, after a “coordinated
attack by a subset of people” who biased its data set, Tay started posting violent,
sexist, anti-semitic, and racist Tweets. This kind of manipulation is usually framed
as “joking” or “trolling,” but the fact that AI can be manipulated means we should
take algorithmic predictions with a grain of salt. This is why I don’t leave John-Green-Bot
alone online… The common theme of algorithmic bias is that
AI systems are trying to make good predictions, but they make mistakes. Some of these mistakes may be harmless or
mildly inconvenient, but others may have significant consequences. To understand the key limitations of AI in
our current society, let’s go to the Thought Bubble. Let’s say there’s an AI system called
HireMe! that gives hiring recommendations to companies. HireMe! is being used by Robots Weekly, a magazine
where John-Green-Bot applied for an editorial job. Just by chance, the last two people named
“John” got fired from Robots Weekly and another three “Johns” didn’t make it
through the hiring process. So, when John-Green-Bot applies for the job,
HireMe! predicts that he’s only 24% likely to be employed by the company in 3 years. Seeing this prediction, the hiring manager
at Robots Weekly rejects John-Green-Bot, and this data gets added to the HireMe! AI system. John-Green-Bot is just another “John”
that got rejected, even though he may have been the perfect robot for the job! Now, future “Johns” have an even lower
chance to be hired. It’s a positive feedback loop, with some
pretty negative consequences for John-Green-Bot. Of course, being named “John” isn’t
a protected class, but this could apply to other groups of people. Plus, even though algorithms like HireMe! are great at establishing a link between two
kinds of data, they can’t always clarify why they’re making predictions. For example, HireMe! may find that higher
age is associated with lower knowledge of digital technologies, so the AI suggests hiring
younger applicants. Not only is this illegally discriminating
against the protected class of “age,” but the implied link also might not be true. John-Green-Bot may be almost 40, but he runs
a robot blog and is active in online communities like Nerdfighteria! So it’s up to humans interacting with AI
systems like HireMe! to pay attention to recommendations and make sure they’re fair, or adjust the
algorithms if not. Thanks, Thought Bubble! Monitoring AI for bias and discrimination
sounds like a huge responsibility, so how can we do it? The first step is just understanding that
algorithms will be biased. It’s important to be critical about AI recommendations,
instead of just accepting that “the computer said so.” This is why transparency in algorithms,
the ability to examine inputs and outputs to understand why an algorithm
is giving certain recommendations, is so important. But that’s easier said than done
when it comes to certain algorithms, like deep learning methods. Hidden layers can be tricky to interpret. Second, if we want to have less biased algorithms,
we may need more training data on protected classes like race, gender, or age. Looking at an algorithm’s recommendations
for protected classes may be a good way to check it for discrimination. This is kind of a double-edged sword, though. People who are part of protected classes may
(understandably) be worried about handing over personal information. It may feel like a violation of privacy, or
they might worry that algorithms will be misused to target rather than protect them. Even if you aren’t actively working on AI
systems, knowing about these algorithms and staying informed about artificial intelligence
are really important as we shape the future of this field. Anyone, including you, can advocate for more
careful, critical interpretation of algorithmic outputs to help protect human rights. Some people are even advocating that algorithms
should be clinically tested and scrutinized in the same way that medicines are. According to these opinions, we should know
if there are “side effects” before integrating AI in our daily lives. There’s nothing like that in the works yet. But it took over 2400 years for the Hippocratic
Oath to transform into current medical ethics guidelines. So it may take some time for us to come up
with the right set of practices. Next time, we have a lab and I’ll demonstrate
how there are biases in even simple things like trying to adopt a cat or a dog. I’ll see ya then. Speaking of understanding how bias and misinformation spread, you should check out this video on Deep Fakes I did with Above the Noise, another PBSDS channel that gets into the research behind controversial issues. Head over to the video in the description to find out how to detect deep fakes. Tell them Jabril sent you! Crash Course AI is
produced in association with PBS Digital Studios! If you want to help keep all Crash Course
free for everybody, forever, you can join our community on Patreon. And if you want to learn more about prejudice
and discrimination in humans, you can check out this episode of Crash Course Sociology.
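
The positive feedback loop described in the episode can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not how PredPol or any real system works: the neighborhood names, the starting counts, the arrest rate, and the function `simulate_feedback_loop` are all hypothetical. Both neighborhoods have the same underlying rate, but patrols always go wherever the most arrests have been recorded, and only a patrolled neighborhood produces new records.

```python
def simulate_feedback_loop(recorded, arrests_per_patrol, rounds, patrols_per_round=10):
    """Toy model: send every patrol to the neighborhood with the most recorded arrests.

    recorded: dict mapping neighborhood -> arrests recorded so far (hypothetical data)
    arrests_per_patrol: dict mapping neighborhood -> arrests each patrol produces
    """
    recorded = dict(recorded)  # copy so the caller's history is not modified
    for _ in range(rounds):
        # The "prediction" is simply the neighborhood with the most past records.
        target = max(recorded, key=recorded.get)
        # Only the patrolled neighborhood generates new arrest records.
        recorded[target] += patrols_per_round * arrests_per_patrol[target]
    return recorded


# Hypothetical starting point: A has more recorded arrests because of past bias,
# but both neighborhoods have the same underlying rate.
history = {"A": 60, "B": 40}
same_rate = {"A": 0.5, "B": 0.5}

after = simulate_feedback_loop(history, same_rate, rounds=20)

share_before = history["A"] / sum(history.values())
share_after = after["A"] / sum(after.values())
print(f"A's share of recorded arrests: {share_before:.0%} -> {share_after:.0%}")
```

With these made-up numbers, neighborhood A's share of recorded arrests grows from 60% to 80%, even though the underlying rates are identical. Neighborhood B never gets a patrol, so nothing about it is ever fed back into the data.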

69 thoughts on “Algorithmic Bias and Fairness: Crash Course AI #18”

  1. Omg, I'm like 3 min in and it's already dumb. So a Google search shows more pictures of women than men for nurses, and more men than women for programmers. What's the ratio of pictures featuring women to men for nursing, and men to women for programming? I'm guessing the same as Google Images shows.

  2. I don't get why people are upset when AI can't recognize their face. I'd be thrilled to not be recognized. They wouldn't be able to use recognition software on me.

  3. That's why all big companies that use AI should establish AI ethics boards – let's keep our AI fair and unprejudiced! 🙂

  4. I'll imagine at least a couple of people will be upset to hear that things like what data is put into their algorithm will bias the outcome which is a very easy concept to grasp, like the chemicals in a reaction will narrow down what products you possibly get or how different fuels to a fire can impact the heat generated by the fire – like I could only imagine disagreement there if getting biased results while claiming there was none would be the intention to begin with…

  5. Hi Jabril, I'm a machine learning (a.k.a AI) researcher with a PhD, and an upcoming book* in the field, I was super excited to see this series (as a long-time fan of CrashCourse from World History with John Green to Sociology with Nicole Sweeney ). Now I'm even more excited to see you tackle this very very hard question in my field, AI. Unsurprisingly, I see a setback in the comments saying that this is becoming a social justice and not a science channel. So let me address some of these concerns.
    1) Algorithmic fairness is a highly scientific, highly "technical" topic, involving the state of the art knowledge we have in statistics and computer science today.
    2) Historically, the very word "algorithm" originates from the 9th century book on "algebra" by al-Khwarizmi (latinised to Algorithmi), which itself is a book on *justice*, written by a lawyer for other lawyers so that they can also solve complex inheritance cases. More than half of that book uses the concepts of "judgement" and "computation" (Hissab: حساب) interchangeably.
    Just to say: the very birth of "algorithms" was about making better judgements. Today, as we are automating these judgements, we ought to come up with better scientifically robust notions of fairness, which is what a whole community of more specialised researchers than myself are doing.
    3) Some of the reactions I saw in the comments are not uncommon even in discussions with top scientists (when they are from other specialities and are not aware of the impressive research being done in algorithmic fairness).

    best of luck, and keep up the very good job you and the rest of the channel's team are doing.

    *: (French version already available, English version due for June 2020)

  6. I am glad I watched this video even though I already knew about algorithmic bias from news stories. I learned a few terms and about similar cases I had not heard before. Now the question is how much information did I actually retain.

  7. Thank you for including nonbinary people. The person I watched before this on a different channel still only said "he or she…"

  8. If you gave the algorithm more data for protected classes, wouldn't that just bias it towards them? It seems that any learning data would necessarily contain some kind of pre-selected bias to even make a choice.

  9. "Algorithms are unambiguous specifications for performing calculation, data processing, automated reasoning, and other tasks.", AI (neural networks) are not unambiguous and don't qualify as algorithms. In neural networks biases may emerge spontaneously regardless of the training data.

  10. I'm reminded of the resume-screening AI that taught itself that the best candidates were named Trevor and played high school lacrosse. Biases in culture introduce biases into data, which just replicates the bias.

  11. If cold, hard numbers give you an inconvenient result, perhaps YOU'RE the biased one. Science seeks out the truth, it isn't a tool to validate existing beliefs.

  12. Yeah when you complain about the AI making things "a little more difficult" or "frustrating" then you've really got nothing to complain about. So Google image shows pictures of nurses as women and programmers as men. More women are nurses and more programmers are men. Nobody is keeping anyone from being a programmer if they're female or being a nurse if they're male. I'm sorry that's just a non-issue. We don't need to try and ensure that every single vocation has a perfect balance of race and/or gender. All we need to do is make sure that nobody is barred from any career path based only on their gender or race. Thinking like this should just be called "too many straight white men over there" because that seems to be the only group anybody is interested in making sure there aren't too many of in a given area. This is just about never applied to any other group.

  13. Prioritizing resources to areas where statistically in the past there is more likely to be issues makes 100% perfect sense.

  14. And some are deliberately biased. This is how cultural manipulation through government black projects is done here: let private corporations censor unspecified classes and it's not illegal. Only governments can be called illegal for suppressing speech on line. Nice trick!

  15. An interesting paper published in the journal Psychological Science in 2018 looked at cross-cultural differences between international databases of achievement in STEM programs, and found that the lower the Gender Gap Index of a country, the more likely it is to have an "equal" ratio of women to men among STEM graduates of universities. That is to say, countries with a high Global Gender Gap Index like Finland, Norway, and Sweden tend to have significantly lower rates of women graduates of STEM programs (~20-25%), whereas some of the countries with the lowest Global Gender Gap Index like UAE, Turkey, and Algeria have some of the highest rates of women graduates of STEM programs (~36-41%).

    The paper is titled "The Gender-Equality Paradox in Science, Technology, Engineering, and Mathematics Education" DOI: 10.1177/0956797617741719

    In the Nursing and Programmer example, you mention data reflecting hidden biases in society, and certainly there must be some hidden biases that influence this population distribution. But it would be apt to also note that bias can exist in the way the data is presented, too. This bias is called "Algorithmic Fairness" and is used by Google, and ties in with data manipulation mentioned in section 5, though arguably it's not "malicious".

    At its core, algorithmic fairness manipulates data to over-represent groups. There are examples of this that anybody can test out, where results on image searches produce nearly 50:50 results between two subpopulations, despite their actual ratio in reality not being 50:50. This isn't malicious, but it could be harmful all the same. In an ideal world where there is true equality, we can pursue whatever career we want without worrying about the statistics of who makes up what job. And in the most "equal" societies, we find that there are fewer women in STEM. While using algorithmic fairness to show even pictures of men and women might make women in STEM feel better, it might also make women who don't wish to pursue STEM feel bad for not contributing to that equality. And this may sound silly, but we see the consequences of this in the "STEAM" movement where they try to include Arts into STEM to be more inclusive of women.

    Ultimately this boils down into the problem of equality of outcome, versus equality of opportunity. We know that in the most equal societies, they have close to equality of opportunity but not enough equality of outcome. And we know that in the least equal society, they have close to equality of outcome, but nowhere near enough equality of opportunity. The key question, then, is "Should algorithms reflect the actual data even if it's biased, or should algorithmic fairness be implemented to make up for biases hidden in society?" because there are arguments for both sides. If we truly believe that more equal societies are better, I think there's merit in accepting disproportionate gender representation.

  16. These features are going to come to pass, the only hope is in salvation by faith through Jesus Christ, to obtain the Holy Spirit which will lead you to a relationship with God

  17. During the debate that followed ProPublica's accusations of the COMPAS algorithm being discriminatory against black people, Kleinberg, Mullainathan and Raghavan showed that there are inherent trade-offs between different notions of fairness.

    In the case of COMPAS, for example, the algorithm was "well-calibrated within groups", which means that, independent of skin colour, a group of people classified as, say, 70% likely to reoffend actually contained 70% of people who would reoffend.

    However, ProPublica objected that the algorithm produced more false positive predictions for blacks (meaning that blacks were more often wrongly labeled as high risk) and more false negative predictions for whites (meaning that whites were more often wrongly labeled as low risk).

    In their paper, the authors showed that these notions of fairness, namely "calibration within groups", "balance for the negative class" and "balance for the positive class", are mathematically incompatible except in special cases. One can't have all of them at the same time.

    So yes, AI-systems will be biased, as insisted upon in the video. But it raises questions about what kind of fairness we want to be implemented and what we're willing to give up.
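
The trade-off described in the comment above can be checked with a small worked example. This is a sketch with invented numbers, not COMPAS data: the two groups, the bucket sizes, and the helper names `is_calibrated` and `error_rates` are assumptions made for illustration. Each group is scored 0.8 or 0.2, the scores are calibrated within both groups, and anyone scoring at least 0.5 is labeled high risk.

```python
from fractions import Fraction

# Each bucket: (risk score, number of people, number who actually reoffended).
# All numbers are invented for illustration.
groups = {
    "group_1": [(Fraction(8, 10), 100, 80), (Fraction(2, 10), 100, 20)],
    "group_2": [(Fraction(8, 10), 50, 40), (Fraction(2, 10), 150, 30)],
}


def is_calibrated(buckets):
    """True if, in every bucket, the share who reoffended equals the score."""
    return all(Fraction(pos, n) == score for score, n, pos in buckets)


def error_rates(buckets, threshold=Fraction(1, 2)):
    """Return (false positive rate, false negative rate) for a high-risk cutoff."""
    false_pos = sum(n - pos for score, n, pos in buckets if score >= threshold)
    false_neg = sum(pos for score, n, pos in buckets if score < threshold)
    negatives = sum(n - pos for _, n, pos in buckets)
    positives = sum(pos for _, n, pos in buckets)
    return Fraction(false_pos, negatives), Fraction(false_neg, positives)


for name, buckets in groups.items():
    fpr, fnr = error_rates(buckets)
    print(name, "calibrated:", is_calibrated(buckets), "FPR:", fpr, "FNR:", fnr)
```

Both groups pass the calibration check, yet the false positive rate is 1/5 for group_1 and 1/13 for group_2, and the false negative rate is 1/5 versus 3/7. The only difference between the groups is how many people fall into each score bucket, which is the base-rate difference the comment refers to.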

  18. Machine Learning creates Bias Machines; things capable of making a snap judgement based on minimal input. They're better at discovering patterns in empirical reality than humans are. If you don't like that, then just make an Expert System instead.

  19. Most nurses are female, and most programmers are male. So if you Google a nurse, you see a typical nurse (which is female). The example between minute one and two is not about our biases, its simply what is actually there.

  20. Did a short stint working on an algorithm that looked for potential pickpockets, trained on video of actual incidents that led to arrest.

    Was moved to another project after I kept bringing up the fact that the algorithm was biased as the data set was generally representative of a subset of pickpockets, the ones who get caught. My request for video of successful pickpockets that were not arrested to train the algorithm was not viewed favorably.

  21. Have you guys covered politics much? I know it can be touchy, but I'd like someone in a good position to do so, to explain the issues with things like the party system and gerrymandering, and what you can legally do to change it instead of letting things go until the levee breaks.

  22. "sexual orientation is strongly correlated with certain characteristics of a social media profile photo"
    which characteristics? how do i algorithmically optimize the gayness of my profile??

  23. The point of the Google image search example isn't to accuse Google of some grave injustice, it's just an easy to understand example of how just because a computer is generating it doesn't mean its output isn't biased. The society it's getting its data from is biased in favour of female nurses, so it will return mostly pictures of female nurses even when the user is just looking for "nurse" without specifying gender. Once you understand that, it's easy to understand how that can become a problem when the situation is more complicated, the stakes are higher, which is the whole point of the episode.

    Let's say there's 10 male nurses in the world and 90 female nurses. Out of those 100 nurses, one man and two women have committed the same misdemeanour on the job. Given that, would it be fair to make decisions on who to employ as a nurse based on the idea that 10% of men have committed this misdemeanour but only ~2% of women have? An AI trained with this data might. Worse yet, you don't even know it's doing this because its decision-making process is more or less a black box.
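
The arithmetic in the nurse example above can be made concrete. The sketch below is not from the comment: it swaps in a standard statistical tool, the Wilson score interval, to show how uncertain a rate based on one case out of ten really is. The function name `wilson_interval` and the choice of a 95% level (z = 1.96) are assumptions for illustration.

```python
import math


def wilson_interval(events, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion events/total."""
    p = events / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Counts from the comment: 1 of 10 male nurses, 2 of 90 female nurses.
men_low, men_high = wilson_interval(1, 10)
women_low, women_high = wilson_interval(2, 90)

# The two ranges overlap if each one starts before the other ends.
intervals_overlap = men_low <= women_high and women_low <= men_high

print(f"men:   {men_low:.1%} to {men_high:.1%}")
print(f"women: {women_low:.1%} to {women_high:.1%}")
print("overlap:", intervals_overlap)
```

Under these assumptions the plausible range for men runs from roughly 2% to 40%, while the range for women is roughly 1% to 8%, and the two overlap. A system that treats "10% versus 2%" as a solid fact is leaning on a single case.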


  25. A fundamentally dishonest video. No mention of the ideological bias that is almost unanimous in silicon valley, obviously is going to infect the algorithms. and is the prima facie cause of the discrimination so many conservatives have noticed in social media and on youtube right here. No mention of the obstinate denials of this obvious reality by the tech companies, rather than trying seriously to deal with it by hiring enough non-leftists who would be able to recognize it and help them police it. No mention of the refusal of these companies and their personnel to acknowledge their own bias, the first step toward policing. No mention of their firing of people who point out the problem. The failure here to mention this, on a supposedly scientific video here, is itself a confirmation of how serious the problem is.

  26. Are there really deep learning models that implement a person's name as a factor to extrapolate their personality traits or compatibility for a job? Are there any studies that show that a person's given name has a significant correlation to their personality?

  27. Should a program be faulted for showing mostly female nurses? 91% of nurses are female. Should it be faulted for recognizing more white people? The United States is 72% Caucasian. It seems silly that we try to tell computers lies, so that their results don’t hurt anyone’s feelers.

  28. Many people are missing the point to the Google analogy. AI hiring systems will learn associated characteristics of a nurse or programmer or what have you from similar datasets. That's not so much the problem- it's what happens next. It discriminates against people who don't meet the average characteristics. The AI system may throw out a resume for a nursing position that has the words "Boy Scout troop leader" because that's not something associated with the average nurse. It may throw out qualified programmer resumes from people who attended HBCUs, because most programmers haven't. If you don't quite get this, please look up the scrapped Amazon AI hiring program. It downgraded resumes from applicants who attended women's colleges.

  29. OK, I was kind of dreading this one because I expected a bunch of woke drivel – but I gotta be honest, you folks pretty much nailed it. This was informative, and probably as even-handed as Crash Course has ever been on such a sensitive topic. I am impressed.
