Thursday, July 25, 2024

Algorithms and Algorithm Bias: What Does Machine Learning Reveal About Us?

Algorithms dictate what we see online. But how much of them do we dictate? Learn about algorithm bias—and more importantly, how we can solve it.

With everything that has happened in 2020, it’s interesting to think about how two years ago, the biggest question on many people’s minds was “Who’ll be directing the third Guardians of the Galaxy film?”

Back in July 2018, Disney fired writer-director James Gunn from the franchise after a series of offensive tweets he posted from 2009 to 2010—years before he became well-known for crafting two of the films in the pop culture juggernaut known as the Marvel Cinematic Universe—were brought to light.

Screencap: Jack Posobiec

News outlets reported that the tweets, which were vulgar jokes revolving around paedophilia and other sensitive topics, were unearthed by right-wing activists following the director’s criticism of U.S. President Donald Trump. Gunn swiftly apologised for his old tweets, admitting that they were failed attempts at provocative humour; however, the damage had already been done. And while the director was eventually rehired by Disney eight months later, the whirlwind of debates and discussions during the heat of the controversy was, for lack of a better word, astounding.

Some argued that Gunn’s old tweets no longer represented the person he is today, while others called for him to be ‘cancelled’—a term that, nowadays, refers to a person (usually a celebrity) being shunned and having the spotlight taken from them as punishment for insensitive, insulting, or politically incorrect comments and/or behaviour. You’ve probably heard of so many other examples of this so-called ‘cancel culture’, especially involving Hollywood personalities. Johnny Depp, Kevin Hart, Kevin Spacey… the list goes on.

Here in the UK, the two most prominent examples of cancel culture in action are the ones that involve the famous author of the Harry Potter books, J.K. Rowling, because of her views on transgender people, and the actor and singer Laurence Fox, who clashed with a BBC ‘Question Time’ audience member on whether or not the British press was being “racist” towards Meghan Markle.

Both have since endured much trolling on Twitter for their views and have essentially been ‘cancelled’. Staff at Hachette UK, the publisher of J.K. Rowling’s forthcoming children’s book The Ickabog, have told management that they were “no longer prepared to work on the book.” As for Laurence Fox, both he and the audience member he clashed with have received death threats.

Now, I’m not here to talk about the morality of cancel culture, or whether or not these people actually deserved to be ‘cancelled’.

What I’d really like us to sit down and think about, though, is just how quickly these discussions tend to blow up. Of course, star power’s a factor here: The names I mentioned are (or were) all A-list celebrities. But what about the many, many other cases of relative unknowns and private citizens who became the targets of cancel culture, like Justine Sacco, the author of that infamous Africa AIDS tweet from 2013?

The thing is, controversy and negativity tend to spread across the internet like wildfire, and it’s not just because people like talking about these things. The reason behind this is the same mechanism that enables eBay, Amazon, Netflix, and Facebook to show us ads and content that we’re likely to pay attention to: algorithms.

Algorithms: The formula for digital success

Most of us are familiar with the word ‘algorithm’ from our mathematics classes: It’s a set of steps that help us compute and solve problems. In computer programming terms, the definition is similar: It’s a well-defined sequence of instructions that enables the processing of information or execution of a task.

Think of an algorithm as the instruction manual for putting together that ready-to-assemble table you just bought, or a process for getting a taxi, which might look a little bit like this (from HowStuffWorks):

The taxi algorithm:

  1. Go to the taxi stand.
  2. Get in a taxi.
  3. Give the driver my address.
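To make the link to programming concrete, here is a minimal Python sketch of the same three steps. The function name `take_taxi` and the sample address are my own placeholders, not anything from HowStuffWorks; the point is only that an algorithm is an ordered list of instructions that turns an input (my address) into a result.

```python
def take_taxi(address):
    """Return the ordered steps for getting home by taxi."""
    steps = [
        "Go to the taxi stand.",
        "Get in a taxi.",
        f"Give the driver my address: {address}.",
    ]
    return steps


# Walk through the steps in order, exactly as written.
for number, step in enumerate(take_taxi("221B Baker Street"), start=1):
    print(f"{number}. {step}")
```

Change the order of the steps and the 'program' stops working, which is exactly why the sequence matters as much as the steps themselves.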

In the same way, technology follows a set of instructions so that it can carry out specific tasks or make the overall experience better suited to the user’s needs and preferences.

Digital platforms are designed to keep you engaged through their respective algorithms, which is why Netflix knows which shows to recommend, Facebook displays ads aligned with your interests, and Google gives you what it deems to be the most relevant results for your search terms.

Here’s the thing: Whenever you use an app on your phone, click an ad as you’re using your laptop or even try the first episode of a show on your preferred streaming service, your choices reveal a little bit about you. And each time you perform any of these actions, the companies that own and run these programs get a clearer idea of who you are and what you like.

When you listen to certain types of music on YouTube or Spotify, you give hints as to what your true age is; when you enter the name of your alma mater, it may point to where you live or used to live. The same goes for when you enter your address and credit card details when you make a purchase: That data allows companies to get to know you better.

Some algorithms use all of these personal data to deliver tailored results and content. Others are based on tried and tested trends and information about what works and what doesn’t.
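As a rough illustration of how choices turn into tailored results, here is a toy recommender in Python. Everything in it is an assumption for the sake of the example: the catalogue, the titles, and the rule 'suggest more of the genre you watch most'. Real platforms are far more sophisticated, and this is not how any particular service works.

```python
from collections import Counter

# Hypothetical catalogue: title -> genre (made-up data for illustration).
CATALOGUE = {
    "Space Raiders": "sci-fi",
    "Star Drift": "sci-fi",
    "Galaxy Patrol": "sci-fi",
    "Baking Duel": "cooking",
    "Quiet Harbour": "drama",
}


def recommend(watched):
    """Suggest unwatched titles from the genre the user watches most."""
    genre_counts = Counter(CATALOGUE[title] for title in watched)
    if not genre_counts:
        return []  # no history yet, so nothing to infer
    favourite_genre, _ = genre_counts.most_common(1)[0]
    return sorted(
        title
        for title, genre in CATALOGUE.items()
        if genre == favourite_genre and title not in watched
    )


print(recommend(["Space Raiders", "Star Drift", "Baking Duel"]))
```

With two sci-fi titles and one cooking show in the history, the only suggestion is the remaining sci-fi title. Every extra click sharpens the guess a little more.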

So wait—how does this tie into cancel culture?

Well, as I mentioned earlier, the algorithms of digital platforms are designed to keep you engaged for as long as possible. With so much competing for attention on the World Wide Web, these algorithms favour whatever generates tremendous engagement in the blink of an eye, and nothing sparks engagement faster than Internet rage.

As a result, a handful of people posting about their anger or disgust towards a certain individual or organisation can easily reach a wider network, often those of the same mindset or predisposition. As they get more attention and engagement, the discussion becomes even more massive. Pretty soon, everyone will be talking about it, regardless of whether the full context of the original posts gets carried over or not—and as more people get outraged, cancellation becomes more and more imminent.
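Here is a small Python sketch of that snowball effect. The posts, the interaction counts, and the scoring weights are all invented; I am simply assuming a feed that sorts posts by a weighted count of likes, shares, and replies, which is a simplification rather than any platform's actual formula.

```python
# Hypothetical posts with made-up interaction counts.
posts = [
    {"text": "Lovely sunset today", "likes": 40, "shares": 2, "replies": 3},
    {"text": "Full context of what was actually said", "likes": 25, "shares": 5, "replies": 4},
    {"text": "Can you BELIEVE what this person posted?!", "likes": 30, "shares": 60, "replies": 90},
]


def engagement_score(post):
    # Assumed weights: shares and replies count for more than likes,
    # because they push a post in front of new audiences.
    return post["likes"] + 3 * post["shares"] + 2 * post["replies"]


def rank_feed(posts):
    """Order the feed so the most-engaged posts appear first."""
    return sorted(posts, key=engagement_score, reverse=True)


for post in rank_feed(posts):
    print(engagement_score(post), post["text"])
```

The angry post lands at the top and the post carrying the full context lands at the bottom. Nothing in the scoring function knows or cares which of them is accurate.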

Now, that doesn’t sound very fair, does it? Sadly, that’s just one manifestation of algorithm bias, a problem that affects us on a level far deeper than we may realise.

The issue of bias in algorithms

In a nutshell, algorithm bias happens when a computer program produces output that shows preference towards or excludes a group of users, based on a particular categorical distinction. Unfortunately, the most obvious manifestations of this in machine learning tend to be discriminatory.

Simply said, if algorithms were people, they would be subject to the same scrutiny people come under. For example, if someone demonstrated that they could be racist, sexist and discriminatory towards others, wouldn’t you say something or do something about it?

Let’s look deeper into how algorithms can be just as flawed as human beings, and at the impact that flawed or imperfect algorithms (better known as algorithm bias) can have on people’s lives.

Can algorithms be ‘racist’?

I read a study published in 2019 that closely examined an algorithm widely used to allocate health care in the United States. To optimise the process of identifying which patients need help the most, health care systems take advantage of carefully constructed algorithms that read mountains of data from patients across the country. This particular algorithm assessed patient priority based on how much it would cost a healthcare provider to treat them.

While this works on paper, it doesn’t take into account the fact that black patients don’t have the same level of access to health care as white patients, which means less money is spent on them. Thus, the algorithm rated black patients as lower risk than white patients; in reality, though, fixing the algorithm bias would mean that more than twice as many black patients would be able to get much-needed assistance.
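A tiny Python sketch shows why using cost as a stand-in for need goes wrong. The two patients and their numbers are entirely hypothetical, and `risk_by_cost` is my own simplification of 'priority based on cost', not the study's actual model.

```python
# Two hypothetical patients who are equally sick (same number of chronic
# conditions) but have had different amounts spent on their care.
patients = [
    {"name": "Patient A", "chronic_conditions": 4, "past_cost": 12000},
    {"name": "Patient B", "chronic_conditions": 4, "past_cost": 7000},
]


def risk_by_cost(patient):
    # Assumed scoring rule: higher past spending means higher priority.
    return patient["past_cost"]


ranked = sorted(patients, key=risk_by_cost, reverse=True)
for patient in ranked:
    print(patient["name"], risk_by_cost(patient))
```

Patient A is ranked first even though both patients have the same health needs. The score is measuring how much care someone received, not how much care they need.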

Another example was shared by MIT grad student Joy Buolamwini in an amazing TED Talk. In her talk, Joy recounted an instance in which facial analysis software couldn’t detect her face. She realised that the people who wrote the algorithm failed to account for a wider range of facial structures and skin tones, thereby creating a bias towards faces and skin tones belonging to a certain group of people—or as she called it, ‘coded gaze’.

This doesn’t just affect software. Algorithms are also present in hardware. Take for example when Nikon cameras equipped with a blink detection feature wouldn’t snap photos of many of its Asian users because the software ‘thought’ their eyes were never open.

Do you own an Alexa? Then you might know about this next problem. Amazon Alexa often struggles to recognise different accents. Those with British or American English accents are the most easily understood by Alexa, while others find themselves having to repeat themselves over and over again.

And even worse, in 2015, Google’s photo-recognition tool mistakenly tagged a photo of two black people as gorillas.

Algorithm bias doesn’t just extend to incidental racism; as it turns out, algorithms can lead to sex- and gender-related discrimination as well.

Can algorithms be ‘sexist’?

Back in 2015, Amazon was testing an artificial intelligence (AI) system that it had put in place to help screen job applicants. The problem? The algorithm was designed to identify certain keywords on applicants’ resumes, which resulted in the unintentional “weeding out” of female applicants.

Even Google Translate is not immune to being a little bit sexist. It has been found to insistently associate certain jobs with certain genders when translating sentences with gender-neutral pronouns from languages like Turkish, Finnish and Chinese.

Can algorithms ‘out’ someone?

Meanwhile, a 2017 study discovered that Facebook’s algorithm had classified certain users as gay, based entirely on their post-liking behaviour. This was problematic because not all of the people on that list openly identified themselves as gay, yet the algorithm still classified them as such.

Can algorithms be discriminatory?

Algorithm bias can even result in issues with policy and law enforcement. For example, in 2017, Chicago declared that it would use ‘predictive policing’ to determine which areas were more likely to experience violent crime (and therefore would require more officers). However, the model ended up pointing to places with an already sizable police presence as areas that needed even more enforcers; instead of addressing existing gaps, it only served to reinforce existing biases.

In the US, there is also a crime-predicting algorithm that wrongly labelled black people as re-offenders—at nearly twice the rate of white people.
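One way to check for this kind of disparity is to compare false positive rates: among people who did not re-offend, how many were flagged anyway? The Python sketch below uses records I made up purely for illustration, with neutral group labels; it is not data from any real system.

```python
# Hypothetical records: (group, flagged_as_likely_reoffender, actually_reoffended).
records = [
    ("group_1", True, False),
    ("group_1", True, False),
    ("group_1", False, False),
    ("group_1", False, False),
    ("group_1", False, False),
    ("group_1", True, True),
    ("group_2", True, False),
    ("group_2", False, False),
    ("group_2", False, False),
    ("group_2", False, False),
    ("group_2", False, False),
    ("group_2", True, True),
]


def false_positive_rate(records, group):
    """Share of people in `group` who did NOT re-offend but were flagged anyway."""
    flags = [
        flagged
        for g, flagged, reoffended in records
        if g == group and not reoffended
    ]
    return sum(flags) / len(flags)


for group in ("group_1", "group_2"):
    print(group, false_positive_rate(records, group))
```

In this invented data set, group_1 is wrongly flagged at a rate of 0.4 and group_2 at 0.2. An audit like this needs the real outcomes as well as the predictions, which is one reason such problems can go unnoticed for years.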

Are you beginning to see a pattern emerging here?

So why does algorithm bias happen? Who should we blame? Is it the data, the programmers, or something else entirely?

Why algorithm bias happens

A common narrative is that algorithm bias happens when the wrong data (‘bad data’) are used. However, data can’t be inherently bad or discriminatory. Data can’t have existing prejudices, conscious or unconscious. Data can’t have preferences. Data can only be data, plain and simple.

In reality, it’s in the way algorithms interpret data that algorithm bias rears its ugly head. This becomes especially apparent when algorithms are designed to profile people based on correlations instead of actual, individually gathered information.

That’s not to say that people who write algorithms are out to discriminate against certain groups of people, though. In fact, it’s safe to say that programmers and software engineers go to great lengths to make sure that their algorithms work fine for everyone.

The problem, however, is that we humans tend to be blind to our own biases and existing notions, what we might call ‘blind spots’. That fact quickly becomes apparent when we design algorithms that end up being unintentionally biased themselves.

In an interview with Forbes Magazine, Corey White, Senior Vice President of technology strategy firm Future Point of View, enumerated three types of biases that are commonly found in datasets. According to White, these may clue us in as to how and why algorithm bias occurs.

The first type is interaction bias

In the example I mentioned earlier, that facial recognition software was unable to recognise non-Caucasian faces because it had been trained to recognise Caucasian ones, instead of a wider range of faces and races. This can create a slew of problems for people who don’t fit the profile of the average white person, from airport inconveniences to wrongful arrests.

This was also the issue in the Nikon blink detection feature I mentioned earlier. The Nikon camera’s AI (artificial intelligence) was taught to detect ‘open eyes’ using Caucasians as its main data set. Because of this ‘interaction bias’, it’s easy to see how the ‘blink detection’ would be triggered when an Asian person steps into frame.

This is an example of the algorithms in question not being trained with enough relevant data to produce correct results.
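To see how narrow training data produces this effect, consider the deliberately simplified Python sketch below. It assumes a detector that looks at a single numeric signal and learns one cut-off value; the readings for both groups are invented, and real detection systems are far more complex than a single threshold.

```python
# Hypothetical readings: (signal, label). label True means "should be detected".
group_a = [(0.80, True), (0.90, True), (0.85, True),
           (0.20, False), (0.10, False), (0.15, False)]
group_b = [(0.45, True), (0.40, True), (0.60, True),
           (0.05, False), (0.10, False)]


def mean(values):
    return sum(values) / len(values)


def fit_threshold(samples):
    """Place the cut-off halfway between the average positive and negative signal."""
    positives = [signal for signal, label in samples if label]
    negatives = [signal for signal, label in samples if not label]
    return (mean(positives) + mean(negatives)) / 2


def accuracy(samples, threshold):
    """Fraction of samples where 'signal above threshold' matches the true label."""
    correct = sum(1 for signal, label in samples if (signal > threshold) == label)
    return correct / len(samples)


narrow = fit_threshold(group_a)            # trained on group A only
broad = fit_threshold(group_a + group_b)   # trained on both groups

print("Trained on A only:", accuracy(group_a, narrow), accuracy(group_b, narrow))
print("Trained on A and B:", accuracy(group_a, broad), accuracy(group_b, broad))
```

The cut-off learned from group A alone works perfectly for group A and misses two of the three positive cases in group B. Adding group B to the training data moves the cut-off and, in this toy case, fixes the problem without hurting group A.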

The second type is latent bias

Remember that study about the health care algorithm? The algorithm used historical data, but read it the wrong way. That’s why it concluded that black patients were low-risk, without taking into consideration their access to proper health care.

The third type is selection bias

This type of bias reflects an imbalance in available data for one group compared to another. According to White, an example of this is when search engines and search algorithms developed by Western coders are more likely to recognise and classify people, objects, and events in Western settings because those are the settings that the coders have more data on.

It’s not enough to know and acknowledge that the problem of algorithm bias exists. The more important question now is: How do we fix it?

Introducing the green-haired algorithm bias warrior princess, Cathy O’Neil

I also want to take a moment to introduce Cathy O’Neil — someone who I follow and turn to when enlightening myself on this topic. She is the Math Babe and author of the book “Weapons of Math Destruction”, who is fighting the good fight against algorithm bias and ending the era of blind faith in big data.

I love how she broke down how algorithms and algorithm bias work in this cute RSA video.

Cathy explains that to build an algorithm, we need two things:

  1. A historical data set
  2. A definition of success

She gives a great example of how exactly an algorithm is built by illustrating how she would build an algorithm for cooking dinner for her family. The two things she needs to execute this task are:

  1. A historical data set: The data she uses on a daily basis is the ingredients in her kitchen.
  2. A definition of success: As she is the one in charge of cooking and building the meals, she has the power to define if the meal was a success or not. So she defines the meal as successful if her kids eat the vegetables.

Now I want you to imagine a different scenario. What if Cathy’s kids were to be the ones in charge of building this algorithm? They might be using the same historical data set but do you think the definition of success will be the same?

Hell no! Right?

Kids being kids, they would most likely call the meal a success if it was tasty, and would probably start with dessert instead of ending with it.
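The dinner example translates neatly into a few lines of Python. The meals and their scores below are numbers I made up, and `parent_success` and `kid_success` are hypothetical names for the two definitions of success; the data set stays the same and only the definition changes.

```python
# Hypothetical meals with invented scores: the shared "historical data set".
meals = [
    {"name": "Vegetable stir-fry", "veg_portions": 3, "tastiness": 6},
    {"name": "Mac and cheese", "veg_portions": 0, "tastiness": 9},
    {"name": "Ice cream sundae", "veg_portions": 0, "tastiness": 10},
]


def parent_success(meal):
    # The parent's definition: the more vegetables eaten, the better.
    return meal["veg_portions"]


def kid_success(meal):
    # The kids' definition: the tastier, the better.
    return meal["tastiness"]


def best_meal(meals, success):
    """Pick the meal that scores highest under the given definition of success."""
    return max(meals, key=success)


print("Parent picks:", best_meal(meals, parent_success)["name"])
print("Kids pick:", best_meal(meals, kid_success)["name"])
```

The same code and the same data produce stir-fry for the parent and an ice cream sundae for the kids. Whoever writes the success function decides the outcome.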

This goes to show that when we create algorithms, we embed our values in them. Cathy says…

“So when people tell you algorithms make things objective, you say no. Algorithms make things work for the builder of the algorithms. In general, we have a situation where algorithms are extremely powerful in our daily lives but there is a barrier between us and the people building them. And those people are typically coming from a kind of homogeneous group of people who have their particular incentives — if it’s in a corporate setting, it’s usually profit and not usually a question of fairness for the people who are subject to their algorithms. 

So we always have to penetrate this fortress. We have to be able to question the algorithm themselves. Especially when it’s very important to us. We have to inject ethics into the process of building algorithms, and that starts with data scientists — agreeing and signing a hippocratic oath of modelling. We have to stop blindly trusting algorithms to be fair. They are not inherently fair. Start looking into what they are actually doing.” 

Fixing algorithm bias

The first step towards fixing algorithm bias is to recognise that there probably won’t be a truly unbiased algorithm.

Algorithms will always make predictions based on generalised statistics; it’s simply not efficient (or even possible) for an algorithm to take each and every person’s unique characteristics and circumstances into consideration.

However, by designing these algorithms properly, we can significantly minimise the amount of bias inherently found in them.

Going back to the 2019 health care study, the researchers recommended a simple fix. Instead of using a different set of data, designing the algorithm to look at the same data differently can yield greatly reduced bias. When the researchers taught the algorithm to look at specific costs (e.g. how many times patients were sent to the emergency room), the disparity was significantly decreased. They also advised shifting the focus of the algorithm from predicting costs to predicting actual health outcomes.
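Here is a rough Python sketch of what 'same data, different target' can look like. The six patients, their groups, and their numbers are invented, and I am using the number of chronic conditions as an assumed stand-in for 'actual health outcomes'; the researchers' real analysis was far more detailed than this.

```python
# Hypothetical patients. "under_served" patients have had less spent on them
# for the same level of illness.
patients = [
    {"group": "well_served", "chronic_conditions": 5, "past_cost": 9000},
    {"group": "well_served", "chronic_conditions": 2, "past_cost": 8000},
    {"group": "well_served", "chronic_conditions": 1, "past_cost": 7000},
    {"group": "under_served", "chronic_conditions": 5, "past_cost": 6000},
    {"group": "under_served", "chronic_conditions": 4, "past_cost": 5000},
    {"group": "under_served", "chronic_conditions": 1, "past_cost": 2000},
]


def select_for_help(patients, target, places=3):
    """Pick the `places` patients who score highest on the chosen target."""
    return sorted(patients, key=lambda p: p[target], reverse=True)[:places]


def count_under_served(selected):
    return sum(1 for p in selected if p["group"] == "under_served")


by_cost = select_for_help(patients, "past_cost")
by_health = select_for_help(patients, "chronic_conditions")

print("Under-served patients selected by cost:", count_under_served(by_cost))
print("Under-served patients selected by health:", count_under_served(by_health))
```

Ranking by cost fills all three places with well-served patients; ranking by the health measure gives two of the three places to under-served patients. No new data was collected, only the target changed.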

It also helps to remember that algorithm bias is a result of human bias, whether intentional or not. This means that the people writing these algorithms must learn to be conscious of their own biases, and to understand what bias means in the first place.

Algorithm bias is much easier to fix than human bias. Fixing algorithm bias can involve something as simple as using a wider range of samples to train facial recognition software or rethinking which keywords should be flagged for job recruitment purposes. Either way, it highlights one simple truth: Human intelligence and empathy must always be factored into these automated solutions.

If you’d like to have a chat with me over a Zoom coffee about algorithm bias or other topics surrounding how data and technology introduce challenges and opportunities to society, hop on to this link to suggest a time and day to book a call with me.