Machine learning is now used to decide everything from stock prices, advertising, and marketing to medical diagnoses, so it has never been more important to examine the decision-making process of these models and algorithms. Unfortunately, a good portion of currently deployed machine learning systems are prejudiced in one way or another: sexist, ageist, racist, you name it.

We are often quick to say that the way to make these machine learning models less biased is simply to come up with better algorithms. However, algorithms are only as good as the data fed to them, and training on non-prejudiced data can make a big difference.

For instance,

1. In 2015, Google’s photo application embarrassingly labeled some African-American individuals as gorillas [1].

2. Even more disturbing was an investigative report a year ago by ProPublica [2], which found that software used to predict future criminal behavior, à la the film “Minority Report”, was biased against minorities.

3. Anastasia Georgievskaya, a research scientist with Youth Laboratories [3], first experienced prejudice from machine learning while working on an AI-judged beauty contest application [4], which uses computer vision and machine learning to study aging; almost all of the winners picked by the ML models were white.

In the end, algorithms can always be improved; however, ML systems can only learn from the data they are given.

By far the largest misconception is that more data is always better. When it comes to population datasets, however, that does not necessarily help: drawing from the exact same population often leaves the same subgroups under-represented. Even ImageNet [4], one of the most popular image databases, with its gigantic image dataset, has been shown to be biased towards the Northern Hemisphere [5].
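One simple audit is to count how each subgroup is represented before training. A minimal sketch in Python, assuming hypothetical region labels attached to each sample (in a real audit the labels would come from the dataset’s own metadata):

```python
from collections import Counter

# Hypothetical labeled samples: (sample_id, region) pairs.
samples = [
    ("img1", "Northern Hemisphere"), ("img2", "Northern Hemisphere"),
    ("img3", "Northern Hemisphere"), ("img4", "Southern Hemisphere"),
]

# Count how often each subgroup appears and report its share of the data.
counts = Counter(region for _, region in samples)
total = sum(counts.values())
for region, n in counts.items():
    print(f"{region}: {n / total:.0%}")
```

A heavily skewed share like the one above is a warning sign that simply collecting more of the same data will not fix under-representation.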

Human Biases That Can Result in ML Biases

1. Reporting Bias/Sample Bias: Reporting bias occurs when the frequency of events, properties, and outcomes in a dataset does not accurately reflect their real-world frequency. This bias can arise because people tend to document circumstances that are unusual or especially memorable. It is best avoided by using large datasets with an even distribution.

2. Prejudice Bias: This occurs when the training data is influenced by stereotypes, cultural or otherwise. The training data should contain a balanced mix of all kinds of examples, and the decisions it encodes should not reflect social stereotypes. For example, this bias can be avoided by not encoding assumptions about occupations with regard to gender.

3. Measurement Bias: This kind of bias skews the data in a particular direction, resulting in systematic value distortion, and is attributed to a fault in the device used to measure. For example, an algorithm might be trained on image data that fails to represent the environment the system will operate in. This kind of bias can’t be avoided simply by collecting more data; it’s best avoided by using multiple measuring devices, along with humans who are trained to compare their output.

4. Automation Bias: Automation bias is the tendency to favor results generated by automated systems over those generated by non-automated systems, without taking into account the error rates of either.

5. Group Attribution Bias: This bias generalizes an attribute of some individuals to their entire group. For example, if the majority of women in a dataset work in design and the majority of men in hardware, the model tends to assume those respective professions for everyone of each gender.

6. Algorithm Bias: In machine learning, bias is also a mathematical property of an algorithm, whose counterpart in this context is variance. ML algorithms with high variance can easily fit the training data and welcome complexity, but are sensitive to noise. High-bias models are more rigid and not greatly affected by variations in the data.
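The bias/variance trade-off in the last item can be illustrated with a toy experiment: fitting the same noisy data with a rigid low-degree polynomial (high bias) and a flexible high-degree one (high variance). A sketch using NumPy, with the data and degrees chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
# Noisy samples of a sine curve stand in for real training data.
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

train_mse = {}
for degree in (1, 9):
    # Degree 1 is a high-bias (rigid) model; degree 9 is a high-variance
    # (flexible) model that can chase the noise in the training set.
    coeffs = np.polyfit(x, y, degree)
    train_mse[degree] = float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    print(f"degree {degree}: training MSE = {train_mse[degree]:.3f}")
```

The flexible model achieves the lower training error, but that is exactly the sensitivity to noise described above: its error on fresh data would typically be worse.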

It is important to continue building awareness around issues of bias in machine learning and to invest heavily in scrubbing discrimination out of the datasets currently being used. Technologies and algorithms have become essential and surround us in every part of our lives. It is imperative to design machine learning systems that treat us fairly if we are to live without prejudice in the future.

What Can Be Done To Prevent Biases:

There are signs of self-correction in the AI industry: researchers are looking at ways to reduce bias and strengthen ethics in rule-based artificial systems by taking human biases into account. Some good practices to follow:

1. Choose the right learning model for the problem.

There’s a reason every AI model is unique: each problem requires a different solution and provides different data resources. There’s no single model to follow that will avoid bias, but there are parameters that can inform your team as the model is being built.

2. Choose a representative training data set.

Making sure the training data is diverse and includes different groups is essential, but segmentation in the model can be problematic unless the real data is similarly segmented.

3. Monitor performance using real data.

It’s unwise, for example, to use test groups on algorithms already in production. Instead, run your statistical methods against real data whenever possible.
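Monitoring on real data can be as simple as logging each prediction alongside the eventual outcome and a group label, then comparing accuracy across groups. A minimal sketch with a hypothetical production log:

```python
from collections import defaultdict

# Hypothetical production log: (group, prediction, actual outcome).
log = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 0, 1),
]

# Tally correct predictions per group.
hits = defaultdict(int)
totals = defaultdict(int)
for group, pred, actual in log:
    totals[group] += 1
    hits[group] += int(pred == actual)

accuracy = {g: hits[g] / totals[g] for g in totals}
for g, acc in sorted(accuracy.items()):
    print(f"{g}: accuracy {acc:.0%}")
```

A persistent gap between groups in a report like this is the kind of signal that test groups run before deployment routinely miss.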

Biases in ML algorithms arise largely because the training data is biased. Humans naturally tend to be at least slightly biased, and that is oftentimes reflected in these ML systems. Some consequences of these biased algorithms may seem hypothetical and indirect, but others are direct and immediate and can affect people's lives.

References:

11. Why Is My Classifier Discriminatory? Irene Chen, Fredrik D. Johansson, David Sontag, Massachusetts Institute of Technology.

12. A Deeper Look at Dataset Bias, Springer.

[Mustafa Ali]