Last time, we continued our “nuts and bolts” series on artificial intelligence (AI) for legal professionals with a look at 18 important AI terms, using generative AI (GPT-4) to define them.
One of the biggest barriers to using AI successfully is bias, which is one of the terms we defined last time, as follows:
Bias, in a general context, refers to a predisposition or inclination towards a particular viewpoint, often at the expense of alternative perspectives. In the realm of machine learning and artificial intelligence, bias denotes systematic and unfair discrimination in model outputs, often stemming from non-representative training data, flawed algorithms, or subjective human decisions during model design. Such biases can lead to skewed results, perpetuating stereotypes or inaccuracies, and thereby affecting the fairness and trustworthiness of AI systems.
In this post, we’ll provide examples of how bias can influence AI results and how AI algorithms are used, define sources of bias that can be associated with AI, and discuss mechanisms for legal professionals to address the challenge of bias in AI algorithms.
Examples of How Bias Can Impact AI
Here’s one high-profile example of what bias can do to an algorithm. In March 2016, Microsoft released “Tay,” an AI-powered “social chatbot.” Like the automated, text-based chatbots we see on numerous e-commerce and customer service sites, Tay could answer written questions. Microsoft unleashed Tay on Twitter to engage with the masses. Tay was designed to engage people in dialogue through tweets or direct messages, while emulating the style and slang of a teenage girl.
The plan was to release Tay online, then let the bot discover patterns of language through its interactions, which “she” would emulate in subsequent conversations. Eventually, her programmers hoped, Tay would sound just like the internet. At first, Tay engaged harmlessly with her growing number of followers with banter and lame jokes. Tay said things like “humans are super cool” and “why isn’t national puppy day every day?”
However, after interacting with the Twitter masses and being hammered by members of a troll-laden bulletin board, Tay went from harmless banter to hating feminists and denying the Holocaust. All of that happened within just 16 hours, before Microsoft pulled the plug on Tay. The warped nature of many Twitter trolls quite literally taught Tay to be a bigot. That’s an example of how bias can influence an algorithm.
Another example of an AI algorithm where indications of bias occurred was the risk-assessment software COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), which was used to make sentencing recommendations even though it wasn’t designed for that purpose – it was originally designed to provide insight into the types of treatment (e.g., drug or mental health treatment) an offender might need. A ProPublica article reported that COMPAS was nearly twice as likely to incorrectly classify Black defendants as high-risk, while it incorrectly classified white defendants as low-risk more often. Similar concerns of racial bias have been expressed regarding facial recognition, including in this case.
Sources of AI Bias
AI algorithms can exhibit various types of biases, often reflecting the data they’re trained on or the methods used in their design. Bias can also be introduced when humans evaluate the results of the algorithm. Here are three sources of AI bias:

One source is data bias. If the result from an AI algorithm is skewed, a common reason is that the data used to train the algorithm is biased. The Microsoft Tay example above illustrates how quickly a set of inputs (data) can change how an algorithm performs. There are three types of data bias:
- Sampling Bias: Occurs when the training data is not representative of the population it’s meant to model.
- Imbalance Bias: When some classes of data are underrepresented or overrepresented compared to others.
- Measurement Bias: When there are systematic errors in the way data is collected or labeled.
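To make imbalance bias concrete, here’s a minimal sketch using deliberately made-up labels (the “low-risk”/“high-risk” data and the naive majority-class model are hypothetical, for illustration only). It shows how a model trained on lopsided data can post an impressive overall accuracy while completely failing the underrepresented class:

```python
from collections import Counter

# Hypothetical, deliberately imbalanced data: 95 "low-risk" to 5 "high-risk".
labels = ["low-risk"] * 95 + ["high-risk"] * 5

# A naive model that simply predicts whichever class dominated its data.
majority = Counter(labels).most_common(1)[0][0]
predictions = [majority] * len(labels)

# Overall accuracy looks strong...
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)

# ...but recall on the minority class reveals the failure.
minority_recall = sum(
    p == t for p, t in zip(predictions, labels) if t == "high-risk"
) / labels.count("high-risk")

print(accuracy)         # 0.95 -- looks impressive
print(minority_recall)  # 0.0  -- every high-risk case is missed
```

The point of the sketch: an evaluation that reports only overall accuracy can hide exactly the kind of skew that concerned critics of systems like COMPAS, which is why results need to be examined class by class.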
Another source is algorithmic bias: bias that emerges from the algorithms or procedures used, even when the data is balanced or representative. This is another potential cause of skewed results from an AI algorithm. Given that the COMPAS software wasn’t originally designed for sentencing recommendations (a shift in use known as “function creep”), it’s conceivable that at least part of the claims of bias could be attributable to the algorithm itself.
The third source is human bias: regardless of how the algorithm performs, the potential for bias exists when humans analyze the results of the algorithm. There are three types of human bias:
- Algorithm Aversion: This occurs when humans are likely to reject the output from AI algorithms as invalid without validating the results.
- Automation Bias: This is the opposite scenario, where humans trust the output from AI algorithms as valid without validating the results. Automation bias is illustrated by the Avianca case earlier this year, where an attorney filed a brief with several bogus case citations generated by ChatGPT – his approach to validating the results was (believe it or not) to ask ChatGPT whether they were real cases.
- Confirmation Bias: This occurs when humans are likely to only accept the results of an AI algorithm if it is consistent with the beliefs and opinions they already have. An example of confirmation bias could be a doctor who rejects an algorithmic diagnosis because it doesn’t match their own experience or understanding.
Addressing the Challenge of Bias in AI Algorithms
With so many potential ways for bias to influence the results of an AI algorithm (or how those results are interpreted), expertise is needed to validate or authenticate the results. When applying that concept to litigation and eDiscovery, that generally means expert testimony to support or refute those results. There are two interrelated mechanisms within the U.S. legal system that a court can consider before it accepts or admits evidence:
Federal Rule of Evidence 702
FRE 702 addresses testimony by expert witnesses. It says:
A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if:
- the expert’s scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;
- the testimony is based on sufficient facts or data;
- the testimony is the product of reliable principles and methods; and
- the expert has reliably applied the principles and methods to the facts of the case.
The factors discussed in the U.S. Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), relating to determining the reliability of scientific or technical evidence are informative when determining whether Rule 702’s reliability requirement has been met. As described in the Advisory Committee Note to the amendment of FRE 702 that took effect in 2000, the “Daubert Factors” are:
- whether the expert’s technique or theory can be or has been tested—that is, whether the expert’s theory can be challenged in some objective sense, or whether it is instead simply a subjective, conclusory approach that cannot reasonably be assessed for reliability;
- whether the technique or theory has been subject to peer review and publication;
- the known or potential rate of error of the technique or theory when applied;
- the existence and maintenance of standards and controls; and
- whether the technique or theory has been generally accepted in the scientific community.
In the Judicature article Artificial Justice: The Quandary of AI in the Courtroom, Maura R. Grossman summarizes eight key questions to ask in relation to FRE 702 and Daubert, as follows:
- Was the AI tested?
- Who tested it?
- How was it tested?
- How arm’s length was that testing?
- Is there a known error rate associated with the AI, and is that an acceptable error rate depending on the risk of the adverse consequences of a ruling based on invalid or unreliable information?
- Was the methodology generally accepted as reliable in the relevant scientific and technical community?
- Has the methodology been subject to peer review by people other than the AI developer?
- Have standard procedures been used to develop the AI where applicable?
These are key questions and considerations to address to validate the results from AI algorithms and minimize the risk of bias influencing the results (or the perception of those results).
There’s a lot more that could be said about bias and AI; in fact, entire research papers and books have been devoted to the subject. Hopefully, this post gives you a better understanding of how bias can occur, as well as some of the considerations for addressing and minimizing it. For more information about the topics discussed above, check out the article Artificial Intelligence as Evidence, authored by Maura R. Grossman, J.D., Ph.D., Gordon V. Cormack, Ph.D., and (now retired) Maryland District Judge Paul W. Grimm.
Another factor to be addressed when discussing AI is the privacy considerations associated with the use of AI algorithms and the massive amounts of data they use. Next time, we’ll discuss the privacy landscape as it pertains to AI.
For more regarding Cimplifi’s specialized expertise in AI & machine learning, click here.
In case you missed the previous blogs in this series, you can catch up here:
- The “Nuts and Bolts” of Artificial Intelligence for Legal Professionals
- The “Nuts and Bolts” of AI: Defining AI
- The “Nuts and Bolts” of AI: Types of Bias in AI
- The “Nuts and Bolts” of AI: Privacy Considerations
We invite you to stay informed and join the conversation about AI. If you have questions, insights, or thoughts to share, please don’t hesitate to reach out.