Bias and Language with LLMs

Linguistics

WHO: Pranav Anand, Andrew Kato, Millie Hacker

WHAT: This project examines the sources and types of bias in modern large language models (LLMs), like ChatGPT, as well as the strategies researchers have formulated for diagnosing bias in the models and removing it (“debiasing”). One well-known early example of potential bias is that some systems will complete an analogy like “man is to doctor as woman is to ____” with “nurse.”
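
To make that kind of probe concrete, here is a minimal sketch using off-the-shelf word embeddings; the pretrained vectors named below are a common publicly available choice used purely for illustration, not part of our project's setup, and the completion a given model returns may vary.

```python
# Minimal sketch of the embedding-analogy probe described above.
# Assumes gensim and its downloader are installed; the pretrained
# model name is an illustrative choice, not the project's setup.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # pretrained word2vec vectors

# "man is to doctor as woman is to ___": add 'woman' and 'doctor',
# subtract 'man', and inspect the nearest neighbors of the result.
completions = vectors.most_similar(positive=["woman", "doctor"],
                                   negative=["man"], topn=5)
for word, score in completions:
    print(f"{word}\t{score:.3f}")
```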

The overall project aims to pair efforts in computational linguistics around bias detection and mitigation with increased scrutiny in theoretical linguistics of bias in linguist-generated examples. That latter effort has uncovered very fine-grained and subtle forms of bias in wording, phrasing, and framing; for example, in these human-generated examples, men are more likely to be active, animate subjects of sentences and women to be acted on, even though the characteristics of the individuals are completely irrelevant to the point the examples illustrate.

We do not believe these kinds of signatures have been investigated as systematically in computational circles, or in the textual data such systems are trained on. Our goal is to estimate these fine-grained forms of bias, both in contemporary LLMs and in their training data, partly by hand and partly using LLMs themselves.
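
As a hedged sketch of what a first-pass estimate of one such signature might look like (the pronoun lists, dependency labels, and example sentences below are illustrative placeholders, not our actual data or annotation scheme), one could count how often gendered pronouns surface as active subjects versus as objects or passive subjects:

```python
# Rough sketch: count gendered pronouns appearing as active subjects
# vs. as objects/passive subjects. Pronoun lists and example
# sentences are illustrative placeholders only.
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")  # small English pipeline

MASC = {"he", "him", "his"}
FEM = {"she", "her", "hers"}

def role_counts(sentences):
    counts = Counter()
    for doc in nlp.pipe(sentences):
        for tok in doc:
            gender = ("masc" if tok.lower_ in MASC
                      else "fem" if tok.lower_ in FEM else None)
            if gender is None:
                continue
            if tok.dep_ == "nsubj":                   # active subject
                counts[(gender, "agent")] += 1
            elif tok.dep_ in {"dobj", "nsubjpass"}:   # acted upon
                counts[(gender, "patient")] += 1
    return counts

examples = [
    "He criticized the proposal before she could respond.",
    "She was interrupted by him during the meeting.",
]
print(role_counts(examples))
```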

Our first task has been to compile a comprehensive bibliography of work on bias in LLMs; we have approximately 100 papers that we have begun to sift through and catalog for the kinds of bias they address, how they conceptualize and measure that bias, and relevant strategies for mitigation. So far, we have not encountered approaches that deal with the subtleties we observe in the theoretical linguistics literature, but we still have many papers to work through.

WHY: It is important to understand the forms of bias in existing AI systems properly and to provide tools for measuring bias in next-generation systems. Work by linguists is useful here because modern LLMs are complex enough to resist explanatory analysis, and approaching them through their outputs may be a useful way to diagnose deficiencies. It is also important to note that while there is a great deal of work in this space, the notions of bias deployed in computational linguistic settings are often rather crude, and revising them to incorporate humanistic and social-scientific theorizing will be conceptually useful going forward.

WHAT’S NEXT: After we complete our survey, we will turn to investigating the viability of using LLMs to annotate for the fine-grained forms of bias we have extracted from the linguistics literature, and we will attempt to generalize this approach to other forms of bias as well.
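
To illustrate what LLM-assisted annotation of this kind might involve, here is a hedged sketch for one fine-grained feature (whether the gendered referent is an active agent or acted upon); the model name, prompt wording, and example sentence are placeholders, not our annotation protocol.

```python
# Hedged sketch of LLM-assisted annotation for a single fine-grained
# bias feature. Model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "In the sentence below, is the gendered referent the active agent "
    "or the one acted upon? Answer with exactly one word: AGENT or PATIENT.\n\n"
    "Sentence: {sentence}"
)

def annotate(sentence: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user",
                   "content": PROMPT.format(sentence=sentence)}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(annotate("She was praised by the committee."))
```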