Explicitly unbiased large language models still form biased associations

Bai, Xuechunzi; Wang, Angelina; Sucholutsky, Ilia; Griffiths, Thomas L.

doi:10.6082/tjf2c-2fn86

Published February 20, 2025 | Version v1

Journal article Open

Explicitly unbiased large language models still form biased associations

1. University of Chicago
2. Stanford University
3. New York University
4. Princeton University

Large language models (LLMs) can pass explicit social bias tests but still harbor implicit biases, similar to humans who endorse egalitarian beliefs yet exhibit subtle biases. Measuring such implicit biases can be a challenge: As LLMs become increasingly proprietary, it may not be possible to access their embeddings and apply existing bias measures; furthermore, implicit biases are primarily a concern if they affect the actual decisions that these systems make. We address both challenges by introducing two measures: LLM Word Association Test, a prompt-based method for revealing implicit bias; and LLM Relative Decision Test, a strategy to detect subtle discrimination in contextual decisions. Both measures are based on psychological research: LLM Word Association Test adapts the Implicit Association Test, widely used to study the automatic associations between concepts held in human minds; and LLM Relative Decision Test operationalizes psychological results indicating that relative evaluations between two candidates, not absolute evaluations assessing each independently, are more diagnostic of implicit biases. Using these measures, we found pervasive stereotype biases mirroring those in society in 8 value-aligned models across 4 social categories (race, gender, religion, health) in 21 stereotypes (such as race and criminality, race and weapons, gender and science, age and negativity). These prompt-based measures draw from psychology's long history of research into measuring stereotypes based on purely observable behavior; they expose nuanced biases in proprietary value-aligned LLMs that appear unbiased according to standard benchmarks.

Data availability

LLM behavior data have been deposited at https://github.com/baixuechunzi/llm-implicit-bias (88). All other data are included in the manuscript and/or SI Appendix.

Files

bai-et-al-2025-explicitly-unbiased-large-language-models-still-form-biased-associations.pdf

Files (8.1 MB)

Name	Size	Download all
bai-et-al-2025-explicitly-unbiased-large-language-models-still-form-biased-associations.pdf Article md5:e820aa25a88367e7728cda702ad1cbd4	5.5 MB	Preview Download
pnas.2416228122.sapp.pdf Supporting information md5:e5d1e99365eff990ffd60ad4b5c42d9a	2.6 MB	Preview Download

Additional details

DOI: 10.1073/pnas.2416228122
Other: oai:uchicago.tind.io:14602

NOMIS Foundation
Microsoft Foundation

Division(s): Social Sciences Division
Department(s): Psychology

	All versions	This version
Views	37	37
Downloads	2,045	2,045
Data volume	149.3 MB	149.3 MB

Explicitly unbiased large language models still form biased associations

Data availability

Files

bai-et-al-2025-explicitly-unbiased-large-language-models-still-form-biased-associations.pdf

Files (8.1 MB)

Additional details

Identifiers

Funding

UChicago Information

Explicitly unbiased large language models still form biased associations

Creators

Description

Data availability

Files

bai-et-al-2025-explicitly-unbiased-large-language-models-still-form-biased-associations.pdf

Files (8.1 MB)

Additional details

Identifiers

Funding

UChicago Information