> This is the markdown version of https://www.classet.ai/blog/ai-hiring-bias-stanford-study
> Learn more at https://www.classet.ai



![Low-poly magnifying glass over a stack of job applications, revealing hidden patterns inside an AI hiring black box](/_next/image?url=%2Fimages%2Fblog%2Fai-hiring-bias-stanford-study.png&w=3840&q=75)

# What Stanford's AI Hiring Bias Study Reveals About Audits

Stanford analyzed 4 million AI-screened job applications and found clean-looking audits can hide racial bias. Here's what hiring teams should check now.

[Paul Jones](/blog/authors/paul-jones), Head of Growth at Classet

June 11, 2026

AI Recruiting, Hiring Tips

Stanford researchers just got something almost nobody outside a vendor ever sees: the raw screening decisions from inside a real AI hiring system. Not a lab simulation, not a synthetic dataset. Four million actual applications, scored by one algorithm, for real jobs at real companies. The study, [_Algorithmic Monocultures in Hiring_](https://arxiv.org/abs/2605.27371), is the first large-scale look at what these tools do once they're deployed in the real world. And the finding that should stop every talent leader cold is not "AI is biased." It's that the audits meant to catch that bias said everything was fine.

Quick answer

A Stanford-led team analyzed 4.1 million job applications from 3.3 million people, all screened by a single AI vendor. About 26% of Black applicants' applications and 15% of Asian applicants' applications went to roles where the tool selected their group at a rate low enough to trigger federal adverse-impact scrutiny. The vendor's own audits showed no bias when every job was averaged together. When the same data was checked position by position, as U.S. law requires, the problem appeared in roughly 1 in 10 roles.

## What the study actually found

The dataset came from a single screening vendor (Pymetrics) and covered 156 employers and 1,746 separate job postings. Researchers Rishi Bommasani, Sarah Bana, Kathleen Creel, Dan Jurafsky, and Percy Liang ran the kind of analysis a buyer almost never can: they checked each position on its own.

-   4.1M applications screened by one AI vendor
-   3.3M real applicants across 156 employers
-   ~40,000 applications that would have advanced without the gap
-   1 in 10 positions showing adverse impact by race

Here's the number that frames the whole thing: if the tool had recommended Black and Asian candidates at the same rate as the most-favored group, about 40,000 more applications would have moved to the next round. That's the cost of the gap, from one vendor, in one dataset.
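The ~40,000 figure is a counterfactual: how many more applications would have advanced if under-selected groups had been recommended at the favored group's rate. The study's exact calculation isn't reproduced here. A minimal sketch, assuming the gap is measured position by position against each role's top-selected group and then summed, could look like this; the role names, group labels ("A" is the most-favored group), and counts are all hypothetical.

```python
def shortfall(position_counts, groups):
    """Extra advances needed for `groups` to match the top group's rate in ONE position.

    position_counts maps group -> (advanced, applicants). Hypothetical structure.
    """
    rates = {g: adv / n for g, (adv, n) in position_counts.items() if n > 0}
    if not rates:
        return 0.0
    top_rate = max(rates.values())
    extra = 0.0
    for g in groups:
        if g in rates:
            advanced, applicants = position_counts[g]
            # How many would have advanced at the top rate, minus how many did.
            extra += top_rate * applicants - advanced
    return extra


def total_shortfall(positions, groups):
    """Sum the per-position gap across every posting."""
    return sum(shortfall(counts, groups) for counts in positions.values())


# Hypothetical counts for two postings; not data from the study.
positions = {
    "role_1": {"A": (60, 100), "B": (30, 100), "C": (40, 80)},
    "role_2": {"A": (40, 200), "B": (24, 150), "C": (15, 100)},
}
print(round(total_shortfall(positions, groups=["B", "C"])))  # 49
```

Each target group is topped up to the leading rate within its own role, so a group that already leads a role contributes nothing for that role.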

What makes it worse is how quietly it happened. Applicants never typed their race, gender, or age. They played gamified neuroscience tasks, including a balloon-pump game that scores risk tolerance by how far you'll push before it bursts. Nobody coded race into the model. But how people play those games turned out to track with who they are, so the scores became an unintentional proxy. No villain, no checkbox, just math doing a bad thing on its own.
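One generic way to probe for a proxy, assuming you can join self-reported demographics to model scores after the fact (the model itself never sees them), is to compare score distributions across groups. A large standardized gap on a score that was never given race suggests something in the inputs may be tracking group membership. This is a common diagnostic, not the study's method, and the group names and scores below are hypothetical.

```python
from statistics import mean, pstdev


def standardized_gap(scores_by_group, group_a, group_b):
    """Difference in mean score between two groups, divided by the
    standard deviation of both groups' scores combined.

    Group labels are joined to scores by the auditor, not used by the model.
    """
    a, b = scores_by_group[group_a], scores_by_group[group_b]
    spread = pstdev(a + b)
    if spread == 0:
        return 0.0  # every score identical: no gap to measure
    return (mean(a) - mean(b)) / spread


# Hypothetical game-derived scores, grouped after the fact.
scores = {
    "group_a": [0.71, 0.64, 0.80, 0.58],
    "group_b": [0.52, 0.47, 0.61, 0.55],
}
print(round(standardized_gap(scores, "group_a", "group_b"), 2))
```

A gap near zero doesn't prove the score is clean, and a large one doesn't prove it's unfair; it tells you where to look harder.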

## An audit is only as good as the question it asks

"Audited for bias" sounds like a clean bill of health. On its own, it isn't, because an audit answers one specific question, and the buyer rarely knows which one.

The vendor's audit asked whether selection rates were fair across all jobs combined, and the honest answer was yes. The question U.S. law cares about is narrower. The EEOC's four-fifths rule looks at one position at a time: if a group's selection rate for a job falls below 80% of the top group's, it draws scrutiny. Asked that way, position by position, the study found adverse impact in about 1 in 10 roles, and those roles accounted for roughly 26% of all applications from Black applicants and 15% of those from Asian applicants. Same data, different question, opposite conclusion.
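The two audit questions can be run side by side. Below is a minimal sketch of a simplified four-fifths check, assuming you have, for each position, the number of applicants and the number selected in each group. The role names, group labels, and counts are hypothetical, and real adverse-impact analyses also weigh sample size and statistical significance. Here each role fails in the opposite direction, so pooling them produces a combined table that passes.

```python
from collections import defaultdict

FOUR_FIFTHS = 0.8  # threshold from the four-fifths rule of thumb


def impact_ratios(counts):
    """counts maps group -> (selected, applicants) for ONE position.

    Returns each group's selection rate divided by the highest group's rate.
    """
    rates = {g: sel / n for g, (sel, n) in counts.items() if n > 0}
    top = max(rates.values(), default=0)
    if top == 0:
        return {g: 1.0 for g in rates}  # nobody selected: nothing to compare
    return {g: r / top for g, r in rates.items()}


def flagged_groups(counts, threshold=FOUR_FIFTHS):
    """Groups whose impact ratio falls below the threshold."""
    return sorted(g for g, ratio in impact_ratios(counts).items() if ratio < threshold)


def pool(positions):
    """Collapse every position into one combined table (the 'all jobs together' view)."""
    totals = defaultdict(lambda: [0, 0])
    for counts in positions.values():
        for g, (sel, n) in counts.items():
            totals[g][0] += sel
            totals[g][1] += n
    return {g: (sel, n) for g, (sel, n) in totals.items()}


# Hypothetical counts: each role favors a different group.
positions = {
    "role_1": {"X": (60, 100), "Y": (30, 100)},
    "role_2": {"X": (30, 100), "Y": (60, 100)},
}
print("pooled:", flagged_groups(pool(positions)))  # pooled: []
for name, counts in positions.items():
    print(name, flagged_groups(counts))  # role_1 ['Y'], then role_2 ['X']
```

The pooled rates come out identical (90 of 200 for each group), so the combined audit sees nothing, while each role on its own shows one group selected at half the other's rate.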

But the audit was never the root cause. The model was making the advance-or-reject call across 156 employers, so one blind spot didn't stay contained to a single company. It repeated everywhere that vendor screened. Co-author Sarah Bana put the buyer's trap plainly to HR Dive: when you read that something you bought has been audited, you tend to take the finding at face value. The fault line worth checking in any tool you buy isn't only how it was audited, but whether a person or the model owns the decision.

## Rejected once can mean rejected everywhere

The "monoculture" in the title is the second finding, and it's the one most teams have never thought about. When one vendor screens for many employers, those applications aren't independent shots on goal. They're the same judgment, repeated.

The study found that roughly 10% of people who applied to four positions were screened out of every single one. Among those who applied to ten, 4% were rejected from all ten, a rate higher than chance would predict if each company decided on its own. When the researchers ran the same method on the largest earlier study of hiring decisions, 83,000 applications sent to 108 Fortune 500 firms, the rejection pattern matched what you'd expect from companies deciding on their own. No pile-up. The shared algorithm is what changed. One model's quiet "no" follows a candidate across the whole market.
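The "higher than chance" comparison rests on a simple baseline: if every position decided independently, the chance of being rejected everywhere is the product of each position's rejection rate. An observed all-rejected rate well above that product is the pile-up described above. This sketch assumes that independence baseline and uses hypothetical postings and outcomes; it is an illustration of the idea, not the paper's exact statistical method.

```python
import math


def expected_all_rejected(rejection_rates):
    """Chance of rejection everywhere IF each position decides independently."""
    return math.prod(rejection_rates)


def systemic_rejection(applicants, reject_rate):
    """Compare the observed vs. independence-expected 'rejected from everything' rate.

    applicants: list of applicants, each a list of (position_id, was_rejected).
    reject_rate: position_id -> that position's overall rejection rate.
    """
    if not applicants:
        return 0.0, 0.0
    observed = sum(all(rej for _, rej in apps) for apps in applicants) / len(applicants)
    expected = sum(
        expected_all_rejected([reject_rate[pos] for pos, _ in apps]) for apps in applicants
    ) / len(applicants)
    return observed, expected


# Hypothetical: four postings, each rejecting half its applicants.
rates = {p: 0.5 for p in ("p1", "p2", "p3", "p4")}
applicants = [
    [("p1", True), ("p2", True), ("p3", True), ("p4", True)],
    [("p1", True), ("p2", False), ("p3", True), ("p4", False)],
    [("p1", False), ("p2", True), ("p3", False), ("p4", True)],
    [("p1", False), ("p2", False), ("p3", False), ("p4", False)],
]
observed, expected = systemic_rejection(applicants, rates)
print(observed, expected)  # 0.25 0.0625
```

In this toy pool, one in four applicants is rejected everywhere, four times the 1-in-16 you'd expect if the four postings judged independently.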

## A gut check for your own screening stack

You don't need a Stanford dataset to pressure-test what you're running. Three questions, this week:

-   Does a model make the reject decision, or does a person? The harm in the study traces back to an automated recommend-or-reject verdict applied at scale. If a tool screens candidates out on its own, with no human reading the result, that's the pattern to watch.
-   Have you ever let a random sample of rejected applicants through? It's the cheapest bias test there is: advance a small random group past the screen and watch how they actually perform on the job. Most teams never do it.
-   Can you explain, in plain language, why a specific candidate was screened out? If the honest answer is "the model said so," you can't defend it to that candidate or a regulator, and you can't tell whether a hidden proxy is doing the deciding.
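The random-sample test from the second question is straightforward to set up. A minimal sketch, assuming you can export the IDs of screened-out applicants and later record a simple yes/no job outcome (the 90-day retention flag mentioned in the comments is a hypothetical example metric): pick a small random slice to advance anyway, then compare how they do against applicants the screen passed.

```python
import random


def holdout_sample(rejected_ids, fraction=0.05, seed=None):
    """Randomly choose a small share of screened-out applicants to advance anyway."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    ids = sorted(rejected_ids)  # fixed order so a given seed is reproducible
    if not ids:
        return set()
    k = max(1, round(len(ids) * fraction))
    return set(random.Random(seed).sample(ids, k))


def success_rate(outcomes, ids):
    """Share of tracked applicants with a positive outcome (e.g. still employed at 90 days)."""
    tracked = [outcomes[i] for i in ids if i in outcomes]
    return sum(tracked) / len(tracked) if tracked else None


# Hypothetical IDs; in practice these would come from your ATS export.
rejected = [f"cand-{n:04d}" for n in range(200)]
advance_anyway = holdout_sample(rejected, fraction=0.05, seed=7)
print(len(advance_anyway))  # 10
```

If the holdout group's success rate comes out close to that of applicants the screen passed, the screen may be rejecting people who would have done the job well.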

It's also worth asking how your vendor checks its own fairness, and how often. A bias review run by an independent party, refreshed on a schedule, and published where you can read it tells you more than a single number in a sales deck. For a deeper walk through what a credible audit looks like, we wrote a buyer's guide on [AI recruiting and bias audits](/blog/ai-recruiting-bias-audits-compliant-platform).

## How we think about this at Classet

We'll be direct: this is the failure mode Joy, our AI phone screener, was built to avoid, and the safeguard that matters most is keeping a person in the loop. Three choices make the difference.

Joy runs a structured interview, not a personality proxy. Every candidate gets the same job-relevant questions about availability, experience, certifications, and must-haves: the things that actually predict whether someone shows up and does the work. Nothing is inferred from how a candidate plays a game. That's also [why a structured phone screen reduces bias](/blog/hear-them-out-ai-phone-screenings-edge-in-reducing-hiring-bias) compared with looser methods.

Joy doesn't make the hire-or-reject call. Your team does. Joy conducts the interview and hands over structured results for a person to read, so a human decides who moves forward and can say why. Keeping a human on that decision is the opposite of the pattern the study warns about, where an automated verdict quietly screened people out across employer after employer with nobody in the loop.

And we don't ask you to take our fairness on faith. Joy is independently bias-audited every month by Warden AI, with results published on a public [trust page](https://trust.warden-ai.com/classet/ai-phone-interviewer) you can read yourself. Independent, refreshed every month, and out in the open, instead of a number you never get to see.

None of that makes bias impossible. It makes it visible, contestable, and a human's responsibility, which is the whole point.

## FAQ

What did the Stanford algorithmic monoculture study find?

It analyzed 4.1 million job applications from 3.3 million people across 156 employers and 1,746 positions, all screened by one AI vendor. About 26% of Black applicants' and 15% of Asian applicants' applications went to roles where their group was selected at a rate low enough to trigger federal adverse-impact scrutiny under the EEOC four-fifths rule. Roughly 40,000 more applications would have advanced without the gap.

How can an AI hiring tool pass a bias audit and still be biased?

In the Stanford study, the issue was aggregation. When you average every position together, over-selection in one role can cancel under-selection in another, so the combined number looks fair even when individual roles aren't. U.S. law evaluates adverse impact per position, which is how the researchers surfaced disparities the pooled view missed. The broader lesson for buyers: look past a single headline fairness number, ask how the check was run and how often, and make sure a human, not the model, owns the final reject decision.

What is algorithmic monoculture in hiring?

It's what happens when one screening vendor makes the decision for many employers at once. Those applications stop being independent. The study found about 10% of people who applied to four positions were rejected from all of them, far more than chance, because the same model rendered the same judgment everywhere. When the researchers applied the same method to an earlier, non-algorithmic hiring study, rejections matched independent decision-making, with no comparable pattern.

How is Classet's approach different from the tool in the study?

Joy runs the same structured, job-relevant interview for every candidate instead of inferring traits from gamified tasks. It hands your team structured results to review, and a human recruiter makes the hire-or-reject decision and can explain it, rather than an algorithm deciding on its own. Joy is also independently bias-audited every month by Warden AI, with results published on a public trust page you can read yourself.

Is AI screening legal to use in hiring?

Yes, when it's applied carefully. The risk isn't the technology; it's how its output is acted on. Tools that produce an automated hire-or-reject decision with no human review are the ones most likely to draw adverse-impact scrutiny. Structured screening that informs a human decision, backed by regular independent bias audits, is far easier to defend.

## Key points

-   Stanford's _Algorithmic Monocultures in Hiring_ is the first large-scale study of real deployed AI screening: 4.1M applications, 3.3M applicants, one vendor.
-   Roughly 26% of applications from Black applicants and 15% from Asian applicants went to roles where their group was selected at rates that trigger federal adverse-impact scrutiny.
-   The vendor's pooled audits showed no bias; checking position by position, as law requires, revealed it in about 1 in 10 roles.
-   Monoculture effect: one shared algorithm can reject the same person across every employer it screens for, with no independent second look.
-   The fix for buyers: keep a human on the reject decision, sample your rejected pile, and require an independent, ongoing bias audit you can actually read.

## See it work

If you're auditing your own screening stack after reading this, that's the right instinct. [Book a walkthrough](/demo) and we'll show you how Joy runs a structured interview, keeps a human in every decision, and where to read our monthly third-party bias audit.

[Paul Jones](/blog/authors/paul-jones), Head of Growth at Classet

Paul comes from an operator background running an Alpine-owned company, and brings firsthand experience with the hiring challenges Classet was built to solve. He's driven by a belief that the right technology can make meaningful work more accessible.
