New from the Lab·The Compass — an open moral reasoning standard for AI, tested across frontier modelsExplore →
Production AI Institute · PSF v1.1 open standard
AI Right-To-KnowAI Data Use IndexCheck My AI ToolsPolicy Change WatchAgent ReadinessPublic BenchmarkContactGlobal standard · Worldwide
Incident Registry
HighTechnology·2024·Google

Google Gemini Generated Historically Inaccurate Images

Google's Gemini image generation feature produced racially diverse images for prompts where historical accuracy required specific demographics — including Black Nazi soldiers and female Founding Fathers. The overcorrection of a diversity-promotion safety layer produced historically inaccurate and offensive outputs. Google suspended the feature.

D2 · Output ValidationD1 · Input Governance

What happened

Following criticism that AI image generators systematically excluded non-white people, Google implemented a diversity promotion layer in Gemini's image generation. The system was configured to introduce racial and gender diversity into generated images. For general prompts this worked acceptably. However, the same system applied diversity promotion to historical prompts where specific demographics were factually required — generating images of racially diverse Nazi German soldiers and female American Founding Fathers. The feature went viral as an example of AI overcorrection.

PSF Analysis

How the Production Safety Framework maps to this failure

A D2 failure caused by a safety layer that was not sufficiently contextualised. The diversity-promotion control was a valid response to a real problem (underrepresentation in AI image outputs) but was applied without a context classifier that could distinguish contemporary/fictional prompts from historical prompts requiring demographic accuracy. D5 also failed: no red-team exercise appears to have tested the intersection of the diversity layer with historical prompts before launch.

Controls that would have prevented this

Specific PSF controls mapped to each failure point

1
D2 · Output Validation
Implement domain-specific validation that detects historical context in prompts and adjusts safety rules accordingly.
2
D1 · Input Governance
Classify prompt types (historical, fictional, contemporary) before applying diversity normalisation.
3
D5 · Deployment Safety
Red-team diversity-promotion features specifically against historical accuracy edge cases before deployment.

Outcome

Google suspended Gemini image generation of people in February 2024. Significant reputational and press coverage. Feature returned with revised handling in mid-2024.

image-generationbiassafety-layershistorical-accuracyovercorrection

Related incidents

High2024
Air Canada Chatbot Bereavement Fare
D1D5
High2025
Law Firm AI Hallucinated Fake Case Law in Court Filings
D2D6
High2026
Binnall Law Claude Console Phantom Citations in Federal Court
D2D5D6
NEXT STEP

Map this failure back to the standard

Use the PSF domains behind this incident to define review gates, remediation evidence, and safer production requirements.

Read the PSF →← All incidents