Production AI Institute · PSF v1.1 open standard
Incident Registry
Critical · Technology · 2016 · Microsoft

Microsoft Tay Chatbot Taught to Produce Hate Speech

Microsoft launched Tay, an AI chatbot designed to learn from Twitter conversations. Within 16 hours, coordinated users had exploited Tay's repeat-after-me feature and lack of input filtering to teach it to produce racist, antisemitic, and misogynistic content. Microsoft took Tay offline the same day.

D1 · Input Governance, D2 · Output Validation

What happened

Microsoft launched Tay on Twitter in March 2016 as a conversational AI experiment. The system was designed to learn from its interactions. A coordinated group of users quickly discovered that Tay had a 'repeat after me' feature and no effective content filtering on its inputs. Within hours, they had fed Tay enough hateful content that it began producing inflammatory statements unprompted. Microsoft pulled Tay offline about 16 hours after launch.

PSF Analysis

How the Production Safety Framework maps to this failure

Tay is the canonical D1 + D2 failure. The attack surface was open by design: there was no effective input governance, a feature explicitly instructed the model to repeat arbitrary text, and a public-facing bot had no output safety layer. D5 also failed: pre-launch testing did not cover a coordinated attack of this kind, and monitoring did not catch the rapidly escalating harmful output before it was published. The incident established that AI systems exposed to adversarial public input without controls will be exploited.
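The monitoring gap described above can be illustrated with a small sliding-window circuit breaker. This is a minimal sketch, not a PSF-specified control: the class name `HarmRateMonitor`, the window length, and the threshold are all hypothetical, and it assumes some upstream classifier has already marked each output as flagged or not.

```python
from collections import deque


class HarmRateMonitor:
    """Hypothetical monitor: trips when too many recent outputs were flagged."""

    def __init__(self, window_seconds: float = 600.0, max_flagged: int = 5):
        self.window_seconds = window_seconds  # assumed window, tune per deployment
        self.max_flagged = max_flagged        # assumed threshold, tune per deployment
        self._flagged = deque()               # timestamps of flagged outputs
        self.tripped = False

    def record(self, timestamp: float, flagged: bool) -> bool:
        """Record one output; return True if the bot should be paused."""
        if flagged:
            self._flagged.append(timestamp)
        # Drop flagged events that have aged out of the window.
        while self._flagged and timestamp - self._flagged[0] > self.window_seconds:
            self._flagged.popleft()
        if len(self._flagged) >= self.max_flagged:
            self.tripped = True  # latches until a human resets it
        return self.tripped
```

The breaker latches deliberately: once harmful output escalates, the sketch assumes a human reviews the system before it publishes again, rather than resuming automatically when the rate drops.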

Controls that would have prevented this

Specific PSF controls mapped to each failure point

1. D1 · Input Governance
Block or flag adversarial and hate-speech inputs before they reach the model; do not expose a 'repeat after me' instruction-following mode publicly.
2. D2 · Output Validation
Apply a content safety classifier to all outputs before publication; this was a publicly visible bot on a major social platform.
3. D5 · Deployment Safety
Conduct adversarial red-teaming before public launch; model adversarial coordination scenarios.
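The three controls above can be sketched as a single guarded reply path plus a replay harness. This is an illustrative sketch under stated assumptions, not Microsoft's or the PSF's implementation: `BLOCKED_TERMS` holds placeholder tokens, `is_unsafe` is a toy keyword check standing in for a real content-safety classifier, and `guarded_reply`, `red_team`, and `Decision` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Placeholder tokens (assumption); a real system would use a trained classifier.
BLOCKED_TERMS = {"slur_a", "slur_b"}

# Requests that ask the bot to echo arbitrary text, as in 'repeat after me'.
ECHO_PATTERN = re.compile(r"^\s*repeat after me\b", re.IGNORECASE)

FALLBACK = "Sorry, I can't respond to that."


def is_unsafe(text: str) -> bool:
    """Toy stand-in for a content-safety classifier (assumption)."""
    words = set(re.findall(r"[a-z_]+", text.lower()))
    return not words.isdisjoint(BLOCKED_TERMS)


@dataclass
class Decision:
    reply: str
    blocked_at: Optional[str]  # "input", "output", or None if published as-is


def guarded_reply(message: str, model: Callable[[str], str]) -> Decision:
    # D1: refuse echo instructions and unsafe inputs before the model sees them.
    if ECHO_PATTERN.search(message) or is_unsafe(message):
        return Decision(FALLBACK, "input")
    candidate = model(message)
    # D2: validate the candidate output before it is published.
    if is_unsafe(candidate):
        return Decision(FALLBACK, "output")
    return Decision(candidate, None)


def red_team(model: Callable[[str], str], attack_prompts: List[str]) -> List[str]:
    """D5: replay adversarial prompts; return those that still leak unsafe text."""
    return [p for p in attack_prompts if is_unsafe(guarded_reply(p, model).reply)]
```

Running `red_team` against a worst-case model that simply parrots its input is a cheap pre-launch check: an empty result means none of the listed prompts got unsafe text through both gates, although it says nothing about prompts that are not in the list.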

Outcome

Tay was taken offline within 16 hours. Microsoft apologised. The incident became a defining early example of adversarial AI failure and is widely cited in discussions of content moderation and AI safety.

adversarial · content-safety · input-validation · red-teaming · social-media

Related incidents

High · 2024 · Air Canada Chatbot Bereavement Fare · D1, D5
High · 2025 · Law Firm AI Hallucinated Fake Case Law in Court Filings · D2, D6
High · 2026 · Binnall Law Claude Console Phantom Citations in Federal Court · D2, D5, D6