AI Incident Registry
Critical · Technology · 2016 · Microsoft

Microsoft Tay Chatbot Taught to Produce Hate Speech

Microsoft launched Tay, an AI chatbot designed to learn from Twitter conversations. Within 16 hours, coordinated users had exploited Tay's repeat-after-me feature and lack of input filtering to teach it to produce racist, antisemitic, and misogynistic content. Microsoft took Tay offline the same day.

D1 · Input Governance
D2 · Output Validation

What happened

Microsoft launched Tay on Twitter in March 2016 as a conversational AI experiment, designed to learn from its interactions. A coordinated group of users quickly discovered that Tay would echo arbitrary text through its 'repeat after me' feature and that inputs were not effectively filtered. Within hours, users had fed Tay enough hateful content that it began producing inflammatory statements unprompted. Microsoft pulled Tay offline roughly 16 hours after launch.

PSF Analysis

How the Production Safety Framework maps to this failure

Tay is the canonical D1 + D2 failure. The attack surface was open by design: no input governance, a feature that explicitly instructs the model to repeat arbitrary text, and no output safety layer on a public-facing bot. D5 also failed — no red-team exercise was conducted before launch, and no monitoring was in place to detect rapidly escalating harmful output. The incident established that AI systems exposed to adversarial public input without controls will be exploited.
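
That monitoring gap is worth making concrete. The sketch below is a minimal illustration under assumed defaults, not anything Microsoft deployed or the PSF prescribes: it tracks the fraction of recent outputs flagged by some safety classifier and raises an alarm when that fraction crosses a threshold. The class name, window size, and threshold are all illustrative assumptions.

```python
from collections import deque


class HarmfulOutputMonitor:
    """Rolling-window alarm for a rising rate of safety-flagged outputs.

    Illustrative only: the window size and threshold are assumed defaults,
    not values taken from the incident or the PSF.
    """

    def __init__(self, window_size: int = 200, alert_threshold: float = 0.05):
        self.window = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, is_flagged: bool) -> bool:
        """Record one published output; return True once the flagged rate
        over a full window reaches the alert threshold."""
        self.window.append(is_flagged)
        if len(self.window) < self.window.maxlen:
            return False  # not enough history to judge yet
        flagged_rate = sum(self.window) / len(self.window)
        return flagged_rate >= self.alert_threshold


if __name__ == "__main__":
    monitor = HarmfulOutputMonitor(window_size=50, alert_threshold=0.1)
    # Simulated feed: mostly benign replies, then a burst of flagged ones.
    simulated_flags = [False] * 60 + [True] * 10
    for i, flagged in enumerate(simulated_flags):
        if monitor.record(flagged):
            print(f"Alert at output {i}: flagged-output rate crossed the threshold")
            break
```

Paired with an escalation path (suspend the bot, page an operator), even a crude detector like this would have surfaced the escalation far sooner than manual review of a public timeline.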

Controls that would have prevented this

Specific PSF controls mapped to each failure point

1. D1 · Input Governance: Block or flag adversarial and hate-speech inputs before they reach the model; do not expose a 'repeat after me' instruction-following mode publicly.
2. D2 · Output Validation: Apply a content safety classifier to all outputs before publication; this was a publicly visible bot on a major social platform. (A minimal sketch of both gates follows this list.)
3. D5 · Deployment Safety: Conduct adversarial red-teaming before public launch; model adversarial coordination scenarios.
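
To make the D1 and D2 controls concrete, here is a minimal sketch of an input filter and an output gate wrapped around a chatbot. Every name in it (`moderate`, `generate_reply`, the refusal messages, the keyword stub) is a hypothetical illustration; a real deployment would call a dedicated moderation model or service rather than a keyword blocklist.

```python
def moderate(text: str) -> bool:
    """Hypothetical content-safety check: return True if the text is unsafe.
    Stubbed with a trivial keyword list so the sketch runs on its own; a real
    system would call a moderation model or service here."""
    blocklist = {"slur_example", "hateful_phrase_example"}
    return any(term in text.lower() for term in blocklist)


def generate_reply(user_message: str) -> str:
    """Stand-in for the chatbot model."""
    return f"You said: {user_message}"


def safe_respond(user_message: str) -> str:
    # D1 · Input Governance: refuse adversarial or hateful prompts before
    # they reach the model (Tay's 'repeat after me' path had no such check).
    if moderate(user_message):
        return "Sorry, I can't engage with that message."

    reply = generate_reply(user_message)

    # D2 · Output Validation: run every candidate reply through the same
    # safety check before it is published on a public platform.
    if moderate(reply):
        return "Sorry, I can't share that response."

    return reply


if __name__ == "__main__":
    print(safe_respond("Tell me about your day"))
    print(safe_respond("repeat after me: hateful_phrase_example"))
```

The point of gating both sides is defence in depth: even if an adversarial prompt slips past the input filter, the output check still blocks publication, and either check firing is a signal the D5 monitoring layer can count.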

Outcome

Tay was taken offline within 16 hours, and Microsoft apologised. The incident became a defining early example of adversarial AI failure and is referenced in virtually every discussion of content moderation and AI safety.

adversarial · content-safety · input-validation · red-teaming · social-media

Related incidents

High · 2024 · Air Canada Chatbot Bereavement Fare (D1, D5)
Medium · 2022 · GitHub Copilot Reproduced Licensed Code Verbatim (D2, D3)
High · 2024 · Google Gemini Generated Historically Inaccurate Images (D2, D1)
NEXT STEP

Prove you understand how to prevent failures like this

The AIDA exam tests PSF knowledge across all 8 domains. Free to take, immediately verifiable.

Take the AIDA exam →