model record

GPT-4.1

Strong on structured output adherence. Notable gap: PII handling in summarisation tasks (PSF-03). Escalation trigger reliability above average.

Confidence: 82%
Sources: 2
Events: 1
Observed: 30 Apr 2026

[Chart: dimensions D1, D2, D3, D4, D6, D5, D7, D8]

Related events

Assessments

Assessment                 Type         Confidence
GPT-4.1 - PSF scorecard    scorecard    82%