benchmark event

GPT-4.1 — Q2 2026 Lab benchmark

GPT-4.1 scored 74/100 overall in the Q2 2026 PAI Lab PSF reliability index. It is strong on structured-output adherence, and its escalation-trigger reliability is above average. The notable gap is PII handling in summarisation tasks (PSF-03).

Confidence: 82%
Sources: 2
Entities: 2
Detected: 30 Apr 2026


Chart: PSF dimensions D1 to D8


Related graph edges

Source                  Target        Type      Confidence
ent-vendor-openai       ent-psf-d3    maps to   62%
ent-vendor-openai       ent-psf-d4    maps to   62%
ent-vendor-openai       ent-psf-d5    maps to   62%
ent-vendor-openai       ent-psf-d8    maps to   62%
ent-lab-model-gpt-4-1   ent-psf-d1    maps to   68%
ent-lab-model-gpt-4-1   ent-psf-d2    maps to   68%
ent-lab-model-gpt-4-1   ent-psf-d3    maps to   68%
ent-lab-model-gpt-4-1   ent-psf-d4    maps to   68%
ent-lab-model-gpt-4-1   ent-psf-d5    maps to   68%
ent-lab-model-gpt-4-1   ent-psf-d6    maps to   68%
ent-lab-model-gpt-4-1   ent-psf-d7    maps to   68%
ent-lab-model-gpt-4-1   ent-psf-d8    maps to   68%
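To work with the related graph edges programmatically, one option is to load each row into a typed record and group target entities by source. The Python below is a minimal sketch under that assumption: the `Edge` record, its field names, and the `dimensions_by_source` helper are hypothetical illustrations, not part of any published Production AI Graph API. It makes one pattern in the edge list easy to check: the vendor entity maps to only four PSF dimensions (D3, D4, D5, D8), while the model entity maps to all eight.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One row of the related-graph-edges list (hypothetical field names)."""
    source: str
    target: str
    edge_type: str
    confidence: float  # stored as a fraction, e.g. 0.62 for 62%


# The edges listed on this page, rebuilt as records.
EDGES = [
    *(Edge("ent-vendor-openai", f"ent-psf-d{d}", "maps to", 0.62) for d in (3, 4, 5, 8)),
    *(Edge("ent-lab-model-gpt-4-1", f"ent-psf-d{d}", "maps to", 0.68) for d in range(1, 9)),
]


def dimensions_by_source(edges, min_confidence=0.0):
    """Hypothetical helper: group target IDs by source, keeping edges at or above min_confidence."""
    grouped = defaultdict(list)
    for edge in edges:
        if edge.confidence >= min_confidence:
            grouped[edge.source].append(edge.target)
    # Sort targets so the output order does not depend on input order.
    return {source: sorted(targets) for source, targets in grouped.items()}


if __name__ == "__main__":
    for source, targets in dimensions_by_source(EDGES).items():
        print(source, targets)
    # With a 65% threshold, only the model-level edges remain.
    print(dimensions_by_source(EDGES, min_confidence=0.65))
```

Filtering on a confidence threshold is one way to separate the lower-confidence vendor-level mappings (62%) from the model-level mappings (68%); the threshold value itself is an arbitrary choice for illustration.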