benchmark event
GPT-4.1 — Q2 2026 Lab benchmark
GPT-4.1 scored 74/100 overall in the Q2 2026 PAI Lab PSF reliability index. It was strong on structured output adherence, and its escalation trigger reliability was above average. The notable gap was PII handling in summarisation tasks (PSF-03).
Confidence: 82%
Sources: 2
Entities: 2
Detected: 30 Apr 2026
D1, D2, D3, D4, D6, D5, D7, D8
Linked entities
Related graph edges
| Edge | Type | Confidence |
|---|---|---|
| ent-vendor-openai to ent-psf-d3 | maps to | 62% |
| ent-vendor-openai to ent-psf-d4 | maps to | 62% |
| ent-vendor-openai to ent-psf-d5 | maps to | 62% |
| ent-vendor-openai to ent-psf-d8 | maps to | 62% |
| ent-lab-model-gpt-4-1 to ent-psf-d1 | maps to | 68% |
| ent-lab-model-gpt-4-1 to ent-psf-d2 | maps to | 68% |
| ent-lab-model-gpt-4-1 to ent-psf-d3 | maps to | 68% |
| ent-lab-model-gpt-4-1 to ent-psf-d4 | maps to | 68% |
| ent-lab-model-gpt-4-1 to ent-psf-d5 | maps to | 68% |
| ent-lab-model-gpt-4-1 to ent-psf-d6 | maps to | 68% |
| ent-lab-model-gpt-4-1 to ent-psf-d7 | maps to | 68% |
| ent-lab-model-gpt-4-1 to ent-psf-d8 | maps to | 68% |
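The edge table above can be read as a small directed graph. The following is a minimal sketch of one way to hold those rows in memory and group targets by source. The `Edge` type and the `build_edges` and `targets_by_source` helpers are hypothetical names chosen for illustration and are not part of any Production AI Graph API. The entity IDs and confidence values are copied from the table.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One row of the edge table (hypothetical structure, for illustration)."""
    source: str
    target: str
    edge_type: str
    confidence: float  # table percentage expressed as a fraction, e.g. 62% -> 0.62


def build_edges() -> list[Edge]:
    """Recreate the 12 rows listed in the table above."""
    vendor_dims = [3, 4, 5, 8]   # ent-vendor-openai maps to d3, d4, d5, d8
    model_dims = range(1, 9)     # ent-lab-model-gpt-4-1 maps to d1 through d8
    edges = [
        Edge("ent-vendor-openai", f"ent-psf-d{d}", "maps to", 0.62)
        for d in vendor_dims
    ]
    edges += [
        Edge("ent-lab-model-gpt-4-1", f"ent-psf-d{d}", "maps to", 0.68)
        for d in model_dims
    ]
    return edges


def targets_by_source(edges: list[Edge], min_confidence: float = 0.0) -> dict[str, list[str]]:
    """Group edge targets by source, keeping only edges at or above min_confidence."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for edge in edges:
        if edge.confidence >= min_confidence:
            grouped[edge.source].append(edge.target)
    return dict(grouped)


if __name__ == "__main__":
    for source, targets in targets_by_source(build_edges()).items():
        print(source, "->", ", ".join(targets))
```

Passing a threshold, for example `targets_by_source(build_edges(), min_confidence=0.65)`, keeps only the model-level edges, since the vendor-level edges are listed at 62%.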