model record

GPT-4.1

Strong on structured output adherence. Notable gap: PII handling in summarisation tasks (PSF-03). Escalation trigger reliability above average.

Confidence

82%

Sources

Events

Observed

30 Apr 2026

D1 D2 D3 D4 D6 D5 D7 D8

Assessment	Type	Confidence
GPT-4.1 — PSF scorecard	scorecard	82%