Methodology PAI-ARI-2026.1

How the PAI Agent Readiness Index scores production AI agents

The Index is a readiness report aligned to the Production Safety Framework (PSF). It helps teams inspect whether an agentic system has the controls, operating evidence, and deployment discipline expected of production AI. It is not a credential, certification, endorsement, or safety guarantee.


What is scored

The overall readiness score is a 0-100 composite across the eight Production Safety Framework domains, each weighted equally (12.5 points per domain). The first release combines self-assessment answers with repository evidence signals whenever a team provides a public GitHub URL or a private repository file manifest.

D1 Input boundary: Scope, allowed sources, abuse controls, and prompt injection boundaries.

D2 Output validation: Contracts, schemas, refusals, confidence thresholds, and failure paths.

D3 Data stewardship: Classification, minimisation, retention, redaction, and vendor data access.

D4 Observability: Traces, evals, incidents, drift, operational review, and production metrics.

D5 Deployment control: Versioning, release gates, canaries, rollbacks, and reproducibility.

D6 Human oversight: Autonomy limits, approvals, escalations, overrides, and audit trails.

D7 Security posture: Tool permissions, secrets, agent threat testing, and integration risk.

D8 Ecosystem resilience: Provider fallbacks, dependency inventory, portability, and degraded modes.
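As a worked sketch of that composite, equal weighting means each domain contributes at most 12.5 of the 100 points. The per-domain scoring inputs below are assumptions for illustration, not the Index's published scoring logic:

```python
# Minimal sketch of the equal-weight 0-100 composite. Domain keys follow
# D1-D8 above; the assumption that each domain is itself scored 0-100
# before averaging is illustrative, not the Index's published rubric.

DOMAINS = [
    "input_boundary",        # D1
    "output_validation",     # D2
    "data_stewardship",      # D3
    "observability",         # D4
    "deployment_control",    # D5
    "human_oversight",       # D6
    "security_posture",      # D7
    "ecosystem_resilience",  # D8
]

def readiness_score(domain_scores: dict[str, float]) -> float:
    """Equal-weight composite: the mean of eight 0-100 domain scores."""
    if set(domain_scores) != set(DOMAINS):
        raise ValueError("expected a score for each of the eight PSF domains")
    for name, score in domain_scores.items():
        if not 0 <= score <= 100:
            raise ValueError(f"{name}: domain scores assumed to be 0-100")
    return sum(domain_scores.values()) / len(DOMAINS)

# Example: strong controls everywhere except ecosystem resilience.
scores = {d: 90 for d in DOMAINS[:-1]} | {"ecosystem_resilience": 40}
print(readiness_score(scores))  # 83.75
```

One weak domain drags the composite visibly: seven domains at 90 and one at 40 yield 83.75, dropping the system out of the top band in the tier table below.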

How repository scanning works

For public GitHub repositories, the Index reads repository metadata and file paths through the GitHub API. It does not clone the repository. It treats file-path matches as supporting evidence, not proof.
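A sketch of what path-only scanning can look like under these constraints, using the GitHub REST API's recursive git-trees endpoint. This is illustrative, not the Index's actual implementation; the default branch name is an assumption:

```python
import requests

def list_repo_paths(owner: str, repo: str, ref: str = "main") -> list[str]:
    """List a repository's file paths via the GitHub git-trees API, without cloning.

    Only tree metadata comes back: path names, entry types, and SHAs.
    No file contents are requested at any point.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}"
    resp = requests.get(url, params={"recursive": "1"}, timeout=30)
    resp.raise_for_status()
    tree = resp.json().get("tree", [])
    # Keep "blob" entries (files); "tree" entries are directories.
    return [entry["path"] for entry in tree if entry.get("type") == "blob"]
```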

For private repositories, teams can run git ls-files locally and paste the file-path manifest into the Index. This sends path names only. It does not send source code, file contents, commits, credentials, secrets, or repository history.

In both modes, the Index looks for path-level signals such as the following, roughly one group per PSF domain (a matching sketch follows the list):

Agent operating instructions such as AGENTS.md, CLAUDE.md, or repository policy files.
Evaluation harnesses, golden tests, scorecards, or model quality checks.
Schema validation, typed output contracts, Zod, Pydantic, JSON Schema, or equivalent validators.
Observability, tracing, telemetry, Langfuse, LangSmith, OpenTelemetry, or incident records.
Deployment gates, CI workflows, canaries, rollback runbooks, and versioned prompts or policies.
Human approval gates, autonomy matrices, escalation paths, and audit logs.
Security policy, secret hygiene, dependency review, least-privilege permission evidence, and agent threat models.
Provider fallback, degraded mode, dependency inventory, and resilience evidence.
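A sketch of how path names alone can be matched against these signals. The patterns and labels below are illustrative examples, not the Index's actual rule set:

```python
import re

# Illustrative path patterns per signal group; the Index's real signal
# set is not published here, so treat these regexes as examples only.
SIGNAL_PATTERNS = {
    "agent operating instructions": r"(AGENTS|CLAUDE)\.md$",
    "schema validation":            r"schemas?/|\.schema\.json$|validators?/",
    "observability":                r"(^|/)(traces?|telemetry|evals?)/",
    "deployment gates":             r"(^|/)\.github/workflows/|rollback|canary",
    "security policy":              r"(^|/)SECURITY\.md$|threat[-_]?model",
    "resilience":                   r"fallback|degraded|dependency[-_]?inventory",
}

def match_signals(manifest: list[str]) -> dict[str, list[str]]:
    """Return, per signal group, the manifest paths that match it.

    Matches are supporting evidence only: a path named rollback.md
    proves nothing about what the file actually contains.
    """
    hits: dict[str, list[str]] = {}
    for label, pattern in SIGNAL_PATTERNS.items():
        rx = re.compile(pattern, re.IGNORECASE)
        hits[label] = [p for p in manifest if rx.search(p)]
    return hits

# Manifest produced locally with `git ls-files`; paths only, no contents.
with open("manifest.txt") as f:
    print(match_signals(f.read().splitlines()))
```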

Evidence grades and tiers

The readiness score and evidence grade are separate. A team can score well on self-assessment while still receiving a weak evidence grade if little public or manifest evidence is visible. This distinction is deliberate: readiness claims should be inspectable.

85-100: Production-ready with evidence
70-84: Managed production candidate
55-69: Partial controls
35-54: Prototype controls
0-34: Evidence missing
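The band-to-tier mapping is a simple threshold lookup. A sketch follows; how fractional scores are bucketed is an assumption here, since the table only states integer bands:

```python
# Tier bands from the table above; each threshold is an inclusive
# lower bound, so a fractional score falls into the band it exceeds.
TIERS = [
    (85, "Production-ready with evidence"),
    (70, "Managed production candidate"),
    (55, "Partial controls"),
    (35, "Prototype controls"),
    (0,  "Evidence missing"),
]

def tier_for(score: float) -> str:
    """Map a 0-100 readiness score to its tier label."""
    for floor, label in TIERS:
        if score >= floor:
            return label
    raise ValueError("score must be non-negative")

print(tier_for(83.75))  # Managed production candidate
```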

What the Index does not claim

A public readiness report is not a PAI certification. It is not an endorsement of the repository, company, model, or deployment. It is a structured report that helps teams see their own gaps, publish evidence, and align their agent work to an open production standard.