Production AI Institute · PSF v1.1 open standard

AI Agent Production Ready Checklist

A PSF-aligned sign-off checklist for autonomous agents: 32 controls across eight domains, with a ship, harden, or rebuild decision framework.

Production AI Institute · 12 min read · Updated May 2026

An agent is production ready when it can run unattended against real users, real data, and real tools without the silent failure modes that tutorials do not surface. Assessments indicate that most agent pilots pass demo evals but then fail on input governance, output validation, or post-deployment monitoring. These are the kinds of gaps documented in NIST's 2025 work on deployed AI system monitoring (NIST.AI.800-4).

This checklist maps 32 concrete controls to the eight domains of the Production Safety Framework (PSF). It complements the narrative primer "Your AI Agent Isn't Production Ready" with an operational sign-off format that procurement and platform teams can reuse.

How to use this checklist

Assign a named owner per domain. Score each item yes, no, or not applicable with evidence links (runbook, test output, dashboard). Require all applicable items to be yes before production traffic. For agents with tool access or write permissions, treat PSF-6 and PSF-7 as blocking domains regardless of demo quality.
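
The scoring rule above can be sketched as a small gate. This is a minimal illustration, not part of the PSF itself: the `Item` and `Score` names are hypothetical, and treating a "yes" without an evidence link as a blocker is an assumption drawn from the requirement to attach evidence.

```python
from dataclasses import dataclass
from enum import Enum


class Score(Enum):
    YES = "yes"
    NO = "no"
    NA = "not applicable"


@dataclass(frozen=True)
class Item:
    domain: str          # e.g. "PSF-6"
    control: str         # short description of the control
    score: Score
    evidence: str = ""   # link to a runbook, test output, or dashboard


def blockers(items: list[Item]) -> list[Item]:
    """Return the items that prevent production traffic.

    Assumption: an item blocks if it is scored NO, or scored YES with no evidence link.
    Items scored NA never block.
    """
    return [
        item for item in items
        if item.score is Score.NO
        or (item.score is Score.YES and not item.evidence.strip())
    ]


def ready_for_traffic(items: list[Item]) -> bool:
    """True only when every applicable item is an evidenced yes."""
    return not blockers(items)
```

Listing the blockers, rather than returning only a boolean, gives each domain owner a concrete set of items to close.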

OpenAI's Agents SDK deployment guidance (May 2026) expects a /health endpoint, versioned prompts, runtime secret injection, and trace export via flush_traces() in long-running workers. Those requirements align with PSF-4 and PSF-5 below; they are necessary but not sufficient for full PSF coverage.
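
As an illustration of the health-check and runtime-secret expectations, here is a standard-library sketch. It does not use the Agents SDK, and the environment variable name `AGENT_API_KEY` and the `PROMPT_VERSION` label are hypothetical placeholders.

```python
import json
import os
from http.server import BaseHTTPRequestHandler

PROMPT_VERSION = "2026-05-01.1"  # hypothetical label for a prompt tracked in version control


def health_payload(env=os.environ) -> tuple[int, dict]:
    """Report healthy only when the runtime-injected secret is present.

    The secret's value is never echoed, only whether it exists.
    """
    has_secret = bool(env.get("AGENT_API_KEY"))  # hypothetical variable name
    status = 200 if has_secret else 503
    return status, {"ok": has_secret, "prompt_version": PROMPT_VERSION}


class HealthHandler(BaseHTTPRequestHandler):
    """Serves GET /health and nothing else."""

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_payload()
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, pass the handler to `http.server.HTTPServer(("127.0.0.1", 8080), HealthHandler)` and call `serve_forever()`.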

PSF-1 · Input Governance

Practitioner guide: Input Governance implementation guide · Certification: CAIS

  • Maximum input length enforced before any LLM call, with rejection logging
  • Rate limiting at user and token level, not only at the API gateway
  • PII detection on inbound user content before it enters the model context
  • Prompt injection test suite run against direct and indirect attack patterns

PSF-2 · Output Validation

Practitioner guide: Output Validation implementation guide · Certification: CAOP

  • Structured outputs validated against a schema before downstream writes
  • Business logic checks reject impossible values (dates, amounts, IDs)
  • Low-confidence outputs routed to human review, not auto-executed
  • Golden eval set scored on every prompt or model change before rollout
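
The first three controls can be chained into one routing step that runs before any downstream write. The `Refund` shape and the 0.8 confidence threshold are hypothetical, chosen only to make the sketch concrete.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Refund:
    order_id: str
    amount: float
    issued_on: date


def parse_refund(raw: dict) -> Refund:
    """Schema check: required keys and types. Raises ValueError on any mismatch."""
    try:
        order_id, amount = raw["order_id"], raw["amount"]
        issued_on = date.fromisoformat(raw["issued_on"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"schema violation: {exc!r}") from exc
    if (not isinstance(order_id, str) or isinstance(amount, bool)
            or not isinstance(amount, (int, float))):
        raise ValueError("schema violation: wrong field type")
    return Refund(order_id, float(amount), issued_on)


def route(raw: dict, confidence: float, today: date, min_confidence: float = 0.8) -> str:
    """Return 'reject', 'human_review', or 'execute'."""
    try:
        refund = parse_refund(raw)
    except ValueError:
        return "reject"
    # Business logic: impossible values never reach the downstream write.
    if refund.amount <= 0 or refund.issued_on > today or not refund.order_id:
        return "reject"
    # Low-confidence outputs go to a person instead of being auto-executed.
    if confidence < min_confidence:
        return "human_review"
    return "execute"
```

Passing `today` in as an argument, rather than reading the clock inside the function, keeps the check deterministic for a golden eval set.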

PSF-3 · Data Protection

Practitioner guide: Data Protection implementation guide · Certification: CAIA

  • Data minimisation documented: each prompt field has a named lawful purpose
  • PII masked or tokenised before transmission to third-party model APIs
  • Data processor agreements in place with model and tool providers
  • Erasure path verified: deleted user data leaves logs, caches, and vector stores
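
Tokenising PII before it leaves for a third-party model API might look like the following. This sketch handles email addresses only, with a deliberately simple regular expression; a real deployment would need broader PII detection, and the function names are hypothetical.

```python
import hashlib
import hmac
import re

# Simplified pattern for illustration; it will not match every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenise_emails(text: str, key: bytes) -> tuple[str, dict[str, str]]:
    """Replace each email with a keyed token.

    Returns the masked text plus a lookup table that stays on your side of the API call.
    """
    vault: dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        email = match.group(0)
        digest = hmac.new(key, email.encode(), hashlib.sha256).hexdigest()[:12]
        token = f"<EMAIL_{digest}>"
        vault[token] = email
        return token

    return EMAIL_RE.sub(_swap, text), vault


def restore(text: str, vault: dict[str, str]) -> str:
    """Put the original values back into a model response, locally."""
    for token, email in vault.items():
        text = text.replace(token, email)
    return text
```

Because the token is a keyed hash, the same address maps to the same token within a conversation, so the model can still refer to it consistently without seeing the value.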

PSF-4 · Observability

Practitioner guide: Observability implementation guide · Certification: CAOP

  • End-to-end traces capture LLM calls, tool invocations, and handoffs
  • Latency tracked at P50, P95, and P99 with alerting on regression
  • Token cost per request monitored with budget thresholds
  • Conversation logs stored with PII redaction and searchable incident fields
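
The latency and cost controls reduce to a little arithmetic. The sketch below uses the nearest-rank percentile definition and an assumed 20% regression tolerance; prices are passed in as parameters because they vary by provider.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_alerts(samples_ms: list[float], baseline_ms: dict[int, float],
                   tolerance: float = 1.2) -> list[str]:
    """Compare current percentiles with a stored baseline; tolerance 1.2 allows 20% drift."""
    alerts = []
    for p, base in baseline_ms.items():
        current = percentile(samples_ms, p)
        if current > base * tolerance:
            alerts.append(f"P{p} regression: {current:.0f} ms vs baseline {base:.0f} ms")
    return alerts


def request_cost(prompt_tokens: int, completion_tokens: int,
                 usd_per_1k_prompt: float, usd_per_1k_completion: float) -> float:
    """Token cost of one request in USD, given per-1,000-token prices."""
    return (prompt_tokens / 1000 * usd_per_1k_prompt
            + completion_tokens / 1000 * usd_per_1k_completion)
```

For example, `latency_alerts(samples, {50: 400.0, 95: 1200.0, 99: 2500.0})` returns one message per percentile that has drifted past its baseline, which can feed whatever alerting channel is already in place.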

PSF-5 · Deployment Safety

Practitioner guide: Deployment Safety implementation guide · Certification: CLOE

  • Kill switch disables the agent in under 60 seconds without redeploying
  • Prompt and model versions stored in version control with changelogs
  • Canary rollout: new versions start at 1% to 5% of traffic with promotion gates
  • Rollback tested: previous version restorable in under five minutes

PSF-6 · Human Oversight

Practitioner guide: Human Oversight implementation guide · Certification: CAIG

  • Every agent action classified by reversibility and business consequence
  • Irreversible or high-stakes actions require explicit human approval
  • Escalation path defined for uncertainty, tool failure, and policy edge cases
  • Scheduled human review of agent outputs, not only complaint-driven review

PSF-7 · Security

Practitioner guide: Security implementation guide · Certification: CAIS

  • Tool permissions scoped to least privilege per task or tenant
  • Secrets injected at runtime; never embedded in prompts, manifests, or logs
  • Multi-tenant isolation verified: one user context cannot leak to another
  • Adversarial red-team results documented with remediation owners
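
Least-privilege tool scoping can be sketched as a default-deny gate keyed by tenant and task. The `Grant` and `ToolGate` names are hypothetical, and this covers only the permission check, not the isolation of model context between tenants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    tenant: str
    task: str
    tools: frozenset[str]


class ToolGate:
    """Default-deny: a tool runs only if an explicit grant names it for this tenant and task."""

    def __init__(self, grants: list[Grant]):
        self._grants = {(g.tenant, g.task): g.tools for g in grants}

    def check(self, tenant: str, task: str, tool: str) -> None:
        allowed = self._grants.get((tenant, task), frozenset())
        if tool not in allowed:
            raise PermissionError(
                f"{tool!r} not granted for tenant={tenant!r} task={task!r}"
            )
```

Calling `gate.check(tenant, task, tool)` immediately before each tool invocation keeps the decision outside the prompt, so an injected instruction cannot widen the agent's permissions.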

PSF-8 · Vendor Resilience

Practitioner guide: Vendor Resilience implementation guide · Certification: CVAE

  • Model provider outages trigger graceful degradation, not hard failure
  • Retry logic uses exponential backoff with circuit-breaker limits
  • Model version pinned; upgrades are deliberate, not automatic
  • Fallback provider or reduced-capability mode documented and tested
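
Retry, circuit breaker, and fallback fit together as shown in this sketch. The thresholds (three failures, 30-second cooldown, three attempts) are assumed values, and `primary` and `fallback` stand in for whatever provider calls you actually make.

```python
import random
import time


class CircuitOpen(Exception):
    """Raised when the breaker is refusing calls during its cooldown."""


class Breaker:
    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None and self.clock() - self.opened_at < self.cooldown_s:
            raise CircuitOpen
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None
        return result


def call_with_fallback(primary, fallback, breaker: Breaker, attempts: int = 3,
                       base_delay_s: float = 0.5, sleep=time.sleep):
    """Try the primary provider with backoff; degrade to the fallback instead of failing hard."""
    for attempt in range(attempts):
        try:
            return breaker.call(primary)
        except CircuitOpen:
            break  # do not hammer a provider that is known to be down
        except Exception:
            if attempt < attempts - 1:
                # Exponential backoff with full jitter: 0..base, 0..2*base, 0..4*base, ...
                sleep(random.uniform(0, base_delay_s * 2 ** attempt))
    return fallback()
```

The `fallback` callable can be a second provider or a reduced-capability response; either way it should be exercised in tests, since an untested fallback tends to fail at the moment it is needed.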

Ship, harden, or rebuild

| Decision | When | Practitioner action |
|---|---|---|
| Ship | All applicable items yes; blocking domains (PSF-6, PSF-7 for tool-using agents) signed off | Open canary at 1% to 5%; monitor P95 latency, error rate, and eval score for 72 hours |
| Harden | Architecture fits the use case but 4+ items are no in any single domain | Close gaps using domain guides; re-run eval suite before expanding traffic |
| Rebuild | No kill switch, no human gate on irreversible actions, or no observability baseline | Stop production traffic; redesign control plane before re-attempting deployment |
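
The decision framework can be expressed as a small function. The parameter names are hypothetical, and one assumption goes beyond the framework's wording: any deployment that qualifies for neither ship nor rebuild is treated as harden here, including cases with fewer open items than the stated threshold.

```python
def decide_release(scores: dict[str, list[str]], uses_tools: bool,
                   blocking_signed_off: bool, has_kill_switch: bool,
                   has_human_gate: bool, has_observability: bool) -> str:
    """Return 'ship', 'harden', or 'rebuild'.

    `scores` maps a domain id such as "PSF-6" to its item scores: 'yes', 'no', or 'na'.
    """
    # Rebuild conditions are checked first: they override everything else.
    if not (has_kill_switch and has_human_gate and has_observability):
        return "rebuild"
    open_items = sum(items.count("no") for items in scores.values())
    # Sign-off on PSF-6 and PSF-7 is only blocking for tool-using agents.
    if open_items == 0 and (blocking_signed_off or not uses_tools):
        return "ship"
    # Assumption: everything in between is treated as harden.
    return "harden"
```

Checking the rebuild conditions before counting open items reflects that a missing kill switch or human gate is a control-plane problem, not a gap to close item by item.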

If production requires regulated evidence, map completed controls to PSF compliance and practitioner certifications (CAOP for operations, CAIS for security, CAIG for governance). MSPs deploying agents for clients should align sign-off to the MSP AI certification guide.
