AI Agent Production Ready Checklist

A PSF-aligned sign-off checklist for autonomous agents: 32 controls across eight domains, with a ship, harden, or rebuild decision framework.

Production AI Institute · 12 min read · Updated May 2026

An agent is production ready when it can run unattended against real users, real data, and real tools without the silent failure modes that tutorials do not surface. Assessments indicate that most agent pilots pass demo evals but fail on gaps in input governance, output validation, or post-deployment monitoring; the monitoring gaps are documented in NIST's 2025 report on deployed AI system monitoring (NIST.AI.800-4).

This checklist maps 32 concrete controls to the eight domains of the Production Safety Framework (PSF). It complements the narrative primer Your AI Agent Isn't Production Ready by providing an operational sign-off format that procurement and platform teams can reuse.

How to use this checklist

Assign a named owner to each domain. Score each item yes, no, or not applicable, and attach evidence links (runbook, test output, dashboard). Require every applicable item to be yes before any production traffic. For agents with tool access or write permissions, treat PSF-6 and PSF-7 as blocking domains regardless of demo quality.
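A minimal sketch of that sign-off rule in Python, assuming checklist results are recorded as simple records. The names (Score, ChecklistItem, signoff_gaps) are hypothetical, and refusing "not applicable" in PSF-6 and PSF-7 for tool-using agents is one reading of "blocking", not a stated PSF requirement.

```python
from dataclasses import dataclass, field
from enum import Enum


class Score(Enum):
    YES = "yes"
    NO = "no"
    NOT_APPLICABLE = "not applicable"


@dataclass
class ChecklistItem:
    domain: str    # e.g. "PSF-6"
    control: str   # the control text from the checklist
    score: Score
    evidence: list = field(default_factory=list)  # runbook, test output, dashboard links


BLOCKING_FOR_TOOL_AGENTS = {"PSF-6", "PSF-7"}


def signoff_gaps(items, owners, uses_tools):
    """Return reasons production traffic must stay off; an empty list means sign-off."""
    gaps = []
    for domain in sorted({item.domain for item in items}):
        if not owners.get(domain):
            gaps.append(f"{domain}: no named owner")
    for item in items:
        blocking = uses_tools and item.domain in BLOCKING_FOR_TOOL_AGENTS
        if item.score is Score.NOT_APPLICABLE:
            # Assumption: blocking domains cannot be waived as N/A for tool-using agents.
            if blocking:
                gaps.append(f"{item.domain}: '{item.control}' cannot be N/A for a tool-using agent")
            continue
        if item.score is Score.NO:
            gaps.append(f"{item.domain}: '{item.control}' scored no")
        elif not item.evidence:
            gaps.append(f"{item.domain}: '{item.control}' is yes but has no evidence link")
    return gaps
```

An empty list is the only result that permits production traffic; anything else is a list of named gaps to hand to the domain owners.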

OpenAI's Agents SDK deployment guidance (May 2026) expects a /health endpoint, versioned prompts, runtime secret injection, and trace export via flush_traces() in long-running workers. Those requirements map to PSF-4, PSF-5, and PSF-7 below; they are necessary but not sufficient for full PSF coverage.
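As an illustration only, a health endpoint can be served from the Python standard library and report the prompt and model versions actually loaded, which ties the /health check back to version control. This is not the Agents SDK's own deployment tooling; the payload fields and version strings are assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical identifiers: in practice these come from version control and deploy config.
PROMPT_VERSION = "support-agent-prompt@v12"
MODEL_VERSION = "example-model-2026-05-01"


def health_payload():
    """Report liveness plus the loaded versions, so a probe can spot version drift."""
    return {"status": "ok", "prompt_version": PROMPT_VERSION, "model_version": MODEL_VERSION}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps(health_payload()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Blocks the calling thread until the server is shut down."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Because serve() blocks, a long-running worker would typically start it on a background thread next to the agent loop.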

PSF-1

Input Governance

Practitioner guide: Input Governance implementation guide · Certification: CAIS

Maximum input length enforced before any LLM call, with rejection logging
Rate limiting at user and token level, not only at the API gateway
PII detection on inbound user content before it enters the model context
Prompt injection test suite run against direct and indirect attack patterns
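The first two PSF-1 controls can be enforced in application code before any model call. A sketch, assuming an in-memory sliding window per user (a multi-instance deployment would need a shared store) and a crude characters-per-token estimate in place of the model's real tokenizer; MAX_INPUT_CHARS and the limits are hypothetical values.

```python
import logging
import time
from collections import defaultdict, deque

log = logging.getLogger("psf1.input")

MAX_INPUT_CHARS = 8_000  # hypothetical cap; size it to your context budget


class InputRejected(Exception):
    """Raised before any LLM call when input fails governance checks."""


class UserTokenLimiter:
    """Sliding-window token budget per user, enforced in the application layer."""

    def __init__(self, max_tokens, window_s=60.0, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.window_s = window_s
        self.clock = clock
        self._events = defaultdict(deque)  # user_id -> deque of (timestamp, tokens)

    def allow(self, user_id, tokens):
        now = self.clock()
        events = self._events[user_id]
        while events and now - events[0][0] >= self.window_s:
            events.popleft()  # drop usage that has aged out of the window
        if sum(t for _, t in events) + tokens > self.max_tokens:
            return False
        events.append((now, tokens))
        return True


def rough_token_count(text):
    # Crude heuristic (about 4 characters per token); swap in the model's real tokenizer.
    return len(text) // 4 + 1


def admit(user_id, text, limiter):
    """Run length and rate checks; only admitted text may reach the model."""
    if len(text) > MAX_INPUT_CHARS:
        log.warning("input rejected user=%s reason=too_long chars=%d", user_id, len(text))
        raise InputRejected("input too long")
    tokens = rough_token_count(text)
    if not limiter.allow(user_id, tokens):
        log.warning("input rejected user=%s reason=rate_limited tokens=%d", user_id, tokens)
        raise InputRejected("rate limited")
    return text
```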

PSF-2

Output Validation

Practitioner guide: Output Validation implementation guide · Certification: CAOP

Structured outputs validated against a schema before downstream writes
Business logic checks reject impossible values (dates, amounts, IDs)
Low-confidence outputs routed to human review, not auto-executed
Golden eval set scored on every prompt or model change before rollout
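A sketch of the first three PSF-2 controls for a hypothetical refund tool: type-check the model's decoded output, reject values that are well formed but impossible, and route low-confidence proposals to a person. The field names, the "ORD-" ID format, the 0.8 threshold, and the existence of a confidence score are all assumptions; the standard library stands in for a schema library such as Pydantic or JSON Schema.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class RefundProposal:
    order_id: str
    amount: Decimal
    refund_date: date
    confidence: float  # assumption: a calibrated score your pipeline produces


def parse_refund(raw):
    """Schema check on the model's JSON-decoded output, before any downstream write."""
    try:
        if not isinstance(raw["order_id"], str):
            raise TypeError("order_id must be a string")
        amount = Decimal(str(raw["amount"]))
        confidence = float(raw["confidence"])
        proposal = RefundProposal(
            order_id=raw["order_id"],
            amount=amount,
            refund_date=date.fromisoformat(raw["refund_date"]),
            confidence=confidence,
        )
    except (KeyError, TypeError, ValueError, InvalidOperation) as exc:
        raise ValueError(f"schema violation: {exc!r}") from exc
    if not amount.is_finite() or not 0.0 <= confidence <= 1.0:
        raise ValueError("schema violation: non-finite amount or confidence outside [0, 1]")
    return proposal


def business_errors(p, order_total, today):
    """Reject values that parse fine but are impossible for this business."""
    errors = []
    if not p.order_id.startswith("ORD-"):  # hypothetical ID format
        errors.append("unrecognised order id")
    if not Decimal("0") < p.amount <= order_total:
        errors.append("amount outside (0, order total]")
    if p.refund_date > today:
        errors.append("refund date in the future")
    return errors


def route(raw, order_total, today, min_confidence=0.8):
    """Return 'reject', 'human_review', or 'auto_execute'."""
    try:
        proposal = parse_refund(raw)
    except ValueError:
        return "reject"
    if business_errors(proposal, order_total, today):
        return "reject"
    if proposal.confidence < min_confidence:
        return "human_review"
    return "auto_execute"
```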

PSF-3

Data Protection

Practitioner guide: Data Protection implementation guide · Certification: CAIA

Data minimisation documented: each prompt field has a named lawful purpose
PII masked or tokenised before transmission to third-party model APIs
Data processor agreements in place with model and tool providers
Erasure path verified: deleted user data is purged from logs, caches, and vector stores
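Masking before transmission can be done with reversible tokenisation: PII is swapped for opaque tokens, the provider sees only the tokens, and the mapping stays inside your boundary. The sketch assumes two deliberately naive regex detectors for emails and phone numbers; production systems should use a dedicated PII detection service. The forget() method illustrates one part of an erasure path, not a complete one.

```python
import re
import secrets

# Deliberately naive patterns for illustration; use a dedicated PII detector in production.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


class PiiVault:
    """Reversible tokenisation: raw PII stays local, the model provider sees only tokens."""

    def __init__(self):
        self._token_to_value = {}

    def _replacer(self, kind):
        def replace(match):
            token = f"<{kind}_{secrets.token_hex(4)}>"
            self._token_to_value[token] = match.group(0)
            return token
        return replace

    def tokenise(self, text):
        text = EMAIL.sub(self._replacer("EMAIL"), text)
        return PHONE.sub(self._replacer("PHONE"), text)

    def detokenise(self, text):
        for token, value in self._token_to_value.items():
            text = text.replace(token, value)
        return text

    def forget(self):
        """Erasure hook: drop the mapping so stored tokens can no longer be resolved."""
        self._token_to_value.clear()
```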

PSF-4

Observability

Practitioner guide: Observability implementation guide · Certification: CAOP

End-to-end traces capture LLM calls, tool invocations, and handoffs
Latency tracked at P50, P95, and P99 with alerting on regression
Token cost per request monitored with budget thresholds
Conversation logs stored with PII redaction and searchable incident fields
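Percentile tracking and cost budgets reduce to small calculations once latencies and token counts are exported per request. A sketch using the standard library, assuming a 20% regression tolerance against a stored baseline and per-1k-token prices supplied by the caller (prices vary by provider and model, so none are hard-coded).

```python
import statistics


def latency_percentiles(samples_ms):
    """P50/P95/P99 from a window of request latencies (needs at least two samples)."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def latency_regressions(current, baseline, tolerance=0.20):
    """Percentiles that exceed the baseline by more than `tolerance` (20% by default)."""
    return [k for k in ("p50", "p95", "p99") if current[k] > baseline[k] * (1 + tolerance)]


def request_cost_usd(input_tokens, output_tokens, usd_per_1k_input, usd_per_1k_output):
    """Per-request cost from token counts and caller-supplied prices."""
    return input_tokens / 1000 * usd_per_1k_input + output_tokens / 1000 * usd_per_1k_output


def budget_alerts(costs_usd, per_request_limit, window_limit):
    """Flag individual expensive requests and total spend over the window budget."""
    alerts = [f"request {i} cost ${c:.4f}" for i, c in enumerate(costs_usd) if c > per_request_limit]
    total = sum(costs_usd)
    if total > window_limit:
        alerts.append(f"window spend ${total:.2f} exceeds ${window_limit:.2f}")
    return alerts
```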

PSF-5

Deployment Safety

Practitioner guide: Deployment Safety implementation guide · Certification: CLOE

Kill switch disables the agent in under 60 seconds without redeploying
Prompt and model versions stored in version control with changelogs
Canary rollout: new versions start at 1% to 5% of traffic with promotion gates
Rollback tested: previous version restorable in under five minutes
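Two PSF-5 controls sketched in Python, under stated assumptions: the kill switch reads a flag from a runtime store (read_flag is a placeholder for a feature-flag service or config database) and caches it for a TTL shorter than the 60-second target; canary routing hashes a user ID into a stable bucket so the 1% to 5% cohort does not change between requests. The salt and TTL are hypothetical values.

```python
import hashlib
import time


class KillSwitch:
    """Reads an 'agent enabled' flag from a runtime store, cached for ttl_s seconds.

    A flip takes effect within ttl_s without a redeploy; keep ttl_s well under 60 seconds.
    """

    def __init__(self, read_flag, ttl_s=15.0, clock=time.monotonic):
        self.read_flag = read_flag
        self.ttl_s = ttl_s
        self.clock = clock
        self._enabled = None
        self._fetched_at = 0.0

    def agent_enabled(self):
        now = self.clock()
        if self._enabled is None or now - self._fetched_at >= self.ttl_s:
            try:
                self._enabled = bool(self.read_flag())
            except Exception:
                self._enabled = False  # fail closed if the flag store is unreachable
            self._fetched_at = now
        return self._enabled


def canary_bucket(user_id, salt="rollout-2026-05"):
    """Stable 0-99 bucket, so a given user stays on the same version during a rollout."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def pick_version(user_id, stable, canary, canary_percent):
    return canary if canary_bucket(user_id) < canary_percent else stable
```

Failing closed when the flag store is unreachable is a design choice; some teams prefer to keep the last known value instead.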

PSF-6

Human Oversight

Practitioner guide: Human Oversight implementation guide · Certification: CAIG

Every agent action classified by reversibility and business consequence
Irreversible or high-stakes actions require explicit human approval
Escalation path defined for uncertainty, tool failure, and policy edge cases
Scheduled human review of agent outputs, not only complaint-driven review
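A sketch of an approval gate driven by an explicit action classification. The tool names and their classes are hypothetical; the design choice worth copying is that an unclassified action escalates instead of running, so a newly added tool cannot bypass human approval by default. run_tool and ask_human are placeholders for your tool executor and approval channel.

```python
from dataclasses import dataclass
from enum import Enum


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"


class Consequence(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class ActionClass:
    reversibility: Reversibility
    consequence: Consequence


# Hypothetical tool names; every tool the agent can call gets an explicit classification.
ACTION_CLASSES = {
    "search_knowledge_base": ActionClass(Reversibility.REVERSIBLE, Consequence.LOW),
    "draft_reply": ActionClass(Reversibility.REVERSIBLE, Consequence.LOW),
    "send_email": ActionClass(Reversibility.IRREVERSIBLE, Consequence.LOW),
    "issue_refund": ActionClass(Reversibility.IRREVERSIBLE, Consequence.HIGH),
}


def needs_approval(action):
    cls = ACTION_CLASSES.get(action)
    if cls is None:
        return True  # unclassified actions escalate rather than run
    return cls.reversibility is Reversibility.IRREVERSIBLE or cls.consequence is Consequence.HIGH


def execute(action, args, run_tool, ask_human):
    """run_tool(action, args) performs the call; ask_human(action, args) returns True to approve."""
    if needs_approval(action) and not ask_human(action, args):
        return {"status": "blocked", "action": action}
    return {"status": "done", "result": run_tool(action, args)}
```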

PSF-7

Security

Practitioner guide: Security implementation guide · Certification: CAIS

Tool permissions scoped to least privilege per task or tenant
Secrets injected at runtime; never embedded in prompts, manifests, or logs
Multi-tenant isolation verified: one user context cannot leak to another
Adversarial red-team results documented with remediation owners
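A sketch of three PSF-7 controls, assuming a per-task tool allowlist, tenant IDs on every resource, and secrets injected as environment variables by your secret manager. TASK_TOOL_SCOPES and the tool names are hypothetical.

```python
import logging
import os

# Hypothetical task -> tool allowlist: each task gets only the tools it needs.
TASK_TOOL_SCOPES = {
    "answer_billing_question": {"read_invoice"},
    "process_refund": {"read_invoice", "create_refund"},
}


def authorize_tool(task, tool, caller_tenant, resource_tenant):
    """Deny by default: the tool must be in the task's scope and the resource in the caller's tenant."""
    if tool not in TASK_TOOL_SCOPES.get(task, set()):
        raise PermissionError(f"tool {tool!r} not permitted for task {task!r}")
    if caller_tenant != resource_tenant:
        raise PermissionError("cross-tenant access denied")


def load_secret(name):
    """Read a secret injected into the environment at runtime, never from prompts or manifests."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"secret {name} was not injected")
    return value


class RedactSecrets(logging.Filter):
    """Scrub known secret values from log messages before a handler writes them."""

    def __init__(self, secret_values):
        super().__init__()
        self._secrets = [s for s in secret_values if s]

    def filter(self, record):
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "[REDACTED]")
        record.msg, record.args = message, ()
        return True
```

Attach RedactSecrets to handlers rather than to a single logger, because a logger's filters do not apply to records propagated from its child loggers.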

PSF-8

Vendor Resilience

Practitioner guide: Vendor Resilience implementation guide · Certification: CVAE

Model provider outages trigger graceful degradation, not hard failure
Retry logic uses exponential backoff with circuit-breaker limits
Model version pinned; upgrades are deliberate, not automatic
Fallback provider or reduced-capability mode documented and tested
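Retry, circuit breaking, and fallback compose into one call path. A sketch, assuming primary and fallback are zero-argument callables (for example, a call to the pinned primary model and a call to a second provider or a reduced-capability response) and that only timeouts and connection errors are worth retrying; thresholds and delays are illustrative.

```python
import random
import time


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; allows a trial call after `reset_after_s`."""

    def __init__(self, failure_threshold=5, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_after_s  # half-open trial

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_fallback(primary, fallback, breaker, max_attempts=3, base_delay_s=0.5,
                       retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Retry transient errors with jittered exponential backoff, then degrade to `fallback`."""
    for attempt in range(max_attempts):
        if not breaker.allow():
            break  # circuit open: skip the primary provider entirely
        try:
            result = primary()
        except retry_on:
            breaker.record_failure()
            if attempt < max_attempts - 1 and breaker.allow():
                # Delays grow as base, 2*base, 4*base..., each scaled by a random factor in [0.5, 1.0].
                sleep(base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0))
            continue
        breaker.record_success()
        return result
    return fallback()
```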

Ship, harden, or rebuild

Decision	When	Practitioner action
Ship	All applicable items are yes, and the blocking domains (PSF-6 and PSF-7 for tool-using agents) are signed off	Open a canary at 1% to 5% of traffic; monitor P95 latency, error rate, and eval score for 72 hours
Harden	The architecture fits the use case, but four or more items in a single domain are scored no	Close the gaps using the domain guides; re-run the eval suite before expanding traffic
Rebuild	No kill switch, no human gate on irreversible actions, or no observability baseline	Stop production traffic; redesign the control plane before attempting deployment again
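The table can be expressed as a small decision function. A sketch under one labeled assumption: the Harden row names four or more "no" items in a single domain, which leaves one to three gaps unassigned, so this sketch treats any gap that is not a rebuild trigger as Harden. The parameter names are hypothetical.

```python
def ship_decision(no_counts, has_kill_switch, gates_irreversible_actions,
                  has_observability_baseline, blocking_domains_signed_off):
    """Map checklist results to 'ship', 'harden', or 'rebuild'.

    no_counts: applicable items scored no per domain, e.g. {"PSF-1": 0, "PSF-2": 1}.
    """
    # Rebuild triggers come first: these gaps mean the control plane itself is missing.
    if not (has_kill_switch and gates_irreversible_actions and has_observability_baseline):
        return "rebuild"
    if sum(no_counts.values()) == 0 and blocking_domains_signed_off:
        return "ship"
    # Assumption: every remaining gap is hardening work, not only 4+ "no" items in one domain.
    return "harden"
```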

If the deployment requires regulated evidence, map the completed controls to PSF compliance requirements, keep source-backed records, and follow the relevant review path. Delivery teams deploying agents for clients should tie sign-off to the quality of evidence, not to confidence.

Sources and further reading

NIST Center for AI Standards and Innovation, Challenges to the Monitoring of Deployed AI Systems (NIST.AI.800-4, 2025)
NIST, AI RMF Generative AI Profile (NIST.AI.600-1, 2024): go/no-go deployment thresholds and ongoing capability review
OpenAI, Agents SDK Deployment Manager and tracing documentation (openai-agents-python, 2026)
OpenAI API, Sandbox Agents guide: runtime secret injection and scoped workspace mounts (2026)
Production AI Institute, Seven Failure Modes of Production AI

Public record

This record is maintained by PAI and free to cite. If something is wrong or missing, tell us. Corrections and source suggestions keep the record honest.

