Production AI Institute · PSF v1.1 open standard
Insights

Reference library for production AI

Long-form, versioned reference documents on deploying AI safely. Not blog posts. Citable, maintained, authoritative.

51 articles published · 8 PSF domains covered · Updated April 2026

Open Source · 8 min read · New

Why We Open-Sourced WorkflowOS

The PSF is open text. WorkflowOS is the working artifact — a free, MIT-licensed PSF workflow designer you can use hosted at /studio or self-host from GitHub for client engagements.

2026-06-02

Production Readiness · 15 min

Your AI Agent Isn't Production Ready. Here's What You're Missing.

You built your first AI agent. It works in testing. Before you ship it to real users or real business processes, read this. It covers the eight PSF domains that every production AI agent needs and that tutorials tend to skip, with real failure examples and a concrete checklist for each.

2026-05-10

Governance · 14 min

AI Governance Frameworks Compared: PSF vs ISO 42001 vs NIST AI RMF

Three frameworks dominate AI governance in 2026. ISO 42001 certifies your organisation has a documented governance process. NIST AI RMF provides a risk vocabulary. The PSF specifies the technical controls a production system must have. Most mature organisations use all three — this guide explains how they fit together.

2026-05-09

Team Leadership · 12 min

How to Certify Your AI Team: A Practical Guide for Engineering and Product Leaders

Which AI credentials matter, how to sequence them, and how to build a team-wide certification programme without disrupting delivery. Includes role-to-credential mapping for engineers, leads, auditors, and compliance roles.

2026-05-09

Exam Prep · 10 min

AIDA Certification Study Guide: How to Pass the AI Deployment Associate Exam

Complete preparation guide for the free AIDA exam. Covers all 8 topic domains by frequency, explains how scenario-based questions are structured, and gives a 2-hour preparation strategy that works for practitioners at any level.

2026-05-09

Implementation Guide · 18 min

Cursor SDK in Production: Three PSF-Compliant Deployment Patterns

Cursor SDK launched today. This guide shows three concrete production deployment patterns — CI/CD agent, event-driven ambient agent, embedded product agent — with full PSF D1-D8 controls applied to each. Published on launch day.

2026-04-30

Strategy · New · 10 min

The Third Chip Flip: Why Andrej Karpathy Says the CPU Era Is Over

The CPU dominated computing for 40 years. The GPU displaced it for AI. Karpathy says it is happening again — and whoever owns the model wins the third era. What the pattern means for your career and your organisation.

2026-05-03

Careers · New · 12 min

The Agent Operator: The Hottest Job Nobody Is Hiring For Yet

Enterprises are about to redesign every knowledge workflow for autonomous agents. The people who will run those workflows don't need a CS degree. They need MCPs, CLIs, agents.md fluency, and business acumen. No curriculum covers this yet.

2026-05-03

Guide · New · 20 min

21 Agentic Design Patterns: A Complete Guide for Business Professionals

Every production AI system is built from a small set of reusable patterns. This guide explains all 21 — from prompt chaining to swarm intelligence — in plain language, without code, mapped to PSF domains.

2026-05-03

Foundations · New · 8 min

What microgpt Reveals About LLMs: 200 Lines That Explain Everything

Karpathy built a complete GPT with no libraries, no dependencies, in 200 lines of Python. Here is what those lines actually tell you about how language models work — for professionals who will never write a single line of AI code.

2026-05-03

Ecosystem Assessment · 15 min

Cursor SDK in Production: A PSF Domain Assessment

The Cursor SDK launched yesterday. Developers are already embedding agents in Gmail, GitHub, and Slack via MCP. This assessment documents the PSF profile of the SDK that's making it happen — and what enterprises need before they deploy it.

2026-04-30

Reference · 14 min

The Ambient Agent: Production Safety Requirements for AI Agents Embedded in Enterprise Tools

A new deployment pattern is spreading faster than safety frameworks can track. Agents embedded in Gmail, IDEs, browsers, and Slack inherit the capabilities of their host tool — and require fundamentally different safety thinking than standalone API deployments.

2026-04-30

Ecosystem Assessment · 18 min

Choosing an Agent Framework for Production: LangChain vs CrewAI vs AutoGen vs Semantic Kernel

A PSF domain matrix comparing the four major agent frameworks side-by-side. Decision guide by primary constraint, universal gap analysis, and individual framework profiles for production deployment.

2026-04-30

Ecosystem Assessment · 13 min

CrewAI in Production: A PSF Domain Assessment

Multi-agent architectures amplify every PSF gap. This assessment documents CrewAI's profile across all eight domains and the critical practitioner actions required before production deployment.

2026-04-30

Ecosystem Assessment · 13 min

AutoGen (AG2) in Production: A PSF Domain Assessment

AutoGen has the best human oversight model of any framework assessed — and the weakest production deployment tooling. An honest profile of both sides for enterprise practitioners.

2026-04-30

Ecosystem Assessment · 14 min

n8n in Production: A PSF Domain Assessment

n8n powers thousands of AI-assisted workflows at MSPs and IT teams worldwide. This assessment maps n8n against all eight PSF domains — where the platform excels, the execution logging risk that creates silent compliance violations, and what practitioners must add.

2026-05-01

Ecosystem Assessment · 14 min

OpenAI Agents SDK in Production: A PSF Domain Assessment

OpenAI's first-party agent framework has the strongest native human oversight support we've assessed — and the weakest vendor resilience. A full PSF domain profile for practitioners deciding whether to commit.

2026-05-01

Ecosystem Assessment · 15 min

Amazon Bedrock Agents in Production: A PSF Domain Assessment

Bedrock Agents inherits the full AWS compliance stack — making it the strongest platform for regulated-industry deployments. Three domains are Strong, including Data Protection. Critical Guardrails configuration gap documented.

2026-05-01

Ecosystem Assessment · 14 min

LlamaIndex in Production: A PSF Domain Assessment

LlamaIndex's modular architecture is the most vendor-resilient in the ecosystem. The silent risk is data access control: no native retrieval access model means most enterprise RAG deployments have an unaddressed data protection gap.

2026-05-01

PSF Deep Dive · 16 min

PSF Domain 1: Input Governance — Complete Implementation Guide

The complete practitioner guide to D1 — prompt injection defence, input classification, schema validation, and the companion tooling that closes the gap every framework leaves open. Framework-specific implementation notes for LangChain, CrewAI, AutoGen, Semantic Kernel, and Haystack.

2026-04-30

PSF Deep Dive · 17 min

PSF Domain 3: Data Protection — Why No Framework Covers It

D3 is a Gap for every major framework — by design, not by accident. This guide explains why, maps the complete threat surface including the vector store deletion problem, covers regulatory requirements by jurisdiction, and documents the implementation path.

2026-04-30

PSF Deep Dive · 15 min

PSF Domain 6: Human Oversight — HITL Patterns for Production AI

The compliance theatre problem, the five-level autonomy framework, when oversight is required, and how to design review interfaces that are effective rather than just present. Covers skill maintenance and automation complacency — the long-term risk most teams ignore.

2026-04-30

PSF Deep Dive · 14 min

PSF Domain 8: Vendor Resilience — Lock-in Taxonomy, Model Deprecation, Exit Strategy

Five lock-in risk types (model API, framework, vector DB, managed service, data). The model deprecation response playbook. Multi-vendor architecture patterns. SLA benchmarking table. Exit strategy documentation template. Completes the full PSF D1-D8 deep dive series.

2026-04-30

PSF Deep Dive · 13 min

PSF Domain 5: Deployment Safety — Model Versioning, Canary Releases, Rollback

Every framework gets a Gap or Partial on D5. Model version pinning, canary traffic splitting, automatic rollback triggers, prompt version control, and the five anti-patterns that appear in every production AI post-mortem. Includes the Predetermined Change Control Plan template.

2026-04-30

PSF Deep Dive · 15 min

PSF Domain 7: Security — AI Threat Modelling for Production

AI has a fundamentally different threat model than conventional software. Direct and indirect prompt injection, model supply chain attacks, RAG corpus poisoning, adversarial examples — mapped with controls and a minimum pen test suite for production AI systems.

2026-04-30

PSF Deep Dive · 14 min

PSF Domain 2: Output Validation — The Three-Layer Contract

Schema validation is table stakes. The real failure modes are semantic drift and confidence blindness — when the model produces valid JSON that answers the wrong question. The complete D2 implementation guide with Pydantic patterns and framework-specific notes.

2026-04-30

PSF Deep Dive · 13 min

PSF Domain 4: Observability — What You Must Log and Why

Most teams discover they have an observability gap after an incident. This guide covers the minimum logging schema, the four essential alerts, drift detection approaches, and a direct comparison of Langfuse, LangSmith, Arize Phoenix, and Helicone against D4 requirements.

2026-04-30

Reference · 14 min

The Multi-Agent Amplification Problem

Multi-agent architectures don't add PSF gaps — they multiply them. Five amplification mechanisms: blast radius, agent-to-agent trust escalation, shared context contamination, oversight gap multiplication, and observability collapse. Framework safety posture for multi-agent deployments.

2026-04-30

Reference · 12 min

PSF-Compliant Stack Recipes

Pre-validated PSF-compliant stack combinations for LangGraph, CrewAI, AutoGen, Semantic Kernel, and Haystack — mapped to all eight domains. The specific companion tools that close each framework's gaps, with implementation notes for every domain.

2026-04-30

Ecosystem Assessment · 13 min

Pinecone vs Weaviate vs Chroma — Vector Database Safety Assessment

A PSF D3/D4 assessment of the three major vector databases. Data residency, access control, audit logging, multi-tenancy, and the universal PII-in-vectors checklist every RAG deployment needs. Decision guide by six deployment scenarios.

2026-04-30

Ecosystem Assessment · 12 min

Haystack (deepset) — PSF Assessment

The RAG-native framework with the strongest deployment story of any Python framework. D5 is Haystack's standout — Hayhooks turns a pipeline into a production REST API out of the box. Critical D3 gap matters more here because RAG systems handle documents containing PII.

2026-04-30

Ecosystem Assessment · 11 min

DSPy (Stanford NLP) — PSF Assessment

The optimisation-first framework with the strongest structured output guarantees — and the widest gap between research elegance and production safety. TypedPredictor provides the best D2 posture of any framework assessed. D1, D3, and D7 are all gaps requiring full companion tooling.

2026-04-30

Ecosystem Assessment · 11 min

Pydantic AI — PSF Assessment

Type-safe structured outputs as a first principle. Core D2 strength from Pydantic validation. Three gaps — D3, D5, D6 — are library-not-platform gaps: Pydantic AI is deliberately a library, not a deployment platform. The right choice for structured extraction at the cost of owning your own infrastructure.

2026-04-30

Ecosystem Assessment · 12 min

Flowise & LangFlow — PSF Assessment

Low-code visual builders that accelerate prototyping — and introduce critical security gaps when deployed to production. Known CVEs in unauthenticated self-hosted instances, no native D3 controls, and a D8 gap from upstream LangChain lock-in. Right tool for PoC; not production-safe out of the box.

2026-04-30

Ecosystem Assessment · 14 min

Guardrails AI vs NeMo Guardrails vs Azure Content Safety

Three tools that close PSF D1, D2, and D3 gaps — but from completely different architectural positions. Guardrails AI wins on custom validators and framework flexibility; NeMo wins on conversation policy; Azure Content Safety wins on enterprise compliance and managed infrastructure.

2026-04-30

Industry Playbook · 19 min

Energy & Critical Infrastructure AI Deployment Playbook

Power grids, pipelines, and water systems face nation-state threats and physical safety consequences. NERC CIP, IEC 62443, NIS2, and TSA Pipeline Directives impose change management and supply chain requirements that directly govern AI. OT/IT convergence, cascade failure risk, and all 8 PSF domains mapped as Critical or High.

2026-04-30

Industry Playbook · 18 min

HR & Employment AI Deployment Playbook

Employment AI is explicitly listed high-risk in the EU AI Act. NYC Local Law 144 mandates annual independent bias audits. The EEOC has confirmed algorithmic hiring tools can violate Title VII — and employers can't transfer liability to vendors. The bias mechanisms and compliance architecture, mapped.

2026-04-30

Industry Playbook · 17 min

Retail & E-Commerce AI Deployment Playbook

EU DSA transparency mandates, FTC dark pattern enforcement, and AI Act manipulation prohibitions are active — not theoretical. Recommendation engines, dynamic pricing, fraud detection, and customer service AI mapped against PSF domains with the false positive and dark pattern failure modes most common in production.

2026-04-30

Industry Playbook · 18 min

Legal & Government AI Deployment Playbook

The EU AI Act lists eight categories of government and justice AI as high-risk. GDPR Article 22, CJIS, FedRAMP, and OMB M-24-10 apply simultaneously. This playbook maps every regulatory obligation to a PSF domain — and covers the algorithmic bias and hallucination failure modes that have caused documented harm.

2026-04-30

Industry Playbook · 18 min

Healthcare AI Deployment Playbook

HIPAA, FDA AI/ML guidance, clinical decision support safety, and full PSF domain mapping for healthcare deployments. The highest-stakes AI context.

2026-04-30

Industry Playbook · 18 min

Financial Services AI Deployment Playbook

Seven regulatory frameworks mapped to PSF domains: SR 11-7, MiFID II, FCA SYSC, DORA, GDPR, EU AI Act, Basel III. Required actions per domain, recommended stack by layer, and the three deployment patterns that satisfy the most regulatory surface area with the least duplication of effort.

2026-04-30

Ecosystem Assessment · 16 min

LangSmith vs Langfuse vs Arize Phoenix — Observability for Production AI

All three satisfy PSF D4. The choice comes down to data residency, self-hosting, and framework affinity. A direct comparison of every capability that matters in production — with a decision guide for each scenario.

2026-04-30

Ecosystem Assessment · 14 min

Microsoft Semantic Kernel — PSF Assessment

Enterprise-grade orchestration with native Azure identity and observability. Strong on D4 and D7 — with meaningful gaps in D3 and multi-cloud resilience. The definitive framework for Azure-committed teams.

2026-04-30

Ecosystem Assessment · 14 min

LangChain and LangGraph in Production: A PSF Domain Assessment

An independent assessment of LangChain and LangGraph against all eight PSF domains — where the ecosystem is strong, where it leaves gaps, and what companion tooling practitioners must add for compliance.

2026-04-30

Ecosystem Assessment · 12 min

Composio in Production: A PSF Domain Assessment

Composio solves managed OAuth for AI agents across 250+ services. This assessment maps its strengths (D3, D7) and the gaps practitioners must close — particularly human oversight (D6) and deployment safety (D5).

2026-04-30

Reference · 8 min

What Is a Production AI System?

A precise technical definition of what separates a production AI system from a prototype, a demo, or a proof of concept — and why the distinction matters for safety, governance, and certification.

2026-04-29

Reference · 11 min

The Seven Failure Modes of Production AI Deployments

An analysis of the most common patterns through which AI systems fail in production environments. Each mode is documented with causes, signals, and architectural mitigations.

2026-04-29

Guide · 13 min

Human-in-the-Loop: When, Why, and How to Design Oversight Correctly

Not every AI decision requires a human checkpoint. Designing oversight correctly means knowing when to require it, how to present decisions for effective human judgment, and how to avoid the trap of meaningless compliance.

2026-04-29

Guide · 9 min

How to Write an AI Behaviour Contract

A behaviour contract specifies what an AI system is permitted to do, in what contexts, with what autonomy level, and under what escalation conditions. This is the foundation of the PAI contract system.

2026-04-29

Practitioner Guide · 10 min

Optimising Your Content for AI Discovery

AI systems now surface content in response to queries that used to go to search engines. The rules are different. This guide covers llms.txt, structured data, answer-first writing, and how to measure AI citation.

2026-04-29

Regulatory · 12 min

What the EU AI Act Means for Your Production AI System

The EU AI Act is in force. If your AI system makes consequential decisions about people, classifies or monitors them, or operates in critical infrastructure, you need to understand what it requires.

2026-04-29

PSF Domains

Domain reference guides

Deep reference for all eight Production Safety Framework domains. Aligned with AIDA, AIMA, CPAP, and CPAA examination content. Licensed CC BY 4.0.

D1: Input Governance

Prompt injection defence, input validation, intent classification.

D2: Output Validation

Schema enforcement, hallucination detection, confidence gating.

D3: Data Protection

PII handling, consent chains, data residency, vector store risk.

D4: Observability

Inference logging, quality scoring, drift detection and alerting.

D5: Deployment Safety

Canary releases, rollback procedures, circuit breakers.

D6: Human Oversight

Autonomy levels, escalation design, blind review sampling.

D7: Security

Threat modelling, adversarial robustness, red-teaming architecture.

D8: Vendor Resilience

Abstraction layers, version pinning, fallback provider design.

Coming next

In development

Reference documents currently being researched and written.

LLMOps: Operating Large Language Models at Scale
Red-Teaming Your AI System Before Deployment
The Production AI Deployment Checklist — interactive, mapped to PSF
Glossary of Production AI Terms — definitive, citable
State of Production AI 2026 — annual flagship report

Complete library

All 98 articles

The full PAI research library, grouped by type. Independent PSF assessments, incident analyses, playbooks and guides.

PSF Assessments · 33
Claude Opus 4.8 PSF Assessment
Cursor 3.5 Automations in Production: A PSF Domain Assessment
Cursor 3.6 Auto-Review in Production: A PSF Domain Assessment
Cursor 3.7 Browser Design Mode PSF Assessment
Cursor 3.7 Canvas Design Mode PSF Assessment
Cursor Enterprise Organizations PSF Assessment
Cursor SDK 3.7 PSF Assessment
Google Agent Executor (AX) in Production: A PSF Domain Assessment
OpenAI Codex CLI 0.134.0 PSF Assessment
OpenAI Codex Sites & Role Plugins PSF Assessment
OpenAI on Amazon Bedrock PSF Assessment
Amazon Bedrock Agents in Production: A PSF Domain Assessment
AutoGen (AG2) in Production: A PSF Domain Assessment
Claude Sonnet 4.6 PSF Assessment
Composio in Production: A PSF Domain Assessment
CrewAI in Production: A PSF Domain Assessment
Cursor SDK in Production: A PSF Domain Assessment
Dify in Production: A PSF Domain Assessment
DSPy — PSF Assessment
Flowise & LangFlow — PSF Assessment
Gemini 1.5 Pro PSF Assessment
GPT-4.1 PSF Assessment
Haystack (deepset) — PSF Assessment
LangChain and LangGraph in Production: A PSF Domain Assessment
LangGraph 1.2.1 in Production: A PSF Domain Assessment
Llama 3.1 70B PSF Assessment (Self-Hosted)
LlamaIndex in Production: A PSF Domain Assessment
Microsoft Semantic Kernel — PSF Assessment
n8n in Production: A PSF Domain Assessment
OpenAI Agents SDK in Production: A PSF Domain Assessment
Pydantic AI — PSF Assessment
Temporal Python SDK 1.27.2 in Production: A PSF Domain Assessment
Trinity (Ability.AI) in Production: A PSF Domain Assessment

Incident Analyses · 3
Anthropic Claude Multi-Model API Errors (June 5, 2026): Production Impact
Binnall Law Claude Console Phantom Citations Incident (May 2026)
OpenAI May 2026 Multi-Service Outage Incident (ChatGPT, API, Login)

Comparisons · 4
Choosing an Agent Framework for Production: LangChain vs CrewAI vs AutoGen vs Semantic Kernel
Guardrails AI vs NeMo Guardrails vs Azure Content Safety — PSF Comparison
LangSmith vs Langfuse vs Arize Phoenix — Observability for Production AI
Pinecone vs Weaviate vs Chroma — Vector Database Safety Assessment

Industry Playbooks · 6
Energy & Critical Infrastructure AI Deployment Playbook
Healthcare AI Deployment Playbook — HIPAA, FDA, Clinical Safety
HR & Employment AI Deployment Playbook — CV Screening Bias, NYC Local Law 144, EU AI Act
Legal & Government AI Deployment Playbook — EU AI Act, FedRAMP, Algorithmic Accountability
Production AI in Financial Services — PSF Playbook
Retail & E-Commerce AI Deployment Playbook — Personalisation, Fraud, Customer Service AI

Guides & Checklists · 11
21 Agentic Design Patterns: A Complete Guide for Business Professionals
AI Agent Production Ready Checklist (PSF-Aligned)
How to Write an AI Behaviour Contract
PSF Domain 1: Input Governance — Complete Implementation Guide
PSF Domain 2: Output Validation — Implementation Deep Dive
PSF Domain 3: Data Protection — Why No Framework Covers It
PSF Domain 4: Observability — Implementation Deep Dive
PSF Domain 5: Deployment Safety — Model Versioning, Canary Releases, Rollback
PSF Domain 6: Human Oversight — HITL Patterns for Production AI
PSF Domain 7: Security — AI Threat Modelling, Prompt Injection, Model Supply Chain
PSF Domain 8: Vendor Resilience — AI Vendor Lock-in, Model Deprecation, Multi-Vendor Strategy

Certification · 6
Free AI Certification That's Actually Free (No Card Required)
AI Certification Compared: Production AI, Cloud Vendor, and GRC Tracks
AIDA Certification Study Guide: How to Pass the AI Deployment Associate Exam
How to Certify Your AI Team: A Practical Guide for Engineering and Product Leaders
MSP AI Certification Guide: What Your Team Needs and Why
What Is a Certified AI Integrator?

Articles & Research · 35
A $0.01 Bank Transfer Almost Broke a Banking AI Agent
When AI Hides Its Rules: Claude's Secret Guardrails
AI Data Use Index: May 2026 Week 5 (Gemini Spark & OpenAI US Privacy)
PAI Lab Report: Public GitHub Agent Readiness, May 2026
What Gemini Enterprise Agentic RAG Changes for Production AI Teams
What Nemotron 3.5 Content Safety Changes for Production AI Teams
What OpenAI Inline Moderation Changes for Production AI Teams
AI Governance Frameworks Compared: PSF vs ISO 42001 vs NIST AI RMF
AI Proof Your Career
Anthropic Suspends Claude Fable 5 — What It Means for AI Evaluation
Cursor SDK in Production: Three PSF-Compliant Deployment Patterns
Human-in-the-Loop Design Guide
Optimising Your Content for AI Discovery
PSF Compliance: What It Is and How to Achieve It
PSF-1: Input Governance — PSF Domain Guide
PSF-2: Output Validation — PSF Domain Guide
PSF-3: Data Protection — PSF Domain Guide
PSF-4: Observability & Monitoring — PSF Domain Guide
PSF-5: Deployment Safety — PSF Domain Guide
PSF-6: Human Oversight — PSF Domain Guide
PSF-7: Security — PSF Domain Guide
PSF-8: Vendor Resilience — PSF Domain Guide
PSF-Compliant Stack Recipes
The Agent Operator: The Hottest Job Nobody Is Hiring For Yet
The Ambient Agent: Production Safety Requirements for AI Agents Embedded in Enterprise Tools
The Machine God — The Physics of Morality and AI Alignment
The Multi-Agent Amplification Problem
The Seven Failure Modes of Production AI Deployments
The Third Chip Flip: Why Andrej Karpathy Says the CPU Era Is Over
What Is a Production AI System?
What Is Production AI? Definition, Maturity, and the PSF Standard
What microgpt Reveals About LLMs: Understanding Language Models From 200 Lines of Code
What the EU AI Act Means for Your Production AI System
Why We Open-Sourced WorkflowOS
Your AI Agent Isn't Production Ready. Here's What You're Missing.