Production AI Institute — vendor-neutral certification for AI practitioners

The Production AI Ecosystem

LangChain, Composio, LangSmith, Guardrails AI — these tools are how agents get built. The PSF is how you know whether the result is safe to run in production. This page maps the ecosystem and shows where the standard applies.

Why independent assessment matters

Every tool in the production AI stack has a vendor who would prefer you thought of it as comprehensive. LangChain's documentation doesn't emphasise that it has no native PII protection. Composio's homepage doesn't lead with the fact that it provides no human-in-the-loop primitives. This is not deception — these are tool vendors describing what their tools do, not safety assessors evaluating what they miss.

The problem is that practitioners assembling a production AI stack need to know both: what each tool does well, and what gaps remain their responsibility to close. Without that complete picture, teams make confident deployments on incomplete foundations — and discover the gaps at incident time rather than design time.

PAI's role is to provide the complete picture. The Production Safety Framework defines what a safe production deployment requires across eight domains. Ecosystem assessments apply the PSF to specific tools — not to diminish them, but to give practitioners an honest map of what each tool satisfies and what each tool leaves open.

The production AI stack

A production agent deployment involves multiple layers. Each layer is necessary; none is sufficient on its own. The PSF applies to the system as a whole — not to any individual layer.

Models

OpenAI GPT-4, Anthropic Claude, Google Gemini, Meta Llama, Mistral

Foundation models. PAI assessments are model-agnostic — the PSF applies regardless of which model underpins a deployment.

Agent Frameworks

LangChain / LangGraph, CrewAI, AutoGen / AG2, Semantic Kernel, Smolagents, PydanticAI

Orchestration and execution. These frameworks define how agents reason, plan, and call tools. PSF compliance depends heavily on how these are configured.

Read: LangChain PSF Assessment →

Tool Integration

Composio, Toolhouse, ACI.dev, Custom connectors

Managed access to external services — email, calendar, CRMs, code repositories. Determines how agents take actions in the real world.

Read: Composio PSF Assessment →

Observability

LangSmith, Langfuse, Arize Phoenix, Helicone, Traceloop

Trace-level visibility into agent reasoning and execution. Satisfies PSF Domain 4. Critical for production incident investigation.

Safety & Guardrails

Guardrails AI, NeMo Guardrails, LlamaGuard, Presidio

Input classification, output validation, PII detection, and prompt injection resistance. These tools close the gaps in PSF Domains 1, 2, and 3 that most frameworks leave open.

Standards & Governance

PSF (PAI), NIST AI RMF, ISO/IEC 42001, EU AI Act, OWASP LLM Top 10

The frameworks that define what 'safe' means. PAI's PSF is the practitioner-focused standard for production agentic AI deployment.

Published PSF assessments

Each assessment evaluates a tool or framework against all eight PSF domains. Assessments are independent, versioned, and updated as products evolve.
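Each assessment on this page carries a short rating summary such as "D4: Strong". As a minimal sketch, assuming you want those summaries in a machine-readable form for your own tracking, the helper below parses one into a dictionary. The function name `parse_ratings` is hypothetical and is not part of any PAI tooling; the three rating labels are the ones used on this page.

```python
import re
from typing import Dict

# Hypothetical helper for illustration only; not a PAI-published API.
# Rating labels (Strong, Partial, Gap) are the ones shown on this page.
RATING_PATTERN = re.compile(r"(D\d+[a-z]?):\s*(Strong|Partial|Gap)")


def parse_ratings(summary: str) -> Dict[str, str]:
    """Turn a rating summary such as 'D4: Strong, D3: Gap' into a dict.

    Separators between entries are optional, so a run-together summary
    copied from a web page parses the same way as a comma-separated one.
    """
    return {domain: rating for domain, rating in RATING_PATTERN.findall(summary)}


langchain = parse_ratings("D4: Strong, D8: Strong, D3: Gap, D7: Partial")
```

A summary lists only the headline domains, so a domain missing from the parsed result is unrated here rather than passing; the full assessment remains the source for the other domains.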

Agent Framework

LangChain & LangGraph

Strong on observability (LangSmith) and vendor resilience. LangGraph adds strong human oversight. Gap on data protection and security without companion tooling.

D4: Strong, D8: Strong, D3: Gap, D7: Partial

Read assessment →

Agent Framework

CrewAI

Intuitive role-based multi-agent orchestration. Most extensive PSF gaps of any framework — multi-agent architecture amplifies every safety gap. Requires the most companion tooling.

D1: Gap, D5: Gap, D6: Partial, D8: Partial

Read assessment →

Agent Framework

AutoGen / AG2

Standout human oversight model (UserProxyAgent). Docker code execution for sandboxed security. Weakest production deployment tooling — research origins are evident.

D6: Strong, D5: Partial, D7: Partial, D4: Partial

Read assessment →

Agent Framework

Semantic Kernel

Microsoft's enterprise SDK for .NET and Python. Native Entra ID and Azure Key Vault give D7 a Strong rating. Best-in-class OpenTelemetry integration. The default choice for Azure-committed teams.

D4: Strong, D7: Strong, D1: Partial, D8: Partial

Read assessment →

Agent Runtime

Cursor SDK

Released April 2026. Programmatic access to Cursor's agent runtime with MCP integration. Strong observability, gap on security and data protection — particularly for filesystem and email access.

D4: Strong, D5: Partial, D3: Gap, D7: Gap

Read assessment →

Observability

LangSmith vs Langfuse vs Arize

All three satisfy PSF D4 core requirements. LangSmith wins on LangChain depth; Langfuse wins on data residency and self-hosting; Arize wins on production alerting and MLOps integration.

D4: Strong, D4a: Strong, D4b: Strong, D4c: Partial

Read assessment →

Tool Integration

Composio

Strong on security (managed OAuth) and data protection. Gap on human oversight — must be implemented above Composio.

D3: Strong, D7: Strong, D6: Gap, D8: Partial

Read assessment →

Agent Framework

Haystack (deepset)

RAG-native framework with the strongest production deployment story of any Python framework. Hayhooks REST serving is built-in. D4/D5/D8 are all Strong; D3 gap matters more for RAG workloads because retrieved documents carry PII.

D4: Strong, D5: Strong, D8: Strong, D3: Gap

Read assessment →

Agent Framework

DSPy

Optimisation-first framework from Stanford NLP. TypedPredictor delivers the strongest structured output enforcement of any framework assessed (D2). Three gaps: D1, D3, D7. Research-to-production gap is real — deploy only with full companion safety layer.

D2: Strong, D8: Strong, D1: Gap, D7: Gap

Read assessment →

Agent Framework

Pydantic AI

Pydantic validation applied to LLM agents. Strong D2 from type-enforced outputs. Deliberately a library, not a platform — D5 and D6 are application responsibilities. Best for structured extraction pipelines; infrastructure ownership required.

D2: Strong, D8: Strong, D5: Gap, D6: Gap

Read assessment →

Agent Framework

Flowise / LangFlow

Visual low-code builders that accelerate prototyping and carry production security debt. Known CVEs in unauthenticated instances. D7 and D3 are gaps. Excellent for PoC; requires hardening before enterprise deployment.

D8: Strong, D3: Gap, D7: Gap, D5: Partial

Read assessment →

Safety Tooling

Guardrails AI vs NeMo vs Azure CS

Three tools that close D1/D2/D3 gaps from different architectural positions. Guardrails AI for custom validators; NeMo for conversation policy; Azure Content Safety for enterprise managed compliance.

D1: Strong, D2: Strong, D3: Strong, D7: Partial

Read assessment →

Vector Database

Pinecone vs Weaviate vs Chroma

PSF D3/D4 assessment of the three major vector databases. Weaviate wins on access control and audit logging. Pinecone wins on managed compliance. Chroma requires full application-layer D3 implementation.

D3: Partial, D4: Partial

Read assessment →

Coming soon

More Assessments Coming

Further tool and framework assessments are in preparation.

What the Production AI Institute is not

Independence requires clarity about scope.

Not a framework vendor

PAI does not build or sell agent frameworks, tool integration libraries, or AI models. The PSF is designed to be implemented on top of any framework — LangChain, CrewAI, a custom Python stack, or anything else.

Not a consultancy

PAI does not offer implementation services. The standard and its assessments are published openly for practitioners to apply directly. If you want someone to implement it for you, that is a separate commercial relationship with a certified integrator.

Not affiliated with any vendor

PAI has no equity stakes, advertising relationships, or commercial agreements with any of the tools or frameworks it assesses. Independence is the only basis on which an assessment authority is credible.

Not a replacement for the tools it assesses

The PSF does not compete with LangChain, Composio, or any other tooling. It provides the yardstick against which they are evaluated — and most of them are genuinely useful. PSF compliance is about using the right tools correctly, not avoiding them.

How to use this for your deployment

If you are assembling a production AI stack, start by mapping your chosen tools against the PSF domains using the published assessments. Note which domains are addressed by your tooling and which require explicit implementation on your part. The gaps are your implementation checklist before deployment.
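The mapping step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a PAI-published method: the ratings are the summary ratings shown on this page for two tools, a domain missing from a summary is treated as "Unrated", and taking the best rating across tools is a simplification, since the PSF applies to the configured system as a whole. The names `STACK`, `best_coverage`, and `checklist` are hypothetical.

```python
from typing import Dict, List

# The PSF covers eight domains, labelled D1..D8 on this page.
DOMAINS: List[str] = [f"D{n}" for n in range(1, 9)]

# Ordering used to pick the best available rating per domain (assumption).
RANK: Dict[str, int] = {"Unrated": 0, "Gap": 1, "Partial": 2, "Strong": 3}

# Example stack using the summary ratings listed on this page.
# Domains not shown in a summary are simply absent here.
STACK: Dict[str, Dict[str, str]] = {
    "LangChain & LangGraph": {"D4": "Strong", "D8": "Strong", "D3": "Gap", "D7": "Partial"},
    "Composio": {"D3": "Strong", "D7": "Strong", "D6": "Gap", "D8": "Partial"},
}


def best_coverage(stack: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return the highest rating any tool in the stack has for each domain."""
    coverage = {domain: "Unrated" for domain in DOMAINS}
    for ratings in stack.values():
        for domain, rating in ratings.items():
            if RANK[rating] > RANK[coverage[domain]]:
                coverage[domain] = rating
    return coverage


def checklist(stack: Dict[str, Dict[str, str]]) -> List[str]:
    """List the domains that no tool in the stack rates as Strong."""
    coverage = best_coverage(stack)
    return [domain for domain in DOMAINS if coverage[domain] != "Strong"]


for domain in checklist(STACK):
    print(f"{domain}: {best_coverage(STACK)[domain]} (needs explicit implementation or review)")
```

For this example stack the checklist contains D1, D2, D5, and D6. The first three are unrated in the page summaries and should be looked up in the full assessments; D6 (human oversight) is a rated gap that must be implemented above the tooling.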

If your organisation requires formal compliance evidence — for internal governance, customer assurance, or regulatory purposes — the Certified Production AI Practitioner (CPAP) certification evaluates whether a real deployment meets PSF requirements. The assessment is conducted by an independent PAI assessor against your actual implementation, not a self-reported questionnaire.

Read the PSF →

View certifications →

From reading to credential

You understand the gaps.
Get the credential that proves it.

The AIDA examination tests applied PSF knowledge across all eight domains — exactly the gaps and strengths covered in this assessment. 15 minutes. No charge. Ever.

Start AIDA (free) →

CPAP practitioner credential
