Production AI Institute — vendor-neutral certification for AI practitioners

The Production AI Ecosystem

LangChain, Composio, LangSmith, Guardrails AI — these tools are how agents get built. The PSF is how you know whether the result is safe to run in production. This page maps the ecosystem and shows where the standard applies.

Why independent assessment matters

Every tool in the production AI stack has a vendor who would prefer you thought of it as comprehensive. LangChain's documentation doesn't emphasise that it has no native PII protection. Composio's homepage doesn't lead with the fact that it provides no human-in-the-loop primitives. This is not deception — these are tool vendors describing what their tools do, not safety assessors evaluating what they miss.

The problem is that practitioners assembling a production AI stack need to know both: what each tool does well, and what gaps remain their responsibility to close. Without that complete picture, teams make confident deployments on incomplete foundations — and discover the gaps at incident time rather than design time.

PAI's role is to provide the complete picture. The Production Safety Framework defines what a safe production deployment requires across eight domains. Ecosystem assessments apply the PSF to specific tools — not to diminish them, but to give practitioners an honest map of what each tool satisfies and what each tool leaves open.

The production AI stack

A production agent deployment involves multiple layers. Each layer is necessary; none is sufficient on its own. The PSF applies to the system as a whole — not to any individual layer.

Models
OpenAI GPT-4 · Anthropic Claude · Google Gemini · Meta Llama · Mistral

Foundation models. PAI assessments are model-agnostic — the PSF applies regardless of which model underpins a deployment.

Agent Frameworks
LangChain / LangGraph · CrewAI · AutoGen / AG2 · Semantic Kernel · Smolagents · PydanticAI

Orchestration and execution. These frameworks define how agents reason, plan, and call tools. PSF compliance depends heavily on how these are configured.

Read: LangChain PSF Assessment →
Tool Integration
Composio · Toolhouse · ACI.dev · Custom connectors

Managed access to external services — email, calendar, CRMs, code repositories. Determines how agents take actions in the real world.

Read: Composio PSF Assessment →
Observability
LangSmith · Langfuse · Arize Phoenix · Helicone · Traceloop

Trace-level visibility into agent reasoning and execution. Satisfies PSF Domain 4. Critical for production incident investigation.

Safety & Guardrails
Guardrails AI · NeMo Guardrails · LlamaGuard · Presidio

Input classification, output validation, PII detection, and prompt injection resistance. Closes gaps in PSF Domains 1, 2, and 3 that most frameworks leave open.

Standards & Governance
PSF (PAI) · NIST AI RMF · ISO/IEC 42001 · EU AI Act · OWASP LLM Top 10

The frameworks that define what 'safe' means. PAI's PSF is the practitioner-focused standard for production agentic AI deployment.
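The role of the safety layer in this stack can be sketched as a minimal application-level guard. This is an illustrative pattern only, not any vendor's API: real deployments would use a dedicated tool such as Presidio or Guardrails AI rather than hand-rolled regexes.

```python
import re

# Illustrative PII patterns; a production system would use a dedicated
# detector (e.g. Presidio), not hand-rolled regexes like these.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders (PSF Domain 3)."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text

def validate_output(text: str, max_len: int = 2000) -> str:
    """Reject empty or oversized model output before it reaches a tool call
    (PSF Domain 2), then redact PII before it leaves the system."""
    if not text.strip():
        raise ValueError("empty model output")
    if len(text) > max_len:
        raise ValueError("output exceeds length budget")
    return redact_pii(text)

print(validate_output("Contact alice@example.com re: 123-45-6789"))
# → Contact [EMAIL] re: [SSN]
```

The point of the sketch is architectural: validation sits between the model and the outside world, which is exactly the position the guardrails tools above occupy.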

Published PSF assessments

Each assessment evaluates a tool or framework against all eight PSF domains. Assessments are independent, versioned, and updated as products evolve.

Agent Framework
LangChain & LangGraph

Strong on observability (LangSmith) and vendor resilience. LangGraph adds strong human oversight. Gap on data protection and security without companion tooling.

D4: Strong · D8: Strong · D3: Gap · D7: Partial
Read assessment →
Agent Framework
CrewAI

Intuitive role-based multi-agent orchestration. Most extensive PSF gaps of any framework — multi-agent architecture amplifies every safety gap. Requires the most companion tooling.

D1: Gap · D5: Gap · D6: Partial · D8: Partial
Read assessment →
Agent Framework
AutoGen / AG2

Standout human oversight model (UserProxyAgent). Docker code execution for sandboxed security. Weakest production deployment tooling — research origins are evident.

D6: Strong · D5: Partial · D7: Partial · D4: Partial
Read assessment →
Agent Framework
Semantic Kernel

Microsoft's enterprise SDK for .NET and Python. Native Entra ID and Azure Key Vault give D7 a Strong rating. Best-in-class OpenTelemetry integration. The default choice for Azure-committed teams.

D4: Strong · D7: Strong · D1: Partial · D8: Partial
Read assessment →
Agent Runtime
Cursor SDK

Released April 2026. Programmatic access to Cursor's agent runtime with MCP integration. Strong observability, gap on security and data protection — particularly for filesystem and email access.

D4: Strong · D5: Partial · D3: Gap · D7: Gap
Read assessment →
Observability
LangSmith vs Langfuse vs Arize

All three satisfy PSF D4 core requirements. LangSmith wins on LangChain depth; Langfuse wins on data residency and self-hosting; Arize wins on production alerting and MLOps integration.

D4: Strong · D4a: Strong · D4b: Strong · D4c: Partial
Read assessment →
Tool Integration
Composio

Strong on security (managed OAuth) and data protection. Gap on human oversight — must be implemented above Composio.

D3: Strong · D7: Strong · D6: Gap · D8: Partial
Read assessment →
Agent Framework
Haystack (deepset)

RAG-native framework with the strongest production deployment story of any Python framework. Hayhooks REST serving is built-in. D4/D5/D8 are all Strong; D3 gap matters more for RAG workloads because retrieved documents carry PII.

D4: Strong · D5: Strong · D8: Strong · D3: Gap
Read assessment →
Agent Framework
DSPy

Optimisation-first framework from Stanford NLP. TypedPredictor delivers the strongest structured output enforcement of any framework assessed (D2). Three gaps: D1, D3, D7. Research-to-production gap is real — deploy only with full companion safety layer.

D2: Strong · D8: Strong · D1: Gap · D7: Gap
Read assessment →
Agent Framework
Pydantic AI

Pydantic validation applied to LLM agents. Strong D2 from type-enforced outputs. Deliberately a library, not a platform — D5 and D6 are application responsibilities. Best for structured extraction pipelines; infrastructure ownership required.

D2: Strong · D8: Strong · D5: Gap · D6: Gap
Read assessment →
Agent Framework
Flowise / LangFlow

Visual low-code builders that accelerate prototyping but carry production security debt. Known CVEs in unauthenticated instances. D7 and D3 are gaps. Excellent for PoC; requires hardening before enterprise deployment.

D8: Strong · D3: Gap · D7: Gap · D5: Partial
Read assessment →
Safety Tooling
Guardrails AI vs NeMo vs Azure CS

Three tools that close D1/D2/D3 gaps from different architectural positions. Guardrails AI for custom validators; NeMo for conversation policy; Azure Content Safety for enterprise managed compliance.

D1: Strong · D2: Strong · D3: Strong · D7: Partial
Read assessment →
Vector Database
Pinecone vs Weaviate vs Chroma

PSF D3/D4 assessment of the three major vector databases. Weaviate wins on access control and audit logging. Pinecone wins on managed compliance. Chroma requires full application-layer D3 implementation.

D3: Partial · D4: Partial
Read assessment →
Coming soon
More Assessments Coming

Additional tool and framework assessments are in preparation.

What the Production AI Institute is not

Independence requires clarity about scope.

Not a framework vendor

PAI does not build or sell agent frameworks, tool integration libraries, or AI models. The PSF is designed to be implemented on top of any framework — LangChain, CrewAI, a custom Python stack, or anything else.

Not a consultancy

PAI does not offer implementation services. The standard and its assessments are published openly for practitioners to apply directly. If you want someone to implement it for you, that is a separate commercial relationship with a certified integrator.

Not affiliated with any vendor

PAI has no equity stakes, advertising relationships, or commercial agreements with any of the tools or frameworks it assesses. Independence is the only basis on which an assessment authority is credible.

Not a replacement for the tools it assesses

The PSF does not compete with LangChain, Composio, or any other tooling. It provides the yardstick against which they are evaluated — and most of them are genuinely useful. PSF compliance is about using the right tools correctly, not avoiding them.

How to use this for your deployment

If you are assembling a production AI stack, start by mapping your chosen tools against the PSF domains using the published assessments. Note which domains are addressed by your tooling and which require explicit implementation on your part. The gaps are your implementation checklist before deployment.
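The mapping exercise above can be sketched programmatically: take the domain verdicts from the published assessment summaries on this page and reduce them to a per-domain gap checklist. The ratings below are drawn from the LangChain and Guardrails AI summaries; domains a summary does not mention are simply treated as uncovered here, which is a simplification for illustration.

```python
# Ratings taken from the assessment summaries on this page; domains
# not listed in a summary are treated as uncovered in this sketch.
STACK = {
    "LangChain": {"D4": "Strong", "D8": "Strong", "D3": "Gap", "D7": "Partial"},
    "Guardrails AI": {"D1": "Strong", "D2": "Strong", "D3": "Strong"},
}

ALL_DOMAINS = [f"D{i}" for i in range(1, 9)]  # the eight PSF domains

def gap_checklist(stack: dict) -> list[str]:
    """Domains no tool in the stack rates Strong: the team's
    implementation checklist before deployment."""
    covered = {
        domain
        for ratings in stack.values()
        for domain, verdict in ratings.items()
        if verdict == "Strong"
    }
    return [d for d in ALL_DOMAINS if d not in covered]

print(gap_checklist(STACK))
# → ['D5', 'D6', 'D7']
```

For this hypothetical stack, D5, D6, and D7 remain the team's responsibility, which matches the page's point: the guardrails layer closes D1–D3, but deployment, oversight, and security gaps survive tool selection.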

If your organisation requires formal compliance evidence — for internal governance, customer assurance, or regulatory purposes — the Certified Production AI Practitioner (CPAP) certification evaluates whether a real deployment meets PSF requirements. The assessment is conducted by an independent PAI assessor against your actual implementation, not a self-reported questionnaire.

Read the PSF →
View certifications →
From reading to credential

You understand the gaps.
Get the credential that proves it.

The AIDA examination tests applied PSF knowledge across all eight domains — exactly the gaps and strengths covered in the assessments above. 15 minutes. No charge. Ever.

Start AIDA — free →
CPAP practitioner credential