Production AI Institute · PSF v1.1 open standard
Independent ecosystem intelligence

The map for the production AI stack

LangChain, Composio, LangSmith, Guardrails AI, vector databases, model providers, and cloud runtimes all solve pieces of the deployment problem. The PSF is the independent yardstick for the system as a whole.

13 published assessments
8 PSF domains mapped
0 vendor sponsorships

Why independent assessment matters

Every tool in the production AI stack has a vendor who would prefer you thought of it as comprehensive. LangChain's documentation doesn't emphasise that it has no native PII protection. Composio's homepage doesn't lead with the fact that it provides no human-in-the-loop primitives. This is not deception — these are tool vendors describing what their tools do, not safety assessors evaluating what they miss.

The problem is that practitioners assembling a production AI stack need to know both: what each tool does well, and what gaps remain their responsibility to close. Without that complete picture, teams deploy confidently on incomplete foundations and discover the gaps at incident time rather than design time.

PAI's role is to provide the complete picture. The Production Safety Framework defines what a safe production deployment requires across eight domains. Ecosystem assessments apply the PSF to specific tools — not to diminish them, but to give practitioners an honest map of what each tool satisfies and what each tool leaves open.

The production AI stack

A production agent deployment involves multiple layers. Each layer is necessary; none is sufficient on its own. The PSF applies to the system as a whole — not to any individual layer.

Models
OpenAI GPT-4, Anthropic Claude, Google Gemini, Meta Llama, Mistral

Foundation models. PAI assessments are model-agnostic — the PSF applies regardless of which model underpins a deployment.

Agent Frameworks
LangChain / LangGraph, CrewAI, AutoGen / AG2, Semantic Kernel, Smolagents, PydanticAI

Orchestration and execution. These frameworks define how agents reason, plan, and call tools. PSF compliance depends heavily on how these are configured.

Read: LangChain PSF Assessment →
Tool Integration
Composio, Toolhouse, ACI.dev, Custom connectors

Managed access to external services — email, calendar, CRMs, code repositories. Determines how agents take actions in the real world.

Read: Composio PSF Assessment →
Observability
LangSmith, Langfuse, Arize Phoenix, Helicone, Traceloop

Trace-level visibility into agent reasoning and execution. Addresses PSF Domain 4 (Observability). Critical for production incident investigation.

Safety & Guardrails
Guardrails AI, NeMo Guardrails, LlamaGuard, Presidio

Input classification, output validation, PII detection, and prompt injection resistance. Closes the gaps in PSF Domains 1, 2, and 3 that most frameworks leave open.

Standards & Governance
PSF (PAI), NIST AI RMF, ISO/IEC 42001, EU AI Act, OWASP LLM Top 10

The frameworks that define what 'safe' means. PAI's PSF is the practitioner-focused standard for production agentic AI deployment.

Published PSF assessments

Each assessment evaluates a tool or framework against all eight PSF domains. Assessments are independent, versioned, and updated as products evolve.

Agent Framework
LangChain & LangGraph

Strong on observability (LangSmith) and vendor resilience. LangGraph adds strong human oversight. Gap on data protection and security without companion tooling.

D4: Strong · D8: Strong · D3: Gap · D7: Partial
Read assessment →
Agent Framework
CrewAI

Intuitive role-based multi-agent orchestration. Most extensive PSF gaps of any framework assessed, because a multi-agent architecture amplifies every safety gap. Requires the most companion tooling.

D1: Gap · D5: Gap · D6: Partial · D8: Partial
Read assessment →
Agent Framework
AutoGen / AG2

Standout human oversight model (UserProxyAgent). Docker code execution for sandboxed security. Weakest production deployment tooling — research origins are evident.

D6: Strong · D5: Partial · D7: Partial · D4: Partial
Read assessment →
Agent Framework
Semantic Kernel

Microsoft's enterprise SDK for .NET and Python. Native Entra ID and Azure Key Vault support gives D7 a Strong rating, and strong OpenTelemetry integration does the same for D4 (Observability). The default choice for Azure-committed teams.

D4: Strong · D7: Strong · D1: Partial · D8: Partial
Read assessment →
Agent Runtime
Cursor SDK

Released April 2026. Programmatic access to Cursor's agent runtime with MCP integration. Strong observability, gap on security and data protection — particularly for filesystem and email access.

D4: Strong · D5: Partial · D3: Gap · D7: Gap
Read assessment →
Observability
LangSmith vs Langfuse vs Arize

All three satisfy PSF D4 core requirements. LangSmith wins on LangChain depth; Langfuse wins on data residency and self-hosting; Arize wins on production alerting and MLOps integration.

D4: Strong · D4a: Strong · D4b: Strong · D4c: Partial
Read assessment →
Tool Integration
Composio

Strong on security (managed OAuth) and data protection. Gap on human oversight — must be implemented above Composio.

D3: Strong · D7: Strong · D6: Gap · D8: Partial
Read assessment →
Agent Framework
Haystack (deepset)

RAG-native framework with the strongest production deployment story of any Python framework. Hayhooks REST serving is built in. D4, D5, and D8 are all Strong; the D3 gap matters more for RAG workloads because retrieved documents can carry PII.

D4: Strong · D5: Strong · D8: Strong · D3: Gap
Read assessment →
Agent Framework
DSPy

Optimisation-first framework from Stanford NLP. TypedPredictor delivers the strongest structured output enforcement of any framework assessed (D2). Three gaps: D1, D3, D7. The research-to-production gap is real: deploy only with a full companion safety layer.

D2: Strong · D8: Strong · D1: Gap · D7: Gap
Read assessment →
Agent Framework
Pydantic AI

Pydantic validation applied to LLM agents. Strong D2 from type-enforced outputs. Deliberately a library, not a platform — D5 and D6 are application responsibilities. Best for structured extraction pipelines; infrastructure ownership required.

D2: Strong · D8: Strong · D5: Gap · D6: Gap
Read assessment →
Agent Framework
Flowise / LangFlow

Visual low-code builders that accelerate prototyping but carry production security debt. Known CVEs affect instances running without authentication. D7 and D3 are gaps. Excellent for PoC; requires hardening before enterprise deployment.

D8: Strong · D3: Gap · D7: Gap · D5: Partial
Read assessment →
Safety Tooling
Guardrails AI vs NeMo vs Azure CS

Three tools that close D1/D2/D3 gaps from different architectural positions. Guardrails AI for custom validators; NeMo for conversation policy; Azure Content Safety for enterprise managed compliance.

D1: Strong · D2: Strong · D3: Strong · D7: Partial
Read assessment →
Vector Database
Pinecone vs Weaviate vs Chroma

PSF D3/D4 assessment of the three major vector databases. Weaviate wins on access control and audit logging. Pinecone wins on managed compliance. Chroma requires full application-layer D3 implementation.

D3: Partial · D4: Partial
Read assessment →
Coverage map
See the wider AI stack

The public map separates published assessments, Lab scorecards, mapped coverage, and watchlist entries across the production AI ecosystem.

Open coverage map → · Compare a stack →
From map to evidence

A stack map should change the deployment plan.

Once the assessment shows what the tools cover and what they leave open, route the decision into comparison, formal review, client delivery, or organisational adoption.

Independence and scope

The ecosystem map is useful because it sits above the vendor layer and stays tied to published PSF evidence.

Published reference work

PAI publishes the Production Safety Framework, PAI-8, research, Lab scorecards, and practical evidence tools for production AI deployment.

Independent assessment layer

Ecosystem coverage is editorial and standards-based. Tools do not buy placement, scoring, or assessment outcomes.

Built for practitioners and partners

The framework is designed to be applied by internal teams, consultants, MSPs, and certified partners without locking them into a single vendor stack.

Complementary to the tools it maps

The PSF does not replace frameworks, model providers, observability platforms, or guardrails. It shows what each layer contributes and which controls remain system responsibilities.

How to use this for your deployment

If you are assembling a production AI stack, start by mapping your chosen tools against the PSF domains using the published assessments. Note which domains are addressed by your tooling and which require explicit implementation on your part. The gaps are your implementation checklist before deployment.
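
The mapping step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not a PAI tool: the `Rating` enum, the `ASSESSMENTS` table, and the `implementation_checklist` function are hypothetical names. The ratings are copied from the summary badges on this page, which list only four domains per tool, so any domain missing from the badges is flagged for lookup in the full assessment. Treating the best rating across tools as the stack's coverage is a simplifying assumption; a Strong rating on one tool does not automatically close a gap in another.

```python
from enum import IntEnum


class Rating(IntEnum):
    """Badge ratings used on this page, ordered from weakest to strongest."""
    GAP = 0
    PARTIAL = 1
    STRONG = 2


# Ratings copied from the summary badges above (four domains per tool).
# Domains not shown in a badge are simply absent from the dictionary.
ASSESSMENTS = {
    "LangChain & LangGraph": {
        "D4": Rating.STRONG, "D8": Rating.STRONG,
        "D3": Rating.GAP, "D7": Rating.PARTIAL,
    },
    "Composio": {
        "D3": Rating.STRONG, "D7": Rating.STRONG,
        "D6": Rating.GAP, "D8": Rating.PARTIAL,
    },
}

# The PSF defines eight domains, labelled D1 to D8 on this page.
DOMAINS = [f"D{i}" for i in range(1, 9)]


def implementation_checklist(stack):
    """Return the domains that no tool in the stack rates as Strong.

    Assumption (hypothetical scoring rule): the best rating across the
    chosen tools stands in for the stack's coverage of a domain.
    """
    checklist = {}
    for domain in DOMAINS:
        ratings = [
            ASSESSMENTS[tool][domain]
            for tool in stack
            if domain in ASSESSMENTS[tool]
        ]
        best = max(ratings, default=None)
        if best is None:
            checklist[domain] = "not rated in the badges: check the full assessment"
        elif best < Rating.STRONG:
            checklist[domain] = (
                f"best tool rating is {best.name.title()}: "
                "implement explicitly or add companion tooling"
            )
    return checklist


if __name__ == "__main__":
    stack = ["LangChain & LangGraph", "Composio"]
    for domain, note in implementation_checklist(stack).items():
        print(f"{domain}: {note}")
```

For the two-tool stack in the example, the checklist contains D1, D2, and D5 (not shown in either tool's badges) and D6 (rated Gap for Composio). Those entries are the starting point for the pre-deployment work, to be confirmed against the full published assessments.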

If your organisation requires formal deployment evidence for internal governance, customer assurance, or regulatory work, start with a Deployment Safety Assessment. It reviews an in-scope deployment against PSF requirements using submitted implementation evidence rather than a self-reported questionnaire.

Read the PSF → · View DSA →
Apply the standard

Turn the evidence into production practice.

Use the PSF, research library, and Lab material to review your own deployment. Credentials are available when a client, employer, or regulator needs public proof.
