Ecosystem intelligence

Public coverage map for the production AI stack

PAI tracks the frameworks, runtimes, model providers, safety tools, observability platforms, and workflow layers that teams actually use, then maps them against PSF evidence requirements.

Public ecosystem coverage

A living map of the AI stack against PSF evidence.

The map separates published assessments, Lab scorecards, coverage mapping, and watchlist entries so practitioners can see what has evidence today and what is being tracked next.

Logos mark public ecosystem entries used for independent coverage mapping. Inclusion is editorial, not commercial.
24 public entries · 17 published assessments · 2 Lab scorecards · 2 mapped entries · 3 watchlist entries
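As a quick sanity check on the tally above, the four category counts listed on this page sum to the stated public-entry total. A minimal sketch (the dictionary keys are just labels taken from this page, not an API):

```python
# Public entry counts as listed on this page.
counts = {
    "published assessments": 17,
    "Lab scorecards": 2,
    "mapped entries": 2,
    "watchlist entries": 3,
}

total = sum(counts.values())
print(total)  # 24, matching the stated number of public entries
```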
Cloud AI platform · Published assessment

Amazon Bedrock

AWS · Managed model platform · May 2026

Managed platform strengths help with security, identity, and deployment control. Application-level safety evidence still has to be designed and tested.

Independent PSF assessment of a public managed platform.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Managed infrastructure · Enterprise security path · Provider abstraction
Gaps: Application-specific evals · Human oversight policy · Model behavior evidence
Evidence artifacts: Model Use Register, Evaluation Evidence Pack, Provider Fallback Plan
Agent framework · Published assessment

AutoGen / AG2

Multi-agent orchestration · May 2026

Strong human proxy pattern and sandboxed code execution options. The research-to-production path still needs deployment, monitoring, and incident evidence.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: UserProxyAgent oversight · Sandboxed code execution · Research flexibility
Gaps: Production deployment model · Default observability · Data handling policy
Evidence artifacts: Code Execution Policy, Human Approval Gate, Runbook Evidence Log
Tool integration · Published assessment

Composio

External action layer · May 2026

Strong managed authorization story for tool access. Production systems still need oversight, business-rule gating, and action-level audit evidence above the tool layer.

Independent PSF assessment of a public platform.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Managed OAuth · Tool catalog · Action integration path
Gaps: Human-in-the-loop primitives · Business context validation · Incident evidence
Evidence artifacts: Tool Permission Register, High-Risk Action Gate, External Action Audit Log
Agent framework · Published assessment

CrewAI

Multi-agent orchestration · May 2026

Useful role-based orchestration for fast prototypes. Multi-agent execution amplifies missing boundaries, so production use needs a serious companion control layer.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Role clarity · Fast workflow assembly · Multi-agent task decomposition
Gaps: Input boundaries · Deployment gates · Centralized audit trail
Evidence artifacts: Agent Role Register, Escalation Matrix, Deployment Readiness Checklist
Agent runtime · Published assessment

Cursor SDK

Cursor · Developer agent runtime · May 2026

A real agent runtime for developer workflows. Strong operational promise, with important gaps around filesystem, repository, and external action controls.

Independent PSF assessment of a public SDK.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Developer workflow fit · MCP integration path · Traceable agent work
Gaps: Filesystem scope · Repository write boundaries · Sensitive data controls
Evidence artifacts: Repository Action Policy, MCP Tool Register, Human Merge Gate
Agent framework · Published assessment

DSPy

Optimization and prompting · May 2026

Excellent for optimized structured pipelines. Production teams need surrounding deployment controls, security boundaries, and evidence for data handling.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Typed output path · Optimization discipline · Model portability
Gaps: Input governance · Security defaults · Operational deployment path
Evidence artifacts: Prompt Optimization Register, Structured Output Contract, Deployment Runbook
Agent framework · Published assessment

Flowise / LangFlow

Visual builders · May 2026

Strong prototyping surface for visual workflows. Production adoption requires security hardening, secrets handling, and data protection evidence.

Independent PSF assessment of public low-code builders.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Visual orchestration · Prototype speed · Framework accessibility
Gaps: Instance hardening · Secrets handling · Production deployment evidence
Evidence artifacts: Builder Hardening Checklist, Secrets Handling Policy, Deployment Gate Register
Safety tooling · Published assessment

Guardrails AI / NeMo / Azure Content Safety

NVIDIA, Microsoft, and open-source projects · Guardrails and validation · May 2026

The cleanest companion layer for D1, D2, and D3 gaps. Tooling helps, but teams still need policy, test evidence, and operational ownership.

Independent PSF comparison of public safety tooling.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Input policy · Output validation · PII and content safety controls
Gaps: Security boundaries · Human escalation policy · Production ownership
Evidence artifacts: Input Boundary Policy, Output Validation Contract, Safety Evaluation Set
Agent framework · Published assessment

Haystack

deepset · RAG and pipelines · May 2026

RAG-native production path with strong deployment ergonomics. The main PSF risk is data protection around retrieved content and source governance.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Pipeline architecture · REST serving path · Vendor resilience
Gaps: PII handling in retrieval · Source-level governance · Sensitive answer review
Evidence artifacts: Retrieval Source Register, Pipeline Deployment Checklist, PII Handling Plan
Agent framework · Published assessment

LangChain & LangGraph

Agent orchestration · May 2026

Strong ecosystem and graph control primitives. Needs companion policy, data protection, and security evidence before teams call a deployment production-ready.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: LangGraph control flow · LangSmith observability path · Broad integration ecosystem
Gaps: Native PII controls · Default tool permission boundaries · Formal security model
Evidence artifacts: Agent Boundary Spec, Trace Retention Policy, Tool Permission Register
Observability · Published assessment

LangSmith / Langfuse / Arize

Trace and evaluation layer · May 2026

Core D4 tooling for traces, evals, and investigation. Coverage is strongest when trace retention, incident triggers, and evaluation thresholds are written down.

Independent PSF comparison of public observability tools.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Trace visibility · Evaluation hooks · Production investigation path
Gaps: Cross-tool incident policy · Retention governance · Fallback playbooks
Evidence artifacts: Trace Retention Policy, Eval Threshold Register, Incident Triage Runbook
Agent framework · Published assessment

LlamaIndex

Retrieval and agents · May 2026

Strong retrieval and data connector story. Production readiness depends on data classification, retrieval auditability, and output validation around the application.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Retrieval architecture · Connector ecosystem · Agent workflow options
Gaps: PII classification · Retrieval source governance · Human review on sensitive outputs
Evidence artifacts: Retrieval Source Register, PII Handling Plan, Answer Citation Policy
Workflow automation · Published assessment

n8n

Automation runtime · May 2026

Valuable workflow automation layer for AI-enabled operations. The PSF question is not whether workflows run, but whether actions, data, and review gates are governed.

Independent PSF assessment of a public automation platform.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Workflow orchestration · Connector ecosystem · Operational accessibility
Gaps: AI-specific output controls · High-risk action gates · Evidence exports
Evidence artifacts: Workflow Control Map, Action Approval Policy, Automation Incident Runbook
Agent runtime · Published assessment

OpenAI Agents SDK

OpenAI · Agent orchestration · May 2026

Assessed as a modern agent runtime with strong tool execution patterns, but production safety still depends on explicit controls around boundaries, data, and review.

Public assessment, no affiliation or endorsement implied.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Tool execution model · SDK-level traces · Provider ecosystem depth
Gaps: Application-layer PII handling · Human gates for high-stakes actions · Evidence pack discipline
Evidence artifacts: Tool Permission Register, Output Validation Contract, Human Review Policy
Vector database · Published assessment

Pinecone / Weaviate / Chroma

Retrieval infrastructure · May 2026

Vector stores affect data protection, auditability, and vendor resilience. Managed compliance and self-hosting tradeoffs should be explicit before deployment.

Independent PSF comparison of public infrastructure options.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Retrieval infrastructure maturity · Managed and self-host options · Operational ecosystem
Gaps: Document-level access policy · PII removal path · Migration test evidence
Evidence artifacts: Embedding Data Register, Access Control Map, Vector Store Exit Test
Agent framework · Published assessment

Pydantic AI

Structured agents · May 2026

Strong schema-first model for structured extraction and validation. It is deliberately a library, so deployment and oversight remain application responsibilities.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Type-enforced outputs · Python developer fit · Structured extraction
Gaps: Deployment gates · Human oversight · Security model
Evidence artifacts: Schema Contract, Refusal Path Policy, Deployment Checklist
Agent framework · Published assessment

Semantic Kernel

Microsoft · Enterprise orchestration · May 2026

Strong enterprise footing through Azure identity, key management, and telemetry. Teams still need explicit input, oversight, and vendor exit evidence.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Enterprise identity path · OpenTelemetry integration · Azure deployment story
Gaps: Input governance defaults · Human oversight defaults · Provider portability evidence
Evidence artifacts: Identity Boundary Spec, Telemetry Evidence Plan, Vendor Exit Test
Foundation model · Lab scorecard

Claude Sonnet 4.6

Anthropic · Model provider · Q2 2026

PAI Lab scorecard coverage for general production behavior. Strong model performance still needs deployment evidence at the system layer.

Public Lab scorecard, no affiliation or endorsement implied.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Reasoning reliability · Long-context workflow fit · Safety-oriented behavior
Gaps: Workflow-specific evals · Tool action boundaries · Provider continuity plan
Evidence artifacts: Model Risk Register, Tool Scope Policy, Provider Fallback Plan
Foundation model · Lab scorecard

GPT-4.1

OpenAI · Model provider · Q2 2026

PAI Lab scorecard coverage for general production behavior. Model scorecards inform deployment decisions but do not replace system-level PSF evidence.

Public Lab scorecard, no affiliation or endorsement implied.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: General capability · Tooling ecosystem · Enterprise deployment options
Gaps: System-specific boundaries · Use-case evaluation set · Fallback evidence
Evidence artifacts: Model Risk Register, Eval Set, Provider Fallback Plan
Voice AI · Mapped coverage

ElevenLabs

ElevenLabs · Media and voice systems · May 2026

Mapped for voice AI evidence needs, including consent, provenance, misuse controls, and incident response. This is coverage mapping, not a product certification.

Public ecosystem mapping, no affiliation or endorsement implied.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Voice workflow relevance · Media provenance questions · Credential alignment
Gaps: Public product assessment pending · Consent evidence · Misuse response path
Evidence artifacts: Voice Consent Register, Synthetic Media Disclosure, Abuse Response Runbook
AI search · Mapped coverage

Perplexity

Perplexity · Knowledge application · May 2026

Mapped as a public AI knowledge product where citation quality, source handling, and user reliance are the main PSF concerns.

Public ecosystem mapping, no affiliation or endorsement implied.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Search workflow relevance · Citation surface · Knowledge worker adoption
Gaps: Product assessment pending · Source reliability evidence · User reliance controls
Evidence artifacts: Citation Quality Policy, Source Risk Register, Reliance Warning Policy
AI software agent · Watchlist

Devin

Cognition · Autonomous developer agent · Watchlist May 2026

Tracked because autonomous code agents need unusually clear repository permissions, test evidence, rollback paths, and human merge gates.

Watchlist entry, no public assessment claim.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Autonomous software work · Agentic workflow relevance · Clear PSF fit
Gaps: Public assessment pending · Repository action boundaries · Change approval evidence
Evidence artifacts: Repository Action Policy, Human Merge Gate, Rollback Runbook
Foundation model · Watchlist

Grok

xAI · Model provider · Watchlist May 2026

Tracked for Lab coverage as another major model provider in production AI stacks. A public Lab scorecard is not yet published.

Watchlist entry, no public assessment claim.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Provider diversity · Public adoption · Model ecosystem relevance
Gaps: Published Lab scorecard pending · System-level assessment pending · Deployment evidence pending
Evidence artifacts: Model Risk Register, Provider Comparison, Fallback Plan
Foundation model · Watchlist

Mistral AI

Mistral AI · Model provider · Watchlist May 2026

Tracked for Lab scorecard coverage and European deployment relevance. A public system-level assessment is not yet published.

Watchlist entry, no public assessment claim.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: European model ecosystem · Open and managed options · Provider diversity
Gaps: Published Lab scorecard pending · System-level evidence pending · Control templates by use case
Evidence artifacts: Model Risk Register, Provider Comparison, Fallback Plan
Coverage legend: Strong · Partial · Gap · Mapped · Planned · Open