Ecosystem intelligence

Public coverage map for the production AI stack

PAI tracks the frameworks, runtimes, model providers, safety tools, observability platforms, and workflow layers that teams actually use, then maps them against PSF evidence requirements.

Public ecosystem coverage

A living map of the AI stack against PSF evidence.

The map separates published assessments, Lab scorecards, coverage mapping, and watchlist entries so practitioners can see what has evidence today and what is being tracked next.

Logos mark public ecosystem entries used for independent coverage mapping. Inclusion is editorial, not commercial.
24 public entries · 17 published assessments · 2 Lab scorecards · 2 mapped entries · 3 watchlist entries
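As a quick sanity check on the tally above, the four category counts listed on this page sum to the stated public-entry total. A minimal sketch (the dictionary keys are just labels taken from this page, not an API):

```python
# Public entry counts as listed on this page.
counts = {
    "published assessments": 17,
    "Lab scorecards": 2,
    "mapped entries": 2,
    "watchlist entries": 3,
}

total = sum(counts.values())
print(total)  # 24, matching the stated number of public entries
```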
Cloud AI platform · Published assessment

Amazon Bedrock

AWS · Managed model platform · May 2026

Managed platform strengths help with security, identity, and deployment control. Application-level safety evidence still has to be designed and tested.

Independent PSF assessment of a public managed platform.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Managed infrastructure · Enterprise security path · Provider abstraction
Gaps: Application-specific evals · Human oversight policy · Model behavior evidence
Evidence artifacts: Model Use Register, Evaluation Evidence Pack, Provider Fallback Plan
Agent framework · Published assessment

AutoGen / AG2

Multi-agent orchestration · May 2026

Strong human proxy pattern and sandboxed code execution options. The research-to-production path still needs deployment, monitoring, and incident evidence.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: UserProxyAgent oversight · Sandboxed code execution · Research flexibility
Gaps: Production deployment model · Default observability · Data handling policy
Evidence artifacts: Code Execution Policy, Human Approval Gate, Runbook Evidence Log
Tool integration · Published assessment

Composio

External action layer · May 2026

Strong managed authorization story for tool access. Production systems still need oversight, business-rule gating, and action-level audit evidence above the tool layer.

Independent PSF assessment of a public platform.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Managed OAuth · Tool catalog · Action integration path
Gaps: Human-in-the-loop primitives · Business context validation · Incident evidence
Evidence artifacts: Tool Permission Register, High-Risk Action Gate, External Action Audit Log
Agent framework · Published assessment

CrewAI

Multi-agent orchestration · May 2026

Useful role-based orchestration for fast prototypes. Multi-agent execution amplifies missing boundaries, so production use needs a serious companion control layer.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Role clarity · Fast workflow assembly · Multi-agent task decomposition
Gaps: Input boundaries · Deployment gates · Centralized audit trail
Evidence artifacts: Agent Role Register, Escalation Matrix, Deployment Readiness Checklist
Agent runtime · Published assessment

Cursor SDK

Cursor · Developer agent runtime · May 2026

A real agent runtime for developer workflows. Strong operational promise, with important gaps around filesystem, repository, and external action controls.

Independent PSF assessment of a public SDK.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Developer workflow fit · MCP integration path · Traceable agent work
Gaps: Filesystem scope · Repository write boundaries · Sensitive data controls
Evidence artifacts: Repository Action Policy, MCP Tool Register, Human Merge Gate
Agent framework · Published assessment

DSPy

Optimization and prompting · May 2026

Excellent for optimized structured pipelines. Production teams need surrounding deployment controls, security boundaries, and evidence for data handling.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Typed output path · Optimization discipline · Model portability
Gaps: Input governance · Security defaults · Operational deployment path
Evidence artifacts: Prompt Optimization Register, Structured Output Contract, Deployment Runbook
Agent framework · Published assessment

Flowise / LangFlow

Visual builders · May 2026

Strong prototyping surface for visual workflows. Production adoption requires security hardening, secrets handling, and data protection evidence.

Independent PSF assessment of public low-code builders.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Visual orchestration · Prototype speed · Framework accessibility
Gaps: Instance hardening · Secrets handling · Production deployment evidence
Evidence artifacts: Builder Hardening Checklist, Secrets Handling Policy, Deployment Gate Register
Safety tooling · Published assessment

Guardrails AI / NeMo / Azure Content Safety

NVIDIA, Microsoft, and open-source projects · Guardrails and validation · May 2026

The cleanest companion layer for D1, D2, and D3 gaps. Tooling helps, but teams still need policy, test evidence, and operational ownership.

Independent PSF comparison of public safety tooling.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Input policy · Output validation · PII and content safety controls
Gaps: Security boundaries · Human escalation policy · Production ownership
Evidence artifacts: Input Boundary Policy, Output Validation Contract, Safety Evaluation Set
Agent framework · Published assessment

Haystack

deepset · RAG and pipelines · May 2026

RAG-native production path with strong deployment ergonomics. The main PSF risk is data protection around retrieved content and source governance.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Pipeline architecture · REST serving path · Vendor resilience
Gaps: PII handling in retrieval · Source-level governance · Sensitive answer review
Evidence artifacts: Retrieval Source Register, Pipeline Deployment Checklist, PII Handling Plan
Agent framework · Published assessment

LangChain & LangGraph

Agent orchestration · May 2026

Strong ecosystem and graph control primitives. Needs companion policy, data protection, and security evidence before teams call a deployment production-ready.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: LangGraph control flow · LangSmith observability path · Broad integration ecosystem
Gaps: Native PII controls · Default tool permission boundaries · Formal security model
Evidence artifacts: Agent Boundary Spec, Trace Retention Policy, Tool Permission Register
Observability · Published assessment

LangSmith / Langfuse / Arize

Trace and evaluation layer · May 2026

Core D4 tooling for traces, evals, and investigation. Coverage is strongest when trace retention, incident triggers, and evaluation thresholds are written down.

Independent PSF comparison of public observability tools.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Trace visibility · Evaluation hooks · Production investigation path
Gaps: Cross-tool incident policy · Retention governance · Fallback playbooks
Evidence artifacts: Trace Retention Policy, Eval Threshold Register, Incident Triage Runbook
Agent framework · Published assessment

LlamaIndex

Retrieval and agents · May 2026

Strong retrieval and data connector story. Production readiness depends on data classification, retrieval auditability, and output validation around the application.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Retrieval architecture · Connector ecosystem · Agent workflow options
Gaps: PII classification · Retrieval source governance · Human review on sensitive outputs
Evidence artifacts: Retrieval Source Register, PII Handling Plan, Answer Citation Policy
Workflow automation · Published assessment

n8n

Automation runtime · May 2026

Valuable workflow automation layer for AI-enabled operations. The PSF question is not whether workflows run, but whether actions, data, and review gates are governed.

Independent PSF assessment of a public automation platform.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Workflow orchestration · Connector ecosystem · Operational accessibility
Gaps: AI-specific output controls · High-risk action gates · Evidence exports
Evidence artifacts: Workflow Control Map, Action Approval Policy, Automation Incident Runbook
Agent runtime · Published assessment

OpenAI Agents SDK

OpenAI · Agent orchestration · May 2026

Assessed as a modern agent runtime with strong tool execution patterns, but production safety still depends on explicit controls around boundaries, data, and review.

Public assessment, no affiliation or endorsement implied.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Tool execution model · SDK-level traces · Provider ecosystem depth
Gaps: Application-layer PII handling · Human gates for high-stakes actions · Evidence pack discipline
Evidence artifacts: Tool Permission Register, Output Validation Contract, Human Review Policy
Vector database · Published assessment

Pinecone / Weaviate / Chroma

Retrieval infrastructure · May 2026

Vector stores affect data protection, auditability, and vendor resilience. Managed compliance and self-hosting tradeoffs should be explicit before deployment.

Independent PSF comparison of public infrastructure options.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Retrieval infrastructure maturity · Managed and self-host options · Operational ecosystem
Gaps: Document-level access policy · PII removal path · Migration test evidence
Evidence artifacts: Embedding Data Register, Access Control Map, Vector Store Exit Test
Agent framework · Published assessment

Pydantic AI

Structured agents · May 2026

Strong schema-first model for structured extraction and validation. It is deliberately a library, so deployment and oversight remain application responsibilities.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Type-enforced outputs · Python developer fit · Structured extraction
Gaps: Deployment gates · Human oversight · Security model
Evidence artifacts: Schema Contract, Refusal Path Policy, Deployment Checklist
Agent framework · Published assessment

Semantic Kernel

Microsoft · Enterprise orchestration · May 2026

Strong enterprise footing through Azure identity, key management, and telemetry. Teams still need explicit input, oversight, and vendor exit evidence.

Independent PSF assessment of a public framework.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Enterprise identity path · OpenTelemetry integration · Azure deployment story
Gaps: Input governance defaults · Human oversight defaults · Provider portability evidence
Evidence artifacts: Identity Boundary Spec, Telemetry Evidence Plan, Vendor Exit Test
Foundation model · Lab scorecard

Claude Sonnet 4.6

Anthropic · Model provider · Q2 2026

PAI Lab scorecard coverage for general production behavior. Strong model performance still needs deployment evidence at the system layer.

Public Lab scorecard, no affiliation or endorsement implied.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Reasoning reliability · Long-context workflow fit · Safety-oriented behavior
Gaps: Workflow-specific evals · Tool action boundaries · Provider continuity plan
Evidence artifacts: Model Risk Register, Tool Scope Policy, Provider Fallback Plan
Foundation model · Lab scorecard

GPT-4.1

OpenAI · Model provider · Q2 2026

PAI Lab scorecard coverage for general production behavior. Model scorecards inform deployment decisions but do not replace system-level PSF evidence.

Public Lab scorecard, no affiliation or endorsement implied.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: General capability · Tooling ecosystem · Enterprise deployment options
Gaps: System-specific boundaries · Use-case evaluation set · Fallback evidence
Evidence artifacts: Model Risk Register, Eval Set, Provider Fallback Plan
Voice AI · Mapped coverage

ElevenLabs

ElevenLabs · Media and voice systems · May 2026

Mapped for voice AI evidence needs, including consent, provenance, misuse controls, and incident response. This is coverage mapping, not a product certification.

Public ecosystem mapping, no affiliation or endorsement implied.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Voice workflow relevance · Media provenance questions · Credential alignment
Gaps: Public product assessment pending · Consent evidence · Misuse response path
Evidence artifacts: Voice Consent Register, Synthetic Media Disclosure, Abuse Response Runbook
AI search · Mapped coverage

Perplexity

Perplexity · Knowledge application · May 2026

Mapped as a public AI knowledge product where citation quality, source handling, and user reliance are the main PSF concerns.

Public ecosystem mapping, no affiliation or endorsement implied.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Search workflow relevance · Citation surface · Knowledge worker adoption
Gaps: Product assessment pending · Source reliability evidence · User reliance controls
Evidence artifacts: Citation Quality Policy, Source Risk Register, Reliance Warning Policy
AI software agent · Watchlist

Devin

Cognition · Autonomous developer agent · Watchlist May 2026

Tracked because autonomous code agents need unusually clear repository permissions, test evidence, rollback paths, and human merge gates.

Watchlist entry, no public assessment claim.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Autonomous software work · Agentic workflow relevance · Clear PSF fit
Gaps: Public assessment pending · Repository action boundaries · Change approval evidence
Evidence artifacts: Repository Action Policy, Human Merge Gate, Rollback Runbook
Foundation model · Watchlist

Grok

xAI · Model provider · Watchlist May 2026

Tracked for Lab coverage as another major model provider in production AI stacks. A public Lab scorecard is not yet published.

Watchlist entry, no public assessment claim.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: Provider diversity · Public adoption · Model ecosystem relevance
Gaps: Published Lab scorecard pending · System-level assessment pending · Deployment evidence pending
Evidence artifacts: Model Risk Register, Provider Comparison, Fallback Plan
Foundation model · Watchlist

Mistral AI

Mistral AI · Model provider · Watchlist May 2026

Tracked for Lab scorecard coverage and European deployment relevance. A public system-level assessment is not yet published.

Watchlist entry, no public assessment claim.

D1 · D2 · D3 · D4 · D5 · D6 · D7 · D8
Strengths: European model ecosystem · Open and managed options · Provider diversity
Gaps: Published Lab scorecard pending · System-level evidence pending · Control templates by use case
Evidence artifacts: Model Risk Register, Provider Comparison, Fallback Plan
Coverage legend: Strong · Partial · Gap · Mapped · Planned · Open