Production AI Institute · PSF v1.1 open standard
PSF Assessment · AX preview · May 20, 2026

Google Agent Executor (AX) in Production: A PSF Domain Assessment

Google open-sourced Agent Executor on May 20, 2026 as a distributed runtime for durable, resumable agent workflows. The preview fills a real operations gap; input governance and output validation still belong in your harness layer.

Production AI Institute · 10 min read · Updated May 2026
Independence disclosure: The Production AI Institute has no commercial relationship with Google. This assessment is based on the May 20, 2026 launch post, the public google/ax repository, and related Google Cloud documentation. Google was not consulted in preparing this assessment.

At Google I/O 2026, Google introduced Agent Executor (project name AX): an open-source runtime for agent execution, resumption, and distributed deployment. The same week brought Agent Sandbox on GKE to general availability and a first look at Agent Substrate, an agent-first compute layer for ultra-dense scheduling on Kubernetes.

For production teams, AX matters because long-running agents fail in boring ways: disconnects, pod restarts, and human approvals that arrive hours later. Frameworks like LangGraph handle pause/resume inside a single graph; AX pushes durability and auditing down to a shared infrastructure runtime that can federate multiple harnesses. That is a different layer from a model release, and it changes how practitioners should think about PSF compliance for agent fleets.

Release scope assessed

Artifact | Version / status | Date
Agent Executor (AX) | Open-source preview (breaking changes expected) | 2026-05-20
Agent Sandbox on GKE | Generally available | 2026-05-20
Agent Substrate | Open-source preview | 2026-05-20

PSF domain scorecard

Ratings reflect native AX capabilities documented at launch, not optional Google Cloud managed services. Full domain definitions are in the Production Safety Framework.

Domain | Rating
D1 Input Governance | Partial
D2 Output Validation | Gap
D3 Data Protection | Partial
D4 Observability | Strong
D5 Deployment Safety | Strong
D6 Human Oversight | Strong
D7 Security | Partial
D8 Vendor Resilience | Strong

D1 · Input Governance · Partial

Agent Executor coordinates actor calls through a central controller but does not natively classify, scope-check, or sanitise untrusted inputs before they reach agents or tools.

Google positions Agent Executor (AX) as a harness-agnostic runtime: LangGraph graphs, Agent Development Kit agents, A2A protocol agents, and custom harnesses all route through the same controller. That architecture is valuable for auditing who invoked what, yet PSF Domain 1 still requires semantic input governance when any actor accepts external content (user chat, retrieved documents, webhooks). The runtime event log records what happened; it does not decide whether an instruction is in policy. Teams deploying AX for customer-facing or RAG-heavy workflows should treat input governance as a pre-runtime gate, not an implicit property of Google's preview release.

Practitioner action: Add an input classification step before AX schedules work for external-facing actors. Scope tool and MCP permissions per actor. Document permitted instruction classes in your deployment safety review before production traffic.
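
The pre-runtime gate described above can be sketched in a few lines. This is an illustration under stated assumptions, not part of AX: the actor names, instruction classes, and keyword markers are all hypothetical, and a production gate would likely use a trained classifier rather than string matching. The point is the shape: classify first, check the actor's permitted classes, and only then hand work to the runtime.

```python
from dataclasses import dataclass
from enum import Enum


class InstructionClass(Enum):
    QUESTION = "question"
    TOOL_REQUEST = "tool_request"
    SUSPECTED_INJECTION = "suspected_injection"


# Hypothetical policy: which instruction classes each actor may receive.
PERMITTED = {
    "support_chat": {InstructionClass.QUESTION},
    "ops_agent": {InstructionClass.QUESTION, InstructionClass.TOOL_REQUEST},
}

# Deliberately naive markers, for illustration only.
INJECTION_MARKERS = ("ignore previous instructions", "disregard your system prompt")
TOOL_VERBS = ("delete", "send", "transfer", "deploy")


def classify(text: str) -> InstructionClass:
    """Assign one instruction class to a piece of untrusted input."""
    lowered = text.lower()
    if any(marker in lowered for marker in INJECTION_MARKERS):
        return InstructionClass.SUSPECTED_INJECTION
    if any(lowered.startswith(verb) for verb in TOOL_VERBS):
        return InstructionClass.TOOL_REQUEST
    return InstructionClass.QUESTION


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    instruction_class: InstructionClass


def gate(actor: str, text: str) -> GateDecision:
    """Decide whether input may be scheduled for this actor (unknown actors are denied)."""
    instruction_class = classify(text)
    allowed = instruction_class in PERMITTED.get(actor, set())
    return GateDecision(allowed, instruction_class)
```

Calling `gate("support_chat", "Delete all invoices")` returns a denied decision in this sketch, because the support actor is only permitted to receive questions. Suspected injections are denied for every actor since no policy entry lists that class.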

D2 · Output Validation · Gap

Trajectory branching and checkpoints help experiment with outputs, but AX does not ship output contracts, schema enforcement, or semantic validation on agent completions.

AX exposes durable execution, snapshotting, and trajectory branching so teams can fork an agent path and compare alternatives without losing state. Those primitives support evaluation workflows, not automatic production validation. If an agent emits a harmful email draft, an incorrect financial summary, or a malformed API payload, the runtime will faithfully persist and resume that trajectory unless the harness or a downstream validator intervenes. Practitioners should pair AX with harness-level parsers (Pydantic, JSON schema) or a validation actor in the graph, consistent with how LangChain deployments address Domain 2.

Practitioner action: Define an OutputContract per consequential action type. Run validation in the harness before AX checkpoints irreversible side effects. Use branching to A/B test policies, not as a substitute for blocking bad outputs in production.
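
One possible shape for an OutputContract is sketched below using only the standard library; teams already on Pydantic or JSON Schema would express the same checks there. The `issue_refund` action, its fields, and the amount limit are hypothetical, and `commit_if_valid` stands in for whatever harness hook runs before an irreversible side effect is checkpointed. Nothing here is an AX API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class OutputContract:
    action_type: str
    required_fields: dict
    checks: tuple = ()

    def violations(self, payload: dict) -> list:
        """Return human-readable problems; an empty list means the payload passes."""
        problems = []
        for name, expected in self.required_fields.items():
            if name not in payload:
                problems.append(f"missing field: {name}")
            elif not isinstance(payload[name], expected):
                problems.append(f"wrong type for {name}")
        if not problems:
            # Semantic checks run only once the structure is known to be sound.
            for check in self.checks:
                message = check(payload)
                if message:
                    problems.append(message)
        return problems


def amount_within_limit(payload: dict) -> Optional[str]:
    # Hypothetical business rule for this sketch.
    return None if 0 < payload["amount"] <= 10_000 else "amount outside permitted range"


REFUND_CONTRACT = OutputContract(
    action_type="issue_refund",
    required_fields={"customer_id": str, "amount": float},
    checks=(amount_within_limit,),
)


def commit_if_valid(contract: OutputContract, payload: dict,
                    side_effect: Callable[[dict], Any]) -> list:
    """Run the side effect only when the contract holds; otherwise return the violations."""
    problems = contract.violations(payload)
    if problems:
        return problems
    side_effect(payload)
    return []
```

The design choice worth noting is that validation blocks the side effect rather than annotating it after the fact, which matches the guidance to validate before irreversible actions are persisted.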

D3 · Data Protection · Partial

Secure sandboxes and single-writer session consistency reduce accidental cross-tenant corruption; PII handling and residency controls remain the operator's responsibility.

Google documents secure-by-design sandboxes for agents that generate code or serve multiple tenants, plus a single-writer architecture to keep distributed session state consistent. Those are meaningful data-protection building blocks when agents run concurrently. They do not replace classification at ingestion, retention policies on event logs, or contractual controls on model providers behind the harness. AX runs on your compute (including Agent Substrate on GKE), which helps residency-minded teams keep payloads inside chosen regions, but the runtime does not redact PII from logs or block sensitive fields from reaching tools.

Practitioner action: Classify data categories before enabling AX against regulated workloads. Restrict sandbox filesystem and network egress. Configure log retention and access controls on the controller event store. Review Google Cloud data processing terms for any managed agents you federate alongside self-hosted AX.
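
Because the runtime does not redact PII from logs, one option is to scrub event payloads before they reach the event store. The sketch below is a minimal illustration, assuming events are plain nested dictionaries and lists; the sensitive key names and the email pattern are hypothetical examples and are far from a complete PII policy.

```python
import re

# Simple email pattern for illustration; real PII detection needs much more than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical field names treated as sensitive in this sketch.
SENSITIVE_KEYS = {"ssn", "card_number", "password"}


def redact(value):
    """Return a redacted copy of a nested event payload without mutating the input."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", value)
    return value
```

Returning a copy keeps the original payload available to the agent while only the scrubbed version is written to logs.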

D4 · Observability · Strong

Event logging, snapshotting, and controller-mediated actor calls give trace-grade visibility into long-running distributed agent workflows.

AX's design centres on an event log and snapshots so executions can resume after outages, human-in-the-loop pauses, or client disconnects. Connection recovery replays responses from the last sequence seen by the client, which improves operability for hour-scale runs. Because every skill, tool, and agent invocation is coordinated by one controller, teams can build a unified audit trail rather than stitching logs across siloed processes. Preview status means exporters and dashboards are still maturing; the architectural fit for PSF Domain 4 is nonetheless stronger than most agent frameworks at launch.
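
The replay-from-last-sequence idea can be illustrated with a toy in-memory log. This is a generic sketch of the pattern, not AX's implementation or API: the `EventLog` class and its method names are hypothetical, and a real event log would be durable and distributed.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    events: list = field(default_factory=list)

    def append(self, payload: str) -> int:
        """Store an event and return its sequence number (starting at 1)."""
        seq = len(self.events) + 1
        self.events.append((seq, payload))
        return seq

    def replay_after(self, last_seen: int) -> list:
        """Return every event the client has not yet seen, in order."""
        return [event for event in self.events if event[0] > last_seen]


log = EventLog()
for payload in ["plan", "tool_call", "tool_result", "answer"]:
    log.append(payload)

# A client that disconnected after sequence 2 reconnects and catches up.
missed = log.replay_after(2)
```

Because the client only reports the last sequence it saw, the server side needs no per-client state beyond the log itself.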

Practitioner action: Export controller events to your existing observability stack (OpenTelemetry, Cloud Logging, or Langfuse). Alert on stalled executions and replay failures. Correlate AX run IDs with deployment change records for incident response.
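
A stalled-execution alert can be as simple as comparing each run's most recent event time against a threshold. The sketch below assumes you have already exported events and can build a mapping from run ID to last event timestamp; the run IDs, the 15-minute default, and the `awaiting_approval` set are hypothetical. Runs paused for human review are excluded so that legitimate approvals do not page anyone.

```python
from datetime import datetime, timedelta, timezone


def stalled_runs(last_event_at, now, awaiting_approval=frozenset(),
                 threshold=timedelta(minutes=15)):
    """Return sorted run IDs whose latest event is older than the threshold."""
    return sorted(
        run_id
        for run_id, seen in last_event_at.items()
        if run_id not in awaiting_approval and now - seen > threshold
    )


now = datetime(2026, 5, 20, 12, 0, tzinfo=timezone.utc)
last_event_at = {
    "run-a": now - timedelta(minutes=2),   # healthy
    "run-b": now - timedelta(hours=1),     # silent for too long
    "run-c": now - timedelta(hours=3),     # waiting on a human approver
}
alerts = stalled_runs(last_event_at, now, awaiting_approval={"run-c"})
```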

D5 · Deployment Safety · Strong

Durable execution, checkpoint resume, and trajectory branching address the fragility of long-running agents that standard request/response stacks ignore.

Production agent failures often come from partial runs: a network blip mid-tool-chain, a pod restart during a multi-hour research task, or a human approval that arrives hours later. AX treats resumption as a first-class capability via event logs and snapshotting, which maps directly to PSF Domain 5 expectations for bounded, recoverable deployments. Agent Substrate (announced the same day, May 20, 2026) targets density limits on vanilla Kubernetes when millions of short tool calls chatter against the control plane. Teams should still implement cost circuit breakers and blast-radius limits in harness configuration; the runtime does not cap spend or tool volume automatically.

Practitioner action: Pin AX and Substrate versions in GitOps manifests. Test resume behaviour after forced pod kills. Pair with step budgets in LangGraph or ADK graphs. Document rollback: which checkpoint ID is safe to promote after a bad branch.
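
Since the runtime does not cap spend or tool volume on its own, a harness-level budget is one way to bound a run. The class below is a minimal sketch with hypothetical names and limits; how it would be wired into a LangGraph or ADK graph depends on the harness and is not shown.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run would exceed its step or cost budget."""


class RunBudget:
    def __init__(self, max_steps: int, max_cost_usd: float):
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.steps = 0
        self.cost_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one step, or raise before recording if a limit would be crossed."""
        # Check before committing so a rejected step is not counted.
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded("step budget exhausted")
        if self.cost_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost budget exhausted")
        self.steps += 1
        self.cost_usd += cost_usd
```

Calling `charge` before each tool or model invocation turns a runaway loop into an explicit exception that the harness can surface for review instead of silently resuming.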

D6 · Human Oversight · Strong

Human-in-the-loop confirmations are an explicit resumption scenario; AX is architected for pause, review, and continue rather than fire-and-forget autonomy.

Google's launch post lists human-in-the-loop confirmations alongside outages as events that should not destroy agent state. That alignment matters for regulated workflows where irreversible actions require approval. AX does not prescribe your approval UI or escalation policy, but the runtime primitive (pause, snapshot, resume) is the hard part many teams rebuild poorly. Compare with LangGraph interrupts (see our LangChain assessment) and Cursor SDK pause semantics: AX generalises the pattern across harnesses at the infrastructure layer.

Practitioner action: Map each irreversible tool call to a mandatory HITL checkpoint in the harness. Store approver identity in AX metadata for audit. For high-risk domains, require dual control before resuming from snapshot.
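
The mapping from irreversible tools to approval requirements, including dual control, might look like the sketch below. Tool names and approval counts are hypothetical, and how the resulting audit record is attached to run metadata is left open because this assessment does not cover the AX metadata interface.

```python
from dataclasses import dataclass, field

# Hypothetical mapping of irreversible tools to the number of distinct approvers required.
REQUIRED_APPROVALS = {"wire_transfer": 2, "send_email": 1}


@dataclass
class PendingAction:
    tool: str
    approvers: set = field(default_factory=set)

    def approve(self, approver_id: str) -> None:
        # A set means repeat approvals by the same person count once.
        self.approvers.add(approver_id)

    def may_resume(self) -> bool:
        """True once enough distinct approvers have signed off."""
        # Tools absent from the mapping need no approval in this sketch;
        # a stricter deployment could default to 1 and fail closed.
        return len(self.approvers) >= REQUIRED_APPROVALS.get(self.tool, 0)

    def audit_record(self) -> dict:
        """Approver identities in a stable order, suitable for storing with the run."""
        return {"tool": self.tool, "approvers": sorted(self.approvers)}
```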

D7 · Security · Partial

Sandbox isolation and centralised coordination improve containment, but preview maturity, the breadth of MCP integrations, and policy enforcement still demand explicit security architecture.

AX isolates components in sandboxes to limit harmful side effects when agents generate code or share infrastructure across tenants. Central coordination also makes it easier to deny-list dangerous tool combinations. The README warns that AX is in active early development, with breaking changes expected before a stable release, so security teams should treat May 2026 as an architecture preview, not a finished hardening pass. Federating Antigravity, the Managed Agents API, and third-party MCPs widens the blast radius, the same risk that Cursor SDK assessments document for ambient agents. Penetration testing and supply-chain review of the google/ax repo should precede regulated production use.

Practitioner action: Run AX sandboxes with least-privilege network policies on GKE. Limit MCP and tool registrations per environment. Track CVEs on AX releases. Cross-train security reviewers on CAIS controls for agent tool access.
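
Limiting tool registrations per environment and deny-listing dangerous combinations can be checked before deployment with a small policy function. Everything named below (environments, tools, and denied pairs) is hypothetical; the sketch shows a check a CI step could run against a requested tool set, not an AX feature.

```python
# Hypothetical registrations; names are illustrative only.
ALLOWED_TOOLS = {
    "prod": {"search_docs", "read_ticket", "send_email"},
    "staging": {"search_docs", "read_ticket", "send_email", "run_shell"},
}

# Tool sets that must never be granted together to one actor.
DENIED_COMBINATIONS = [
    frozenset({"read_secrets", "http_post"}),
    frozenset({"run_shell", "send_email"}),
]


def registration_errors(environment: str, requested: set) -> list:
    """Return policy violations for a requested tool set; empty means acceptable."""
    errors = []
    allowed = ALLOWED_TOOLS.get(environment, set())  # unknown environments allow nothing
    for tool in sorted(requested - allowed):
        errors.append(f"{tool} is not allowed in {environment}")
    for combo in DENIED_COMBINATIONS:
        if combo <= requested:
            errors.append("denied combination: " + " + ".join(sorted(combo)))
    return errors
```

Note that a combination can be denied even when each tool is individually allowed, as with `run_shell` plus `send_email` in the staging example.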

D8 · Vendor Resilience · Strong

Open-source AX under Apache-style community development, harness-agnostic design, and self-managed compute reduce dependence on a single model vendor or cloud agent SKU.

Google markets AX as a way to prevent lock-in: bring your own harness, models, and compute while optionally federating Google-managed agents when useful. That is a credible PSF Domain 8 story for enterprises that must keep proprietary workflows on-premises or on GKE. Resilience is not automatic: teams still need abstraction at the model API layer and tested fallback harnesses. The strategic coupling risk shifts toward Agent Substrate and Gemini Enterprise services if those become the only supported path at scale, but the runtime itself is portable open source.

Practitioner action: Maintain a secondary harness and model path tested quarterly. Document exit criteria if Managed Agents API pricing or terms change. Keep infrastructure-as-code for Substrate optional so AX can run on standard GKE where density requirements are modest.
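
Abstraction at the model API layer can start as a thin function that tries providers in order. The sketch below assumes each provider is wrapped as a plain callable taking a prompt and returning text; the provider names and stand-in functions are hypothetical and no real vendor SDK is called.

```python
from typing import Callable, Sequence, Tuple


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, text) from the first success."""
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any provider failure triggers fallback
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(failures))


# Stand-in providers for a quarterly fallback drill.
def primary(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def secondary(prompt: str) -> str:
    return prompt.upper()


used, text = complete_with_fallback("ping", [("primary", primary), ("secondary", secondary)])
```

Returning the provider name alongside the text lets the caller log which path served the request, which is useful evidence that the secondary path is actually exercised.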

Certification and stack context

Teams operating AX on GKE should align runtime logging with CLOE (Certified LLM Operations Engineer) expectations for inference and agent fleet operations. Tool-heavy graphs federated through AX benefit from CAIS (Certified AI Safety Specialist) training on MCP blast radius. For first production agent deployments on Google Cloud, AIDA (AI Deployment Associate) covers the deployment checklist AX does not enforce automatically. Compare orchestration alternatives in our agent framework comparison and Amazon Bedrock assessment when mixing cloud agent platforms.

Sources

Scores are structured assessments against PSF v1.1, not empirical lab results. Revisit this page when AX reaches stable release or when Google publishes security hardening guidance beyond preview.

