At Google I/O 2026, Google introduced Agent Executor (project name AX): an open-source runtime for agent execution, resumption, and distributed deployment. The same week brought Agent Sandbox on GKE to general availability and a first look at Agent Substrate, an agent-first compute layer for ultra-dense scheduling on Kubernetes.
For production teams, AX matters because long-running agents fail in boring ways: disconnects, pod restarts, and human approvals that arrive hours later. Frameworks like LangGraph address pause/resume inside a graph; AX pushes durability and auditing down to a shared infrastructure runtime that can federate multiple harnesses. That is a different layer from a model release, and it changes how practitioners should think about Production Safety Framework (PSF) compliance for agent fleets.
Release scope assessed
| Artifact | Version / status | Date |
|---|---|---|
| Agent Executor (AX) | Open-source preview (breaking changes expected) | 2026-05-20 |
| Agent Sandbox on GKE | Generally available | 2026-05-20 |
| Agent Substrate | Open-source preview | 2026-05-20 |
PSF domain scorecard
Ratings reflect native AX capabilities documented at launch, not optional Google Cloud managed services. Full domain definitions are in the Production Safety Framework.
Input Governance
Partial: Agent Executor coordinates actor calls through a central controller but does not natively classify, scope-check, or sanitise untrusted inputs before they reach agents or tools.
Google positions Agent Executor (AX) as a harness-agnostic runtime: LangGraph graphs, Agent Development Kit agents, A2A protocol agents, and custom harnesses all route through the same controller. That architecture is valuable for auditing who invoked what, yet PSF Domain 1 still requires semantic input governance when any actor accepts external content (user chat, retrieved documents, webhooks). The runtime event log records what happened; it does not decide whether an instruction is in policy. Teams deploying AX for customer-facing or RAG-heavy workflows should treat input governance as a pre-runtime gate, not an implicit property of Google's preview release.
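A pre-runtime gate of the kind described above can be sketched in a few lines. This is a minimal illustration, not part of AX: the function `input_gate`, the source names, and the blocked patterns are all hypothetical, and a production gate would likely add a classifier and scope checks on top of pattern matching.

```python
import re
from dataclasses import dataclass

# Hypothetical policy values; none of these names come from AX.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the )?system prompt", re.IGNORECASE),
]
ALLOWED_SOURCES = {"user_chat", "retrieved_document", "webhook"}


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    reason: str


def input_gate(source: str, text: str, max_chars: int = 8000) -> GateDecision:
    """Decide whether external content may be forwarded to an agent."""
    if source not in ALLOWED_SOURCES:
        return GateDecision(False, f"unknown source: {source}")
    if len(text) > max_chars:
        return GateDecision(False, "input exceeds size limit")
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            return GateDecision(False, f"matched blocked pattern: {pattern.pattern}")
    return GateDecision(True, "ok")
```

The gate runs before any content is handed to the runtime, so a rejected input never appears as an actor call; recording the returned reason gives reviewers the policy decision that the runtime event log alone does not capture.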
Output Validation
Gap: Trajectory branching and checkpoints help experiment with outputs, but AX does not ship output contracts, schema enforcement, or semantic validation on agent completions.
AX exposes durable execution, snapshotting, and trajectory branching so teams can fork an agent path and compare alternatives without losing state. Those primitives support evaluation workflows, not automatic production validation. If an agent emits a harmful email draft, an incorrect financial summary, or a malformed API payload, the runtime will faithfully persist and resume that trajectory unless the harness or a downstream validator intervenes. Practitioners should pair AX with harness-level parsers (Pydantic, JSON schema) or a validation actor in the graph, consistent with how LangChain deployments address Domain 2.
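A harness-level output contract might look like the sketch below. It uses only the standard library so it runs anywhere; a Pydantic model or a JSON Schema validator would usually replace the manual checks. The `RefundDecision` contract, its fields, and the amount ceiling are hypothetical examples, not anything AX defines.

```python
import json
from dataclasses import dataclass


class OutputContractError(ValueError):
    """Raised when an agent completion violates the expected contract."""


@dataclass(frozen=True)
class RefundDecision:  # hypothetical output contract
    order_id: str
    amount_cents: int
    approve: bool


def parse_refund_decision(raw: str, max_amount_cents: int = 50_000) -> RefundDecision:
    """Parse and validate an agent completion before anything acts on it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputContractError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise OutputContractError("expected a JSON object")
    expected = {"order_id": str, "amount_cents": int, "approve": bool}
    if set(data) != set(expected):
        raise OutputContractError(f"field mismatch: {sorted(set(data) ^ set(expected))}")
    for name, typ in expected.items():
        value = data[name]
        # bool is a subclass of int, so reject it explicitly for integer fields
        if not isinstance(value, typ) or (typ is int and isinstance(value, bool)):
            raise OutputContractError(f"{name} must be {typ.__name__}")
    if not 0 < data["amount_cents"] <= max_amount_cents:
        raise OutputContractError("amount_cents outside allowed range")
    return RefundDecision(**data)
```

Raising on violation, rather than returning a partial object, lets the harness decide whether to retry, branch the trajectory, or escalate to a human.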
Data Protection
Partial: Secure sandboxes and single-writer session consistency reduce accidental cross-tenant corruption; PII handling and residency controls remain the operator's responsibility.
Google documents secure-by-design sandboxes for agents that generate code or serve multiple tenants, plus a single-writer architecture to keep distributed session state consistent. Those are meaningful data-protection building blocks when agents run concurrently. They do not replace classification at ingestion, retention policies on event logs, or contractual controls on model providers behind the harness. AX runs on your compute (including Agent Substrate on GKE), which helps residency-minded teams keep payloads inside chosen regions, but the runtime does not redact PII from logs or block sensitive fields from reaching tools.
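Because the runtime does not redact logs, operators need a redaction step of their own before payloads are persisted. The sketch below is one simple approach under stated assumptions: the key names and the email pattern are illustrative, payloads are JSON-like structures with string keys, and real deployments would need broader PII detection than a single regular expression.

```python
import re

# Illustrative rules only; real PII detection needs far wider coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"ssn", "password", "api_key"}


def redact(value):
    """Return a copy of a JSON-like value with sensitive content masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", value)
    return value
```

Applying `redact` to a payload before it is written to any log or passed to a tool keeps the original object untouched, since the function builds a new structure instead of mutating its input.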
Observability
Strong: Event logging, snapshotting, and controller-mediated actor calls give trace-grade visibility into long-running distributed agent workflows.
AX's design centres on an event log and snapshots so executions can resume after outages, human-in-the-loop pauses, or client disconnects. Connection recovery replays responses from the last sequence seen by the client, which improves operability for hour-scale runs. Because every skill, tool, and agent invocation is coordinated by one controller, teams can build a unified audit trail rather than stitching logs across siloed processes. Preview status means exporters and dashboards are still maturing; the architectural fit for PSF Domain 4 is nonetheless stronger than most agent frameworks at launch.
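The replay-from-last-sequence idea can be illustrated with a toy in-memory log. This is a conceptual sketch only: `EventLog`, `append`, and `replay_after` are hypothetical names, and nothing here reflects the actual AX API or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Toy append-only log with monotonically increasing sequence numbers."""

    _events: list = field(default_factory=list)

    def append(self, payload: str) -> int:
        seq = len(self._events) + 1  # sequence numbers start at 1
        self._events.append((seq, payload))
        return seq

    def replay_after(self, last_seen_seq: int) -> list:
        """Return every event the client has not yet seen, in order."""
        return [event for event in self._events if event[0] > last_seen_seq]
```

A reconnecting client sends the last sequence number it processed and receives only the missing events, which is why a disconnect in an hour-scale run does not force a restart.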
Deployment Safety
Strong: Durable execution, checkpoint resume, and trajectory branching address the fragility of long-running agents that standard request/response stacks ignore.
Production agent failures often come from partial runs: a network blip mid-tool-chain, a pod restart during a multi-hour research task, or a human approval that arrives hours later. AX treats resumption as a first-class capability via event logs and snapshotting, which maps directly to PSF Domain 5 expectations for bounded, recoverable deployments. Agent Substrate (announced the same day, May 20, 2026) targets density limits on vanilla Kubernetes when millions of short tool calls chatter against the control plane. Teams should still implement cost circuit breakers and blast-radius limits in harness configuration; the runtime does not cap spend or tool volume automatically.
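A cost circuit breaker at the harness level could take roughly the following shape. The class `RunBudget` and its limits are hypothetical, and costs are tracked in integer cents to avoid floating-point drift; how a real harness obtains per-call cost is outside this sketch.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run would exceed its spend or tool-call budget."""


class RunBudget:
    """Per-run limits checked before each tool call is made."""

    def __init__(self, max_cost_cents: int, max_tool_calls: int) -> None:
        self.max_cost_cents = max_cost_cents
        self.max_tool_calls = max_tool_calls
        self.spent_cents = 0
        self.tool_calls = 0

    def charge(self, cost_cents: int) -> None:
        """Record one tool call, or raise without recording it."""
        if self.tool_calls + 1 > self.max_tool_calls:
            raise BudgetExceeded("tool-call limit reached")
        if self.spent_cents + cost_cents > self.max_cost_cents:
            raise BudgetExceeded("spend limit reached")
        self.tool_calls += 1
        self.spent_cents += cost_cents
```

Checking the budget before the call, and leaving the counters unchanged on rejection, means a resumed run picks up with an accurate tally if the budget state is stored alongside the snapshot.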
Human Oversight
Strong: Human-in-the-loop confirmations are an explicit resumption scenario; AX is architected for pause, review, and continue rather than fire-and-forget autonomy.
Google's launch post lists human-in-the-loop confirmations alongside outages as events that should not destroy agent state. That alignment matters for regulated workflows where irreversible actions require approval. AX does not prescribe your approval UI or escalation policy, but the runtime primitive (pause, snapshot, resume) is the hard part many teams rebuild poorly. Compare with LangGraph interrupts (see our LangChain assessment) and Cursor SDK pause semantics: AX generalises the pattern across harnesses at the infrastructure layer.
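The pause, snapshot, resume pattern can be shown in miniature. The functions below are a hypothetical illustration of the control flow, not the AX interface: the snapshot is a plain JSON string, and the field names are assumptions made for the example.

```python
import json


def pause_for_approval(state: dict, action: str) -> str:
    """Serialise agent state and the pending action into a snapshot."""
    snapshot = {"state": state, "pending_action": action, "status": "awaiting_approval"}
    return json.dumps(snapshot, sort_keys=True)


def resume(snapshot_json: str, approved: bool, approver: str) -> dict:
    """Restore a snapshot and record the human decision."""
    snapshot = json.loads(snapshot_json)
    if snapshot["status"] != "awaiting_approval":
        raise ValueError("snapshot is not awaiting approval")
    snapshot["status"] = "approved" if approved else "rejected"
    snapshot["approver"] = approver
    return snapshot
```

Because the snapshot is self-contained, the approval can arrive hours later and on a different process; the harness should execute the pending action only when the restored status is `approved`.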
Security
Partial: Sandbox isolation and centralised coordination improve containment, but preview maturity, MCP breadth, and policy enforcement still demand explicit security architecture.
AX isolates components in sandboxes to limit harmful side effects when agents generate code or share infrastructure across tenants. Central coordination also makes it easier to deny-list dangerous tool combinations. The README warns that AX is in active early development with breaking changes before stable release: security teams should treat May 2026 as an architecture preview, not a finished hardening pass. Federating Antigravity, Managed Agents API, and third-party MCPs expands blast radius exactly as Cursor SDK assessments document for ambient agents. Penetration testing and supply-chain review of the google/ax repo should precede regulated production use.
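A deny-list over tool combinations is straightforward to express once calls pass through a single coordination point. The sketch below assumes hypothetical tool names and a hypothetical `is_call_allowed` check; AX does not ship this policy, and where such a check would hook into the controller is not documented in the sources reviewed here.

```python
# Hypothetical tool names; each frozenset is a combination that must never
# occur within one session.
DENIED_COMBINATIONS = [
    frozenset({"read_secrets", "http_post"}),
    frozenset({"shell_exec", "write_repo"}),
]


def is_call_allowed(tools_used: set, next_tool: str) -> bool:
    """Reject a call if it would complete any denied combination."""
    proposed = set(tools_used) | {next_tool}
    return not any(combo <= proposed for combo in DENIED_COMBINATIONS)
```

Each tool stays usable on its own; only the session that tries to combine, for example, secret access with outbound HTTP is blocked.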
Vendor Resilience
Strong: Open-source AX under Apache-style community development, harness-agnostic design, and self-managed compute reduce dependence on a single model vendor or cloud agent SKU.
Google markets AX as a way to prevent lock-in: bring your own harness, models, and compute while optionally federating Google-managed agents when useful. That is a credible PSF Domain 8 story for enterprises that must keep proprietary workflows on-premises or on GKE. Resilience is not automatic: teams still need abstraction at the model API layer and tested fallback harnesses. The strategic coupling risk shifts toward Agent Substrate and Gemini Enterprise services if those become the only supported path at scale, but the runtime itself is portable open source.
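Abstraction at the model API layer can start as an ordered list of interchangeable provider callables. The sketch below is a minimal version under stated assumptions: `complete_with_fallback` is a hypothetical helper, and each provider is any function that takes a prompt and returns text, so no specific vendor SDK is implied.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has failed."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name with the completion makes it visible in logs when a run silently moved to a fallback, which is worth exercising in tests before it is needed in production.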
Certification and stack context
Teams operating AX on GKE should align runtime logging with CLOE (Certified LLM Operations Engineer) expectations for inference and agent fleet operations. Tool-heavy graphs federated through AX benefit from CAIS (Certified AI Safety Specialist) training on MCP blast radius. For first production agent deployments on Google Cloud, AIDA (AI Deployment Associate) covers the deployment checklist AX does not enforce automatically. Compare orchestration alternatives in our agent framework comparison and Amazon Bedrock assessment when mixing cloud agent platforms.
Sources
- Google Cloud Blog: Introducing Agent Executor (May 20, 2026)
- GitHub: google/ax (Agent Executor)
- Google Cloud Blog: Agent Sandbox on GKE and Agent Substrate (May 20, 2026)
- Google I/O 2026: Gemini Enterprise and agent platform announcements
- Production AI Institute: Production Safety Framework
- Production AI Institute: LangChain PSF Assessment
Scores are structured assessments against PSF v1.1, not empirical lab results. Revisit this page when AX reaches stable release or when Google publishes security hardening guidance beyond preview.