Google Agent Executor (AX) in Production: A PSF Domain Assessment

Independence disclosure: The Production AI Institute has no commercial relationship with Google. This assessment is based on the May 20, 2026 launch post, the public google/ax repository, and related Google Cloud documentation. Google was not consulted in preparing this assessment.

At Google I/O 2026, Google introduced Agent Executor (project name AX): an open-source runtime for agent execution, resumption, and distributed deployment. The same week brought Agent Sandbox on GKE to general availability and a first look at Agent Substrate, an agent-first compute layer for ultra-dense scheduling on Kubernetes.

For production teams, AX matters because long-running agents fail in boring ways: disconnects, pod restarts, and human approvals that arrive hours later. Frameworks like LangGraph address pause/resume inside a graph; AX pushes durability and auditing down to a shared infrastructure runtime that can federate multiple harnesses. That is a different layer than a model release, and it changes how practitioners should think about PSF compliance for agent fleets.

Release scope assessed

Artifact	Version / status	Date
Agent Executor (AX)	Open-source preview (breaking changes expected)	2026-05-20
Agent Sandbox on GKE	Generally available	2026-05-20
Agent Substrate	Open-source preview	2026-05-20

PSF domain scorecard

Ratings reflect native AX capabilities documented at launch, not optional Google Cloud managed services. Full domain definitions are in the Production Safety Framework.

Domain	Rating
D1Input Governance	Partial
D2Output Validation	Gap
D3Data Protection	Partial
D4Observability	Strong
D5Deployment Safety	Strong
D6Human Oversight	Strong
D7Security	Partial
D8Vendor Resilience	Strong

Input Governance

Partial

Agent Executor coordinates actor calls through a central controller but does not natively classify, scope-check, or sanitise untrusted inputs before they reach agents or tools.

Google positions Agent Executor (AX) as a harness-agnostic runtime: LangGraph graphs, Agent Development Kit agents, A2A protocol agents, and custom harnesses all route through the same controller. That architecture is valuable for auditing who invoked what, yet PSF Domain 1 still requires semantic input governance when any actor accepts external content (user chat, retrieved documents, webhooks). The runtime event log records what happened; it does not decide whether an instruction is in policy. Teams deploying AX for customer-facing or RAG-heavy workflows should treat input governance as a pre-runtime gate, not an implicit property of Google's preview release.

Practitioner action: Add an input classification step before AX schedules work for external-facing actors. Scope tool and MCP permissions per actor. Document permitted instruction classes in your deployment safety review before production traffic.

Output Validation

Gap

Trajectory branching and checkpoints help experiment with outputs, but AX does not ship output contracts, schema enforcement, or semantic validation on agent completions.

AX exposes durable execution, snapshotting, and trajectory branching so teams can fork an agent path and compare alternatives without losing state. Those primitives support evaluation workflows, not automatic production validation. If an agent emits a harmful email draft, an incorrect financial summary, or a malformed API payload, the runtime will faithfully persist and resume that trajectory unless the harness or a downstream validator intervenes. Practitioners should pair AX with harness-level parsers (Pydantic, JSON schema) or a validation actor in the graph, consistent with how LangChain deployments address Domain 2.

Practitioner action: Define an OutputContract per consequential action type. Run validation in the harness before AX checkpoints irreversible side effects. Use branching to A/B test policies, not as a substitute for blocking bad outputs in production.

Data Protection

Partial

Secure sandboxes and single-writer session consistency reduce accidental cross-tenant corruption; PII handling and residency controls remain the operator's responsibility.

Google documents secure-by-design sandboxes for agents that generate code or serve multiple tenants, plus a single-writer architecture to keep distributed session state consistent. Those are meaningful data-protection building blocks when agents run concurrently. They do not replace classification at ingestion, retention policies on event logs, or contractual controls on model providers behind the harness. AX runs on your compute (including Agent Substrate on GKE), which helps residency-minded teams keep payloads inside chosen regions, but the runtime does not redact PII from logs or block sensitive fields from reaching tools.

Practitioner action: Classify data categories before enabling AX against regulated workloads. Restrict sandbox filesystem and network egress. Configure log retention and access controls on the controller event store. Review Google Cloud data processing terms for any managed agents you federate alongside self-hosted AX.

Observability

Strong

Event logging, snapshotting, and controller-mediated actor calls give trace-grade visibility into long-running distributed agent workflows.

AX's design centres on an event log and snapshots so executions can resume after outages, human-in-the-loop pauses, or client disconnects. Connection recovery replays responses from the last sequence seen by the client, which improves operability for hour-scale runs. Because every skill, tool, and agent invocation is coordinated by one controller, teams can build a unified audit trail rather than stitching logs across siloed processes. Preview status means exporters and dashboards are still maturing; the architectural fit for PSF Domain 4 is nonetheless stronger than most agent frameworks at launch.

Practitioner action: Export controller events to your existing observability stack (OpenTelemetry, Cloud Logging, or Langfuse). Alert on stalled executions and replay failures. Correlate AX run IDs with deployment change records for incident response.

Deployment Safety

Strong

Durable execution, checkpoint resume, and trajectory branching address the fragility of long-running agents that standard request/response stacks ignore.

Production agent failures often come from partial runs: a network blip mid-tool-chain, a pod restart during a multi-hour research task, or a human approval that arrives hours later. AX treats resumption as a first-class capability via event logs and snapshotting, which maps directly to PSF Domain 5 expectations for bounded, recoverable deployments. Agent Substrate (announced the same day, May 20, 2026) targets density limits on vanilla Kubernetes when millions of short tool calls chatter against the control plane. Teams should still implement cost circuit breakers and blast-radius limits in harness configuration; the runtime does not cap spend or tool volume automatically.

Practitioner action: Pin AX and Substrate versions in GitOps manifests. Test resume behaviour after forced pod kills. Pair with step budgets in LangGraph or ADK graphs. Document rollback: which checkpoint ID is safe to promote after a bad branch.

Human Oversight

Strong

Human-in-the-loop confirmations are an explicit resumption scenario; AX is architected for pause, review, and continue rather than fire-and-forget autonomy.

Google's launch post lists human-in-the-loop confirmations alongside outages as events that should not destroy agent state. That alignment matters for regulated workflows where irreversible actions require approval. AX does not prescribe your approval UI or escalation policy, but the runtime primitive (pause, snapshot, resume) is the hard part many teams rebuild poorly. Compare with LangGraph interrupts (see our LangChain assessment) and Cursor SDK pause semantics: AX generalises the pattern across harnesses at the infrastructure layer.

Practitioner action: Map each irreversible tool call to a mandatory HITL checkpoint in the harness. Store approver identity in AX metadata for audit. For high-risk domains, require dual control before resuming from snapshot.

Security

Partial

Sandbox isolation and centralized coordination improve containment, but preview maturity, MCP breadth, and policy enforcement still demand explicit security architecture.

AX isolates components in sandboxes to limit harmful side effects when agents generate code or share infrastructure across tenants. Central coordination also makes it easier to deny-list dangerous tool combinations. The README warns that AX is in active early development with breaking changes before stable release: security teams should treat May 2026 as an architecture preview, not a finished hardening pass. Federating Antigravity, Managed Agents API, and third-party MCPs expands blast radius exactly as Cursor SDK assessments document for ambient agents. Penetration testing and supply-chain review of the google/ax repo should precede regulated production use.

Practitioner action: Run AX sandboxes with least-privilege network policies on GKE. Limit MCP and tool registrations per environment. Track CVEs on AX releases. Cross-train security reviewers on CAIS controls for agent tool access.

Vendor Resilience

Strong

Open-source AX under Apache-style community development, harness-agnostic design, and self-managed compute reduce dependence on a single model vendor or cloud agent SKU.

Google markets AX as a way to prevent lock-in: bring your own harness, models, and compute while optionally federating Google-managed agents when useful. That is a credible PSF Domain 8 story for enterprises that must keep proprietary workflows on-premises or on GKE. Resilience is not automatic: teams still need abstraction at the model API layer and tested fallback harnesses. The strategic coupling risk shifts toward Agent Substrate and Gemini Enterprise services if those become the only supported path at scale, but the runtime itself is portable open source.

Practitioner action: Maintain a secondary harness and model path tested quarterly. Document exit criteria if Managed Agents API pricing or terms change. Keep infrastructure-as-code for Substrate optional so AX can run on standard GKE where density requirements are modest.

Certification and stack context

Teams operating AX on GKE should align runtime logging with CLOE (Certified LLM Operations Engineer) expectations for inference and agent fleet operations. Tool-heavy graphs federated through AX benefit from CAIS (Certified AI Safety Specialist) training on MCP blast radius. For first production agent deployments on Google Cloud, AIDA (AI Deployment Associate) covers the deployment checklist AX does not enforce automatically. Compare orchestration alternatives in our agent framework comparison and Amazon Bedrock assessment when mixing cloud agent platforms.

Sources

Scores are structured assessments against PSF v1.1, not empirical lab results. Revisit this page when AX reaches stable release or when Google publishes security hardening guidance beyond preview.

Public record

This record is maintained by PAI and free to cite. If something is wrong or missing, tell us. Corrections and source suggestions keep the record honest.

Follow policy changes ->Save a watch ->Monitoring & advisory ->Institutional contact ->Submit a correction

Records are free to cite. citation guidance.

Artifact

Version / status

Date

Agent Executor (AX)

Open-source preview (breaking changes expected)

2026-05-20

Agent Sandbox on GKE

Generally available

2026-05-20

Agent Substrate

Open-source preview

2026-05-20

PSF domain scorecard

Ratings reflect native AX capabilities documented at launch, not optional Google Cloud managed services. Full domain definitions are in the Production Safety Framework.

Domain	Rating
D1Input Governance	Partial
D2Output Validation	Gap
D3Data Protection	Partial
D4Observability	Strong
D5Deployment Safety	Strong
D6Human Oversight	Strong
D7Security	Partial
D8Vendor Resilience	Strong

Input Governance

Partial

Agent Executor coordinates actor calls through a central controller but does not natively classify, scope-check, or sanitise untrusted inputs before they reach agents or tools.

Output Validation

Gap

Trajectory branching and checkpoints help experiment with outputs, but AX does not ship output contracts, schema enforcement, or semantic validation on agent completions.

Data Protection

Partial

Secure sandboxes and single-writer session consistency reduce accidental cross-tenant corruption; PII handling and residency controls remain the operator's responsibility.

Observability

Strong

Event logging, snapshotting, and controller-mediated actor calls give trace-grade visibility into long-running distributed agent workflows.

Deployment Safety

Strong

Durable execution, checkpoint resume, and trajectory branching address the fragility of long-running agents that standard request/response stacks ignore.

Human Oversight

Strong

Human-in-the-loop confirmations are an explicit resumption scenario; AX is architected for pause, review, and continue rather than fire-and-forget autonomy.

Security

Partial

Sandbox isolation and centralized coordination improve containment, but preview maturity, MCP breadth, and policy enforcement still demand explicit security architecture.

Vendor Resilience

Strong

Open-source AX under Apache-style community development, harness-agnostic design, and self-managed compute reduce dependence on a single model vendor or cloud agent SKU.

Certification and stack context