On May 29, 2026, Cursor shipped version 3.6 with Auto-review, a new agent run mode designed to let agents work longer with fewer approval prompts while keeping execution safer. Auto-review applies to Shell, MCP, and Fetch tool calls: allowlisted calls run immediately, sandboxable calls run in the sandbox, and all other actions pass to a classifier subagentthat may allow the call, try a different approach, or ask for human approval. Teams configure the mode under Settings > Cursor Settings > Agents > Run Mode and may steer the classifier with custom instructions.
This release complements, rather than replaces, our Cursor 3.5 Automations assessment (scheduling, multi-repo, and no-repo cloud agents) and the April 2026 Cursor SDK assessment (programmatic embedding). Auto-review targets how tool calls execute during long interactive and automation sessions. For organisations standardising on Cursor for production engineering, the PSF question is whether classifier-gated execution closes governance gaps from approval fatigue or creates new dependence on opaque subagent decisions.
Release scope assessed
| Capability | Version / status | Date |
|---|---|---|
| Auto-review run mode (allowlist, sandbox, classifier) | Cursor 3.6 | 2026-05-29 |
| Shell, MCP, Fetch tool-call governance | GA in 3.6 changelog | 2026-05-29 |
| Custom classifier instructions | Settings > Agents > Run Mode | 2026-05-29 |
PSF domain scorecard
Ratings reflect Cursor 3.6 Auto-review as documented in the public May 29, 2026 changelog. Domain definitions: Production Safety Framework.
Input Governance
PartialAuto-review classifies Shell, MCP, and Fetch calls before execution, which adds a governance gate on tool inputs; allowlisted calls still bypass review and custom classifier instructions can drift from org policy.
Cursor 3.6 introduces Auto-review as a run mode where agent tool calls pass through a classifier subagent before Shell, MCP, or Fetch execution. Allowlisted calls run immediately. Calls that can be sandboxed run in the sandbox. All other actions are evaluated for allow, alternate approach, or human approval. That is a meaningful input gate compared with interactive agents that prompt on every risky call. The gate is heuristic: practitioners define allowlists and steer the classifier with custom instructions in Settings > Cursor Settings > Agents > Run Mode. Misconfigured allowlists or vague classifier prompts can reintroduce the same ambient-agent risks Auto-review is meant to reduce, especially when MCP servers accept unstructured external payloads.
Output Validation
PartialAuto-review focuses on pre-execution tool safety rather than validating agent outputs before side effects complete; practitioners still need output contracts for PRs, messages, and infrastructure changes.
The May 29, 2026 changelog positions Auto-review as safer execution with fewer approval prompts, not as an output validator. A classifier may block or reroute a destructive shell command, but it does not enforce schema checks on generated code, JSON payloads, or customer-facing text before those artifacts leave the agent session. Teams using Auto-review alongside Cursor 3.5 shared canvases can combine pre-execution gates with human review of artifacts, but that pairing is architectural, not automatic. Compare with output validation patterns in our Cursor SDK assessment: streaming visibility and canvas review help; enforcement remains deployment-layer work.
Data Protection
PartialSandbox routing for eligible Shell, MCP, and Fetch calls reduces direct exfiltration paths; classifier and cloud execution still process sensitive context unless you scope repos, MCP servers, and credentials.
Auto-review explicitly routes sandboxable calls into Cursor sandbox rather than the host environment, which is a concrete data-protection control for filesystem and network reach. Allowlisted immediate execution and classifier-approved calls can still read secrets from attached repos, environment files, and MCP-connected systems. Fetch tool calls may retrieve URLs supplied by the model or by poisoned issue comments. Multi-repo automations from Cursor 3.5 inherit this surface: Auto-review governs how tools run, not what data the agent already loaded into context. Regulated teams should map data categories before enabling Auto-review on production repositories or billing-connected MCP servers.
Observability
GapRun mode and classifier decisions improve operator intent, but Cursor does not yet expose structured export of allow, sandbox, reroute, and approval events to enterprise observability stacks by default.
Practitioners can infer behaviour from agent run history and fewer interrupt prompts, yet PSF Domain 4 at enterprise scale requires correlation IDs, retention aligned to compliance schedules, and SIEM integration for every classifier decision on Shell, MCP, and Fetch. The changelog documents configuration paths and custom classifier instructions but not OpenTelemetry spans or webhook feeds for governance teams. Without exported decision logs, incident response after a mis-approved tool call depends on reconstructing session transcripts. Teams should treat Cursor telemetry as a supplement to monitoring on systems the agent touches, consistent with our Cursor 3.5 Automations assessment.
Deployment Safety
StrongVersioned 3.6 release, explicit run-mode configuration, sandbox routing, and classifier escalation provide a clearer rollout surface than ad hoc approval fatigue in long agent sessions.
Cursor documents release 3.6 on May 29, 2026 with Auto-review as a first-class run mode configurable per workspace. Allowlist plus sandbox plus classifier tiers give platform teams a staged adoption path: start with sandbox-only, tighten allowlists, then tune classifier instructions. This directly addresses deployment safety risks from unattended agents working for extended periods with fewer prompts, which our Cursor 3.5 assessment flagged for multi-repo and no-repo automations. Blast radius still depends on practitioner-defined allowlists and whether Auto-review is enabled for production orgs versus experimentation sandboxes.
Human Oversight
StrongClassifier subagents can defer to human approval on non-allowlisted, non-sandboxable actions, which is the strongest native oversight signal Cursor has shipped for interactive and automation runs.
Auto-review is designed so Cursor can work longer with fewer approval prompts while preserving an escalation path: the classifier may ask for your approval when a call cannot be allowlisted or sandboxed safely. That maps cleanly to PSF Domain 6 for consequence classes where irreversible infrastructure or customer-facing actions require a named human. Custom instructions let security teams encode org-specific escalation language. The control is not universal: allowlisted calls skip human review entirely, and classifier quality varies with model behaviour. Teams running Cursor Automations should verify Auto-review applies to scheduled cloud agents, not only local IDE sessions, before relying on it for production schedules.
Security
PartialSandbox execution, allowlists, and classifier rerouting reduce prompt-injection success rates on Shell, MCP, and Fetch; MCP supply-chain and over-permissioned allowlists remain primary attack paths.
Auto-review adds defence in depth against agent tool abuse: even if a ticket, Slack thread, or poisoned file steers the model toward a destructive command, the classifier may block, sandbox, or escalate. Security value depends on allowlist hygiene and classifier prompt quality. Over-broad shell patterns, credentials in repos, and high-privilege MCP servers can still execute immediately when allowlisted. Adversarial testing should target Auto-review specifically: injection via Jira assignments, shared canvas links, and no-repo automation triggers documented in our May 2026 Cursor assessments. Align reviews with CAIS expectations for tool access and supply-chain monitoring.
Vendor Resilience
GapAuto-review, classifier behaviour, and sandbox rules are Cursor-platform controls without portable equivalents; model or policy changes can alter safety posture without a practitioner-owned rollback artifact.
PSF Domain 8 covers continuity when vendors change defaults, deprecate features, or become unavailable. Cursor 3.6 deepens dependence on Cursor-specific run modes: allowlists, classifier subagents, and sandbox routing are not exportable policies you can replay on another harness. A changelog update to classifier defaults or sandbox eligibility could shift production risk silently. Teams mixing Cursor with Azure Foundry Opus 4.8 or OpenAI Codex should document fallback runbooks that do not assume Auto-review exists on alternate platforms. Enterprise contracts may offer DPAs and support SLAs, but practitioners still need written exit paths before production schedules rely on Cursor-only safety primitives.
Certification and stack context
Teams enabling Auto-review on production repositories should align runbooks with AIDA (AI Deployment Associate) deployment checklists before expanding allowlists or MCP connectors. Long-running agent sessions that ship code benefit from CLOE (Certified LLM Operations Engineer) practices for model versioning, cost controls, and incident response. Classifier and allowlist configuration should be reviewed against CAIS (Certified AI Safety Specialist) tool-access guidance, especially when Shell or Fetch can reach production infrastructure. Compare orchestration alternatives in our agent framework comparison and the contemporaneous Claude Opus 4.8 assessment when mixing models behind Cursor agents.
Sources
- Cursor Changelog: 3.6 Auto-review (May 29, 2026)
- Production AI Institute: Cursor 3.5 Automations PSF Assessment
- Production AI Institute: Cursor SDK PSF Assessment
- Production AI Institute: Production Safety Framework
- Production AI Institute: PSF Domain 6 Human Oversight
- Production AI Institute: Agent framework comparison
Scores are structured assessments against PSF v1.1, not empirical lab results. Revisit when Cursor publishes structured audit export for classifier decisions or extends Auto-review to additional tool types beyond Shell, MCP, and Fetch.
Use this assessment against your own deployment. The free AIDA exam checks PSF readiness in about 20 minutes.
Verify your deployment — free AIDA exam →Turn the evidence into production practice.
Use the PSF, research library, and Lab material to review your own deployment. Credentials are available when a client, employer, or regulator needs public proof.