Headless Cursor SDK 3.7: What Auto-Review Still Misses

Independence disclosure: The Production AI Institute has no commercial relationship with Cursor. This assessment is based on the June 4, 2026 changelog, published SDK documentation, and our April 2026 Cursor SDK baseline assessment. Cursor was not consulted in preparing this assessment.

Cursor SDK 3.7 extends the programmatic agent surface Cursor opened in April 2026. The June 4 release targets production scripts, CI pipelines, and custom integrations: practitioners can register custom tools as function definitions without hosting MCP servers, route headless tool calls through auto-review, persist runs in JSONL or custom stores, and delegate across nested subagents to arbitrary depth. Python SDK 0.1.6 ships alongside TypeScript improvements for workspace-scoped run listing and clearer not-found errors.

For teams already using the April 2026 Cursor SDK assessment, 3.7 is the first release that makes unattended production automation defensible when paired with permissions.json (schema reference: autoRun allow_instructions / block_instructions and allowlists) and external approval workflows. Compare IDE-side governance in our Cursor 3.6 Auto-review assessment and org segmentation in the Enterprise Organizations assessment.

Release scope assessed

Artifact	Version	Date
TypeScript SDK (@cursor/sdk)	3.7	2026-06-04
Python SDK (cursor-sdk)	0.1.6	2026-06-04
Custom tools (custom-user-tools MCP)	3.7	2026-06-04
local.autoReview for headless runs	3.7	2026-06-04
Nested subagents	3.7 (automatic)	2026-06-04

PSF domain scorecard

Ratings reflect SDK 3.7 capabilities documented in the June 4, 2026 changelog. Full domain definitions are in the Production Safety Framework.

Domain	Rating
D1 Input Governance	Partial
D2 Output Validation	Partial
D3 Data Protection	Partial
D4 Observability	Strong
D5 Deployment Safety	Partial
D6 Human Oversight	Partial
D7 Security	Partial
D8 Vendor Resilience	Partial

Input Governance

Partial

Custom tools let teams pass function definitions directly to the agent without standing up MCP servers, which tightens the input surface for programmatic tasks but does not classify untrusted external payloads.

The June 4, 2026 Cursor changelog documents local.customTools on Agent.create() and on each send() call, exposing capabilities through a built-in MCP server called custom-user-tools. That reduces integration sprawl for CI scripts that call internal APIs with typed schemas. Custom tools are inherited by every subagent in a run, so a permissive parent definition propagates across nested delegations. The SDK still routes free-text instructions straight to the agent runtime without native classification, sanitisation, or scope validation, consistent with our April 2026 Cursor SDK assessment. Auto-review (local.autoReview) helps gate tool calls in headless mode but is not an input filter for user-supplied strings arriving from webhooks or tickets.

Practitioner action: Define JSON schemas for every custom tool argument. Do not let external webhook text reach send() until it has passed a classification step (for example, a Lambda function). Scope custom tools per subagent where nesting is enabled.
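The classification step in the practitioner action can be sketched as a small gate that sits in front of send(). This is a minimal illustration, not Cursor functionality: the regex patterns, the character budget, and the names classify_external_text and guarded_send are all hypothetical, and the send argument is a stand-in for whatever callable your harness uses to reach the SDK. A production gate would more likely call a trained classifier than match regexes.

```python
import re
from dataclasses import dataclass

# Hypothetical deny patterns; a real deployment would more likely call a
# trained classifier or a dedicated classification service.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"(api[_-]?key|secret|password)\s*[:=]", re.IGNORECASE),
]
MAX_PROMPT_CHARS = 4000  # assumed budget; tune per workflow


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    reason: str


def classify_external_text(text: str) -> GateDecision:
    """Decide whether untrusted text may be forwarded to the agent."""
    if len(text) > MAX_PROMPT_CHARS:
        return GateDecision(False, "too_long")
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(text):
            return GateDecision(False, f"matched:{pattern.pattern}")
    return GateDecision(True, "ok")


def guarded_send(send, text: str):
    """Call `send` (a stand-in for the SDK's send()) only if the gate allows it."""
    decision = classify_external_text(text)
    if not decision.allowed:
        raise PermissionError(f"blocked external input: {decision.reason}")
    return send(text)
```

Raising on a blocked payload, rather than silently dropping it, keeps the rejection visible in CI logs and lets the caller route the ticket or webhook to a human.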

Output Validation

Partial

Reliable wait() until terminal results and requestId correlation improve harness integrity, but the SDK does not enforce output contracts before custom tools execute side effects.

Cursor fixed a bug in which local runs resolved wait() before the terminal result was written, so automation scripts now read complete RunResult objects instead of racing hydration. Each send() now carries a platform-generated requestId persisted across in-memory, SQLite, and JSONL stores, which supports post-hoc validation that a logged output matched the intended run. Custom tools can still mutate production systems when the model emits arguments that pass JSON shape checks but violate business rules. Nested subagents multiply the validation burden: a parent may never see intermediate tool outputs from deep children unless you instrument stores explicitly.

Practitioner action: Wrap custom tool handlers with idempotent guards and schema validators. Log requestId with every downstream API call. Add golden-set regression tests before promoting SDK scripts to production cron.
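The gap between shape checks and business rules can be sketched with a handler wrapper. Everything here is illustrative: the refund tool, its 50,000-cent ceiling, and the names validate_refund and guarded_tool are hypothetical, and the wrapper assumes your harness passes the run's requestId into the handler. The in-process dict stands in for a durable idempotency store.

```python
import json
from typing import Any, Callable, Dict


class BusinessRuleError(ValueError):
    """Arguments are well-formed but violate a domain rule."""


def validate_refund(args: Dict[str, Any]) -> None:
    # Shape checks: roughly what a JSON schema would already catch.
    if not isinstance(args.get("order_id"), str) or not args["order_id"]:
        raise ValueError("order_id must be a non-empty string")
    amount = args.get("amount_cents")
    if not isinstance(amount, int) or isinstance(amount, bool):
        raise ValueError("amount_cents must be an integer")
    # Business rule: a shape-valid value can still be unacceptable.
    if not 0 < amount <= 50_000:
        raise BusinessRuleError("amount_cents outside the allowed refund range")


def guarded_tool(
    handler: Callable[[Dict[str, Any]], Any],
    validator: Callable[[Dict[str, Any]], None],
    seen: Dict[str, Any],
) -> Callable[[str, Dict[str, Any]], Any]:
    """Wrap a tool handler with validation and an idempotency cache.

    `seen` maps idempotency keys to earlier results; a real deployment would
    use a durable store rather than an in-process dict.
    """

    def wrapper(request_id: str, args: Dict[str, Any]) -> Any:
        validator(args)  # reject before any side effect happens
        key = f"{request_id}:{json.dumps(args, sort_keys=True)}"
        if key in seen:  # replayed call: return the earlier result
            return seen[key]
        result = handler(args)
        seen[key] = result
        return result

    return wrapper
```

Keying the cache on requestId plus the canonicalised arguments means a retried send() does not repeat the side effect, while a genuinely different call within the same run still executes.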

Data Protection

Partial

JSONL and composable LocalAgentStore implementations give teams auditable, diffable run metadata, but default persistence still lands on developer disks unless you redirect stores to encrypted backends.

Version 3.7 exports JsonlLocalAgentStore and documents a public LocalAgentStore interface so practitioners can back agent state with Postgres or ephemeral in-memory stores for CI. JSONL append-only files are easier to redact and version-control than opaque SQLite blobs, which aids data-minimisation reviews. The trade-off is accidental commit of run files containing MCP payloads, stack traces, or customer identifiers. Nested subagents increase the volume of persisted transcripts. Cloud streaming fixes for HTTP/1.1 proxies do not change Cursor data-processing terms; teams in regulated industries still need contractual review of what the platform retains.

Practitioner action: Store JSONL outside git by default. Implement LocalAgentStore with TTL and encryption for production. Scrub MCP tool responses before writing to shared stores.
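Scrubbing and TTL enforcement for JSONL run files can be sketched with the standard library alone. This sketch deliberately does not implement Cursor's LocalAgentStore interface, whose method signatures are not reproduced in this assessment; it only shows the redact-before-write and purge-after-TTL steps. The key list, the email pattern, and the function names are assumptions, and encryption at rest is out of scope here.

```python
import json
import re
import time
from pathlib import Path
from typing import Any, Optional

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"authorization", "api_key", "password", "token"}  # assumed list


def scrub(value: Any) -> Any:
    """Recursively redact sensitive keys and email addresses."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", value)
    return value


def append_record(path: Path, record: dict, now: Optional[float] = None) -> None:
    """Append one scrubbed record, stamped with a write time, to a JSONL file."""
    stamp = time.time() if now is None else now
    entry = {**scrub(record), "written_at": stamp}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")


def purge_expired(path: Path, ttl_seconds: float, now: Optional[float] = None) -> int:
    """Rewrite the file without records older than the TTL; return the number removed."""
    current = time.time() if now is None else now
    lines = path.read_text(encoding="utf-8").splitlines()
    kept = [
        line for line in lines
        if current - json.loads(line)["written_at"] <= ttl_seconds
    ]
    path.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    return len(lines) - len(kept)
```

Redacting at write time, rather than at read time, means an accidentally committed run file has already lost the fields most likely to cause an incident.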

Observability

Strong

requestId on every send(), persisted across all store types, is the clearest observability upgrade for tying SDK automation to backend logs and support threads.

PSF Domain 4 requires practitioners to reconstruct what an agent consumed and emitted. Cursor 3.7 surfaces requestId on Run and RunResult and keeps it through hydration, so CLOE-style operators can correlate a CI failure to Cursor backend telemetry without inferring identity from agentId alone. Workspace-scoped list_runs in Python 0.1.6 reduces spurious not-found errors when bridges run as subprocesses, improving script reliability. What remains missing is native OpenTelemetry export or SIEM integration; teams must wrap the SDK with their own trace context propagation. Composer 2 to Composer 2.5 automatic routing should be logged explicitly because model behaviour can shift without a semver bump in client code.

Practitioner action: Emit requestId to your APM on every send(). Alert when auto-review blocks spike. Track model slug after Composer 2.5 migration in run metadata.
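Both halves of this action, emitting requestId and alerting on block spikes, can be sketched without touching the SDK. The field names in the log line, the logger name, and the BlockSpikeDetector class are hypothetical; how you obtain the requestId and learn that auto-review blocked a call depends on your harness and is not shown.

```python
import json
import logging
from collections import deque
from typing import Deque

logger = logging.getLogger("sdk_automation")  # hypothetical logger name


def log_send(request_id: str, agent_id: str, model_slug: str) -> str:
    """Emit one structured log line per send() and return it for inspection."""
    line = json.dumps(
        {
            "event": "agent_send",
            "requestId": request_id,
            "agentId": agent_id,
            "modelSlug": model_slug,
        },
        sort_keys=True,
    )
    logger.info(line)
    return line


class BlockSpikeDetector:
    """Sliding-window counter for auto-review block events."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: Deque[float] = deque()

    def record_block(self, timestamp: float) -> bool:
        """Record a block; return True once the window holds `threshold` events."""
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that have fallen out of the window.
        while self._events and self._events[0] < cutoff:
            self._events.popleft()
        return len(self._events) >= self.threshold
```

Logging the model slug alongside requestId in the same line makes a platform-side routing change visible as a diff in the log stream rather than as an unexplained behaviour shift.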

Deployment Safety

Partial

Auto-review for headless SDK runs closes a major safety gap from the April beta, but nested subagents and custom tools expand blast radius unless step budgets and approval gates are explicit.

Before 3.7, local SDK agents executed tool calls without human approval by default, which was appropriate for dev scripts but risky for unattended production automation. local.autoReview routes calls through the same classifier used in IDE auto-review, steered by permissions.json autoRun.allow_instructions and block_instructions. That is a meaningful deployment-safety primitive for CI and cron jobs. Nested subagents can delegate across arbitrary depth with separate prompts and models, increasing parallel side-effect risk. Safe checkpoint handling on dispose prevents accidental data loss but does not cap wall-clock runtime. Bundled ripgrep and lighter @cursor/sdk imports improve packaging stability for pinned CI images.

Practitioner action: Enable local.autoReview on every headless production agent. Cap subagent depth and concurrent tool calls. Pin @cursor/sdk and cursor-sdk versions in lockfiles. Stage in a canary pipeline before fleet-wide cron adoption.
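A depth and concurrency cap can be enforced in the harness or inside custom tool handlers. The sketch below assumes your code can observe the current subagent depth and can wrap each tool invocation; the SDK surface for doing so is not shown, and RunBudget and BudgetExceeded are hypothetical names. Note that the step counter counts attempts, including ones rejected for concurrency.

```python
import threading
from contextlib import contextmanager


class BudgetExceeded(RuntimeError):
    """Raised when a run exceeds its depth, step, or concurrency budget."""


class RunBudget:
    def __init__(self, max_depth: int, max_concurrent_tools: int, max_steps: int) -> None:
        self.max_depth = max_depth
        self.max_steps = max_steps
        self._steps = 0
        self._lock = threading.Lock()
        self._slots = threading.BoundedSemaphore(max_concurrent_tools)

    def check_depth(self, depth: int) -> None:
        """Reject delegation deeper than the configured limit."""
        if depth > self.max_depth:
            raise BudgetExceeded(f"subagent depth {depth} exceeds {self.max_depth}")

    @contextmanager
    def tool_call(self):
        """Guard one tool call against the step and concurrency budgets."""
        with self._lock:
            if self._steps >= self.max_steps:
                raise BudgetExceeded("step budget exhausted")
            self._steps += 1  # attempts count, even if rejected below
        if not self._slots.acquire(blocking=False):
            raise BudgetExceeded("too many concurrent tool calls")
        try:
            yield
        finally:
            self._slots.release()
```

A handler would wrap its body in `with budget.tool_call():` and call `budget.check_depth(depth)` before delegating. Failing fast, instead of queueing, keeps an unattended cron job from stalling silently when a nested run fans out further than expected.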

Human Oversight

Partial

Auto-review introduces classifier-mediated holds for destructive tool shapes, but headless mode still lacks a native approval queue for business-consequence actions.

The June 4 release documents natural-language steering of the auto-review classifier, for example allowing read-only dist inspections while blocking deletes. That maps well to CAIS oversight patterns when instructions are maintained in version-controlled permissions.json. Classifier decisions are probabilistic: a misclassified allow on a financial transfer or production delete still causes harm. Nested subagents can execute custom tools the parent operator never reviewed if inheritance is enabled. Human oversight for SDK deployments therefore requires external workflow tools (PR gates, change tickets) for irreversible actions, not reliance on auto-review alone.

Practitioner action: Mirror IDE Run Mode policies into permissions.json for SDK fleets. Require human PR review before merge on any agent-opened branch. Escalate block_instructions changes through change control.
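Because headless mode has no native approval queue, irreversible actions need an external hold. One way to sketch that is a dispatcher that executes ordinary tools immediately but parks irreversible ones until a change ticket has been approved. The tool names, the ticket identifier format, and the ApprovalQueue and dispatch names are hypothetical; in practice the approved set would be read from your change-management system rather than held in memory.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Set

# Hypothetical tool names that must never run on classifier approval alone.
IRREVERSIBLE_TOOLS = {"delete_environment", "transfer_funds"}


@dataclass
class ApprovalQueue:
    pending: List[Dict[str, Any]] = field(default_factory=list)
    approved_tickets: Set[str] = field(default_factory=set)

    def approve(self, ticket_id: str) -> None:
        """Record a human approval for the given change ticket."""
        self.approved_tickets.add(ticket_id)


def dispatch(
    tool_name: str,
    args: Dict[str, Any],
    handler: Callable[[Dict[str, Any]], Any],
    queue: ApprovalQueue,
    ticket_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Run reversible tools at once; hold irreversible ones until approved."""
    needs_approval = tool_name in IRREVERSIBLE_TOOLS
    if needs_approval and ticket_id not in queue.approved_tickets:
        queue.pending.append({"tool": tool_name, "args": args, "ticket_id": ticket_id})
        return {"status": "held", "tool": tool_name}
    return {"status": "executed", "result": handler(args)}
```

The hold is decided by tool name, not by the classifier, so a misclassified allow from auto-review cannot by itself release a financial transfer or a production delete.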

Security

Partial

Custom tools execute in-process with the same permission gate as MCP tools, which is cleaner than ad-hoc HTTP bridges but concentrates supply-chain risk in function definitions developers register at runtime.

Exposing custom-user-tools through the built-in MCP path means the model invokes your code through Cursor permission checks, a security improvement over unaudited side channels. The risk vector shifts to what those functions can access: database writes, secret managers, or cross-tenant APIs if handlers are over-permissioned. Nested subagents inherit parent custom tools, widening the payoff of indirect prompt injection if a child ingests untrusted repository content. Auto-review mitigates the risk from some shell and MCP calls but does not sandbox custom tool handlers. Bundled ripgrep reduces exposure to PATH tampering on Windows CI runners.

Practitioner action: Run custom tool handlers with least-privilege service accounts. Audit permissions.json in CI. Pen-test nested subagent flows with adversarial repo fixtures.
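Auditing permissions.json in CI can be sketched as a lint step that fails the build on missing or over-broad policy. The file shape used here is an assumption inferred from the key names this assessment cites (an autoRun object holding allow_instructions and block_instructions, each treated as a list of strings); check it against the published schema before relying on it. The required and over-broad phrase lists are hypothetical organisational policy.

```python
import json
from typing import List

# Hypothetical policy: phrases the block list must mention somewhere.
REQUIRED_BLOCK_TERMS = ("delete", "force push")
# Hypothetical phrases that make an allow instruction too broad to accept.
OVERBROAD_ALLOW_TERMS = ("anything", "all commands", "everything")


def audit_permissions(raw: str) -> List[str]:
    """Return a list of findings; an empty list means the file passed."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    # Assumed shape: {"autoRun": {"allow_instructions": [...], "block_instructions": [...]}}
    auto_run = doc.get("autoRun") if isinstance(doc, dict) else None
    if not isinstance(auto_run, dict):
        return ["missing autoRun section"]

    block = auto_run.get("block_instructions", [])
    allow = auto_run.get("allow_instructions", [])
    for name, value in (("block_instructions", block), ("allow_instructions", allow)):
        if not isinstance(value, list) or not all(isinstance(i, str) for i in value):
            return [f"{name} must be a list of strings"]

    findings: List[str] = []
    block_text = " ".join(block).lower()
    for term in REQUIRED_BLOCK_TERMS:
        if term not in block_text:
            findings.append(f"block_instructions never mentions '{term}'")
    for instruction in allow:
        if any(term in instruction.lower() for term in OVERBROAD_ALLOW_TERMS):
            findings.append(f"over-broad allow instruction: {instruction!r}")
    return findings
```

A CI job would read the file, call audit_permissions, print any findings, and exit non-zero when the list is not empty. Phrase matching is a coarse check; it catches a deleted block rule, not a subtly weakened one, so it complements change control instead of replacing it.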

Vendor Resilience

Partial

Self-contained TypeScript types and Composer 2.5 routing reduce integration fragility, but cloud and local SDK runs remain dependent on Cursor platform availability and licensing.

Published .d.ts files no longer reference unpublished workspace packages, fixing TS2305 and TS2307 failures under skipLibCheck: false, which matters for enterprises that type-check SDK consumers strictly. Automatic Composer 2 to 2.5 routing keeps retired model slugs working but changes model behaviour without an explicit client upgrade, so golden sets should be re-run after platform-side migrations. LocalAgentStore abstraction improves portability of run metadata, yet the agent runtime itself is still Cursor-specific. Teams needing multi-vendor resilience should keep orchestration logic behind an abstraction tested against at least one non-Cursor harness quarterly.

Practitioner action: Pin and test model slugs explicitly instead of relying on silent routing. Document fallback when Cursor cloud agents are unavailable. Export JSONL run history before vendor migrations.
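Keeping orchestration behind an abstraction, and detecting silent model routing, can be sketched as follows. AgentRunner, RunnerUnavailable, ModelDriftError, and both functions are hypothetical names; a concrete runner for Cursor or for another harness would implement run() using that vendor's client, which is not shown here.

```python
from typing import List, Protocol, Sequence, Tuple


class RunnerUnavailable(RuntimeError):
    """Raised by a runner when its backing platform cannot serve the request."""


class ModelDriftError(RuntimeError):
    """Raised when the model that served a run is not the pinned one."""


class AgentRunner(Protocol):
    name: str

    def run(self, prompt: str) -> str: ...


def run_with_fallback(runners: Sequence[AgentRunner], prompt: str) -> Tuple[str, str]:
    """Try each runner in order; return (runner name, output) from the first success."""
    errors: List[str] = []
    for runner in runners:
        try:
            return runner.name, runner.run(prompt)
        except RunnerUnavailable as exc:
            errors.append(f"{runner.name}: {exc}")
    raise RunnerUnavailable("; ".join(errors) or "no runners configured")


def assert_model_pinned(expected_slug: str, observed_slug: str) -> None:
    """Fail loudly when platform-side routing changed the serving model."""
    if expected_slug != observed_slug:
        raise ModelDriftError(
            f"expected model {expected_slug!r} but run used {observed_slug!r}"
        )
```

Returning the runner name with the output lets the caller record which vendor served each run, which is the data a quarterly non-Cursor harness test needs in order to be meaningful.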

Certification and stack context

Teams promoting SDK 3.7 scripts to production cron should align requestId logging and auto-review policies with CLOE (Certified LLM Operations Engineer) expectations. Nested subagents and custom tools benefit from CAIS (Certified AI Safety Specialist) training on tool blast radius. For first programmatic agent deployments, AIDA (AI Deployment Associate) covers checklists the SDK does not enforce automatically. Review ambient agent production safety when wiring SDK agents to Gmail, Slack, or GitHub MCP servers.

Sources

Scores are structured assessments against PSF v1.1, not empirical PAI Lab multi-run results. Revisit when Cursor publishes a formal SDK semver deprecation policy or exports OpenTelemetry spans from local agents.


Public record

This record is maintained by PAI and free to cite. If something is wrong or missing, tell us. Corrections and source suggestions keep the record honest.

