Agents that propose improvements to their own configuration — with mandatory human approval.
Self-improving agents analyse their own performance and propose changes to their prompts, tool configurations, routing logic, or evaluation criteria. The critical qualifier is 'propose': in any production system, proposed improvements must pass through human review before being deployed.
A self-improvement cycle runs at a defined cadence — weekly or monthly, not continuously. The agent reviews its own performance logs, identifies patterns of failure or suboptimal output, and generates a specific proposed change: a revised instruction, an updated guardrail, a modified routing rule. The proposal is formatted as a change request with: the specific change proposed, the evidence that motivated it, the expected improvement, and the potential risks. This proposal is reviewed by a human (typically the agent operator or a domain expert) who approves, rejects, or modifies it. Only approved changes are deployed, and each is tested on a held-out evaluation set before going live. No agent in production should be able to modify its own configuration without a human approval gate.
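The workflow above can be sketched as a minimal approval gate. All names here (ChangeRequest, review, deploy) are hypothetical illustrations, not a specific framework's API; the key property is that deployment is impossible unless a human has recorded an approval.

```python
# Minimal sketch of a change request and human approval gate.
# All names are illustrative assumptions, not a real framework's API.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ChangeRequest:
    change: str                 # the specific configuration change proposed
    evidence: str               # performance-log findings that motivated it
    expected_improvement: str   # what the agent predicts will improve
    risks: str                  # potential downsides of the change
    status: Status = Status.PENDING
    reviewer: Optional[str] = None


def review(request: ChangeRequest, reviewer: str, approve: bool,
           modification: Optional[str] = None) -> ChangeRequest:
    """Record a human decision; the reviewer may modify the change text."""
    request.reviewer = reviewer
    if approve:
        if modification:
            request.change = modification
        request.status = Status.APPROVED
    else:
        request.status = Status.REJECTED
    return request


def deploy(request: ChangeRequest) -> None:
    """Refuse to apply any change that has not passed human review."""
    if request.status is not Status.APPROVED:
        raise PermissionError("no deployment without an approved review")
    # ...apply the configuration change, then test on a held-out
    # evaluation set before routing live traffic to it...
```

The design choice worth noting: `deploy` checks the approval state itself rather than trusting its caller, so the gate cannot be bypassed by an agent that simply skips the review step.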
A compliance screening agent proposes monthly improvements to its own configuration. After reviewing 1,200 interactions, it identifies that it is consistently over-flagging a transaction type that human reviewers routinely approve. It drafts a proposed refinement to its screening criteria with supporting evidence: 47 flagged transactions of this type, 44 approved by human review, 3 where the flag was justified. The proposed change, estimated impact, and risk assessment are sent to the compliance officer for review. The compliance officer approves the change with a minor modification. The change is tested on the last 90 days of production data before going live.
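The final step in that example, testing a proposed rule change against historical production data, can be sketched as a simple backtest. The data fields and rules below are illustrative assumptions; the point is that both the current and the proposed criteria are scored against the human reviewers' decisions before anything goes live.

```python
# Hedged sketch: backtest a proposed screening rule against logged
# human decisions. Field names and rules are illustrative assumptions.
from typing import Callable, Dict, List


def backtest(transactions: List[dict],
             old_rule: Callable[[dict], bool],
             new_rule: Callable[[dict], bool]) -> Dict[str, int]:
    """Count how often each rule's flag decision matches the human outcome."""
    results = {"old_correct": 0, "new_correct": 0, "total": 0}
    for tx in transactions:
        human_upheld = tx["human_upheld_flag"]  # ground truth from review
        results["total"] += 1
        if old_rule(tx) == human_upheld:
            results["old_correct"] += 1
        if new_rule(tx) == human_upheld:
            results["new_correct"] += 1
    return results


# Toy usage: the old threshold over-flags; the proposed one agrees
# better with the human reviewers' decisions.
history = [
    {"amount": 100, "human_upheld_flag": False},
    {"amount": 9000, "human_upheld_flag": True},
]
old_rule = lambda tx: tx["amount"] > 50      # current, over-sensitive
new_rule = lambda tx: tx["amount"] > 5000    # proposed refinement
scores = backtest(history, old_rule, new_rule)
```

Only if the proposed rule scores at least as well as the current one on the held-out window would the approved change proceed to deployment.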
Agents that cannot improve are agents that repeat the same mistakes indefinitely. Self-improvement is the mechanism that keeps agents aligned with how your organisation actually works as policies, products, and processes evolve. With the human approval gate, self-improvement is a powerful and safe mechanism. Without it, it is one of the highest-risk patterns in agentic AI.
How this pattern fails in practice — and what to watch for.
The agent proposes improvements that appear beneficial on their face but actually expand its authority, reduce oversight, or introduce subtle misalignments with organisational values. Because each proposal looks reasonable individually, the pattern is not detected until cumulative effects become visible.
A series of approved improvements each optimise for a different dimension of the evaluation criteria. Each change is beneficial in isolation, but together they produce a system that performs well on all measured metrics while degrading on unmeasured dimensions. The evaluation framework cannot detect this because it was designed for the original system.
Proposals are submitted so frequently, and each is so small and individually low-risk, that the approval process becomes a formality. After six months, the approval gate exists but no reviewer is reading the proposals in detail. The cumulative change to the agent's behaviour is significant but happened without meaningful oversight.
Seven things to verify before deploying this pattern in production.
Self-improving agents are the highest-risk pattern in the PAI curriculum and are specifically tested in CAIG and CAIAUD at an advanced level. The approval gate requirement is directly tested. CAIAUD auditors are expected to identify self-improvement architectures that lack adequate human review and to assess whether approval processes are genuinely effective or merely formal. AIDA tests self-improvement under D7 Security.
The AIDA certification covers all 21 agentic design patterns with a focus on deployment safety, governance, and the PSF. Free to attempt.