

Production AI Institute — PSF Domain Guide v1.0
Published: 2026-04-29 · License: CC BY 4.0
Domain: PSF-6 — Human Oversight

Human Oversight

The question is not whether your AI system has human oversight. It is whether that oversight is real. Human oversight that is nominal — a review step that nobody genuinely engages with, an escalation path that has never been tested, a veto mechanism that nobody knows how to invoke — provides false assurance while adding cost. PSF-6 is about designing oversight that actually works.

Autonomy Level Assignment

Every AI-assisted decision should be assigned an autonomy level that determines the degree of human involvement. The PSF autonomy scale runs from L0 (human decides, AI provides information only) through L4 (fully autonomous, no human in the loop). The assignment is not a technical decision alone — it requires input from legal, compliance, and business stakeholders. The autonomy level must be documented in the system's behaviour contract and reviewed when the system's scope or risk profile changes.
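To make the assignment auditable, the level, rationale, and sign-offs can live in a machine-readable record alongside the behaviour contract. A minimal sketch in Python: the L0 and L4 endpoints follow the description above, but the intermediate level names and the `AutonomyAssignment` fields are one plausible interpretation, not a published PSF schema.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """PSF autonomy scale: L0 (human decides) through L4 (fully autonomous).
    Intermediate level names are illustrative assumptions."""
    L0_HUMAN_DECIDES = 0     # AI provides information only
    L1_HUMAN_APPROVES = 1    # AI recommends; a human approves each decision
    L2_HUMAN_MONITORS = 2    # AI decides; humans review a sample
    L3_HUMAN_ON_CALL = 3     # AI decides; humans handle escalations only
    L4_AUTONOMOUS = 4        # fully autonomous, no human in the loop


@dataclass
class AutonomyAssignment:
    """One behaviour-contract entry for a single AI decision type."""
    decision_type: str
    level: AutonomyLevel
    rationale: str                    # why this level is appropriate
    escalation_conditions: list[str]  # when a decision drops to human review
    approved_by: list[str]            # legal, compliance, business sign-offs
    next_review: str                  # re-assess when scope or risk changes


loan_pricing = AutonomyAssignment(
    decision_type="loan_pricing_recommendation",
    level=AutonomyLevel.L1_HUMAN_APPROVES,
    rationale="Materially affects applicants; sector regulation applies.",
    escalation_conditions=["confidence < 0.85", "out_of_distribution_flag"],
    approved_by=["legal", "compliance", "credit_risk"],
    next_review="2026-10-01",
)
```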

When Human Oversight Is Required

  • Decision is irreversible or materially difficult to reverse (loan denial, account termination, medical recommendation, contract execution)
  • Decision significantly affects an individual's rights, opportunities, or wellbeing
  • Applicable law requires a human decision-maker (GDPR Article 22, EU AI Act Article 14, sector-specific regulations)
  • Model confidence score falls below a calibrated threshold
  • Input is flagged as novel or out-of-distribution by monitoring systems
  • Decision value or consequence magnitude exceeds a defined threshold
  • System is operating in a domain where model accuracy is known to be lower than average
  • Decision is the first occurrence of a new pattern not previously seen in production (these triggers combine into a single routing gate, sketched below)
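These triggers are disjunctive: any single one should force the decision to human review. A minimal routing gate, sketched in Python under assumed names and thresholds (the `Decision` fields, `CONFIDENCE_FLOOR`, and `VALUE_CEILING` are illustrative, not PSF-mandated values):

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical decision context; field names are illustrative."""
    reversible: bool
    affects_individual_rights: bool
    legally_regulated: bool
    confidence: float
    out_of_distribution: bool
    value: float
    novel_pattern: bool


CONFIDENCE_FLOOR = 0.85   # calibrated per system, not a fixed constant
VALUE_CEILING = 50_000.0  # consequence-magnitude threshold, domain-specific


def requires_human_review(d: Decision) -> bool:
    """True if any PSF-6 trigger applies; one match is enough."""
    return any([
        not d.reversible,
        d.affects_individual_rights,
        d.legally_regulated,
        d.confidence < CONFIDENCE_FLOOR,
        d.out_of_distribution,
        d.value > VALUE_CEILING,
        d.novel_pattern,
    ])
```

Note the shape: a flat list of independent predicates combined with `any`, so adding a new trigger never weakens an existing one.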

Designing Review Queues That Work

A review queue is a process, not a technology. The most common failure in human oversight design is a review queue that is technically present but operationally broken: it accumulates faster than it is processed, reviewers approve items without genuine evaluation, and the queue exists to satisfy a governance requirement rather than to provide actual oversight. Design review queues with explicit SLAs (items must be reviewed within N hours), named ownership (a specific person is responsible for the queue, not 'the team'), and quality measurement (a sample of reviews is itself reviewed for quality).
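The three design properties (explicit SLA, named owner, quality sampling) can be made concrete in code. A sketch, assuming an in-memory queue for illustration; a production queue would sit on a durable store with a paging integration:

```python
import random
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ReviewItem:
    item_id: str
    enqueued_at: datetime
    reviewed_at: datetime | None = None


@dataclass
class ReviewQueue:
    """A review queue with PSF-6's three design properties built in."""
    owner: str                    # a specific person, not "the team"
    sla: timedelta                # every item reviewed within this window
    qa_sample_rate: float = 0.05  # fraction of reviews re-reviewed for quality
    items: list[ReviewItem] = field(default_factory=list)

    def breached(self, now: datetime) -> list[ReviewItem]:
        """Unreviewed items past SLA -- surface these to the owner, loudly."""
        return [i for i in self.items
                if i.reviewed_at is None and now - i.enqueued_at > self.sla]

    def select_for_qa(self) -> list[ReviewItem]:
        """Randomly sample completed reviews for second-level quality review."""
        done = [i for i in self.items if i.reviewed_at is not None]
        return [i for i in done if random.random() < self.qa_sample_rate]
```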

PSF-6 Oversight Design Controls

Autonomy level documentation

Every AI component has a documented autonomy level. The documentation includes the rationale, the conditions for escalation to L1, and the process for requesting a level change.

Override mechanism

At every autonomy level from L1 upward, there is a defined override mechanism that a human can invoke to suspend or reverse an AI decision. The mechanism must be discoverable and usable under pressure.
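A sketch of what "discoverable and usable under pressure" can look like in practice: one function, no optional ceremony, audited before it acts. All names here (`override_decision`, `AUDIT_LOG`) are hypothetical:

```python
from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []  # stand-in for an append-only audit store


def override_decision(decision_id: str, operator: str, reason: str,
                      action: str = "suspend") -> dict:
    """Suspend or reverse an AI decision. Kept deliberately simple so it
    can be invoked correctly by a stressed operator at 3 a.m."""
    if action not in ("suspend", "reverse"):
        raise ValueError("action must be 'suspend' or 'reverse'")
    record = {
        "decision_id": decision_id,
        "operator": operator,
        "reason": reason,
        "action": action,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    AUDIT_LOG.append(record)  # record the override before acting on it
    # ...call into the decision system here to halt or roll back...
    return record
```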

Blind review sampling

A percentage of AI decisions are routed to human review with the AI recommendation hidden. This measures genuine human accuracy independent of AI influence and maintains reviewer skill.
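A minimal sketch of the routing and measurement sides of blind sampling; `BLIND_SAMPLE_RATE` and the payload shape are illustrative assumptions:

```python
import random

BLIND_SAMPLE_RATE = 0.10  # illustrative; set per system risk profile


def prepare_review_payload(case: dict, ai_recommendation: str) -> dict:
    """Route a case to human review; for a blind sample, the AI
    recommendation is withheld so the reviewer decides independently."""
    blind = random.random() < BLIND_SAMPLE_RATE
    payload = {"case": case, "blind": blind}
    if not blind:
        payload["ai_recommendation"] = ai_recommendation
    return payload


def agreement_rate(pairs: list[tuple[str, str]]) -> float:
    """Human-vs-AI agreement on blind cases. Compared against agreement
    on non-blind cases, a large gap suggests reviewers are anchoring on
    the AI rather than evaluating genuinely."""
    if not pairs:
        return 0.0
    return sum(1 for human, ai in pairs if human == ai) / len(pairs)
```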

Disagreement tracking

When a human reviewer disagrees with an AI recommendation, that disagreement is logged with a reason code. Disagreement patterns are the primary feedback signal for model improvement.
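Reason codes only become a feedback signal if they are structured and aggregable. A sketch, with an illustrative (not PSF-prescribed) code set:

```python
from collections import Counter
from enum import Enum


class ReasonCode(str, Enum):
    """Illustrative reason codes; tailor the set to your domain."""
    MISSING_CONTEXT = "ai_lacked_context"
    FACTUAL_ERROR = "ai_factual_error"
    POLICY_VIOLATION = "violates_policy"
    STALE_INPUT = "input_data_stale"
    OTHER = "other"  # should stay rare; a fat OTHER bucket means the codes need work


DISAGREEMENTS: list[dict] = []  # stand-in for a durable log


def log_disagreement(decision_id: str, reviewer: str,
                     code: ReasonCode, note: str = "") -> None:
    DISAGREEMENTS.append({"decision_id": decision_id, "reviewer": reviewer,
                          "code": code.value, "note": note})


def disagreement_pattern() -> Counter:
    """Aggregate reason codes: the feedback signal that drives retraining
    priorities and policy fixes."""
    return Counter(d["code"] for d in DISAGREEMENTS)
```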

Escalation path testing

Escalation paths are tested at least quarterly: a test escalation is triggered, and response time and quality are measured. An escalation path that has never been exercised should be treated as non-functional.
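One way to run such a drill, sketched in Python; `trigger_escalation` stands in for whatever hook your paging or escalation system actually exposes:

```python
import time
import uuid
from typing import Callable


def run_escalation_drill(trigger_escalation: Callable[[str], bool],
                         sla_seconds: float) -> dict:
    """Fire a synthetic escalation and measure time-to-acknowledgement.
    The callable should block until a human acknowledges, or time out
    and return False."""
    drill_id = f"drill-{uuid.uuid4()}"
    started = time.monotonic()
    acknowledged = trigger_escalation(drill_id)
    elapsed = time.monotonic() - started
    return {
        "drill_id": drill_id,
        "acknowledged": acknowledged,
        "response_seconds": round(elapsed, 1),
        "within_sla": acknowledged and elapsed <= sla_seconds,
    }
```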

Skill maintenance

Human reviewers must maintain the domain expertise to actually evaluate AI outputs. Regular AI-free case handling, training updates, and competency assessment prevent the expertise atrophy that makes oversight nominal.

PSF-6 Compliance Checklist

Autonomy level assigned and documented for every AI decision type
Legal review completed for all L2–L4 decisions: no undisclosed automated decision-making under applicable law
Escalation paths documented with named owners and tested response times
Override mechanism exists and is documented in the operational runbook
Review queue SLA defined and monitored
Blind review sampling implemented to measure genuine review quality
Disagreement logging in place with structured reason codes
Reviewer skill maintenance programme in place (AI-free case handling, regular training)
High-consequence decisions (irreversible, high-value, legally regulated) at L0 or L1 only
Annual autonomy level review: every component re-assessed against current risk profile

AIDA Exam Tips for PSF-6

  • PSF-6 is the human oversight domain. If a scenario describes a situation where a human should have been involved but wasn't, it is almost certainly a PSF-6 failure.
  • The autonomy level framework is heavily tested. Know L0–L4 and the conditions that require L0 or L1 (irreversibility, legal obligation, high consequence).
  • Blind review sampling is the exam answer whenever a question describes reviewers rubber-stamping AI decisions without genuine evaluation.
  • Disagreement tracking is a PSF-6 control, not a PSF-4 (Observability) control. The distinction: PSF-4 monitors system performance, PSF-6 monitors human-AI interaction quality.
  • Override mechanism questions: the correct answer describes a mechanism that is available, documented, and tested — not just technically possible.

Certifications that assess PSF-6

AIDA Examination · CPAP Portfolio · CPAA Architecture