Healthcare AI Deployment Playbook
A practitioner's guide to deploying AI in clinical and healthcare settings — covering HIPAA, FDA AI/ML guidance, NHS clinical safety requirements, and a full PSF domain mapping. Healthcare has the highest regulatory burden and the highest patient-harm potential of any AI deployment context.
Regulatory Landscape
Healthcare AI sits at the intersection of data privacy law, medical device regulation, and general AI governance frameworks. Unlike financial services — where the regulatory question is primarily about fairness and systemic risk — healthcare AI regulation is fundamentally about patient safety and direct harm prevention.
| Framework | Jurisdiction | Primary focus | PSF domains |
|---|---|---|---|
| HIPAA | US | PHI privacy, security, breach notification for AI systems handling patient data | D3, D7 |
| FDA AI/ML SaMD | US | Pre-market authorisation, performance monitoring, predetermined change control plans | D2, D5, D6 |
| EU AI Act — High Risk | EU | Medical devices and clinical management systems are listed as high-risk AI systems | D1–D8 (all) |
| CMS Interoperability | US | API access to health data — AI systems consuming this data inherit the obligations | D3, D8 |
| NHS AI Framework | UK | Clinical safety assessment (DCB 0129), algorithmic transparency, fairness | D2, D6 |
| ISO 14971 | International | Risk analysis and control for medical AI — probability and severity of harm | D5, D6 |
The PHI Problem: Most Common AI Compliance Failure
The single most common AI compliance failure in healthcare is sending Protected Health Information (PHI) to a third-party LLM API without a signed Business Associate Agreement (BAA). This is a HIPAA violation with penalties up to $1.9M per violation category per year.
Mitigation: all major AI vendors (OpenAI, Anthropic, Google, Microsoft/Azure) offer BAAs on enterprise tiers. Sign one. Separately, implement PHI detection and redaction before sending anything to an external API — even under a BAA, sending only what is necessary is best practice (data minimisation; see the redaction sketch under PSF-3 below).
PSF Domain Mapping for Healthcare
Every PSF domain applies in healthcare, but four (D1, D2, D3, D6) are elevated to Critical status due to direct patient-harm potential. The following analysis maps each PSF domain to the healthcare context with specific regulatory touchpoints.
PSF-1 Input Governance
Critical. Clinical AI systems receive inputs from EHR systems, clinician free text, and imaging pipelines. Prompt injection via malformed clinical notes is a documented attack vector. Schema validation on all HL7 FHIR inputs is non-negotiable; a validation sketch follows the checklist below.
- Validate all EHR data against FHIR R4/R5 schema before passing to the model
- Treat free-text clinical notes as untrusted input — sanitise before inclusion in prompts
- Implement strict system prompt isolation; clinical context must not be manipulated by patient-submitted text
- Rate-limit API access; abuse detection tuned for clinical access patterns
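As a minimal sketch of the first two checklist items, the Python below validates a flattened FHIR Observation subset with Pydantic (v2 assumed) before prompt assembly. The field set, the `ObservationInput` model, and the `sanitise_free_text` helper are illustrative, not the full FHIR R4 schema; a production pipeline would validate complete resources with a dedicated FHIR library.

```python
import re
from pydantic import BaseModel, ValidationError, field_validator

class ObservationInput(BaseModel):
    """Illustrative, flattened subset of a FHIR R4 Observation (not the full schema)."""
    resourceType: str
    status: str
    code_text: str  # flattened from Observation.code.text for this sketch

    @field_validator("resourceType")
    @classmethod
    def must_be_observation(cls, v: str) -> str:
        if v != "Observation":
            raise ValueError("expected a FHIR Observation resource")
        return v

    @field_validator("status")
    @classmethod
    def status_must_be_known(cls, v: str) -> str:
        # FHIR R4 Observation.status value set
        allowed = {"registered", "preliminary", "final", "amended",
                   "corrected", "cancelled", "entered-in-error", "unknown"}
        if v not in allowed:
            raise ValueError(f"invalid Observation.status: {v}")
        return v

def sanitise_free_text(note: str) -> str:
    """Treat clinician free text as untrusted: strip characters commonly used
    to break out of prompt delimiters before inclusion in the prompt."""
    return re.sub(r"[<>{}`]", "", note)

def build_prompt_context(raw: dict, note: str) -> str:
    """Validate EHR data against the schema, then combine with sanitised text."""
    try:
        obs = ObservationInput(**raw)
    except ValidationError as exc:
        # Reject rather than coerce: malformed EHR data never reaches the model.
        raise ValueError(f"input failed schema validation: {exc}") from exc
    return (f"Observation ({obs.status}): {obs.code_text}\n"
            f"Clinical note: {sanitise_free_text(note)}")
```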
PSF-2 Output Validation
Critical. Clinical decision support outputs must be validated before they are displayed. A hallucinated drug dosage or contraindication assessment can cause direct patient harm. Output contracts with medical-coding validation (ICD-10, SNOMED CT) are required; a contract sketch follows the checklist below.
- Implement output contracts for all clinical recommendations — structured JSON with validation schema
- Validate clinical codes (ICD-10, SNOMED CT, LOINC) against authoritative terminologies
- Set confidence thresholds — below threshold: require human review, do not surface recommendation
- Never allow the model to generate dosing instructions in free-text without validation
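A sketch of such an output contract, under the same Pydantic v2 assumption: the model must emit structured JSON matching `ClinicalRecommendation`. The ICD-10 regex is a rough shape check only, and `CONFIDENCE_FLOOR` is an illustrative value to be tuned against override rates; a real deployment resolves codes against an authoritative terminology service.

```python
import re
from pydantic import BaseModel, field_validator

# Shape check only: a real system resolves codes against an authoritative
# ICD-10 terminology service, not a regex.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")

class ClinicalRecommendation(BaseModel):
    """Output contract: the model must return this structure, never free text."""
    icd10_code: str
    recommendation: str
    confidence: float

    @field_validator("icd10_code")
    @classmethod
    def code_shape(cls, v: str) -> str:
        if not ICD10_SHAPE.match(v):
            raise ValueError(f"not a plausible ICD-10 code: {v}")
        return v

CONFIDENCE_FLOOR = 0.85  # illustrative; tune against clinician override rates

def gate_output(raw_json: str) -> ClinicalRecommendation | None:
    """Parse and validate model output; below-threshold results are not surfaced."""
    rec = ClinicalRecommendation.model_validate_json(raw_json)
    if rec.confidence < CONFIDENCE_FLOOR:
        return None  # route to human review instead of displaying
    return rec
```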
PSF-3 Data Protection
Critical — HIPAA. PHI (Protected Health Information) is the most heavily regulated category of personal data. Sending PHI to a third-party LLM API without a BAA (Business Associate Agreement) is a HIPAA violation, and it is the most common AI deployment compliance failure in healthcare (see above). A redaction sketch follows the checklist below.
- Sign BAAs with every AI vendor receiving PHI — OpenAI, Anthropic, Google, Microsoft all offer these
- Implement PHI detection and redaction before sending to any external API (use Microsoft Presidio or AWS Comprehend Medical)
- Never log raw clinical inputs — PHI in logs is a breach
- Audit trace retention policies — LangSmith, Langfuse retain prompts by default; configure data deletion
- For imaging AI: DICOM metadata stripping required before external processing
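A minimal redaction sketch using Microsoft Presidio's analyzer and anonymizer engines, as the checklist suggests. The `redact_phi` and `call_external_llm` names are illustrative; Presidio's default recognizers also require a spaCy model to be installed, and clinical deployments typically add custom recognizers (MRNs, NHS numbers, local identifiers) on top of the defaults.

```python
# pip install presidio-analyzer presidio-anonymizer (plus a spaCy model)
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()      # default recognizers: names, phones, SSNs, ...
anonymizer = AnonymizerEngine()

def redact_phi(text: str) -> str:
    """Detect and mask PHI before text crosses the trust boundary.
    Defaults are a floor, not a ceiling: clinical deployments typically add
    custom recognizers for MRNs, NHS numbers, and local identifiers."""
    findings = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(text=text, analyzer_results=findings).text

def call_external_llm(prompt: str) -> str:
    """Illustrative wrapper: even under a BAA, send the minimum necessary."""
    safe_prompt = redact_phi(prompt)
    # ... send safe_prompt (never the raw prompt) to the vendor API ...
    return safe_prompt
```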
PSF-4 Observability
Required. FDA AI/ML guidance requires performance monitoring throughout the product lifecycle, and audit logs for all AI-assisted clinical decisions are required for post-market surveillance and incident investigation. The logs must themselves be HIPAA-compliant: no PHI in observability data. A PHI-safe audit-record sketch follows the checklist below.
- Log all AI recommendations with timestamps, confidence scores, and clinician actions
- Implement drift detection — clinical AI degrades as patient population shifts
- Configure HIPAA-safe logging: strip PHI from all trace data before storage
- Retain audit logs for a minimum of 6 years (HIPAA), or 10 years for medical devices
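A sketch of a PHI-safe audit record following the policy above: metadata, confidence, and clinician action are logged, while free text is reduced to a length and a truncated digest so incidents can be correlated without clinical content entering observability storage. Function and field names are illustrative.

```python
import hashlib
import json
import logging
import time
import uuid

audit_log = logging.getLogger("clinical_ai_audit")

def audit_recommendation(model_version: str, confidence: float,
                         clinician_action: str, input_text: str) -> None:
    """Write a post-market surveillance record: metadata only, never raw PHI."""
    record = {
        "event_id": str(uuid.uuid4()),
        "ts": time.time(),
        "model_version": model_version,
        "confidence": confidence,
        "clinician_action": clinician_action,  # e.g. accepted / overridden / escalated
        "input_chars": len(input_text),
        # one-way digest for correlating repeat inputs across incidents
        "input_digest": hashlib.sha256(input_text.encode()).hexdigest()[:16],
    }
    audit_log.info(json.dumps(record))
```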
PSF-5 Deployment Safety
Required. Healthcare AI must have well-defined blast-radius controls. Clinical decision support should operate at L2–L3 autonomy (recommendation with human approval) for high-stakes decisions; L4 autonomous action is appropriate only for clearly bounded, low-risk clinical tasks. A per-task policy sketch follows the checklist below.
- Define autonomy levels per clinical task: triage classification (L3) ≠ treatment recommendation (L2) ≠ order entry (L2 minimum)
- Implement rollback procedures — ability to revert to previous model version within 4 hours
- Staged deployment: shadow mode first, then limited cohort, then full deployment with monitoring
- Document predetermined change control plan (PCCP) for FDA SaMD compliance
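A sketch of how the autonomy-per-task and staged-deployment rules might be encoded as configuration. The task names mirror the bullet above; the stage names, version strings, and `TaskPolicy` structure are illustrative.

```python
from dataclasses import dataclass
from enum import Enum

class Stage(str, Enum):
    SHADOW = "shadow"    # model runs, output logged, never shown to clinicians
    LIMITED = "limited"  # small clinician cohort, full monitoring
    FULL = "full"        # general availability

@dataclass(frozen=True)
class TaskPolicy:
    autonomy: str           # per the guidance above: L2-L3 for high-stakes, L4 only for bounded low-risk
    stage: Stage
    rollback_version: str   # version to revert to within the 4-hour window

# Illustrative policy table mirroring the autonomy guidance above.
POLICIES = {
    "triage_classification":    TaskPolicy("L3", Stage.FULL,    "triage-v2.3"),
    "treatment_recommendation": TaskPolicy("L2", Stage.LIMITED, "rec-v1.1"),
    "order_entry":              TaskPolicy("L2", Stage.SHADOW,  "orders-v0.9"),
}
```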
PSF-6 Human Oversight
Critical — Patient Safety. Clinical decision support requires meaningful human oversight. Alert fatigue is the primary failure mode: when too many AI recommendations appear, clinicians override them without reviewing. Oversight design must be calibrated to the clinical workflow, not just to regulatory compliance; an alert-routing and override-tracking sketch follows the checklist below.
- Design oversight for clinical context: busy clinician workflow ≠ IT operator dashboard
- Implement tiered alerting: critical (immediate interrupt) vs. advisory (end of note review)
- Blind sampling: regularly send AI recommendations for human review without the AI label
- Track override rates by recommendation type — high override = low clinical trust = model issue
- Escalation paths must go to supervising clinician, not just IT
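A sketch of tiered alert routing and per-type override tracking, assuming the tiers above. `OverrideTracker`, the severity labels, and the 0.9 cut-off are illustrative.

```python
from collections import defaultdict

class OverrideTracker:
    """Track override rates per recommendation type. A persistently high rate
    signals a model or threshold problem, not clinician non-compliance."""

    def __init__(self) -> None:
        self.shown: dict[str, int] = defaultdict(int)
        self.overridden: dict[str, int] = defaultdict(int)

    def record(self, rec_type: str, was_overridden: bool) -> None:
        self.shown[rec_type] += 1
        if was_overridden:
            self.overridden[rec_type] += 1

    def override_rate(self, rec_type: str) -> float:
        shown = self.shown[rec_type]
        return self.overridden[rec_type] / shown if shown else 0.0

def route_alert(severity: str, confidence: float) -> str:
    """Tiered alerting: only critical, high-confidence findings interrupt;
    everything else waits for end-of-note review. Cut-offs are illustrative."""
    if severity == "critical" and confidence >= 0.9:
        return "interrupt"
    return "advisory"
```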
PSF-7 Security
Required. Healthcare is the most targeted sector for ransomware and data theft, and AI systems add a new attack surface: model poisoning, adversarial clinical-note injection, and API credential theft all have direct patient-safety implications. A query-volume monitoring sketch follows the checklist below.
- Treat AI API keys as PHI-equivalent credentials — store in secrets manager, rotate quarterly
- Adversarial testing: red-team the clinical AI with adversarially crafted clinical notes
- Monitor for model extraction attacks — unusual query patterns on clinical AI APIs
- Zero-trust network architecture for AI API traffic
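A crude sketch of the query-pattern monitoring bullet: flag API keys whose query volume in a sliding window far exceeds normal clinical usage. The window and ceiling values are illustrative, and a production system would also examine query diversity and off-hours access.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300   # illustrative sliding window
QUERY_CEILING = 200    # illustrative; tune to real clinical access patterns

_history: dict[str, deque] = defaultdict(deque)

def flag_unusual_volume(api_key_id: str) -> bool:
    """Crude extraction heuristic: flag keys whose query volume in the window
    far exceeds normal clinical usage."""
    now = time.time()
    window = _history[api_key_id]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > QUERY_CEILING
```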
PSF-8 Vendor Resilience
Required. Clinical workflows cannot tolerate AI vendor outages; NHS and hospital IT teams have already experienced AI vendor failures that caused clinical disruption. Fallback procedures must be clinically tested, not just technically documented. A failover sketch follows the checklist below.
- Dual-vendor strategy for critical clinical AI paths
- Graceful degradation: define which clinical decisions revert to manual process on AI failure
- SLA requirements: 99.9% uptime minimum for clinical decision support; 99.99% for anything in the care pathway
- Test the fallback: quarterly drill of 'AI is unavailable' clinical workflow
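A sketch of graceful degradation combining the dual-vendor and manual-fallback bullets: a small circuit breaker that tries the primary vendor, then the secondary, then reverts to the documented manual process. Names and thresholds are illustrative.

```python
from typing import Callable

class ClinicalAICircuitBreaker:
    """After repeated vendor failures, stop calling out and revert to the
    documented manual clinical process. Thresholds are illustrative."""

    def __init__(self, failure_ceiling: int = 3) -> None:
        self.failures = 0
        self.failure_ceiling = failure_ceiling

    def call(self, primary: Callable, secondary: Callable,
             manual_fallback: Callable, *args):
        if self.failures >= self.failure_ceiling:
            return manual_fallback(*args)       # breaker open: manual process
        for vendor in (primary, secondary):     # dual-vendor strategy
            try:
                result = vendor(*args)
                self.failures = 0               # success resets the breaker
                return result
            except Exception:
                self.failures += 1
        return manual_fallback(*args)           # both vendors down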
Clinical vs Administrative AI: Different Risk Profiles
Not all healthcare AI is equal. A chatbot answering FAQs about appointment booking has a fundamentally different risk profile from a clinical decision support tool suggesting diagnoses. Practitioners must be explicit about which category they are deploying.
Recommended Autonomy Levels by Clinical Task

| Clinical task | Recommended autonomy | Basis |
|---|---|---|
| Appointment/FAQ chatbot (administrative) | L4 | Clearly bounded, low-risk, non-clinical task |
| Triage classification | L3 | Recommendation with human approval (L2–L3 band) |
| Treatment recommendation | L2 | High-stakes: recommendation with human approval |
| Order entry | L2 minimum | High-stakes: recommendation with human approval |
Alert Fatigue: The Oversight Failure Mode Unique to Healthcare
Healthcare has a well-documented problem that is now directly relevant to AI deployment: alert fatigue. Studies show clinicians override up to 95% of drug interaction alerts — not because the alerts are wrong, but because there are too many of them. AI systems that generate too many recommendations, flags, or warnings will be systematically ignored.
This is a PSF-6 (Human Oversight) failure mode, but it manifests as a PSF-2 (Output Validation) design problem. The solution is calibrated confidence thresholds: surface only recommendations above a high confidence bar, and continuously tune that bar against clinician override rates (a tuning sketch follows below). A 90%+ override rate is evidence that your threshold is wrong, not evidence that clinicians are non-compliant.
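A sketch of that feedback loop, with an illustrative target override rate: raise the confidence floor while clinicians override more than the target, and lower it cautiously when overrides are rare. The constants and function name are assumptions, not a prescribed tuning rule.

```python
TARGET_OVERRIDE_RATE = 0.30  # illustrative target; set per recommendation type

def retune_threshold(current_floor: float, override_rate: float,
                     step: float = 0.02) -> float:
    """If clinicians override most of what is surfaced, the system is feeding
    alert fatigue: raise the confidence floor so fewer, better recommendations
    appear. If overrides are rare, the floor can come down cautiously."""
    if override_rate > TARGET_OVERRIDE_RATE:
        return min(current_floor + step, 0.99)
    if override_rate < TARGET_OVERRIDE_RATE / 2:
        return max(current_floor - step, 0.50)
    return current_floor
```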