Healthcare AI Deployment Playbook
A practitioner's guide to deploying AI in clinical and healthcare settings — covering HIPAA, FDA AI/ML guidance, NHS clinical safety requirements, and a full PSF domain mapping. Healthcare has the highest regulatory burden and the highest patient-harm potential of any AI deployment context.
Regulatory Landscape
Healthcare AI sits at the intersection of data privacy law, medical device regulation, and general AI governance frameworks. Unlike financial services — where the regulatory question is primarily about fairness and systemic risk — healthcare AI regulation is fundamentally about patient safety and direct harm prevention.
| Framework | Jurisdiction | Primary focus | PSF domains |
|---|---|---|---|
| HIPAA | US | PHI privacy, security, breach notification for AI systems handling patient data | D3, D7 |
| FDA AI/ML SaMD | US | Pre-market authorisation, performance monitoring, predetermined change control plans | D2, D5, D6 |
| EU AI Act — High Risk | EU | Medical devices and clinical management systems are listed as high-risk AI systems | D1–D8 (all) |
| CMS Interoperability | US | API access to health data — AI systems consuming this data inherit the obligations | D3, D8 |
| NHS AI Framework | UK | Clinical safety assessment (DCB 0129), algorithmic transparency, fairness | D2, D6 |
| ISO 14971 | International | Risk analysis and control for medical AI — probability and severity of harm | D5, D6 |
The PHI Problem: Most Common AI Compliance Failure
The single most common AI compliance failure in healthcare is sending Protected Health Information (PHI) to a third-party LLM API without a signed Business Associate Agreement (BAA). This is a HIPAA violation with penalties up to $1.9M per violation category per year.
Mitigation: all major AI vendors (OpenAI, Anthropic, Google, Microsoft/Azure) offer BAAs on enterprise tiers. Sign one. Separately, implement PHI detection and redaction before sending anything to an external API — even under a BAA, sending only what is necessary is best practice (data minimisation; see the redaction sketch under PSF-3 below).
PSF Domain Mapping for Healthcare
Every PSF domain applies in healthcare, but four (D1, D2, D3, D6) are elevated to Critical status due to direct patient-harm potential. The following analysis maps each PSF domain to the healthcare context with specific regulatory touchpoints.
PSF-1 Input Governance
Critical. Clinical AI systems receive inputs from EHR systems, clinician free text, and imaging pipelines. Prompt injection via malformed clinical notes is a documented attack vector. Schema validation on all HL7 FHIR inputs is non-negotiable; a validation sketch follows the checklist below.
- Validate all EHR data against FHIR R4/R5 schema before passing to the model
- Treat free-text clinical notes as untrusted input — sanitise before inclusion in prompts
- Implement strict system prompt isolation; clinical context must not be manipulated by patient-submitted text
- Rate-limit API access; abuse detection tuned for clinical access patterns
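As a minimal sketch of the first two checklist items, the Python below validates a flattened FHIR Observation subset with Pydantic (v2 assumed) before prompt assembly. The field set, the `ObservationInput` model, and the `sanitise_free_text` helper are illustrative, not the full FHIR R4 schema; a production pipeline would validate complete resources with a dedicated FHIR library.

```python
import re
from pydantic import BaseModel, ValidationError, field_validator

class ObservationInput(BaseModel):
    """Illustrative, flattened subset of a FHIR R4 Observation (not the full schema)."""
    resourceType: str
    status: str
    code_text: str  # flattened from Observation.code.text for this sketch

    @field_validator("resourceType")
    @classmethod
    def must_be_observation(cls, v: str) -> str:
        if v != "Observation":
            raise ValueError("expected a FHIR Observation resource")
        return v

    @field_validator("status")
    @classmethod
    def status_must_be_known(cls, v: str) -> str:
        # FHIR R4 Observation.status value set
        allowed = {"registered", "preliminary", "final", "amended",
                   "corrected", "cancelled", "entered-in-error", "unknown"}
        if v not in allowed:
            raise ValueError(f"invalid Observation.status: {v}")
        return v

def sanitise_free_text(note: str) -> str:
    """Treat clinician free text as untrusted: strip characters commonly used
    to break out of prompt delimiters before inclusion in the prompt."""
    return re.sub(r"[<>{}`]", "", note)

def build_prompt_context(raw: dict, note: str) -> str:
    """Validate EHR data against the schema, then combine with sanitised text."""
    try:
        obs = ObservationInput(**raw)
    except ValidationError as exc:
        # Reject rather than coerce: malformed EHR data never reaches the model.
        raise ValueError(f"input failed schema validation: {exc}") from exc
    return (f"Observation ({obs.status}): {obs.code_text}\n"
            f"Clinical note: {sanitise_free_text(note)}")
```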
PSF-2 Output Validation
Critical. Clinical decision support outputs must be validated before they are displayed. A hallucinated drug dosage or contraindication assessment can cause direct patient harm. Output contracts with medical-coding validation (ICD-10, SNOMED CT) are required; a contract sketch follows the checklist below.
- Implement output contracts for all clinical recommendations — structured JSON with validation schema
- Validate clinical codes (ICD-10, SNOMED CT, LOINC) against authoritative terminologies
- Set confidence thresholds — below threshold: require human review, do not surface recommendation
- Never allow the model to generate dosing instructions in free-text without validation
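A sketch of such an output contract, under the same Pydantic v2 assumption: the model must emit structured JSON matching `ClinicalRecommendation`. The ICD-10 regex is a rough shape check only, and `CONFIDENCE_FLOOR` is an illustrative value to be tuned against override rates; a real deployment resolves codes against an authoritative terminology service.

```python
import re
from pydantic import BaseModel, field_validator

# Shape check only: a real system resolves codes against an authoritative
# ICD-10 terminology service, not a regex.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")

class ClinicalRecommendation(BaseModel):
    """Output contract: the model must return this structure, never free text."""
    icd10_code: str
    recommendation: str
    confidence: float

    @field_validator("icd10_code")
    @classmethod
    def code_shape(cls, v: str) -> str:
        if not ICD10_SHAPE.match(v):
            raise ValueError(f"not a plausible ICD-10 code: {v}")
        return v

CONFIDENCE_FLOOR = 0.85  # illustrative; tune against clinician override rates

def gate_output(raw_json: str) -> ClinicalRecommendation | None:
    """Parse and validate model output; below-threshold results are not surfaced."""
    rec = ClinicalRecommendation.model_validate_json(raw_json)
    if rec.confidence < CONFIDENCE_FLOOR:
        return None  # route to human review instead of displaying
    return rec
```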
PSF-3 Data Protection
Critical — HIPAA. PHI (Protected Health Information) is the most heavily regulated category of personal data. Sending PHI to a third-party LLM API without a BAA (Business Associate Agreement) is a HIPAA violation, and it is the most common AI deployment compliance failure in healthcare (see above). A redaction sketch follows the checklist below.
- Sign BAAs with every AI vendor receiving PHI — OpenAI, Anthropic, Google, Microsoft all offer these
- Implement PHI detection and redaction before sending to any external API (use Microsoft Presidio or AWS Comprehend Medical)
- Never log raw clinical inputs — PHI in logs is a breach
- Audit trace retention policies — LangSmith, Langfuse retain prompts by default; configure data deletion
- For imaging AI: DICOM metadata stripping required before external processing
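A minimal redaction sketch using Microsoft Presidio's analyzer and anonymizer engines, as the checklist suggests. The `redact_phi` and `call_external_llm` names are illustrative; Presidio's default recognizers also require a spaCy model to be installed, and clinical deployments typically add custom recognizers (MRNs, NHS numbers, local identifiers) on top of the defaults.

```python
# pip install presidio-analyzer presidio-anonymizer (plus a spaCy model)
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()      # default recognizers: names, phones, SSNs, ...
anonymizer = AnonymizerEngine()

def redact_phi(text: str) -> str:
    """Detect and mask PHI before text crosses the trust boundary.
    Defaults are a floor, not a ceiling: clinical deployments typically add
    custom recognizers for MRNs, NHS numbers, and local identifiers."""
    findings = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(text=text, analyzer_results=findings).text

def call_external_llm(prompt: str) -> str:
    """Illustrative wrapper: even under a BAA, send the minimum necessary."""
    safe_prompt = redact_phi(prompt)
    # ... send safe_prompt (never the raw prompt) to the vendor API ...
    return safe_prompt
```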
PSF-4 Observability
Required. FDA AI/ML guidance requires performance monitoring throughout the product lifecycle, and audit logs for all AI-assisted clinical decisions are required for post-market surveillance and incident investigation. The logs must themselves be HIPAA-compliant: no PHI in observability data. A PHI-safe audit-record sketch follows the checklist below.
- Log all AI recommendations with timestamps, confidence scores, and clinician actions
- Implement drift detection — clinical AI degrades as patient population shifts
- Configure HIPAA-safe logging: strip PHI from all trace data before storage
- Retain audit logs for a minimum of 6 years (HIPAA), or 10 years for medical devices
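A sketch of a PHI-safe audit record following the policy above: metadata, confidence, and clinician action are logged, while free text is reduced to a length and a truncated digest so incidents can be correlated without clinical content entering observability storage. Function and field names are illustrative.

```python
import hashlib
import json
import logging
import time
import uuid

audit_log = logging.getLogger("clinical_ai_audit")

def audit_recommendation(model_version: str, confidence: float,
                         clinician_action: str, input_text: str) -> None:
    """Write a post-market surveillance record: metadata only, never raw PHI."""
    record = {
        "event_id": str(uuid.uuid4()),
        "ts": time.time(),
        "model_version": model_version,
        "confidence": confidence,
        "clinician_action": clinician_action,  # e.g. accepted / overridden / escalated
        "input_chars": len(input_text),
        # one-way digest for correlating repeat inputs across incidents
        "input_digest": hashlib.sha256(input_text.encode()).hexdigest()[:16],
    }
    audit_log.info(json.dumps(record))
```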
PSF-5 Deployment Safety
Required. Healthcare AI must have well-defined blast-radius controls. Clinical decision support should operate at L2–L3 autonomy (recommendation with human approval) for high-stakes decisions; L4 autonomous action is appropriate only for clearly bounded, low-risk clinical tasks. A per-task policy sketch follows the checklist below.
- Define autonomy levels per clinical task: triage classification (L3) ≠ treatment recommendation (L2) ≠ order entry (L2 minimum)
- Implement rollback procedures — ability to revert to previous model version within 4 hours
- Staged deployment: shadow mode first, then limited cohort, then full deployment with monitoring
- Document predetermined change control plan (PCCP) for FDA SaMD compliance
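A sketch of how the autonomy-per-task and staged-deployment rules might be encoded as configuration. The task names mirror the bullet above; the stage names, version strings, and `TaskPolicy` structure are illustrative.

```python
from dataclasses import dataclass
from enum import Enum

class Stage(str, Enum):
    SHADOW = "shadow"    # model runs, output logged, never shown to clinicians
    LIMITED = "limited"  # small clinician cohort, full monitoring
    FULL = "full"        # general availability

@dataclass(frozen=True)
class TaskPolicy:
    autonomy: str           # per the guidance above: L2-L3 for high-stakes, L4 only for bounded low-risk
    stage: Stage
    rollback_version: str   # version to revert to within the 4-hour window

# Illustrative policy table mirroring the autonomy guidance above.
POLICIES = {
    "triage_classification":    TaskPolicy("L3", Stage.FULL,    "triage-v2.3"),
    "treatment_recommendation": TaskPolicy("L2", Stage.LIMITED, "rec-v1.1"),
    "order_entry":              TaskPolicy("L2", Stage.SHADOW,  "orders-v0.9"),
}
```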
PSF-6 Human Oversight
Critical — Patient Safety. Clinical decision support requires meaningful human oversight. Alert fatigue is the primary failure mode: when too many AI recommendations appear, clinicians override them without reviewing. Oversight design must be calibrated to the clinical workflow, not just to regulatory compliance; an alert-routing and override-tracking sketch follows the checklist below.
- Design oversight for clinical context: busy clinician workflow ≠ IT operator dashboard
- Implement tiered alerting: critical (immediate interrupt) vs. advisory (end of note review)
- Blind sampling: regularly send AI recommendations for human review without the AI label
- Track override rates by recommendation type — high override = low clinical trust = model issue
- Escalation paths must go to supervising clinician, not just IT
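A sketch of tiered alert routing and per-type override tracking, assuming the tiers above. `OverrideTracker`, the severity labels, and the 0.9 cut-off are illustrative.

```python
from collections import defaultdict

class OverrideTracker:
    """Track override rates per recommendation type. A persistently high rate
    signals a model or threshold problem, not clinician non-compliance."""

    def __init__(self) -> None:
        self.shown: dict[str, int] = defaultdict(int)
        self.overridden: dict[str, int] = defaultdict(int)

    def record(self, rec_type: str, was_overridden: bool) -> None:
        self.shown[rec_type] += 1
        if was_overridden:
            self.overridden[rec_type] += 1

    def override_rate(self, rec_type: str) -> float:
        shown = self.shown[rec_type]
        return self.overridden[rec_type] / shown if shown else 0.0

def route_alert(severity: str, confidence: float) -> str:
    """Tiered alerting: only critical, high-confidence findings interrupt;
    everything else waits for end-of-note review. Cut-offs are illustrative."""
    if severity == "critical" and confidence >= 0.9:
        return "interrupt"
    return "advisory"
```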
PSF-7 Security
Required. Healthcare is the most targeted sector for ransomware and data theft, and AI systems add a new attack surface: model poisoning, adversarial clinical-note injection, and API credential theft all have direct patient-safety implications. A query-volume monitoring sketch follows the checklist below.
- Treat AI API keys as PHI-equivalent credentials — store in secrets manager, rotate quarterly
- Adversarial testing: red-team the clinical AI with adversarially crafted clinical notes
- Monitor for model extraction attacks — unusual query patterns on clinical AI APIs
- Zero-trust network architecture for AI API traffic
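A crude sketch of the query-pattern monitoring bullet: flag API keys whose query volume in a sliding window far exceeds normal clinical usage. The window and ceiling values are illustrative, and a production system would also examine query diversity and off-hours access.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300   # illustrative sliding window
QUERY_CEILING = 200    # illustrative; tune to real clinical access patterns

_history: dict[str, deque] = defaultdict(deque)

def flag_unusual_volume(api_key_id: str) -> bool:
    """Crude extraction heuristic: flag keys whose query volume in the window
    far exceeds normal clinical usage."""
    now = time.time()
    window = _history[api_key_id]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > QUERY_CEILING
```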
PSF-8 Vendor Resilience
Required. Clinical workflows cannot tolerate AI vendor outages; NHS and hospital IT teams have already experienced AI vendor failures that caused clinical disruption. Fallback procedures must be clinically tested, not just technically documented. A failover sketch follows the checklist below.
- Dual-vendor strategy for critical clinical AI paths
- Graceful degradation: define which clinical decisions revert to manual process on AI failure
- SLA requirements: 99.9% uptime minimum for clinical decision support; 99.99% for anything in the care pathway
- Test the fallback: quarterly drill of 'AI is unavailable' clinical workflow
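A sketch of graceful degradation combining the dual-vendor and manual-fallback bullets: a small circuit breaker that tries the primary vendor, then the secondary, then reverts to the documented manual process. Names and thresholds are illustrative.

```python
from typing import Callable

class ClinicalAICircuitBreaker:
    """After repeated vendor failures, stop calling out and revert to the
    documented manual clinical process. Thresholds are illustrative."""

    def __init__(self, failure_ceiling: int = 3) -> None:
        self.failures = 0
        self.failure_ceiling = failure_ceiling

    def call(self, primary: Callable, secondary: Callable,
             manual_fallback: Callable, *args):
        if self.failures >= self.failure_ceiling:
            return manual_fallback(*args)       # breaker open: manual process
        for vendor in (primary, secondary):     # dual-vendor strategy
            try:
                result = vendor(*args)
                self.failures = 0               # success resets the breaker
                return result
            except Exception:
                self.failures += 1
        return manual_fallback(*args)           # both vendors down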
Clinical vs Administrative AI: Different Risk Profiles
Not all healthcare AI is equal. A chatbot answering FAQs about appointment booking has a fundamentally different risk profile from a clinical decision support tool suggesting diagnoses. Practitioners must be explicit about which category they are deploying.
Recommended Autonomy Levels by Clinical Task

| Clinical task | Recommended autonomy | Basis |
|---|---|---|
| Appointment/FAQ chatbot (administrative) | L4 | Clearly bounded, low-risk, non-clinical task |
| Triage classification | L3 | Recommendation with human approval (L2–L3 band) |
| Treatment recommendation | L2 | High-stakes: recommendation with human approval |
| Order entry | L2 minimum | High-stakes: recommendation with human approval |
Alert Fatigue: The Oversight Failure Mode Unique to Healthcare
Healthcare has a well-documented problem that is now directly relevant to AI deployment: alert fatigue. Studies show clinicians override up to 95% of drug interaction alerts — not because the alerts are wrong, but because there are too many of them. AI systems that generate too many recommendations, flags, or warnings will be systematically ignored.
This is a PSF-6 (Human Oversight) failure mode, but it manifests as a PSF-2 (Output Validation) design problem. The solution is calibrated confidence thresholds: surface only recommendations above a high confidence bar, and continuously tune that bar against clinician override rates (a tuning sketch follows below). A 90%+ override rate is evidence that your threshold is wrong, not evidence that clinicians are non-compliant.
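A sketch of that feedback loop, with an illustrative target override rate: raise the confidence floor while clinicians override more than the target, and lower it cautiously when overrides are rare. The constants and function name are assumptions, not a prescribed tuning rule.

```python
TARGET_OVERRIDE_RATE = 0.30  # illustrative target; set per recommendation type

def retune_threshold(current_floor: float, override_rate: float,
                     step: float = 0.02) -> float:
    """If clinicians override most of what is surfaced, the system is feeding
    alert fatigue: raise the confidence floor so fewer, better recommendations
    appear. If overrides are rare, the floor can come down cautiously."""
    if override_rate > TARGET_OVERRIDE_RATE:
        return min(current_floor + step, 0.99)
    if override_rate < TARGET_OVERRIDE_RATE / 2:
        return max(current_floor - step, 0.50)
    return current_floor
```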