Published: 2026-04-29 · License: CC BY 4.0
Domain: PSF-7 — Security
Security
AI systems are software systems, and all conventional software security requirements apply. But AI systems also introduce attack surfaces that are unique to machine learning: model extraction, membership inference, adversarial examples, and the ability to manipulate model behaviour through carefully crafted inputs. PSF-7 addresses both layers.
The AI-Specific Threat Surface
An adversary queries your model repeatedly to reconstruct a functional equivalent. This is an intellectual property risk for proprietary models and a security risk if the reconstructed model is used to find adversarial inputs more efficiently.
An adversary determines whether a specific record was in the model's training data. This is a privacy risk when training data included personal data, and a confidentiality risk when it included proprietary business information.
Inputs specifically crafted to cause the model to produce a target incorrect output. In image classification, imperceptible pixel-level perturbations. In NLP, character-level substitutions that preserve human readability but change model predictions.
Adversarial inputs that cause the model to repeat or reveal its system prompt. System prompts often contain proprietary instructions, confidentiality requirements, or operational details that should not be disclosed to end users.
AI API keys (OpenAI, Anthropic, Google, etc.) are high-value credentials. Exposure allows an adversary to run queries at your cost, access your fine-tuned models, and potentially exfiltrate logged data. Rotation schedules and secret management practices matter.
AI systems depend on model providers, inference libraries, embedding models, and vector databases. A compromised dependency in any of these can affect model behaviour or expose data. Dependency provenance tracking is part of AI security.
AI Threat Modelling
Every production AI system should have a threat model that explicitly addresses the AI-specific attack surface. A conventional STRIDE threat model applied to an AI system will miss model extraction, membership inference, and adversarial input attacks because these do not map neatly to conventional threat categories. The threat model should be produced as part of the system design, updated when the system architecture changes, and reviewed when new attack techniques are published against similar systems.
API Key and Credential Management
AI API credentials are a concentrated risk. A single leaked key provides access to a model, potentially including fine-tuned versions, usage history, and (for some providers) data logged during inference. Best practices: use separate credentials per environment and per application, never embed credentials in source code or version-controlled configuration, store credentials in a secrets manager (not environment variables in process memory where avoidable), rotate credentials on a defined schedule, and monitor for unusual usage patterns that may indicate credential exposure.
PSF-7 Compliance Checklist
Red-Teaming Production AI Systems
Red-teaming — structured adversarial testing by a team attempting to find exploitable failures — is the AI security equivalent of penetration testing. For AI systems, red-teaming exercises should include: systematic prompt injection attempts (direct and indirect), system prompt extraction attempts, jailbreak attempts across known attack pattern categories, model extraction rate measurement, and boundary-testing for each defined out-of-scope use case. Red-teaming findings should be documented, triaged, and tracked to remediation. This is specifically tested by the CAIS certification.
AIDA Exam Tips for PSF-7
- PSF-7 covers AI-specific security, not general security. Questions that test whether you know prompt injection belongs to PSF-1 (input governance) vs. PSF-7 (security) are common — prompt injection defence at the input layer is PSF-1; the threat modelling and red-teaming that identifies it as a risk is PSF-7.
- API credential questions are pure PSF-7. Know: separate credentials per environment, secrets manager storage, rotation schedules, usage monitoring.
- System prompt exposure: the PSF-7 answer treats the system prompt as a secret and implements controls to prevent its disclosure in model outputs.
- Model extraction: the PSF-7 control is rate limiting on public endpoints (making extraction expensive) combined with usage monitoring (detecting extraction attempts).
- Supply chain: the PSF-7 angle on third-party AI dependencies is provenance and integrity verification — not just availability or performance.