Many organisations try to solve AI risk with a single layer: a system prompt, a moderation API, or a policy document. In practice, reliable control comes from layering—multiple mechanisms that reduce risk even when one layer fails.
Policy layering is defence-in-depth for AI: controls before the model, controls within the model interaction, and controls after the model—especially around tool use.
Preprocessing controls: reduce what enters the model
- Data minimisation. Redact and tokenise sensitive fields (see data minimisation).
- Prompt injection detection. Detect obvious injection attempts and route to safer flows (see prompt injection defence).
- Intent classification. Identify high-risk intents early and apply step-up controls.
In-interaction controls: constrain behaviour
- System policies. Clear boundaries and refusal rules (versioned with change control).
- Structured outputs. Force machine-parseable responses for operational flows (see structured outputs).
- Context discipline. Include only relevant evidence and keep token budgets bounded (see context engineering).
Postprocessing controls: verify and filter
- Validation. Validate output schemas and tool arguments.
- Output scanning. Detect sensitive disclosures or unsafe content.
- Grounding checks. Confirm citation coverage and relevance (see citations).
Tool authorisation: the final control point
For agentic systems, the most important layer is tool authorisation. Tools must be executed only after deterministic policy checks (see tool authorisation). Prompts cannot enforce permissions.
Layered controls are how organisations move fast with AI while keeping risk bounded and explainable.