What does this article cover?

How to layer AI controls across preprocessing, in-model policies, postprocessing checks, and tool authorisation to reduce risk.

Platform and risk teams designing defence-in-depth controls for AI systems that cannot rely on prompts alone.

Policy Layering for AI Systems: Pre, In, and Post Controls

Many organisations try to solve AI risk with a single layer: a system prompt, a moderation API, or a policy document. In practice, reliable control comes from layering—multiple mechanisms that reduce risk even when one layer fails.

Policy layering is defence-in-depth for AI: controls before the model, controls within the model interaction, and controls after the model—especially around tool use.

Preprocessing controls: reduce what enters the model

Data minimisation. Redact and tokenise sensitive fields (see data minimisation).
Prompt injection detection. Detect obvious injection attempts and route to safer flows (see prompt injection defence).
Intent classification. Identify high-risk intents early and apply step-up controls.

In-interaction controls: constrain behaviour

System policies. Clear boundaries and refusal rules (versioned with change control).
Structured outputs. Force machine-parseable responses for operational flows (see structured outputs).
Context discipline. Include only relevant evidence and keep token budgets bounded (see context engineering).

Postprocessing controls: verify and filter

Validation. Validate output schemas and tool arguments.
Output scanning. Detect sensitive disclosures or unsafe content.
Grounding checks. Confirm citation coverage and relevance (see citations).

Tool authorisation: the final control point

For agentic systems, the most important layer is tool authorisation. Tools must be executed only after deterministic policy checks (see tool authorisation). Prompts cannot enforce permissions.

Layered controls are how organisations move fast with AI while keeping risk bounded and explainable.