AI support is different to traditional software support. A bug report like "the answer is wrong" does not map to a stack trace. And an AI feature can appear healthy while producing harmful or unhelpful outputs.
A support playbook reduces that ambiguity. It standardises what to capture, how to triage, when to escalate, and how to convert incidents into durable improvements.
Use a shared triage taxonomy
Start with a simple set of categories that map to system layers (see error taxonomy):
- Quality. Incorrect, incomplete, or misleading answers.
- RAG retrieval. Missing sources, stale content, wrong citations.
- Policy and safety. Over-refusal, under-refusal, sensitive disclosures.
- Tooling. Wrong tool arguments, timeouts, duplicate actions.
- Cost and latency. Slow workflows, token spikes, retries.
Standardise the case capture form
Most AI support tickets are non-actionable because the evidence is missing. Require:
- Timestamp and user/tenant context.
- Workflow or feature name, and user intent if known.
- Session/request ID (critical for tracing).
- What the user expected and what happened.
- Whether the output had citations or tool actions.
With request IDs, engineers can reconstruct what happened using telemetry (see telemetry schema).
Define escalation rules
Support teams should not guess when to page on-call. Create explicit triggers:
- Safety incidents. Disallowed content, data leakage, or unsafe actions (see incident response).
- Tool incidents. Irreversible actions executed incorrectly (see tool reliability).
- Cost incidents. Sudden spend spikes or runaway retries (see cost anomaly detection).
- Trust incidents. A high-volume pattern of incorrect answers for a critical workflow.
Close the loop into quality and monitoring
Support is a signal generator. Convert recurring tickets into:
- New regression cases (see regression testing).
- New golden queries for synthetic monitoring (see synthetic monitoring).
- Content review or ingestion fixes for RAG (see content review workflows).
- UX improvements that clarify uncertainty and constraints (see user transparency).
Communicate with users consistently
When behaviour changes, users should not discover it by surprise. Use short release notes and clear messaging for major changes (see release notes).
Support quality improves quickly when evidence is consistent and triage is standardised.