LLM costs usually do not grow smoothly. They jump. A new prompt version increases context size, a retry loop starts cascading, or a single workflow is adopted faster than planned. Without cost anomaly detection, the first signal is often the invoice.
Cost anomaly detection is not just FinOps. It is an operational safety control.
Start with attribution or you cannot diagnose
The most important prerequisite is request-level attribution. For every request, capture: tenant, feature/workflow, model/provider, prompt version, context size, tool usage and retries (see LLM FinOps and chargeback and usage analytics).
Without this, you cannot answer the question that matters during an incident: what changed?
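As a minimal sketch of what request-level attribution could look like, the record below carries the fields listed above and serializes to one structured log line per request. The class name, field names, and the example values are illustrative assumptions, not a standard schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LLMRequestRecord:
    """One attribution record per LLM request (hypothetical schema)."""
    tenant: str
    workflow: str
    provider: str
    model: str
    prompt_version: str
    input_tokens: int      # context size sent to the model
    output_tokens: int
    tool_calls: int = 0
    retries: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def to_log_line(record: LLMRequestRecord) -> str:
    """Serialize the record as a single JSON line for the logging pipeline."""
    return json.dumps(asdict(record), sort_keys=True)


# Example values are made up for illustration.
record = LLMRequestRecord(
    tenant="acme",
    workflow="support_summary",
    provider="example-provider",
    model="example-model",
    prompt_version="v12",
    input_tokens=1800,
    output_tokens=250,
    tool_calls=2,
    retries=1,
)
print(to_log_line(record))
```

Keeping prompt version and retries on the same record as token counts is what lets you group a spike by "what changed" during an incident.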
Define anomaly signals that map to causes
Useful anomaly signals are leading indicators, not monthly totals:
- Tokens per task. Often driven by prompt/context growth or retrieval changes.
- Retries per request. A common hidden multiplier during provider instability.
- Tool-call volume. Tool loops or runaway agents can explode cost quickly.
- High-cost intents. A small set of workflows usually dominates spend.
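One way to turn a signal such as tokens per task or retries per request into an alert is to compare the latest value against a recent baseline. The sketch below uses a robust z-score based on the median and the median absolute deviation (MAD); this is a choice made for illustration, and the function name, threshold, and fallback rule are assumptions to tune against your own traffic.

```python
from statistics import median


def is_anomalous(history, current, threshold=3.5, min_samples=8):
    """Return True if `current` sits far above the recent baseline.

    `history` holds recent values of one signal for one tenant/workflow,
    for example average tokens per task per hour.
    """
    if len(history) < min_samples:
        return False  # not enough data to form a baseline
    baseline = median(history)
    mad = median([abs(x - baseline) for x in history])
    if mad == 0:
        # Perfectly flat baseline: fall back to a simple relative jump (assumed 50%).
        return current > baseline * 1.5
    # 0.6745 scales MAD so the score is comparable to a standard z-score.
    robust_z = 0.6745 * (current - baseline) / mad
    return robust_z > threshold


tokens_per_task = [1000, 1020, 980, 1010, 990, 1005, 995, 1015]
print(is_anomalous(tokens_per_task, 1800))  # large jump after a prompt change
print(is_anomalous(tokens_per_task, 1010))  # within normal variation
```

Only upward deviations are flagged here, since a drop in tokens or retries is rarely a cost incident.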
Put guardrails in the runtime
Detection is not enough. You need fast levers to stop the bleed:
- Rate limits and quotas. Per tenant and per workflow (see quotas).
- Budget-based routing. Switch to a cheaper model or smaller context when budgets are exceeded (see routing and failover).
- Feature flags. Disable expensive features such as reranking or tool use temporarily (see feature flags).
- Context throttles. Cap retrieved chunks and tool output size (see incident response).
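The levers above can be combined into a single routing decision made before each request. The following is a rough sketch under stated assumptions: the model names, the 80% and 100% thresholds, and the chunk caps are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    max_context_chunks: int  # context throttle: cap on retrieved chunks
    tools_enabled: bool      # feature flag for tool use


def route_for_budget(spent_usd: float, budget_usd: float) -> RoutingDecision:
    """Pick a model and context cap from how much of the budget is used."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    usage = spent_usd / budget_usd
    if usage < 0.8:
        # Normal operation.
        return RoutingDecision("primary-model", max_context_chunks=8, tools_enabled=True)
    if usage < 1.0:
        # Approaching the budget: keep the model, shrink the context.
        return RoutingDecision("primary-model", max_context_chunks=4, tools_enabled=True)
    # Budget exceeded: cheaper model, smaller context, tools off.
    return RoutingDecision("cheaper-model", max_context_chunks=2, tools_enabled=False)


print(route_for_budget(spent_usd=95.0, budget_usd=100.0))
```

Degrading in steps rather than cutting off at the limit keeps the workflow usable while spend is brought back under control.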
Use a cost incident runbook
Cost anomalies should be treated like incidents. A practical triage flow:
- Identify the top spenders by tenant/workflow/model.
- Check recent changes: prompt version, retrieval configuration, tool enablement.
- Look for spikes in retries or provider error rates.
- Apply guardrails: quotas, routing, or temporary feature disablement.
- Document a postmortem and update controls so the same class of spike is prevented next time.
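The first triage step can be sketched as a small aggregation over attribution records. This assumes each record is a dictionary that carries a precomputed `cost_usd` field alongside tenant, workflow, and model; those field names and the sample data are hypothetical.

```python
from collections import defaultdict


def top_spenders(records, key_fields=("tenant", "workflow", "model"), n=5):
    """Sum cost per (tenant, workflow, model) and return the n largest groups."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        totals[key] += rec["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Made-up sample records for illustration.
records = [
    {"tenant": "acme", "workflow": "summary", "model": "model-a", "cost_usd": 1.5},
    {"tenant": "acme", "workflow": "summary", "model": "model-a", "cost_usd": 2.5},
    {"tenant": "globex", "workflow": "search", "model": "model-b", "cost_usd": 0.5},
]
for key, cost in top_spenders(records, n=2):
    print(key, cost)
```

Passing a narrower `key_fields`, such as `("tenant",)`, gives a coarser view when you first need to know who is affected before asking which workflow is responsible.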
Make cost visible to product decisions
Long-term cost control is a product capability. If a workflow has a high unit cost, decide whether to redesign, cache, or restrict it to premium tiers. Transparent cost messaging can preserve trust while protecting budgets (see FinOps).
When cost is observable and controllable, AI programs can scale with confidence instead of being caught out by surprises.