AI Operations · Practical

LLM Capacity Forecasting: Scenarios, Budgets and Provider Commitments

Amestris — Boutique AI & Technology Consultancy

LLM capacity is not like traditional infrastructure capacity. The unit cost depends on context size, retrieval behaviour, tool calls, retries, and model choice. Forecasting is still possible, but you need scenario thinking and explicit assumptions.

Start with unit economics per workflow

Forecasting begins with unit economics: cost and latency per task for each major workflow. Capture baseline token usage, retrieval hits, tool calls, and retry rates (see LLM FinOps and usage analytics).
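As a concrete sketch, per-task unit economics can be captured as a small record plus a cost function. The Python below is illustrative only: the workflow name, token counts, tool-call rate, retry rate and prices are assumptions to be replaced with your measured baselines.

  from dataclasses import dataclass

  @dataclass
  class WorkflowProfile:
      """Baseline unit economics for one workflow (all figures are assumptions)."""
      name: str
      input_tokens: int     # average prompt tokens per task, including retrieved context
      output_tokens: int    # average completion tokens per task
      tool_calls: float     # average tool calls per task (each adds a model round trip)
      retry_rate: float     # fraction of tasks retried once

  def cost_per_task(p: WorkflowProfile, price_in_per_1k: float, price_out_per_1k: float) -> float:
      """Expected dollar cost of one task, counting tool-call round trips and retries."""
      round_trips = 1 + p.tool_calls
      base = round_trips * (p.input_tokens * price_in_per_1k + p.output_tokens * price_out_per_1k) / 1000
      return base * (1 + p.retry_rate)   # a retry repeats the whole task on average

  # Illustrative profile and prices -- substitute your own measurements.
  summarise = WorkflowProfile("summarise_ticket", input_tokens=3000, output_tokens=400,
                              tool_calls=0.5, retry_rate=0.03)
  print(f"{summarise.name}: ${cost_per_task(summarise, 0.003, 0.015):.4f} per task")

Once each major workflow has a profile like this, the scenarios below multiply the unit costs by adoption volumes.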

Build adoption scenarios

Use three scenarios (base, expected, aggressive), each modelled in terms of:

  • Active users by tenant/team.
  • Requests per user per day (by workflow).
  • Average context size and retrieval configuration.
  • Tool-call rate and expected retries.

This produces a forecast that explains why costs move, not just the total.
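A minimal sketch of how those dimensions combine into a monthly figure; every number below (users, request rates, token sizes, the blended price) is an assumption standing in for your own adoption data.

  # Monthly spend per scenario (all inputs are illustrative assumptions).
  SCENARIOS = {
      # name:       (active users, requests/user/day, avg tokens/task, blended $/1k tokens)
      "base":       (200,  3, 3500, 0.006),
      "expected":   (500,  5, 3500, 0.006),
      "aggressive": (1500, 8, 4500, 0.006),
  }
  RETRY_RATE = 0.03   # assumed fraction of tasks retried once
  DAYS = 30

  for name, (users, req_per_day, tokens, price_per_1k) in SCENARIOS.items():
      tasks = users * req_per_day * DAYS
      spend = tasks * (1 + RETRY_RATE) * tokens * price_per_1k / 1000
      print(f"{name:>10}: {tasks:>9,} tasks/month, about ${spend:,.0f}/month")

Because each input is explicit, a jump in spend can be traced to the dimension that moved (more users, bigger contexts, higher retry rates) rather than showing up only as a larger total.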

Include caching and routing assumptions

Capacity planning changes materially with caching and routing:

  • Caching. Cache embeddings and safe artefacts to reduce repeat work (see caching strategies).
  • Routing. Route low-risk intents to cheaper models; reserve premium models for high-value tasks (see routing and failover).
  • Latency budgets. Check that the plan still meets user-perceived latency targets (see latency engineering).
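The sketch below shows why these assumptions belong in the forecast: a cache hit rate and a routing split change the expected cost per task materially. The hit rate, premium share and per-task prices are assumptions.

  # Expected cost per task under caching and routing (all figures are assumptions).
  CACHE_HIT_RATE = 0.35   # fraction of tasks served from cache at near-zero model cost
  PREMIUM_SHARE = 0.20    # fraction of uncached tasks routed to the premium model
  COST_PREMIUM = 0.025    # assumed $/task on the premium model
  COST_CHEAP = 0.004      # assumed $/task on the cheaper model

  all_premium = COST_PREMIUM
  with_routing = PREMIUM_SHARE * COST_PREMIUM + (1 - PREMIUM_SHARE) * COST_CHEAP
  with_caching = (1 - CACHE_HIT_RATE) * with_routing   # cache hits cost roughly nothing

  print(f"all premium: ${all_premium:.4f}, with routing: ${with_routing:.4f}, "
        f"with routing and caching: ${with_caching:.4f} per task")

In this illustrative case routing and caching together cut the per-task cost by roughly 80 percent, which is why forecasts that ignore them overstate capacity needs.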

Translate forecasts into operational controls

Forecasts are only useful if they create levers:

  • Budgets. Per product/tenant/workflow, with alerting on trend breaks.
  • Quotas and rate limits. Prevent one workload from starving others (see quotas).
  • Feature flags. Enable expensive features gradually (see feature flags).
  • Cost incident runbooks. A fast response plan when spend spikes (see cost anomaly detection).
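As one concrete lever, a budget check with trend-break alerting can run daily against per-workflow spend. The thresholds, window and spend series below are assumptions for illustration.

  # Minimal budget and trend-break check (thresholds and data are assumptions).
  from statistics import mean

  def check_budget(daily_spend: list[float], monthly_budget: float,
                   trend_window: int = 7, trend_factor: float = 1.5) -> list[str]:
      """Return alerts for over-plan burn rate and for trend breaks."""
      alerts = []
      month_to_date = sum(daily_spend)
      if month_to_date > monthly_budget * len(daily_spend) / 30:
          alerts.append(f"burn rate over plan: ${month_to_date:,.0f} month-to-date")
      if len(daily_spend) > trend_window:
          baseline = mean(daily_spend[:-1][-trend_window:])   # window before today
          if daily_spend[-1] > trend_factor * baseline:
              alerts.append(f"trend break: ${daily_spend[-1]:,.0f} today vs "
                            f"${baseline:,.0f} {trend_window}-day average")
      return alerts

  spend = [120, 130, 125, 140, 135, 128, 132, 260]   # illustrative daily spend
  for alert in check_budget(spend, monthly_budget=4000):
      print(alert)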

Negotiate provider commitments with flexibility

Once you can forecast, you can negotiate. Aim for commitments that preserve portability: region guarantees, clear retention terms, and the ability to shift workloads between models or providers (see vendor exit strategy).
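A rough way to stress-test a commitment level against the scenarios: under a simplified model where you pay at least the committed amount and usage earns a discount, the commitment should look acceptable in the base case, not only the aggressive one. The commitment size, discount and forecast figures below are assumptions (the forecasts roughly match the scenario sketch above).

  # Stress-test a spend commitment against the scenario forecasts.
  # Simplified pricing model (an assumption): pay at least COMMIT; usage is discounted.
  FORECASTS = {"base": 390, "expected": 1_620, "aggressive": 10_010}   # on-demand $/month
  COMMIT = 1_200      # assumed minimum monthly commitment
  DISCOUNT = 0.25     # assumed discount earned by committing

  for name, on_demand in FORECASTS.items():
      committed = max(COMMIT, on_demand * (1 - DISCOUNT))
      delta = on_demand - committed
      print(f"{name:>10}: on-demand ${on_demand:>6,} vs committed ${committed:>7,.0f} "
            f"({'saves' if delta > 0 else 'wastes'} ${abs(delta):,.0f})")

In this example the commitment leaves money on the table if only the base scenario materialises, which is the kind of trade-off to surface before signing.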

Capacity forecasting is not about perfect prediction. It is about making adoption growth controllable and sustainable.

Quick answers

What does this article cover?

How to forecast LLM usage and cost using scenarios and then translate forecasts into budgets, quotas, and routing controls.

Who is this for?

Product, platform and finance teams planning LLM adoption who need predictable costs and reliable capacity.

If this topic is relevant to an initiative you are considering, Amestris can provide independent advice or architecture support. Contact hello@amestris.com.au.