Noisy neighbour problems are common in shared AI platforms. One tenant spikes usage, burns through token budgets, and saturates provider limits; everyone else sees slowdowns or degraded quality. The fix is not just more capacity; it is fairness and isolation by design.
Define shared limits explicitly
Shared AI services need clear limits:
- Requests per minute per tenant.
- Token budgets per day per workflow.
- Tool-call budgets to prevent loops.
Quotas create predictable behaviour and reduce internal conflict (see quotas and rate limits).
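As a concrete illustration, a minimal in-process quota gate might look like the sketch below. The class names, default limits, and the missing daily reset are assumptions made for brevity; a production system would typically keep these counters in a shared store rather than in memory.

```python
# Sketch of per-tenant limits: a fixed one-minute request window plus daily
# token and tool-call budgets. Names and default values are illustrative.
import time
from dataclasses import dataclass, field

@dataclass
class TenantLimits:
    requests_per_minute: int = 60
    tokens_per_day: int = 2_000_000
    tool_calls_per_day: int = 5_000

@dataclass
class TenantUsage:
    window_start: float = field(default_factory=time.time)
    requests_in_window: int = 0
    tokens_today: int = 0
    tool_calls_today: int = 0

class QuotaGate:
    def __init__(self, limits: dict[str, TenantLimits]):
        self.limits = limits
        self.usage: dict[str, TenantUsage] = {}

    def allow_request(self, tenant: str) -> bool:
        limits = self.limits.get(tenant, TenantLimits())
        usage = self.usage.setdefault(tenant, TenantUsage())
        now = time.time()
        if now - usage.window_start >= 60:      # roll the one-minute window
            usage.window_start, usage.requests_in_window = now, 0
        if usage.requests_in_window >= limits.requests_per_minute:
            return False                        # reject or queue the request
        usage.requests_in_window += 1
        return True

    def charge(self, tenant: str, tokens: int, tool_calls: int) -> bool:
        """Record spend; returns False once a daily budget is exhausted."""
        limits = self.limits.get(tenant, TenantLimits())
        usage = self.usage.setdefault(tenant, TenantUsage())
        usage.tokens_today += tokens
        usage.tool_calls_today += tool_calls
        return (usage.tokens_today <= limits.tokens_per_day
                and usage.tool_calls_today <= limits.tool_calls_per_day)
```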
Use isolation patterns where needed
Not every tenant needs the same degree of isolation, but high-impact tenants often do; a combined policy sketch follows the list:
- Dedicated queues. Separate queues for premium or regulated workloads.
- Separate indexes. For RAG, use per-tenant indexes where data isolation or access control requires it (see permissions).
- Separate provider routes. Route premium tiers to reserved capacity (see routing and forecasting).
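One way to express these choices is a per-tier isolation policy that decides the queue, index, and provider route for each request. The tier names, queue names, and route labels below are assumptions for the sketch, not a prescribed schema.

```python
# Illustrative tenant isolation config: queue, index, and provider route per tier.
from dataclasses import dataclass

@dataclass(frozen=True)
class IsolationPolicy:
    queue: str            # which work queue the tenant's jobs land on
    index: str            # which RAG index pattern the tenant reads from
    provider_route: str   # which provider capacity pool serves the tenant

TIER_POLICIES = {
    "premium":   IsolationPolicy(queue="q-premium",   index="idx-{tenant}", provider_route="reserved"),
    "regulated": IsolationPolicy(queue="q-regulated", index="idx-{tenant}", provider_route="reserved"),
    "standard":  IsolationPolicy(queue="q-shared",    index="idx-shared",   provider_route="pooled"),
}

def resolve_policy(tenant_id: str, tier: str) -> IsolationPolicy:
    """Pick the isolation policy for a tenant, expanding per-tenant index names."""
    policy = TIER_POLICIES.get(tier, TIER_POLICIES["standard"])
    return IsolationPolicy(
        queue=policy.queue,
        index=policy.index.format(tenant=tenant_id),
        provider_route=policy.provider_route,
    )
```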
Schedule with fairness in mind
When load is high, scheduling determines who suffers. Practical strategies include:
- Weighted fair queues. Allocate each tenant a share of capacity, weighted by tier (a sketch follows below).
- Backpressure. Queue low-priority traffic with transparent messaging.
- Degradation by tier. Reduce context budgets or disable reranking for standard tiers first.
Feature flags can control these degradations quickly (see feature tiering).
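A simplified sketch of both ideas: an approximate weighted fair queue keyed on per-tenant virtual time, plus a degradation helper that trims context budgets and disables reranking for non-premium tiers when the queue backs up. Weights, thresholds, and setting names are assumptions.

```python
# Approximate weighted fair scheduling plus tier-based degradation under load.
import heapq
from collections import defaultdict

TIER_WEIGHTS = {"premium": 4.0, "standard": 1.0}

class WeightedFairScheduler:
    """Approximate weighted fair queueing using per-tenant virtual finish times."""

    def __init__(self):
        self._heap: list[tuple[float, int, dict]] = []
        self._virtual_time: dict[str, float] = defaultdict(float)
        self._seq = 0

    def enqueue(self, tenant: str, tier: str, job: dict, cost: float = 1.0) -> None:
        weight = TIER_WEIGHTS.get(tier, 1.0)
        # Heavier-weighted tenants accumulate virtual time more slowly,
        # so their jobs are served proportionally more often.
        self._virtual_time[tenant] += cost / weight
        heapq.heappush(self._heap, (self._virtual_time[tenant], self._seq, job))
        self._seq += 1

    def dequeue(self) -> dict | None:
        # Serve the job with the smallest virtual finish time first.
        return heapq.heappop(self._heap)[2] if self._heap else None

def degraded_settings(tier: str, queue_depth: int) -> dict:
    """Shed load from standard tiers first when the shared queue backs up."""
    if queue_depth > 500 and tier != "premium":
        return {"max_context_tokens": 4_000, "reranking": False}
    return {"max_context_tokens": 16_000, "reranking": True}
```

In practice the thresholds and degraded settings would sit behind feature flags so they can be adjusted without a deploy.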
Make noisy neighbour incidents observable
Capture per-tenant telemetry: latency, error rates, token usage and tool-call rates. Alert on fairness issues such as queue starvation or sustained P95 latency spikes across multiple tenants (see telemetry schema and alerting and runbooks).
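A hypothetical event shape and fairness check might look like the following; the field names and latency threshold are assumptions rather than a prescribed schema.

```python
# Per-tenant telemetry record and a simple fairness check: flag tenants whose
# P95 latency breaches a threshold within the current window.
from dataclasses import dataclass

@dataclass
class RequestEvent:
    tenant_id: str
    tier: str
    latency_ms: float
    tokens_in: int
    tokens_out: int
    tool_calls: int
    error: bool

def p95(values: list[float]) -> float:
    ordered = sorted(values)
    return ordered[max(0, int(0.95 * len(ordered)) - 1)]

def fairness_alerts(events: list[RequestEvent], p95_threshold_ms: float = 8_000) -> list[str]:
    """Return tenants whose P95 latency breaches the threshold in this window."""
    by_tenant: dict[str, list[float]] = {}
    for e in events:
        by_tenant.setdefault(e.tenant_id, []).append(e.latency_ms)
    # Several tenants breaching at once points to a noisy neighbour rather than
    # one slow workload, so that case can be escalated differently.
    return [t for t, latencies in by_tenant.items() if p95(latencies) > p95_threshold_ms]
```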
Use cost controls as reliability controls
Noisy neighbour problems often start as cost problems: one workflow becomes more expensive due to context growth or retries. Use anomaly detection and fast levers to contain it (see cost anomalies).
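As one possible shape for this, the sketch below compares each tenant's per-window token spend against a rolling baseline and flags spikes; the multiplier, window count, and the suggested lever are illustrative.

```python
# Per-tenant cost anomaly detection: compare the current window's token spend
# to a rolling baseline and trip a fast lever on spikes.
from collections import deque

class CostAnomalyDetector:
    def __init__(self, baseline_windows: int = 24, spike_multiplier: float = 3.0):
        self.history: dict[str, deque] = {}
        self.baseline_windows = baseline_windows
        self.spike_multiplier = spike_multiplier

    def record(self, tenant: str, window_tokens: int) -> bool:
        """Record one window of spend; return True if it looks anomalous."""
        history = self.history.setdefault(tenant, deque(maxlen=self.baseline_windows))
        baseline = (sum(history) / len(history)) if history else None
        history.append(window_tokens)
        if baseline is None:
            return False    # not enough history to judge yet
        return window_tokens > self.spike_multiplier * max(baseline, 1)

# Example fast lever: when a spike is flagged, cap the tenant's context budget
# or route the workflow to a cheaper model until someone reviews it.
```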
Multi-tenant AI works when fairness is explicit and enforced. Otherwise, every success story becomes someone else's outage.