Rollback Budget: The Missing Guardrail in LLM Rollouts

Apr 5, 2026

TL;DR

Most teams define launch criteria but not rollback criteria. A rollback budget (how much degradation you tolerate, for how long) makes incident response faster and protects users from slow-motion reliability drift.

Context

In many LLM deployments, shipping is disciplined but rollback is improvised. Teams have canary gates, dashboards, and scorecards, yet when quality degrades after release they debate too long because nobody agreed on the “stop loss” rule in advance.

A rollback budget closes that gap.

Key Points

1) A rollback budget is a stop-loss, not a punishment

It is a pre-agreed reliability envelope that says: if service quality falls beyond X for Y minutes, rollback is automatic unless an incident commander explicitly overrides.

2) Use user-impact metrics first

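Budget triggers should be tied to outcomes users feel, such as task success, high-severity safety incidents, p95 latency, and human-handoff rate (the four metrics used in the spec below).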

Internal metrics still matter, but they should not be the only triggers.

3) Define both magnitude and duration

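Avoid noisy reversals by requiring both a magnitude threshold (how far the metric has moved from baseline) and a duration threshold (how long it has stayed there).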

This avoids rolling back on random blips while still stopping persistent harm.
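
As a rough illustration of the magnitude-plus-duration rule, here is a minimal Python sketch. The class name SustainedBreachDetector and its fields are hypothetical, and it assumes a "lower is worse" metric (such as task success delta) sampled with minute timestamps, with a value exactly at the threshold counting as a breach. It is one possible shape for the idea, not a definitive implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedBreachDetector:
    """Fires only when a metric stays past its threshold for a minimum duration."""

    threshold_delta_pct: float  # e.g. -2.0: a delta at or below this is a breach
    max_duration_min: float     # how long a breach may persist before firing
    breach_started_at: Optional[float] = None  # minute of the first breaching sample

    def observe(self, minute: float, delta_pct: float) -> bool:
        """Record one sample; return True once the breach has lasted long enough."""
        if delta_pct <= self.threshold_delta_pct:
            if self.breach_started_at is None:
                self.breach_started_at = minute
            return minute - self.breach_started_at >= self.max_duration_min
        # Metric recovered: reset the clock so isolated blips never accumulate.
        self.breach_started_at = None
        return False


detector = SustainedBreachDetector(threshold_delta_pct=-2.0, max_duration_min=20)
samples = [(0, -3.0), (5, -0.5), (10, -2.5), (20, -2.8), (30, -3.1)]
fired = [detector.observe(minute, delta) for minute, delta in samples]
# The blip at minute 0 is forgotten after the recovery at minute 5;
# only the breach that runs from minute 10 to minute 30 fires.
```

Resetting on recovery is a design choice: it favors fewer false rollbacks over catching a metric that flaps around the threshold, which may need a separate rule.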

4) Separate auto-rollback from manual review

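Not every breach needs instant rollback. Use tiers: a tier A breach triggers automatic rollback, while a tier B breach goes to the incident commander for a decision within 15 minutes.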
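
The tiering can be sketched as a small routing function. The tier names follow the tier_a_breach and tier_b_breach keys in the spec further down; the Action enum, the route_breach function, and the way an override is handled are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    AUTO_ROLLBACK = "auto_rollback"
    COMMANDER_DECISION = "incident_commander_decision_within_15m"
    NO_ACTION = "no_action"


def route_breach(tier: Optional[str], commander_override: bool = False) -> Action:
    """Map a breach tier to the action named in the decision spec."""
    if tier == "tier_a":
        # Assumption: an explicit override hands the call back to the
        # incident commander rather than cancelling the response entirely.
        if commander_override:
            return Action.COMMANDER_DECISION
        return Action.AUTO_ROLLBACK
    if tier == "tier_b":
        return Action.COMMANDER_DECISION
    if tier is None:
        return Action.NO_ACTION
    # Unknown tiers fail loudly instead of silently skipping a rollback.
    raise ValueError(f"unknown tier: {tier!r}")
```

Which metrics belong to which tier is left to the caller here, since that mapping depends on the team's own risk tolerance.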

5) Log every budget breach as eval debt

If a breach happened in production but wasn’t caught pre-release, create new eval cases immediately. Otherwise incidents repeat.
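
One way to make this habit mechanical is to emit an eval-debt record whenever a breach was not caught before release. The sketch below is hypothetical throughout: the BudgetBreach fields, the to_eval_debt function, and the record layout are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class BudgetBreach:
    release: str
    metric: str
    observed_delta_pct: float
    threshold_delta_pct: float
    duration_min: float
    caught_pre_release: bool


def to_eval_debt(breach: BudgetBreach) -> Optional[dict]:
    """Return an eval-debt record for breaches the pre-release evals missed."""
    if breach.caught_pre_release:
        return None  # existing evals already cover this failure mode
    record = asdict(breach)
    record["todo"] = (
        f"add eval cases reproducing the {breach.metric} "
        f"regression seen in {breach.release}"
    )
    record["status"] = "open"
    return record


breach = BudgetBreach(
    release="model-v42",
    metric="task_success_delta_pct",
    observed_delta_pct=-3.1,
    threshold_delta_pct=-2.0,
    duration_min=25,
    caught_pre_release=False,
)
debt = to_eval_debt(breach)
```

The resulting dictionary could be filed in whatever tracker the team already uses; the point is that the record is created at breach time, not reconstructed later.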

Steps / Code

Minimal rollback budget spec

release: model-v42
baseline: model-v41
window: 24h

rollback_budget:
  quality:
    task_success_delta_pct: -2.0
    max_duration_min: 20
  safety:
    high_severity_incidents_increase: 0
    action: immediate_rollback
  latency:
    p95_delta_pct: +15
    max_duration_min: 30
  escalation:
    human_handoff_delta_pct: +8
    max_duration_min: 20

decision:
  tier_a_breach: auto_rollback
  tier_b_breach: incident_commander_decision_within_15m
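
To show how such a spec might be checked against live numbers, here is a minimal Python sketch covering only the magnitude side of the percentage-delta metrics. It mirrors the spec as a plain dictionary to stay dependency-free. The sign convention is an assumption: a negative threshold means lower is worse, a positive one means higher is worse, and a value exactly at the threshold counts as past it. The function names and the dotted metric keys are hypothetical.

```python
from typing import Dict, List

# Mirrors the *_delta_pct entries of the spec above as a plain dict.
ROLLBACK_BUDGET: Dict[str, Dict[str, float]] = {
    "quality": {"task_success_delta_pct": -2.0, "max_duration_min": 20},
    "latency": {"p95_delta_pct": 15.0, "max_duration_min": 30},
    "escalation": {"human_handoff_delta_pct": 8.0, "max_duration_min": 20},
}


def is_past_threshold(threshold: float, observed: float) -> bool:
    """Assumed convention: the sign of the threshold gives the bad direction."""
    if threshold < 0:
        return observed <= threshold
    return observed >= threshold


def breached_metrics(
    budget: Dict[str, Dict[str, float]], observed: Dict[str, float]
) -> List[str]:
    """List every delta metric whose observed value is past its threshold."""
    breached = []
    for group, rules in budget.items():
        for key, threshold in rules.items():
            if not key.endswith("_delta_pct"):
                continue  # skip durations and other non-threshold fields
            name = f"{group}.{key}"
            if name in observed and is_past_threshold(threshold, observed[name]):
                breached.append(name)
    return breached


observed = {
    "quality.task_success_delta_pct": -2.4,
    "latency.p95_delta_pct": 9.0,
    "escalation.human_handoff_delta_pct": 8.0,
}
result = breached_metrics(ROLLBACK_BUDGET, observed)
```

A snapshot check like this only answers "is the metric past the line right now"; the max_duration_min values still have to be enforced over time before any rollback fires, and the safety rule (any increase in high-severity incidents) would need its own strict check.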

Operating rule

  1. Set budget before rollout.
  2. Attach owner/on-call.
  3. Enforce trigger automatically where possible.
  4. Convert breaches into new eval coverage.

Trade-offs

Final Take

A launch gate tells you when to ship. A rollback budget tells you when to stop. Mature LLM operations require both.
