The Dry-Run + Idempotency Approval Ladder for AI Agents
TL;DR
If your agent can take actions, use a three-step execution ladder:
- Dry-run first (show intended change without applying it).
- Idempotent write next (safe retry without duplicate side effects).
- Approval gate for irreversible or external actions.
This pattern borrows from mature infrastructure and payment systems, and it shrinks the blast radius when an agent action fails.
Context
Many teams add approval prompts to agent workflows but skip two earlier controls: simulation and safe retries.
That creates a fragile system:
- “Looks right” plans can still fail at apply time.
- Network retries can duplicate side effects.
- Humans are forced to review too many low-value decisions.
Established platforms already solved analogous problems:
- Kubernetes supports dry-run modes (client/server) to preview applies.
- AWS APIs expose idempotency tokens to make retries safer.
- Stripe documents idempotency keys for POST operations.
- NIST AI RMF frames AI safety as an operational risk-management problem, not just a model-quality problem.
The useful synthesis for agent design is a control ladder rather than a single “human in the loop” checkbox.
Key Points
1) Dry-run should be the default planning surface
Before any side effect, require the agent to produce a structured action preview:
- target resource,
- intended change,
- expected downstream effect,
- confidence/uncertainty notes.
If no meaningful dry-run representation exists, treat the action as higher risk by default.
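A minimal sketch of such a preview in Python, assuming a simple dataclass is enough to carry the four fields listed above. The names `ActionPreview`, `artifact_hash`, and `risk_level` are hypothetical and chosen for illustration only; nothing here applies a change.

```python
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class ActionPreview:
    """Structured dry-run output (hypothetical shape); describes a change without applying it."""
    target: str             # resource the action would touch
    change: str             # intended change, in human-readable form
    expected_effect: str    # expected downstream effect
    uncertainty: str = ""   # confidence/uncertainty notes

    def artifact_hash(self) -> str:
        # Stable hash of the preview, so later steps can reference exactly what was shown.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def risk_level(preview):
    # No meaningful dry-run representation: treat the action as higher risk by default.
    if preview is None:
        return "high"
    if not (preview.target and preview.change and preview.expected_effect):
        return "high"
    return "normal"
```

An agent tool wrapper could call `risk_level(preview)` before anything else and route "high" results to a stricter path.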
2) Idempotency converts retries from dangerous to routine
Agents operate in noisy environments (timeouts, flaky APIs, retries). Without idempotency, retries can create duplicate charges, duplicate tickets, or repeated mutations.
Use one request identity key per logical action and persist it together with the result metadata. On retry, the system should return the original outcome when the parameters match, and reject the request when the same key arrives with different parameters.
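The retry behavior can be sketched as a small wrapper around the side-effecting call. This is an illustration under stated assumptions, not a production design: `IdempotentExecutor` and `IdempotencyConflict` are hypothetical names, the store is an in-memory dict (a real system would need durable, concurrency-safe storage), and a reused key with different parameters is rejected rather than replayed.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Raised when a request key is reused with different parameters."""


class IdempotentExecutor:
    def __init__(self):
        # Assumption: in-memory store for illustration only.
        self._results = {}  # request_key -> (params_hash, result)

    @staticmethod
    def params_hash(params):
        # Canonical JSON so that key order does not change the hash.
        encoded = json.dumps(params, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def execute(self, request_key, params, action):
        digest = self.params_hash(params)
        if request_key in self._results:
            stored_digest, result = self._results[request_key]
            if stored_digest != digest:
                raise IdempotencyConflict(request_key)
            # Retry of the same logical action: return the original outcome,
            # without running the side effect again.
            return result
        result = action(params)
        # Only successful outcomes are recorded; if action() raises,
        # a later retry with the same key runs it again.
        self._results[request_key] = (digest, result)
        return result
```

With this wrapper, a timeout followed by a retry with the same key calls `action` at most once per successful outcome.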
3) Approval should be reserved for irreversible/external impact
Human review is expensive. Use it where it matters most:
- destructive mutations (delete/overwrite/force operations),
- external communications (email/message/payment),
- high-privilege actions (production, secrets, policy changes).
This keeps reviewers focused on true risk, not noise.
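As a rough sketch, the three categories above can be encoded as a lookup that a tool wrapper consults before executing. The tag names and the functions `approval_reasons` and `requires_approval` are assumptions made for this example; a real deployment would define its own taxonomy.

```python
# Hypothetical tag sets mirroring the three categories listed above.
HIGH_IMPACT = {
    "destructive": {"delete", "overwrite", "force"},
    "external": {"email", "message", "payment"},
    "high_privilege": {"production", "secrets", "policy_change"},
}


def approval_reasons(action_tags):
    """Return the sorted list of categories that make this action need review."""
    tags = set(action_tags)
    return sorted(name for name, members in HIGH_IMPACT.items() if tags & members)


def requires_approval(action_tags):
    # Anything outside the high-impact categories proceeds without a human.
    return bool(approval_reasons(action_tags))
```

Returning the reasons, not just a boolean, lets the approval prompt tell the reviewer why they are being asked.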
4) The ladder improves both safety and throughput
Counterintuitively, adding control points can speed teams up:
- fewer rollback incidents,
- fewer duplicate side effects from retries,
- clearer operator trust in autonomous steps,
- less blanket “manual mode” fallback.
5) Log each rung as a first-class event
For every action attempt, record:
- dry-run artifact hash/reference,
- idempotency key,
- approval decision (if required),
- final execution result.
This creates a usable audit trail and shortens incident response.
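One possible shape for such a record, sketched as a JSON line per action attempt. The function name `ladder_event` and the field names are hypothetical; only the four recorded items come from the list above, and the timestamp and action ID are additions assumed for correlation.

```python
import json
import time


def ladder_event(action_id, dry_run_hash, idempotency_key, approval, result):
    """Serialize one action attempt as a single JSON log line."""
    event = {
        "ts": time.time(),                   # assumption: epoch seconds are sufficient
        "action_id": action_id,              # assumption: caller supplies a stable ID
        "dry_run_hash": dry_run_hash,        # dry-run artifact hash/reference
        "idempotency_key": idempotency_key,
        "approval": approval,                # None when no approval was required
        "result": result,                    # final execution result
    }
    return json.dumps(event, sort_keys=True)
```

Emitting one line per attempt keeps the log easy to filter by `idempotency_key` during incident response.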
Steps / Code
10-minute implementation pass
Minute 0-2: Classify actions as read-only, reversible write, irreversible/external
Minute 2-4: Add dry-run output schema for every write-capable action
Minute 4-6: Add idempotency keys to non-idempotent POST/mutation calls
Minute 6-8: Add approval gate for irreversible/external classes
Minute 8-10: Emit structured execution logs for each ladder rung
Minimal ladder policy sketch
{
  "default": "deny",
  "ladder": ["dry_run", "idempotent_execute", "approval_if_high_impact"],
  "high_impact": ["delete", "force", "external_send", "payment", "prod_change"],
  "require": {
    "dry_run": ["target", "change", "expected_effect"],
    "idempotent_execute": ["request_key", "params_hash"]
  }
}
Execution rule
If dry-run is missing OR idempotency cannot be guaranteed for a risky write,
block execution and request human confirmation.
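A hedged sketch of how the policy and the execution rule might be evaluated together. The `decide` function, its return values, and the simplification that every write counts as "risky" are assumptions for illustration; the required field names are taken from the policy sketch.

```python
# Policy fields copied from the sketch above; the evaluation logic is hypothetical.
POLICY = {
    "high_impact": ["delete", "force", "external_send", "payment", "prod_change"],
    "require": {
        "dry_run": ["target", "change", "expected_effect"],
        "idempotent_execute": ["request_key", "params_hash"],
    },
}


def missing_fields(section, payload):
    """List required fields that are absent or empty in the payload."""
    payload = payload or {}
    return [name for name in POLICY["require"][section] if not payload.get(name)]


def decide(kind, dry_run, idempotency):
    # Rule: missing dry-run or missing idempotency data blocks execution
    # and hands the decision to a human.
    if missing_fields("dry_run", dry_run):
        return "block_and_ask_human"
    if missing_fields("idempotent_execute", idempotency):
        return "block_and_ask_human"
    # High-impact kinds always go through the approval gate.
    if kind in POLICY["high_impact"]:
        return "needs_approval"
    return "execute"
```

For example, a `delete` with a complete preview and key still returns `"needs_approval"`, while an ordinary update with the same data returns `"execute"`.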
Trade-offs
Costs
- Extra implementation effort for dry-run artifacts and key management.
- More policy plumbing across tool wrappers.
- Initial friction while teams classify action risk correctly.
Benefits
- Lower blast radius from bad plans and flaky retries.
- Fewer duplicate side effects in real systems.
- More targeted human oversight where reversibility is low.
- Better auditability and post-incident debugging.
References
- Kubernetes docs,
kubectl apply (--dry-run client/server): https://kubernetes.io/docs/reference/kubectl/generated/kubectl_apply/
- AWS EC2 API (RunInstances, ClientToken idempotency): https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RunInstances.html
- Stripe API docs, idempotent requests: https://docs.stripe.com/api/idempotent_requests
- NIST, AI Risk Management Framework overview: https://www.nist.gov/itl/ai-risk-management-framework
Final Take
“Human in the loop” is too coarse by itself.
A better default for agent actions is: simulate first, execute safely, escalate only when impact is hard to reverse. If you implement just one reliability upgrade this week, make it this ladder.
Changelog
- 2026-03-23: Initial publish with dry-run + idempotency + approval ladder pattern for tool-using agents.