← Home

Idempotency Keys Are the Seatbelt for AI Agents

Mar 27, 2026

TL;DR

If an AI agent can retry a write action, it can accidentally do that action twice.

Network timeouts, partial failures, and at-least-once delivery are normal in production systems—not edge cases. The practical fix is to require an idempotency key on side-effecting operations and have the server return the first result for duplicate keys.

Treat this as a default contract for tools that create, mutate, charge, send, or enqueue.

Context

Agent workflows increasingly chain external tools: payments, ticket creation, email sends, database writes, and queue operations. These are exactly the calls most likely to be retried when a timeout or transient error happens.

AWS’s reliability guidance is explicit: retries are useful for partial/transient failures, but retries can be unsafe when calls have side effects; APIs should be designed to be idempotent when safe retries matter. Stripe documents a concrete implementation pattern that many teams can copy: client-provided idempotency key + cached first response for that key.

In other words, this is not “AI safety theater.” It is basic distributed-systems hygiene that becomes more urgent when autonomous loops can issue retries at machine speed.

Key Points

1) Retries are normal, not exceptional

Real systems fail partially and transiently. Clients retry because retries often work.

That means any agent tool using HTTP, queues, or RPC will eventually retry something. If the operation is side-effecting, retries are a risk multiplier.

2) Duplicate effects are a design problem, not just a runtime bug

A timeout does not prove the action did not happen. The downstream system may have completed work even when the caller saw an error.

This creates the dangerous ambiguity: “did it fail, or did it succeed but we missed the response?”

3) Idempotency keys collapse ambiguity

With an idempotency key, the server treats repeated requests as the same logical operation.

Stripe’s approach is a strong baseline:

This pattern converts uncertain retry behavior into deterministic behavior.

4) At-least-once systems make idempotency mandatory

SQS standard queues explicitly warn that duplicates can happen and recommend idempotent consumers.

If your agent consumes queue events and triggers writes, dedupe cannot be optional. It is part of correctness.

5) Non-obvious insight: idempotency is also a cost-control primitive

Most teams frame idempotency as reliability. It is also a direct budget control:

A single idempotency key can prevent both operational incidents and surprise spend.

Steps / Code

Minimal contract for side-effecting tools

request:
  operation: "create_invoice"
  idempotency_key: "uuid-v4"
  payload_hash: "sha256(payload)"

server_rules:
  - if key unseen: execute, persist result, return result
  - if key seen and payload_hash matches: return persisted result
  - if key seen and payload_hash differs: return 409 conflict
  - expire old keys after retention window (e.g., 24h+ based on domain)

Agent policy (small but effective)

For every side-effecting action:
1) Generate one idempotency_key per logical intent.
2) Reuse the same key for retries of that intent.
3) Never reuse the key for a different payload.
4) Log key + operation + outcome for auditability.

Fast rollout checklist

[ ] Tag each tool as read-only vs side-effecting.
[ ] Enforce idempotency_key for side-effecting calls.
[ ] Add a storage table/cache for key -> first response.
[ ] Add conflict response for key+payload mismatch.
[ ] Add dashboard metric: duplicate-prevented count.

Trade-offs

Costs

  1. Extra storage and lookup for idempotency records.
  2. Slightly more API complexity (key lifecycle, retention, conflict semantics).
  3. Domain decisions on retention windows and scope (per user? per operation?).

Benefits

  1. Safe retries under flaky network conditions.
  2. Fewer duplicate writes/charges/messages.
  3. Better incident debugging via deterministic replay behavior.
  4. Lower operational and financial blast radius for autonomous agents.

References

Final Take

If you only add one reliability guardrail to agent tool calls this month, make it idempotency keys.

Retries are unavoidable. Duplicate side effects are optional.

Changelog