Idempotency is a mathematical property: applying an operation multiple times produces the same result as applying it once. For APIs, idempotency is usually provided through an idempotency key: the server returns the same response to duplicate requests that carry the same key. This lets clients safely retry requests after network failures without risking duplicate side effects.
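A minimal illustration of the definition uses a hypothetical `Account` class. Setting a balance is idempotent, while depositing into it is not. Only the first operation tolerates a request that is delivered more than once.

```python
class Account:
    """Hypothetical account, used only to contrast the two kinds of operation."""

    def __init__(self) -> None:
        self.balance = 0

    def set_balance(self, amount: int) -> None:
        # Idempotent: applying it once or many times leaves the same state
        self.balance = amount

    def deposit(self, amount: int) -> None:
        # Not idempotent: every repeated call changes the state again
        self.balance += amount


a = Account()
for _ in range(3):  # simulate the same request delivered three times
    a.set_balance(100)
print(a.balance)  # 100

b = Account()
for _ in range(3):
    b.deposit(100)
print(b.balance)  # 300
```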
Why This Matters for AI APIs
AI inference requests are expensive and slow, and timeouts are common. Without idempotency, a timeout leaves your application unsure whether the model call was processed (and charged) or not. If you retry without an idempotency key, you may pay twice for a response the server already generated but you never received. If you do not retry, you may lose a response that was completed and billed.
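The failure mode above can be simulated in a few lines. This is a sketch under stated assumptions:

- `FakeModelServer`, `call_with_retries`, and `call_with_fresh_keys` are hypothetical names.
- The server is an in-process stand-in that bills once per model run and loses its first reply after finishing the work.
- The key is a random UUID generated once per logical request. This is a different way to get a stable key than the hash-based helper in the Implementation section.

A retry avoids a second charge only when it reuses the original key.

```python
import uuid


class FakeModelServer:
    """Hypothetical stand-in for a billed inference API that honours idempotency keys."""

    def __init__(self) -> None:
        self.charges = 0
        self.responses: dict[str, str] = {}
        self.drop_next_reply = True  # the first reply is lost after the work is done

    def infer(self, key: str, prompt: str) -> str:
        if key in self.responses:
            return self.responses[key]  # replay: no new model run, no new charge
        self.charges += 1  # the model ran, so this call is billed
        result = f"completion for {prompt!r}"
        self.responses[key] = result
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise TimeoutError("reply lost after the server finished")
        return result


def call_with_retries(server: FakeModelServer, prompt: str, attempts: int = 3) -> str:
    key = str(uuid.uuid4())  # created once per logical request, reused on every retry
    for attempt in range(attempts):
        try:
            return server.infer(key, prompt)
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the timeout
    raise ValueError("attempts must be at least 1")


def call_with_fresh_keys(server: FakeModelServer, prompt: str, attempts: int = 3) -> str:
    for attempt in range(attempts):
        try:
            return server.infer(str(uuid.uuid4()), prompt)  # anti-pattern: new key per try
        except TimeoutError:
            if attempt == attempts - 1:
                raise
    raise ValueError("attempts must be at least 1")


server = FakeModelServer()
text = call_with_retries(server, "hello")
print(text, server.charges)  # completion for 'hello' 1

naive = FakeModelServer()
call_with_fresh_keys(naive, "hello")
print(naive.charges)  # 2: the same work was billed twice
```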
Implementation
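```python
import hashlib
import json

# Client: generate a stable key for this logical request.
# The key depends only on the user and the canonicalized request body,
# so resending the same request always produces the same key.
def make_idempotency_key(user_id: str, request_body: dict) -> str:
    request_hash = hashlib.sha256(
        json.dumps(request_body, sort_keys=True).encode()
    ).hexdigest()
    payload = f"{user_id}:{request_hash}"
    return hashlib.sha256(payload.encode()).hexdigest()

# API server: cache and replay.
# `cache` is assumed to be an async key-value client exposing
# get(key) and set(key, value, ttl=seconds).
async def handle_request(cache, key: str, handler):
    cached = await cache.get(f"idempotency:{key}")
    if cached is not None:
        return cached  # Return exactly the same response
    response = await handler()
    await cache.set(f"idempotency:{key}", response, ttl=86400)  # 24 hours
    return response
```

Implementation Details That Matter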
- Store the full response, not just a success flag: the client expects the same response body on retry
- Use a 24-hour TTL on idempotency records: most retry storms resolve within minutes, and 24 hours also covers longer network partitions
- Scope keys to the authenticated user: a key sent by user B must never replay a response cached for user A
- Return 409 Conflict if the same key is reused with different request parameters, since this indicates a client bug; detecting it requires storing a fingerprint of the original parameters alongside the response
- Make the idempotency store durable: an in-memory cache that loses state on restart breaks the guarantee
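The guidelines above can be combined into one handler. This is a minimal sketch, not a production design. `InMemoryIdempotencyStore`, `Record`, `fingerprint`, and `handle_idempotent` are hypothetical names. The in-memory dict deliberately stands in for the durable store that the last point calls for. The handler does four things:

- It stores the full response together with a fingerprint of the request parameters.
- It scopes keys by user.
- It expires records after 24 hours.
- It returns 409 when a key is reused with different parameters.

```python
import asyncio
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional

TTL_SECONDS = 24 * 60 * 60  # 24-hour retention window


@dataclass
class Record:
    request_fingerprint: str  # hash of the parameters first sent with this key
    status: int
    body: Any  # the full response body, not just a success flag
    expires_at: float


class InMemoryIdempotencyStore:
    """Illustration only: a dict loses all records on restart, so it does NOT
    meet the durability requirement. Use a durable backend in practice."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def get(self, scoped_key: str) -> Optional[Record]:
        rec = self._records.get(scoped_key)
        if rec is not None and rec.expires_at < time.time():
            del self._records[scoped_key]  # expired: treat the key as unseen
            return None
        return rec

    def put(self, scoped_key: str, rec: Record) -> None:
        self._records[scoped_key] = rec


def fingerprint(params: dict) -> str:
    # Canonical JSON so equal parameter dicts always hash the same way
    return hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()


async def handle_idempotent(
    store: InMemoryIdempotencyStore,
    user_id: str,
    key: str,
    params: dict,
    handler: Callable[[dict], Awaitable[Any]],
) -> tuple[int, Any]:
    scoped_key = f"idempotency:{user_id}:{key}"  # same key, different user: no match
    fp = fingerprint(params)
    rec = store.get(scoped_key)
    if rec is not None:
        if rec.request_fingerprint != fp:
            return 409, {"error": "idempotency key reused with different parameters"}
        return rec.status, rec.body  # replay the stored response unchanged
    body = await handler(params)
    store.put(scoped_key, Record(fp, 200, body, time.time() + TTL_SECONDS))
    return 200, body


# Usage with a fake model call that counts how often it really runs
calls = 0


async def fake_model(params: dict) -> dict:
    global calls
    calls += 1
    return {"text": f"echo {params['prompt']}"}


async def demo():
    store = InMemoryIdempotencyStore()
    first = await handle_idempotent(store, "alice", "k1", {"prompt": "hi"}, fake_model)
    retry = await handle_idempotent(store, "alice", "k1", {"prompt": "hi"}, fake_model)
    clash = await handle_idempotent(store, "alice", "k1", {"prompt": "bye"}, fake_model)
    other = await handle_idempotent(store, "bob", "k1", {"prompt": "hi"}, fake_model)
    return first, retry, clash, other


first, retry, clash, other = asyncio.run(demo())
print(calls)  # 2: alice's retry was replayed, bob's request ran on its own
```

The sketch leaves one gap open. Two concurrent requests with the same key can both miss the store and both run the handler. Closing that gap usually means atomically writing an in-progress marker before calling the model.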
