Idempotency Key API Boundaries

Prerequisites you should know:

Key and Fingerprint

Key: a retry-unit identifier generated by the client. Typically a high-entropy opaque string such as a UUID.
Fingerprint: a hash of the operation's content (e.g., a hash of the payment amount plus product ID).

Combining both prevents bugs or attacks where different content arrives under the same Key.

Durable Replay

Even if the server crashes and restarts, a request with the same Key must return the same result. The server must persist the result of each request to disk or a database so that idempotency is guaranteed across restarts. In other words, the storage must be durable, not volatile.

Failure Release

A request must not stay stuck in "processing" forever. If the server goes down while handling a request, that Key may remain in a "processing" state indefinitely. Two mechanisms prevent this:

TTL (Time-To-Live): expire the "processing" state after a set amount of time.
Explicit failure handling: failures where it is certain the operation did not execute should allow retries via failed_retryable or eviction (evict). Failures where the operation may or may not have executed should be left as unknown and reconciled.

Introduction

An Idempotency Key is an API boundary pattern that lets a client safely retry the same request by allowing the server to determine "has this request already been processed?" HTTP itself defines idempotent methods as ones where sending the same request multiple times must produce the same intended effect on the server as sending it once. Under RFC 9110, PUT, DELETE, and safe methods are idempotent, but POST is typically used for resource creation or command execution, so without a separate mechanism it carries the risk of duplicate execution.¹ The Idempotency-Key request header, which makes POST/PATCH safe to retry, has been discussed by the IETF HTTPAPI WG as a Standards Track Internet-Draft (draft-ietf-httpapi-idempotency-key-header). Because this draft is a document that expires and is renewed, you should verify its current status and each API provider's actual implementation syntax before use.²

The typical problem arises when the network fails ambiguously (described in more precise terms as uncertain external I/O). Suppose a client sends a request to create a payment, the server actually creates the payment, but the connection drops before the response is sent. From the client's perspective, there is no way to know whether the operation succeeded or failed. If the client sends the same POST /payments again, the payment could be created twice. An idempotency key is a mechanism for narrowing this gray zone where the operation may or may not have succeeded.

Stripe's documentation explains that using an idempotency key allows you to repeat the same request even after a connection error, and that the server stores the result of the first request with a given key and returns that same result for subsequent requests with the same key. It also recommends using a value with sufficient entropy such as a UUID, and advises against including sensitive information like email addresses or personally identifiable information. It is also important to note that sending different parameters under the same key should be treated as misuse.³

The core of this pattern is not so much "preventing duplicate requests" as identifying the same unit of work and reusing the stored result when both the key and the fingerprint match. The fingerprint exists because a matching key alone must not be treated unconditionally as the same request. If a payment sent with amountCents=1000 and a payment sent with amountCents=9000 share the same key, that is not a retry but a client bug or a potential attack.

In other words, the same key with a different payload must never be replayed as the original operation; it is rejected as key misuse. The pattern can be summarized as semantically identifying the same operation and reusing its result.

Key formula to remember:

Text

Idempotency Key     = Identifier for the same logical unit of work being retried
Request Fingerprint = Semantic request hash used to prevent key misuse (canonical JSON + SHA-256)
Reserve -> Execute -> Complete | RetryableFail | Unknown
= Reserve execution ownership, then finalize the result into a replay/retry/reconcile-capable state
Fail(evict)         = Release the reservation only for failures confirmed as not executed (prevents zombie keys)
Unknown/Reconcile   = Do not evict failures that may have executed, such as gateway timeouts; send them to reconciliation
Replay              = Return the stored response for the same key + same fingerprint
Boundary Policy     = Separate domain-level duplication from infrastructure failure

1. The Problem

Suppose you are building a payment creation API.

Text

POST /payments
Idempotency-Key: 550e8400-e29b-41d4-a716-446655440000
{
  "customerId": "cus-1",
  "amountCents": 12000,
  "currency": "KRW"
}

The example above follows the Stripe-style opaque string convention (no quotes).

A note on header syntax: if you strictly follow the IETF draft (-07), Idempotency-Key is an RFC 8941 Structured Header String, so the value must be quoted (e.g., Idempotency-Key: "550e8400-e29b-41d4-a716-446655440000"). In contrast, Stripe and many other production APIs have historically used unquoted opaque strings. Implementations should therefore be explicit about whether they target IETF draft syntax or the de facto syntax of a specific provider.²⁴
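To make the two header syntaxes concrete, here is a minimal sketch of a parser that accepts both the quoted form and the bare opaque form. The function name `parse_idempotency_key` and the `accept_bare` switch are hypothetical, and the quoted pattern is a simplification of the RFC 8941 String grammar (it rejects backslash escapes instead of decoding them).

```python
import re

# Printable ASCII except DQUOTE and backslash: a simplified RFC 8941 String body.
_QUOTED = re.compile(r'^"([\x20-\x21\x23-\x5B\x5D-\x7E]*)"$')
# Assumption: bare keys are limited to URL-safe characters, as UUIDs are.
_BARE = re.compile(r"^[A-Za-z0-9._~-]+$")


def parse_idempotency_key(raw: str, *, accept_bare: bool = True) -> str:
    """Return the key carried by a raw header value, or raise ValueError."""
    value = raw.strip()
    quoted = _QUOTED.match(value)
    if quoted:
        key = quoted.group(1)          # IETF draft syntax: "key"
    elif accept_bare and _BARE.match(value):
        key = value                    # provider-style syntax: key
    else:
        raise ValueError("malformed Idempotency-Key header")
    if not key:
        raise ValueError("empty Idempotency-Key")
    return key
```

Passing `accept_bare=False` gives a strict, draft-only mode. Whichever mode is chosen, both forms of the same value should map to the same stored key so that a client switching syntax does not create a second reservation.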

When a client encounters a timeout and resends the same request, the server must determine one of the following.

Text

New key                                                 -> Execute payment creation
Completed with the same key + same fingerprint          -> Replay the stored response
Same key still in progress                              -> 409 Conflict or 202 Accepted + Retry-After
Same key with a different fingerprint                   -> Idempotency key misuse (422)
Previous execution confirmed to have no side effect     -> Allow retry
Previous execution result is unknown                    -> Disallow retry and reconcile

This card covers a single domain: protecting a payment creation POST request with an idempotency key.

2. Key Expressions

The shared example uses an in-memory idempotency store. In production, this store should be backed by durable storage such as a relational database (for example, PostgreSQL with a unique constraint and transactions), Redis, or DynamoDB. Here, the same state model is expressed in four languages purely for syntax mapping purposes, and it includes both concurrency control (locking) and failure release (fail).

Note: this in-memory example is a minimal model for syntax mapping and expresses only InProgress/Completed. The production state model (failed_retryable, unknown, expired) is covered in Section 7 on durable stores.

C#

using System;
using System.Collections.Generic;

public enum IdempotencyStatus { InProgress, Completed }

public enum IdempotencyDecisionKind { Execute, Replay, InProgress, KeyMisuse }

public sealed record ApiResponse(int StatusCode, string Body);

public sealed record IdempotencyDecision(
    IdempotencyDecisionKind Kind,
    ApiResponse? Response);

public sealed class IdempotencyEntry
{
    public IdempotencyEntry(string fingerprint)
    {
        this.Fingerprint = fingerprint;
        this.Status = IdempotencyStatus.InProgress;
    }

    public string Fingerprint { get; }
    public IdempotencyStatus Status { get; private set; }
    public ApiResponse? Response { get; private set; }

    public void Complete(ApiResponse response)
    {
        this.Response = response;
        this.Status = IdempotencyStatus.Completed;
    }
}

public sealed class InMemoryIdempotencyStore
{
    private readonly object gate = new();
    private readonly Dictionary<string, IdempotencyEntry> entries = new();

    public IdempotencyDecision Reserve(string key, string fingerprint)
    {
        lock (this.gate)
        {
            if (!this.entries.TryGetValue(key, out IdempotencyEntry? entry))
            {
                this.entries[key] = new IdempotencyEntry(fingerprint);
                return new IdempotencyDecision(IdempotencyDecisionKind.Execute, null);
            }
            if (entry.Fingerprint != fingerprint)
            {
                return new IdempotencyDecision(IdempotencyDecisionKind.KeyMisuse, null);
            }
            if (entry.Status == IdempotencyStatus.InProgress)
            {
                return new IdempotencyDecision(IdempotencyDecisionKind.InProgress, null);
            }
            return new IdempotencyDecision(IdempotencyDecisionKind.Replay, entry.Response);
        }
    }

    public void Complete(string key, ApiResponse response)
    {
        lock (this.gate)
        {
            this.entries[key].Complete(response);
        }
    }

    // Remove the reservation only for failures where non-execution is certain, allowing retry with the same key.
    // Failures where execution status is unclear are handled as unknown/reconcile in the operational model.
    public void Fail(string key)
    {
        lock (this.gate)
        {
            if (this.entries.TryGetValue(key, out IdempotencyEntry? entry)
                && entry.Status == IdempotencyStatus.InProgress)
            {
                this.entries.Remove(key);
            }
        }
    }
}

TypeScript

type ApiResponse = Readonly<{ statusCode: number; body: unknown }>;

type IdempotencyStatus = "inProgress" | "completed";

type IdempotencyDecision =
  | Readonly<{ kind: "execute" }>
  | Readonly<{ kind: "replay"; response: ApiResponse }>
  | Readonly<{ kind: "inProgress" }>
  | Readonly<{ kind: "keyMisuse" }>;

type IdempotencyEntry = {
  fingerprint: string;
  status: IdempotencyStatus;
  response?: ApiResponse;
};

export class InMemoryIdempotencyStore {
  readonly #entries = new Map<string, IdempotencyEntry>();

  public reserve(key: string, fingerprint: string): IdempotencyDecision {
    const entry = this.#entries.get(key);
    if (entry === undefined) {
      this.#entries.set(key, { fingerprint, status: "inProgress" });
      return { kind: "execute" };
    }
    if (entry.fingerprint !== fingerprint) {
      return { kind: "keyMisuse" };
    }
    if (entry.status === "inProgress") {
      return { kind: "inProgress" };
    }
    return { kind: "replay", response: entry.response! };
  }

  public complete(key: string, response: ApiResponse): void {
    const entry = this.#entries.get(key);
    if (entry === undefined) {
      throw new RangeError("This is an unreserved idempotency key.");
    }
    entry.status = "completed";
    entry.response = response;
  }

  public fail(key: string): void {
    const entry = this.#entries.get(key);
    if (entry !== undefined && entry.status === "inProgress") {
      this.#entries.delete(key);
    }
  }
}

(Note: in Node's single-process event loop, reserve is synchronous without await, so the check-then-set is atomic. However, in a multi-instance or scaled-out environment, this in-memory Map breaks immediately, which is why the durable store covered in Section 7 is necessary.)

Python⁵

import threading
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ApiResponse:
    status_code: int
    body: object


class IdempotencyStatus(Enum):
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


class IdempotencyDecisionKind(Enum):
    EXECUTE = "execute"
    REPLAY = "replay"
    IN_PROGRESS = "in_progress"
    KEY_MISUSE = "key_misuse"


@dataclass
class IdempotencyEntry:
    fingerprint: str
    status: IdempotencyStatus
    response: ApiResponse | None = None


@dataclass(frozen=True)
class IdempotencyDecision:
    kind: IdempotencyDecisionKind
    response: ApiResponse | None = None


class InMemoryIdempotencyStore:
    def __init__(self) -> None:
        self._entries: dict[str, IdempotencyEntry] = {}
        # The GIL only guarantees single-bytecode atomicity. Compound operations like get -> if -> set are
        # not atomic as check-then-act sequences, so protect the critical section with a Lock.
        self._lock = threading.Lock()

    def reserve(self, key: str, fingerprint: str) -> IdempotencyDecision:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                self._entries[key] = IdempotencyEntry(
                    fingerprint=fingerprint,
                    status=IdempotencyStatus.IN_PROGRESS,
                )
                return IdempotencyDecision(kind=IdempotencyDecisionKind.EXECUTE)
            if entry.fingerprint != fingerprint:
                return IdempotencyDecision(kind=IdempotencyDecisionKind.KEY_MISUSE)
            if entry.status == IdempotencyStatus.IN_PROGRESS:
                return IdempotencyDecision(kind=IdempotencyDecisionKind.IN_PROGRESS)
            return IdempotencyDecision(
                kind=IdempotencyDecisionKind.REPLAY,
                response=entry.response,
            )

    def complete(self, key: str, response: ApiResponse) -> None:
        with self._lock:
            entry = self._entries[key]
            entry.status = IdempotencyStatus.COMPLETED
            entry.response = response

    def fail(self, key: str) -> None:
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None and entry.status == IdempotencyStatus.IN_PROGRESS:
                del self._entries[key]

Rust

use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ApiResponse {
    pub status_code: u16,
    pub body: String,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IdempotencyStatus { InProgress, Completed }

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IdempotencyDecision {
    Execute,
    Replay { response: ApiResponse },
    InProgress,
    KeyMisuse,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct IdempotencyEntry {
    fingerprint: String,
    status: IdempotencyStatus,
    response: Option<ApiResponse>,
}

#[derive(Debug, Default)]
pub struct InMemoryIdempotencyStore {
    entries: Mutex<HashMap<String, IdempotencyEntry>>,
}

impl InMemoryIdempotencyStore {
    // The lock/unwrap handling is simplified here for learning purposes. In production code, avoid
    // calling `panic` (via `expect`) on the request path; handle lock poisoning and missing responses as `Result` types instead.
    pub fn reserve(&self, key: &str, fingerprint: &str) -> IdempotencyDecision {
        let mut entries = self.entries.lock().expect("mutex poisoned");
        match entries.get(key) {
            None => {
                entries.insert(
                    key.to_owned(),
                    IdempotencyEntry {
                        fingerprint: fingerprint.to_owned(),
                        status: IdempotencyStatus::InProgress,
                        response: None,
                    },
                );
                IdempotencyDecision::Execute
            }
            Some(entry) if entry.fingerprint != fingerprint => IdempotencyDecision::KeyMisuse,
            Some(entry) if entry.status == IdempotencyStatus::InProgress => {
                IdempotencyDecision::InProgress
            }
            Some(entry) => IdempotencyDecision::Replay {
                response: entry.response.clone().expect("completed response"),
            },
        }
    }

    pub fn complete(&self, key: &str, response: ApiResponse) {
        let mut entries = self.entries.lock().expect("mutex poisoned");
        let entry = entries.get_mut(key).expect("reserved key");
        entry.status = IdempotencyStatus::Completed;
        entry.response = Some(response);
    }

    pub fn fail(&self, key: &str) {
        let mut entries = self.entries.lock().expect("mutex poisoned");
        // The check is completed first so that the immutable borrow from `get` and the mutable borrow from `remove` do not overlap.
        let should_remove = entries
            .get(key)
            .is_some_and(|entry| entry.status == IdempotencyStatus::InProgress);
        if should_remove {
            entries.remove(key);
        }
    }
}

3. Caller

The caller validates the HTTP request, constructs a request fingerprint, and queries the idempotency store for a reservation. The actual payment creation I/O is performed by PaymentGateway, while the controller handles the response policy.

The fingerprint is generated not by a simple join but by serializing to canonical JSON (with sorted keys, etc.) and then applying SHA-256. A naive delimiter-based join can produce collisions: if a field value itself contains the delimiter (for example, |), two distinct requests can serialize to the same string and therefore share the same fingerprint.⁶
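The collision is easy to reproduce. The sketch below compares a delimiter join with a sorted-key JSON serialization. The function names are hypothetical, and `json.dumps(..., sort_keys=True)` is used as a stand-in for canonical JSON: it sorts nested keys but is not a full RFC 8785 (JCS) implementation (number formatting, for instance, may differ).

```python
import hashlib
import json


def naive_fingerprint(customer_id: str, currency: str) -> str:
    # Delimiter join: the boundary between the two fields is ambiguous.
    joined = f"{customer_id}|{currency}"
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def canonical_fingerprint(payload: dict) -> str:
    # sort_keys sorts keys at every nesting level; compact separators
    # remove whitespace variance between serializers.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two different requests, one naive fingerprint: both join to "a|b|c".
collides = naive_fingerprint("a|b", "c") == naive_fingerprint("a", "b|c")

# The structured serialization keeps field boundaries, so these differ.
distinct = canonical_fingerprint(
    {"customerId": "a|b", "currency": "c"}
) != canonical_fingerprint({"customerId": "a", "currency": "b|c"})
```

Because field boundaries are encoded by JSON quoting rather than by a separator character, no field value can shift content into a neighboring field.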

TypeScript

import { createHash } from "node:crypto";

type CreatePaymentRequest = Readonly<{
  idempotencyKey: string;
  customerId: string;
  amountCents: number;
  currency: "KRW" | "USD";
}>;

type PaymentGateway = Readonly<{
  createPaymentAsync: (request: CreatePaymentRequest) => Promise<Readonly<{ paymentId: string }>>;
}>;

// Minimal canonical serialization: sort only top-level keys. This is a minimal example for flat request DTOs only.
// Note: if key ordering within nested objects is unstable, a semantically identical retry may produce a fingerprint mismatch and be
// falsely flagged as 422 (key misuse). For nested objects, objects inside arrays, number
// normalization, Unicode escaping, and I-JSON constraints, either use a recursively sorting RFC 8785
// (JCS) implementation, or design your payload as a flat DTO to physically eliminate serialization complexity.
function canonicalJson(value: Record<string, unknown>): string {
  const sorted = Object.keys(value)
    .sort()
    .reduce<Record<string, unknown>>((acc, k) => {
      acc[k] = value[k];
      return acc;
    }, {});
  return JSON.stringify(sorted);
}

function createPaymentFingerprint(request: CreatePaymentRequest): string {
  const canonical = canonicalJson({
    customerId: request.customerId,
    amountCents: request.amountCents,
    currency: request.currency,
  });
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

function validateCreatePaymentRequest(request: CreatePaymentRequest): void {
  // Length is merely a heuristic for filtering out "obviously bad keys," not an entropy validation.
  if (request.idempotencyKey.trim().length < 16) {
    throw new RangeError("idempotencyKey is too short.");
  }
  if (request.customerId.trim().length === 0) {
    throw new RangeError("customerId cannot be empty.");
  }
  if (!Number.isInteger(request.amountCents) || request.amountCents <= 0) {
    throw new RangeError("amountCents must be an integer greater than or equal to 1.");
  }
}

// NOTE: This function is an example policy hook; the default value is false (pessimistic policy).
// Network errors are all treated as 'Unknown (execution status indeterminate)' by default.
// In a real implementation, base the decision on the HTTP client/error type, the transmission stage, and the gateway contract,
// and return `true` only when it can be proven that "the request never reached the downstream" (e.g., a local exception raised before the external call).
// If a packet has already gone out, it must unconditionally be a reconciliation target.
function isDefinitelyNotExecuted(error: unknown): boolean {
  void error;
  return false;
}

export async function createPaymentApiAsync(
  request: CreatePaymentRequest,
  store: InMemoryIdempotencyStore,
  gateway: PaymentGateway,
): Promise<ApiResponse> {
  validateCreatePaymentRequest(request);
  const fingerprint = createPaymentFingerprint(request);

  let decision: IdempotencyDecision;
  try {
    decision = store.reserve(request.idempotencyKey, fingerprint);
  } catch {
    // On store failure, domains where duplication is costly (such as payments) should fail closed (reject the request).
    return { statusCode: 503, body: { code: "idempotency_store_unavailable" } };
  }

  if (decision.kind === "replay") {
    return decision.response;
  }
  if (decision.kind === "inProgress") {
    return { statusCode: 409, body: { code: "idempotency_key_in_progress" } };
  }
  if (decision.kind === "keyMisuse") {
    return { statusCode: 422, body: { code: "idempotency_key_reused_with_different_payload" } };
  }

  try {
    const payment = await gateway.createPaymentAsync(request);
    const response: ApiResponse = {
      statusCode: 201,
      body: { paymentId: payment.paymentId, status: "created" },
    };
    store.complete(request.idempotencyKey, response);
    return response;
  } catch (gatewayError) {
    // The failure policy branches on whether execution has occurred.
    //  - Failures where it is certain that execution did not occur (e.g., failure before the gateway call) -> release via `store.fail`
    //    (allow re-execution). Prevents zombie keys.
    //  - Failures where execution may have occurred (gateway timeout, connection drop, etc.) -> do not evict. Leave the reservation
    //    as in-progress so it becomes a reconciliation target, or pass the same idempotency
    //    key to the downstream payment gateway to prevent duplicate charges.
    if (isDefinitelyNotExecuted(gatewayError)) {
      store.fail(request.idempotencyKey);
    }
    throw gatewayError; // Map to 5xx at the upper layer.
  }
}

Fail(evict) is only safe for failures where it is certain the operation was never executed. Failures where execution may have occurred should be treated as unknown/pending/reconcile targets, not evicted. The isDefinitelyNotExecuted above is a placeholder predicate that determines whether the gateway failed before the request was even sent (for example, failure before the connection was established, or a client-side error before transmission). In practice, a policy like Stripe's is common: once execution has started, even a failure response such as a 500 is stored and returned as-is to subsequent retries with the same key.³
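One way to keep this policy honest is to make "definitely not executed" an explicit, narrow type rather than a guess based on error messages. The sketch below assumes a hypothetical `RequestNotSentError` that a gateway client wrapper would raise only before any bytes leave the process; everything else stays pessimistic.

```python
from enum import Enum


class FailureDisposition(Enum):
    RELEASE = "release"   # evict / failed_retryable: the same key may execute again
    UNKNOWN = "unknown"   # keep the reservation and hand it to reconciliation


class RequestNotSentError(Exception):
    """Hypothetical: raised by the client wrapper before transmission starts."""


def classify_failure(error: Exception) -> FailureDisposition:
    # Only a failure that provably happened before transmission releases the key.
    if isinstance(error, RequestNotSentError):
        return FailureDisposition.RELEASE
    # Timeouts, connection resets, and anything unrecognized may have executed.
    return FailureDisposition.UNKNOWN
```

The default branch is the important one: a new or unexpected error type falls into `UNKNOWN`, so forgetting to classify an error can delay a retry but cannot cause a duplicate charge.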

Separation of concerns:

Text

validateCreatePaymentRequest  = Validate external input
createPaymentFingerprint      = Build a semantic fingerprint with canonical JSON + SHA-256
IdempotencyStore.reserve      = Claim the key / decide replay / misuse / in-progress
PaymentGateway                = Actual payment creation I/O
IdempotencyStore.complete/fail= Store the execution result / release the reservation on failure
createPaymentApiAsync         = Assemble the API response policy

The key point is that the idempotency store has no knowledge of the payment domain. The store knows only the key, the fingerprint, and the response. Payment creation, database persistence, and external payment gateway calls are all handled by the layers above.

Crash window (caution): store.fail reduces zombie keys, but the window where "the process dies immediately after the gateway succeeds, before complete or fail is called" cannot be fully closed with an in-memory approach. To close this window, you must either (a) wrap the domain write and idempotency completion in the same database transaction as described in Section 7, or (b) propagate the idempotency key to the payment gateway so that re-execution is deduplicated downstream.
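Option (a) can be sketched with SQLite standing in for the production database. The table names `payments` and `idempotency` and the `crash_before_commit` flag are hypothetical, and an exception is used to imitate a crash: in both cases the uncommitted transaction is discarded. Note that this covers a local domain write only; an external gateway call cannot take part in a database transaction, which is what option (b) addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (payment_id TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL);
CREATE TABLE idempotency (key TEXT PRIMARY KEY, status TEXT NOT NULL, response_body TEXT);
INSERT INTO idempotency (key, status) VALUES ('k1', 'in_progress');
""")


def complete_atomically(conn, key, payment_id, amount_cents, crash_before_commit=False):
    # "with conn" commits if the block finishes and rolls back if an exception escapes,
    # so the payment row and the completed status become visible together or not at all.
    with conn:
        conn.execute("INSERT INTO payments VALUES (?, ?)", (payment_id, amount_cents))
        if crash_before_commit:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "UPDATE idempotency SET status = 'completed', response_body = ? WHERE key = ?",
            (payment_id, key),
        )


def snapshot(conn):
    payments = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
    status = conn.execute("SELECT status FROM idempotency WHERE key = 'k1'").fetchone()[0]
    return payments, status
```

After a simulated crash, `snapshot(conn)` still reports no payment and an `in_progress` key, so there is never a state in which the payment exists but the key does not know about it.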

4. Reading Order

Text

Is an `Idempotency-Key` present?
-> Is the key sufficiently unpredictable? (for example, generated by the client as a UUID)
-> Can the payload fingerprint be computed deterministically? (canonical representation + hash)
-> Is this a new key?
-> Is this the same key with a different fingerprint?
-> Has this key already completed?
-> Is this key still in progress?
-> Was the previous execution confirmed to have no side effect? (if the result is unknown, do not retry)
-> Is the actual I/O executed only once?
-> Is the reservation released on eligible failure?
-> Is the result stored in a replayable form?

Idempotency code must not stop at "is this the same key?"; it must go all the way to "is this the same key AND does it represent the same intended request?"

5. Boundaries and Misconceptions

An Idempotency Key is not authentication. The fact that an attacker can generate an arbitrary key must not grant them any permissions. Keys must always be stored together with a scope such as tenant, account, user, or API route. Relying on a single global key alone can lead to key collisions between different users or information leakage.
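A minimal sketch of scoping, assuming a hypothetical `ScopedKey` type: the lookup key is a structured tuple of tenant, route, and client key rather than the raw client key. The tenant must come from the authenticated context, never from the request body.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedKey:
    tenant_id: str        # taken from the authenticated session, not from client input
    route: str            # for example "POST /payments"
    idempotency_key: str  # the opaque value supplied by the client


# Frozen dataclasses are hashable, so they can be used directly as dictionary keys.
store: dict[ScopedKey, str] = {}


def remember(tenant_id: str, route: str, key: str, response: str) -> None:
    store[ScopedKey(tenant_id, route, key)] = response


def lookup(tenant_id: str, route: str, key: str):
    return store.get(ScopedKey(tenant_id, route, key))
```

With this shape, a second tenant that sends the same raw key value gets `None` from `lookup` and can never replay the first tenant's stored response. Using a structured key rather than a joined string also avoids the delimiter ambiguity that affects naive fingerprints.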

An Idempotency Key is also distinct from deduplication. Deduplication is general-purpose processing that merges multiple occurrences of the same data. An Idempotency Key is an API contract by which the client explicitly declares, "this request is the same unit of work as a previous attempt." It is dangerous for the server to look only at the payload and guess that two requests are probably the same.

Domain-level predictable rejections and infrastructure failures must be kept separate. A conflict where the same key is already in-flight is a domain/API boundary collision. Receiving a different payload for the same key is client misuse. By contrast, a DB transaction failure, a Redis outage, or a payment gateway timeout are infrastructure failures. Collapsing all of these into 500 or all into 409 breaks retry policies.
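The separation can be written down as a small policy table so that each outcome keeps its own status code and client guidance. This is a sketch: the `Outcome` names and the `clientAction` hints are hypothetical, 409, 422, and 503 follow the handler shown earlier, and 504 for an indeterminate gateway call is an assumption of this example.

```python
from enum import Enum


class Outcome(Enum):
    KEY_IN_PROGRESS = "key_in_progress"      # API boundary collision
    KEY_MISUSE = "key_misuse"                # client misuse
    STORE_UNAVAILABLE = "store_unavailable"  # infrastructure failure before execution
    GATEWAY_TIMEOUT = "gateway_timeout"      # infrastructure failure, execution unknown


# Outcome -> (HTTP status, error code, hint for the client).
RESPONSE_POLICY = {
    Outcome.KEY_IN_PROGRESS: (409, "idempotency_key_in_progress", "retry_later_same_key"),
    Outcome.KEY_MISUSE: (422, "idempotency_key_reused_with_different_payload", "fix_request"),
    Outcome.STORE_UNAVAILABLE: (503, "idempotency_store_unavailable", "retry_later_same_key"),
    Outcome.GATEWAY_TIMEOUT: (504, "payment_outcome_unknown", "await_reconciliation"),
}


def to_response(outcome: Outcome) -> dict:
    status, code, hint = RESPONSE_POLICY[outcome]
    return {"statusCode": status, "body": {"code": code, "clientAction": hint}}
```

Keeping the four rows distinct is the point: a client can back off on 409 and 503, stop and fix its request on 422, and avoid blind retries when the outcome is unknown.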

Another common misconception is that "idempotent means the response must always be identical." The HTTP definition of idempotency concerns whether the intended effect on the server is the same; RFC 9110 notes that the response itself may differ.¹ That said, in POST APIs based on an Idempotency Key, it is common practice to store the first response and return that same response on retries with the same key, for operational convenience.

Production failure modes:

Text

Comparing only the key without comparing the payload fingerprint
Building the fingerprint with a simple join, leaving it vulnerable to delimiter injection and collisions
Including sensitive information such as email addresses, phone numbers, or social security numbers in the idempotency key
Storing the key as a global key without tenant/account scope
The reserve and domain write are not atomically coupled, resulting in duplicate execution
Failing to release the reservation on execution failure, leaving the zombie key perpetually in-progress
Conversely, unconditionally evicting failures with uncertain outcomes (such as gateway timeouts), which triggers duplicate charges
Not storing the status, body, and schema version required for response replay
No API contract specifying whether it is acceptable to return 200 instead of 201 on a retry with the same key
Ignoring idempotency store failures and proceeding directly to payment creation (a fail-closed violation)

6. Incorrect Example

TypeScript

const processedKeys = new Set<string>();

async function createPaymentBadAsync(
  request: CreatePaymentRequest,
  gateway: PaymentGateway,
): Promise<ApiResponse> {
  if (processedKeys.has(request.idempotencyKey)) {
    return { statusCode: 200, body: { status: "already_processed" } };
  }
  const payment = await gateway.createPaymentAsync(request);
  processedKeys.add(request.idempotencyKey);
  return { statusCode: 201, body: { paymentId: payment.paymentId } };
}

Why it's wrong:

Text

Because the key is recorded after creating the payment, if the process dies immediately after a successful gateway call, a duplicate payment can occur on retry.
Without comparing the payload fingerprint, a different payment request sent with the same key will not be blocked.
Instead of replaying the stored original response, it fabricates a fake `already_processed` response.
There is no tenant/account scope.
Because the in-progress state cannot be represented, two concurrent requests can both reach the gateway.
Because storage is in process memory only, the implementation breaks in server restarts, scale-out, and multi-instance environments.

This code looks like duplicate prevention, but it is not an idempotency boundary. A production API requires key reservation before execution, fingerprint validation, atomic storage, preservation of a replayable response, and Failure Release.

7. Production Scaling

In production, replace the in-memory store with a durable store, one whose data survives a process restart. The essential design is a unique constraint on scope + key, with the status, fingerprint, and response stored in the same row.

SQL

CREATE TABLE api_idempotency_keys (
    scope_id             TEXT NOT NULL,
    idempotency_key      TEXT NOT NULL,
    request_fingerprint  TEXT NOT NULL,
    status               TEXT NOT NULL CHECK (status IN
                           ('in_progress','completed','failed_retryable','unknown','expired')),
    response_status_code INTEGER NULL,
    response_body        JSONB NULL,
    response_schema_version INTEGER NULL,
    last_error_code      TEXT NULL,
    locked_until         TIMESTAMPTZ NULL,   -- Execution lease expiration for in-progress zombie reclamation
    reconcile_after      TIMESTAMPTZ NULL,   -- Scheduled time for reconciling unknown-state records
    created_at           TIMESTAMPTZ NOT NULL,
    completed_at         TIMESTAMPTZ NULL,
    expires_at           TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (scope_id, idempotency_key)
);

Transaction Abort Pitfall and the Correct Flow

Avoid the common design of "INSERT, catch the unique violation error, then SELECT within the same transaction." In PostgreSQL, any error inside a transaction block, including a unique constraint violation, puts the transaction into an aborted state, and every later statement in it (including the SELECT) is rejected with "current transaction is aborted, commands ignored until end of transaction block" until you roll back, either the whole transaction or to a savepoint taken before the INSERT. Other databases behave differently on this point, so verify the behavior of the one you use. For this reason, use an atomic insert that does not raise an error on conflict, so that the attempt to acquire execution rights never aborts the transaction and the follow-up read stays valid.⁷

SQL

-- Attempt to acquire execution rights. `ON CONFLICT DO NOTHING` does not throw an error, so the transaction is not aborted.
INSERT INTO api_idempotency_keys
  (scope_id, idempotency_key, request_fingerprint, status, locked_until, created_at, expires_at)
VALUES ($1, $2, $3, 'in_progress', now() + interval '5 minutes', now(), now() + interval '48 hours')
ON CONFLICT (scope_id, idempotency_key) DO NOTHING
RETURNING scope_id;
-- If `RETURNING` yields a row, this request has acquired execution rights.
-- 0 rows (conflict): already exists. Follow up with a separate `SELECT` to retrieve the status, fingerprint, and response.
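The same reserve step can be tried locally with SQLite standing in for PostgreSQL. This is a sketch under assumptions: the table is trimmed to four columns, `try_reserve` is a hypothetical helper, and `cursor.rowcount` replaces `RETURNING` so the example also runs on SQLite builds that lack `RETURNING`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE api_idempotency_keys (
    scope_id            TEXT NOT NULL,
    idempotency_key     TEXT NOT NULL,
    request_fingerprint TEXT NOT NULL,
    status              TEXT NOT NULL,
    PRIMARY KEY (scope_id, idempotency_key)
)""")


def try_reserve(conn, scope_id, key, fingerprint):
    """Return (acquired, existing_row); existing_row is None when acquired."""
    with conn:  # commits on normal exit, rolls back on exception
        cur = conn.execute(
            "INSERT INTO api_idempotency_keys "
            "(scope_id, idempotency_key, request_fingerprint, status) "
            "VALUES (?, ?, ?, 'in_progress') "
            "ON CONFLICT (scope_id, idempotency_key) DO NOTHING",
            (scope_id, key, fingerprint),
        )
        if cur.rowcount == 1:
            return True, None  # this request owns execution
        # Conflict: no error was raised, so the follow-up read is still allowed.
        row = conn.execute(
            "SELECT request_fingerprint, status FROM api_idempotency_keys "
            "WHERE scope_id = ? AND idempotency_key = ?",
            (scope_id, key),
        ).fetchone()
        return False, row
```

The caller compares the returned fingerprint with its own to distinguish a legitimate retry from key misuse, and only then looks at the status.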

Recommended processing flow:

Text

1. Start a transaction.
2. Run `INSERT ... ON CONFLICT DO NOTHING RETURNING`.
3. If `RETURNING` yields a row -> execution ownership acquired.
4. If it returns 0 rows -> run a separate `SELECT` to read the existing row. The transaction has not been aborted.
5. Fingerprint mismatch -> `422` misuse.
6. `status = in_progress AND locked_until > now()` -> `409 Conflict` or `202 Accepted + Retry-After`.
   `status = in_progress AND locked_until <= now()` -> atomically transition to `unknown`, then reconcile.
7. `status = completed` -> replay the stored response.
8. `status = failed_retryable` -> allow retry. The previous attempt is confirmed to have had no side effect.
   `status = unknown` -> disallow retry and reconcile. The side-effect status is unclear.
9. Only the request that acquired execution ownership may perform the domain write.
10. Commit the domain write and the idempotency row completion in the same transaction.
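The branching in steps 3 through 8 can be isolated into a pure function, which makes the policy easy to unit test without a database. The sketch below is illustrative: `Row`, `decide`, and the returned action names are hypothetical, `row=None` stands for "the INSERT acquired ownership", and a missing `locked_until` is treated as an expired lease (an assumption of this example). The `expired` status is not modeled and falls through to the pessimistic default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Row:
    fingerprint: str
    status: str  # in_progress | completed | failed_retryable | unknown
    locked_until: Optional[datetime] = None


def decide(row: Optional[Row], fingerprint: str, now: datetime) -> str:
    if row is None:
        return "execute"                      # step 3: ownership acquired
    if row.fingerprint != fingerprint:
        return "reject_422"                   # step 5: key misuse
    if row.status == "in_progress":
        if row.locked_until is not None and row.locked_until > now:
            return "in_progress_409"          # step 6: lease still valid
        return "mark_unknown_and_reconcile"   # step 6: lease expired
    if row.status == "completed":
        return "replay"                       # step 7
    if row.status == "failed_retryable":
        return "reclaim_then_execute"         # step 8: needs the atomic UPDATE first
    return "reconcile_no_retry"               # unknown, or any unrecognized status


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
live = Row("fp", "in_progress", now + timedelta(minutes=1))
stale = Row("fp", "in_progress", now - timedelta(seconds=1))
```

Here `decide(live, "fp", now)` yields `in_progress_409` and `decide(stale, "fp", now)` yields `mark_unknown_and_reconcile`. The fingerprint check runs before any status check, so a mismatched payload is rejected even when the stored row is already completed.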

`failed_retryable` re-execution rights and `locked_until` reclamation

Because the failed_retryable row already exists, INSERT ... ON CONFLICT DO NOTHING will not acquire execution rights. Only the request that atomically transitions the status back to in_progress obtains the right to re-execute.

SQL

UPDATE api_idempotency_keys
SET status = 'in_progress',
    locked_until = now() + interval '5 minutes',
    last_error_code = NULL
WHERE scope_id = $1
  AND idempotency_key = $2
  AND request_fingerprint = $3
  AND status = 'failed_retryable'
RETURNING scope_id;
-- RETURNING row exists -> this request acquires the right to re-execute
-- 0 rows -> another request already claimed it in the meantime, or the state has changed
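To make the "only one request wins" property concrete, here is a minimal sketch that runs the same conditional UPDATE against an in-memory SQLite database. SQLite stands in for PostgreSQL purely so the example is self-contained, `cursor.rowcount` stands in for `RETURNING`, and `try_reclaim` and `make_demo_db` are hypothetical helper names; timestamps are epoch seconds.

```python
import sqlite3

RECLAIM_SQL = """
UPDATE api_idempotency_keys
SET status = 'in_progress', locked_until = ?, last_error_code = NULL
WHERE scope_id = ? AND idempotency_key = ? AND request_fingerprint = ?
  AND status = 'failed_retryable'
"""


def try_reclaim(conn, scope_id, key, fingerprint, now, lock_seconds=300):
    """Return True only for the caller whose UPDATE actually flipped the row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            RECLAIM_SQL, (now + lock_seconds, scope_id, key, fingerprint)
        )
        # One modified row means this caller moved failed_retryable -> in_progress.
        return cur.rowcount == 1


def make_demo_db():
    """Build a throwaway table holding a single failed_retryable row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE api_idempotency_keys (
            scope_id TEXT NOT NULL,
            idempotency_key TEXT NOT NULL,
            request_fingerprint TEXT NOT NULL,
            status TEXT NOT NULL,
            locked_until REAL,
            last_error_code TEXT,
            PRIMARY KEY (scope_id, idempotency_key)
        )
        """
    )
    conn.execute(
        "INSERT INTO api_idempotency_keys "
        "VALUES (?, ?, ?, 'failed_retryable', NULL, 'timeout_before_send')",
        ("tenant-1", "key-1", "fp-1"),
    )
    conn.commit()
    return conn
```

Calling `try_reclaim` twice with the same arguments succeeds the first time and fails the second, because the `status = 'failed_retryable'` predicate no longer matches once the first caller has flipped the row.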

locked_until serves as a signal for handling in-progress zombies. However, in payment flows it is dangerous to immediately re-execute just because a lock has expired, since the process may have died while the external payment gateway request succeeded. Therefore the safe default is not to re-execute but to transition to unknown.

`in_progress` and `locked_until > now()`
-> `409 Conflict` or `202 Accepted + Retry-After`

`in_progress` and `locked_until <= now()`
-> Do not retry. Transition to `unknown` and reconcile.

In other words, locked_until is not justification to "re-execute safely"; it is a signal that "this request is not being processed normally and must be reconciled." Transitioning an expired in_progress row to unknown is also done via an atomic UPDATE.

SQL

UPDATE api_idempotency_keys
SET status = 'unknown',
    reconcile_after = now(),
    last_error_code = 'in_progress_lock_expired'
WHERE scope_id = $1
  AND idempotency_key = $2
  AND request_fingerprint = $3
  AND status = 'in_progress'
  AND locked_until <= now()
RETURNING scope_id;

The above is for the request path (a specific key + fingerprint). A batch job or daemon that scans expired in_progress rows may not know the fingerprint, so it uses a reduced set of conditions.

SQL

UPDATE api_idempotency_keys
SET status = 'unknown',
    reconcile_after = now(),
    last_error_code = 'in_progress_lock_expired'
WHERE status = 'in_progress'
  AND locked_until <= now()
RETURNING scope_id, idempotency_key;
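A sweeper built around that batch statement might look like the following sketch. It again uses an in-memory SQLite database as a stand-in for PostgreSQL, with epoch-second timestamps passed in explicitly; `sweep_expired_locks` and `make_demo_db` are hypothetical names, and scheduling, batching limits, and metrics are left out.

```python
import sqlite3

SWEEP_SQL = """
UPDATE api_idempotency_keys
SET status = 'unknown',
    reconcile_after = :now,
    last_error_code = 'in_progress_lock_expired'
WHERE status = 'in_progress'
  AND locked_until <= :now
"""


def sweep_expired_locks(conn, now):
    """Move every expired in_progress row to unknown; return how many moved."""
    with conn:  # one transaction per sweep
        return conn.execute(SWEEP_SQL, {"now": now}).rowcount


def make_demo_db():
    """Three rows: an expired lock, a live lock, and a completed request."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE api_idempotency_keys (
            scope_id TEXT NOT NULL,
            idempotency_key TEXT NOT NULL,
            status TEXT NOT NULL,
            locked_until REAL,
            reconcile_after REAL,
            last_error_code TEXT,
            PRIMARY KEY (scope_id, idempotency_key)
        )
        """
    )
    conn.executemany(
        "INSERT INTO api_idempotency_keys VALUES (?, ?, ?, ?, NULL, NULL)",
        [
            ("tenant-1", "expired", "in_progress", 900.0),
            ("tenant-1", "live", "in_progress", 1100.0),
            ("tenant-1", "done", "completed", 800.0),
        ],
    )
    conn.commit()
    return conn
```

Because the predicate re-checks `status = 'in_progress'`, running the sweep twice is harmless: the second pass finds nothing left to transition, and rows that completed in the meantime are untouched.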

State transition summary:

- failed_retryable : Only requests that transition to `in_progress` via an atomic UPDATE hold execution rights.
- unknown : No client retry can acquire execution rights; only the reconciliation pipeline transitions the record to `completed` or `failed_retryable`.
- expired in_progress : Do not auto-retry; transition to `unknown` and reconcile.

In domains where duplicate execution is costly, such as payments, a gap between the domain write and idempotency complete steps is dangerous. Wherever possible, bundle the internal domain write and idempotency completion into the same DB transaction. That said, "same transaction" is not the answer in every situation. Do not hold an external payment gateway (PG) call inside a long-running DB transaction; a long transaction combined with network I/O increases lock hold time and makes failure recovery harder. Augment external I/O with a PG-side idempotency key, an outbox, a payment_intent state machine, and a reconciliation job. This is how you actually close the "crash window" described in Section 3.
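For the purely internal case, bundling the domain write and the idempotency completion could look like the sketch below. It assumes a hypothetical `payments` table and a `response_body` column, uses in-memory SQLite in place of PostgreSQL, and deliberately contains no external gateway call, in line with the advice above.

```python
import sqlite3


def complete_payment(conn, scope_id, key, payment_id, amount, response_body):
    """Write the domain row and mark the key completed in one transaction."""
    with conn:  # commit both writes together, or roll both back on error
        conn.execute(
            "INSERT INTO payments (payment_id, amount) VALUES (?, ?)",
            (payment_id, amount),
        )
        cur = conn.execute(
            "UPDATE api_idempotency_keys "
            "SET status = 'completed', response_body = ? "
            "WHERE scope_id = ? AND idempotency_key = ? AND status = 'in_progress'",
            (response_body, scope_id, key),
        )
        if cur.rowcount != 1:
            # The row is no longer in_progress (for example, a sweeper moved it
            # to unknown), so the domain write must not survive either.
            raise RuntimeError("execution ownership lost; rolling back domain write")


def make_demo_db():
    """One key still in_progress, one already moved to unknown."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (payment_id TEXT PRIMARY KEY, amount INTEGER)")
    conn.execute(
        """
        CREATE TABLE api_idempotency_keys (
            scope_id TEXT NOT NULL,
            idempotency_key TEXT NOT NULL,
            status TEXT NOT NULL,
            response_body TEXT,
            PRIMARY KEY (scope_id, idempotency_key)
        )
        """
    )
    conn.executemany(
        "INSERT INTO api_idempotency_keys VALUES (?, ?, ?, NULL)",
        [("tenant-1", "key-1", "in_progress"), ("tenant-1", "key-2", "unknown")],
    )
    conn.commit()
    return conn
```

The point of the `rowcount` check is that a request which has lost ownership cannot leave a payment row behind: the exception aborts the transaction, so either both rows change or neither does.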

Resolving the `unknown` state: the reconciliation pipeline

Requests stuck in the unknown state (where execution outcome is indeterminate) must have their final status determined by a dedicated asynchronous pipeline. An unknown that is never observed or resolved is simply a hidden failure state. The resolution typically combines three approaches.

PG webhook ingestion (authoritative source): the success/failure webhooks sent by the payment gateway are the most reliable source of final status. Look up the corresponding row using the webhook's idempotency key or payment ID, and finalize it as completed or failed_retryable.
Periodic polling daemon (backup): when a webhook is absent or delayed, enqueue unknown rows whose reconcile_after time has passed and confirm their actual status by calling the PG's query API (GET payment), then update accordingly.
DLQ (last safety net): rows that cannot be finalized through polling, or that have exceeded the retry limit, are forwarded to a dead letter queue for an operator to review and handle manually.

Failure finalization also splits into categories. If it is confirmed that no payment object was created on the PG side, the row can transition to failed_retryable. Conversely, if a payment object was created but then declined or failed, it is safer to treat the unit of work as completed even though the domain outcome is a failure: store it as completed and replay the same failure response. Use failed_retryable only when you can prove that re-executing the same operation will not produce duplicate side effects.³

To summarize the principle: clients cannot re-execute with the same key while the state is unknown; only this pipeline may transition a state to completed or failed_retryable. This ensures that an indeterminate outcome does not escalate into a duplicate execution.
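The finalization rules can be condensed into a small function that the polling daemon or webhook handler would call. This is a sketch under assumptions: `PgLookup` is a hypothetical summary of what the gateway reported, `max_attempts` is an illustrative limit, and `"dlq"` here means "hand off to the dead letter queue" rather than a stored status.

```python
from enum import Enum


class PgLookup(Enum):  # hypothetical summary of the gateway's answer
    NOT_CREATED = "not_created"      # confirmed: no payment object exists
    SUCCEEDED = "succeeded"          # payment object exists and succeeded
    DECLINED = "declined"            # payment object exists but was declined/failed
    PENDING = "pending"              # gateway has not reached a final state yet
    LOOKUP_FAILED = "lookup_failed"  # the query itself failed (timeout, 5xx, ...)


def resolve_unknown(lookup: PgLookup, attempts: int, max_attempts: int = 5) -> str:
    """Return the next state for an unknown row: a final status, 'unknown', or 'dlq'."""
    if lookup is PgLookup.NOT_CREATED:
        # Assumes the gateway's "not found" answer is authoritative; only then
        # is re-execution provably free of duplicate side effects.
        return "failed_retryable"
    if lookup in (PgLookup.SUCCEEDED, PgLookup.DECLINED):
        # The unit of work ran to an outcome, even if that outcome is a decline:
        # store it and replay the same response.
        return "completed"
    if attempts >= max_attempts:
        return "dlq"      # give up automatic resolution; an operator reviews it
    return "unknown"      # stay put and poll again after reconcile_after
```

A declined payment maps to `completed` on purpose: the client that retries with the same key should see the same decline, not trigger a second charge attempt.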

A TTL policy is also necessary. Stripe's documentation states that keys may be pruned no earlier than 24 hours after creation. In internal systems, domains with long retry windows, such as payments, orders, transfers, and reservations, may require a retention period longer than 24 hours. The retention period should be defined as the sum of the client retry window, the failure recovery window, and any audit requirements.³
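As a worked example of that sum, the sketch below adds the three windows with `datetime.timedelta`. The function name and the specific durations are illustrative assumptions, not recommendations.

```python
from datetime import timedelta


def retention_period(
    client_retry_window: timedelta,
    failure_recovery_window: timedelta,
    audit_window: timedelta = timedelta(0),
) -> timedelta:
    """Retention = client retry window + failure recovery window + audit needs."""
    windows = (client_retry_window, failure_recovery_window, audit_window)
    if any(w < timedelta(0) for w in windows):
        raise ValueError("windows must be non-negative")
    return client_retry_window + failure_recovery_window + audit_window


# Illustrative numbers: clients may retry for 24 hours, and reconciliation
# may need another 24 hours to settle an unknown outcome.
example = retention_period(timedelta(hours=24), timedelta(hours=24))
```

With these sample inputs the result is 48 hours, the same figure used for `expires_at` in the INSERT statement earlier; an audit requirement would push it higher.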

Decide whether expired rows are physically deleted or kept in place and marked with an expired status.

A. hard delete: a pruning job physically deletes rows whose `expires_at` has passed.
B. soft expire: set `status='expired'` for auditing/analytics purposes and exclude the record from replay targets.

Operational metrics:

api.idempotency.reserve.created.count
api.idempotency.reserve.replay.count
api.idempotency.reserve.in_progress.count
api.idempotency.reserve.key_misuse.count
api.idempotency.reserve.failed_retry.count
api.idempotency.reserve.unknown.count
api.idempotency.store.error.count
api.idempotency.zombie_key.count
api.idempotency.ttl_pruned.count
api.idempotency.reconcile.scheduled.count
api.idempotency.reconcile.resolved.count
api.idempotency.reconcile.failed.count

In particular, key_misuse can be a security signal. It may be a simple client bug or an SDK defect, but it could also indicate an attack in which a caller reuses the same key while changing the payload.

8. Comparison Notes: C# / TypeScript / Python / Rust

Language	Idiomatic Patterns	Caveats
C#	record, enum, interface store, DB transaction	`lock + Dictionary` is a single-process example only, not a production store
TypeScript	discriminated union, `Map`, controller boundary	In a multi-instance Node environment, an in-memory `Map` breaks immediately
Python	dataclass, Enum, repository class	The GIL does not guarantee check-then-act atomicity, so `threading.Lock` is required
Rust	enum, `Mutex<HashMap>`, explicit ownership	Never use `expect`/panic on the request path; keep the poison boundary and durable store boundary separate

C# expresses idempotency decisions clearly through record types and interfaces. TypeScript can cleanly branch execute/replay/inProgress/keyMisuse with discriminated unions. Python can achieve the same shape with dataclasses and Enums, but be careful not to mistake the GIL for a storage consistency model; explicit locking is required. Rust's enum is well-suited for decision modeling, though the Mutex<HashMap> and expect calls in the examples are for learning purposes; production code needs a DB or distributed store along with Result-based error handling.

As I always note, you should never forcibly transplant the idioms of one language into another. There is no need to build an elaborate C#-style class hierarchy in TypeScript, and mimicking TypeScript-style string unions in C# weakens type safety. In Rust, rather than funneling everything into Result, it is generally accepted practice to define a separate replayable decision enum, as that keeps intent clearer.

Ultimately, what counts as core programming practice is shaped partly by the linguistic constraints and techniques of a given language, but also substantially by community conventions.

9. Additional Considerations

Is this API inherently idempotent by virtue of its HTTP method, or is it a POST command that requires an idempotency key?
Is the fingerprint based on the raw JSON string, or on a canonical (RFC 8785 JCS) hash?
When a different payload arrives under the same key, which API contract fits best: 400, 409, or 422?
When an external payment gateway call and an internal DB write cannot be wrapped into a single atomic operation, which approach do you reach for: outbox, saga, or a payment gateway idempotency key?
Must the replay response be byte-for-byte identical to the original response, or is it acceptable to return a different status code that still carries the same resource ID?
On what basis should you set the key TTL: the client retry window, the failure recovery window, or audit requirements?
When the idempotency store is unavailable, should you block the request (fail-closed) or proceed and accept the risk of duplication?
On execution failure, should you evict the reservation or mark it as failed? If there is any possibility the gateway succeeded, which choice is safer?

10. Summary

The Idempotency Key is an API boundary pattern that prevents duplicate execution during uncertain network retries.
Comparing keys alone is insufficient; you must also verify that the same key carries the same request fingerprint (canonical form plus hash).
Separate execution rights from response reuse through a Reserve -> Execute -> Complete / RetryableFail / Unknown -> Replay or Reconcile flow. Only pre-execution failures release the reservation; outcomes that remain uncertain are sent to reconciliation.
In domains where duplicate execution is costly, such as payments, orders, and reservations, you need a durable store with a unique constraint and an atomic ON CONFLICT upsert, not an in-memory store.
Domain conflicts, key misuse, and infrastructure failures must each have distinct response policies and metrics.
Never put sensitive information in an idempotency key, and design tenant/account scope together with a TTL policy.

Quick mnemonic:

To safely retry a POST, design the key, fingerprint, durable replay, and failure release together.

Personal note: Idempotency is usually taught as "preventing duplicate requests," but in practice it means "safely handling duplicate requests while cleanly resolving failed ones."

Footnotes

¹ IETF. RFC 9110: HTTP Semantics (idempotent methods)
² IETF. The Idempotency-Key HTTP Header Field (draft-ietf-httpapi-idempotency-key-header)
³ Stripe. Idempotent requests
⁴ IETF. RFC 8941: Structured Field Values for HTTP (Idempotency-Key header format)
⁵ Python. threading: Lock objects
⁶ IETF. RFC 8785: JSON Canonicalization Scheme (JCS)
⁷ PostgreSQL. INSERT ... ON CONFLICT (upsert)
