Cache-Aside — 2026-07-01 11:54 UTC

Cache-Aside

Cache-Aside is a caching pattern where the application directly manages both the cache and the source data store. It checks the cache first, and on a miss, reads from the source (such as a database or external API) and populates the cache. Microsoft describes this as "load data on demand into a cache from a data store".[1]

It is also known as Lazy Loading Cache. AWS similarly describes lazy caching, or cache-aside, as the most common caching strategy.[2]
Put more plainly, it can be summarized as follows.

Cache-Aside is not "magic that makes the database faster"; it is a pattern that trades read load for consistency risk. This document focuses on how to handle the operational problems that arise from that trade-off: cache invalidation, stale data, cache stampede, and failure handling.

Core Structure

In Cache-Aside, the cache is not a transparent layer sitting in front of the database; instead, the application code is directly responsible for cache lookups, database queries, cache writes, and cache invalidation.

Role	Responsibility
Application	Checks the cache first; on a miss, reads from the source data store and stores the result in the cache
Cache	Returns frequently read values quickly
Source data store	Serves as the ultimate source of truth

The basic read flow works as follows.

A request comes in.
The application looks up the key in the cache.
On a cache hit, it returns the cached value.
On a cache miss, it reads from the database.
It stores the retrieved value in the cache along with a TTL.
It returns the value.

C#

public async Task<ProductDto?> GetProductAsync(long productId, CancellationToken ct)
{
    var cacheKey = $"product:{productId}:detail:v1";

    var cached = await cache.GetAsync<ProductDto>(cacheKey, ct);
    if (cached is not null)
    {
        // This is the fastest path. The database is never touched.
        return cached;
    }

    // Cache miss. The source data store is the source of truth.
    var product = await productRepository.FindDtoAsync(productId, ct);
    if (product is null)
    {
        return null;
    }

    // A TTL that is too long increases stale data, while a TTL that is too short fails to reduce database load.
    await cache.SetAsync(cacheKey, product, TimeSpan.FromMinutes(5), ct);
    return product;
}

In TypeScript, you typically combine a Redis client with a repository.

TypeScript

type ProductDto = {
  id: number;
  name: string;
  price: number;
};

async function getProduct(productId: number): Promise<ProductDto | null> {
  const cacheKey = `product:${productId}:detail:v1`;

  const cached = await redis.get(cacheKey);
  if (cached !== null) {
    // A JSON deserialization failure means the cached value is corrupted.
    // In practice, log the error, delete the cache entry, and fall back to the DB.
    try {
      return JSON.parse(cached) as ProductDto;
    } catch {
      await redis.del(cacheKey);
      // Fall through to the DB fallback path below.
    }
  }

  // Cache miss. The DB is the source of truth.
  const product = await productRepository.findDto(productId);
  if (product === null) {
    return null;
  }

  // `EX` in Redis sets the TTL in seconds.
  // If the DTO shape changes, bump the key version to v2.
  await redis.set(cacheKey, JSON.stringify(product), { EX: 300 });
  return product;
}

The { EX: 300 } in the example above is node-redis v4 syntax. Because the way TTL is specified varies by client, this is covered separately in the Implementing with Redis section below.

The JSON.parse try/catch in the TypeScript example above is also for illustration. In production code, a cache serializer/adapter converts the result into states like Corrupted, Miss, or Down and passes them to the service layer.

To keep the explanation simple, the earlier example uses null to represent both a cache miss and a not-found result together. In production code, it is safer to distinguish miss (no cache entry), not found (entity does not exist), and cache down (cache failure) using a Result type or a union type. A later section on error policies separates these into Hit / Miss / Down.
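
As a rough illustration of that distinction, the sketch below models the cache lookup as a discriminated union. The names CacheLookup, decodeCached, and lookup are hypothetical and do not come from any Redis client; the only assumption is that the client returns a string on a hit and null on a miss. "Not found" belongs to the repository result and is left out here.

```typescript
// Hypothetical result type: these names are illustrative, not from a library.
type CacheLookup<T> =
  | { kind: "hit"; value: T }
  | { kind: "miss" } // no cache entry
  | { kind: "corrupted" } // an entry exists but cannot be decoded
  | { kind: "down" }; // the cache itself failed

// Decode a raw cached string (or null) into hit / miss / corrupted.
function decodeCached<T>(raw: string | null): CacheLookup<T> {
  if (raw === null) {
    return { kind: "miss" };
  }
  try {
    return { kind: "hit", value: JSON.parse(raw) as T };
  } catch {
    return { kind: "corrupted" };
  }
}

// Wrap any string-returning getter so that a thrown error becomes "down"
// instead of being mistaken for a miss.
async function lookup<T>(get: () => Promise<string | null>): Promise<CacheLookup<T>> {
  try {
    return decodeCached<T>(await get());
  } catch {
    return { kind: "down" };
  }
}
```

With node-redis v4 this could be called as lookup<ProductDto>(() => redis.get(cacheKey)); the service layer then decides per kind whether to fall back to the database.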

When should you use it?

Cache-Aside is well suited for data that is read frequently and can tolerate slightly stale values.

Common use cases:

Product detail pages
Post detail pages
Category listings
Configuration values
Permission lists (requiring short TTL and final validation)
Portions of user profiles
Dashboard summary data
External API responses with short reuse windows

The decision criteria are straightforward.

Question	Cache-Aside Suitability
Is the same key queried repeatedly?	High
Is a database query or external API call expensive?	High
Can you tolerate stale data for a few seconds to a few minutes?	High
Are reads far more frequent than writes?	High
Is the absolute latest value always required?	Low
Are the cache invalidation criteria per key ambiguous?	Low

Values where stale data carries a high cost, such as permissions, inventory, seat availability, and payment status, require extra caution. Even when using Cache-Aside, set a short TTL, include boundaries like tenant/user/role/scope/version in the key, and maintain a path that re-verifies the final decision against the database or an authorization server. In some cases, not caching at all is the cheaper option.
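
A minimal sketch of such a key, assuming a permission cache scoped by tenant, user, and scope. The field names, the permissionKey helper, and the 30-second TTL constant are illustrative assumptions, not values taken from any library.

```typescript
// Hypothetical key builder: field names are illustrative.
type PermissionKeyParts = {
  tenantId: string;
  userId: string;
  scope: string; // e.g. "orders:read"
  version: number; // bump when the cached shape changes
};

function permissionKey(p: PermissionKeyParts): string {
  // encodeURIComponent escapes ":" so a colon inside one part cannot shift the key boundaries.
  const parts = [p.tenantId, p.userId, p.scope].map(encodeURIComponent);
  return `permission:${parts.join(":")}:v${p.version}`;
}

// Short on purpose: a stale permission is expensive.
const PERMISSION_TTL_SECONDS = 30;
```

For example, permissionKey({ tenantId: "t1", userId: "u42", scope: "orders:read", version: 1 }) yields permission:t1:u42:orders%3Aread:v1. The final authorization decision should still be re-verified against the source, as described above.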

Why use it?

It is used to reduce load on the source data store and lower response times.

The database typically handles correctness and durability, while the cache handles fast reuse. Separating the two yields the following benefits.

Repeated lookups do not reach the database.
The number of external API calls is reduced.
It absorbs read traffic during peak periods.
Because only frequently accessed data is loaded into the cache, memory is used relatively efficiently.
Even if the cache is lost, it can be repopulated from the database.

The AWS Redis caching strategy documentation also cites the following advantages of Cache-Aside: only requested data enters the cache, making it cost-efficient; the implementation is straightforward; and immediate performance improvements are easy to achieve.[3]

Implementing with Redis

The examples in this document use Redis as the cache store. Redis is an in-memory key-value store where values are stored as strings (byte sequences) by default, and it also supports data structures such as lists, hashes, sets, and sorted sets. It is one of the most commonly used cache layers for Cache-Aside.[4]

Personal note: There are other options besides Redis, such as Valkey, but since Redis is generally the stack that clients in Korea are familiar with, it is better to stick with Redis.

Commands are atomic

Because Redis serializes command execution through a single-threaded event loop model, individual commands (GET, SET, DEL, INCR, etc.) appear atomic. Even with I/O threading introduced in Redis 6, the atomicity model for command execution is preserved, so no additional locking is needed for single operations. However, compound operations such as "check with GET, decide, then SET" are split across multiple commands and are therefore not atomic. This is why the read-modify race in Cache-Aside (the stale-set problem described in the Write flow section below) remains an application-level issue even when using Redis.

Command mapping

Each phase of Cache-Aside maps to Redis commands as follows.

Phase	Redis command
Cache lookup	`GET key`
Store with TTL	`SET key value EX seconds` (or `SETEX key seconds value`)
Invalidation (deletion)	`DEL key` (for large values, async deletion with `UNLINK key`)
Store only when absent	`SET key value NX` (single fill; when used as a distributed lock, a token and safe release are required)
Check and extend expiry	`TTL key`, `EXPIRE key seconds`
Batch lookup	`MGET k1 k2 ...` or pipelining

TTL syntax differences by client

Even the same SET ... EX command is called differently depending on the client. The examples in this document are based on node-redis v4.

TypeScript

// node-redis v4
await redis.set(key, value, { EX: 300 });
await redis.set(key, value, { NX: true, PX: 3000 }); // only when absent, ms TTL

// ioredis
await redis.set(key, value, "EX", 300);
await redis.set(key, value, "PX", 3000, "NX");

C#

// C# StackExchange.Redis
db.StringSet(key, value, TimeSpan.FromMinutes(5));
db.StringSet(key, value, TimeSpan.FromSeconds(3), when: When.NotExists); // NX

Serialization and key versioning

Because values are strings, objects must be serialized (e.g., as JSON) before storing and deserialized when reading. This means that if the DTO shape changes, deserialization can break; as explained in the [Key Design] section below, you can embed a version tag such as v1 or v2 in the key to separate old and new cache entries.

Memory limits and eviction

Redis manages its memory ceiling with maxmemory and eviction policies (allkeys-lru, allkeys-lfu, volatile-ttl, etc.).[8] When memory is full, values can be evicted even if their TTL has not yet expired. This aligns exactly with the Cache-Aside premise that cached data can disappear at any time, which is why the application must be able to refill from the database on a cache miss, as described in the [Cache Failure Handling] section below.

Memory fragmentation

The pattern of deleting with DEL and then re-fetching and re-setting with SET is a safe choice that reduces race conditions, but when values of varying sizes (JSON, MessagePack, etc.) are continuously deleted and rewritten, fragmentation accumulates inside Redis's memory allocator (jemalloc) as free space becomes scattered. This causes the physical memory (RSS) to bloat even when the logical data size is small, and the OS OOM killer may end up terminating Redis as a result.

For this reason, in production you should monitor the fragmentation ratio (mem_fragmentation_ratio, which is RSS relative to used memory), and if this value persistently stays around 1.5 or higher, consider enabling activedefrag (active defragmentation).[9] This metric matters most for caches with frequent invalidation.
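
As one possible way to watch that metric, the sketch below extracts mem_fragmentation_ratio from the text returned by the Redis INFO memory command, which consists of field:value lines under "# Section" headers. The helper names and the default threshold of 1.5 (taken from the rule of thumb above) are assumptions for illustration.

```typescript
// Extract mem_fragmentation_ratio from the text returned by `INFO memory`.
// Returns null when the field is missing or not a finite number.
function parseFragmentationRatio(info: string): number | null {
  const prefix = "mem_fragmentation_ratio:";
  for (const line of info.split(/\r?\n/)) {
    if (line.startsWith(prefix)) {
      const ratio = Number(line.slice(prefix.length));
      return Number.isFinite(ratio) ? ratio : null;
    }
  }
  return null;
}

// Hypothetical threshold check; 1.5 mirrors the rule of thumb in the text.
function isFragmented(info: string, threshold = 1.5): boolean {
  const ratio = parseFragmentationRatio(info);
  return ratio !== null && ratio >= threshold;
}
```

With node-redis v4 the input would typically come from await redis.info("memory"). A single high reading is not conclusive, so an alert would normally require the value to stay above the threshold over several samples.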

Personal note: these are not my own experiences, but things that come up when searching for questions in the community. So you could call them the experiences of senior programmers.

Do not rely on persistence

Redis can persist data to disk via RDB snapshots or AOF, but when using it as a cache in Cache-Aside, you should not depend on that persistence. The source of truth is always the database, and the cache should be treated as an auxiliary layer that can be empty and recovered from scratch.

Caution in distributed environments

Because multiple app servers share the same Redis instance, the in-process locks covered in the [Cache Stampede] section below (C#'s SemaphoreSlim, Node's in-flight Map) are not sufficient to prevent a stampede at the cluster-wide level. In this case, consider a simple distributed lock based on SET key value NX PX ttl, a Redlock-style solution, or a request coalescing layer. When using a distributed lock, store a different random token in the value each time, and when releasing, use a Lua script that performs a DEL only if the GET value matches your token. A plain DEL can accidentally delete a lock that another request acquired after yours expired.[12][13] Also, because the correctness of distributed locks is notoriously hard to guarantee around expiry timing and network partitions, first determine whether you are protecting consistency or simply reducing load before applying one.
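
The token-based acquire and release described above can be sketched as follows. LockClient is a hypothetical interface that mirrors the node-redis v4 call shapes for set and eval; tryAcquire and release are illustrative names. This is a single-instance lock sketch under those assumptions, not a Redlock implementation.

```typescript
// Minimal surface this sketch relies on (node-redis v4 style call shapes).
// The interface name is hypothetical; adapt it to the client you actually use.
interface LockClient {
  set(key: string, value: string, opts: { NX: true; PX: number }): Promise<string | null>;
  eval(script: string, opts: { keys: string[]; arguments: string[] }): Promise<unknown>;
}

// Delete the lock only if it still holds our token.
// GET and DEL run atomically because they execute inside one script.
const RELEASE_SCRIPT = `
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
end
return 0
`;

// Returns the token on success, or null when another holder owns the lock.
async function tryAcquire(
  client: LockClient,
  key: string,
  ttlMs: number,
  token: string,
): Promise<string | null> {
  const reply = await client.set(key, token, { NX: true, PX: ttlMs });
  return reply === "OK" ? token : null;
}

// Returns true only when our own lock was released.
async function release(client: LockClient, key: string, token: string): Promise<boolean> {
  const deleted = await client.eval(RELEASE_SCRIPT, { keys: [key], arguments: [token] });
  return deleted === 1;
}
```

The token should be unique per acquisition (for example a UUID); it is passed in here so the sketch stays free of runtime-specific imports. A request that fails to acquire the lock would typically wait briefly and re-read the cache rather than query the database.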

Write flow

In Cache-Aside, the difficult side is not reading but writing. Because the source store and the cache coexist, you need to decide how to handle the cache after a write.

The approach most references recommend is to update the database first, then delete the cache.

There is a reason this order is preferable to "delete the cache first, then update the database." If you delete first, another read can observe a cache miss in the brief window just before the database update and repopulate the cache with the old value that has not yet changed. This leaves the old value in the cache even after the update completes. If you update the database first, any cache miss that occurs after the commit will read the new value from the database. That said, existing cache hits can still return the old value until the cache is deleted, and the stale-set race described below does not disappear entirely.

C#

public async Task UpdateProductNameAsync(long productId, string name, CancellationToken ct)
{
    var cacheKey = $"product:{productId}:detail:v1";

    // Update the source of truth first.
    await productRepository.UpdateNameAsync(productId, name, ct);

    // Let the cache be repopulated on the next read.
    // Deleting a cache entry is often simpler than updating it.
    await cache.RemoveAsync(cacheKey, ct);
}

TypeScript example:

TypeScript

async function updateProductName(productId: number, name: string): Promise<void> {
  const cacheKey = `product:${productId}:detail:v1`;

  // First, update the source of truth.
  await productRepository.updateName(productId, name);

  // In Cache-Aside, the typical approach is to delete the cached value rather than overwrite it.
  // The next read fetches the new value from the database and repopulates the cache.
  await redis.del(cacheKey);
}

In general, deleting the cached value is relatively safer than updating it in place, though it is not race-free.

Strategy	Advantages	Risks
Delete cache after updating DB	This is the most common approach and is relatively safe.	The next request will result in a cache miss. Under concurrent read/write scenarios, a stale value can be written back into the cache.
Update cache after updating DB	The next read is fast.	The logic for constructing the cached value may duplicate the DB query logic.
Update DB after updating cache	Almost always avoided.	If the DB write fails, the cache contains a lie.

Canonical race condition:

Request A sees a cache miss.
A reads the old value from the DB.
Request B updates the DB with the new value.
B performs a cache delete.
A, slightly delayed, sets the old value in the cache.
From that point on, the old value persists in the cache.

The key precondition is that A reads the old value (step 2) before B's update commits (step 3), and that A's cache write (step 5) lands after B's delete (step 4). If A's database read occurred after B's update, A would already have the new value and no resurrection would happen. The race therefore exists only in the narrow window where A reads the stale value first and writes it back after B's deletion.

This problem can be called stale value resurrection. The name sounds dramatic, but the phenomenon is simple: a stale value that was already deleted comes back to life.

Personal note: just like in the movies, a confirmation kill matters.

Preventing stale resurrection with a Lua script

To guard against the stale-set race more robustly, the cache write itself must be conditional. SET NX (write only if absent) does not help: in the race described above, B has already deleted the key, so A's SET NX succeeds, because NX cannot distinguish between "absent" and "stale."

Instead, keep a logical version (or timestamp) per key and use a Redis Lua script to atomically enforce "write only if the version I read is still the latest." Because Redis executes a Lua script atomically, no other command can interleave between the script's read and its write.[7]

A common mistake here is to assume that storing the version inside the cached value itself is enough. It is not, because deleting that key during an update deletes the version along with it. A late-arriving old writer then sees a nil (absent) version and writes its value unconditionally.

Text

A reads version 1 from the DB.
B updates the DB to version 2.
B deletes the cache key (the version stored inside is also gone).
A runs Lua late → HGET ver = nil → passes → old value is resurrected

To block an old writer, you therefore need a separate version fence key (or tombstone marker) that is not deleted by cache invalidation. The fence holds the latest known version for that key.

Lua

-- KEYS[1] = cache value key
-- KEYS[2] = version fence key (not deleted)
-- ARGV[1] = value, ARGV[2] = my_version, ARGV[3] = ttl_seconds
local fence = redis.call('GET', KEYS[2])
if fence and tonumber(ARGV[2]) < tonumber(fence) then
  return 0  -- A newer DB write occurred after my read. Do not write.
end
redis.call('HSET', KEYS[1], 'ver', ARGV[2], 'val', ARGV[1])
redis.call('EXPIRE', KEYS[1], ARGV[3])
return 1

The write flow is structured as follows.

Text

DB commit
→ SET version fence key to new_version
→ DEL cache value key

Because the fence always knows the latest version, a cache write that arrives late with a stale version is rejected at the fence comparison. This is a standard hardening technique for keys where consistency matters; make sure to define a clear version source (a DB sequence, updated_at, or a logical clock) and manage the fence key's lifetime so it does not expire too early (keep it longer-lived than the cache entry, or make it persistent).
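
To make the fence behavior easier to reason about, here is a small in-memory model that replays the race. It mirrors the comparison in the Lua script above but is not a Redis client; the FencedCache class and its method names are hypothetical, and TTL handling is left out.

```typescript
// In-memory model of the fence check, for reasoning only.
type Entry = { ver: number; val: string };

class FencedCache {
  private values = new Map<string, Entry>();
  private fences = new Map<string, number>(); // never cleared by invalidation

  // Mirrors the Lua script: reject a write whose version is older than the fence.
  setIfFresh(key: string, val: string, ver: number): boolean {
    const fence = this.fences.get(key);
    if (fence !== undefined && ver < fence) {
      return false;
    }
    this.values.set(key, { ver, val });
    return true;
  }

  // Mirrors the write flow after the DB commit: SET fence, then DEL value.
  invalidate(key: string, newVersion: number): void {
    this.fences.set(key, newVersion);
    this.values.delete(key);
  }

  get(key: string): Entry | undefined {
    return this.values.get(key);
  }
}

// Replay: A read version 1 from the DB, B committed version 2 and invalidated,
// then A's delayed cache write arrives.
const fenced = new FencedCache();
fenced.invalidate("product:123", 2); // B: fence = 2, value deleted
const lateWrite = fenced.setIfFresh("product:123", "old name", 1); // A: rejected (false)
const freshWrite = fenced.setIfFresh("product:123", "new name", 2); // next reader: accepted (true)
```

Without the fences map, the first setIfFresh call would succeed and the old name would be resurrected, which is exactly the failure shown in the trace earlier in this section.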

Redis Cluster caveat: CROSSSLOT and hash tags

The Lua script above touches two keys: KEYS[1] (the cached value) and KEYS[2] (the fence). On a single node this is fine, but in a sharded Redis Cluster the two keys may be placed on different nodes. When that happens, multi-key commands and scripts cannot guarantee atomicity and fail immediately with a CROSSSLOT error.[10][11]

The solution is hash tags, which force both keys onto the same slot. When a key name contains curly braces {}, Redis hashes only the substring inside the braces. Keys that share the same tag are therefore always placed in the same slot.

Text

Bad example: product:123:val / product:123:fence (can go to different slots)
Good example: {product:123}:val / {product:123}:fence (guaranteed to go to the same slot)

In other words, if you want to use multi-key Lua scripts in a cluster, you must standardize your key naming convention so that all keys handled together share the same hash tag. When designing keys (see [Key Design]), it is cleaner to include the hash tag from the start.
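
A small sketch of that convention, assuming the val/fence suffixes from the example above. productKeys and hashedPart are hypothetical helpers; hashedPart follows the hash tag rule from the Redis Cluster specification (the text between the first "{" and the first "}" after it, if non-empty) and is only meant for checking key names, not for computing slots.

```typescript
// Build both keys from one hash tag so Redis Cluster maps them to the same slot.
function productKeys(productId: number): { value: string; fence: string } {
  const tag = `{product:${productId}}`;
  return { value: `${tag}:val`, fence: `${tag}:fence` };
}

// Return the part of the key that Redis Cluster hashes: the text between the
// first "{" and the first "}" after it, or the whole key when no non-empty tag exists.
function hashedPart(key: string): string {
  const open = key.indexOf("{");
  if (open === -1) return key;
  const close = key.indexOf("}", open + 1);
  if (close === -1 || close === open + 1) return key;
  return key.slice(open + 1, close);
}
```

A unit test can then assert that hashedPart returns the same string for every key a script receives, which catches CROSSSLOT mistakes before they reach a cluster.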

The database and Redis are not part of one transaction

The fence flow (DB commit → fence SET → cache DEL) is also not an atomic transaction that binds the database and Redis together. The DB commit may succeed while the fence SET or cache DEL fails, leaving the cache stale. For keys where consistency is critical, you should not merely log these failures; instead, guarantee invalidation through outbox/event-based retries or a background invalidation worker. This is precisely where the misconception that "using Lua fully solves the problem" must be avoided.

Practical guidelines:

Delete the cache after the DB commit completes.
Treat cache deletion failures as items to log and retry.
If a cached value is spread across multiple keys, maintain an explicit invalidation list.
Do not let the cache become the source of truth.
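
The "log and retry" guideline can be sketched as a retry queue for failed deletes. InvalidationQueue and invalidateOrQueue are hypothetical names, and the queue here lives in process memory only; a real outbox would persist pending keys (for example in the database) so they survive a restart. The sketch covers only the cache DEL, not the fence SET.

```typescript
// Hypothetical in-process retry queue for failed invalidations.
class InvalidationQueue {
  private pending = new Set<string>();

  enqueue(key: string): void {
    this.pending.add(key);
  }

  size(): number {
    return this.pending.size;
  }

  // Try to delete every pending key; keys whose delete throws stay queued.
  async drain(del: (key: string) => Promise<unknown>): Promise<void> {
    for (const key of Array.from(this.pending)) {
      try {
        await del(key);
        this.pending.delete(key);
      } catch {
        // Leave the key queued; a worker calls drain() again later.
      }
    }
  }
}

// Write path after the DB commit: delete the cache entry, and on failure
// queue the key for retry instead of only logging it.
async function invalidateOrQueue(
  del: (key: string) => Promise<unknown>,
  queue: InvalidationQueue,
  key: string,
): Promise<void> {
  try {
    await del(key);
  } catch {
    queue.enqueue(key);
  }
}
```

With node-redis v4 the del argument could be (key) => redis.del(key), and a periodic worker would call drain with the same function. A short TTL remains the safety net for keys that are lost before they can be retried.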

TTL Design

TTL (Time To Live) is the duration a cached value remains alive. In Cache-Aside, TTL acts as a tuning knob between performance and consistency.

TTL	Result
Too short	Cache misses increase and DB load does not decrease
Too long	Stale data remains visible for an extended period
None	Data keeps rotting when invalidation is missed

Rough rule of thumb:

Data	Example TTL
Product detail	1 to 10 minutes
Category list	10 minutes to 1 hour
Permission list	Tens of seconds to several minutes
Dashboard summary	10 seconds to 5 minutes
Exchange rates, inventory, seat availability	Very short, depending on the domain

It is better to set TTL by asking "how long is it acceptable, from a business perspective, to show stale data?" rather than "how fresh would we like the data to be?"

Personal note: this rough rule of thumb was actually taken from an example in a course sold by someone teaching Redis.

Key Design

Cache keys should be meaningful enough that an operator can understand what they represent at a glance later.

Good key examples:

C#

var key = $"product:{productId}:detail:v1";

In TypeScript, it is better to consolidate them into a key builder function.

TypeScript

function productDetailKey(productId: number): string {
  return `product:${productId}:detail:v1`;
}

What to include in a key:

Domain name: product, user, permission
Identifier: 123
Query shape: detail, summary, permissions
Version: v1, v2

Versioning matters. If the DTO shape changes but the existing cached value is read as-is, you may get deserialization errors or unexpected responses. Bumping the key version naturally separates old cache entries from new ones.

Bad key examples:

C#

var key = id.ToString();

The same applies in TypeScript.

TypeScript

const key = String(id);

Keys like these are prone to collisions with other domains and make it hard to tell what value they hold during operations.

Cache Stampede

A cache stampede occurs when a popular key expires while many requests are arriving, so that all of them miss at the same moment and flood the database at once.

Scenario:

The product:1 cache expires.
1,000 requests arrive concurrently.
All of them encounter a cache miss.
All of them query the database.
The database absorbs the full traffic that the cache was supposed to handle.

Mitigation strategies:

Add a small jitter to TTLs so entries don't all expire at the same time.
Allow only one request per key to query the database.
Use the stale-while-revalidate strategy to briefly return the old cached value while refreshing in the background.
Pre-warm popular keys before they expire.
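
The first item, TTL jitter, is small enough to sketch directly. ttlWithJitter is a hypothetical helper, and the default spread of plus or minus 10% is an arbitrary assumption; the random source is injectable so the function can be tested deterministically.

```typescript
// Spread expirations by adding a random offset of up to +/- `ratio` of the base TTL.
function ttlWithJitter(
  baseSeconds: number,
  ratio = 0.1,
  rand: () => number = Math.random,
): number {
  // rand() is in [0, 1), so the offset is in [-ratio * base, +ratio * base).
  const offset = (rand() * 2 - 1) * ratio * baseSeconds;
  // Never return less than 1 second, since a TTL of 0 is not a valid expiry.
  return Math.max(1, Math.round(baseSeconds + offset));
}
```

With node-redis v4 this would be used as redis.set(key, value, { EX: ttlWithJitter(300) }), so that entries written in the same burst expire somewhere between 270 and 330 seconds later instead of all at once.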

A simple single-flight implementation:

C#

private static readonly ConcurrentDictionary<string, SemaphoreSlim> Locks = new();

public async Task<ProductDto?> GetProductAsync(long productId, CancellationToken ct)
{
    var key = $"product:{productId}:detail:v1";

    var cached = await cache.GetAsync<ProductDto>(key, ct);
    if (cached is not null)
    {
        return cached;
    }

    var gate = Locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
    await gate.WaitAsync(ct);

    try
    {
        // After acquiring the lock, check the cache again, since another request may have already populated it.
        cached = await cache.GetAsync<ProductDto>(key, ct);
        if (cached is not null)
        {
            return cached;
        }

        var product = await productRepository.FindDtoAsync(productId, ct);
        if (product is null)
        {
            return null;
        }

        await cache.SetAsync(key, product, TimeSpan.FromMinutes(5), ct);
        return product;
    }
    finally
    {
        gate.Release();
    }
}

This C# example only reduces duplicate queries within a single application process. When multiple servers are running, each server acquires a lock only within its own process, so the same key can still be loaded from the database once per server at the same time. To limit that as well, separate strategies such as Redis-based distributed locks, a request coalescing layer, or background refresh are required.

Additionally, the example above omits Locks cleanup logic for simplicity. In a service with many key types, continuously growing a ConcurrentDictionary<string, SemaphoreSlim> can leave lock objects behind. In production, you should include a cleanup strategy such as ref-counting, an expiring lock map, or Lazy<Task>-based coalescing. Naively checking CurrentCount and calling TryRemove can introduce a worse race condition where waiting requests and new requests end up acquiring different lock objects.

In TypeScript, a simple approach to reducing duplicate lookups within a single Node.js process is to store in-flight Promises in a Map.

TypeScript

const inFlight = new Map<string, Promise<ProductDto | null>>();

async function getProductWithSingleFlight(productId: number): Promise<ProductDto | null> {
  const key = productDetailKey(productId);

  const cached = await redis.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as ProductDto;
  }

  const existing = inFlight.get(key);
  if (existing !== undefined) {
    // Another request is already reading the same key from the database.
    // Rather than issuing a new database query, wait on the same Promise.
    return existing;
  }

  const loading = (async () => {
    try {
      // Check the cache again, because another request may have populated it while the first lookup was in flight.
      const secondCached = await redis.get(key);
      if (secondCached !== null) {
        return JSON.parse(secondCached) as ProductDto;
      }

      const product = await productRepository.findDto(productId);
      if (product === null) {
        return null;
      }

      await redis.set(key, JSON.stringify(product), { EX: 300 });
      return product;
    } finally {
      inFlight.delete(key);
    }
  })();

  inFlight.set(key, loading);
  return loading;
}

This example only works within a single process. When multiple servers are running, separate strategies such as Redis locks, request coalescing, and background refresh are required.

Distributed environments: Probabilistic Early Recomputation (PER / XFetch)

In a distributed environment with dozens of servers, acquiring a distributed lock on every read carries too much overhead. An alternative that mitigates stampedes probabilistically without locks is Probabilistic Early Recomputation (PER), commonly known as XFetch.[6]

Here is a side-by-side comparison of stampede defense strategies.

Strategy	Guarantee	Cost
Single-flight	Reduces duplicate lookups within a single process	Limited when multiple servers are running
Distributed lock	Strongly limits the number of recomputations per key	Lock overhead, expiry and partition issues
PER / XFetch	Probabilistically mitigates stampede	Does not guarantee exactly one recomputation
Stale-while-revalidate	Minimizes user-perceived latency	Requires background task lifecycle management

The key idea is that, before the cache actually expires, each request independently decides with some probability "I will refresh this proactively." That probability grows exponentially as the expiry approaches. The decision formula is as follows.

Text

now - delta * beta * ln(rand()) >= expiry

now : current time
delta : time taken to recompute the value (database lookup)
beta : probability weight (typically 1; increasing it triggers earlier refresh)
rand() : random number between 0 and 1
expiry : logical expiration time of the cache entry

Because ln(rand()) is negative, the left-hand side evaluates to a value greater than now, and as now approaches expiry the condition becomes true with increasing probability. A request that "wins" the lottery before expiry queries the database and refreshes the cache, while the remaining requests simply receive the still-valid existing value. PER does not guarantee that exactly one request performs the recomputation. If luck is unfavorable or traffic is high, multiple requests may recompute simultaneously. Even so, it significantly reduces the number of recomputation requests probabilistically, without a distributed lock, greatly alleviating the thundering herd where all requests rush to the database at the moment of expiry.
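
To get a feel for the numbers, the decision formula can be rearranged into a closed-form probability: now - delta * beta * ln(r) >= expiry is equivalent to r <= exp(-(expiry - now) / (delta * beta)), and since r is uniform between 0 and 1, that exponential is the chance that a single request refreshes early. The helper name below is hypothetical, and the sketch only restates the formula above.

```typescript
// Probability that one request triggers an early refresh, given the time
// remaining until the logical expiry and the recomputation cost delta.
function earlyRefreshProbability(remainingMs: number, deltaMs: number, beta = 1): number {
  if (remainingMs <= 0) return 1; // already at or past the logical expiry
  if (deltaMs <= 0 || beta <= 0) return 0; // zero weight: never refresh before expiry
  return Math.exp(-remainingMs / (deltaMs * beta));
}
```

For a recomputation cost of 50 ms and beta = 1, the probability is roughly 0.00005 with 500 ms remaining, about 0.37 with 50 ms remaining, and about 0.90 with 5 ms remaining. Early refreshes therefore concentrate in the last few multiples of delta before expiry, and doubling beta has the same effect as doubling delta.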

To implement this, store delta (the recomputation cost) and the logical expiry time alongside the value, and set the actual Redis TTL slightly longer than that expiry to leave room for the early refresh.

TypeScript

type Wrapped<T> = { value: T; deltaMs: number; expiresAt: number };

function shouldEarlyRecompute(deltaMs: number, expiresAt: number, beta = 1): boolean {
  // If `Math.random()` returns 0, then `Math.log(0) = -Infinity`, so `EPSILON` guards against that case.
  const random = Math.max(Number.EPSILON, Math.random());
  // Since `ln(random)` is negative, `(now - delta*beta*ln(random))` evaluates to a value greater than `now`.
  const gap = deltaMs * beta * Math.log(random);
  return Date.now() - gap >= expiresAt;
}

async function getProductXFetch(productId: number): Promise<ProductDto | null> {
  const key = productDetailKey(productId);
  const raw = await redis.get(key);

  if (raw !== null) {
    const wrapped = JSON.parse(raw) as Wrapped<ProductDto>;
    if (!shouldEarlyRecompute(wrapped.deltaMs, wrapped.expiresAt)) {
      return wrapped.value; // Most requests end here.
    }
    // Only the "winning" request proceeds to refresh the cache early. The existing value is still valid at this point.
  }

  const start = Date.now();
  const product = await productRepository.findDto(productId);
  if (product === null) {
    return null;
  }
  const deltaMs = Date.now() - start;

  const ttlSeconds = 300;
  const wrapped: Wrapped<ProductDto> = {
    value: product,
    deltaMs,
    expiresAt: Date.now() + ttlSeconds * 1000,
  };
  // The actual TTL is set slightly longer than the logical expiry.
  await redis.set(key, JSON.stringify(wrapped), { EX: ttlSeconds + 10 });
  return product;
}

This example simplifies things by having the winning request directly await the database to populate the new value, meaning it is closer to a lock-free early refresh than to stale-while-revalidate. To return the stale value immediately and defer the refresh, you need to move the refresh into a background process.

However, in C# ASP.NET Core, a simple fire-and-forget pattern like _ = Task.Run(...) is an anti-pattern. Once the HTTP request context ends, completion of that background task is not guaranteed; it can be lost on app pool restart or deployment, and its exceptions are not observed by the request that started it, causing silent failures. For this reason, refresh work should be offloaded to a BackgroundService or IHostedService, or pushed into an in-memory queue backed by Channel<T> and processed by a dedicated consumer, keeping the control flow and lifecycle separate.[15]

The decision logic is the same in C#: store deltaMs and expiresAt alongside the value, then apply the formula above. PER is particularly well-suited to read-heavy hot keys where a distributed lock would be too costly.

In-process coalescing with memory: the Lazy pattern

The earlier ConcurrentDictionary<string, SemaphoreSlim> approach held synchronization objects directly, requiring a cleanup strategy, and a naive TryRemove only introduced worse races. The alternative is to store the in-flight task itself as the value rather than a lock, and have it remove itself once it completes.

Using ConcurrentDictionary<string, Lazy<Task<Result<ProductDto>>>>, concurrent requests for the same key share a single Lazy via GetOrAdd. Because Lazy starts the Task only once, the database is queried exactly once, and all waiting requests await the same Task and receive the same Result. When that Task completes (in a continuation), removing the key from the map means the map grows only to the number of keys currently in flight, not the total number of keys ever seen. In other words, it is self-cleaning, so no separate eviction thread or semaphore cleanup is needed.

The key point is that instead of sharing a synchronization primitive (SemaphoreSlim), you share the same in-flight Task handle. Memory is held proportional to the level of concurrency, and removing entries immediately on completion minimizes GC pressure. (The one edge case to watch is removal by key alone: if the old entry has already been removed and a new Lazy has been inserted under the same key, a late key-only removal would delete the new entry. Removing by key and value together avoids this.)

The code below uses ProductDto? instead of Result<T> to keep the explanation simple. In production code, use Result<T> or a dedicated union type to distinguish a miss from a failure.

C#

private readonly ConcurrentDictionary<string, Lazy<Task<ProductDto?>>> inFlight = new();

public Task<ProductDto?> CoalesceAsync(string key, Func<Task<ProductDto?>> factory)
{
    var lazy = inFlight.GetOrAdd(
        key,
        _ => new Lazy<Task<ProductDto?>>(
            factory,
            LazyThreadSafetyMode.ExecutionAndPublication));
    return AwaitAndRemoveAsync(key, lazy);
}

private async Task<ProductDto?> AwaitAndRemoveAsync(
    string key,
    Lazy<Task<ProductDto?>> lazy)
{
    try
    {
        return await lazy.Value.ConfigureAwait(false);
    }
    finally
    {
        // The entry is removed only when both the key and the lazy value match, avoiding a race condition where a new Lazy that slipped in between would be incorrectly deleted.
        inFlight.TryRemove(new KeyValuePair<string, Lazy<Task<ProductDto?>>>(key, lazy));
    }
}

The factory passed to GetOrAdd creates a Lazy. Under contention GetOrAdd may invoke that factory more than once, but only one Lazy is stored in the map, and Lazy.Value starts the Task exactly once. The critical part is passing both the key and the lazy to TryRemove together, so that a new Lazy inserted by another request in the meantime is not accidentally removed.
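The same in-flight sharing idea can be sketched in TypeScript. This is a minimal illustration, not the document's implementation: the names inFlight and coalesce are hypothetical, and because JavaScript runs callbacks on a single thread, a plain Map of promises stands in for ConcurrentDictionary plus Lazy. The identity check on removal mirrors the key-and-value TryRemove above.

```typescript
// In-flight map: holds only the keys whose load is currently running.
const inFlight = new Map<string, Promise<unknown>>();

function coalesce<T>(key: string, factory: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing !== undefined) {
    // Another caller already started the load; share its promise.
    // The cast assumes every caller uses the same value type for a given key.
    return existing as Promise<T>;
  }

  // Start the load once. Deferring through Promise.resolve() turns a
  // synchronous throw inside the factory into an ordinary rejection.
  const pending: Promise<T> = Promise.resolve().then(factory);
  inFlight.set(key, pending);

  // Remove the entry only if the map still holds this exact promise,
  // so a newer entry stored under the same key is never deleted by mistake.
  const cleanup = (): void => {
    if (inFlight.get(key) === pending) {
      inFlight.delete(key);
    }
  };
  pending.then(cleanup, cleanup);

  return pending;
}
```

Concurrent callers for the same key receive the same promise, so the factory (for example, a database query) runs once per burst, and the map shrinks back to empty when the load settles, whether it succeeds or fails.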

Negative Caching

Values that do not exist can be queried repeatedly. For example, repeatedly looking up a non-existent product ID or a deleted user ID hits the database every time.

In this situation, you can cache "not found" with a short TTL.

C#

var missingKey = $"product:{productId}:missing:v1";

var isMissing = await cache.GetAsync<bool?>(missingKey, ct);
if (isMissing is true)
{
    return null;
}

var product = await productRepository.FindDtoAsync(productId, ct);
if (product is null)
{
    // Cache missing values with a short TTL only.
    // This is because the same ID may be created later or the data may be restored.
    await cache.SetAsync(missingKey, true, TimeSpan.FromSeconds(30), ct);
    return null;
}

return product;

TypeScript example:

TypeScript

async function getProductWithNegativeCache(productId: number): Promise<ProductDto | null> {
  const missingKey = `product:${productId}:missing:v1`;
  const missing = await redis.get(missingKey);
  if (missing !== null) {
    return null;
  }

  const product = await productRepository.findDto(productId);
  if (product === null) {
    // Cache missing values with a short TTL.
    await redis.set(missingKey, "1", { EX: 30 });
    return null;
  }

  return product;
}

Things to watch out for:

Keep the negative cache TTL short.
If you store a negative cache under a separate key, delete that missing key as well when the corresponding entity is created or restored.
Be careful with values whose state changes are sensitive, such as permissions, payments, and inventory.
You must distinguish between "not found" and "cache failure."
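The points in this list can be sketched together in TypeScript. Everything here is an assumption made for illustration: the KeyValueCache interface, the InMemoryCache stand-in (which ignores TTL), and the function names are hypothetical, and a real Redis client would sit behind the interface.

```typescript
// Hypothetical minimal cache interface.
interface KeyValueCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

// In-memory stand-in used only to make the sketch runnable (TTL is ignored).
class InMemoryCache implements KeyValueCache {
  private readonly entries = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    const value = this.entries.get(key);
    return value === undefined ? null : value;
  }
  async set(key: string, value: string, _ttlSeconds: number): Promise<void> {
    this.entries.set(key, value);
  }
  async del(key: string): Promise<void> {
    this.entries.delete(key);
  }
}

// Keep the negative TTL short: the same ID may be created or restored later.
const NEGATIVE_TTL_SECONDS = 30;
const missingKeyFor = (productId: number): string => `product:${productId}:missing:v1`;

// Three outcomes, so "not found" is never confused with "cache failure".
type MissingCheck = "known-missing" | "unknown" | "cache-down";

async function checkMissing(cache: KeyValueCache, productId: number): Promise<MissingCheck> {
  try {
    const marker = await cache.get(missingKeyFor(productId));
    return marker === null ? "unknown" : "known-missing";
  } catch {
    // A cache failure says nothing about whether the product exists.
    return "cache-down";
  }
}

async function markMissing(cache: KeyValueCache, productId: number): Promise<void> {
  await cache.set(missingKeyFor(productId), "1", NEGATIVE_TTL_SECONDS);
}

// Call this from the create/restore path so a stale "missing" marker
// cannot hide the new entity until the TTL expires.
async function onProductCreatedOrRestored(cache: KeyValueCache, productId: number): Promise<void> {
  await cache.del(missingKeyFor(productId));
}
```

A caller would skip the database only on "known-missing"; both "unknown" and "cache-down" fall through to the source of truth.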

Cache Failure Handling

The cache is a performance layer, not the source of truth. If a cache failure cascades into a total system failure, the design is wrong.

When the cache goes down on a read:

The code below is a simplified example for illustration. In production, push this try/catch out to the cache adapter boundary rather than leaving it in the service body (see the Error Policy section below).

C#

try
{
    var cached = await cache.GetAsync<ProductDto>(key, ct);
    if (cached is not null)
    {
        return cached;
    }
}
catch (Exception ex)
{
    // Log cache failures, but fall back to the database whenever possible.
    logger.LogWarning(ex, "Cache read failed. key={CacheKey}", key);
}

return await productRepository.FindDtoAsync(productId, ct);

Practical decision criteria:

On cache read failure, fall back to the database whenever possible.
Cache write failures usually should not be propagated as response failures; log them and continue.
Keep cache timeouts short.
Blindly falling back to the database on every cache failure can trigger cascading failures all the way to the database. Consider protective strategies such as timeouts, circuit breakers, rate limiting, degraded responses, and returning stale cache values.
Monitor cache failure rate, hit ratio, and miss ratio.
You also need to assess whether the database can sustain the load if the cache goes down.

That last point matters. When the cache goes down, all reads fall through to the database. A service that normally enjoys a 95% cache hit ratio could see nearly 20x the database traffic the moment the cache fails.
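The 20x figure follows from simple arithmetic: with hit ratio h, the database normally sees a fraction (1 - h) of reads, so losing the cache multiplies read load by 1 / (1 - h). A small sketch (the function name is hypothetical, and it assumes every read falls through to the database when the cache is down):

```typescript
// Factor by which database read traffic grows when the cache disappears,
// assuming every read then reaches the database.
function dbLoadMultiplier(hitRatio: number): number {
  if (hitRatio < 0 || hitRatio >= 1) {
    throw new RangeError("hitRatio must be in [0, 1)");
  }
  return 1 / (1 - hitRatio);
}

const at95 = dbLoadMultiplier(0.95); // about 20
const at99 = dbLoadMultiplier(0.99); // about 100
```

The higher the normal hit ratio, the larger the shock: moving from 95% to 99% raises the multiplier from about 20 to about 100.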

The remedy for this cascade is isolation, not retries. Wrapping cache calls and DB fallback in a Circuit Breaker lets you open the circuit when failure exceeds a threshold, quickly dropping to a degraded response and then probing recovery by letting through only a small portion of requests after a delay. Combining this with a bulkhead (connection pool isolation) also prevents a single dependency failure from monopolizing the entire thread pool. Cache failure handling is completed not at the code-level try/fallback layer but at this architectural isolation layer.[17] (See Circuit Breaker)
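The open / probe / close cycle described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not a production breaker: the class and function names are hypothetical, it counts consecutive failures rather than a failure rate, and it lets exactly one probe through after the delay. A real service would more likely use an established resilience library.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAtMs = 0;
  private readonly failureThreshold: number;
  private readonly openDurationMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, openDurationMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.openDurationMs = openDurationMs;
    this.now = now; // injectable clock, which keeps the sketch testable
  }

  // Returns true when the caller may attempt the protected call.
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAtMs >= this.openDurationMs) {
      this.state = "half-open"; // the delay has passed: let one probe through
      return true;
    }
    return this.state === "closed";
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    // A failed probe reopens immediately; otherwise open at the threshold.
    if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAtMs = this.now();
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}

// Runs the action through the breaker and drops to a degraded response
// when the circuit is open or the action fails.
async function withBreaker<T>(
  breaker: CircuitBreaker,
  action: () => Promise<T>,
  degraded: () => T,
): Promise<T> {
  if (!breaker.allowRequest()) {
    return degraded();
  }
  try {
    const result = await action();
    breaker.recordSuccess();
    return result;
  } catch {
    breaker.recordFailure();
    return degraded();
  }
}
```

While the circuit is open, callers get the degraded response without touching the failing dependency, which is what keeps a cache outage from turning into a database outage.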

Error Policy: Result Types and Guard Clauses

The examples above represent a cache miss as null and handle cache failures with try/catch. That works well enough for quick explanation, but two concerns arise in services where consistency and architectural boundaries matter. First, a miss (value absent) and a failure (cache is down) both collapse to null, leaving callers unable to tell them apart. Second, try/catch bleeds into query business logic and becomes part of the control flow.

A cache miss and a lookup failure are not exceptional situations; they are normal states the system should anticipate and recover from. For that reason, predictable failures should be modeled as Result types rather than exceptions, and try/catch should live only at boundaries like the cache adapter. The calling code then narrows state with guard clauses and is left with only the happy path.

C#

public enum CacheStatus { Hit, Miss, Down }
public readonly record struct CacheRead<T>(CacheStatus Status, T? Value);

// Adapter boundary: this is the only place where try/catch converts infrastructure failures into Results.
public async Task<CacheRead<ProductDto>> TryReadAsync(string key, CancellationToken ct)
{
    try
    {
        var cached = await cache.GetAsync<ProductDto>(key, ct);
        return cached is null
            ? new CacheRead<ProductDto>(CacheStatus.Miss, null)
            : new CacheRead<ProductDto>(CacheStatus.Hit, cached);
    }
    catch (Exception ex)
    {
        logger.LogWarning(ex, "Cache read failed. key={CacheKey}", key);
        return new CacheRead<ProductDto>(CacheStatus.Down, null);
    }
}

public async Task<ProductDto?> GetProductAsync(long productId, CancellationToken ct)
{
    var key = $"product:{productId}:detail:v1";

    var read = await TryReadAsync(key, ct);

    // Guard clause: return immediately on a cache hit.
    if (read.Status == CacheStatus.Hit)
    {
        return read.Value;
    }

    // Whether it's a miss or the cache is down, the database is the source of truth. There is no try/catch in the business logic.
    var product = await productRepository.FindDtoAsync(productId, ct);
    if (product is null)
    {
        return null;
    }

    // If the cache is down, don't bother attempting a write. Populate it only on a cache miss.
    if (read.Status == CacheStatus.Miss)
    {
        await TryWriteAsync(key, product, TimeSpan.FromMinutes(5), ct);
    }

    return product;
}

This approach concentrates the error policy in one place (the adapter boundary). Business logic explicitly branches on three states, Hit / Miss / Down, without directly handling infrastructure exceptions. Because miss and down are distinguished, policies like "skip the write when the cache is down" can be expressed naturally.

One caveat: the CacheRead<T> above declares T? Value, so the C# compiler cannot automatically infer that Value is non-null when Status == Hit. In real code this may surface nullable warnings, so either assert with read.Value! or wrap Hit/Miss/Down in a dedicated union type (or helper method) that enforces the relationship between state and value at the type level.
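For comparison, TypeScript's discriminated unions express the "dedicated union type" option directly: the value field exists only on the hit variant, so the compiler enforces the relationship between state and value. The sketch below is illustrative; the ProductDto fields, the ProductCache interface, and the function names are assumptions, not part of the original examples.

```typescript
// The value is present only when status is "hit".
type CacheRead<T> =
  | { status: "hit"; value: T }
  | { status: "miss" }
  | { status: "down" };

// Hypothetical shapes for illustration.
interface ProductDto { id: number; name: string; }
interface ProductCache {
  get(key: string): Promise<ProductDto | null>;
  set(key: string, value: ProductDto, ttlSeconds: number): Promise<void>;
}

// Adapter boundary: the only place that turns exceptions into states.
async function tryRead(cache: ProductCache, key: string): Promise<CacheRead<ProductDto>> {
  try {
    const cached = await cache.get(key);
    return cached === null ? { status: "miss" } : { status: "hit", value: cached };
  } catch {
    return { status: "down" };
  }
}

async function getProduct(
  cache: ProductCache,
  findInDb: (id: number) => Promise<ProductDto | null>,
  productId: number,
): Promise<ProductDto | null> {
  const key = `product:${productId}:detail:v1`;
  const read = await tryRead(cache, key);

  if (read.status === "hit") {
    return read.value; // narrowed: value is guaranteed to exist here
  }

  const product = await findInDb(productId);
  if (product === null) {
    return null;
  }

  // Populate only on a miss; when the cache is down, skip the write.
  if (read.status === "miss") {
    try {
      await cache.set(key, product, 300);
    } catch {
      // A failed cache write does not fail the response.
    }
  }
  return product;
}
```

Accessing read.value outside the hit branch is a compile-time error, so no non-null assertion is needed.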

Personal note: I've actually tested this myself, so you're better off just trusting it. Skepticism is fine, but too much skepticism isn't.

Differences from Read-Through and Write-Through

With Cache-Aside, the application manages the cache directly. By contrast, Read-Through and Write-Through have the cache layer absorb more of the responsibility for accessing the source of truth.

Pattern	Responsibility for accessing the source of truth	Characteristics
Cache-Aside	Application	The most common and simplest pattern. The application code carries a large share of the responsibility.
Read-Through	Cache layer	The application sees only the cache. The cache implementation becomes more complex.
Write-Through	Cache / storage layer	On writes, both the cache and the data store are updated together.
Write-Behind	Cache / queue / background worker	Fast, but preventing data loss and designing for consistency are difficult.

Cache-Aside is simple, but the price of that simplicity is that application code must handle invalidation and failure directly.

Performance Optimization: Serialization and Allocation

The examples so far serialize values as JSON. This is easy to debug and works universally as a default, but for hot keys with extremely high read frequency, every lookup incurs string allocation and JSON parsing, putting pressure on the .NET GC (especially Gen0). Even if the cache is fast, GC pauses can push p99 latency higher.

When to make a change is determined by measurement, not a fixed constant. The signals to check first are these.

Do the allocation rate and Gen0 collection frequency account for a meaningful share of CPU?
Do p99/p999 latency spikes correlate with GC pauses?
Are the QPS and payload size of that hot key high enough to matter? (Parsing cost grows significantly above a few KB.)
Does serialization cost fit within the latency budget?

Until these signals reach the threshold, it is better to stay with JSON. A premature switch to binary pays the cost of losing debuggability and schema evolution convenience before gaining any real benefit. Apply changes only to the measured hot 3%. Once the threshold is crossed, proceed down the following steps.

Keep JSON but reduce cost: use the System.Text.Json source generator to eliminate reflection, and use Utf8JsonReader to parse the bytes Redis returns (RedisValue) directly without creating an intermediate string.[16]
Binary protocol: if you need further size and speed improvements, switch to MessagePack or Protobuf. The payload gets smaller and parsing gets faster, but the data becomes human-unreadable, making key versioning (key design) more important.
Zero-allocation reads: for values with a fixed layout, use Span<T>, Memory<T>, and ArrayPool<byte> pooling to reduce heap allocations during deserialization to nearly zero.

In summary, the baseline is "the measurement point at which data size multiplied by read frequency begins to threaten your GC and latency budget." Before that point, stay with JSON; after it, step down as needed in order: byte parsing, then binary, then zero-alloc.

Performance Optimization Supplementary Notes

P99 and P999 (Tail Latency)

When measuring performance, the average response time (Average Latency) lies. The average buries the small fraction of extreme outliers that can break a system beneath the noise of the majority of fast requests, hiding them perfectly. In practice, you should look at percentiles instead of averages.

P99 (99th Percentile): The response time at or below which 99% of requests complete when all requests are ranked from fastest to slowest. If P99 is 50ms, 99% of requests are handled within 50ms, and the remaining 1% are slower than 50ms.
P999 (99.9th Percentile): Likewise, the response time at or below which 99.9% of requests complete; the remaining 0.1% experience the most extreme latency.

Note: Why measure P99/P999 instead of the average? Even if the average cache response time is 1ms, a .NET garbage collector (GC) pause can push the slowest 1% (beyond P99) or 0.1% (beyond P999) of requests to tail latencies of tens or even hundreds of milliseconds. The higher the read frequency of a hot key, the more likely these momentary spikes are to cascade into system-wide timeouts and circuit breaker trips. The baseline for serialization cost optimization should therefore be suppressing P99 spikes, not improving average speed.
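To make the gap between average and tail concrete, here is a small sketch using the nearest-rank percentile definition. The sample values are made up for illustration, and the function names are hypothetical; production systems usually rely on histogram-based metrics rather than sorting raw samples.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1] as number;
}

function mean(samplesMs: number[]): number {
  return samplesMs.reduce((sum, x) => sum + x, 0) / samplesMs.length;
}

// Made-up data: 98 fast requests at 1 ms and 2 slow requests at 300 ms.
const samples: number[] = [];
for (let i = 0; i < 98; i++) samples.push(1);
samples.push(300, 300);

const avgMs = mean(samples);           // 6.98 ms: looks healthy
const p50Ms = percentile(samples, 50); // 1 ms
const p99Ms = percentile(samples, 99); // 300 ms: the tail the average hides
```

With these numbers the average stays under 7ms while P99 is 300ms, which is the situation the note above warns about.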

When should you avoid it?

When you always need the most up-to-date data
When you cannot determine a cache invalidation criterion
When writes are very frequent and read reuse is low
When each piece of data has different TTL and consistency requirements, but you are forced to apply a single policy across all of them
When a cache failure can cascade into a full system outage
When cache keys do not properly reflect user permission or tenant boundaries

Pay particular attention to permissions and personal data. Using only a key like user:{id} omits the tenant, role, locale, and permission scope, which can cause the wrong data to be shown to a different user.

Security and Permission Boundaries

Because Cache-Aside has the application construct keys directly, any permission boundary that is missing from the key immediately becomes a data exposure risk. The following are commonly recommended rules to follow.

Do not allow keys without a tenant ID. In a multi-tenant setup, the key must always include the tenant.
Do not cache permissions using only a user ID. Include role, scope, locale, and version in the key.
Use the permission cache only as a hint, not for final authorization decisions; for sensitive determinations, re-verify against the authorization server or database.
Review whether payloads containing personally identifiable information (PII) should be encrypted, and avoid writing PII to cache logs or traces.
Protect Redis access itself. Minimize permissions with ACLs and wrap the transport layer with TLS.

In particular, events that reduce privileges, such as permission revocation, organizational transfers, subscription cancellations, or payment cancellations, must trigger immediate invalidation rather than waiting for TTL expiry. With permission caches, the real danger is not an unexpected privilege gain but a delay in revoking access.

The key point is that just as the cache is fast, incorrect boundaries leak just as fast. Never forget that the key itself is the access control boundary.

In fact, most program design ultimately comes down to how you define boundaries abstractly, and in this case, because the key is a form of permission, the important question is how precisely you define the scope of that permission.
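One way to make these key rules hard to bypass is to build permission keys through a single function that refuses incomplete input. The sketch below is a hypothetical illustration: the field names, the key layout, and the choice to reject ":" inside segments (so that two different tuples cannot produce the same key) are assumptions, not rules stated above.

```typescript
// Every dimension that affects what the user may see is part of the key.
interface PermissionKeyParts {
  tenantId: string;
  userId: string;
  role: string;
  scope: string;
  locale: string;
  version: number;
}

function buildPermissionCacheKey(parts: PermissionKeyParts): string {
  const segments = [parts.tenantId, parts.userId, parts.role, parts.scope, parts.locale];
  for (const segment of segments) {
    // Reject empty segments (a key without a tenant must never be built)
    // and the delimiter itself, which could make distinct inputs collide.
    if (segment.trim() === "" || segment.indexOf(":") !== -1) {
      throw new Error("invalid cache key segment");
    }
  }
  return (
    `tenant:${parts.tenantId}:user:${parts.userId}` +
    `:perm:${parts.role}:${parts.scope}:${parts.locale}:v${parts.version}`
  );
}

const exampleKey = buildPermissionCacheKey({
  tenantId: "acme",
  userId: "42",
  role: "admin",
  scope: "billing",
  locale: "en-US",
  version: 1,
});
// exampleKey === "tenant:acme:user:42:perm:admin:billing:en-US:v1"
```

Because the tenant is a required, validated field, the same user ID in two tenants can never map to the same cache entry.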

Checklist

Things to verify before adding Cache-Aside:

Is this data queried repeatedly?
Is the acceptable stale data window clearly defined?
Does the key include the domain, identifier, query shape, and version?
Has it been decided which keys to evict after a write?
Is there a strategy in place to prevent a cache stampede?
Is a DB fallback available when the cache fails?
Can you monitor cache hit ratio, miss ratio, and latency?
Are personal data or authorization boundaries leaking into cached values?

The checklist item "Can you monitor hit ratio, miss ratio, and latency?" really comes down to how you observe them. Because Cache-Aside gives the application full control, looking only at Redis server metrics makes it difficult to tell which endpoints or business logic are punching through the cache and creating database load. By introducing distributed tracing such as OpenTelemetry[14], creating explicit Span instances around cache lookup operations, recording hits and misses as tags, and correlating those spans with the underlying DB query traces, you can pinpoint exactly which code paths are generating misses that hit the database. Combining infrastructure metrics (server-level hit ratio) with application traces (per-endpoint hit/miss) is what produces the level of observability that matches the control responsibility Cache-Aside places on the application.

Quick Summary

Cache-Aside is a way to make the database feel faster, but it is fundamentally a pattern that trades read throughput for consistency risk. The performance gains come at the cost of mandatory attention to invalidation, TTL, failure handling, and authorization key boundaries.

References

[1] Microsoft Azure Architecture Center. Cache-Aside pattern

[2] AWS. Caching Best Practices: Lazy caching

[3] AWS Whitepaper. Database Caching Strategies Using Redis: Cache-Aside. Note that some AWS whitepapers may be marked as historical references; treat this as a conceptual reference accordingly.

[4] Redis. Caching solutions: cache aside

[5] Microsoft Azure Architecture Center. Caching guidance

[6] Vattani, A., Chierichetti, F., Lowenstein, K. Optimal Probabilistic Cache Stampede Prevention. PVLDB 8(8), 886-897, 2015. (PER / XFetch)

[7] Redis. Scripting with Lua (EVAL): Atomic script execution

[8] Redis. Key eviction (maxmemory-policy)

[9] Redis. Memory optimization (activedefrag, mem_fragmentation_ratio)

[10] Redis. Redis cluster specification: hash tags and key slots

[11] Redis. CLUSTER KEYSLOT

[12] Redis. Distributed Locks with Redis (Redlock): tokens and safe release

[13] Kleppmann, M. How to do distributed locking. Discusses the limitations of distributed locks and the fencing token approach.

[14] OpenTelemetry. Traces (Span, attributes, context propagation)

[15] Microsoft Learn. System.Threading.Channels (integration with BackgroundService)