System Design: Distributed Cache

Design a distributed caching system covering consistent hashing, replication, eviction policies, cache stampede prevention, and monitoring strategies

Problem Statement

Design a distributed caching system similar to Memcached or Redis Cluster. The cache must store key-value pairs across multiple nodes, handle node failures gracefully, and provide sub-millisecond read latency. Caching is one of the most impactful performance optimizations in system design, and understanding how distributed caches work is essential for designing scalable systems.

Step 1: Requirements and Goals

Functional Requirements

  • Store key-value pairs with configurable TTL (time-to-live)
  • Support GET, SET, DELETE operations
  • Distribute data across multiple cache nodes
  • Handle node additions and removals with minimal data redistribution
  • Support cache eviction when memory is full
  • Provide atomic operations (increment, compare-and-swap)
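
Taken together, these operations suggest a small client interface, which later examples in this lesson refer to as CacheClient. A minimal sketch (method names are illustrative, not any specific product's API):

// Hypothetical client interface implied by the functional requirements
interface CacheClient {
  get(key: string): Promise<any | null>;
  set(key: string, value: any, opts?: { ttl?: number }): Promise<void>;
  delete(key: string): Promise<boolean>;

  // Atomic operations
  increment(key: string, by?: number): Promise<number>;
  compareAndSwap(key: string, expected: any, next: any): Promise<boolean>;
}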

Non-Functional Requirements

  • Sub-millisecond latency for reads (<1ms p99)
  • High throughput: 1 million operations per second per node
  • High availability: survive individual node failures
  • Linear scalability: adding nodes increases capacity proportionally
  • Consistency: configurable (strong or eventual)

Step 2: Cache Architecture Types

Before designing a distributed cache, understand where caching fits in the system architecture.

Cache Placement Options

Type              | Where                   | Pros                        | Cons
------------------|-------------------------|-----------------------------|------------------------------------------
Client-side       | Browser, mobile app     | Fastest, no network hop     | Limited size, hard to invalidate
CDN Cache         | Edge locations          | Low latency globally        | Static content only, invalidation delay
Application Cache | App server memory       | No network hop, fast        | Not shared, lost on restart
Distributed Cache | Dedicated cache cluster | Shared, scalable, durable   | Network hop, complexity
Database Cache    | DB query cache          | Automatic, no code changes  | Limited control, DB memory

Caching Patterns: Cache-aside, Write-through, and Write-behind

// Cache-aside (most common pattern)
// Application manages cache population
class CacheAside {
  private cache: CacheClient;
  private db: Database;

  async get(key: string): Promise<any> {
    // 1. Check cache first
    const cached = await this.cache.get(key);
    if (cached !== null) {
      return cached; // Cache HIT
    }

    // 2. Cache MISS - fetch from database
    const data = await this.db.findByKey(key);
    if (data === null) return null;

    // 3. Populate cache for next time
    await this.cache.set(key, data, { ttl: 3600 });
    return data;
  }

  async update(key: string, data: any): Promise<void> {
    // 1. Update database
    await this.db.update(key, data);
    // 2. Invalidate cache (don't update - avoids race conditions)
    await this.cache.delete(key);
  }
}

// Write-through
// Cache is always updated alongside the database
class WriteThrough {
  private cache: CacheClient;
  private db: Database;

  async get(key: string): Promise<any> {
    // Reads go to the cache, which write-through keeps up-to-date
    const cached = await this.cache.get(key);
    if (cached !== null) return cached;

    // Cold start: populate from DB
    const data = await this.db.findByKey(key);
    if (data) await this.cache.set(key, data);
    return data;
  }

  async update(key: string, data: any): Promise<void> {
    // Write to both cache and database
    await this.db.update(key, data);
    await this.cache.set(key, data);
    // Risk: if cache write fails after DB write, they're inconsistent
  }
}

// Write-behind (Write-back)
// Write to cache immediately, persist to DB asynchronously
class WriteBehind {
  private cache: CacheClient;
  private writeQueue: Queue;

  async update(key: string, data: any): Promise<void> {
    // Write to cache immediately (fast)
    await this.cache.set(key, data);
    // Queue DB write for async processing
    await this.writeQueue.enqueue({ key, data, timestamp: Date.now() });
    // Risk: data loss if cache fails before DB write
  }
}
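
The update path above is only half of write-behind: a background worker must drain the queue into the database. A minimal sketch of that worker, assuming the same Queue and Database interfaces (the dequeueBatch method, batch size, and backoff are illustrative):

// Hypothetical background worker that flushes queued writes to the DB
class WriteBehindWorker {
  constructor(private writeQueue: Queue, private db: Database) {}

  async run(): Promise<void> {
    while (true) {
      // Drain pending writes in batches to amortize DB round trips
      const batch = await this.writeQueue.dequeueBatch(100);
      if (batch.length === 0) {
        await new Promise((r) => setTimeout(r, 100)); // Idle backoff
        continue;
      }
      for (const item of batch) {
        try {
          await this.db.update(item.key, item.data);
        } catch (err) {
          // Re-enqueue on failure so the write is retried rather than lost
          await this.writeQueue.enqueue(item);
        }
      }
    }
  }
}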

Step 3: Consistent Hashing for Data Distribution

The key challenge in a distributed cache is deciding which node stores which key. Simple modular hashing (hash(key) % N) breaks down when nodes are added or removed because almost all keys get remapped. Consistent hashing solves this by minimizing key redistribution.

import { createHash } from "crypto";

class ConsistentHashRing {
  private ring = new Map<number, string>(); // position -> nodeId
  private sortedPositions: number[] = [];
  private virtualNodesPerNode: number;

  constructor(virtualNodesPerNode = 150) {
    this.virtualNodesPerNode = virtualNodesPerNode;
  }

  // Add a node to the ring
  addNode(nodeId: string): void {
    // Create virtual nodes for better distribution
    for (let i = 0; i < this.virtualNodesPerNode; i++) {
      const virtualKey = `${nodeId}:vn${i}`;
      const position = this.hash(virtualKey);
      this.ring.set(position, nodeId);
      this.sortedPositions.push(position);
    }
    this.sortedPositions.sort((a, b) => a - b);
  }

  // Remove a node from the ring
  removeNode(nodeId: string): void {
    for (let i = 0; i < this.virtualNodesPerNode; i++) {
      const virtualKey = `${nodeId}:vn${i}`;
      const position = this.hash(virtualKey);
      this.ring.delete(position);
    }
    this.sortedPositions = this.sortedPositions.filter(
      (pos) => this.ring.has(pos)
    );
  }

  // Find which node should store a given key
  getNode(key: string): string {
    if (this.ring.size === 0) throw new Error("No nodes in ring");

    const keyPosition = this.hash(key);

    // Find the first node position >= key position (clockwise)
    const idx = this.findNextPosition(keyPosition);
    return this.ring.get(this.sortedPositions[idx])!;
  }

  // Get N nodes for replication
  getNodes(key: string, replicaCount: number): string[] {
    const nodes = new Set<string>();
    const keyPosition = this.hash(key);
    let idx = this.findNextPosition(keyPosition);

    while (nodes.size < replicaCount && nodes.size < this.getUniqueNodeCount()) {
      const nodeId = this.ring.get(this.sortedPositions[idx])!;
      nodes.add(nodeId);
      idx = (idx + 1) % this.sortedPositions.length;
    }

    return Array.from(nodes);
  }

  private hash(key: string): number {
    const hash = createHash("md5").update(key).digest();
    return hash.readUInt32BE(0);
  }

  private findNextPosition(target: number): number {
    // Binary search (lower bound): first position >= target
    let lo = 0, hi = this.sortedPositions.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.sortedPositions[mid] < target) lo = mid + 1;
      else hi = mid;
    }
    // If target is past the last position, wrap to the start of the ring
    return lo < this.sortedPositions.length ? lo : 0;
  }

  private getUniqueNodeCount(): number {
    return new Set(this.ring.values()).size;
  }
}

Why Virtual Nodes?

Without virtual nodes, data distribution can be very uneven because nodes are placed at arbitrary points on the hash ring. With 100-200 virtual nodes per physical node, each physical node is responsible for many small ranges on the ring, resulting in much more even distribution.

When a node is added, it only takes over a fraction of keys from its neighbors. When a node is removed, its keys are distributed evenly among remaining nodes. This means only K/N keys need to be moved (K = total keys, N = total nodes), compared to nearly all keys with modular hashing.
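
To see the difference concretely, count how many keys change owners when a fourth node joins a three-node cluster. This sketch reuses ConsistentHashRing and the createHash import from above; exact counts vary with the hash, but the shape of the result does not:

// Compare key movement: consistent hashing vs modular hashing
function countRemappedKeys(keyCount = 100_000): void {
  const keys = Array.from({ length: keyCount }, (_, i) => `key:${i}`);

  // Consistent hashing: 3 nodes -> 4 nodes
  const ring = new ConsistentHashRing();
  ["node-a", "node-b", "node-c"].forEach((n) => ring.addNode(n));
  const before = new Map(keys.map((k) => [k, ring.getNode(k)] as [string, string]));
  ring.addNode("node-d");
  const movedRing = keys.filter((k) => ring.getNode(k) !== before.get(k)).length;

  // Modular hashing: 3 nodes -> 4 nodes
  const mod = (k: string, n: number) =>
    createHash("md5").update(k).digest().readUInt32BE(0) % n;
  const movedMod = keys.filter((k) => mod(k, 3) !== mod(k, 4)).length;

  console.log(`ring moved ${movedRing}/${keyCount} keys (~K/N, about 25%)`);
  console.log(`mod moved ${movedMod}/${keyCount} keys (about 75%)`);
}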

Step 4: Replication Strategies

To survive node failures, each key-value pair should be stored on multiple nodes. The replication factor (typically 2-3) determines how many copies exist.

class ReplicatedCache {
  private hashRing: ConsistentHashRing;
  private nodes: Map<string, CacheNode>;
  private replicaCount = 3;

  async set(key: string, value: any, ttl?: number): Promise<void> {
    const targetNodes = this.hashRing.getNodes(key, this.replicaCount);

    // Write to all replicas
    const writePromises = targetNodes.map((nodeId) =>
      this.nodes.get(nodeId)!.set(key, value, ttl)
    );

    // Wait for a quorum (majority of the replicas actually assigned;
    // targetNodes may be fewer than replicaCount in a small cluster)
    const quorum = Math.floor(targetNodes.length / 2) + 1;
    await this.waitForQuorum(writePromises, quorum);
  }

  async get(key: string): Promise<any> {
    const targetNodes = this.hashRing.getNodes(key, this.replicaCount);

    // Read from the first available replica
    for (const nodeId of targetNodes) {
      try {
        const value = await this.nodes.get(nodeId)!.get(key);
        if (value !== null) return value;
      } catch (error) {
        // Node is down, try next replica
        continue;
      }
    }

    return null; // Key not found in any replica
  }

  private async waitForQuorum(
    promises: Promise<void>[],
    quorum: number
  ): Promise<void> {
    let successes = 0;
    let failures = 0;
    const total = promises.length;

    return new Promise((resolve, reject) => {
      for (const p of promises) {
        p.then(() => {
          successes++;
          if (successes >= quorum) resolve();
        }).catch(() => {
          failures++;
          if (failures > total - quorum) {
            reject(new Error("Failed to achieve write quorum"));
          }
        });
      }
    });
  }
}
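
The sketch above assumes a CacheNode abstraction that isn't shown: a thin client for a single cache server. Its shape would be roughly:

// Hypothetical per-node client used by ReplicatedCache
interface CacheNode {
  set(key: string, value: any, ttl?: number): Promise<void>;
  get(key: string): Promise<any | null>;
  delete(key: string): Promise<boolean>;
}

Note the asymmetry: writes wait for a majority quorum, while reads return from the first healthy replica. That favors read latency over consistency; a read that lands on a replica that missed the latest write can return stale data until the write propagates.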

Step 5: Eviction Policies

When cache memory is full, old entries must be removed to make room for new ones. The eviction policy determines which entries to remove.

Eviction Policy Comparison

Policy    | Description                 | Best For                         | Overhead
----------|-----------------------------|----------------------------------|-----------------------------
LRU       | Evict least recently used   | General purpose                  | Medium (doubly linked list)
LFU       | Evict least frequently used | Frequency-skewed workloads       | Higher (frequency counters)
FIFO      | Evict oldest entry          | Simple use cases                 | Low (queue)
Random    | Evict a random entry        | When access patterns are uniform | Very low
TTL-based | Evict expired entries first | Time-sensitive data              | Low (expiry heap)

// LRU Cache implementation using a doubly linked list + hash map
class LRUCache<K, V> {
  private capacity: number;
  private cache = new Map<K, { value: V; node: DLLNode<K> }>();
  private dll: DoublyLinkedList<K>; // Most recent at head, least recent at tail

  constructor(capacity: number) {
    this.capacity = capacity;
    this.dll = new DoublyLinkedList<K>();
  }

  get(key: K): V | null {
    const entry = this.cache.get(key);
    if (!entry) return null;

    // Move to front (most recently used)
    this.dll.moveToFront(entry.node);
    return entry.value;
  }

  set(key: K, value: V): void {
    if (this.cache.has(key)) {
      // Update existing
      const entry = this.cache.get(key)!;
      entry.value = value;
      this.dll.moveToFront(entry.node);
      return;
    }

    // Evict if at capacity
    if (this.cache.size >= this.capacity) {
      const evicted = this.dll.removeTail(); // Least recently used
      if (evicted) {
        this.cache.delete(evicted.key);
      }
    }

    // Add new entry
    const node = this.dll.addToFront(key);
    this.cache.set(key, { value, node });
  }

  delete(key: K): boolean {
    const entry = this.cache.get(key);
    if (!entry) return false;
    this.dll.remove(entry.node);
    this.cache.delete(key);
    return true;
  }
}
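
The class above depends on DLLNode and DoublyLinkedList helpers that are not shown. A minimal implementation that makes the example self-contained:

// Minimal doubly linked list used by LRUCache above
class DLLNode<K> {
  prev: DLLNode<K> | null = null;
  next: DLLNode<K> | null = null;
  constructor(public key: K) {}
}

class DoublyLinkedList<K> {
  private head: DLLNode<K> | null = null; // Most recently used
  private tail: DLLNode<K> | null = null; // Least recently used

  addToFront(key: K): DLLNode<K> {
    const node = new DLLNode(key);
    node.next = this.head;
    if (this.head) this.head.prev = node;
    this.head = node;
    if (!this.tail) this.tail = node;
    return node;
  }

  remove(node: DLLNode<K>): void {
    if (node.prev) node.prev.next = node.next;
    else this.head = node.next;
    if (node.next) node.next.prev = node.prev;
    else this.tail = node.prev;
    node.prev = node.next = null;
  }

  moveToFront(node: DLLNode<K>): void {
    if (node === this.head) return;
    this.remove(node);
    node.next = this.head;
    if (this.head) this.head.prev = node;
    this.head = node;
    if (!this.tail) this.tail = node;
  }

  removeTail(): DLLNode<K> | null {
    if (!this.tail) return null;
    const node = this.tail;
    this.remove(node);
    return node;
  }
}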

Step 6: Cache Stampede Prevention

A cache stampede (also called thundering herd) occurs when a popular cache key expires and hundreds of concurrent requests all miss the cache simultaneously, overwhelming the database with identical queries.

Stampede Scenarios

  • Key Expiration: A hot key's TTL expires, and thousands of requests hit the database
  • Cache Restart: A cache node restarts with an empty cache (cold start)
  • Deployment: New code that caches different keys causes a wave of cache misses

class StampedeProtectedCache {
  private cache: CacheClient;
  private db: Database;
  private locks: RedisClient; // Separate Redis for distributed locks

  // Strategy 1: Distributed Locking
  // Only one request fetches from DB, others wait for the cache to be populated
  async getWithLock(key: string): Promise<any> {
    const cached = await this.cache.get(key);
    if (cached !== null) return cached; // Truthiness would treat 0 or "" as misses

    const lockKey = `lock:${key}`;
    const acquired = await this.locks.set(lockKey, "1", "EX", 10, "NX");

    if (acquired) {
      try {
        // We got the lock - fetch from DB and populate cache
        const data = await this.db.findByKey(key);
        await this.cache.set(key, data, { ttl: 3600 });
        return data;
      } finally {
        await this.locks.del(lockKey);
      }
    } else {
      // Another request is fetching - wait and retry
      await this.sleep(50);
      return this.getWithLock(key); // Retry (with backoff in production)
    }
  }

  // Strategy 2: Stale-While-Revalidate
  // Serve stale data while refreshing in the background
  async getWithStaleRefresh(key: string): Promise<any> {
    const entry = await this.cache.getWithMeta(key);

    if (entry && !entry.isExpired) {
      return entry.value; // Fresh cache hit
    }

    if (entry && entry.isExpired) {
      // Serve stale data immediately
      // Trigger background refresh
      this.refreshInBackground(key);
      return entry.value; // Return stale data (better than waiting)
    }

    // Complete cache miss - must fetch
    return this.fetchAndCache(key);
  }

  // Strategy 3: Probabilistic Early Expiration
  // Randomly refresh keys before they expire to smooth out expiration
  async getWithEarlyExpiration(key: string, baseTTL: number): Promise<any> {
    const entry = await this.cache.getWithTTL(key);

    if (entry) {
      const remainingTTL = entry.ttl;
      const shouldRefreshEarly = Math.random() < Math.exp(-remainingTTL / (baseTTL * 0.1));

      if (shouldRefreshEarly) {
        // Probabilistically refresh before expiration
        this.refreshInBackground(key);
      }
      return entry.value;
    }

    return this.fetchAndCache(key);
  }

  private async fetchAndCache(key: string): Promise<any> {
    const data = await this.db.findByKey(key);
    await this.cache.set(key, data, { ttl: 3600 });
    return data;
  }

  private async refreshInBackground(key: string): Promise<void> {
    // Fire and forget: errors are logged, never surfaced to the caller
    this.fetchAndCache(key).catch(console.error);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

Step 7: Monitoring and Metrics

A distributed cache requires careful monitoring to ensure it is performing optimally and to detect issues before they impact users.

Key Metrics to Monitor

  • Cache Hit Ratio: Percentage of requests served from cache. Target: >95% for most workloads. Low hit ratio indicates wrong TTL, insufficient memory, or poor key design.
  • Latency (p50, p99, p999): Should be sub-millisecond. Spikes indicate network issues or hotspots.
  • Memory Usage: Track per-node memory usage. Alert when approaching capacity to prevent excessive evictions.
  • Eviction Rate: High eviction rate means the cache is too small or TTLs are wrong.
  • Connection Count: Number of client connections per node. Uneven distribution indicates routing issues.
  • Key Distribution: Ensure keys are evenly distributed across nodes (no hotspots).
  • Replication Lag: Time for a write to propagate to replicas. High lag risks serving stale data.

interface CacheStats {
  hitRatio: number;
  totalRequests: number;
  p50Latency: number;
  p99Latency: number;
  p999Latency: number;
}

class CacheMetrics {
  private hits = 0;
  private misses = 0;
  private latencies: number[] = [];

  recordHit(latencyMs: number): void {
    this.hits++;
    this.latencies.push(latencyMs);
  }

  recordMiss(latencyMs: number): void {
    this.misses++;
    this.latencies.push(latencyMs);
  }

  getStats(): CacheStats {
    const total = this.hits + this.misses;
    const sorted = [...this.latencies].sort((a, b) => a - b);

    return {
      hitRatio: total > 0 ? this.hits / total : 0,
      totalRequests: total,
      p50Latency: sorted[Math.floor(sorted.length * 0.5)] || 0,
      p99Latency: sorted[Math.floor(sorted.length * 0.99)] || 0,
      p999Latency: sorted[Math.floor(sorted.length * 0.999)] || 0,
    };
  }

  // Alert conditions
  shouldAlert(stats: CacheStats): string[] {
    const alerts: string[] = [];
    if (stats.hitRatio < 0.90) {
      alerts.push(`Low cache hit ratio: ${(stats.hitRatio * 100).toFixed(1)}%`);
    }
    if (stats.p99Latency > 5) {
      alerts.push(`High p99 latency: ${stats.p99Latency}ms`);
    }
    return alerts;
  }
}
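
Wiring the metrics into the read path is straightforward. A sketch of an instrumented cache read, assuming the CacheClient interface from earlier (perf_hooks gives sub-millisecond timing, which Date.now() cannot):

// Hypothetical instrumentation of a cache-aside read path
import { performance } from "perf_hooks";

const metrics = new CacheMetrics();

async function instrumentedGet(cache: CacheClient, key: string): Promise<any> {
  const start = performance.now();
  const value = await cache.get(key);
  const elapsedMs = performance.now() - start;

  if (value !== null) metrics.recordHit(elapsedMs);
  else metrics.recordMiss(elapsedMs);

  return value;
}

// Periodically export stats and surface alert conditions
setInterval(() => {
  const stats = metrics.getStats();
  for (const alert of metrics.shouldAlert(stats)) console.warn(alert);
}, 60_000);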

Architecture Summary

Component           | Design Decision
--------------------|------------------------------------------------------------------------
Data Distribution   | Consistent hashing with 150 virtual nodes per physical node
Replication         | 3 replicas, quorum writes (2/3), read from nearest
Eviction Policy     | LRU with TTL-based expiration (Redis's allkeys-lru policy)
Stampede Prevention | Distributed locking for cold keys, stale-while-revalidate for hot keys
Caching Pattern     | Cache-aside for reads, invalidation on writes
Serialization       | MessagePack or Protocol Buffers (faster than JSON)
Monitoring          | Hit ratio, latency percentiles, eviction rate, memory usage per node

Interview Tips

  • Start with consistent hashing: This is the core concept interviewers want to hear about
  • Explain virtual nodes: Show you understand why raw consistent hashing leads to uneven distribution
  • Discuss cache invalidation strategies: Cache-aside with invalidation on writes is the safest default
  • Always mention cache stampede: This shows production experience and depth of understanding
  • Compare Redis vs Memcached: Redis has data structures, persistence, and replication; Memcached is simpler and multi-threaded
