Problem Statement
Design a distributed caching system similar to Memcached or Redis Cluster. The cache must store key-value pairs across multiple nodes, handle node failures gracefully, and provide sub-millisecond read latency. Caching is one of the most impactful performance optimizations in system design, and understanding how distributed caches work is essential for designing scalable systems.
Step 1: Requirements and Goals
Functional Requirements
- Store key-value pairs with configurable TTL (time-to-live)
- Support GET, SET, DELETE operations
- Distribute data across multiple cache nodes
- Handle node additions and removals with minimal data redistribution
- Support cache eviction when memory is full
- Provide atomic operations such as increment and compare-and-swap (see the interface sketch after these lists)
Non-Functional Requirements
- Sub-millisecond latency for reads (<1ms p99)
- High throughput: 1 million operations per second per node
- High availability: survive individual node failures
- Linear scalability: adding nodes increases capacity proportionally
- Consistency: configurable (strong or eventual)
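Taken together, the functional requirements imply a client API along these lines (a sketch for orientation; the interface and option names are assumptions, not any particular product's API):
// Hypothetical client interface implied by the functional requirements
interface SetOptions {
  ttl?: number; // Seconds until expiration; omit for no expiry
}

interface DistributedCacheClient {
  get(key: string): Promise<any | null>;
  set(key: string, value: any, options?: SetOptions): Promise<void>;
  delete(key: string): Promise<boolean>;
  // Atomic operations
  increment(key: string, by?: number): Promise<number>;
  // Write `next` only if the current value equals `expected`
  compareAndSwap(key: string, expected: any, next: any): Promise<boolean>;
}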
Step 2: Cache Architecture Types
Before designing a distributed cache, understand where caching fits in the system architecture.
Cache Placement Options
| Type | Where | Pros | Cons |
|---|---|---|---|
| Client-side | Browser, mobile app | Fastest, no network hop | Limited size, hard to invalidate |
| CDN Cache | Edge locations | Low latency globally | Mostly static content, invalidation delay |
| Application Cache | App server memory | No network hop, fast | Not shared, lost on restart |
| Distributed Cache | Dedicated cache cluster | Shared, scalable, survives app restarts | Network hop, complexity |
| Database Cache | DB query cache | Automatic, no code changes | Limited control, DB memory |
Caching Patterns: Cache-aside (Lazy Loading), Write-through, and Write-behind
// Cache-aside (most common pattern)
// Application manages cache population
class CacheAside {
private cache: CacheClient;
private db: Database;
async get(key: string): Promise<any> {
// 1. Check cache first
const cached = await this.cache.get(key);
if (cached !== null) {
return cached; // Cache HIT
}
// 2. Cache MISS - fetch from database
const data = await this.db.findByKey(key);
if (data === null) return null;
// 3. Populate cache for next time
await this.cache.set(key, data, { ttl: 3600 });
return data;
}
async update(key: string, data: any): Promise<void> {
// 1. Update database
await this.db.update(key, data);
// 2. Invalidate cache (don't update - avoids race conditions)
await this.cache.delete(key);
}
}
// Write-through
// Cache is always updated alongside the database
class WriteThrough {
private cache: CacheClient;
private db: Database;
async get(key: string): Promise<any> {
// Always read from cache (it's always up-to-date)
const cached = await this.cache.get(key);
if (cached !== null) return cached;
// Cold start: populate from DB
const data = await this.db.findByKey(key);
if (data) await this.cache.set(key, data);
return data;
}
async update(key: string, data: any): Promise<void> {
// Write to both cache and database
await this.db.update(key, data);
await this.cache.set(key, data);
// Risk: if cache write fails after DB write, they're inconsistent
}
}
// Write-behind (Write-back)
// Write to cache immediately, persist to DB asynchronously
class WriteBehind {
private cache: CacheClient;
private writeQueue: Queue;
async update(key: string, data: any): Promise<void> {
// Write to cache immediately (fast)
await this.cache.set(key, data);
// Queue DB write for async processing
await this.writeQueue.enqueue({ key, data, timestamp: Date.now() });
// Risk: data loss if cache fails before DB write
}
}
Step 3: Consistent Hashing for Data Distribution
The key challenge in a distributed cache is deciding which node stores which key. Simple modular hashing (hash(key) % N) breaks down when nodes are added or removed because almost all keys get remapped. Consistent hashing solves this by minimizing key redistribution.
import { createHash } from "crypto";
class ConsistentHashRing {
private ring = new Map<number, string>(); // position -> nodeId
private sortedPositions: number[] = [];
private virtualNodesPerNode: number;
constructor(virtualNodesPerNode = 150) {
this.virtualNodesPerNode = virtualNodesPerNode;
}
// Add a node to the ring
addNode(nodeId: string): void {
// Create virtual nodes for better distribution
for (let i = 0; i < this.virtualNodesPerNode; i++) {
const virtualKey = `${nodeId}:vn${i}`;
const position = this.hash(virtualKey);
this.ring.set(position, nodeId);
this.sortedPositions.push(position);
}
this.sortedPositions.sort((a, b) => a - b);
}
// Remove a node from the ring
removeNode(nodeId: string): void {
for (let i = 0; i < this.virtualNodesPerNode; i++) {
const virtualKey = `${nodeId}:vn${i}`;
const position = this.hash(virtualKey);
this.ring.delete(position);
}
this.sortedPositions = this.sortedPositions.filter(
(pos) => this.ring.has(pos)
);
}
// Find which node should store a given key
getNode(key: string): string {
if (this.ring.size === 0) throw new Error("No nodes in ring");
const keyPosition = this.hash(key);
// Find the first node position >= key position (clockwise)
const idx = this.findNextPosition(keyPosition);
return this.ring.get(this.sortedPositions[idx])!;
}
// Get N nodes for replication
getNodes(key: string, replicaCount: number): string[] {
const nodes = new Set<string>();
const keyPosition = this.hash(key);
let idx = this.findNextPosition(keyPosition);
while (nodes.size < replicaCount && nodes.size < this.getUniqueNodeCount()) {
const nodeId = this.ring.get(this.sortedPositions[idx])!;
nodes.add(nodeId);
idx = (idx + 1) % this.sortedPositions.length;
}
return Array.from(nodes);
}
private hash(key: string): number {
const hash = createHash("md5").update(key).digest();
return hash.readUInt32BE(0);
}
  private findNextPosition(target: number): number {
    // Binary search (lower bound) for the first position >= target
    let lo = 0, hi = this.sortedPositions.length; // hi is exclusive
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.sortedPositions[mid] < target) lo = mid + 1;
      else hi = mid;
    }
    // If target is past the last position, wrap around to the start of the ring
    return lo < this.sortedPositions.length ? lo : 0;
  }
private getUniqueNodeCount(): number {
return new Set(this.ring.values()).size;
}
}
Why Virtual Nodes?
Without virtual nodes, data distribution can be very uneven because nodes are placed at arbitrary points on the hash ring. With 100-200 virtual nodes per physical node, each physical node is responsible for many small ranges on the ring, resulting in much more even distribution.
When a node is added, it only takes over a fraction of keys from its neighbors. When a node is removed, its keys are distributed evenly among remaining nodes. This means only K/N keys need to be moved (K = total keys, N = total nodes), compared to nearly all keys with modular hashing.
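The K/N claim is easy to verify empirically: map a batch of keys, add a node, and count how many keys move. A sketch using the ConsistentHashRing above (node names and key count are arbitrary):
// Sketch: measure key movement when growing a 3-node ring to 4 nodes
const ring = new ConsistentHashRing(150);
["node-a", "node-b", "node-c"].forEach((n) => ring.addNode(n));

const keys = Array.from({ length: 10_000 }, (_, i) => `user:${i}`);
const before = new Map(keys.map((k) => [k, ring.getNode(k)] as const));

ring.addNode("node-d");
const moved = keys.filter((k) => ring.getNode(k) !== before.get(k)).length;

// Expect roughly 25% of keys (K/N with N = 4) to move, all onto node-d.
// With hash(key) % N, going from 3 to 4 nodes would remap about 75% of keys.
console.log(`${moved} of ${keys.length} keys moved`);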
Step 4: Replication Strategies
To survive node failures, each key-value pair should be stored on multiple nodes. The replication factor (typically 2-3) determines how many copies exist.
class ReplicatedCache {
  private hashRing: ConsistentHashRing;
  private nodes: Map<string, CacheNode>;
  private replicaCount = 3;
  constructor(nodes: Map<string, CacheNode>) {
    this.nodes = nodes;
    this.hashRing = new ConsistentHashRing();
    for (const nodeId of nodes.keys()) this.hashRing.addNode(nodeId);
  }
async set(key: string, value: any, ttl?: number): Promise<void> {
const targetNodes = this.hashRing.getNodes(key, this.replicaCount);
// Write to all replicas
const writePromises = targetNodes.map((nodeId) =>
this.nodes.get(nodeId)!.set(key, value, ttl)
);
// Wait for quorum (majority) to acknowledge
const quorum = Math.floor(this.replicaCount / 2) + 1;
await this.waitForQuorum(writePromises, quorum);
}
async get(key: string): Promise<any> {
const targetNodes = this.hashRing.getNodes(key, this.replicaCount);
// Read from the first available replica
for (const nodeId of targetNodes) {
try {
const value = await this.nodes.get(nodeId)!.get(key);
if (value !== null) return value;
} catch (error) {
// Node is down, try next replica
continue;
}
}
return null; // Key not found in any replica
}
private async waitForQuorum(
promises: Promise<void>[],
quorum: number
): Promise<void> {
let successes = 0;
let failures = 0;
const total = promises.length;
return new Promise((resolve, reject) => {
for (const p of promises) {
p.then(() => {
successes++;
if (successes >= quorum) resolve();
}).catch(() => {
failures++;
if (failures > total - quorum) {
reject(new Error("Failed to achieve write quorum"));
}
});
}
});
}
}
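The CacheNode type above is left abstract. A minimal in-memory stand-in is sketched below (an assumption for illustration; a real node would be a network client talking to a remote cache process):
// Minimal in-memory stand-in for CacheNode, enough to exercise ReplicatedCache
class CacheNode {
  private store = new Map<string, { value: any; expiresAt: number | null }>();
  async set(key: string, value: any, ttl?: number): Promise<void> {
    const expiresAt = ttl ? Date.now() + ttl * 1000 : null;
    this.store.set(key, { value, expiresAt });
  }
  async get(key: string): Promise<any> {
    const entry = this.store.get(key);
    if (!entry) return null;
    if (entry.expiresAt !== null && entry.expiresAt < Date.now()) {
      this.store.delete(key); // Lazy expiration on read
      return null;
    }
    return entry.value;
  }
}
// Usage: new ReplicatedCache(new Map([["n1", new CacheNode()], ["n2", new CacheNode()], ["n3", new CacheNode()]]))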
Step 5: Eviction Policies
When cache memory is full, old entries must be removed to make room for new ones. The eviction policy determines which entries to remove.
Eviction Policy Comparison
| Policy | Description | Best For | Overhead |
|---|---|---|---|
| LRU | Evict least recently used | General purpose | Medium (doubly linked list) |
| LFU | Evict least frequently used | Frequency-skewed workloads | Higher (frequency counters) |
| FIFO | Evict oldest entry | Simple use cases | Low (queue) |
| Random | Evict a random entry | When access patterns are uniform | Very low |
| TTL-based | Evict expired entries first | Time-sensitive data | Low (expiry heap) |
// LRU Cache implementation using a doubly linked list + hash map
class LRUCache<K, V> {
private capacity: number;
private cache = new Map<K, { value: V; node: DLLNode<K> }>();
private dll: DoublyLinkedList<K>; // Most recent at head, least recent at tail
constructor(capacity: number) {
this.capacity = capacity;
    this.dll = new DoublyLinkedList<K>();
}
get(key: K): V | null {
const entry = this.cache.get(key);
if (!entry) return null;
// Move to front (most recently used)
this.dll.moveToFront(entry.node);
return entry.value;
}
set(key: K, value: V): void {
if (this.cache.has(key)) {
// Update existing
const entry = this.cache.get(key)!;
entry.value = value;
this.dll.moveToFront(entry.node);
return;
}
// Evict if at capacity
if (this.cache.size >= this.capacity) {
const evicted = this.dll.removeTail(); // Least recently used
if (evicted) {
this.cache.delete(evicted.key);
}
}
// Add new entry
const node = this.dll.addToFront(key);
this.cache.set(key, { value, node });
}
delete(key: K): boolean {
const entry = this.cache.get(key);
if (!entry) return false;
this.dll.remove(entry.node);
this.cache.delete(key);
return true;
}
}
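The LRUCache above depends on DLLNode and DoublyLinkedList, which were not shown; a minimal sketch of that helper:
// Minimal doubly linked list backing the LRUCache above
class DLLNode<K> {
  prev: DLLNode<K> | null = null;
  next: DLLNode<K> | null = null;
  constructor(public key: K) {}
}
class DoublyLinkedList<K> {
  private head: DLLNode<K> | null = null; // Most recently used
  private tail: DLLNode<K> | null = null; // Least recently used
  addToFront(key: K): DLLNode<K> {
    const node = new DLLNode(key);
    node.next = this.head;
    if (this.head) this.head.prev = node;
    this.head = node;
    if (!this.tail) this.tail = node;
    return node;
  }
  remove(node: DLLNode<K>): void {
    if (node.prev) node.prev.next = node.next;
    else this.head = node.next;
    if (node.next) node.next.prev = node.prev;
    else this.tail = node.prev;
    node.prev = node.next = null;
  }
  removeTail(): DLLNode<K> | null {
    if (!this.tail) return null;
    const node = this.tail;
    this.remove(node); // Detach the least recently used node
    return node;
  }
  moveToFront(node: DLLNode<K>): void {
    this.remove(node);
    node.next = this.head;
    if (this.head) this.head.prev = node;
    this.head = node;
    if (!this.tail) this.tail = node;
  }
}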
Step 6: Cache Stampede Prevention
A cache stampede (also called thundering herd) occurs when a popular cache key expires and hundreds of concurrent requests all miss the cache simultaneously, overwhelming the database with identical queries.
Stampede Scenarios
- Key Expiration: A hot key's TTL expires, and thousands of requests hit the database
- Cache Restart: A cache node restarts with an empty cache (cold start)
- Deployment: New code that caches different keys causes a wave of cache misses
class StampedeProtectedCache {
private cache: CacheClient;
private db: Database;
private locks: RedisClient; // Separate Redis for distributed locks
// Strategy 1: Distributed Locking
// Only one request fetches from DB, others wait for the cache to be populated
async getWithLock(key: string): Promise<any> {
const cached = await this.cache.get(key);
    if (cached !== null) return cached;
const lockKey = `lock:${key}`;
const acquired = await this.locks.set(lockKey, "1", "EX", 10, "NX");
if (acquired) {
try {
// We got the lock - fetch from DB and populate cache
const data = await this.db.findByKey(key);
await this.cache.set(key, data, { ttl: 3600 });
return data;
} finally {
await this.locks.del(lockKey);
}
} else {
// Another request is fetching - wait and retry
await this.sleep(50);
return this.getWithLock(key); // Retry (with backoff in production)
}
}
// Strategy 2: Stale-While-Revalidate
// Serve stale data while refreshing in the background
async getWithStaleRefresh(key: string): Promise<any> {
const entry = await this.cache.getWithMeta(key);
if (entry && !entry.isExpired) {
return entry.value; // Fresh cache hit
}
if (entry && entry.isExpired) {
// Serve stale data immediately
// Trigger background refresh
this.refreshInBackground(key);
return entry.value; // Return stale data (better than waiting)
}
// Complete cache miss - must fetch
return this.fetchAndCache(key);
}
// Strategy 3: Probabilistic Early Expiration
// Randomly refresh keys before they expire to smooth out expiration
async getWithEarlyExpiration(key: string, baseTTL: number): Promise<any> {
const entry = await this.cache.getWithTTL(key);
if (entry) {
const remainingTTL = entry.ttl;
const shouldRefreshEarly = Math.random() < Math.exp(-remainingTTL / (baseTTL * 0.1));
if (shouldRefreshEarly) {
// Probabilistically refresh before expiration
this.refreshInBackground(key);
}
return entry.value;
}
return this.fetchAndCache(key);
}
  private async refreshInBackground(key: string): Promise<void> {
    // Fire and forget
    this.fetchAndCache(key).catch(console.error);
  }
  private async fetchAndCache(key: string): Promise<any> {
    const data = await this.db.findByKey(key);
    await this.cache.set(key, data, { ttl: 3600 });
    return data;
  }
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
Step 7: Monitoring and Metrics
A distributed cache requires careful monitoring to ensure it is performing optimally and to detect issues before they impact users.
Key Metrics to Monitor
- Cache Hit Ratio: Percentage of requests served from cache. Target: >95% for most workloads. A low hit ratio indicates TTLs that are too short, insufficient memory, or poor key design.
- Latency (p50, p99, p999): Should be sub-millisecond. Spikes indicate network issues or hotspots.
- Memory Usage: Track per-node memory usage. Alert when approaching capacity to prevent excessive evictions.
- Eviction Rate: A high eviction rate means the cache is undersized or TTLs are too long, forcing entries out before they expire.
- Connection Count: Number of client connections per node. Uneven distribution indicates routing issues.
- Key Distribution: Ensure keys are evenly distributed across nodes (no hotspots).
- Replication Lag: Time for a write to propagate to replicas. High lag risks serving stale data.
interface CacheStats {
  hitRatio: number;
  totalRequests: number;
  p50Latency: number;
  p99Latency: number;
  p999Latency: number;
}
class CacheMetrics {
private hits = 0;
private misses = 0;
private latencies: number[] = [];
recordHit(latencyMs: number): void {
this.hits++;
this.latencies.push(latencyMs);
}
recordMiss(latencyMs: number): void {
this.misses++;
this.latencies.push(latencyMs);
}
getStats(): CacheStats {
const total = this.hits + this.misses;
const sorted = [...this.latencies].sort((a, b) => a - b);
return {
hitRatio: total > 0 ? this.hits / total : 0,
totalRequests: total,
p50Latency: sorted[Math.floor(sorted.length * 0.5)] || 0,
p99Latency: sorted[Math.floor(sorted.length * 0.99)] || 0,
p999Latency: sorted[Math.floor(sorted.length * 0.999)] || 0,
};
}
// Alert conditions
shouldAlert(stats: CacheStats): string[] {
const alerts: string[] = [];
if (stats.hitRatio < 0.90) {
alerts.push(`Low cache hit ratio: ${(stats.hitRatio * 100).toFixed(1)}%`);
}
if (stats.p99Latency > 5) {
alerts.push(`High p99 latency: ${stats.p99Latency}ms`);
}
return alerts;
}
}
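Wiring these metrics into a cache-aside read path, plus a periodic alert check, might look like the following sketch (reusing the CacheClient and Database types assumed earlier):
// Sketch: instrumenting reads with CacheMetrics
const metrics = new CacheMetrics();

async function instrumentedGet(cache: CacheClient, db: Database, key: string): Promise<any> {
  const start = performance.now();
  const cached = await cache.get(key);
  if (cached !== null) {
    metrics.recordHit(performance.now() - start);
    return cached;
  }
  const data = await db.findByKey(key);
  metrics.recordMiss(performance.now() - start);
  return data;
}

// Periodically evaluate alert conditions
setInterval(() => {
  for (const alert of metrics.shouldAlert(metrics.getStats())) console.warn(alert);
}, 60_000);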
Architecture Summary
| Component | Design Decision |
|---|---|
| Data Distribution | Consistent hashing with 150 virtual nodes per physical node |
| Replication | 3 replicas, quorum writes (2/3), read from first available replica |
| Eviction Policy | LRU with TTL-based expiration (Redis default: allkeys-lru) |
| Stampede Prevention | Distributed locking for cold keys, stale-while-revalidate for hot keys |
| Caching Pattern | Cache-aside for reads, invalidation on writes |
| Serialization | MessagePack or Protocol Buffers (faster than JSON) |
| Monitoring | Hit ratio, latency percentiles, eviction rate, memory usage per node |
Interview Tips
- Start with consistent hashing: This is the core concept interviewers want to hear about
- Explain virtual nodes: Show you understand why raw consistent hashing leads to uneven distribution
- Discuss cache invalidation strategies: Cache-aside with invalidation on writes is the safest default
- Always mention cache stampede: This shows production experience and depth of understanding
- Compare Redis vs Memcached: Redis has data structures, persistence, and replication; Memcached is simpler and multi-threaded