Why Rate Limiting Is Important
Rate limiting controls the number of requests a client can make to an API within a given time period. It is a critical defense mechanism that protects your services from being overwhelmed, whether by malicious attacks (DDoS), misbehaving clients, or simple bugs that cause excessive API calls.
Why Rate Limit?
- Prevent abuse: Stop malicious users from overwhelming your API with requests (DDoS attacks, brute-force login attempts).
- Ensure fairness: Prevent a single client from monopolizing resources, ensuring all users get fair access.
- Control costs: API calls to downstream services (databases, third-party APIs) cost money. Rate limiting caps this cost.
- Maintain stability: Protect backend services from traffic spikes that could cause cascading failures.
- Comply with agreements: Enforce API usage tiers (free: 100 req/min, pro: 10,000 req/min).
Rate Limiting Algorithms
1. Token Bucket
The token bucket algorithm is one of the most widely used rate limiting approaches. Imagine a bucket that holds tokens. Tokens are added to the bucket at a fixed rate (e.g., 10 tokens per second). Each request consumes one token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity, so tokens do not accumulate infinitely.
The key advantage of the token bucket is that it allows bursts. If the bucket is full (e.g., 100 tokens), a client can make 100 requests instantly, then must wait for tokens to refill. This mirrors real-world usage patterns where users may burst briefly but maintain a low average rate.
```typescript
import type { Request, Response } from 'express';

class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number,  // Maximum burst size
    private refillRate: number, // Tokens added per second
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

  private refill(): void {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000; // seconds
    const newTokens = elapsed * this.refillRate;
    this.tokens = Math.min(this.maxTokens, this.tokens + newTokens);
    this.lastRefill = now;
  }

  tryConsume(tokens: number = 1): boolean {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true; // Request allowed
    }
    return false; // Rate limited
  }

  getTokensRemaining(): number {
    this.refill();
    return Math.floor(this.tokens);
  }
}

// Usage: 10 requests per second with a burst of 20
const bucket = new TokenBucket(20, 10);

function handleRequest(req: Request, res: Response) {
  if (!bucket.tryConsume()) {
    res.setHeader('Retry-After', '1');
    res.setHeader('X-RateLimit-Remaining', '0');
    return res.status(429).json({ error: 'Too many requests' });
  }
  res.setHeader('X-RateLimit-Remaining', String(bucket.getTokensRemaining()));
  // Process request...
}
```
2. Leaky Bucket
The leaky bucket processes requests at a fixed, constant rate — like water leaking from a bucket at a steady drip. Incoming requests are added to a queue (the bucket). If the queue is full, new requests are rejected. Requests are processed from the queue at a constant rate, regardless of how fast they arrive.
Unlike the token bucket, the leaky bucket does not allow bursts. It smooths out traffic to a constant rate. This is useful when the backend can only handle a steady stream of requests.
```typescript
class LeakyBucket {
  private queue: Array<() => void> = [];
  private processing: boolean = false;

  constructor(
    private capacity: number,   // Max queue size
    private leakRateMs: number, // Process one request every N ms
  ) {}

  tryAdd(handler: () => void): boolean {
    if (this.queue.length >= this.capacity) {
      return false; // Bucket is full - reject
    }
    this.queue.push(handler);
    this.startLeaking();
    return true;
  }

  private startLeaking(): void {
    if (this.processing) return;
    this.processing = true;
    const leak = () => {
      const handler = this.queue.shift();
      if (handler) {
        handler(); // Process the request
        setTimeout(leak, this.leakRateMs); // Fixed rate
      } else {
        this.processing = false;
      }
    };
    leak();
  }
}

// Process 1 request every 100ms (10 requests/sec), queue up to 50
const leaky = new LeakyBucket(50, 100);
```
3. Fixed Window Counter
Divide time into fixed windows (e.g., 1-minute windows) and count the number of requests in each window. If the count exceeds the limit, reject subsequent requests until the next window starts.
This is the simplest algorithm to implement but has a known problem: a burst at the end of one window and the start of the next can result in twice the allowed rate. For example, with a limit of 100 requests per minute, a client could send 100 requests at 0:59 and 100 more at 1:00 — 200 requests in 2 seconds.
```typescript
class FixedWindowCounter {
  private windows: Map<string, number> = new Map();

  constructor(
    private limit: number,
    private windowSizeMs: number,
  ) {}

  private getCurrentWindow(): string {
    const windowStart = Math.floor(Date.now() / this.windowSizeMs);
    return windowStart.toString();
  }

  tryConsume(clientId: string): boolean {
    const window = this.getCurrentWindow();
    const key = `${clientId}:${window}`;
    const currentCount = this.windows.get(key) || 0;
    if (currentCount >= this.limit) {
      return false; // Rate limited
    }
    this.windows.set(key, currentCount + 1);
    // Clean up old windows periodically
    this.cleanup();
    return true;
  }

  private cleanup(): void {
    // Match the ":<window>" suffix exactly, so that window "12"
    // does not also match keys from window "112"
    const suffix = `:${this.getCurrentWindow()}`;
    for (const [key] of this.windows) {
      if (!key.endsWith(suffix)) {
        this.windows.delete(key);
      }
    }
  }
}

// 100 requests per minute
const fixedWindow = new FixedWindowCounter(100, 60_000);
```
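The double-rate burst at a window boundary is easy to reproduce. The sketch below is a test-only variant of the counter with an injectable clock; `ClockedFixedWindow` and the `now` parameter are illustrative additions, not part of the implementation above:

```typescript
// Fixed-window counter with an injectable clock, used only to
// demonstrate the boundary burst: 2x the limit can pass within
// seconds of a window boundary.
class ClockedFixedWindow {
  private counts = new Map<number, number>();

  constructor(
    private limit: number,
    private windowSizeMs: number,
    private now: () => number, // injectable clock for the demo
  ) {}

  tryConsume(): boolean {
    const window = Math.floor(this.now() / this.windowSizeMs);
    const n = this.counts.get(window) ?? 0;
    if (n >= this.limit) return false;
    this.counts.set(window, n + 1);
    return true;
  }
}

let t = 59_000; // 0:59, one second before the minute boundary
const counter = new ClockedFixedWindow(100, 60_000, () => t);

let allowed = 0;
for (let i = 0; i < 100; i++) if (counter.tryConsume()) allowed++;
t = 60_000; // 1:00, a new window begins and the count resets
for (let i = 0; i < 100; i++) if (counter.tryConsume()) allowed++;

console.log(allowed); // 200 requests accepted in one second of simulated time
```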
4. Sliding Window Log
Keep a timestamped log of every request. When a new request comes in, remove all entries older than the window size, then count the remaining entries. If the count exceeds the limit, reject the request.
This algorithm is precise — it does not suffer from the boundary problem of fixed windows. However, it uses more memory because it stores a timestamp for every request within the window.
```typescript
class SlidingWindowLog {
  private logs: Map<string, number[]> = new Map();

  constructor(
    private limit: number,
    private windowSizeMs: number,
  ) {}

  tryConsume(clientId: string): boolean {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Get or create log for this client
    let timestamps = this.logs.get(clientId) || [];

    // Remove entries outside the window
    timestamps = timestamps.filter(ts => ts > windowStart);

    if (timestamps.length >= this.limit) {
      this.logs.set(clientId, timestamps);
      return false; // Rate limited
    }

    // Add current request
    timestamps.push(now);
    this.logs.set(clientId, timestamps);
    return true;
  }
}

// 100 requests per 60-second sliding window
const slidingLog = new SlidingWindowLog(100, 60_000);
```
5. Sliding Window Counter
The sliding window counter is a hybrid that combines the memory efficiency of fixed windows with the accuracy of sliding windows. It uses two adjacent fixed windows and calculates a weighted count based on where the current time falls within the current window.
```typescript
class SlidingWindowCounter {
  private windows: Map<string, number> = new Map();

  constructor(
    private limit: number,
    private windowSizeMs: number,
  ) {}

  tryConsume(clientId: string): boolean {
    const now = Date.now();
    const currentWindowStart = Math.floor(now / this.windowSizeMs) * this.windowSizeMs;
    const previousWindowStart = currentWindowStart - this.windowSizeMs;

    const currentKey = `${clientId}:${currentWindowStart}`;
    const previousKey = `${clientId}:${previousWindowStart}`;
    const currentCount = this.windows.get(currentKey) || 0;
    const previousCount = this.windows.get(previousKey) || 0;

    // Calculate the weight of the previous window
    // based on how far we are into the current window
    const elapsedInCurrentWindow = now - currentWindowStart;
    const previousWeight = 1 - (elapsedInCurrentWindow / this.windowSizeMs);

    // Weighted count: full current window + proportional previous window
    const estimatedCount = Math.floor(previousCount * previousWeight) + currentCount;

    if (estimatedCount >= this.limit) {
      return false; // Rate limited
    }

    this.windows.set(currentKey, currentCount + 1);
    // Note: windows older than the previous one are never read again;
    // production code should evict them periodically
    return true;
  }
}

// 100 requests per 60-second sliding window (approximate)
const slidingCounter = new SlidingWindowCounter(100, 60_000);
```
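To make the weighting concrete, the estimate can be computed standalone. The `estimate` helper and the numbers below are illustrative, extracted from the logic of `tryConsume`:

```typescript
// The weighted estimate used by the sliding window counter:
// the previous window's count contributes proportionally to how
// much of it still overlaps the sliding window.
function estimate(
  previousCount: number,
  currentCount: number,
  elapsedMs: number,     // time elapsed in the current window
  windowSizeMs: number,
): number {
  const previousWeight = 1 - elapsedMs / windowSizeMs;
  return Math.floor(previousCount * previousWeight) + currentCount;
}

// 30s into a 60s window, the previous window counts at half weight:
// floor(84 * 0.5) + 36 = 42 + 36 = 78, under a limit of 100, so allowed.
console.log(estimate(84, 36, 30_000, 60_000)); // 78

// 45s in, the previous window counts at quarter weight:
// floor(84 * 0.25) + 36 = 21 + 36 = 57
console.log(estimate(84, 36, 45_000, 60_000)); // 57
```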
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst | Complexity |
|---|---|---|---|---|
| Token Bucket | O(1) | Good | Allows bursts | Simple |
| Leaky Bucket | O(n) queue | Good | Smooths out | Simple |
| Fixed Window | O(1) | Boundary issues | 2x burst at boundary | Simplest |
| Sliding Window Log | O(n) per user | Exact | No boundary issues | Moderate |
| Sliding Window Counter | O(1) | Approximate | No boundary issues | Moderate |
Distributed Rate Limiting
When your API runs on multiple servers behind a load balancer, rate limiting must be coordinated across all instances. If each server tracks limits independently, a client could send N requests per server, effectively multiplying their allowed rate by the number of servers.
The standard solution is to use a centralized data store like Redis to maintain rate limit counters shared across all API server instances.
```typescript
import Redis from 'ioredis';
import type { Request, Response, NextFunction } from 'express';

class DistributedRateLimiter {
  private redis: Redis;

  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
  }

  // Sliding window log using a Redis sorted set
  async tryConsume(
    clientId: string,
    limit: number,
    windowSizeSeconds: number,
  ): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
    const now = Date.now();
    const windowKey = `ratelimit:${clientId}`;
    const member = `${now}:${Math.random()}`; // unique member for this request

    // Use a Redis pipeline to batch round-trips (note: a pipeline
    // batches commands but is not atomic)
    const pipeline = this.redis.pipeline();
    // Remove entries outside the window
    pipeline.zremrangebyscore(windowKey, 0, now - windowSizeSeconds * 1000);
    // Count entries in the window
    pipeline.zcard(windowKey);
    // Add the current request (score = timestamp)
    pipeline.zadd(windowKey, now, member);
    // Set expiry on the key
    pipeline.expire(windowKey, windowSizeSeconds);
    const results = await pipeline.exec();
    const currentCount = results![1][1] as number;

    if (currentCount >= limit) {
      // Remove only the entry we just added
      await this.redis.zrem(windowKey, member);
      return {
        allowed: false,
        remaining: 0,
        resetAt: now + windowSizeSeconds * 1000,
      };
    }

    return {
      allowed: true,
      remaining: limit - currentCount - 1,
      resetAt: now + windowSizeSeconds * 1000,
    };
  }
}

// Express middleware
function rateLimitMiddleware(limiter: DistributedRateLimiter) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const clientId = (req.headers['x-api-key'] as string) || req.ip;
    const result = await limiter.tryConsume(clientId, 100, 60);

    // Set standard rate limit headers
    res.setHeader('X-RateLimit-Limit', '100');
    res.setHeader('X-RateLimit-Remaining', String(result.remaining));
    res.setHeader('X-RateLimit-Reset', String(Math.ceil(result.resetAt / 1000)));

    if (!result.allowed) {
      res.setHeader('Retry-After', '60');
      return res.status(429).json({
        error: 'Too Many Requests',
        message: 'Rate limit exceeded. Please try again later.',
        retryAfter: 60,
      });
    }

    next();
  };
}
```
Rate Limiting Best Practices
- Return proper HTTP headers: Always include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After (on 429 responses).
- Use 429 status code: The HTTP 429 Too Many Requests status is the standard for rate limiting.
- Rate limit by multiple dimensions: Limit by API key, IP address, user ID, and endpoint. A user might be allowed 1000 requests per minute overall but only 10 login attempts per minute.
- Use the token bucket for most cases: It handles bursts well and is the algorithm used by most major API providers (Stripe, GitHub, AWS).
- Graceful degradation: Instead of hard rejecting requests, consider returning cached or degraded responses for rate-limited users.
- Communicate limits clearly: Document your rate limits in your API documentation so clients can implement proper backoff strategies.
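The multi-dimension point above can be sketched by composing independent limiters, one per dimension; a request passes only if every applicable limit allows it. The `SimpleLimit` class, the limits, and the endpoint names here are illustrative (window reset is omitted for brevity):

```typescript
// Minimal per-key counter, used only to illustrate composing limits
// across dimensions (overall per-user, plus per-endpoint).
class SimpleLimit {
  private counts = new Map<string, number>();
  constructor(private limit: number) {}

  tryConsume(key: string): boolean {
    const n = this.counts.get(key) ?? 0;
    if (n >= this.limit) return false;
    this.counts.set(key, n + 1);
    return true;
  }
}

const overall = new SimpleLimit(1000); // 1000 requests per user per window
const login = new SimpleLimit(3);      // 3 login attempts per user per window

function allowRequest(userId: string, endpoint: string): boolean {
  // Every applicable dimension must allow the request
  if (!overall.tryConsume(userId)) return false;
  if (endpoint === '/login' && !login.tryConsume(userId)) return false;
  return true;
}
```

A user who exhausts the login limit is still able to call other endpoints until the overall limit is reached.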
Race Conditions in Distributed Rate Limiting
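One way the pipeline-based limiter above can be tightened is a Lua script, which Redis executes atomically. The sketch below uses a simple fixed-window INCR counter rather than the sorted-set approach, since it makes the atomic check-and-increment easiest to see; to keep it dependency-free, a minimal structural type stands in for the ioredis client, and all names are illustrative:

```typescript
// Minimal surface of the Redis client we need; an ioredis instance
// satisfies this shape via its eval() method.
type RedisLike = {
  eval(script: string, numKeys: number, ...args: (string | number)[]): Promise<unknown>;
};

// Redis runs a Lua script atomically: the read, increment, and expiry
// happen with no other command interleaved between them.
// KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = window in seconds.
const RATE_LIMIT_SCRIPT = `
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if current > tonumber(ARGV[1]) then
  return 0
end
return 1
`;

async function tryConsumeAtomic(
  redis: RedisLike,
  clientId: string,
  limit: number,
  windowSeconds: number,
): Promise<boolean> {
  const result = await redis.eval(
    RATE_LIMIT_SCRIPT,
    1,                        // one key
    `ratelimit:${clientId}`,  // KEYS[1]
    limit,                    // ARGV[1]
    windowSeconds,            // ARGV[2]
  );
  return result === 1;
}
```

With a real client, an `ioredis` instance can be passed directly as the `redis` argument.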
When using Redis for distributed rate limiting, be aware of race conditions. Two requests arriving simultaneously might both read the counter before either increments it, allowing both through even if the limit has been reached. Use Redis Lua scripts or Redis transactions (MULTI/EXEC) to make the check-and-increment operation atomic. The sliding window counter implementation above uses a pipeline, but for production systems, a Lua script provides stronger atomicity guarantees.