Why Rate Limiting Is Important
Rate limiting controls the number of requests a client can make to an API within a given time period. It is a critical defense mechanism that protects your services from being overwhelmed, whether by malicious attacks (DDoS), misbehaving clients, or simple bugs that cause excessive API calls.
Why Rate Limit?
- Prevent abuse: Stop malicious users from overwhelming your API with requests (DDoS attacks, brute-force login attempts).
- Ensure fairness: Prevent a single client from monopolizing resources, ensuring all users get fair access.
- Control costs: API calls to downstream services (databases, third-party APIs) cost money. Rate limiting caps this cost.
- Maintain stability: Protect backend services from traffic spikes that could cause cascading failures.
- Comply with agreements: Enforce API usage tiers (free: 100 req/min, pro: 10,000 req/min).
Rate Limiting Algorithms
1. Token Bucket
The token bucket algorithm is one of the most widely used rate limiting approaches. Imagine a bucket that holds tokens. Tokens are added to the bucket at a fixed rate (e.g., 10 tokens per second). Each request consumes one token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity, so tokens do not accumulate infinitely.
The key advantage of the token bucket is that it allows bursts. If the bucket is full (e.g., 100 tokens), a client can make 100 requests instantly, then must wait for tokens to refill. This mirrors real-world usage patterns where users may burst briefly but maintain a low average rate.
```typescript
import type { Request, Response } from 'express';

class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number,  // Maximum burst size
    private refillRate: number, // Tokens added per second
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

  private refill(): void {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000; // seconds
    const newTokens = elapsed * this.refillRate;
    this.tokens = Math.min(this.maxTokens, this.tokens + newTokens);
    this.lastRefill = now;
  }

  tryConsume(tokens: number = 1): boolean {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true; // Request allowed
    }
    return false; // Rate limited
  }

  getTokensRemaining(): number {
    this.refill();
    return Math.floor(this.tokens);
  }
}

// Usage: 10 requests per second with a burst of 20
const bucket = new TokenBucket(20, 10);

function handleRequest(req: Request, res: Response) {
  if (!bucket.tryConsume()) {
    res.setHeader('Retry-After', '1');
    res.setHeader('X-RateLimit-Remaining', '0');
    return res.status(429).json({ error: 'Too many requests' });
  }
  res.setHeader('X-RateLimit-Remaining', String(bucket.getTokensRemaining()));
  // Process request...
}
```
2. Leaky Bucket
The leaky bucket processes requests at a fixed, constant rate — like water leaking from a bucket at a steady drip. Incoming requests are added to a queue (the bucket). If the queue is full, new requests are rejected. Requests are processed from the queue at a constant rate, regardless of how fast they arrive.
Unlike the token bucket, the leaky bucket does not allow bursts. It smooths out traffic to a constant rate. This is useful when the backend can only handle a steady stream of requests.
```typescript
class LeakyBucket {
  private queue: Array<() => void> = [];
  private processing: boolean = false;

  constructor(
    private capacity: number,   // Max queue size
    private leakRateMs: number, // Process one request every N ms
  ) {}

  tryAdd(handler: () => void): boolean {
    if (this.queue.length >= this.capacity) {
      return false; // Bucket is full - reject
    }
    this.queue.push(handler);
    this.startLeaking();
    return true;
  }

  private startLeaking(): void {
    if (this.processing) return;
    this.processing = true;
    const leak = () => {
      const handler = this.queue.shift();
      if (handler) {
        handler(); // Process the request
        setTimeout(leak, this.leakRateMs); // Fixed rate
      } else {
        this.processing = false;
      }
    };
    leak();
  }
}

// Process 1 request every 100ms (10 requests/sec), queue up to 50
const leaky = new LeakyBucket(50, 100);
```
3. Fixed Window Counter
Divide time into fixed windows (e.g., 1-minute windows) and count the number of requests in each window. If the count exceeds the limit, reject subsequent requests until the next window starts.
This is the simplest algorithm to implement but has a known problem: a burst at the end of one window and the start of the next can result in twice the allowed rate. For example, with a limit of 100 requests per minute, a client could send 100 requests at 0:59 and 100 more at 1:00 — 200 requests in 2 seconds.
```typescript
class FixedWindowCounter {
  private windows: Map<string, number> = new Map();

  constructor(
    private limit: number,
    private windowSizeMs: number,
  ) {}

  private getCurrentWindow(): string {
    const windowStart = Math.floor(Date.now() / this.windowSizeMs);
    return windowStart.toString();
  }

  tryConsume(clientId: string): boolean {
    const window = this.getCurrentWindow();
    const key = `${clientId}:${window}`;
    const currentCount = this.windows.get(key) || 0;
    if (currentCount >= this.limit) {
      return false; // Rate limited
    }
    this.windows.set(key, currentCount + 1);
    // Clean up old windows periodically
    this.cleanup();
    return true;
  }

  private cleanup(): void {
    // Match the ":<window>" suffix exactly, so that window "12"
    // does not also match keys from window "112"
    const suffix = `:${this.getCurrentWindow()}`;
    for (const [key] of this.windows) {
      if (!key.endsWith(suffix)) {
        this.windows.delete(key);
      }
    }
  }
}

// 100 requests per minute
const fixedWindow = new FixedWindowCounter(100, 60_000);
```
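The double-rate burst at a window boundary is easy to reproduce. The sketch below is a test-only variant of the counter with an injectable clock; `ClockedFixedWindow` and the `now` parameter are illustrative additions, not part of the implementation above:

```typescript
// Fixed-window counter with an injectable clock, used only to
// demonstrate the boundary burst: 2x the limit can pass within
// seconds of a window boundary.
class ClockedFixedWindow {
  private counts = new Map<number, number>();

  constructor(
    private limit: number,
    private windowSizeMs: number,
    private now: () => number, // injectable clock for the demo
  ) {}

  tryConsume(): boolean {
    const window = Math.floor(this.now() / this.windowSizeMs);
    const n = this.counts.get(window) ?? 0;
    if (n >= this.limit) return false;
    this.counts.set(window, n + 1);
    return true;
  }
}

let t = 59_000; // 0:59, one second before the minute boundary
const counter = new ClockedFixedWindow(100, 60_000, () => t);

let allowed = 0;
for (let i = 0; i < 100; i++) if (counter.tryConsume()) allowed++;
t = 60_000; // 1:00, a new window begins and the count resets
for (let i = 0; i < 100; i++) if (counter.tryConsume()) allowed++;

console.log(allowed); // 200 requests accepted in one second of simulated time
```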
4. Sliding Window Log
Keep a timestamped log of every request. When a new request comes in, remove all entries older than the window size, then count the remaining entries. If the count exceeds the limit, reject the request.
This algorithm is precise — it does not suffer from the boundary problem of fixed windows. However, it uses more memory because it stores a timestamp for every request within the window.
```typescript
class SlidingWindowLog {
  private logs: Map<string, number[]> = new Map();

  constructor(
    private limit: number,
    private windowSizeMs: number,
  ) {}

  tryConsume(clientId: string): boolean {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Get or create log for this client
    let timestamps = this.logs.get(clientId) || [];

    // Remove entries outside the window
    timestamps = timestamps.filter(ts => ts > windowStart);

    if (timestamps.length >= this.limit) {
      this.logs.set(clientId, timestamps);
      return false; // Rate limited
    }

    // Add current request
    timestamps.push(now);
    this.logs.set(clientId, timestamps);
    return true;
  }
}

// 100 requests per 60-second sliding window
const slidingLog = new SlidingWindowLog(100, 60_000);
```
5. Sliding Window Counter
The sliding window counter is a hybrid that combines the memory efficiency of fixed windows with the accuracy of sliding windows. It uses two adjacent fixed windows and calculates a weighted count based on where the current time falls within the current window.
```typescript
class SlidingWindowCounter {
  private windows: Map<string, number> = new Map();

  constructor(
    private limit: number,
    private windowSizeMs: number,
  ) {}

  tryConsume(clientId: string): boolean {
    const now = Date.now();
    const currentWindowStart = Math.floor(now / this.windowSizeMs) * this.windowSizeMs;
    const previousWindowStart = currentWindowStart - this.windowSizeMs;

    const currentKey = `${clientId}:${currentWindowStart}`;
    const previousKey = `${clientId}:${previousWindowStart}`;
    const currentCount = this.windows.get(currentKey) || 0;
    const previousCount = this.windows.get(previousKey) || 0;

    // Calculate the weight of the previous window
    // based on how far we are into the current window
    const elapsedInCurrentWindow = now - currentWindowStart;
    const previousWeight = 1 - (elapsedInCurrentWindow / this.windowSizeMs);

    // Weighted count: full current window + proportional previous window
    const estimatedCount = Math.floor(previousCount * previousWeight) + currentCount;

    if (estimatedCount >= this.limit) {
      return false; // Rate limited
    }

    this.windows.set(currentKey, currentCount + 1);
    // Note: windows older than the previous one are never read again;
    // production code should evict them periodically
    return true;
  }
}

// 100 requests per 60-second sliding window (approximate)
const slidingCounter = new SlidingWindowCounter(100, 60_000);
```
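To make the weighting concrete, the estimate can be computed standalone. The `estimate` helper and the numbers below are illustrative, extracted from the logic of `tryConsume`:

```typescript
// The weighted estimate used by the sliding window counter:
// the previous window's count contributes proportionally to how
// much of it still overlaps the sliding window.
function estimate(
  previousCount: number,
  currentCount: number,
  elapsedMs: number,     // time elapsed in the current window
  windowSizeMs: number,
): number {
  const previousWeight = 1 - elapsedMs / windowSizeMs;
  return Math.floor(previousCount * previousWeight) + currentCount;
}

// 30s into a 60s window, the previous window counts at half weight:
// floor(84 * 0.5) + 36 = 42 + 36 = 78, under a limit of 100, so allowed.
console.log(estimate(84, 36, 30_000, 60_000)); // 78

// 45s in, the previous window counts at quarter weight:
// floor(84 * 0.25) + 36 = 21 + 36 = 57
console.log(estimate(84, 36, 45_000, 60_000)); // 57
```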
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst | Complexity |
|---|---|---|---|---|
| Token Bucket | O(1) | Good | Allows bursts | Simple |
| Leaky Bucket | O(n) queue | Good | Smooths out | Simple |
| Fixed Window | O(1) | Boundary issues | 2x burst at boundary | Simplest |
| Sliding Window Log | O(n) per user | Exact | No boundary issues | Moderate |
| Sliding Window Counter | O(1) | Approximate | No boundary issues | Moderate |
Distributed Rate Limiting
When your API runs on multiple servers behind a load balancer, rate limiting must be coordinated across all instances. If each server tracks limits independently, a client could send N requests per server, effectively multiplying their allowed rate by the number of servers.
The standard solution is to use a centralized data store like Redis to maintain rate limit counters shared across all API server instances.
```typescript
import Redis from 'ioredis';
import type { Request, Response, NextFunction } from 'express';

class DistributedRateLimiter {
  private redis: Redis;

  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
  }

  // Sliding window log using a Redis sorted set
  async tryConsume(
    clientId: string,
    limit: number,
    windowSizeSeconds: number,
  ): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
    const now = Date.now();
    const windowKey = `ratelimit:${clientId}`;
    const member = `${now}:${Math.random()}`; // unique member for this request

    // Use a Redis pipeline to batch round-trips (note: a pipeline
    // batches commands but is not atomic)
    const pipeline = this.redis.pipeline();
    // Remove entries outside the window
    pipeline.zremrangebyscore(windowKey, 0, now - windowSizeSeconds * 1000);
    // Count entries in the window
    pipeline.zcard(windowKey);
    // Add the current request (score = timestamp)
    pipeline.zadd(windowKey, now, member);
    // Set expiry on the key
    pipeline.expire(windowKey, windowSizeSeconds);
    const results = await pipeline.exec();
    const currentCount = results![1][1] as number;

    if (currentCount >= limit) {
      // Remove only the entry we just added
      await this.redis.zrem(windowKey, member);
      return {
        allowed: false,
        remaining: 0,
        resetAt: now + windowSizeSeconds * 1000,
      };
    }

    return {
      allowed: true,
      remaining: limit - currentCount - 1,
      resetAt: now + windowSizeSeconds * 1000,
    };
  }
}

// Express middleware
function rateLimitMiddleware(limiter: DistributedRateLimiter) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const clientId = (req.headers['x-api-key'] as string) || req.ip;
    const result = await limiter.tryConsume(clientId, 100, 60);

    // Set standard rate limit headers
    res.setHeader('X-RateLimit-Limit', '100');
    res.setHeader('X-RateLimit-Remaining', String(result.remaining));
    res.setHeader('X-RateLimit-Reset', String(Math.ceil(result.resetAt / 1000)));

    if (!result.allowed) {
      res.setHeader('Retry-After', '60');
      return res.status(429).json({
        error: 'Too Many Requests',
        message: 'Rate limit exceeded. Please try again later.',
        retryAfter: 60,
      });
    }

    next();
  };
}
```
Rate Limiting Best Practices
- Return proper HTTP headers: Always include X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After (on 429 responses).
- Use 429 status code: The HTTP 429 Too Many Requests status is the standard for rate limiting.
- Rate limit by multiple dimensions: Limit by API key, IP address, user ID, and endpoint. A user might be allowed 1000 requests per minute overall but only 10 login attempts per minute.
- Use the token bucket for most cases: It handles bursts well and is the algorithm used by most major API providers (Stripe, GitHub, AWS).
- Graceful degradation: Instead of hard rejecting requests, consider returning cached or degraded responses for rate-limited users.
- Communicate limits clearly: Document your rate limits in your API documentation so clients can implement proper backoff strategies.
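The multi-dimension point above can be sketched by composing independent limiters, one per dimension; a request passes only if every applicable limit allows it. The `SimpleLimit` class, the limits, and the endpoint names here are illustrative (window reset is omitted for brevity):

```typescript
// Minimal per-key counter, used only to illustrate composing limits
// across dimensions (overall per-user, plus per-endpoint).
class SimpleLimit {
  private counts = new Map<string, number>();
  constructor(private limit: number) {}

  tryConsume(key: string): boolean {
    const n = this.counts.get(key) ?? 0;
    if (n >= this.limit) return false;
    this.counts.set(key, n + 1);
    return true;
  }
}

const overall = new SimpleLimit(1000); // 1000 requests per user per window
const login = new SimpleLimit(3);      // 3 login attempts per user per window

function allowRequest(userId: string, endpoint: string): boolean {
  // Every applicable dimension must allow the request
  if (!overall.tryConsume(userId)) return false;
  if (endpoint === '/login' && !login.tryConsume(userId)) return false;
  return true;
}
```

A user who exhausts the login limit is still able to call other endpoints until the overall limit is reached.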
Race Conditions in Distributed Rate Limiting
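One way the pipeline-based limiter above can be tightened is a Lua script, which Redis executes atomically. The sketch below uses a simple fixed-window INCR counter rather than the sorted-set approach, since it makes the atomic check-and-increment easiest to see; to keep it dependency-free, a minimal structural type stands in for the ioredis client, and all names are illustrative:

```typescript
// Minimal surface of the Redis client we need; an ioredis instance
// satisfies this shape via its eval() method.
type RedisLike = {
  eval(script: string, numKeys: number, ...args: (string | number)[]): Promise<unknown>;
};

// Redis runs a Lua script atomically: the read, increment, and expiry
// happen with no other command interleaved between them.
// KEYS[1] = counter key, ARGV[1] = limit, ARGV[2] = window in seconds.
const RATE_LIMIT_SCRIPT = `
local current = redis.call('INCR', KEYS[1])
if current == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if current > tonumber(ARGV[1]) then
  return 0
end
return 1
`;

async function tryConsumeAtomic(
  redis: RedisLike,
  clientId: string,
  limit: number,
  windowSeconds: number,
): Promise<boolean> {
  const result = await redis.eval(
    RATE_LIMIT_SCRIPT,
    1,                        // one key
    `ratelimit:${clientId}`,  // KEYS[1]
    limit,                    // ARGV[1]
    windowSeconds,            // ARGV[2]
  );
  return result === 1;
}
```

With a real client, an `ioredis` instance can be passed directly as the `redis` argument.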
When using Redis for distributed rate limiting, be aware of race conditions. Two requests arriving simultaneously might both read the counter before either increments it, allowing both through even if the limit has been reached. Use Redis Lua scripts or Redis transactions (MULTI/EXEC) to make the check-and-increment operation atomic. The sliding window counter implementation above uses a pipeline, but for production systems, a Lua script provides stronger atomicity guarantees.