Algoritmos de Limitación de Tasa

Por Qué la Limitación de Tasa Es Importante

La limitación de tasa controla el número de solicitudes que un cliente puede hacer a una API dentro de un período de tiempo dado. Es un mecanismo de defensa crítico que protege tus servicios de ser saturados.

¿Por Qué Limitar la Tasa?

Prevenir abuso: Detener a usuarios maliciosos de saturar tu API (ataques DDoS, intentos de fuerza bruta).
Asegurar equidad: Prevenir que un solo cliente monopolice recursos.
Controlar costos: Las llamadas API a servicios downstream cuestan dinero.
Mantener estabilidad: Proteger servicios backend de picos de tráfico.
Cumplir acuerdos: Imponer niveles de uso de API (gratis: 100 req/min, pro: 10,000 req/min).

1. Token Bucket

El algoritmo token bucket es uno de los enfoques de limitación de tasa más ampliamente usados. Imagina un bucket que contiene tokens. Los tokens se agregan al bucket a una tasa fija (ej. 10 tokens por segundo). Cada solicitud consume un token. Si el bucket está vacío, la solicitud se rechaza. El bucket tiene una capacidad máxima, así que los tokens no se acumulan infinitamente.

La ventaja clave del token bucket es que permite ráfagas. Si el bucket está lleno (ej. 100 tokens), un cliente puede hacer 100 solicitudes instantáneamente, luego debe esperar a que los tokens se recarguen.

class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number,       // Maximum burst size
    private refillRate: number,      // Tokens added per second
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

  private refill(): void {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    const newTokens = elapsed * this.refillRate;
    this.tokens = Math.min(this.maxTokens, this.tokens + newTokens);
    this.lastRefill = now;
  }

  tryConsume(tokens: number = 1): boolean {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true; // Request allowed
    }
    return false; // Rate limited
  }
}

// Usage: 10 requests per second with burst of 20
const bucket = new TokenBucket(20, 10);

2. Leaky Bucket

El leaky bucket procesa solicitudes a una tasa fija y constante — como agua goteando de un bucket a un ritmo constante. Las solicitudes entrantes se agregan a una cola (el bucket). Si la cola está llena, las nuevas solicitudes se rechazan. Las solicitudes se procesan de la cola a una tasa constante, independientemente de qué tan rápido lleguen.

A diferencia del token bucket, el leaky bucket no permite ráfagas. Suaviza el tráfico a una tasa constante. Esto es útil cuando el backend solo puede manejar un flujo constante de solicitudes.

class LeakyBucket {
  private queue: Array<() => void> = [];
  private processing: boolean = false;

  constructor(
    private capacity: number,          // Max queue size
    private leakRateMs: number,        // Process one request every N ms
  ) {}

  async tryAdd(handler: () => void): Promise<boolean> {
    if (this.queue.length >= this.capacity) {
      return false; // Bucket is full - reject
    }

    this.queue.push(handler);
    this.startLeaking();
    return true;
  }

  private startLeaking(): void {
    if (this.processing) return;
    this.processing = true;

    const leak = () => {
      const handler = this.queue.shift();
      if (handler) {
        handler(); // Process the request
        setTimeout(leak, this.leakRateMs); // Fixed rate
      } else {
        this.processing = false;
      }
    };

    leak();
  }
}

// Process 1 request every 100ms (10 requests/sec), queue up to 50
const leaky = new LeakyBucket(50, 100);

3. Contador de Ventana Fija

Divide el tiempo en ventanas fijas (ej. ventanas de 1 minuto) y cuenta el número de solicitudes en cada ventana. Si el conteo excede el límite, rechaza las solicitudes subsiguientes hasta que comience la siguiente ventana.

Este es el algoritmo más simple de implementar pero tiene un problema conocido: una ráfaga al final de una ventana y al inicio de la siguiente puede resultar en el doble de la tasa permitida. Por ejemplo, con un límite de 100 solicitudes por minuto, un cliente podría enviar 100 solicitudes a las 0:59 y 100 más a las 1:00 — 200 solicitudes en 2 segundos.

class FixedWindowCounter {
  private windows: Map<string, number> = new Map();

  constructor(
    private limit: number,
    private windowSizeMs: number,
  ) {}

  private getCurrentWindow(): string {
    const windowStart = Math.floor(Date.now() / this.windowSizeMs);
    return windowStart.toString();
  }

  tryConsume(clientId: string): boolean {
    const window = this.getCurrentWindow();
    const key = `${clientId}:${window}`;

    const currentCount = this.windows.get(key) || 0;

    if (currentCount >= this.limit) {
      return false; // Rate limited
    }

    this.windows.set(key, currentCount + 1);

    // Clean up old windows periodically
    this.cleanup();

    return true;
  }

  private cleanup(): void {
    const currentWindow = this.getCurrentWindow();
    for (const [key] of this.windows) {
      if (!key.endsWith(currentWindow)) {
        this.windows.delete(key);
      }
    }
  }
}

// 100 requests per minute
const fixedWindow = new FixedWindowCounter(100, 60_000);

4. Log de Ventana Deslizante

Mantiene un log con timestamp de cada solicitud. Cuando llega una nueva solicitud, elimina todas las entradas más antiguas que el tamaño de la ventana, luego cuenta las entradas restantes. Si el conteo excede el límite, rechaza la solicitud.

Este algoritmo es preciso — no sufre del problema de límite de las ventanas fijas. Sin embargo, usa más memoria porque almacena un timestamp para cada solicitud dentro de la ventana.

class SlidingWindowLog {
  private logs: Map<string, number[]> = new Map();

  constructor(
    private limit: number,
    private windowSizeMs: number,
  ) {}

  tryConsume(clientId: string): boolean {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Get or create log for this client
    let timestamps = this.logs.get(clientId) || [];

    // Remove entries outside the window
    timestamps = timestamps.filter(ts => ts > windowStart);

    if (timestamps.length >= this.limit) {
      this.logs.set(clientId, timestamps);
      return false; // Rate limited
    }

    // Add current request
    timestamps.push(now);
    this.logs.set(clientId, timestamps);

    return true;
  }
}

// 100 requests per 60-second sliding window
const slidingLog = new SlidingWindowLog(100, 60_000);

5. Contador de Ventana Deslizante

El contador de ventana deslizante es un híbrido que combina la eficiencia de memoria de ventanas fijas con la precisión de ventanas deslizantes. Usa dos ventanas fijas adyacentes y calcula un conteo ponderado basado en dónde cae el tiempo actual dentro de la ventana actual.

class SlidingWindowCounter {
  private windows: Map<string, number> = new Map();

  constructor(
    private limit: number,
    private windowSizeMs: number,
  ) {}

  tryConsume(clientId: string): boolean {
    const now = Date.now();
    const currentWindowStart = Math.floor(now / this.windowSizeMs) * this.windowSizeMs;
    const previousWindowStart = currentWindowStart - this.windowSizeMs;

    const currentKey = `${clientId}:${currentWindowStart}`;
    const previousKey = `${clientId}:${previousWindowStart}`;

    const currentCount = this.windows.get(currentKey) || 0;
    const previousCount = this.windows.get(previousKey) || 0;

    // Calculate the weight of the previous window
    // based on how far we are into the current window
    const elapsedInCurrentWindow = now - currentWindowStart;
    const previousWeight = 1 - (elapsedInCurrentWindow / this.windowSizeMs);

    // Weighted count: full current window + proportional previous window
    const estimatedCount = Math.floor(previousCount * previousWeight) + currentCount;

    if (estimatedCount >= this.limit) {
      return false; // Rate limited
    }

    this.windows.set(currentKey, currentCount + 1);
    return true;
  }
}

// 100 requests per 60-second sliding window (approximate)
const slidingCounter = new SlidingWindowCounter(100, 60_000);

Comparación de Algoritmos

Algoritmo	Memoria	Precisión	Ráfaga	Complejidad
Token Bucket	O(1)	Buena	Permite ráfagas	Simple
Leaky Bucket	O(n) cola	Buena	Suaviza	Simple
Ventana Fija	O(1)	Problemas de límite	Ráfaga 2x en límite	Más simple
Log Ventana Deslizante	O(n) por usuario	Exacta	Sin problemas de límite	Moderada
Contador Ventana Deslizante	O(1)	Aproximada	Sin problemas de límite	Moderada

Limitación de Tasa Distribuida

Cuando tu API se ejecuta en múltiples servidores detrás de un balanceador de carga, la limitación de tasa debe coordinarse entre todas las instancias. Si cada servidor rastrea límites independientemente, un cliente podría enviar N solicitudes por servidor, multiplicando efectivamente su tasa permitida por el número de servidores.

La solución estándar es usar un almacén de datos centralizado como Redis para mantener contadores de límites de tasa compartidos entre todas las instancias del servidor API.

import Redis from 'ioredis';

class DistributedRateLimiter {
  private redis: Redis;

  constructor(redisUrl: string) {
    this.redis = new Redis(redisUrl);
  }

  async tryConsume(
    clientId: string,
    limit: number,
    windowSizeSeconds: number,
  ): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
    const now = Date.now();
    const windowKey = `ratelimit:${clientId}`;

    // Use Redis MULTI for atomic operations
    const pipeline = this.redis.pipeline();

    // Remove entries outside the window
    pipeline.zremrangebyscore(windowKey, 0, now - windowSizeSeconds * 1000);

    // Count entries in the window
    pipeline.zcard(windowKey);

    // Add the current request
    pipeline.zadd(windowKey, now, `${now}:${Math.random()}`);

    // Set expiry on the key
    pipeline.expire(windowKey, windowSizeSeconds);

    const results = await pipeline.exec();
    const currentCount = results![1][1] as number;

    if (currentCount >= limit) {
      await this.redis.zremrangebyscore(windowKey, now, now);
      return { allowed: false, remaining: 0, resetAt: now + windowSizeSeconds * 1000 };
    }

    return {
      allowed: true,
      remaining: limit - currentCount - 1,
      resetAt: now + windowSizeSeconds * 1000,
    };
  }
}

// Express middleware
function rateLimitMiddleware(limiter: DistributedRateLimiter) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const clientId = req.headers['x-api-key'] as string || req.ip;
    const result = await limiter.tryConsume(clientId, 100, 60);

    // Set standard rate limit headers
    res.setHeader('X-RateLimit-Limit', '100');
    res.setHeader('X-RateLimit-Remaining', String(result.remaining));
    res.setHeader('X-RateLimit-Reset', String(Math.ceil(result.resetAt / 1000)));

    if (!result.allowed) {
      res.setHeader('Retry-After', '60');
      return res.status(429).json({
        error: 'Too Many Requests',
        message: 'Rate limit exceeded. Please try again later.',
        retryAfter: 60,
      });
    }

    next();
  };
}

Mejores Prácticas de Limitación de Tasa

Retorna cabeceras HTTP adecuadas: Siempre incluye X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset y Retry-After (en respuestas 429).
Usa código de estado 429: El estado HTTP 429 Too Many Requests es el estándar para limitación de tasa.
Limita por múltiples dimensiones: Limita por clave de API, dirección IP, ID de usuario y endpoint. Un usuario podría tener 1000 solicitudes por minuto en total pero solo 10 intentos de login por minuto.
Usa token bucket para la mayoría de los casos: Maneja ráfagas bien y es el algoritmo usado por la mayoría de los principales proveedores de API (Stripe, GitHub, AWS).
Degradación elegante: En lugar de rechazar solicitudes de forma dura, considera retornar respuestas cacheadas o degradadas para usuarios limitados.
Comunica los límites claramente: Documenta tus límites de tasa en tu documentación de API para que los clientes puedan implementar estrategias de backoff adecuadas.

Condiciones de Carrera en Limitación de Tasa Distribuida

Cuando uses Redis para limitación de tasa distribuida, ten cuidado con las condiciones de carrera. Dos solicitudes llegando simultáneamente podrían ambas leer el contador antes de que cualquiera lo incremente, permitiendo que ambas pasen incluso si el límite ha sido alcanzado. Usa scripts Lua de Redis o transacciones Redis (MULTI/EXEC) para hacer la operación de verificar-e-incrementar atómica.

Por Qué la Limitación de Tasa Es Importante

¿Por Qué Limitar la Tasa?