TechLead
Lesson 14 of 30
6 min read
System Design

System Design: URL Shortener

Design a URL shortener service like Bitly covering requirements, database schema, encoding strategies, read optimization, and analytics tracking

Problem Statement

Design a URL shortening service like Bitly or TinyURL. The service takes a long URL and returns a short, unique alias that redirects to the original URL. This is a classic system design interview question that tests your ability to think about hashing, storage, caching, and scale.

Step 1: Requirements Gathering

Functional Requirements

  • Given a long URL, generate a short and unique alias
  • When users access the short URL, redirect them to the original long URL
  • Users can optionally choose a custom alias
  • Links expire after a configurable time (default: no expiry)
  • Track analytics: click count, referrer, geography, device

Non-Functional Requirements

  • System should be highly available (redirects must always work)
  • URL redirection should be real-time with minimal latency (<100ms)
  • Short URLs should not be predictable (for security)
  • The system is read-heavy (100:1 read-to-write ratio)

Scale Estimates

  • New URLs per month: 100 million
  • Reads per month: 10 billion (100:1 ratio)
  • Reads per second: ~3,800 (10B / 30 / 24 / 3600)
  • Writes per second: ~38
  • Storage per URL: ~500 bytes (short URL + long URL + metadata)
  • Storage for 5 years: ~3 TB (100M * 12 * 5 * 500B)
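These figures fall out of straightforward arithmetic; a quick sketch to sanity-check them (constant names are illustrative):

```typescript
// Back-of-envelope numbers from the estimates above (approximate).
const SECONDS_PER_MONTH = 30 * 24 * 3600; // 2,592,000

const newUrlsPerMonth = 100_000_000;
const readsPerMonth = 100 * newUrlsPerMonth; // 100:1 read-to-write ratio

const readsPerSecond = readsPerMonth / SECONDS_PER_MONTH;    // ~3,858
const writesPerSecond = newUrlsPerMonth / SECONDS_PER_MONTH; // ~38.6

const bytesPerUrl = 500;
const storageFor5Years = newUrlsPerMonth * 12 * 5 * bytesPerUrl; // 3e12 bytes = 3 TB
```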

Step 2: High-Level Design

The system consists of several key components working together:

  • API Gateway / Load Balancer: Distributes incoming requests across application servers
  • Application Servers: Handle URL creation and redirection logic
  • Database: Stores the mapping between short URLs and long URLs
  • Cache (Redis): Caches frequently accessed URL mappings for fast redirection
  • Analytics Service: Asynchronously processes click events
// API Design
interface URLShortenerAPI {
  // Create a short URL
  // POST /api/v1/shorten
  createShortURL(request: {
    longUrl: string;
    customAlias?: string;
    expiresAt?: Date;
  }): Promise<{
    shortUrl: string;
    longUrl: string;
    expiresAt: Date | null;
    createdAt: Date;
  }>;

  // Redirect (handled at HTTP level)
  // GET /:shortCode -> 301/302 redirect to longUrl

  // Get analytics
  // GET /api/v1/analytics/:shortCode
  getAnalytics(shortCode: string): Promise<{
    totalClicks: number;
    clicksByDay: Array<{ date: string; count: number }>;
    topReferrers: Array<{ referrer: string; count: number }>;
    topCountries: Array<{ country: string; count: number }>;
  }>;
}

Step 3: Database Schema

// Database schema
interface URLMapping {
  id: string;            // Primary key (auto-increment or UUID)
  shortCode: string;     // Unique, indexed - the short URL code (e.g., "aB3x7K")
  longUrl: string;       // The original URL
  userId?: string;       // Optional: who created it
  createdAt: Date;
  expiresAt?: Date;
  clickCount: number;    // Denormalized for quick access
}

// Analytics events (separate table or message queue)
interface ClickEvent {
  id: string;
  shortCode: string;
  timestamp: Date;
  ipAddress: string;
  userAgent: string;
  referrer?: string;
  country?: string;
  device?: string;
}

// SQL Schema
// CREATE TABLE url_mappings (
//   id BIGSERIAL PRIMARY KEY,
//   short_code VARCHAR(10) UNIQUE NOT NULL,
//   long_url TEXT NOT NULL,
//   user_id VARCHAR(36),
//   created_at TIMESTAMP DEFAULT NOW(),
//   expires_at TIMESTAMP,
//   click_count BIGINT DEFAULT 0
// );
// CREATE INDEX idx_short_code ON url_mappings(short_code);
// CREATE INDEX idx_expires_at ON url_mappings(expires_at) WHERE expires_at IS NOT NULL;

Step 4: URL Encoding / Hashing Strategy

The core challenge is generating a short, unique code for each URL. There are several approaches, each with trade-offs.

Approach 1: Base62 Encoding of Auto-Increment ID

Use an auto-incrementing database ID and convert it to a base62 string (a-z, A-Z, 0-9). A 7-character base62 string can represent 62^7 = 3.5 trillion unique URLs.

const BASE62_CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

function encodeBase62(num: number): string {
  if (num === 0) return BASE62_CHARS[0];

  let result = "";
  while (num > 0) {
    result = BASE62_CHARS[num % 62] + result;
    num = Math.floor(num / 62);
  }

  return result;
}

function decodeBase62(str: string): number {
  let result = 0;
  for (const char of str) {
    result = result * 62 + BASE62_CHARS.indexOf(char);
  }
  return result;
}

// Example: encodeBase62(1000000) === "4c92" (4 characters)
// Example: encodeBase62(62 ** 7 - 1) === "ZZZZZZZ" (largest 7-character code, ~3.5 trillion)

Approach 2: Hash-based (MD5/SHA256 + Truncation)

Hash the long URL and take the first 7 characters. Handle collisions by checking the database and re-hashing if needed.

import { createHash } from "crypto";

function generateShortCode(longUrl: string, attempt = 0): string {
  const input = attempt === 0 ? longUrl : `${longUrl}${attempt}`;
  const hash = createHash("md5").update(input).digest("base64url");
  return hash.substring(0, 7); // Take first 7 characters
}

async function createShortURL(longUrl: string, db: Database): Promise<string> {
  for (let attempt = 0; attempt < 5; attempt++) {
    const shortCode = generateShortCode(longUrl, attempt);

    // Check for collision
    const existing = await db.findByShortCode(shortCode);
    if (!existing) {
      await db.insert({ shortCode, longUrl });
      return shortCode;
    }

    // If same URL was already shortened, return existing
    if (existing.longUrl === longUrl) {
      return existing.shortCode;
    }

    // Collision with different URL, try again
  }

  throw new Error("Failed to generate unique short code");
}

Approach 3: Pre-generated Key Service

Generate random unique keys in advance and store them in a pool. When a new URL is shortened, take a key from the pool. Uniqueness is guaranteed offline, so the write path needs no collision checks, and application servers can lease keys in batches, which makes this the most scalable approach.
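A minimal in-memory sketch of the idea (class and method names are illustrative; a production pool would live in its own service backed by a database, with keys leased to app servers in batches):

```typescript
const BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Sketch of a pre-generated key pool. A background job fills the pool with
// random base62 keys; the write path just pops one, so no collision check
// is needed at request time.
class KeyPool {
  private available: string[] = [];
  private used = new Set<string>();

  // Background job: generate keys ahead of demand, skipping duplicates.
  refill(count: number, length = 7): void {
    while (this.available.length < count) {
      let key = "";
      for (let i = 0; i < length; i++) {
        key += BASE62[Math.floor(Math.random() * 62)];
      }
      if (!this.used.has(key)) {
        this.used.add(key);
        this.available.push(key);
      }
    }
  }

  // Write path: O(1), no database lookup for uniqueness.
  take(): string {
    const key = this.available.pop();
    if (!key) throw new Error("Key pool exhausted; refill job is behind");
    return key;
  }
}
```

A key taken by a server that then crashes is simply discarded, which is the "wasted keys" trade-off of this strategy.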

Encoding Strategy Comparison

  • Base62 of ID. Pros: no collisions, deterministic. Cons: sequential (predictable); single point of failure for ID generation
  • Hash + Truncation. Pros: stateless; the same URL always gets the same code. Cons: collision handling needed; extra DB lookups
  • Pre-generated Keys. Pros: fast; no collisions; no coordination. Cons: requires a key management service; keys are wasted on failures

Step 5: Read-Heavy Optimization

With a 100:1 read-to-write ratio, optimizing the redirect path is critical. Every millisecond counts when billions of redirects happen per month.

class URLRedirectService {
  private cache: RedisClient;
  private db: Database;

  async redirect(shortCode: string): Promise<string | null> {
    // Layer 1: Check Redis cache
    const cached = await this.cache.get(`url:${shortCode}`);
    if (cached) {
      this.trackClickAsync(shortCode); // Fire and forget
      return cached;
    }

    // Layer 2: Check database
    const mapping = await this.db.findByShortCode(shortCode);
    if (!mapping) return null;

    // Check expiration
    if (mapping.expiresAt && mapping.expiresAt < new Date()) {
      return null;
    }

    // Populate cache for future requests (24 hour TTL)
    await this.cache.setex(`url:${shortCode}`, 86400, mapping.longUrl);

    this.trackClickAsync(shortCode); // Fire and forget
    return mapping.longUrl;
  }

  // 301 vs 302 redirect decision
  // 301 (Permanent): Browser caches it, fewer requests to our server
  //   - Pro: Reduces server load
  //   - Con: Loses analytics (browser goes direct)
  // 302 (Temporary): Browser always hits our server first
  //   - Pro: Accurate analytics
  //   - Con: More server load
  // Recommendation: Use 302 if analytics matter, 301 for maximum performance

  private trackClickAsync(shortCode: string): void {
    // Send to message queue for async processing
    messageQueue.publish("clicks", {
      shortCode,
      timestamp: Date.now(),
      // ... other metadata
    });
  }
}

Step 6: Analytics and Tracking

Analytics should be processed asynchronously to avoid slowing down redirects. Click events are published to a message queue (Kafka or SQS) and processed by a separate analytics service.

// Analytics pipeline
class AnalyticsProcessor {
  async processClickEvent(event: ClickEvent): Promise<void> {
    // Parse user agent for device info
    const deviceInfo = parseUserAgent(event.userAgent);

    // Geo-locate from IP
    const location = await geolocate(event.ipAddress);

    // Store in analytics database (time-series optimized)
    await this.analyticsDB.insert({
      shortCode: event.shortCode,
      timestamp: event.timestamp,
      country: location.country,
      city: location.city,
      device: deviceInfo.device,
      browser: deviceInfo.browser,
      os: deviceInfo.os,
      referrer: event.referrer,
    });

    // Increment counters (denormalized for fast reads)
    await this.cache.incr(`clicks:${event.shortCode}`);
    await this.cache.incr(`clicks:${event.shortCode}:${todayKey()}`);
  }
}
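The counter half of this pipeline, including the `todayKey()` helper the processor assumes, can be sketched with an in-memory map standing in for Redis (names are assumptions):

```typescript
// todayKey(): UTC date bucket for the daily counter keys.
function todayKey(ts: Date = new Date()): string {
  return ts.toISOString().slice(0, 10); // e.g. "2024-05-01"
}

// In-memory stand-in for the Redis INCR counters used above.
class ClickCounters {
  private counters = new Map<string, number>();

  private incr(key: string): void {
    this.counters.set(key, (this.counters.get(key) ?? 0) + 1);
  }

  recordClick(shortCode: string, ts: Date): void {
    this.incr(`clicks:${shortCode}`);                  // all-time total
    this.incr(`clicks:${shortCode}:${todayKey(ts)}`);  // per-day bucket
  }

  totalClicks(shortCode: string): number {
    return this.counters.get(`clicks:${shortCode}`) ?? 0;
  }

  clicksOnDay(shortCode: string, day: string): number {
    return this.counters.get(`clicks:${shortCode}:${day}`) ?? 0;
  }
}
```

The denormalized counters let `getAnalytics` answer "total clicks" and "clicks by day" without scanning raw events.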

Complete Architecture Summary

Key Design Decisions

  • Encoding: Base62 over IDs from a distributed generator (like Twitter Snowflake) for collision-free codes with no single point of failure; since Snowflake IDs are still roughly time-ordered, scramble the ID with a bijective function before encoding if codes must be unguessable
  • Storage: PostgreSQL for URL mappings (simple schema, strong consistency for writes)
  • Cache: Redis with 24-hour TTL for hot URLs (most redirects served from cache)
  • Analytics: Kafka for click event streaming, Cassandra or ClickHouse for analytics storage
  • Redirect: 302 for analytics accuracy, consider 301 for high-traffic URLs where analytics are not needed
  • Scaling: Stateless application servers behind a load balancer, database read replicas for analytics queries
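The decisions above reference a Snowflake-style ID generator; a minimal single-process sketch, assuming the classic bit layout (41-bit millisecond timestamp, 10-bit machine id, 12-bit sequence):

```typescript
// Snowflake-style 64-bit ID generator sketch.
// Layout: [41-bit ms since epoch][10-bit machine id][12-bit sequence].
// The resulting id would then be base62-encoded into the short code.
class SnowflakeIdGenerator {
  private lastMs = -1n;
  private sequence = 0n;

  constructor(
    private machineId: bigint,
    private epochMs: bigint = 1700000000000n, // custom epoch (assumption)
  ) {
    if (machineId < 0n || machineId > 1023n) {
      throw new Error("machineId must fit in 10 bits");
    }
  }

  next(): bigint {
    let now = BigInt(Date.now());
    if (now === this.lastMs) {
      this.sequence = (this.sequence + 1n) & 4095n; // 12-bit wraparound
      if (this.sequence === 0n) {
        // Sequence exhausted for this millisecond; spin to the next one.
        while (now <= this.lastMs) now = BigInt(Date.now());
      }
    } else {
      this.sequence = 0n;
    }
    this.lastMs = now;
    return ((now - this.epochMs) << 22n) | (this.machineId << 12n) | this.sequence;
  }
}
```

Each application server gets a distinct machine id, so servers mint unique IDs with no coordination on the write path.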

Common Follow-up Questions

  • How do you handle expired URLs? A background job scans for expired URLs and removes them from cache. Database records can be soft-deleted for analytics retention.
  • How do you prevent abuse? Rate limiting per user/IP, spam URL detection using blocklists, and CAPTCHA for anonymous users.
  • How do you handle hot URLs? Popular URLs are cached in Redis with longer TTLs. Multiple cache replicas distribute the read load.
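The abuse-prevention answer mentions rate limiting per user/IP; a minimal fixed-window sketch (in-memory and single-process; a production limiter would typically use a shared Redis counter with a TTL so limits hold across app servers):

```typescript
// Fixed-window rate limiter sketch: at most `limit` requests per
// client key per `windowMs` window.
class FixedWindowRateLimiter {
  private windows = new Map<string, { windowStart: number; count: number }>();

  constructor(private limit: number, private windowMs: number) {}

  allow(clientKey: string, nowMs: number = Date.now()): boolean {
    const w = this.windows.get(clientKey);
    if (!w || nowMs - w.windowStart >= this.windowMs) {
      // First request in a fresh window.
      this.windows.set(clientKey, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (w.count < this.limit) {
      w.count++;
      return true;
    }
    return false; // over limit: respond 429
  }
}
```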
