Problem Statement
Design a URL shortening service like Bitly or TinyURL. The service takes a long URL and returns a short, unique alias that redirects to the original URL. This is a classic system design interview question that tests your ability to think about hashing, storage, caching, and scale.
Step 1: Requirements Gathering
Functional Requirements
- Given a long URL, generate a short and unique alias
- When users access the short URL, redirect them to the original long URL
- Users can optionally choose a custom alias
- Links expire after a configurable time (default: no expiry)
- Track analytics: click count, referrer, geography, device
Non-Functional Requirements
- System should be highly available (redirects must always work)
- URL redirection should be real-time with minimal latency (<100ms)
- Short URLs should not be predictable (for security)
- The system is read-heavy (100:1 read-to-write ratio)
Scale Estimates
| Metric | Estimate |
|---|---|
| New URLs per month | 100 million |
| Reads per month | 10 billion (100:1 ratio) |
| Reads per second | ~3,800 (10B / 30 / 24 / 3600) |
| Writes per second | ~38 |
| Storage per URL | ~500 bytes (short URL + long URL + metadata) |
| Storage for 5 years | ~3 TB (100M * 12 * 5 * 500B) |
Step 2: High-Level Design
The system consists of several key components working together:
- API Gateway / Load Balancer: Distributes incoming requests across application servers
- Application Servers: Handle URL creation and redirection logic
- Database: Stores the mapping between short URLs and long URLs
- Cache (Redis): Caches frequently accessed URL mappings for fast redirection
- Analytics Service: Asynchronously processes click events
// API Design
interface URLShortenerAPI {
// Create a short URL
// POST /api/v1/shorten
createShortURL(request: {
longUrl: string;
customAlias?: string;
expiresAt?: Date;
}): Promise<{
shortUrl: string;
longUrl: string;
expiresAt: Date | null;
createdAt: Date;
}>;
// Redirect (handled at HTTP level)
// GET /:shortCode -> 301/302 redirect to longUrl
// Get analytics
// GET /api/v1/analytics/:shortCode
getAnalytics(shortCode: string): Promise<{
totalClicks: number;
clicksByDay: Array<{ date: string; count: number }>;
topReferrers: Array<{ referrer: string; count: number }>;
topCountries: Array<{ country: string; count: number }>;
}>;
}
Step 3: Database Schema
// Database schema
interface URLMapping {
id: string; // Primary key (auto-increment or UUID)
shortCode: string; // Unique, indexed - the short URL code (e.g., "aB3x7K")
longUrl: string; // The original URL
userId?: string; // Optional: who created it
createdAt: Date;
expiresAt?: Date;
clickCount: number; // Denormalized for quick access
}
// Analytics events (separate table or message queue)
interface ClickEvent {
id: string;
shortCode: string;
timestamp: Date;
ipAddress: string;
userAgent: string;
referrer?: string;
country?: string;
device?: string;
}
// SQL Schema
// CREATE TABLE url_mappings (
// id BIGSERIAL PRIMARY KEY,
// short_code VARCHAR(10) UNIQUE NOT NULL,
// long_url TEXT NOT NULL,
// user_id VARCHAR(36),
// created_at TIMESTAMP DEFAULT NOW(),
// expires_at TIMESTAMP,
// click_count BIGINT DEFAULT 0
// );
// CREATE INDEX idx_short_code ON url_mappings(short_code);
// CREATE INDEX idx_expires_at ON url_mappings(expires_at) WHERE expires_at IS NOT NULL;
Step 4: URL Encoding / Hashing Strategy
The core challenge is generating a short, unique code for each URL. There are several approaches, each with trade-offs.
Approach 1: Base62 Encoding of Auto-Increment ID
Use an auto-incrementing database ID and convert it to a base62 string (a-z, A-Z, 0-9). A 7-character base62 string can represent 62^7 = 3.5 trillion unique URLs.
const BASE62_CHARS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
function encodeBase62(num: number): string {
if (num === 0) return BASE62_CHARS[0];
let result = "";
while (num > 0) {
result = BASE62_CHARS[num % 62] + result;
num = Math.floor(num / 62);
}
return result;
}
function decodeBase62(str: string): number {
let result = 0;
for (const char of str) {
result = result * 62 + BASE62_CHARS.indexOf(char);
}
return result;
}
// Example: ID 1000000 -> base62 "4C92" (4 characters)
// Example: ID 3500000000000 -> base62 "Zzzzzz" (7 characters)
Approach 2: Hash-based (MD5/SHA256 + Truncation)
Hash the long URL and take the first 7 characters. Handle collisions by checking the database and re-hashing if needed.
import { createHash } from "crypto";
function generateShortCode(longUrl: string, attempt = 0): string {
const input = attempt === 0 ? longUrl : `${longUrl}${attempt}`;
const hash = createHash("md5").update(input).digest("base64url");
return hash.substring(0, 7); // Take first 7 characters
}
async function createShortURL(longUrl: string, db: Database): Promise<string> {
for (let attempt = 0; attempt < 5; attempt++) {
const shortCode = generateShortCode(longUrl, attempt);
// Check for collision
const existing = await db.findByShortCode(shortCode);
if (!existing) {
await db.insert({ shortCode, longUrl });
return shortCode;
}
// If same URL was already shortened, return existing
if (existing.longUrl === longUrl) {
return existing.shortCode;
}
// Collision with different URL, try again
}
throw new Error("Failed to generate unique short code");
}
Approach 3: Pre-generated Key Service
Generate random unique keys in advance and store them in a pool. When a new URL is shortened, take a key from the pool. This avoids collision checks at write time and is the most scalable approach.
Encoding Strategy Comparison
| Strategy | Pros | Cons |
|---|---|---|
| Base62 of ID | No collisions, deterministic | Sequential (predictable), single point of failure for ID generation |
| Hash + Truncation | Stateless, same URL always gets same code | Collision handling needed, extra DB lookups |
| Pre-generated Keys | Fast, no collisions, no coordination | Requires key management service, wasted keys on failures |
Step 5: Read-Heavy Optimization
With a 100:1 read-to-write ratio, optimizing the redirect path is critical. Every millisecond counts when billions of redirects happen per month.
class URLRedirectService {
private cache: RedisClient;
private db: Database;
async redirect(shortCode: string): Promise<string | null> {
// Layer 1: Check Redis cache
const cached = await this.cache.get(`url:${shortCode}`);
if (cached) {
this.trackClickAsync(shortCode); // Fire and forget
return cached;
}
// Layer 2: Check database
const mapping = await this.db.findByShortCode(shortCode);
if (!mapping) return null;
// Check expiration
if (mapping.expiresAt && mapping.expiresAt < new Date()) {
return null;
}
// Populate cache for future requests (24 hour TTL)
await this.cache.setex(`url:${shortCode}`, 86400, mapping.longUrl);
this.trackClickAsync(shortCode); // Fire and forget
return mapping.longUrl;
}
// 301 vs 302 redirect decision
// 301 (Permanent): Browser caches it, fewer requests to our server
// - Pro: Reduces server load
// - Con: Loses analytics (browser goes direct)
// 302 (Temporary): Browser always hits our server first
// - Pro: Accurate analytics
// - Con: More server load
// Recommendation: Use 302 if analytics matter, 301 for maximum performance
private trackClickAsync(shortCode: string): void {
// Send to message queue for async processing
messageQueue.publish("clicks", {
shortCode,
timestamp: Date.now(),
// ... other metadata
});
}
}
Step 6: Analytics and Tracking
Analytics should be processed asynchronously to avoid slowing down redirects. Click events are published to a message queue (Kafka or SQS) and processed by a separate analytics service.
// Analytics pipeline
class AnalyticsProcessor {
async processClickEvent(event: ClickEvent): Promise<void> {
// Parse user agent for device info
const deviceInfo = parseUserAgent(event.userAgent);
// Geo-locate from IP
const location = await geolocate(event.ipAddress);
// Store in analytics database (time-series optimized)
await this.analyticsDB.insert({
shortCode: event.shortCode,
timestamp: event.timestamp,
country: location.country,
city: location.city,
device: deviceInfo.device,
browser: deviceInfo.browser,
os: deviceInfo.os,
referrer: event.referrer,
});
// Increment counters (denormalized for fast reads)
await this.cache.incr(`clicks:${event.shortCode}`);
await this.cache.incr(`clicks:${event.shortCode}:${todayKey()}`);
}
}
Complete Architecture Summary
Key Design Decisions
- Encoding: Base62 with a distributed ID generator (like Twitter Snowflake) for non-predictable, collision-free codes
- Storage: PostgreSQL for URL mappings (simple schema, strong consistency for writes)
- Cache: Redis with 24-hour TTL for hot URLs (most redirects served from cache)
- Analytics: Kafka for click event streaming, Cassandra or ClickHouse for analytics storage
- Redirect: 302 for analytics accuracy, consider 301 for high-traffic URLs where analytics are not needed
- Scaling: Stateless application servers behind a load balancer, database read replicas for analytics queries
Common Follow-up Questions
- How do you handle expired URLs? A background job scans for expired URLs and removes them from cache. Database records can be soft-deleted for analytics retention.
- How do you prevent abuse? Rate limiting per user/IP, spam URL detection using blocklists, and CAPTCHA for anonymous users.
- How do you handle hot URLs? Popular URLs are cached in Redis with longer TTLs. Multiple cache replicas distribute the read load.