System Design Interview Framework
System design interviews evaluate your ability to design large-scale systems under ambiguity. Unlike coding interviews, which usually have a clear right answer, system design interviews test your ability to make reasonable trade-offs, communicate clearly, and demonstrate breadth of knowledge. This guide provides a structured framework that you can apply to most system design questions.
What Interviewers Are Looking For
- Communication: Can you articulate your thinking clearly?
- Problem-solving approach: Do you work methodically or jump to solutions?
- Trade-off analysis: Can you evaluate different options and justify your choices?
- Technical depth: Do you understand the technologies you propose using?
- Scalability awareness: Can you identify and address bottlenecks?
- Practical experience: Does your design reflect real-world considerations?
Step 1: Clarify Requirements (5 minutes)
This is the most important step. Many candidates fail because they design the wrong system. Ask questions to understand scope, constraints, and priorities.
Functional Requirements
What does the system need to do? Be specific about core features vs. nice-to-haves.
// Example: "Design a URL shortener"
// Questions to ask:
// - Should users be able to customize short URLs?
// - Do short URLs expire?
// - Do we need analytics (click counts, geographic data)?
// - Is there authentication or can anyone create URLs?
// - What is the expected format of short URLs?
interface FunctionalRequirements {
  core: string[]; // Must have
  secondary: string[]; // Nice to have, time permitting
  outOfScope: string[]; // Explicitly excluded
}
const urlShortenerRequirements: FunctionalRequirements = {
  core: [
    "Create short URL from long URL",
    "Redirect short URL to original URL",
    "Short URLs should be unique and non-guessable",
  ],
  secondary: [
    "Custom aliases",
    "URL expiration",
    "Click analytics",
  ],
  outOfScope: [
    "User authentication",
    "Rate limiting UI",
    "Admin dashboard",
  ],
};
Non-Functional Requirements
These drive your architecture decisions more than functional requirements.
interface NonFunctionalRequirements {
  scale: {
    dailyActiveUsers: number;
    readWriteRatio: number;
    peakMultiplier: number;
  };
  performance: {
    readLatency: string; // e.g., "< 50ms for 99th percentile"
    writeLatency: string; // e.g., "< 200ms"
  };
  availability: string; // e.g., "99.99%"
  consistency: string; // "strong" or "eventual"
  durability: string; // "zero data loss"
}
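An availability target is easier to discuss as a downtime budget. The sketch below converts a percentage such as the "99.99%" in the interface comment into allowed downtime per year. The helper name `downtimeMinutesPerYear` is hypothetical, and a 365-day year is assumed.

```typescript
// Hypothetical helper: allowed downtime per year for an availability target.
// Assumes a 365-day year.
const MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600

function downtimeMinutesPerYear(availabilityPercent: number): number {
  if (availabilityPercent < 0 || availabilityPercent > 100) {
    throw new RangeError("availability must be between 0 and 100");
  }
  return ((100 - availabilityPercent) / 100) * MINUTES_PER_YEAR;
}

for (const target of [99, 99.9, 99.99]) {
  const minutes = downtimeMinutesPerYear(target).toFixed(1);
  console.log(`${target}% -> ${minutes} min/year`);
}
```

Under these assumptions, 99% allows roughly 3.65 days of downtime per year, 99.9% roughly 8.8 hours, and 99.99% roughly 53 minutes, which is why each extra "nine" tends to change the architecture.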
// Key questions:
// - How many users? DAU/MAU?
// - What is the read/write ratio?
// - What are the latency requirements?
// - What consistency level is needed?
// - What are the availability requirements?
// - Is there geographic distribution?
Step 2: Back-of-Envelope Estimation (3-5 minutes)
Quick calculations to validate your design direction. Focus on the numbers that influence architecture decisions.
// Example: URL shortener estimation
// Given: 100M URLs created per month
// Write QPS
const urlsPerMonth = 100_000_000;
const writeQPS = urlsPerMonth / (30 * 24 * 3600); // ~40 writes/sec
// Read QPS (assuming 100:1 read:write ratio)
const readQPS = writeQPS * 100; // ~4,000 reads/sec
const peakReadQPS = readQPS * 3; // ~12,000 reads/sec at peak
// Storage (5-year horizon)
const urlsIn5Years = urlsPerMonth * 12 * 5; // 6 billion URLs
const avgRecordSize = 500; // bytes (short URL + long URL + metadata)
const totalStorage = urlsIn5Years * avgRecordSize; // 3 TB
// Bandwidth
const readBandwidth = readQPS * avgRecordSize; // ~2 MB/s (trivial)
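The same arithmetic can give a rough cache size. The sketch below reuses the ~4,000 reads/sec and 500-byte record from this estimate and assumes, purely for illustration, that caching 20% of one day's requests is enough (an 80/20 rule of thumb, not a measured figure). The names `hotFraction` and `cacheGB` are hypothetical.

```typescript
// Rough cache sizing, reusing the figures from the estimate in this step.
const readsPerSecond = 4_000; // ~4,000 reads/sec
const recordSizeBytes = 500; // short URL + long URL + metadata
const secondsPerDay = 24 * 3600;

// Assumption (not from measurements): cache 20% of one day's requests.
const hotFraction = 0.2;

const readsPerDay = readsPerSecond * secondsPerDay; // 345.6 million
// Upper bound: treats every cached request as a distinct URL.
const cacheBytes = readsPerDay * hotFraction * recordSizeBytes;
const cacheGB = cacheBytes / 1e9;

console.log(`cache size: ~${cacheGB.toFixed(1)} GB`);
```

With these inputs the bound comes out at about 35 GB. Because popular URLs repeat, the real working set is likely smaller.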
// Conclusions:
// - QPS is manageable with a single database + caching
// - 3 TB fits on a single machine but sharding is reasonable
// - Caching hot URLs in Redis will handle most reads
Step 3: High-Level Design (10 minutes)
Draw the major components and their interactions. Start simple and add complexity as needed.
// High-level components for URL shortener
//
// Client -> Load Balancer -> API Servers -> Database
//                                        -> Cache (Redis)
//
// Write flow:
// 1. Client sends POST /shorten { url: "https://long-url.com" }
// 2. API server generates short code
// 3. Store mapping in database
// 4. Return short URL to client
//
// Read flow:
// 1. Client sends GET /{shortCode}
// 2. Check Redis cache first
// 3. If cache miss, query database
// 4. Cache the result in Redis
// 5. Return 301/302 redirect to original URL
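The read flow above is the cache-aside pattern. A minimal sketch follows, assuming two in-memory `Map`s as stand-ins for Redis and the database; the `UrlReader` class and its `dbReads` counter are hypothetical names used only for illustration.

```typescript
// Cache-aside read path. Two in-memory Maps stand in for Redis and the
// database; in a real service both lookups would be network calls.
type Redirect = { status: 302; location: string } | { status: 404 };

class UrlReader {
  public dbReads = 0; // counts database lookups, for demonstration only
  private cache: Map<string, string>;
  private db: Map<string, string>;

  constructor(cache: Map<string, string>, db: Map<string, string>) {
    this.cache = cache;
    this.db = db;
  }

  async resolve(shortCode: string): Promise<Redirect> {
    // Check the cache first.
    const cached = this.cache.get(shortCode);
    if (cached !== undefined) return { status: 302, location: cached };

    // Cache miss: fall back to the database.
    this.dbReads++;
    const longUrl = this.db.get(shortCode);
    if (longUrl === undefined) return { status: 404 };

    // Populate the cache so the next read skips the database.
    this.cache.set(shortCode, longUrl);
    return { status: 302, location: longUrl };
  }
}

async function demo(): Promise<void> {
  const db = new Map([["abc1234", "https://long-url.com"]]);
  const reader = new UrlReader(new Map(), db);
  await reader.resolve("abc1234"); // miss: reads the database
  await reader.resolve("abc1234"); // hit: served from the cache
  console.log(`database reads: ${reader.dbReads}`);
}
demo();
```

The sketch returns 302 so that every click reaches the service, which matters if click analytics are in scope. A 301 lets browsers cache the redirect and reduces load, at the cost of that visibility.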
// Component decisions to state and justify:
const architectureDecisions = {
  loadBalancer: "AWS ALB - distributes traffic, SSL termination",
  apiServers: "Node.js - handles HTTP, stateless, horizontally scalable",
  database: "PostgreSQL - reliable, handles 40 writes/sec easily",
  cache: "Redis - in-memory, sub-ms reads for hot URLs",
  idGeneration: "Base62 encoding of auto-increment ID or pre-generated IDs",
};
Step 4: Detailed Design (15 minutes)
Deep dive into 2-3 components that are most interesting or challenging. The interviewer may guide you to specific areas.
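// Deep dive: Short URL generation strategy
import { createHash } from "node:crypto";
const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
// Encode a non-negative integer as a base62 string.
function base62Encode(value: bigint): string {
  let out = "";
  do {
    out = BASE62[Number(value % 62n)] + out;
    value /= 62n;
  } while (value > 0n);
  return out;
}
// Option 1: Hash-based (MD5/SHA-256 + truncate)
function hashBased(longUrl: string): string {
  const hash = createHash("md5").update(longUrl).digest("hex");
  // Problem: truncation makes collisions possible, need to check and retry
  // The first 10 hex digits are 40 bits, which encode to at most 7 base62 characters.
  return base62Encode(BigInt("0x" + hash.substring(0, 10)));
}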
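The "check and retry" step for hash-based codes can be sketched as follows. This is an illustration under stated assumptions: a `Map` stands in for a table with a unique index on the code, 32-bit FNV-1a (over UTF-16 code units) replaces MD5 only to keep the example dependency-free, and the names `shorten`, `fnv1a`, and `toBase62` are hypothetical.

```typescript
// Collision handling for hash-based codes: check, then retry with a salt.
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

// 32-bit FNV-1a, used here only as a stand-in for MD5.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function toBase62(n: number): string {
  let out = "";
  do {
    out = ALPHABET[n % 62] + out;
    n = Math.floor(n / 62);
  } while (n > 0);
  return out;
}

function shorten(
  longUrl: string,
  store: Map<string, string>,
  hash: (s: string) => number = fnv1a,
  maxAttempts: number = 5,
): string {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Attempt 0 hashes the URL itself; later attempts append a salt.
    const input = attempt === 0 ? longUrl : `${longUrl}#${attempt}`;
    const code = toBase62(hash(input));
    const existing = store.get(code);
    if (existing === undefined) {
      store.set(code, longUrl); // free slot: claim it
      return code;
    }
    if (existing === longUrl) return code; // same URL was shortened before
    // A different URL owns this code: a genuine collision, so retry.
  }
  throw new Error("could not find a free short code");
}

const store = new Map<string, string>();
const code = shorten("https://long-url.com", store);
console.log(code, "->", store.get(code));
```

In a real database the check and the write would need to be a single atomic step, for example an insert against a unique constraint that is retried on conflict. Otherwise two servers could claim the same code between the read and the write.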
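// Option 2: Counter-based (auto-increment + base62)
function counterBased(counter: bigint): string {
  // Problem: codes are sequential and therefore predictable, and a single
  // counter is a coordination point.
  // Mitigation: a distributed ID generator (Snowflake-style) removes the single
  // counter, but its IDs are still roughly time-ordered, so they stay guessable.
  return base62Encode(counter);
}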
// Option 3: Pre-generated key service
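class KeyGenerationService {
  // Pre-generate millions of unique keys
  // Store in a database table
  // API servers fetch keys in batches (e.g., 1000 at a time)
  // Mark keys as used
  private localKeys: string[] = [];
  // fetchKeyBatch is injected; it must atomically claim `count` unused keys
  // from the key database and mark them as used.
  constructor(private fetchKeyBatch: (count: number) => Promise<string[]>) {}
  async getKey(): Promise<string> {
    if (this.localKeys.length === 0) {
      // Fetch a batch from the key database. Append rather than assign so a
      // batch fetched by a concurrent caller is not overwritten and lost.
      this.localKeys.push(...(await this.fetchKeyBatch(1000)));
    }
    const key = this.localKeys.pop();
    if (key === undefined) throw new Error("key pool exhausted");
    return key;
  }
}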
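The key database side of this option can be sketched in memory. The `KeyPool` class below is a hypothetical stand-in for the key table: it hands out a batch and marks it used in one step, which is the property that lets API servers skip collision checks. The sample keys are made up for the example.

```typescript
// In-memory stand-in for the key database table.
class KeyPool {
  private unused: string[];
  private used = new Set<string>();

  constructor(keys: string[]) {
    this.unused = keys.slice();
  }

  // Hand out up to `count` keys and mark them used in the same step, so no
  // two callers can receive the same key.
  takeBatch(count: number): string[] {
    const batch = this.unused.splice(0, count);
    for (const key of batch) this.used.add(key);
    return batch;
  }

  isUsed(key: string): boolean {
    return this.used.has(key);
  }

  get remaining(): number {
    return this.unused.length;
  }
}

// Two API servers fetch batches from the same pool.
const pool = new KeyPool(["aZ3kQ9x", "Bq81LmP", "t0YwE4r", "Kd7nV2s", "xR5cH8j"]);
const serverA = pool.takeBatch(2);
const serverB = pool.takeBatch(2);
const overlap = serverA.filter((k) => serverB.indexOf(k) !== -1);
console.log(serverA, serverB, `overlap: ${overlap.length}`, `left: ${pool.remaining}`);
```

A real key table would need a transaction or an atomic claim to give the same guarantee across processes. Keys held in memory by a server that crashes are lost, which is typically an acceptable cost when the keyspace is large.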
// Decision: Option 3 (pre-generated keys) because:
// - No collision handling needed
// - Keys are not sequential (better security)
// - Scales well (batch fetching reduces DB calls)
// - Trade-off: need to manage the key pool
Step 5: Bottlenecks and Trade-offs (5 minutes)
Proactively identify weaknesses in your design and propose mitigations. This demonstrates mature engineering thinking.
| Bottleneck | Impact | Mitigation |
|---|---|---|
| Single database | Write throughput limit, single point of failure | Add read replicas with failover for availability, shard by hash of short code for write throughput |
| Cache eviction | Cold cache causes database overload | Warm cache on startup, use LRU eviction, cache frequently accessed URLs longer |
| Hot keys | Viral URLs overload a single cache node | Replicate hot keys across multiple cache nodes |
| Abuse / spam | Malicious users create millions of URLs | Rate limiting per IP/user, CAPTCHA for unauthenticated users |
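As one example from the table, per-client rate limiting can be sketched with a fixed-window counter. The class name `FixedWindowLimiter`, the limit, and the window length are all hypothetical. The clock is passed in explicitly so the behavior is easy to follow.

```typescript
// Fixed-window rate limiter keyed by client (IP address or user ID).
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed at time `nowMs`.
  allow(clientId: string, nowMs: number): boolean {
    let w = this.windows.get(clientId);
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      // First request from this client, or the previous window has expired.
      w = { start: nowMs, count: 0 };
      this.windows.set(clientId, w);
    }
    if (w.count >= this.limit) return false;
    w.count++;
    return true;
  }
}

// Allow 2 URL creations per second per client (illustrative numbers).
const limiter = new FixedWindowLimiter(2, 1000);
const results = [0, 10, 20, 1000].map((t) => limiter.allow("203.0.113.7", t));
console.log(results);
```

A fixed window is the simplest variant and can let through a burst of up to twice the limit across a window boundary. Sliding-window or token-bucket limiters smooth that out at the cost of more state. In a multi-server deployment the counters would live in a shared store rather than in process memory.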
Common Mistakes to Avoid
- Jumping to the solution. Spending 30 seconds on requirements and 35 minutes on design is backwards. Requirements drive everything.
- Over-engineering. Using Kafka, Kubernetes, and microservices for a system that serves 100 users. Start simple, scale when needed.
- Not doing estimation. Without numbers, you cannot justify whether a single server or 100 servers is needed.
- Monologuing. The interview is a conversation, not a presentation. Check in with the interviewer regularly.
- Ignoring trade-offs. Every design decision has pros and cons. State them explicitly.
- Not knowing your tools. If you propose using Cassandra, you should know when it is better or worse than PostgreSQL.
- Focusing on the wrong details. Spending 10 minutes on the database schema when the interviewer wants to discuss caching strategy.
- Forgetting about failures. What happens when a server crashes? When the database is down? When the network partitions?
Tips from Interviewers
- Drive the conversation. The best candidates lead the discussion while staying open to feedback.
- Think out loud. Interviewers cannot evaluate what they cannot hear. Share your reasoning, even when uncertain.
- Be honest about what you do not know. Saying "I am not sure how exactly Kafka handles this, but I believe..." is better than making something up.
- Use real numbers and real technologies. "We will use PostgreSQL with an estimated 10,000 writes/sec" is better than "We will use a database".
- Address the interviewer's hints. If they ask "what about failure scenarios?", spend time on that topic.
Practice Roadmap
Build your system design skills progressively:
| Week | Focus Area | Practice Problems |
|---|---|---|
| 1-2 | Fundamentals | URL shortener, paste bin, key-value store |
| 3-4 | Data-intensive systems | Twitter feed, news feed, notification system |
| 5-6 | Real-time systems | Chat application, live streaming, collaborative editing |
| 7-8 | Complex systems | Ride sharing, payment system, search engine |
| 9-10 | Mock interviews | Practice with peers, timed sessions, feedback loops |