Back-of-the-Envelope Estimation
Back-of-the-envelope estimation is a critical skill for system design interviews and real-world architecture decisions. It allows engineers to quickly determine whether a design is feasible, identify bottlenecks, and make informed technology choices without building prototypes. The goal is not exact precision but rather correct order-of-magnitude reasoning.
Why Estimation Matters
- Determines if a single server or distributed system is needed
- Identifies which components need horizontal scaling
- Helps choose between different storage technologies
- Validates or invalidates architectural decisions before implementation
- Demonstrates structured thinking in interviews
Common Numbers Every Engineer Should Know
These are approximate values that serve as building blocks for estimation. Memorize the orders of magnitude, not the exact numbers.
Latency Numbers
| Operation | Latency | Notes |
|---|---|---|
| L1 cache reference | ~1 ns | On-CPU cache |
| L2 cache reference | ~4 ns | On-CPU cache |
| Main memory (RAM) reference | ~100 ns | DRAM access |
| SSD random read | ~150 µs | ~1,000x slower than RAM |
| HDD random read | ~10 ms | Mechanical seek time |
| Same datacenter round trip | ~0.5 ms | Network within DC |
| Cross-continent round trip | ~150 ms | Speed of light limit |
| Redis GET | ~0.5 ms | In-memory + network |
| PostgreSQL simple query | ~2-5 ms | Indexed query |
| External API call | ~50-200 ms | Third-party service |
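One way to internalize these gaps is to rescale them: if an L1 cache hit took one second, an HDD seek would take roughly four months. A minimal sketch of that rescaling, using the approximate values from the table above:

// Rescale latencies so that an L1 cache reference takes "1 second".
// Values are the approximate figures from the table above, in nanoseconds.
const latencyNs: Record<string, number> = {
  "L1 cache reference": 1,
  "Main memory reference": 100,
  "SSD random read": 150_000,
  "HDD random read": 10_000_000,
  "Cross-continent round trip": 150_000_000,
};

for (const [operation, ns] of Object.entries(latencyNs)) {
  // 1 ns of real time maps to 1 "second" of scaled time.
  console.log(`${operation}: ${ns.toLocaleString()} scaled seconds`);
}
// HDD random read: 10,000,000 scaled seconds (~4 months) -- this is why
// random disk I/O dominates any estimate it appears in.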
Throughput Numbers
| System | Throughput | Notes |
|---|---|---|
| Single web server (Node.js) | ~10,000 RPS | Simple API endpoints |
| PostgreSQL | ~10,000 queries/s | Simple indexed queries |
| Redis | ~100,000 ops/s | Single instance |
| Kafka (single partition) | ~10,000 msg/s | Per partition throughput |
| Kafka (cluster) | ~1,000,000 msg/s | With many partitions |
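These per-instance figures plug directly into capacity math. A minimal sketch, assuming a hypothetical read-heavy service at 50,000 RPS; the target load and the 90% cache hit rate are illustrative assumptions, not from the table:

// Rough instance counts for a hypothetical 50,000 RPS read-heavy service.
const targetRps = 50_000; // assumed target load, for illustration
const rpsPerWebServer = 10_000; // from the table above
const qpsPerPostgres = 10_000; // from the table above
const cacheHitRate = 0.9; // assumed: Redis absorbs 90% of reads

const webServers = Math.ceil(targetRps / rpsPerWebServer); // 5 servers
const dbQps = targetRps * (1 - cacheHitRate); // ~5,000 QPS reach the database
const dbInstances = Math.ceil(dbQps / qpsPerPostgres); // 1 instance
console.log({ webServers, dbQps, dbInstances });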
Storage and Data Size Numbers
| Data | Size |
|---|---|
| ASCII character | 1 byte |
| UTF-8 character (avg) | 2 bytes |
| UUID | 16 bytes |
| Typical tweet / short message | ~250 bytes |
| Typical JSON API response | ~2 KB |
| Compressed photo | ~200 KB |
| 1 minute of MP3 audio | ~1 MB |
| 1 minute of HD video | ~50 MB |
Power of Two Quick Reference
| Power | Exact Value | Approximate |
|---|---|---|
| 2^10 | 1,024 | ~1 thousand (1 KB) |
| 2^20 | 1,048,576 | ~1 million (1 MB) |
| 2^30 | 1,073,741,824 | ~1 billion (1 GB) |
| 2^40 | 1,099,511,627,776 | ~1 trillion (1 TB) |
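A small helper makes the conversion from raw byte counts mechanical. A minimal sketch using the 1,024 multiplier from the table above:

// Convert a raw byte count to a human-readable string (powers of two).
const UNITS = ["B", "KB", "MB", "GB", "TB", "PB"];

function humanBytes(bytes: number): string {
  let value = bytes;
  let unitIndex = 0;
  while (value >= 1024 && unitIndex < UNITS.length - 1) {
    value /= 1024;
    unitIndex++;
  }
  return `${value.toFixed(1)} ${UNITS[unitIndex]}`;
}

console.log(humanBytes(2 ** 30)); // "1.0 GB"
console.log(humanBytes(5_000_000_000)); // "4.7 GB"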
Estimation Framework
Follow this systematic approach for any estimation problem:
// Estimation framework as code
type EstimationResult = Record<string, string>;

interface EstimationProblem {
  // Step 1: Define the question clearly
  question: string;
  // Step 2: Identify the key variables
  assumptions: Record<string, number>;
  // Step 3: Calculate step by step
  calculate(): EstimationResult;
}
// Example: Estimate Twitter's storage needs for tweets
class TwitterStorageEstimation implements EstimationProblem {
  question = "How much storage does Twitter need for tweets per day?";
  assumptions = {
    dailyActiveUsers: 300_000_000, // 300M DAU
    tweetsPerUserPerDay: 0.5, // Not everyone tweets daily
    avgTweetSizeBytes: 250, // Text content
    metadataPerTweetBytes: 200, // User ID, timestamp, indexes
    mediaAttachmentRate: 0.2, // 20% of tweets have media
    avgMediaSizeBytes: 200_000, // 200KB per image
  };
  calculate() {
    const a = this.assumptions;
    const totalTweetsPerDay = a.dailyActiveUsers * a.tweetsPerUserPerDay;
    // = 300M * 0.5 = 150M tweets/day
    const textStoragePerDay = totalTweetsPerDay * (a.avgTweetSizeBytes + a.metadataPerTweetBytes);
    // = 150M * 450 bytes = 67.5 GB/day
    const mediaStoragePerDay = totalTweetsPerDay * a.mediaAttachmentRate * a.avgMediaSizeBytes;
    // = 150M * 0.2 * 200KB = 6 TB/day
    const totalPerDay = textStoragePerDay + mediaStoragePerDay;
    // = ~6 TB/day (media dominates)
    const totalPerYear = totalPerDay * 365;
    // = ~2.2 PB/year
    return {
      tweetsPerDay: "150 million",
      textStoragePerDay: "~67.5 GB",
      mediaStoragePerDay: "~6 TB",
      totalPerDay: "~6 TB",
      totalPerYear: "~2.2 PB",
      conclusion: "Media storage dominates. Need distributed object storage like S3.",
    };
  }
}
Practice Examples with Solutions
Example 1: QPS for a URL Shortener
Question: Estimate the QPS for a URL shortener with 100 million URLs created per month.
- 100M URLs/month / (30 * 24 * 3600 seconds/month) = ~40 URLs/second (writes)
- Read:write ratio for URL shorteners is typically 100:1
- Read QPS = 40 * 100 = ~4,000 reads/second
- Peak = 2-3x average = ~10,000 reads/second
- Conclusion: A single PostgreSQL instance can handle this. Add Redis cache for hot URLs.
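The same arithmetic as a quick script, using the assumptions above:

// URL shortener QPS estimate.
const urlsPerMonth = 100_000_000;
const secondsPerMonth = 30 * 24 * 3600; // ~2.6M seconds
const writeQps = urlsPerMonth / secondsPerMonth; // ~40 writes/s
const readQps = writeQps * 100; // 100:1 read:write ratio -> ~4,000 reads/s
const peakReadQps = readQps * 2.5; // 2-3x peak factor -> ~10,000 reads/s
console.log({
  writeQps: Math.round(writeQps),
  readQps: Math.round(readQps),
  peakReadQps: Math.round(peakReadQps),
});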
Example 2: Bandwidth for a Video Streaming Service
Question: Estimate the bandwidth needed for a service streaming to 10 million concurrent users.
- Average video bitrate: 5 Mbps (1080p)
- 10M concurrent users * 5 Mbps = 50 Tbps
- This is massive; a single data center cannot serve this
- Conclusion: Must use a global CDN with edge caching. Most traffic is served from CDN PoPs, not origin servers.
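A minimal sketch of the same numbers; the 95% CDN offload rate is an illustrative assumption, not part of the estimate above:

// Streaming bandwidth estimate.
const concurrentViewers = 10_000_000;
const bitrateMbps = 5; // average 1080p stream
const totalTbps = (concurrentViewers * bitrateMbps) / 1_000_000; // 50 Tbps
// Assumed for illustration: CDN edges absorb 95% of the traffic.
const cdnOffloadRate = 0.95;
const originTbps = totalTbps * (1 - cdnOffloadRate); // ~2.5 Tbps at origin
console.log({ totalTbps, originTbps });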
Example 3: Database Size for a Social Media Platform
Question: Estimate the database size for storing user profiles for 1 billion users.
- User profile data: name (50B), email (50B), bio (200B), settings (100B), metadata (100B) = ~500 bytes
- 1B users * 500 bytes = 500 GB
- With indexes (2x overhead): ~1 TB
- Conclusion: Fits on a single large database server for storage, but query load will require sharding or read replicas.
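As a script, using the field sizes above:

// User profile storage estimate.
const users = 1_000_000_000;
const profileBytes = 50 + 50 + 200 + 100 + 100; // name, email, bio, settings, metadata = 500 B
const rawGB = (users * profileBytes) / 1e9; // 500 GB
const withIndexesTB = (rawGB * 2) / 1000; // 2x index overhead -> ~1 TB
console.log({ rawGB, withIndexesTB });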
Useful Time Conversions
| Time Period | Seconds | Quick Approximation |
|---|---|---|
| 1 day | 86,400 | ~100,000 (10^5) |
| 1 month | 2,592,000 | ~2.5 million (2.5 * 10^6) |
| 1 year | 31,536,000 | ~30 million (3 * 10^7) |
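A minimal sketch showing how close the approximations land; the 100M events/day volume is an assumed example:

// Daily volume -> per-second rate: exact vs. the ~10^5 approximation.
const eventsPerDay = 100_000_000; // assumed: 100M events/day
const exactPerSecond = eventsPerDay / 86_400; // ~1,157/s
const approxPerSecond = eventsPerDay / 100_000; // 1,000/s -- same order of magnitude
console.log({ exact: Math.round(exactPerSecond), approx: approxPerSecond });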
Tips for System Design Interviews
- State your assumptions clearly. Interviewers care more about your reasoning process than exact numbers
- Round aggressively. Use powers of 10 and simple multipliers. 86,400 seconds/day becomes 100,000
- Show your work. Write down each step so the interviewer can follow your logic
- Sanity-check your answer. If you calculate that a single laptop can handle all of Google's traffic, something is wrong
- Focus on bottlenecks. Use estimates to identify which component needs the most attention