SQL vs NoSQL: The Fundamental Trade-offs
Choosing the right database is one of the most consequential decisions in system design. SQL and NoSQL databases are not competitors -- they are tools designed for different problems. Understanding their strengths, weaknesses, and scaling characteristics is essential for designing systems that perform well under load.
Quick Definitions
- SQL (Relational): Structured data organized in tables with rows and columns, with relationships enforced through foreign keys and joins. Examples: PostgreSQL, MySQL, SQL Server.
- NoSQL (Non-Relational): A broad category encompassing document stores, key-value stores, column-family databases, and graph databases. Examples: MongoDB, Redis, Cassandra, Neo4j.
ACID vs BASE
The fundamental difference in guarantees between SQL and NoSQL databases can be summarized by two acronyms: ACID and BASE.
ACID vs BASE Properties
| ACID (SQL) | BASE (NoSQL) |
|---|---|
| Atomicity - All or nothing transactions | Basically Available - System guarantees availability |
| Consistency - Data always valid per constraints | Soft state - State may change over time without input |
| Isolation - Concurrent transactions don't interfere | Eventual consistency - System will become consistent |
| Durability - Committed data survives failures | Prioritizes availability and partition tolerance |
// ACID transaction example (SQL)
async function transferMoney(fromId: string, toId: string, amount: number) {
const trx = await db.transaction();
try {
// Both operations succeed or both fail (Atomicity)
await trx.execute(
"UPDATE accounts SET balance = balance - $1 WHERE id = $2 AND balance >= $1",
[amount, fromId]
);
await trx.execute(
"UPDATE accounts SET balance = balance + $1 WHERE id = $2",
[amount, toId]
);
await trx.commit(); // Durability: written to disk
} catch (error) {
await trx.rollback(); // Atomicity: nothing happened
throw error;
}
}
// BASE approach (NoSQL)
// Same transfer using eventual consistency
async function transferMoneyEventual(fromId: string, toId: string, amount: number) {
// Debit the sender (may succeed independently)
await dynamodb.update({
TableName: "accounts",
Key: { id: fromId },
UpdateExpression: "SET balance = balance - :amount",
ConditionExpression: "balance >= :amount",
ExpressionAttributeValues: { ":amount": amount },
});
// Credit the receiver (processed asynchronously via event)
await sqs.sendMessage({
QueueUrl: TRANSFER_QUEUE,
MessageBody: JSON.stringify({ toId, amount, fromId }),
});
// If this fails, a reconciliation process will detect
// and fix the inconsistency
}
Types of NoSQL Databases
1. Document Databases
Store data as semi-structured documents (JSON, BSON). Each document can have a different structure, making them flexible for evolving schemas. Documents are typically grouped into collections.
- Examples: MongoDB, CouchDB, Firestore
- Best for: Content management, user profiles, product catalogs, any domain with varied or nested data structures
- Query style: Rich query language, secondary indexes, aggregation pipelines
2. Key-Value Stores
The simplest NoSQL model. Data is stored as key-value pairs, like a giant hash map. Extremely fast for lookups by key but offer no querying by value.
- Examples: Redis, DynamoDB, Memcached
- Best for: Caching, session management, real-time leaderboards, rate limiting
- Query style: GET/SET by key only (some support secondary indexes)
3. Column-Family (Wide-Column) Stores
Store data in columns rather than rows. Each row can have a different set of columns. Optimized for reading and writing large volumes of data across many machines.
- Examples: Cassandra, HBase, ScyllaDB
- Best for: Time-series data, IoT data, event logging, analytics at massive scale
- Query style: Limited to partition key and clustering column queries
4. Graph Databases
Represent data as nodes (entities) and edges (relationships). Optimized for traversing complex relationships between entities.
- Examples: Neo4j, Amazon Neptune, ArangoDB
- Best for: Social networks, recommendation engines, fraud detection, knowledge graphs
- Query style: Graph traversal queries (Cypher, Gremlin, SPARQL)
Scaling Characteristics
How Different Databases Scale
| Database Type | Vertical Scaling | Horizontal Scaling | Complexity |
|---|---|---|---|
| SQL (PostgreSQL, MySQL) | Excellent | Challenging (sharding is manual) | High for sharding |
| Document (MongoDB) | Good | Good (built-in sharding) | Medium |
| Key-Value (Redis) | Good | Good (Redis Cluster) | Low-Medium |
| Column-Family (Cassandra) | Good | Excellent (designed for it) | Low |
| Graph (Neo4j) | Good | Limited (graph partitioning is hard) | High |
SQL Scaling Strategies
- Read replicas: Add replicas to handle read traffic (most common first step)
- Connection pooling: Use PgBouncer or ProxySQL to manage database connections efficiently
- Vertical scaling: Upgrade CPU, RAM, and use fast SSDs
- Partitioning / Sharding: Split data across multiple database servers by a shard key
- NewSQL: CockroachDB, TiDB, and Vitess provide SQL interfaces with horizontal scaling built in
NoSQL Scaling Strategies
- Native sharding: Most NoSQL databases shard automatically based on a partition key
- Tunable consistency: Adjust consistency levels per query (e.g., Cassandra quorum reads)
- Denormalization: Store data in the shape your queries need to avoid joins
- Compaction strategies: Optimize storage engine settings for your write pattern
// Modeling the same data in SQL vs NoSQL
// SQL: Normalized schema (3NF)
// Users table
// | id | name | email |
// | 1 | Alice | alice@mail.com |
// Orders table
// | id | user_id | product_id | quantity | created_at |
// | 1 | 1 | 100 | 2 | 2025-01-15 |
// Products table
// | id | name | price |
// | 100 | Widget | 29.99 |
// SQL Query: Get user with their orders and product names
// SELECT u.name, o.quantity, p.name as product_name, p.price
// FROM users u
// JOIN orders o ON u.id = o.user_id
// JOIN products p ON o.product_id = p.id
// WHERE u.id = 1;
// NoSQL (Document): Denormalized for read performance
interface UserDocument {
_id: string;
name: string;
email: string;
orders: Array<{
orderId: string;
product: {
productId: string;
name: string;
price: number;
};
quantity: number;
createdAt: Date;
}>;
}
// Single read to get everything - no joins needed
// But updating a product name requires updating every document
// that references that product (denormalization trade-off)
Decision Framework
Use this framework when deciding between SQL and NoSQL in system design interviews or real projects.
Choose SQL When:
- Data has clear relationships and you need to query across those relationships with JOINs
- ACID transactions are critical (financial systems, inventory, booking systems)
- Schema is well-defined and unlikely to change dramatically
- Complex queries are needed with aggregations, grouping, and filtering on multiple columns
- Data integrity is paramount (foreign keys, unique constraints, check constraints)
Choose NoSQL When:
- Massive write throughput is needed (logging, IoT, event streams)
- Data is naturally hierarchical or nested (JSON documents, user-generated content)
- Schema flexibility is important (rapid iteration, varied data shapes)
- Horizontal scaling is a primary requirement from day one
- Simple access patterns (lookup by key, no complex joins)
- Low-latency reads at any scale (caching layer, session storage)
Real-World Examples
| Company/Use Case | Database Choice | Why |
|---|---|---|
| Banking / Payments | PostgreSQL | ACID transactions for financial integrity |
| Product Catalog (Amazon) | DynamoDB | Key-value lookups at massive scale, varied product attributes |
| Social Graph (Facebook) | TAO (Graph-like) | Optimized for relationship traversal |
| Time-Series Metrics | Cassandra / InfluxDB | High write throughput, time-based partitioning |
| Session Storage | Redis | Sub-millisecond reads, automatic expiration |
| E-commerce Orders | PostgreSQL + Redis | SQL for order integrity, Redis for caching |
Interview Tip: Polyglot Persistence
Most real-world systems use multiple databases for different purposes. In interviews, don't feel locked into one choice. A well-designed system might use PostgreSQL for user data and orders, Redis for caching and sessions, Elasticsearch for full-text search, and Cassandra for event logging. This approach is called polyglot persistence, and mentioning it demonstrates architectural maturity.