TechLead
Lesson 13 of 30
6 min read
System Design

SQL vs NoSQL at Scale

Compare SQL and NoSQL databases at scale including ACID vs BASE, document stores, key-value, column-family, and graph databases with decision frameworks

SQL vs NoSQL: The Fundamental Trade-offs

Choosing the right database is one of the most consequential decisions in system design. SQL and NoSQL databases are not competitors -- they are tools designed for different problems. Understanding their strengths, weaknesses, and scaling characteristics is essential for designing systems that perform well under load.

Quick Definitions

  • SQL (Relational): Structured data organized in tables with rows and columns, with relationships enforced through foreign keys and joins. Examples: PostgreSQL, MySQL, SQL Server.
  • NoSQL (Non-Relational): A broad category encompassing document stores, key-value stores, column-family databases, and graph databases. Examples: MongoDB, Redis, Cassandra, Neo4j.

ACID vs BASE

The fundamental difference in guarantees between SQL and NoSQL databases can be summarized by two acronyms: ACID and BASE.

ACID vs BASE Properties

ACID (SQL) BASE (NoSQL)
Atomicity - All or nothing transactionsBasically Available - System guarantees availability
Consistency - Data always valid per constraintsSoft state - State may change over time without input
Isolation - Concurrent transactions don't interfereEventual consistency - System will become consistent
Durability - Committed data survives failuresPrioritizes availability and partition tolerance
// ACID transaction example (SQL)
async function transferMoney(fromId: string, toId: string, amount: number) {
  const trx = await db.transaction();
  try {
    // Both operations succeed or both fail (Atomicity)
    await trx.execute(
      "UPDATE accounts SET balance = balance - $1 WHERE id = $2 AND balance >= $1",
      [amount, fromId]
    );
    await trx.execute(
      "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
      [amount, toId]
    );
    await trx.commit(); // Durability: written to disk
  } catch (error) {
    await trx.rollback(); // Atomicity: nothing happened
    throw error;
  }
}

// BASE approach (NoSQL)
// Same transfer using eventual consistency
async function transferMoneyEventual(fromId: string, toId: string, amount: number) {
  // Debit the sender (may succeed independently)
  await dynamodb.update({
    TableName: "accounts",
    Key: { id: fromId },
    UpdateExpression: "SET balance = balance - :amount",
    ConditionExpression: "balance >= :amount",
    ExpressionAttributeValues: { ":amount": amount },
  });

  // Credit the receiver (processed asynchronously via event)
  await sqs.sendMessage({
    QueueUrl: TRANSFER_QUEUE,
    MessageBody: JSON.stringify({ toId, amount, fromId }),
  });
  // If this fails, a reconciliation process will detect
  // and fix the inconsistency
}

Types of NoSQL Databases

1. Document Databases

Store data as semi-structured documents (JSON, BSON). Each document can have a different structure, making them flexible for evolving schemas. Documents are typically grouped into collections.

  • Examples: MongoDB, CouchDB, Firestore
  • Best for: Content management, user profiles, product catalogs, any domain with varied or nested data structures
  • Query style: Rich query language, secondary indexes, aggregation pipelines

2. Key-Value Stores

The simplest NoSQL model. Data is stored as key-value pairs, like a giant hash map. Extremely fast for lookups by key but offer no querying by value.

  • Examples: Redis, DynamoDB, Memcached
  • Best for: Caching, session management, real-time leaderboards, rate limiting
  • Query style: GET/SET by key only (some support secondary indexes)

3. Column-Family (Wide-Column) Stores

Store data in columns rather than rows. Each row can have a different set of columns. Optimized for reading and writing large volumes of data across many machines.

  • Examples: Cassandra, HBase, ScyllaDB
  • Best for: Time-series data, IoT data, event logging, analytics at massive scale
  • Query style: Limited to partition key and clustering column queries

4. Graph Databases

Represent data as nodes (entities) and edges (relationships). Optimized for traversing complex relationships between entities.

  • Examples: Neo4j, Amazon Neptune, ArangoDB
  • Best for: Social networks, recommendation engines, fraud detection, knowledge graphs
  • Query style: Graph traversal queries (Cypher, Gremlin, SPARQL)

Scaling Characteristics

How Different Databases Scale

Database Type Vertical Scaling Horizontal Scaling Complexity
SQL (PostgreSQL, MySQL)ExcellentChallenging (sharding is manual)High for sharding
Document (MongoDB)GoodGood (built-in sharding)Medium
Key-Value (Redis)GoodGood (Redis Cluster)Low-Medium
Column-Family (Cassandra)GoodExcellent (designed for it)Low
Graph (Neo4j)GoodLimited (graph partitioning is hard)High

SQL Scaling Strategies

  • Read replicas: Add replicas to handle read traffic (most common first step)
  • Connection pooling: Use PgBouncer or ProxySQL to manage database connections efficiently
  • Vertical scaling: Upgrade CPU, RAM, and use fast SSDs
  • Partitioning / Sharding: Split data across multiple database servers by a shard key
  • NewSQL: CockroachDB, TiDB, and Vitess provide SQL interfaces with horizontal scaling built in

NoSQL Scaling Strategies

  • Native sharding: Most NoSQL databases shard automatically based on a partition key
  • Tunable consistency: Adjust consistency levels per query (e.g., Cassandra quorum reads)
  • Denormalization: Store data in the shape your queries need to avoid joins
  • Compaction strategies: Optimize storage engine settings for your write pattern
// Modeling the same data in SQL vs NoSQL

// SQL: Normalized schema (3NF)
// Users table
// | id | name       | email            |
// | 1  | Alice      | alice@mail.com   |

// Orders table
// | id | user_id | product_id | quantity | created_at |
// | 1  | 1       | 100        | 2        | 2025-01-15 |

// Products table
// | id  | name     | price |
// | 100 | Widget   | 29.99 |

// SQL Query: Get user with their orders and product names
// SELECT u.name, o.quantity, p.name as product_name, p.price
// FROM users u
// JOIN orders o ON u.id = o.user_id
// JOIN products p ON o.product_id = p.id
// WHERE u.id = 1;

// NoSQL (Document): Denormalized for read performance
interface UserDocument {
  _id: string;
  name: string;
  email: string;
  orders: Array<{
    orderId: string;
    product: {
      productId: string;
      name: string;
      price: number;
    };
    quantity: number;
    createdAt: Date;
  }>;
}

// Single read to get everything - no joins needed
// But updating a product name requires updating every document
// that references that product (denormalization trade-off)

Decision Framework

Use this framework when deciding between SQL and NoSQL in system design interviews or real projects.

Choose SQL When:

  • Data has clear relationships and you need to query across those relationships with JOINs
  • ACID transactions are critical (financial systems, inventory, booking systems)
  • Schema is well-defined and unlikely to change dramatically
  • Complex queries are needed with aggregations, grouping, and filtering on multiple columns
  • Data integrity is paramount (foreign keys, unique constraints, check constraints)

Choose NoSQL When:

  • Massive write throughput is needed (logging, IoT, event streams)
  • Data is naturally hierarchical or nested (JSON documents, user-generated content)
  • Schema flexibility is important (rapid iteration, varied data shapes)
  • Horizontal scaling is a primary requirement from day one
  • Simple access patterns (lookup by key, no complex joins)
  • Low-latency reads at any scale (caching layer, session storage)

Real-World Examples

Company/Use Case Database Choice Why
Banking / PaymentsPostgreSQLACID transactions for financial integrity
Product Catalog (Amazon)DynamoDBKey-value lookups at massive scale, varied product attributes
Social Graph (Facebook)TAO (Graph-like)Optimized for relationship traversal
Time-Series MetricsCassandra / InfluxDBHigh write throughput, time-based partitioning
Session StorageRedisSub-millisecond reads, automatic expiration
E-commerce OrdersPostgreSQL + RedisSQL for order integrity, Redis for caching

Interview Tip: Polyglot Persistence

Most real-world systems use multiple databases for different purposes. In interviews, don't feel locked into one choice. A well-designed system might use PostgreSQL for user data and orders, Redis for caching and sessions, Elasticsearch for full-text search, and Cassandra for event logging. This approach is called polyglot persistence, and mentioning it demonstrates architectural maturity.

Continue Learning