What is SQL vs NoSQL at Scale?

Compare SQL and NoSQL databases at scale including ACID vs BASE, document stores, key-value, column-family, and graph databases with decision frameworks

SQL vs NoSQL at Scale - System Design Tutorial | TechLead

SQL vs NoSQL: The Fundamental Trade-offs

Choosing the right database is one of the most consequential decisions in system design. SQL and NoSQL databases are not competitors -- they are tools designed for different problems. Understanding their strengths, weaknesses, and scaling characteristics is essential for designing systems that perform well under load.

Quick Definitions

SQL (Relational): Structured data organized in tables with rows and columns, with relationships enforced through foreign keys and joins. Examples: PostgreSQL, MySQL, SQL Server.
NoSQL (Non-Relational): A broad category encompassing document stores, key-value stores, column-family databases, and graph databases. Examples: MongoDB, Redis, Cassandra, Neo4j.

ACID vs BASE

The fundamental difference in guarantees between SQL and NoSQL databases can be summarized by two acronyms: ACID and BASE.

ACID vs BASE Properties

ACID (SQL)	BASE (NoSQL)
Atomicity - All or nothing transactions	Basically Available - System guarantees availability
Consistency - Data always valid per constraints	Soft state - State may change over time without input
Isolation - Concurrent transactions don't interfere	Eventual consistency - System will become consistent
Durability - Committed data survives failures	Prioritizes availability and partition tolerance

// ACID transaction example (SQL)
async function transferMoney(fromId: string, toId: string, amount: number) {
  const trx = await db.transaction();
  try {
    // Both operations succeed or both fail (Atomicity)
    await trx.execute(
      "UPDATE accounts SET balance = balance - $1 WHERE id = $2 AND balance >= $1",
      [amount, fromId]
    );
    await trx.execute(
      "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
      [amount, toId]
    );
    await trx.commit(); // Durability: written to disk
  } catch (error) {
    await trx.rollback(); // Atomicity: nothing happened
    throw error;
  }
}

// BASE approach (NoSQL)
// Same transfer using eventual consistency
async function transferMoneyEventual(fromId: string, toId: string, amount: number) {
  // Debit the sender (may succeed independently)
  await dynamodb.update({
    TableName: "accounts",
    Key: { id: fromId },
    UpdateExpression: "SET balance = balance - :amount",
    ConditionExpression: "balance >= :amount",
    ExpressionAttributeValues: { ":amount": amount },
  });

  // Credit the receiver (processed asynchronously via event)
  await sqs.sendMessage({
    QueueUrl: TRANSFER_QUEUE,
    MessageBody: JSON.stringify({ toId, amount, fromId }),
  });
  // If this fails, a reconciliation process will detect
  // and fix the inconsistency
}

Types of NoSQL Databases

1. Document Databases

Store data as semi-structured documents (JSON, BSON). Each document can have a different structure, making them flexible for evolving schemas. Documents are typically grouped into collections.

Examples: MongoDB, CouchDB, Firestore
Best for: Content management, user profiles, product catalogs, any domain with varied or nested data structures
Query style: Rich query language, secondary indexes, aggregation pipelines

2. Key-Value Stores

The simplest NoSQL model. Data is stored as key-value pairs, like a giant hash map. Extremely fast for lookups by key but offer no querying by value.

Examples: Redis, DynamoDB, Memcached
Best for: Caching, session management, real-time leaderboards, rate limiting
Query style: GET/SET by key only (some support secondary indexes)

3. Column-Family (Wide-Column) Stores

Store data in columns rather than rows. Each row can have a different set of columns. Optimized for reading and writing large volumes of data across many machines.

Examples: Cassandra, HBase, ScyllaDB
Best for: Time-series data, IoT data, event logging, analytics at massive scale
Query style: Limited to partition key and clustering column queries

4. Graph Databases

Represent data as nodes (entities) and edges (relationships). Optimized for traversing complex relationships between entities.

Examples: Neo4j, Amazon Neptune, ArangoDB
Best for: Social networks, recommendation engines, fraud detection, knowledge graphs
Query style: Graph traversal queries (Cypher, Gremlin, SPARQL)

Scaling Characteristics

How Different Databases Scale

Database Type	Vertical Scaling	Horizontal Scaling	Complexity
SQL (PostgreSQL, MySQL)	Excellent	Challenging (sharding is manual)	High for sharding
Document (MongoDB)	Good	Good (built-in sharding)	Medium
Key-Value (Redis)	Good	Good (Redis Cluster)	Low-Medium
Column-Family (Cassandra)	Good	Excellent (designed for it)	Low
Graph (Neo4j)	Good	Limited (graph partitioning is hard)	High

SQL Scaling Strategies

Read replicas: Add replicas to handle read traffic (most common first step)
Connection pooling: Use PgBouncer or ProxySQL to manage database connections efficiently
Vertical scaling: Upgrade CPU, RAM, and use fast SSDs
Partitioning / Sharding: Split data across multiple database servers by a shard key
NewSQL: CockroachDB, TiDB, and Vitess provide SQL interfaces with horizontal scaling built in

NoSQL Scaling Strategies

Native sharding: Most NoSQL databases shard automatically based on a partition key
Tunable consistency: Adjust consistency levels per query (e.g., Cassandra quorum reads)
Denormalization: Store data in the shape your queries need to avoid joins
Compaction strategies: Optimize storage engine settings for your write pattern

// Modeling the same data in SQL vs NoSQL

// SQL: Normalized schema (3NF)
// Users table
// | id | name       | email            |
// | 1  | Alice      | alice@mail.com   |

// Orders table
// | id | user_id | product_id | quantity | created_at |
// | 1  | 1       | 100        | 2        | 2025-01-15 |

// Products table
// | id  | name     | price |
// | 100 | Widget   | 29.99 |

// SQL Query: Get user with their orders and product names
// SELECT u.name, o.quantity, p.name as product_name, p.price
// FROM users u
// JOIN orders o ON u.id = o.user_id
// JOIN products p ON o.product_id = p.id
// WHERE u.id = 1;

// NoSQL (Document): Denormalized for read performance
interface UserDocument {
  _id: string;
  name: string;
  email: string;
  orders: Array<{
    orderId: string;
    product: {
      productId: string;
      name: string;
      price: number;
    };
    quantity: number;
    createdAt: Date;
  }>;
}

// Single read to get everything - no joins needed
// But updating a product name requires updating every document
// that references that product (denormalization trade-off)

Decision Framework

Use this framework when deciding between SQL and NoSQL in system design interviews or real projects.

Choose SQL When:

Data has clear relationships and you need to query across those relationships with JOINs
ACID transactions are critical (financial systems, inventory, booking systems)
Schema is well-defined and unlikely to change dramatically
Complex queries are needed with aggregations, grouping, and filtering on multiple columns
Data integrity is paramount (foreign keys, unique constraints, check constraints)

Choose NoSQL When:

Massive write throughput is needed (logging, IoT, event streams)
Data is naturally hierarchical or nested (JSON documents, user-generated content)
Schema flexibility is important (rapid iteration, varied data shapes)
Horizontal scaling is a primary requirement from day one
Simple access patterns (lookup by key, no complex joins)
Low-latency reads at any scale (caching layer, session storage)

Real-World Examples

Company/Use Case	Database Choice	Why
Banking / Payments	PostgreSQL	ACID transactions for financial integrity
Product Catalog (Amazon)	DynamoDB	Key-value lookups at massive scale, varied product attributes
Social Graph (Facebook)	TAO (Graph-like)	Optimized for relationship traversal
Time-Series Metrics	Cassandra / InfluxDB	High write throughput, time-based partitioning
Session Storage	Redis	Sub-millisecond reads, automatic expiration
E-commerce Orders	PostgreSQL + Redis	SQL for order integrity, Redis for caching

Interview Tip: Polyglot Persistence

Most real-world systems use multiple databases for different purposes. In interviews, don't feel locked into one choice. A well-designed system might use PostgreSQL for user data and orders, Redis for caching and sessions, Elasticsearch for full-text search, and Cassandra for event logging. This approach is called polyglot persistence, and mentioning it demonstrates architectural maturity.

SQL vs NoSQL at Scale