TechLead
Lesson 2 of 30
7 min read
System Design

Scalability Fundamentals

Master scalability fundamentals including vertical vs horizontal scaling, stateless services, database scaling, and real-world scaling strategies

What Is Scalability?

Scalability is the capability of a system to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. A scalable system can maintain or improve its performance characteristics as the workload increases — whether that means more users, more data, or more transactions.

Scalability is not about how fast a system is today. A system that handles 100 requests per second with 10ms latency but cannot handle 1,000 requests per second without falling over is not scalable. Conversely, a system that handles 100 requests per second with 50ms latency but can smoothly grow to handle 100,000 requests per second is highly scalable.

Vertical Scaling vs Horizontal Scaling

There are two fundamental approaches to scaling a system, and understanding when to use each is critical for designing effective architectures.

Vertical Scaling (Scale Up)

Vertical scaling means increasing the resources of a single machine — adding more CPU cores, more RAM, faster storage, or upgrading to a more powerful instance type. This is the simplest form of scaling because it requires no changes to your application code.

  • Pros: Simple to implement, no code changes needed, no distributed system complexity, data consistency is straightforward.
  • Cons: Hard physical limits (you cannot buy a machine with 10 million cores), a single point of failure, expensive at the high end, and upgrades often require downtime.

For example, Amazon RDS allows you to scale your database vertically by upgrading from a db.t3.micro (2 vCPUs, 1 GB RAM) to a db.r6g.16xlarge (64 vCPUs, 512 GB RAM). But eventually even the largest instance will not be enough.

Horizontal Scaling (Scale Out)

Horizontal scaling means adding more machines to your pool of resources. Instead of making one machine more powerful, you distribute the load across many machines. This is the approach used by virtually all large-scale systems.

  • Pros: Near-infinite scalability, fault tolerance through redundancy, cost-effective with commodity hardware, no single point of failure.
  • Cons: Application must be designed for distribution, data consistency is complex, network communication overhead, harder to debug and monitor.

Vertical vs Horizontal Scaling Comparison

Aspect               | Vertical Scaling  | Horizontal Scaling
---------------------|-------------------|---------------------------
Complexity           | Low               | High
Cost curve           | Exponential       | Linear
Fault tolerance      | None (SPOF)       | Built-in
Upper limit          | Hardware ceiling  | Virtually unlimited
Downtime for upgrade | Often required    | Rolling upgrades possible
Data consistency     | Simple            | Complex (distributed)

Stateless vs Stateful Services

The distinction between stateless and stateful services is one of the most important architectural decisions you will make, and it directly impacts how easily your system can scale horizontally.

Stateless Services

A stateless service does not store any client-specific data between requests. Every request contains all the information the server needs to process it. This means any server in the pool can handle any request, making horizontal scaling trivial — you just add more servers behind a load balancer.

// Stateless API server example -- any instance can handle any request.
// `database` and `verifyJWT` are assumed to be set up elsewhere.
import express from 'express';

const app = express();

app.get('/api/users/:id', async (req, res) => {
  // All state lives in the database, not in this server's memory
  const user = await database.query('SELECT * FROM users WHERE id = $1', [req.params.id]);

  // Authentication info comes from the JWT in the request itself
  const token = req.headers.authorization;
  const claims = verifyJWT(token);

  res.json({ user, requestedBy: claims.userId });
});

// This server can be replicated to 100 instances
// without any coordination between them

Stateful Services

A stateful service stores data about each client session in memory on the specific server handling that client. This means subsequent requests from the same client must be routed to the same server (known as "sticky sessions"), which complicates load balancing and limits scalability.

// Stateful server - harder to scale
// (assumes cookie-parser middleware so req.cookies is populated)
interface UserSession {
  userId: string;
  cart: unknown[];
  preferences: Record<string, unknown>;
}

const sessions = new Map<string, UserSession>();

app.post('/api/login', (req, res) => {
  const sessionId = generateId();
  // Session is stored in THIS server's memory
  sessions.set(sessionId, {
    userId: req.body.userId,
    cart: [],
    preferences: {}
  });
  res.cookie('sessionId', sessionId);
  res.json({ success: true });
});

app.get('/api/cart', (req, res) => {
  // Must be routed to the SAME server that handled login
  const session = sessions.get(req.cookies.sessionId);
  if (!session) return res.status(401).json({ error: 'Session not found' });
  res.json({ cart: session.cart });
});

Making Stateful Services Stateless

The solution is to externalize session state to a shared data store like Redis. This way, any server can look up the session, and the application servers themselves remain stateless.

  • Move sessions to Redis: Store session data in a Redis cluster instead of in-memory maps.
  • Use JWTs: Encode session data directly in the token so no server-side session storage is needed.
  • Use a shared database: Store state in a database accessible by all servers.
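The externalization pattern can be sketched as follows. The `SessionStore` interface below is a hypothetical stand-in for the small subset of a Redis client (set with a TTL, get) that session storage needs; in production you would back it with a real client such as ioredis, but the session helpers themselves would not change.

```typescript
// Minimal key-value interface covering the subset of a Redis-style
// client that session storage needs. Names here are illustrative.
interface SessionStore {
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  get(key: string): Promise<string | null>;
}

interface UserSession {
  userId: string;
  cart: string[];
}

// Any app server can call these -- no instance holds session state,
// so the servers themselves stay stateless and freely replicable.
async function saveSession(
  store: SessionStore,
  sessionId: string,
  session: UserSession,
): Promise<void> {
  await store.set(`session:${sessionId}`, JSON.stringify(session), 3600);
}

async function loadSession(
  store: SessionStore,
  sessionId: string,
): Promise<UserSession | null> {
  const raw = await store.get(`session:${sessionId}`);
  return raw === null ? null : (JSON.parse(raw) as UserSession);
}
```

Because every server reads and writes the same store, the load balancer no longer needs sticky sessions.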

Scaling Strategies

Real-world systems use multiple scaling strategies in combination. Here are the most important ones:

1. Load Balancing

Distribute incoming traffic across multiple servers to ensure no single server becomes a bottleneck. Load balancers can work at Layer 4 (TCP) or Layer 7 (HTTP) and use various algorithms to decide which server receives each request.
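As a sketch of the simplest such algorithm, round-robin hands each incoming request to the next server in rotation. The class below is illustrative only -- real load balancers (NGINX, HAProxy, cloud ALBs) also health-check backends and offer weighted and least-connections strategies.

```typescript
// Round-robin backend selection: requests cycle through the server
// pool so no single server receives a disproportionate share.
class RoundRobinBalancer {
  private next = 0;

  constructor(private backends: string[]) {
    if (backends.length === 0) throw new Error('need at least one backend');
  }

  // Each call returns the next backend in rotation.
  pick(): string {
    const backend = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length;
    return backend;
  }
}
```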

2. Caching

Store frequently accessed data in a fast in-memory store (like Redis or Memcached) to reduce the load on databases and speed up responses. Caching can reduce database queries by 80-90% in read-heavy workloads.
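The most common pattern is cache-aside: check the cache first, fall back to the database on a miss, then populate the cache. A minimal in-process sketch, where the `Cache` class stands in for Redis or Memcached:

```typescript
// Tiny TTL cache standing in for Redis/Memcached in this sketch.
class Cache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt < Date.now()) return undefined;
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Cache-aside read: on a hit the database is never touched -- this is
// where the large query reduction in read-heavy workloads comes from.
async function getUserCached(
  cache: Cache<string>,
  fetchFromDb: (id: string) => Promise<string>,
  id: string,
): Promise<string> {
  const hit = cache.get(id);
  if (hit !== undefined) return hit;
  const value = await fetchFromDb(id);
  cache.set(id, value);
  return value;
}
```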

3. Database Replication

Create read replicas of your database to distribute read queries. In most applications, reads outnumber writes by 10:1 or even 100:1, so offloading reads to replicas dramatically increases throughput.

// Read replica routing example
class DatabaseRouter {
  private currentReplica = 0;

  constructor(
    private primary: DatabaseConnection,
    private replicas: DatabaseConnection[],
  ) {}

  async query(sql: string, params: any[], isWrite: boolean) {
    if (isWrite) {
      // Writes always go to the primary
      return this.primary.query(sql, params);
    }

    // Reads are distributed across replicas (round-robin)
    const replica = this.replicas[this.currentReplica];
    this.currentReplica = (this.currentReplica + 1) % this.replicas.length;
    return replica.query(sql, params);
  }
}

4. Database Sharding

Partition data across multiple database instances based on a shard key (such as user ID or geographic region). Each shard holds a subset of the data, allowing the system to scale write throughput linearly.
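A minimal sketch of hash-based shard routing: a stable hash of the shard key, modulo the shard count, picks the database instance. FNV-1a is used here purely for illustration; production systems typically use consistent hashing so that adding a shard does not remap most keys.

```typescript
// FNV-1a: a simple, stable 32-bit string hash. Any deterministic
// hash works; the important property is that the same key always
// routes to the same shard.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Map a shard key (e.g. user ID) to one of N database instances.
function shardFor(userId: string, shardCount: number): number {
  return fnv1a(userId) % shardCount;
}
```

With a scheme like this, each shard sees only its slice of the keyspace, so write throughput grows as shards are added.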

5. Asynchronous Processing

Move time-consuming tasks (sending emails, generating reports, processing images) out of the request-response cycle into background jobs using message queues. This keeps response times low and allows work to be processed at a different pace.
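The pattern can be sketched with an in-process queue: the request handler enqueues a job and returns immediately, and a worker drains the queue on its own schedule. In production the queue would be a broker such as RabbitMQ, SQS, or Kafka, but the shape is the same.

```typescript
// A job is just a named task with a payload.
type Job = { type: string; payload: unknown };

class JobQueue {
  private jobs: Job[] = [];

  // Producer side (request handler): enqueue is fast, so the HTTP
  // response is not blocked on the slow work.
  enqueue(job: Job): void {
    this.jobs.push(job);
  }

  // Worker side: process everything currently queued and report
  // how many jobs were handled.
  drain(handler: (job: Job) => void): number {
    let processed = 0;
    let job: Job | undefined;
    while ((job = this.jobs.shift()) !== undefined) {
      handler(job);
      processed++;
    }
    return processed;
  }
}
```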

6. Content Delivery Networks (CDNs)

Cache static assets (images, CSS, JavaScript, videos) at edge locations around the world. This reduces latency for global users and offloads traffic from your origin servers. Companies like Netflix serve over 90% of their traffic through CDNs.

Database Scaling Approaches

The database is almost always the first bottleneck in a growing system. Here are the progressive steps for scaling databases:

  1. Optimize queries: Add proper indexes, rewrite slow queries, use EXPLAIN plans. This is free and should always be done first.
  2. Add caching: Cache frequent query results in Redis to reduce database load.
  3. Vertical scaling: Upgrade to a larger database instance with more CPU and RAM.
  4. Read replicas: Create read replicas and route read queries to them.
  5. Connection pooling: Use a connection pooler like PgBouncer to manage database connections efficiently.
  6. Sharding: Partition data across multiple database instances.
  7. Consider NoSQL: For specific access patterns, a NoSQL database like DynamoDB or Cassandra may scale more naturally.

Real-World Scaling Examples

  • Instagram: Started on a single Django server and PostgreSQL database. Scaled to billions of users using PostgreSQL sharding, Memcached, Cassandra for feed storage, and a custom CDN. They kept Python/Django through all of this.
  • Twitter: Originally a monolithic Ruby on Rails app. Decomposed into microservices, moved much of its storage from MySQL to Manhattan (its custom distributed key-value store), and uses Kafka for real-time event processing.
  • Netflix: Migrated from a monolithic Java app in a data center to hundreds of microservices on AWS. Uses Cassandra for distributed storage, EVCache (Memcached-based) for caching, and serves 90% of traffic through their Open Connect CDN.

Key Takeaways

  • Always start simple and scale incrementally as needed. Premature optimization is the root of much unnecessary complexity.
  • Make your services stateless whenever possible — this is the single biggest enabler of horizontal scaling.
  • The database is usually the first bottleneck. Have a clear plan for how you will scale it.
  • Caching and CDNs provide the biggest bang for the buck in most applications.
  • Real-world systems use a combination of all these strategies, not just one.
