TechLead
Lesson 3 of 30
8 min read
System Design

Load Balancing Deep Dive

Deep dive into load balancing algorithms, L4 vs L7 load balancers, health checks, and configuration examples with Nginx, HAProxy, and AWS ELB

What Is Load Balancing?

A load balancer distributes incoming network traffic across multiple servers to ensure no single server bears too much load. It sits between the client and the server pool, acting as a reverse proxy that routes requests to the most appropriate backend server.

Load balancing is essential for achieving high availability, reliability, and scalability. Without it, a single server would be a bottleneck and a single point of failure. With a load balancer, you can add or remove servers from the pool without any downtime, and if one server goes down, traffic is automatically redirected to healthy servers.

Benefits of Load Balancing

  • High Availability: If a server fails, the load balancer stops sending traffic to it and redistributes to healthy servers.
  • Scalability: Add more servers behind the load balancer to handle increased traffic without changing client configuration.
  • Performance: Distribute work evenly to prevent any single server from becoming overwhelmed.
  • Flexibility: Perform rolling updates by removing servers one at a time, updating them, and adding them back.
  • SSL Termination: Offload TLS encryption/decryption from backend servers, reducing their CPU load.

Load Balancing Algorithms

The algorithm determines how the load balancer selects which backend server should handle each incoming request. The best choice depends on your workload characteristics.

1. Round Robin

Requests are distributed sequentially across the server pool. Server 1 gets the first request, Server 2 gets the second, Server 3 gets the third, and so on, cycling back to Server 1 after reaching the last server.

  • Best for: Servers with equal capacity handling similar-duration requests.
  • Limitation: Does not account for server load or request complexity. A server handling a heavy request gets the same number of new requests as an idle server.
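The cycling behavior described above can be sketched in a few lines (a toy selector for illustration, not tied to any particular load balancer):

```javascript
// Toy round-robin selector: cycles through the pool in order,
// wrapping back to the first server after the last one.
class RoundRobin {
  constructor(servers) {
    this.servers = servers;
    this.index = 0;
  }

  next() {
    const server = this.servers[this.index];
    this.index = (this.index + 1) % this.servers.length;
    return server;
  }
}

const rr = new RoundRobin(['server1', 'server2', 'server3']);
rr.next(); // 'server1'
rr.next(); // 'server2'
rr.next(); // 'server3'
rr.next(); // 'server1' (wraps around)
```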

2. Weighted Round Robin

Like round robin, but each server is assigned a weight based on its capacity. A server with weight 3 receives three times as many requests as a server with weight 1. This is useful when you have servers with different hardware specifications.
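Nginx implements a "smooth" variant of this algorithm that interleaves picks rather than sending bursts to the heaviest server. A minimal sketch of that idea:

```javascript
// Smooth weighted round robin (the variant Nginx uses): each pick
// boosts every server's score by its weight, selects the highest
// score, then penalizes the winner by the total weight. This
// interleaves picks (A B A C B A) instead of bursting (A A A B B C).
function smoothWeightedRoundRobin(servers) {
  // servers: [{ name, weight }]
  const state = servers.map(s => ({ ...s, current: 0 }));
  const total = servers.reduce((sum, s) => sum + s.weight, 0);

  return function next() {
    let best = null;
    for (const s of state) {
      s.current += s.weight;
      if (best === null || s.current > best.current) best = s;
    }
    best.current -= total;
    return best.name;
  };
}

const next = smoothWeightedRoundRobin([
  { name: 'A', weight: 3 },
  { name: 'B', weight: 2 },
  { name: 'C', weight: 1 },
]);
// Six picks yield A, B, A, C, B, A: A gets 3, B gets 2, C gets 1.
```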

3. Least Connections

The load balancer sends each new request to the server with the fewest active connections. This is more intelligent than round robin because it accounts for the actual load on each server.

  • Best for: Workloads where request duration varies significantly (e.g., some requests take 10ms, others take 10 seconds).
  • Limitation: Requires the load balancer to track the number of active connections per server, adding slight overhead.
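The bookkeeping the limitation refers to can be sketched as a simple counter per server (again a toy model, not a production implementation):

```javascript
// Toy least-connections selector: tracks in-flight connections per
// server and routes each new request to the least-busy one.
class LeastConnections {
  constructor(servers) {
    this.active = new Map(servers.map(s => [s, 0]));
  }

  // Pick the server with the fewest active connections.
  acquire() {
    let best = null;
    for (const [server, count] of this.active) {
      if (best === null || count < this.active.get(best)) best = server;
    }
    this.active.set(best, this.active.get(best) + 1);
    return best;
  }

  // Call when the request finishes so the counts stay accurate.
  release(server) {
    this.active.set(server, this.active.get(server) - 1);
  }
}
```

A fast server drains its connections quickly, so it naturally ends up with the lowest count and receives more new requests, which is exactly the adaptive behavior round robin lacks.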

4. Least Response Time

Sends requests to the server with the fastest response time and fewest active connections. This combines connection count with actual performance data for smarter routing.
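One way to realize this (a sketch; real load balancers differ in the details) is to keep an exponentially weighted moving average of each server's latency and scale it by its active connections:

```javascript
// Sketch of least-response-time selection: score each server by its
// smoothed latency (EWMA) times its in-flight connection count, and
// route to the lowest score.
class LeastResponseTime {
  constructor(servers, alpha = 0.3) {
    this.alpha = alpha; // how quickly the average follows new samples
    this.stats = new Map(servers.map(s => [s, { ewma: 0, active: 0 }]));
  }

  pick() {
    let best = null;
    let bestScore = Infinity;
    for (const [server, st] of this.stats) {
      const score = st.ewma * (st.active + 1);
      if (score < bestScore) {
        bestScore = score;
        best = server;
      }
    }
    this.stats.get(best).active += 1;
    return best;
  }

  // Record the observed latency when a request completes.
  done(server, latencyMs) {
    const st = this.stats.get(server);
    st.active -= 1;
    st.ewma = st.ewma === 0
      ? latencyMs
      : this.alpha * latencyMs + (1 - this.alpha) * st.ewma;
  }
}
```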

5. IP Hash

A hash of the client's IP address determines which server receives the request. This ensures that the same client always reaches the same server, which can be useful for session persistence (sticky sessions).

  • Best for: Applications that require session affinity without external session storage.
  • Limitation: If a server goes down, all clients assigned to it must be rehashed, and uneven IP distribution can cause hotspots.
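The mapping can be sketched with any deterministic hash of the client IP; FNV-1a is used here purely as an illustration:

```javascript
// Toy IP-hash selector using FNV-1a: the same client IP always maps
// to the same backend, giving sticky sessions without shared state.
function serverForIp(ip, servers) {
  let hash = 2166136261; // FNV-1a offset basis
  for (let i = 0; i < ip.length; i++) {
    hash ^= ip.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0; // FNV prime, keep unsigned
  }
  return servers[hash % servers.length];
}

const pool = ['server1', 'server2', 'server3'];
// Deterministic: repeated calls with the same IP hit the same server.
serverForIp('203.0.113.42', pool) === serverForIp('203.0.113.42', pool); // true
```

Note that plain modulo hashing, as above, remaps many clients whenever the pool size changes, which is the rehashing limitation mentioned in the bullet list.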

6. Random

Selects a server at random for each request. Surprisingly effective with a large number of servers due to the law of large numbers, but less predictable than other algorithms.

Algorithm Comparison

Algorithm            | Complexity | Sticky Sessions | Best Use Case
---------------------|------------|-----------------|---------------------------
Round Robin          | O(1)       | No              | Uniform workloads
Weighted Round Robin | O(1)       | No              | Mixed server capacities
Least Connections    | O(n)       | No              | Variable request durations
Least Response Time  | O(n)       | No              | Latency-sensitive apps
IP Hash              | O(1)       | Yes             | Session affinity needed
Random               | O(1)       | No              | Large server pools

L4 vs L7 Load Balancers

Load balancers operate at different layers of the OSI model, and the layer determines what information is available for making routing decisions.

Layer 4 (Transport Layer) Load Balancing

L4 load balancers work at the TCP/UDP level. They route traffic based on IP addresses and port numbers without inspecting the content of the packets. They are extremely fast because they do not need to decrypt TLS or parse HTTP headers.

  • Routes based on IP address and TCP port
  • Cannot make decisions based on URL path, headers, or cookies
  • Very high throughput and low latency
  • Simpler to configure and manage
  • Examples: AWS Network Load Balancer (NLB), HAProxy in TCP mode

Layer 7 (Application Layer) Load Balancing

L7 load balancers operate at the HTTP/HTTPS level. They can inspect the full request — headers, cookies, URL path, body — and make intelligent routing decisions based on this information. This enables powerful capabilities like path-based routing, A/B testing, and canary deployments.

  • Can route based on URL path, HTTP headers, cookies, or request body
  • Supports SSL termination
  • Can modify requests and responses (add headers, rewrite URLs)
  • Higher latency than L4 due to deeper inspection
  • Examples: AWS Application Load Balancer (ALB), Nginx, HAProxy in HTTP mode

When to Use L4 vs L7

  • Use L4 when: You need raw throughput and low latency (e.g., gaming servers, database connections, real-time streaming), or when you do not need content-based routing.
  • Use L7 when: You need to route based on URL paths (e.g., /api to one pool, /static to another), need SSL termination, or require advanced features like A/B testing and canary releases.

Health Checks

Health checks are how the load balancer knows whether a backend server is healthy and capable of handling requests. Without health checks, the load balancer might send traffic to a crashed or overloaded server.

Types of Health Checks

  • TCP Health Check: Attempts to open a TCP connection to the server. If the connection succeeds, the server is considered healthy. Simple but does not verify application health.
  • HTTP Health Check: Sends an HTTP GET request to a specific endpoint (e.g., /health) and checks for a 200 OK response. More thorough because it verifies the application is running and responding.
  • Deep Health Check: The /health endpoint checks critical dependencies (database, cache, message queue) and reports the overall health of the service. This catches cases where the process is running but a dependency is down.
// Deep health check endpoint example
// (assumes `db`, `redis`, and `getDiskSpace` are initialized elsewhere)
import express from 'express';

const app = express();

app.get('/health', async (req, res) => {
  const checks = {
    database: false,
    redis: false,
    diskSpace: false,
  };

  try {
    // Check database connectivity
    await db.query('SELECT 1');
    checks.database = true;
  } catch (e) {
    console.error('Database health check failed:', e);
  }

  try {
    // Check Redis connectivity
    await redis.ping();
    checks.redis = true;
  } catch (e) {
    console.error('Redis health check failed:', e);
  }

  try {
    // Check disk space
    const freeSpace = await getDiskSpace();
    checks.diskSpace = freeSpace > 1_000_000_000; // 1GB minimum
  } catch (e) {
    console.error('Disk space check failed:', e);
  }

  const allHealthy = Object.values(checks).every(Boolean);

  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? 'healthy' : 'unhealthy',
    checks,
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  });
});

Load Balancer Configuration Examples

Nginx Configuration

Nginx is one of the most popular open-source load balancers and reverse proxies. Here is a practical configuration for HTTP load balancing:

upstream backend_servers {
    # Least connections algorithm
    least_conn;

    # Backend servers with passive health checks (max_fails / fail_timeout)
    server 10.0.1.10:3000 weight=3 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:3000 weight=2 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:3000 weight=1 max_fails=3 fail_timeout=30s;

    # Backup server - only used when all others are down
    server 10.0.1.13:3000 backup;

    # Keep connections alive to backends
    keepalive 32;
}

server {
    listen 80;
    listen 443 ssl;
    server_name api.example.com;

    # SSL configuration
    ssl_certificate /etc/ssl/certs/api.example.com.pem;
    ssl_certificate_key /etc/ssl/private/api.example.com.key;

    # Redirect HTTP to HTTPS
    if ($scheme = http) {
        return 301 https://$host$request_uri;
    }

    location / {
        proxy_pass http://backend_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Required for the upstream keepalive pool to take effect
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Timeouts
        proxy_connect_timeout 5s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # Static files via a separate upstream (the "upstream static_servers"
    # block and the proxy_cache_path directive defining the "static_cache"
    # zone must be declared elsewhere in the configuration)
    location /static/ {
        proxy_pass http://static_servers;
        proxy_cache static_cache;
        proxy_cache_valid 200 1d;
    }

    # Health check endpoint
    location /nginx-health {
        access_log off;
        return 200 "healthy";
    }
}

HAProxy Configuration

global
    maxconn 50000
    log stdout format raw local0

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    option httplog
    option dontlognull

frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/combined.pem
    redirect scheme https if !{ ssl_fc }

    # Route based on URL path (L7)
    acl is_api path_beg /api
    acl is_ws hdr(Upgrade) -i websocket

    use_backend api_servers if is_api
    use_backend ws_servers if is_ws
    default_backend web_servers

backend api_servers
    balance leastconn
    option httpchk GET /health
    http-check expect status 200

    server api1 10.0.1.10:3000 check inter 10s fall 3 rise 2
    server api2 10.0.1.11:3000 check inter 10s fall 3 rise 2
    server api3 10.0.1.12:3000 check inter 10s fall 3 rise 2

backend web_servers
    balance roundrobin
    option httpchk GET /
    server web1 10.0.2.10:8080 check
    server web2 10.0.2.11:8080 check

backend ws_servers
    balance source
    server ws1 10.0.3.10:8080 check
    server ws2 10.0.3.11:8080 check

listen stats
    bind *:8404
    stats enable
    stats uri /stats

Load Balancer Tools Comparison

Feature         | Nginx            | HAProxy          | AWS ALB/NLB
----------------|------------------|------------------|------------------
Type            | L4/L7            | L4/L7            | ALB: L7, NLB: L4
Web server      | Yes              | No               | No
Managed         | No               | No               | Yes
SSL termination | Yes              | Yes              | Yes
WebSocket       | Yes              | Yes              | Yes
Cost            | Free/Open source | Free/Open source | Pay per use

Best Practices

  • Always configure health checks. Without them, a load balancer will send traffic to dead servers.
  • Use connection draining. When removing a server, allow existing connections to finish before stopping traffic.
  • Monitor your load balancer. It is a critical component — if it goes down, everything behind it is unreachable.
  • Make the load balancer itself redundant. Use active-passive or active-active pairs to avoid a single point of failure.
  • Use SSL termination wisely. Terminate TLS at the load balancer to reduce CPU load on backend servers, but use encrypted connections internally if handling sensitive data.
