What is System Design: Social Media Feed?

Design a social media feed like Twitter, covering fan-out strategies, timeline generation, caching, media storage, and scaling for millions of users.

System Design: Social Media Feed - System Design Tutorial | TechLead

Problem Statement

Design a social media feed system similar to Twitter or Instagram. Users can create posts, follow other users, and view a personalized timeline of posts from people they follow. This is one of the most popular and complex system design questions because it involves real-time data, massive scale, and sophisticated caching strategies.

Step 1: Requirements

Functional Requirements

Users can create text posts (tweets) with optional media
Users can follow/unfollow other users
Users can view their home timeline (posts from people they follow)
Users can view a specific user's profile timeline
Posts appear in reverse chronological order (with optional ranking)

Non-Functional Requirements

Timeline generation should be fast (<200ms)
High availability (the feed is the core experience)
Eventual consistency is acceptable (a post appearing a few seconds late is OK)
Support for 500 million users, 200 million daily active users

Scale Estimates

Metric	Estimate
Daily active users	200 million
Average posts per user/day	2
New posts per day	400 million
Average followers per user	200
Timeline views per user/day	5
Timeline reads per day	1 billion
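
The daily totals in the table can be turned into average per-second rates with simple arithmetic. The sketch below does that. The fan-out figure assumes every post is pushed to the average 200 followers, and the names `ScaleEstimates` and `deriveRates` are illustrative rather than part of the design.

```typescript
// Back-of-the-envelope rates derived from the estimates table.
const SECONDS_PER_DAY = 86_400;

interface ScaleEstimates {
  dailyActiveUsers: number;
  postsPerUserPerDay: number;
  avgFollowers: number;
  timelineViewsPerUserPerDay: number;
}

function deriveRates(e: ScaleEstimates) {
  const postsPerDay = e.dailyActiveUsers * e.postsPerUserPerDay;
  const readsPerDay = e.dailyActiveUsers * e.timelineViewsPerUserPerDay;
  // Assumption: pure fan-out on write, one cache write per follower per post
  const fanoutWritesPerDay = postsPerDay * e.avgFollowers;
  return {
    postsPerDay,
    readsPerDay,
    fanoutWritesPerDay,
    postsPerSecond: Math.round(postsPerDay / SECONDS_PER_DAY),
    readsPerSecond: Math.round(readsPerDay / SECONDS_PER_DAY),
    fanoutWritesPerSecond: Math.round(fanoutWritesPerDay / SECONDS_PER_DAY),
  };
}

const rates = deriveRates({
  dailyActiveUsers: 200_000_000,
  postsPerUserPerDay: 2,
  avgFollowers: 200,
  timelineViewsPerUserPerDay: 5,
});
console.log(rates);
```

With the table's inputs this works out to roughly 4,630 new posts per second, 11,574 timeline reads per second, and about 926,000 timeline-cache writes per second if every post is fanned out. These are daily averages, so peak traffic will be higher.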

Step 2: Data Model

// Core data models
interface User {
  id: string;
  username: string;
  displayName: string;
  avatarUrl: string;
  followerCount: number;
  followingCount: number;
  createdAt: Date;
}

interface Post {
  id: string;           // Snowflake ID (contains timestamp)
  userId: string;
  content: string;
  mediaUrls: string[];  // Images, videos stored in object storage
  likeCount: number;
  repostCount: number;
  replyCount: number;
  createdAt: Date;
}

interface Follow {
  followerId: string;   // The user who follows
  followeeId: string;   // The user being followed
  createdAt: Date;
}

// Timeline entry (cached in Redis)
interface TimelineEntry {
  postId: string;
  userId: string;
  timestamp: number;    // For sorting
}
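
The `Post.id` comment mentions a Snowflake ID that contains a timestamp. A minimal sketch of that idea follows, assuming the commonly described layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits. The epoch value, the class name, and the choice to throw when the sequence is exhausted (a production generator would typically wait for the next millisecond) are assumptions for illustration.

```typescript
// Sketch of a Snowflake-style ID generator.
// Layout (assumed): [41 bits ms since epoch][10 bits worker][12 bits sequence]
const CUSTOM_EPOCH_MS = 1_600_000_000_000; // hypothetical custom epoch
const WORKER_BITS = 10;
const SEQUENCE_BITS = 12;
const MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1; // 4095 IDs per ms per worker

class SnowflakeGenerator {
  private workerId: number;
  private lastMs = -1;
  private sequence = 0;

  constructor(workerId: number) {
    if (!Number.isInteger(workerId) || workerId < 0 || workerId >= 1 << WORKER_BITS) {
      throw new RangeError("workerId must fit in 10 bits");
    }
    this.workerId = workerId;
  }

  // Returns the ID as a decimal string because it can exceed 2^53
  next(nowMs: number = Date.now()): string {
    if (nowMs < CUSTOM_EPOCH_MS) throw new RangeError("time is before the epoch");
    if (nowMs < this.lastMs) throw new Error("clock moved backwards");
    if (nowMs === this.lastMs) {
      this.sequence += 1;
      if (this.sequence > MAX_SEQUENCE) {
        throw new Error("sequence exhausted for this millisecond");
      }
    } else {
      this.sequence = 0;
      this.lastMs = nowMs;
    }
    const id =
      (BigInt(nowMs - CUSTOM_EPOCH_MS) << BigInt(WORKER_BITS + SEQUENCE_BITS)) |
      (BigInt(this.workerId) << BigInt(SEQUENCE_BITS)) |
      BigInt(this.sequence);
    return id.toString();
  }
}

// Recover the creation time (ms since Unix epoch) from an ID
function timestampFromSnowflake(id: string): number {
  return Number(BigInt(id) >> BigInt(WORKER_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS;
}
```

Because the timestamp occupies the high bits, IDs from the same worker sort in creation order, which is what lets a timeline be ordered by ID alone.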

Step 3: Fan-out on Write vs Fan-out on Read

This is the most critical design decision for a social media feed. When a user publishes a post, how does it end up in their followers' timelines?

Fan-out on Write (Push Model)

When a user publishes a post, the system immediately pushes the post to the timeline cache of every follower. Each user's timeline is pre-computed and stored in Redis.

Fan-out on Read (Pull Model)

When a user opens their timeline, the system pulls posts from all the users they follow, merges them, and sorts by time. No pre-computation happens at write time.
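
To make the pull model concrete, here is a small in-memory sketch. It assumes each followee's posts are already available newest first in a `Map`; in a real system that would be a query per followee or per shard. `FeedItem` and `buildTimelineOnRead` are hypothetical names.

```typescript
interface FeedItem {
  postId: string;
  userId: string;
  timestamp: number; // ms since epoch
}

// Fan-out on read: gather candidates from every followee, then sort.
function buildTimelineOnRead(
  followeeIds: string[],
  postsByUser: Map<string, FeedItem[]>, // each list is newest first
  limit: number
): FeedItem[] {
  const candidates: FeedItem[] = [];
  for (const followeeId of followeeIds) {
    const posts = postsByUser.get(followeeId) ?? [];
    // Only the newest `limit` posts of any one user can reach the final page
    candidates.push(...posts.slice(0, limit));
  }
  // Newest first across all followees
  return candidates.sort((a, b) => b.timestamp - a.timestamp).slice(0, limit);
}
```

The work grows with the number of accounts followed, which is why the read cost in this model is high. A k-way merge with a heap would avoid sorting every candidate, but the simple version shows the shape of the computation.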

Fan-out Strategy Comparison

Aspect	Fan-out on Write	Fan-out on Read
Write Cost	High (push to N followers)	Low (write post once)
Read Cost	Low (timeline pre-computed)	High (merge N feeds in real-time)
Timeline Latency	Very fast reads	Slower reads
Celebrity Problem	Millions of writes per post	No issue (pulled on demand)
Inactive Users	Wasted writes for users who never check	No wasted work
Consistency	Post visible after fan-out completes	Always up-to-date

The Hybrid Approach (What Twitter Actually Uses)

Twitter uses a hybrid model. Most users use fan-out on write. Celebrity accounts (those whose follower count exceeds a threshold; the example below uses 50,000) use fan-out on read. When you open your timeline, the service merges your pre-computed cached timeline with posts fetched at read time from the celebrity accounts you follow.

const CELEBRITY_THRESHOLD = 50_000; // Followers threshold

class FeedService {
  private cache: RedisClient;
  private db: Database;
  private messageQueue: MessageQueue;

  // When a user creates a post
  async publishPost(userId: string, post: Post): Promise<void> {
    // Save the post to the database
    await this.db.posts.insert(post);

    const followerCount = await this.db.getFollowerCount(userId);

    if (followerCount < CELEBRITY_THRESHOLD) {
      // Fan-out on write: push to followers' timelines
      await this.messageQueue.publish("fanout", {
        postId: post.id,
        userId,
        timestamp: post.createdAt.getTime(),
      });
    }
    // Celebrities: their posts are fetched on read (fan-out on read)
  }

  // Fan-out worker processes
  async processFanout(event: FanoutEvent): Promise<void> {
    const followers = await this.db.getFollowerIds(event.userId);

    // Push to each follower's timeline cache (Redis sorted set)
    const pipeline = this.cache.pipeline();
    for (const followerId of followers) {
      pipeline.zadd(`timeline:${followerId}`, event.timestamp, event.postId);
      // Trim timeline to keep only latest 800 posts
      pipeline.zremrangebyrank(`timeline:${followerId}`, 0, -801);
    }
    await pipeline.exec();
  }

  // When a user views their timeline
  async getTimeline(userId: string, page = 0, pageSize = 20): Promise<Post[]> {
    const start = page * pageSize;
    const end = start + pageSize; // exclusive

    // Read cached entries from rank 0, not from `start`: celebrity posts are
    // merged in below, so page N of the merged feed is not page N of the cache
    const cachedPostIds = await this.cache.zrevrange(
      `timeline:${userId}`,
      0,
      end - 1
    );

    // Get posts from celebrities this user follows (fan-out on read)
    const celebrities = await this.getCelebritiesFollowed(userId);
    const celebrityPosts = await this.getRecentPostsFromUsers(celebrities);

    // Merge and sort newest first
    const allPostIds = this.mergeAndSort(cachedPostIds, celebrityPosts);

    // Slice the requested page out of the merged list, then fetch full posts
    return this.db.posts.findByIds(allPostIds.slice(start, end));
  }
}
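
The `mergeAndSort` helper is called in `getTimeline` but not shown. One possible shape is sketched here as a standalone function, assuming both inputs are lists of post IDs and that IDs are decimal Snowflake strings without leading zeros, so a larger ID means a newer post. The `limit` parameter and the `compareIdsDesc` helper are additions for illustration.

```typescript
// Compare two decimal ID strings numerically without parsing them:
// a longer string is a larger number; equal lengths compare lexicographically.
function compareIdsDesc(a: string, b: string): number {
  if (a.length !== b.length) return b.length - a.length;
  if (a === b) return 0;
  return a < b ? 1 : -1;
}

// Merge cached timeline IDs with celebrity post IDs, newest first.
function mergeAndSort(
  cachedPostIds: string[],
  celebrityPostIds: string[],
  limit: number
): string[] {
  // A Set removes duplicates in case a post appears in both sources
  const unique = Array.from(new Set([...cachedPostIds, ...celebrityPostIds]));
  return unique.sort(compareIdsDesc).slice(0, limit);
}
```

Comparing IDs as strings avoids precision loss, since 64-bit IDs do not fit in a JavaScript `number`.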

Step 4: Caching Strategies

Caching is essential for feed performance. With about a billion timeline reads per day (see the scale estimates), we need a multi-layered caching strategy.

What to Cache

Timeline Cache (Redis Sorted Set): Each user's timeline as a sorted set of post IDs scored by timestamp. Keeps the latest 800 entries.
Post Cache: Full post objects cached by ID. Avoids database reads for popular posts.
User Cache: User profile data (name, avatar) cached since it appears on every post in the feed.
Social Graph Cache: Follower/following lists cached for quick lookups during fan-out.
Count Cache: Like counts, repost counts updated via atomic Redis increments.

// Redis data structures for feed caching

class FeedCache {
  // Timeline: Sorted Set (post IDs scored by timestamp)
  // Key: timeline:{userId}
  // Score: timestamp
  // Member: postId
  async addToTimeline(userId: string, postId: string, timestamp: number) {
    await redis.zadd(`timeline:${userId}`, timestamp, postId);
    await redis.zremrangebyrank(`timeline:${userId}`, 0, -801); // Keep 800 max
  }

  // Post cache: Hash
  // Key: post:{postId}
  async cachePost(post: Post) {
    await redis.hset(`post:${post.id}`, {
      userId: post.userId,
      content: post.content,
      likeCount: post.likeCount.toString(),
      createdAt: post.createdAt.toISOString(),
    });
    await redis.expire(`post:${post.id}`, 86400); // 24h TTL
  }

  // User cache: Hash
  // Key: user:{userId}
  async cacheUser(user: User) {
    await redis.hset(`user:${user.id}`, {
      username: user.username,
      displayName: user.displayName,
      avatarUrl: user.avatarUrl,
    });
    await redis.expire(`user:${user.id}`, 3600); // 1h TTL
  }
}
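
The class above only shows writes into the cache. The read side is commonly done cache-aside: check the cache, fall back to the database on a miss, then populate the cache. The sketch below illustrates that flow with an in-memory stand-in instead of Redis. `KeyValueCache`, `InMemoryCache`, `getPostCached`, and the JSON-string encoding (rather than the Redis hash used above) are all assumptions made to keep the example self-contained.

```typescript
interface CachedPost {
  id: string;
  userId: string;
  content: string;
}

// Minimal cache interface; a Redis client would sit behind this in practice
interface KeyValueCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory stand-in with TTL, for illustration only
class InMemoryCache implements KeyValueCache {
  private store = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.store.get(key);
    if (!entry) return null;
    if (entry.expiresAt <= Date.now()) {
      this.store.delete(key); // expired
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

const POST_TTL_SECONDS = 86_400; // 24h, matching the post cache TTL above

// Cache-aside read: cache first, database on a miss, then fill the cache
async function getPostCached(
  postId: string,
  cache: KeyValueCache,
  loadFromDb: (id: string) => Promise<CachedPost | null>
): Promise<CachedPost | null> {
  const key = `post:${postId}`;
  const hit = await cache.get(key);
  if (hit !== null) return JSON.parse(hit) as CachedPost;

  const post = await loadFromDb(postId);
  if (post !== null) {
    await cache.set(key, JSON.stringify(post), POST_TTL_SECONDS);
  }
  return post;
}
```

Missing posts are not cached here, so repeated lookups for a deleted post would hit the database each time; caching a short-lived "not found" marker is one way to address that.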

Step 5: Media Storage and CDN

Posts often contain images and videos which are significantly larger than text. Media requires a separate storage and delivery pipeline.

Object Storage: Store original media files in S3 or GCS
Image Processing: Generate thumbnails and multiple resolutions asynchronously
CDN: Serve media through a CDN for low-latency delivery worldwide
Lazy Loading: Load media only when the post scrolls into view to reduce bandwidth
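
One way the pieces above fit together is a fixed ladder of renditions per media item, with the client requesting the smallest one that fills its display slot. The sketch below is illustrative only: the rendition names and widths, the key layout, and the `cdn.example.com` host are placeholders, not details from any particular service.

```typescript
interface Rendition {
  name: string;
  width: number; // pixels
}

// Hypothetical rendition ladder, ordered from narrowest to widest
const LARGEST: Rendition = { name: "large", width: 1080 };
const RENDITIONS: Rendition[] = [
  { name: "thumb", width: 150 },
  { name: "small", width: 480 },
  LARGEST,
];

const CDN_BASE_URL = "https://cdn.example.com"; // placeholder host

// Object-storage key for one rendition of a media item
function renditionKey(mediaId: string, rendition: Rendition): string {
  return `media/${mediaId}/${rendition.name}.jpg`;
}

// Smallest rendition at least as wide as the display slot, else the largest
function pickRendition(displayWidth: number): Rendition {
  return RENDITIONS.find((r) => r.width >= displayWidth) ?? LARGEST;
}

// URL the client would request through the CDN
function mediaUrl(mediaId: string, displayWidth: number): string {
  return `${CDN_BASE_URL}/${renditionKey(mediaId, pickRendition(displayWidth))}`;
}
```

Deterministic keys let the processing workers write renditions and the clients build URLs without an extra lookup.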

Step 6: Scaling Considerations

Database Scaling

Posts table: Shard by userId (all posts from a user on the same shard)
Follow table: Shard by followerId for efficient "who do I follow?" queries; fan-out needs the reverse lookup ("who follows me?"), so keep a second index keyed by followeeId
Read replicas: For profile timeline reads and analytics queries
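
Shard routing can be as simple as hashing the shard key and taking it modulo the shard count. The sketch below uses a 32-bit FNV-1a hash as one reasonable choice; the function names and the modulo scheme are assumptions for illustration, and the hash operates on UTF-16 code units, which matches byte-wise FNV-1a only for ASCII keys.

```typescript
// 32-bit FNV-1a hash over the string's UTF-16 code units
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0; // force unsigned
}

// Map a shard key to a shard index in [0, shardCount)
function shardFor(key: string, shardCount: number): number {
  if (!Number.isInteger(shardCount) || shardCount <= 0) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return fnv1a(key) % shardCount;
}

// Posts are routed by author, follow edges by follower
function postShard(userId: string, shardCount: number): number {
  return shardFor(userId, shardCount);
}

function followShard(followerId: string, shardCount: number): number {
  return shardFor(followerId, shardCount);
}
```

Plain modulo routing moves most keys when the shard count changes; consistent hashing or a lookup table of virtual shards is the usual way to limit that.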

Application Tier

Stateless servers: Scale horizontally behind a load balancer
Fan-out workers: Scale independently based on write volume
Separate read and write paths: Different services optimized for each

Feed Ranking

Modern social media feeds go beyond reverse chronological order. A ranking algorithm considers factors like user engagement history, post recency, content type, and social proximity to surface the most relevant content.

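// Simplified feed ranking
interface RankedPost {
  post: Post;
  score: number;
}

// Per-viewer signals; interactionFrequency maps an author ID to a closeness weight
interface UserContext {
  interactionFrequency: Record<string, number>;
}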

function rankPosts(posts: Post[], userId: string, userContext: UserContext): RankedPost[] {
  return posts
    .map((post) => ({
      post,
      score: calculateRelevanceScore(post, userId, userContext),
    }))
    .sort((a, b) => b.score - a.score);
}

function calculateRelevanceScore(
  post: Post,
  viewerId: string,
  context: UserContext
): number {
  let score = 0;

  // Recency: decay over time
  const ageHours = (Date.now() - post.createdAt.getTime()) / 3600000;
  score += Math.max(0, 100 - ageHours * 2); // Lose 2 points per hour

  // Engagement signals
  score += Math.log(post.likeCount + 1) * 10;
  score += Math.log(post.replyCount + 1) * 15; // Replies worth more
  score += Math.log(post.repostCount + 1) * 12;

  // Social proximity (how close is the poster to the viewer)
  const closeness = context.interactionFrequency[post.userId] || 0;
  score += closeness * 20;

  // Content type bonus
  if (post.mediaUrls.length > 0) score += 5; // Posts with media rank slightly higher

  return score;
}
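
To see how the terms trade off, here is the same formula restated with the post's age passed in directly so the numbers are reproducible. `ScoreInputs` and `relevanceScore` are hypothetical names, and the sample values are made up for the example.

```typescript
interface ScoreInputs {
  ageHours: number;
  likeCount: number;
  replyCount: number;
  repostCount: number;
  closeness: number; // interaction frequency with the author
  hasMedia: boolean;
}

// Same weights as calculateRelevanceScore, with age supplied by the caller
function relevanceScore(p: ScoreInputs): number {
  let score = Math.max(0, 100 - p.ageHours * 2); // recency
  score += Math.log(p.likeCount + 1) * 10;
  score += Math.log(p.replyCount + 1) * 15;
  score += Math.log(p.repostCount + 1) * 12;
  score += p.closeness * 20;
  if (p.hasMedia) score += 5;
  return score;
}

// A 5-hour-old post with engagement from a frequently contacted author
const engaged = relevanceScore({
  ageHours: 5, likeCount: 99, replyCount: 9, repostCount: 0,
  closeness: 0.5, hasMedia: true,
});

// A 1-hour-old post with no engagement from an unfamiliar author
const fresh = relevanceScore({
  ageHours: 1, likeCount: 0, replyCount: 0, repostCount: 0,
  closeness: 0, hasMedia: false,
});

console.log(engaged.toFixed(2), fresh.toFixed(2));
```

The engaged post scores about 185.59 (90 for recency, about 46.05 for likes, about 34.54 for replies, 10 for closeness, 5 for media) against 98 for the fresh one, so it ranks first despite being older. The logarithms keep very popular posts from dominating purely on raw counts.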

Architecture Summary

Write Path: Client -> API Gateway -> Post Service -> DB + Message Queue -> Fan-out Workers -> Redis Timelines
Read Path: Client -> API Gateway -> Feed Service -> Redis Timeline + Celebrity Posts -> Merge + Rank -> Return
Media Path: Client -> Upload Service -> S3 -> Processing Queue -> Thumbnails -> CDN
Key Insight: Hybrid fan-out (push for most accounts, pull for accounts with very large followings) is the widely used answer to the celebrity problem. Always mention this in interviews.
