TechLead

System Design: Social Media Feed

Design a social media feed like Twitter, covering fan-out strategies, timeline generation, caching, media storage, and scaling to hundreds of millions of users.

Problem Statement

Design a social media feed system similar to Twitter or Instagram. Users can create posts, follow other users, and view a personalized timeline of posts from people they follow. This is one of the most popular and complex system design questions because it involves real-time data, massive scale, and sophisticated caching strategies.

Step 1: Requirements

Functional Requirements

  • Users can create text posts (tweets) with optional media
  • Users can follow/unfollow other users
  • Users can view their home timeline (posts from people they follow)
  • Users can view a specific user's profile timeline
  • Posts appear in reverse chronological order (with optional ranking)

Non-Functional Requirements

  • Timeline generation should be fast (<200ms)
  • High availability (the feed is the core experience)
  • Eventual consistency is acceptable (a post appearing a few seconds late is OK)
  • Support for 500 million users, 200 million daily active users

Scale Estimates

Metric                         Estimate
Daily active users             200 million
Average posts per user/day     2
New posts per day              400 million
Average followers per user     200
Timeline views per user/day    5
Timeline reads per day         1 billion
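
The derived rows in the table follow from simple multiplication, and the same inputs give per-second rates that are useful for capacity planning. The sketch below works through that arithmetic. The per-second figures and the fan-out write volume are not stated in the table; they are derived here under the assumption that load is spread evenly across the day, which understates real peaks.

```typescript
// Back-of-the-envelope numbers derived from the estimates table.
// Assumes evenly distributed load; real traffic peaks will be higher.
const DAILY_ACTIVE_USERS = 200_000_000;
const POSTS_PER_USER_PER_DAY = 2;
const AVG_FOLLOWERS = 200;
const TIMELINE_VIEWS_PER_USER_PER_DAY = 5;
const SECONDS_PER_DAY = 86_400;

const postsPerDay = DAILY_ACTIVE_USERS * POSTS_PER_USER_PER_DAY;
const readsPerDay = DAILY_ACTIVE_USERS * TIMELINE_VIEWS_PER_USER_PER_DAY;
// If every post were pushed to every follower's timeline cache
const fanoutWritesPerDay = postsPerDay * AVG_FOLLOWERS;

const postsPerSecond = Math.round(postsPerDay / SECONDS_PER_DAY);
const readsPerSecond = Math.round(readsPerDay / SECONDS_PER_DAY);
const fanoutWritesPerSecond = Math.round(fanoutWritesPerDay / SECONDS_PER_DAY);

console.log({
  postsPerDay,           // 400,000,000
  readsPerDay,           // 1,000,000,000
  fanoutWritesPerDay,    // 80,000,000,000
  postsPerSecond,        // 4,630
  readsPerSecond,        // 11,574
  fanoutWritesPerSecond, // 925,926
});
```

Under these assumptions, pushing every post to every follower would generate about 80 timeline-cache writes for each timeline read, so the write path needs as much design attention as the read path.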

Step 2: Data Model

// Core data models
interface User {
  id: string;
  username: string;
  displayName: string;
  avatarUrl: string;
  followerCount: number;
  followingCount: number;
  createdAt: Date;
}

interface Post {
  id: string;           // Snowflake ID (contains timestamp)
  userId: string;
  content: string;
  mediaUrls: string[];  // Images, videos stored in object storage
  likeCount: number;
  repostCount: number;
  replyCount: number;
  createdAt: Date;
}

interface Follow {
  followerId: string;   // The user who follows
  followeeId: string;   // The user being followed
  createdAt: Date;
}

// Timeline entry (cached in Redis)
interface TimelineEntry {
  postId: string;
  userId: string;
  timestamp: number;    // For sorting
}

Step 3: Fan-out on Write vs Fan-out on Read

This is the most critical design decision for a social media feed. When a user publishes a post, how does it end up in their followers' timelines?

Fan-out on Write (Push Model)

When a user publishes a post, the system immediately pushes the post to the timeline cache of every follower. Each user's timeline is pre-computed and stored in Redis.
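
A minimal in-memory sketch of the push model may help make the cost profile concrete. Plain Maps stand in for the follower table and the Redis timeline cache, and every name here (`followersOf`, `pushToFollowers`, `readTimeline`, the cap of 3) is hypothetical and chosen only for illustration.

```typescript
// In-memory sketch of fan-out on write. Maps stand in for the follower
// table and the timeline cache; all names are hypothetical.
type TimelineItem = { postId: string; timestamp: number };

const MAX_TIMELINE_LENGTH = 3; // Tiny cap so the trimming is visible

const followersOf = new Map<string, string[]>();
const timelines = new Map<string, TimelineItem[]>();

function pushToFollowers(authorId: string, postId: string, timestamp: number): void {
  // Write path: one cache update per follower
  for (const followerId of followersOf.get(authorId) ?? []) {
    const timeline = timelines.get(followerId) ?? [];
    timeline.push({ postId, timestamp });
    // Keep newest first, then drop anything beyond the cap
    timeline.sort((a, b) => b.timestamp - a.timestamp);
    timelines.set(followerId, timeline.slice(0, MAX_TIMELINE_LENGTH));
  }
}

function readTimeline(userId: string): string[] {
  // Read path: a plain lookup, because the work was done at write time
  return (timelines.get(userId) ?? []).map((item) => item.postId);
}

followersOf.set("alice", ["bob", "carol"]);
pushToFollowers("alice", "p1", 1000);
pushToFollowers("alice", "p2", 2000);
pushToFollowers("alice", "p3", 3000);
pushToFollowers("alice", "p4", 4000);

console.log(readTimeline("bob")); // ["p4", "p3", "p2"]
```

The loop inside `pushToFollowers` is the part that grows with follower count, while `readTimeline` stays constant-time per user.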

Fan-out on Read (Pull Model)

When a user opens their timeline, the system pulls posts from all the users they follow, merges them, and sorts by time. No pre-computation happens at write time.
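
The pull model can be sketched the same way, again with Maps standing in for real storage. The names (`following`, `postsByAuthor`, `publish`, `pullTimeline`) are hypothetical, and a production system would merge per-author lists that are already sorted instead of re-sorting everything on each read.

```typescript
// In-memory sketch of fan-out on read. All names are hypothetical.
type StoredPost = { postId: string; authorId: string; timestamp: number };

const following = new Map<string, string[]>();
const postsByAuthor = new Map<string, StoredPost[]>();

function publish(post: StoredPost): void {
  // Write path: one append, no matter how many followers the author has
  const posts = postsByAuthor.get(post.authorId) ?? [];
  posts.push(post);
  postsByAuthor.set(post.authorId, posts);
}

function pullTimeline(userId: string, limit: number): string[] {
  // Read path: gather posts from every followed author, then sort and cut
  const merged: StoredPost[] = [];
  for (const authorId of following.get(userId) ?? []) {
    merged.push(...(postsByAuthor.get(authorId) ?? []));
  }
  return merged
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, limit)
    .map((post) => post.postId);
}

following.set("bob", ["alice", "carol"]);
publish({ postId: "a1", authorId: "alice", timestamp: 100 });
publish({ postId: "c1", authorId: "carol", timestamp: 200 });
publish({ postId: "a2", authorId: "alice", timestamp: 300 });

console.log(pullTimeline("bob", 2)); // ["a2", "c1"]
```

Here the cost moves to `pullTimeline`, which touches every followed author on each request.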

Fan-out Strategy Comparison

Aspect              Fan-out on Write                          Fan-out on Read
Write Cost          High (push to N followers)                Low (write post once)
Read Cost           Low (timeline pre-computed)               High (merge N feeds in real-time)
Timeline Latency    Very fast reads                           Slower reads
Celebrity Problem   Millions of writes per post               No issue (pulled on demand)
Inactive Users      Wasted writes for users who never check   No wasted work
Consistency         Post visible after fan-out completes      Always up-to-date

The Hybrid Approach (What Twitter Actually Uses)

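Twitter uses a hybrid model. Most accounts use fan-out on write. Celebrity accounts (those with very large follower counts) use fan-out on read, because pushing one post into millions of timeline caches is too expensive. When you open your timeline, the system merges your pre-computed cached timeline with posts fetched in real time from the celebrity accounts you follow. The cutoff is a tuning parameter; the example below uses 50,000 followers.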

const CELEBRITY_THRESHOLD = 50_000; // Followers threshold

class FeedService {
  private cache: RedisClient;
  private db: Database;
  private messageQueue: MessageQueue;

  // When a user creates a post
  async publishPost(userId: string, post: Post): Promise<void> {
    // Save the post to the database
    await this.db.posts.insert(post);

    const followerCount = await this.db.getFollowerCount(userId);

    if (followerCount < CELEBRITY_THRESHOLD) {
      // Fan-out on write: push to followers' timelines
      await this.messageQueue.publish("fanout", {
        postId: post.id,
        userId,
        timestamp: post.createdAt.getTime(),
      });
    }
    // Celebrities: their posts are fetched on read (fan-out on read)
  }

  // Fan-out worker processes
  async processFanout(event: FanoutEvent): Promise<void> {
    const followers = await this.db.getFollowerIds(event.userId);

    // Push to each follower's timeline cache (Redis sorted set)
    const pipeline = this.cache.pipeline();
    for (const followerId of followers) {
      pipeline.zadd(`timeline:${followerId}`, event.timestamp, event.postId);
      // Trim timeline to keep only latest 800 posts
      pipeline.zremrangebyrank(`timeline:${followerId}`, 0, -801);
    }
    await pipeline.exec();
  }

  // When a user views their timeline
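  async getTimeline(userId: string, page = 0, pageSize = 20): Promise<Post[]> {
    const start = page * pageSize;
    const end = start + pageSize;

    // Read from the top of the cached timeline through the end of the
    // requested page. Celebrity posts can land anywhere in the merged order,
    // so paging has to happen after the merge, not before it.
    const cachedPostIds = await this.cache.zrevrange(
      `timeline:${userId}`,
      0,
      end - 1
    );

    // Get posts from celebrities this user follows (fan-out on read)
    const celebrities = await this.getCelebritiesFollowed(userId);
    const celebrityPosts = await this.getRecentPostsFromUsers(celebrities);

    // Merge and sort newest first. Snowflake IDs are time-ordered, so
    // mergeAndSort is assumed to order by ID and return post IDs.
    const allPostIds = this.mergeAndSort(cachedPostIds, celebrityPosts);
    const pageIds = allPostIds.slice(start, end);

    // Fetch full post objects, then restore the merged order in case the
    // store returns rows in a different order
    const posts = await this.db.posts.findByIds(pageIds);
    const byId = new Map(posts.map((post) => [post.id, post] as const));
    return pageIds
      .map((id) => byId.get(id))
      .filter((post): post is Post => post !== undefined);
  }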
}
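
The merge step in the read path is where the two fan-out models meet. One possible shape for it is sketched below as a standalone function. `MergeEntry` and `mergeTimelinePage` are hypothetical names, and the sketch assumes both inputs arrive sorted newest first and carry explicit timestamps.

```typescript
// Hypothetical helper: merges two newest-first lists of timeline entries,
// drops duplicate post IDs, and returns one page of post IDs.
interface MergeEntry {
  postId: string;
  timestamp: number;
}

function mergeTimelinePage(
  cached: MergeEntry[],
  celebrity: MergeEntry[],
  page: number,
  pageSize: number
): string[] {
  const merged: MergeEntry[] = [];
  const seen = new Set<string>();
  let i = 0;
  let j = 0;

  // Two-pointer merge; both inputs are already sorted newest first
  while (i < cached.length || j < celebrity.length) {
    const takeCached =
      j >= celebrity.length ||
      (i < cached.length && cached[i].timestamp >= celebrity[j].timestamp);
    const next = takeCached ? cached[i++] : celebrity[j++];
    if (!seen.has(next.postId)) {
      seen.add(next.postId);
      merged.push(next);
    }
  }

  // Page over the merged list, not over either input
  const start = page * pageSize;
  return merged.slice(start, start + pageSize).map((entry) => entry.postId);
}

const cachedEntries: MergeEntry[] = [
  { postId: "c3", timestamp: 300 },
  { postId: "c1", timestamp: 100 },
];
const celebrityEntries: MergeEntry[] = [
  { postId: "s2", timestamp: 200 },
  { postId: "s0", timestamp: 50 },
];

console.log(mergeTimelinePage(cachedEntries, celebrityEntries, 0, 2)); // ["c3", "s2"]
console.log(mergeTimelinePage(cachedEntries, celebrityEntries, 1, 2)); // ["c1", "s0"]
```

Because both lists are pre-sorted, the merge is linear in their combined length and avoids a full sort on every request.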

Step 4: Caching Strategies

Caching is essential for feed performance. With roughly one billion timeline reads per day (see the scale estimates above), serving every read from the database is not practical, so we need a multi-layered caching strategy.

What to Cache

  • Timeline Cache (Redis Sorted Set): Each user's timeline as a sorted set of post IDs scored by timestamp. Keeps the latest 800 entries.
  • Post Cache: Full post objects cached by ID. Avoids database reads for popular posts.
  • User Cache: User profile data (name, avatar) cached since it appears on every post in the feed.
  • Social Graph Cache: Follower/following lists cached for quick lookups during fan-out.
  • Count Cache: Like counts, repost counts updated via atomic Redis increments.

// Redis data structures for feed caching
// `redis` is assumed to be an already-connected client available in scope.

class FeedCache {
  // Timeline: Sorted Set (post IDs scored by timestamp)
  // Key: timeline:{userId}
  // Score: timestamp
  // Member: postId
  async addToTimeline(userId: string, postId: string, timestamp: number) {
    await redis.zadd(`timeline:${userId}`, timestamp, postId);
    await redis.zremrangebyrank(`timeline:${userId}`, 0, -801); // Keep 800 max
  }

  // Post cache: Hash
  // Key: post:{postId}
  async cachePost(post: Post) {
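    await redis.hset(`post:${post.id}`, {
      userId: post.userId,
      content: post.content,
      mediaUrls: JSON.stringify(post.mediaUrls), // Hash values are strings
      likeCount: post.likeCount.toString(),
      repostCount: post.repostCount.toString(),
      replyCount: post.replyCount.toString(),
      createdAt: post.createdAt.toISOString(),
    });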
    await redis.expire(`post:${post.id}`, 86400); // 24h TTL
  }

  // User cache: Hash
  // Key: user:{userId}
  async cacheUser(user: User) {
    await redis.hset(`user:${user.id}`, {
      username: user.username,
      displayName: user.displayName,
      avatarUrl: user.avatarUrl,
    });
    await redis.expire(`user:${user.id}`, 3600); // 1h TTL
  }
}
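
The write side of the cache is shown above; the read side typically follows a cache-aside pattern, where a miss falls through to the database and repopulates the cache. The sketch below illustrates that flow under simplifying assumptions: a Map stands in for Redis, the loader is synchronous, and `PostReadCache`, `loadPost`, and the injected clock are hypothetical names introduced for this example.

```typescript
// Cache-aside read with a TTL. A Map stands in for Redis and `loadPost`
// stands in for a database query; both are hypothetical.
type CachedPost = { id: string; content: string };
type CacheSlot = { value: CachedPost; expiresAt: number };

class PostReadCache {
  private slots = new Map<string, CacheSlot>();
  private loadPost: (id: string) => CachedPost | undefined;
  private ttlMs: number;
  private now: () => number;
  misses = 0;

  constructor(
    loadPost: (id: string) => CachedPost | undefined,
    ttlMs: number,
    now: () => number = () => Date.now()
  ) {
    this.loadPost = loadPost;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(id: string): CachedPost | undefined {
    const slot = this.slots.get(id);
    if (slot && slot.expiresAt > this.now()) return slot.value; // Cache hit

    // Miss or expired entry: read the source of truth and repopulate
    this.misses++;
    const post = this.loadPost(id);
    if (post) {
      this.slots.set(id, { value: post, expiresAt: this.now() + this.ttlMs });
    }
    return post;
  }
}

const fakeDb = new Map<string, CachedPost>([["p1", { id: "p1", content: "hello" }]]);
const postCache = new PostReadCache((id) => fakeDb.get(id), 86_400_000); // 24h TTL
postCache.get("p1"); // Miss: loads from fakeDb
postCache.get("p1"); // Hit: served from the cache
console.log(postCache.misses); // 1
```

Passing the clock in as a parameter keeps the expiry logic testable without waiting for real time to pass.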

Step 5: Media Storage and CDN

Posts often contain images and videos, which are significantly larger than text. Media therefore needs its own storage and delivery pipeline, separate from the post database.

  • Object Storage: Store original media files in S3 or GCS
  • Image Processing: Generate thumbnails and multiple resolutions asynchronously
  • CDN: Serve media through a CDN for low-latency delivery worldwide
  • Lazy Loading: Load media only when the post scrolls into view to reduce bandwidth
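
As a rough illustration of how multiple resolutions and CDN delivery fit together, the sketch below derives object keys and URLs for resized variants and picks one for a given viewport. The key layout, the widths, and the `cdn.example.com` host are illustrative assumptions, not part of any real service.

```typescript
// Hypothetical naming scheme for resized image variants. The key layout,
// widths, and CDN host are illustrative assumptions.
const VARIANT_WIDTHS = [150, 600, 1200];
const CDN_HOST = "https://cdn.example.com";

interface MediaVariant {
  width: number;
  objectKey: string; // Key in object storage
  url: string;       // Public URL served through the CDN
}

function buildVariants(mediaId: string, extension: string): MediaVariant[] {
  return VARIANT_WIDTHS.map((width) => {
    const objectKey = `media/${mediaId}/w${width}.${extension}`;
    return { width, objectKey, url: `${CDN_HOST}/${objectKey}` };
  });
}

function pickVariant(variants: MediaVariant[], viewportWidth: number): MediaVariant {
  // Smallest variant that still covers the viewport, else the largest one
  const sorted = [...variants].sort((a, b) => a.width - b.width);
  return sorted.find((v) => v.width >= viewportWidth) ?? sorted[sorted.length - 1];
}

const variants = buildVariants("abc123", "jpg");
console.log(pickVariant(variants, 400).url);
// https://cdn.example.com/media/abc123/w600.jpg
```

Deriving variant keys from the media ID means the post only needs to store one identifier per attachment, and clients can request the size that fits their screen.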

Step 6: Scaling Considerations

Database Scaling

  • Posts table: Shard by userId (all posts from a user on the same shard)
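  • Follow table: Shard by followerId for efficient "who do I follow?" queries. Fan-out needs the reverse lookup ("who follows me?"), so also keep an index or second copy keyed by followeeId.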
  • Read replicas: For profile timeline reads and analytics queries
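
Sharding by a key comes down to a deterministic function from that key to a shard number. The sketch below shows one simple way to do it with a 32-bit FNV-1a hash and a modulo. The shard count of 16 and the function names are assumptions for illustration, and the hash runs over UTF-16 code units, which matches the byte-based definition only for ASCII keys.

```typescript
// Hash-based shard routing. FNV-1a is used only as a simple, deterministic
// example hash; the shard count is an assumption.
const SHARD_COUNT = 16;

function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // 32-bit FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // 32-bit multiply by the FNV prime
  }
  return hash >>> 0; // Convert to an unsigned 32-bit integer
}

function shardFor(key: string): number {
  return fnv1a(key) % SHARD_COUNT;
}

// Posts are routed by author, so one user's posts share a shard
function postShard(post: { userId: string }): number {
  return shardFor(post.userId);
}

// Follow rows are routed by follower, so "who do I follow?" hits one shard
function followShard(follow: { followerId: string }): number {
  return shardFor(follow.followerId);
}

console.log(postShard({ userId: "user-42" }) === shardFor("user-42")); // true
```

Plain modulo routing is easy to reason about but moves most keys when the shard count changes; consistent hashing or a lookup table are common alternatives when resharding is expected.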

Application Tier

  • Stateless servers: Scale horizontally behind a load balancer
  • Fan-out workers: Scale independently based on write volume
  • Separate read and write paths: Different services optimized for each

Feed Ranking

Modern social media feeds go beyond reverse chronological order. A ranking algorithm considers factors like user engagement history, post recency, content type, and social proximity to surface the most relevant content.

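// Simplified feed ranking
interface RankedPost {
  post: Post;
  score: number;
}

// Viewer-specific signals used for scoring (shape assumed for this example)
interface UserContext {
  // Poster's userId -> how often the viewer interacts with them (assumed 0 to 1)
  interactionFrequency: Record<string, number>;
}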

function rankPosts(posts: Post[], userId: string, userContext: UserContext): RankedPost[] {
  return posts
    .map((post) => ({
      post,
      score: calculateRelevanceScore(post, userId, userContext),
    }))
    .sort((a, b) => b.score - a.score);
}

function calculateRelevanceScore(
  post: Post,
  viewerId: string,
  context: UserContext
): number {
  let score = 0;

  // Recency: decay over time
  const ageHours = (Date.now() - post.createdAt.getTime()) / 3600000;
  score += Math.max(0, 100 - ageHours * 2); // Lose 2 points per hour

  // Engagement signals
  score += Math.log(post.likeCount + 1) * 10;
  score += Math.log(post.replyCount + 1) * 15; // Replies worth more
  score += Math.log(post.repostCount + 1) * 12;

  // Social proximity (how close is the poster to the viewer)
  const closeness = context.interactionFrequency[post.userId] || 0;
  score += closeness * 20;

  // Content type bonus
  if (post.mediaUrls.length > 0) score += 5; // Posts with media rank slightly higher

  return score;
}
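
To see how the weights interact, it can help to score two concrete posts. The sketch below restates the same formula with the clock passed in as a parameter so the result is reproducible. `ScoreInput` and `scoreAt` are hypothetical names, and the sample numbers are made up for illustration.

```typescript
// Same weights as the scoring function above, with the clock injected
// so the result does not depend on when the code runs.
interface ScoreInput {
  createdAtMs: number;
  likeCount: number;
  replyCount: number;
  repostCount: number;
  hasMedia: boolean;
  closeness: number; // Interaction frequency with the poster (assumed 0 to 1)
}

function scoreAt(input: ScoreInput, nowMs: number): number {
  const ageHours = (nowMs - input.createdAtMs) / 3_600_000;
  let score = Math.max(0, 100 - ageHours * 2); // Recency, floored at 0
  score += Math.log(input.likeCount + 1) * 10;
  score += Math.log(input.replyCount + 1) * 15;
  score += Math.log(input.repostCount + 1) * 12;
  score += input.closeness * 20;
  if (input.hasMedia) score += 5;
  return score;
}

const nowMs = 36_000_000; // 10 hours after time zero

// Brand-new post with no engagement from a stranger
const freshScore = scoreAt(
  { createdAtMs: nowMs, likeCount: 0, replyCount: 0, repostCount: 0, hasMedia: false, closeness: 0 },
  nowMs
);

// 10-hour-old post with engagement, media, and a moderately close poster
const engagedScore = scoreAt(
  { createdAtMs: 0, likeCount: 99, replyCount: 9, repostCount: 0, hasMedia: true, closeness: 0.5 },
  nowMs
);

console.log(freshScore);              // 100
console.log(engagedScore.toFixed(2)); // "175.59"
```

The older post wins because its engagement and proximity terms (about 95.6 points combined) outweigh the 20 points of recency it has lost. Note that the recency term reaches zero after 50 hours, so posts older than that are ranked on engagement and proximity alone.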

Architecture Summary

  • Write Path: Client -> API Gateway -> Post Service -> DB + Message Queue -> Fan-out Workers -> Redis Timelines
  • Read Path: Client -> API Gateway -> Feed Service -> Redis Timeline + Celebrity Posts -> Merge + Rank -> Return
  • Media Path: Client -> Upload Service -> S3 -> Processing Queue -> Thumbnails -> CDN
  • Key Insight: Hybrid fan-out is the usual answer when follower counts are highly skewed: push for typical accounts, pull for celebrities. Explain this trade-off in interviews.
