TechLead
Lesson 19 of 30
9 min read
System Design

System Design: Search Autocomplete

Design a search autocomplete system using trie data structures, ranking algorithms, caching strategies, and real-time query processing at scale

Problem Statement

Design a search autocomplete (typeahead suggestion) system like Google Search suggestions. As a user types each character, the system returns the top relevant suggestions in real time. This problem tests your understanding of data structures (tries), ranking algorithms, caching, and low-latency system design.

Step 1: Requirements

Functional Requirements

  • Return top 5-10 suggestions as the user types
  • Suggestions ranked by popularity (search frequency)
  • Support for personalized suggestions based on user history
  • Handle misspellings and fuzzy matching
  • Multi-language support
  • Filter inappropriate/offensive suggestions

Non-Functional Requirements

  • Response time under 100ms (users expect instant suggestions)
  • High availability (autocomplete failure degrades search UX significantly)
  • Scale to 10 billion search queries per day
  • Suggestions should reflect trending topics within minutes

Scale Estimates

Metric                             Estimate
Search queries per day             10 billion
Unique search terms                ~5 billion
Autocomplete requests per search   ~6 (avg query length 6 chars)
Autocomplete QPS                   ~700,000 average
Target latency                     <100ms p99
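The QPS figure can be sanity-checked with quick arithmetic from the other estimates:

```typescript
// Back-of-the-envelope: derive average autocomplete QPS from the estimates above.
const searchesPerDay = 10e9;        // 10 billion searches/day
const requestsPerSearch = 6;        // roughly one autocomplete call per typed character
const secondsPerDay = 24 * 60 * 60; // 86,400

const avgQps = (searchesPerDay * requestsPerSearch) / secondsPerDay;
console.log(Math.round(avgQps)); // 694444 → ~700K average QPS
```

Peak traffic is typically a small multiple of the daily average, so capacity planning should not stop at 700K.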

Step 2: Trie Data Structure

A Trie (prefix tree) is the foundational data structure for autocomplete. Each node represents a character, and each path from the root to a node marked end-of-word spells out a complete search term. A trie supports prefix lookups in O(p) time, where p is the prefix length.

// Basic Trie implementation for autocomplete
interface TrieNode {
  children: Map<string, TrieNode>;
  isEndOfWord: boolean;
  frequency: number;        // How many times this term was searched
  topSuggestions: string[];  // Pre-computed top K suggestions for this prefix
}

class AutocompleteTrie {
  private root: TrieNode;

  constructor() {
    this.root = this.createNode();
  }

  private createNode(): TrieNode {
    return {
      children: new Map(),
      isEndOfWord: false,
      frequency: 0,
      topSuggestions: [],
    };
  }

  // Insert a search term with its frequency
  insert(term: string, frequency: number): void {
    let node = this.root;
    for (const char of term.toLowerCase()) {
      if (!node.children.has(char)) {
        node.children.set(char, this.createNode());
      }
      node = node.children.get(char)!;
    }
    node.isEndOfWord = true;
    node.frequency = frequency;
  }

  // Get suggestions for a prefix
  getSuggestions(prefix: string, limit = 10): string[] {
    let node = this.root;

    // Navigate to the prefix node
    for (const char of prefix.toLowerCase()) {
      if (!node.children.has(char)) {
        return []; // No matches
      }
      node = node.children.get(char)!;
    }

    // If pre-computed suggestions exist, return them
    if (node.topSuggestions.length > 0) {
      return node.topSuggestions.slice(0, limit);
    }

    // Otherwise, collect all words under this prefix
    const results: Array<{ term: string; frequency: number }> = [];
    this.collectWords(node, prefix, results);

    return results
      .sort((a, b) => b.frequency - a.frequency)
      .slice(0, limit)
      .map((r) => r.term);
  }

  private collectWords(
    node: TrieNode,
    prefix: string,
    results: Array<{ term: string; frequency: number }>
  ): void {
    if (node.isEndOfWord) {
      results.push({ term: prefix, frequency: node.frequency });
    }
    for (const [char, childNode] of node.children) {
      this.collectWords(childNode, prefix + char, results);
    }
  }
}

Optimization: Pre-compute Top Suggestions

Traversing the trie at query time to find all matching terms is too slow for production. Instead, pre-compute the top K suggestions at each node during the build phase. This way, a prefix lookup is O(p) where p is the prefix length, regardless of how many terms match.

// Optimized: Pre-compute top suggestions at each node
class OptimizedTrie {
  private K = 10; // Number of suggestions to pre-compute

  // Root node, populated via the same createNode/insert as AutocompleteTrie
  private root: TrieNode = { children: new Map(), isEndOfWord: false, frequency: 0, topSuggestions: [] };

  // After inserting all terms, pre-compute top K at each node
  precomputeTopSuggestions(node: TrieNode = this.root, prefix = ""): Array<{ term: string; freq: number }> {
    const allTerms: Array<{ term: string; freq: number }> = [];

    // If this node is end of word, include it
    if (node.isEndOfWord) {
      allTerms.push({ term: prefix, freq: node.frequency });
    }

    // Collect from all children
    for (const [char, child] of node.children) {
      const childTerms = this.precomputeTopSuggestions(child, prefix + char);
      allTerms.push(...childTerms);
    }

    // Store top K at this node
    allTerms.sort((a, b) => b.freq - a.freq);
    node.topSuggestions = allTerms.slice(0, this.K).map((t) => t.term);

    return allTerms;
  }

  // Now getSuggestions is O(prefix_length) - just walk to the node
  getSuggestions(prefix: string): string[] {
    let node = this.root;
    for (const char of prefix.toLowerCase()) {
      if (!node.children.has(char)) return [];
      node = node.children.get(char)!;
    }
    return node.topSuggestions;
  }
}

Step 3: Ranking and Personalization

Pure frequency-based ranking is a starting point, but modern autocomplete systems use multiple signals to rank suggestions.

// Context passed with each request (only location is used in the scoring below)
interface SearchContext {
  userId?: string;
  location?: string;
}

interface SuggestionCandidate {
  term: string;
  globalFrequency: number;
  recentFrequency: number;   // Frequency in last 24 hours (trending)
  userFrequency: number;     // How often THIS user searched this
  score: number;
}

class SuggestionRanker {
  rank(
    candidates: SuggestionCandidate[],
    context: SearchContext
  ): SuggestionCandidate[] {
    return candidates
      .map((c) => ({
        ...c,
        score: this.calculateScore(c, context),
      }))
      .sort((a, b) => b.score - a.score);
  }

  private calculateScore(
    candidate: SuggestionCandidate,
    context: SearchContext
  ): number {
    let score = 0;

    // Global popularity (log scale to prevent domination)
    score += Math.log10(candidate.globalFrequency + 1) * 30;

    // Trending boost (recent frequency vs historical)
    const trendingRatio = candidate.recentFrequency / (candidate.globalFrequency / 365 + 1);
    if (trendingRatio > 2) {
      score += Math.min(trendingRatio * 10, 50); // Cap trending bonus
    }

    // Personalization: user's own search history
    score += Math.log10(candidate.userFrequency + 1) * 40;

    // Freshness: prefer recently created terms
    // (for news events, new products, etc.)

    // Location relevance
    if (context.location && candidate.term.includes(context.location)) {
      score += 15;
    }

    return score;
  }
}

Step 4: Data Collection and Processing

The autocomplete system needs a continuous pipeline to collect search queries, aggregate frequencies, and update the trie.

Data Pipeline Architecture

  • Query Logging: Every search query is logged to Kafka with timestamp, user ID, and location
  • Aggregation (Batch): A MapReduce/Spark job runs periodically to aggregate query frequencies
  • Aggregation (Real-time): A stream processor (Flink/Spark Streaming) updates trending queries in near real-time
  • Trie Builder: Periodically rebuilds the trie from aggregated data and deploys to serving nodes
  • Filtering: Remove offensive, dangerous, or inappropriate suggestions before building the trie

// Data pipeline for building the autocomplete trie

class AutocompleteDataPipeline {
  // Assumed collaborators (not shown): this.kafka, this.s3,
  // this.offensiveFilter, this.serializeTrie, this.broadcast

  // Step 1: Aggregate search queries (batch - runs hourly)
  async aggregateQueries(): Promise<Map<string, number>> {
    const frequencies = new Map<string, number>();

    // Process query logs from the last time window
    const queryLogs = await this.kafka.consume("search-queries");

    for (const log of queryLogs) {
      const normalized = this.normalizeQuery(log.query);
      if (this.isValidSuggestion(normalized)) {
        frequencies.set(normalized, (frequencies.get(normalized) || 0) + 1);
      }
    }

    return frequencies;
  }

  // Step 2: Build trie from aggregated data
  async buildTrie(frequencies: Map<string, number>): Promise<AutocompleteTrie> {
    const trie = new AutocompleteTrie();

    for (const [term, freq] of frequencies) {
      trie.insert(term, freq);
    }

    trie.precomputeTopSuggestions(); // as implemented in OptimizedTrie above
    return trie;
  }

  // Step 3: Deploy to serving nodes
  async deployTrie(trie: AutocompleteTrie): Promise<void> {
    // Serialize trie to a compact binary format
    const serialized = this.serializeTrie(trie);

    // Upload to shared storage
    await this.s3.upload("autocomplete-tries", `trie-${Date.now()}.bin`, serialized);

    // Notify serving nodes to load new trie
    await this.broadcast("trie-update", { version: Date.now() });
  }

  private normalizeQuery(query: string): string {
    return query
      .toLowerCase()
      .trim()
      .replace(/\s+/g, " "); // Collapse whitespace
  }

  private isValidSuggestion(query: string): boolean {
    if (query.length < 2 || query.length > 100) return false;
    if (this.offensiveFilter.isOffensive(query)) return false;
    return true;
  }
}
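The `serializeTrie` helper above is left abstract. As an illustrative sketch (JSON for readability; a production build would likely prefer a compact binary encoding), serialization and deserialization might look like this, re-declaring the Step 2 node shape so the snippet is self-contained:

```typescript
// Node shape, mirroring the TrieNode interface from Step 2
interface TrieNode {
  children: Map<string, TrieNode>;
  isEndOfWord: boolean;
  frequency: number;
  topSuggestions: string[];
}

// Wire format with short keys to keep the snapshot small
interface WireNode {
  c: Record<string, WireNode>; // children
  e: boolean;                  // isEndOfWord
  f: number;                   // frequency
  s: string[];                 // topSuggestions
}

function toWire(node: TrieNode): WireNode {
  const c: Record<string, WireNode> = {};
  for (const [ch, child] of node.children) c[ch] = toWire(child);
  return { c, e: node.isEndOfWord, f: node.frequency, s: node.topSuggestions };
}

function fromWire(wire: WireNode): TrieNode {
  const children = new Map<string, TrieNode>();
  for (const [ch, child] of Object.entries(wire.c)) children.set(ch, fromWire(child));
  return { children, isEndOfWord: wire.e, frequency: wire.f, topSuggestions: wire.s };
}

// serializeTrie then reduces to JSON.stringify(toWire(root)), and loading
// on a serving node to fromWire(JSON.parse(bytes)).
```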

Step 5: Caching Strategies

At ~700K QPS, routing every keystroke to the trie-serving fleet is wasteful even with pre-computed suggestions. Multi-level caching is essential.

Caching Layers

  • Browser Cache: Cache suggestions on the client. If user types "how t" then deletes "t", re-typing "t" should not hit the server.
  • CDN/Edge Cache: Popular prefixes ("how to", "what is") can be cached at the CDN edge. 20% of prefixes account for 80% of requests.
  • Application Cache (Redis): Cache prefix -> suggestions mapping in Redis. TTL of 1-5 minutes for balance between freshness and performance.
  • In-Memory Trie: The trie itself is loaded entirely into each serving node's memory for sub-millisecond lookups.

class AutocompleteServer {
  private trie: AutocompleteTrie;
  private cache: RedisClient;

  async getSuggestions(prefix: string, userId?: string): Promise<string[]> {
    const normalizedPrefix = prefix.toLowerCase().trim();

    // Layer 1: Check Redis cache
    const cacheKey = `ac:${normalizedPrefix}`;
    const cached = await this.cache.get(cacheKey);
    if (cached) {
      const suggestions = JSON.parse(cached);
      // Optionally merge with personalized results
      if (userId) {
        return this.mergePersonalized(suggestions, userId, normalizedPrefix);
      }
      return suggestions;
    }

    // Layer 2: Query the in-memory trie
    const suggestions = this.trie.getSuggestions(normalizedPrefix);

    // Cache for 2 minutes
    await this.cache.setex(cacheKey, 120, JSON.stringify(suggestions));

    if (userId) {
      return this.mergePersonalized(suggestions, userId, normalizedPrefix);
    }

    return suggestions;
  }

  // Client-side optimization: debounce and batch requests
  // Don't send a request on every keystroke
  // Wait 100-200ms after the last keystroke before requesting
}

// Client-side implementation
class AutocompleteClient {
  private debounceTimer: ReturnType<typeof setTimeout> | null = null; // portable across browser and Node typings
  private cache = new Map<string, string[]>();

  onInput(prefix: string): void {
    // Check client-side cache first
    if (this.cache.has(prefix)) {
      this.renderSuggestions(this.cache.get(prefix)!);
      return;
    }

    // Debounce: wait 150ms after last keystroke
    if (this.debounceTimer) clearTimeout(this.debounceTimer);
    this.debounceTimer = setTimeout(() => {
      this.fetchSuggestions(prefix);
    }, 150);
  }

  private async fetchSuggestions(prefix: string): Promise<void> {
    const response = await fetch(`/api/autocomplete?q=${encodeURIComponent(prefix)}`);
    const suggestions = await response.json();

    // Cache on the client
    this.cache.set(prefix, suggestions);
    this.renderSuggestions(suggestions);
  }
}

Step 6: Multi-Language Support

Supporting multiple languages introduces challenges around character sets, tokenization, and culturally appropriate ranking.

  • Separate tries per language: Build and serve different tries based on the user's locale
  • Unicode normalization: Normalize characters to handle accents and diacritics (e.g., "café" should match "cafe")
  • CJK languages: Chinese, Japanese, and Korean don't use spaces between words, requiring different tokenization strategies
  • RTL languages: Arabic and Hebrew need special handling for prefix matching direction
  • Transliteration: Allow searching in one script and suggesting in another (e.g., typing in Romanized Japanese to get Kanji suggestions)
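The accent-folding point above can be sketched with JavaScript's built-in Unicode support: NFD decomposition splits a character like "é" into "e" plus a combining mark, which can then be stripped. This is a simplification; it does not address CJK segmentation or transliteration.

```typescript
// Fold accents/diacritics so "café" and "cafe" index to the same trie path.
// NFD decomposes "é" into "e" + U+0301 (combining acute accent);
// the combining-mark range U+0300–U+036F is then stripped.
function foldDiacritics(term: string): string {
  return term
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase();
}

console.log(foldDiacritics("Café"));        // "cafe"
console.log(foldDiacritics("naïve résumé")); // "naive resume"
```

Folding would be applied in the pipeline's `normalizeQuery` step so that both the trie keys and incoming prefixes use the same canonical form.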

System Architecture Overview

Component Summary

Component            Technology                                    Purpose
Serving Layer        In-memory trie on app servers                 Sub-ms prefix lookups
Cache                Redis + CDN edge cache                        Reduce trie lookups for popular prefixes
Batch Processing     Spark / MapReduce                             Aggregate query frequencies hourly
Stream Processing    Flink / Kafka Streams                         Detect trending queries in real time
Storage              S3 (trie snapshots), Cassandra (query logs)   Persist trie builds and raw data
Personalization      Redis per-user history                        Blend personal history into suggestions
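One detail the serving layer implies but the pipeline code glosses over is how a node picks up a freshly built trie without downtime. A minimal sketch, assuming a hypothetical snapshot loader and a message-bus callback for the pipeline's "trie-update" broadcast:

```typescript
// Sketch: zero-downtime trie swap on a serving node. The snapshot download
// and message-bus subscription are assumptions, not a specific library API.
interface SuggestionTrie {
  getSuggestions(prefix: string, limit?: number): string[];
}

class TrieHotSwapper {
  private activeTrie: SuggestionTrie;

  constructor(initialTrie: SuggestionTrie) {
    this.activeTrie = initialTrie;
  }

  // Invoked when the pipeline broadcasts a "trie-update" message
  async onTrieUpdate(version: number): Promise<void> {
    const next = await this.loadTrieSnapshot(version);
    this.activeTrie = next; // single reference swap: in-flight reads keep the old trie
  }

  getSuggestions(prefix: string): string[] {
    return this.activeTrie.getSuggestions(prefix);
  }

  // Hypothetical: download the versioned snapshot from shared storage and deserialize
  protected loadTrieSnapshot(version: number): Promise<SuggestionTrie> {
    throw new Error(`snapshot loading for v${version} not implemented in this sketch`);
  }
}
```

Because the swap is a single reference assignment, requests never see a half-built trie; the old one is garbage-collected once its last in-flight lookup finishes.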

Interview Tips

  • Start with the trie: Explain the data structure first, then optimize with pre-computed top-K
  • Emphasize latency: Every optimization should focus on sub-100ms response times
  • Discuss the data pipeline: How suggestions are updated is as important as how they are served
  • Mention client-side optimizations: Debouncing and client caching sharply cut request volume, since most keystrokes never have to reach the server
  • Address filtering: Always mention content moderation for autocomplete suggestions
