TechLead
Lesson 19 of 30
9 min read
System Design

System Design: Search Autocomplete

Design a search autocomplete system using trie data structures, ranking algorithms, caching strategies, and real-time query processing at scale

Problem Statement

Design a search autocomplete (typeahead suggestion) system like Google Search suggestions. As a user types each character, the system returns the top relevant suggestions in real time. This problem tests your understanding of data structures (tries), ranking algorithms, caching, and low-latency system design.

Step 1: Requirements

Functional Requirements

  • Return top 5-10 suggestions as the user types
  • Suggestions ranked by popularity (search frequency)
  • Support for personalized suggestions based on user history
  • Handle misspellings and fuzzy matching
  • Multi-language support
  • Filter inappropriate/offensive suggestions

Non-Functional Requirements

  • Response time under 100ms (users expect instant suggestions)
  • High availability (autocomplete failure degrades search UX significantly)
  • Scale to 10 billion search queries per day
  • Suggestions should reflect trending topics within minutes

Scale Estimates

Metric                             Estimate
Search queries per day             10 billion
Unique search terms                ~5 billion
Autocomplete requests per search   ~6 (avg query length 6 chars)
Autocomplete QPS                   ~700,000 average
Target latency                     <100ms p99
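The QPS figure can be sanity-checked with quick arithmetic from the other estimates:

```typescript
// Back-of-the-envelope: derive average autocomplete QPS from the estimates above.
const searchesPerDay = 10e9;        // 10 billion searches/day
const requestsPerSearch = 6;        // roughly one autocomplete call per typed character
const secondsPerDay = 24 * 60 * 60; // 86,400

const avgQps = (searchesPerDay * requestsPerSearch) / secondsPerDay;
console.log(Math.round(avgQps)); // 694444 → ~700K average QPS
```

Peak traffic is typically a small multiple of the daily average, so capacity planning should not stop at 700K.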

Step 2: Trie Data Structure

A Trie (prefix tree) is the foundational data structure for autocomplete. Each node represents a character, and each path from the root to a node marked end-of-word spells out a complete search term. A trie supports prefix lookups in O(p) time, where p is the prefix length.

// Basic Trie implementation for autocomplete
interface TrieNode {
  children: Map<string, TrieNode>;
  isEndOfWord: boolean;
  frequency: number;        // How many times this term was searched
  topSuggestions: string[];  // Pre-computed top K suggestions for this prefix
}

class AutocompleteTrie {
  private root: TrieNode;

  constructor() {
    this.root = this.createNode();
  }

  private createNode(): TrieNode {
    return {
      children: new Map(),
      isEndOfWord: false,
      frequency: 0,
      topSuggestions: [],
    };
  }

  // Insert a search term with its frequency
  insert(term: string, frequency: number): void {
    let node = this.root;
    for (const char of term.toLowerCase()) {
      if (!node.children.has(char)) {
        node.children.set(char, this.createNode());
      }
      node = node.children.get(char)!;
    }
    node.isEndOfWord = true;
    node.frequency = frequency;
  }

  // Get suggestions for a prefix
  getSuggestions(prefix: string, limit = 10): string[] {
    let node = this.root;

    // Navigate to the prefix node
    for (const char of prefix.toLowerCase()) {
      if (!node.children.has(char)) {
        return []; // No matches
      }
      node = node.children.get(char)!;
    }

    // If pre-computed suggestions exist, return them
    if (node.topSuggestions.length > 0) {
      return node.topSuggestions.slice(0, limit);
    }

    // Otherwise, collect all words under this prefix
    const results: Array<{ term: string; frequency: number }> = [];
    this.collectWords(node, prefix, results);

    return results
      .sort((a, b) => b.frequency - a.frequency)
      .slice(0, limit)
      .map((r) => r.term);
  }

  private collectWords(
    node: TrieNode,
    prefix: string,
    results: Array<{ term: string; frequency: number }>
  ): void {
    if (node.isEndOfWord) {
      results.push({ term: prefix, frequency: node.frequency });
    }
    for (const [char, childNode] of node.children) {
      this.collectWords(childNode, prefix + char, results);
    }
  }
}

Optimization: Pre-compute Top Suggestions

Traversing the trie at query time to find all matching terms is too slow for production. Instead, pre-compute the top K suggestions at each node during the build phase. This way, a prefix lookup is O(p) where p is the prefix length, regardless of how many terms match.

// Optimized: Pre-compute top suggestions at each node
class OptimizedTrie {
  private K = 10; // Number of suggestions to pre-compute

  // Root node, populated via the same createNode/insert as AutocompleteTrie
  private root: TrieNode = { children: new Map(), isEndOfWord: false, frequency: 0, topSuggestions: [] };

  // After inserting all terms, pre-compute top K at each node
  precomputeTopSuggestions(node: TrieNode = this.root, prefix = ""): Array<{ term: string; freq: number }> {
    const allTerms: Array<{ term: string; freq: number }> = [];

    // If this node is end of word, include it
    if (node.isEndOfWord) {
      allTerms.push({ term: prefix, freq: node.frequency });
    }

    // Collect from all children
    for (const [char, child] of node.children) {
      const childTerms = this.precomputeTopSuggestions(child, prefix + char);
      allTerms.push(...childTerms);
    }

    // Store top K at this node
    allTerms.sort((a, b) => b.freq - a.freq);
    node.topSuggestions = allTerms.slice(0, this.K).map((t) => t.term);

    return allTerms;
  }

  // Now getSuggestions is O(prefix_length) - just walk to the node
  getSuggestions(prefix: string): string[] {
    let node = this.root;
    for (const char of prefix.toLowerCase()) {
      if (!node.children.has(char)) return [];
      node = node.children.get(char)!;
    }
    return node.topSuggestions;
  }
}

Step 3: Ranking and Personalization

Pure frequency-based ranking is a starting point, but modern autocomplete systems use multiple signals to rank suggestions.

// Context passed with each request (only location is used in the scoring below)
interface SearchContext {
  userId?: string;
  location?: string;
}

interface SuggestionCandidate {
  term: string;
  globalFrequency: number;
  recentFrequency: number;   // Frequency in last 24 hours (trending)
  userFrequency: number;     // How often THIS user searched this
  score: number;
}

class SuggestionRanker {
  rank(
    candidates: SuggestionCandidate[],
    context: SearchContext
  ): SuggestionCandidate[] {
    return candidates
      .map((c) => ({
        ...c,
        score: this.calculateScore(c, context),
      }))
      .sort((a, b) => b.score - a.score);
  }

  private calculateScore(
    candidate: SuggestionCandidate,
    context: SearchContext
  ): number {
    let score = 0;

    // Global popularity (log scale to prevent domination)
    score += Math.log10(candidate.globalFrequency + 1) * 30;

    // Trending boost (recent frequency vs historical)
    const trendingRatio = candidate.recentFrequency / (candidate.globalFrequency / 365 + 1);
    if (trendingRatio > 2) {
      score += Math.min(trendingRatio * 10, 50); // Cap trending bonus
    }

    // Personalization: user's own search history
    score += Math.log10(candidate.userFrequency + 1) * 40;

    // Freshness: prefer recently created terms
    // (for news events, new products, etc.)

    // Location relevance
    if (context.location && candidate.term.includes(context.location)) {
      score += 15;
    }

    return score;
  }
}

Step 4: Data Collection and Processing

The autocomplete system needs a continuous pipeline to collect search queries, aggregate frequencies, and update the trie.

Data Pipeline Architecture

  • Query Logging: Every search query is logged to Kafka with timestamp, user ID, and location
  • Aggregation (Batch): A MapReduce/Spark job runs periodically to aggregate query frequencies
  • Aggregation (Real-time): A stream processor (Flink/Spark Streaming) updates trending queries in near real-time
  • Trie Builder: Periodically rebuilds the trie from aggregated data and deploys to serving nodes
  • Filtering: Remove offensive, dangerous, or inappropriate suggestions before building the trie

// Data pipeline for building the autocomplete trie

class AutocompleteDataPipeline {
  // Assumed collaborators (not shown): this.kafka, this.s3,
  // this.offensiveFilter, this.serializeTrie, this.broadcast

  // Step 1: Aggregate search queries (batch - runs hourly)
  async aggregateQueries(): Promise<Map<string, number>> {
    const frequencies = new Map<string, number>();

    // Process query logs from the last time window
    const queryLogs = await this.kafka.consume("search-queries");

    for (const log of queryLogs) {
      const normalized = this.normalizeQuery(log.query);
      if (this.isValidSuggestion(normalized)) {
        frequencies.set(normalized, (frequencies.get(normalized) || 0) + 1);
      }
    }

    return frequencies;
  }

  // Step 2: Build trie from aggregated data
  async buildTrie(frequencies: Map<string, number>): Promise<AutocompleteTrie> {
    const trie = new AutocompleteTrie();

    for (const [term, freq] of frequencies) {
      trie.insert(term, freq);
    }

    trie.precomputeTopSuggestions(); // as implemented in OptimizedTrie above
    return trie;
  }

  // Step 3: Deploy to serving nodes
  async deployTrie(trie: AutocompleteTrie): Promise<void> {
    // Serialize trie to a compact binary format
    const serialized = this.serializeTrie(trie);

    // Upload to shared storage
    await this.s3.upload("autocomplete-tries", `trie-${Date.now()}.bin`, serialized);

    // Notify serving nodes to load new trie
    await this.broadcast("trie-update", { version: Date.now() });
  }

  private normalizeQuery(query: string): string {
    return query
      .toLowerCase()
      .trim()
      .replace(/\s+/g, " "); // Collapse whitespace
  }

  private isValidSuggestion(query: string): boolean {
    if (query.length < 2 || query.length > 100) return false;
    if (this.offensiveFilter.isOffensive(query)) return false;
    return true;
  }
}
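The `serializeTrie` helper above is left abstract. As an illustrative sketch (JSON for readability; a production build would likely prefer a compact binary encoding), serialization and deserialization might look like this, re-declaring the Step 2 node shape so the snippet is self-contained:

```typescript
// Node shape, mirroring the TrieNode interface from Step 2
interface TrieNode {
  children: Map<string, TrieNode>;
  isEndOfWord: boolean;
  frequency: number;
  topSuggestions: string[];
}

// Wire format with short keys to keep the snapshot small
interface WireNode {
  c: Record<string, WireNode>; // children
  e: boolean;                  // isEndOfWord
  f: number;                   // frequency
  s: string[];                 // topSuggestions
}

function toWire(node: TrieNode): WireNode {
  const c: Record<string, WireNode> = {};
  for (const [ch, child] of node.children) c[ch] = toWire(child);
  return { c, e: node.isEndOfWord, f: node.frequency, s: node.topSuggestions };
}

function fromWire(wire: WireNode): TrieNode {
  const children = new Map<string, TrieNode>();
  for (const [ch, child] of Object.entries(wire.c)) children.set(ch, fromWire(child));
  return { children, isEndOfWord: wire.e, frequency: wire.f, topSuggestions: wire.s };
}

// serializeTrie then reduces to JSON.stringify(toWire(root)), and loading
// on a serving node to fromWire(JSON.parse(bytes)).
```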

Step 5: Caching Strategies

At ~700K QPS, routing every keystroke to the trie-serving fleet is wasteful even with pre-computed suggestions. Multi-level caching is essential.

Caching Layers

  • Browser Cache: Cache suggestions on the client. If user types "how t" then deletes "t", re-typing "t" should not hit the server.
  • CDN/Edge Cache: Popular prefixes ("how to", "what is") can be cached at the CDN edge. 20% of prefixes account for 80% of requests.
  • Application Cache (Redis): Cache prefix -> suggestions mapping in Redis. TTL of 1-5 minutes for balance between freshness and performance.
  • In-Memory Trie: The trie itself is loaded entirely into each serving node's memory for sub-millisecond lookups.

class AutocompleteServer {
  private trie: AutocompleteTrie;
  private cache: RedisClient;

  async getSuggestions(prefix: string, userId?: string): Promise<string[]> {
    const normalizedPrefix = prefix.toLowerCase().trim();

    // Layer 1: Check Redis cache
    const cacheKey = `ac:${normalizedPrefix}`;
    const cached = await this.cache.get(cacheKey);
    if (cached) {
      const suggestions = JSON.parse(cached);
      // Optionally merge with personalized results
      if (userId) {
        return this.mergePersonalized(suggestions, userId, normalizedPrefix);
      }
      return suggestions;
    }

    // Layer 2: Query the in-memory trie
    const suggestions = this.trie.getSuggestions(normalizedPrefix);

    // Cache for 2 minutes
    await this.cache.setex(cacheKey, 120, JSON.stringify(suggestions));

    if (userId) {
      return this.mergePersonalized(suggestions, userId, normalizedPrefix);
    }

    return suggestions;
  }

  // Client-side optimization: debounce and batch requests
  // Don't send a request on every keystroke
  // Wait 100-200ms after the last keystroke before requesting
}

// Client-side implementation
class AutocompleteClient {
  private debounceTimer: ReturnType<typeof setTimeout> | null = null; // portable across browser and Node typings
  private cache = new Map<string, string[]>();

  onInput(prefix: string): void {
    // Check client-side cache first
    if (this.cache.has(prefix)) {
      this.renderSuggestions(this.cache.get(prefix)!);
      return;
    }

    // Debounce: wait 150ms after last keystroke
    if (this.debounceTimer) clearTimeout(this.debounceTimer);
    this.debounceTimer = setTimeout(() => {
      this.fetchSuggestions(prefix);
    }, 150);
  }

  private async fetchSuggestions(prefix: string): Promise<void> {
    const response = await fetch(`/api/autocomplete?q=${encodeURIComponent(prefix)}`);
    const suggestions = await response.json();

    // Cache on the client
    this.cache.set(prefix, suggestions);
    this.renderSuggestions(suggestions);
  }
}

Step 6: Multi-Language Support

Supporting multiple languages introduces challenges around character sets, tokenization, and culturally appropriate ranking.

  • Separate tries per language: Build and serve different tries based on the user's locale
  • Unicode normalization: Normalize characters to handle accents and diacritics (e.g., "café" should match "cafe")
  • CJK languages: Chinese, Japanese, and Korean don't use spaces between words, requiring different tokenization strategies
  • RTL languages: Arabic and Hebrew need special handling for prefix matching direction
  • Transliteration: Allow searching in one script and suggesting in another (e.g., typing in Romanized Japanese to get Kanji suggestions)
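The accent-folding point above can be sketched with JavaScript's built-in Unicode support: NFD decomposition splits a character like "é" into "e" plus a combining mark, which can then be stripped. This is a simplification; it does not address CJK segmentation or transliteration.

```typescript
// Fold accents/diacritics so "café" and "cafe" index to the same trie path.
// NFD decomposes "é" into "e" + U+0301 (combining acute accent);
// the combining-mark range U+0300–U+036F is then stripped.
function foldDiacritics(term: string): string {
  return term
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase();
}

console.log(foldDiacritics("Café"));        // "cafe"
console.log(foldDiacritics("naïve résumé")); // "naive resume"
```

Folding would be applied in the pipeline's `normalizeQuery` step so that both the trie keys and incoming prefixes use the same canonical form.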

System Architecture Overview

Component Summary

Component            Technology                                    Purpose
Serving Layer        In-memory trie on app servers                 Sub-ms prefix lookups
Cache                Redis + CDN edge cache                        Reduce trie lookups for popular prefixes
Batch Processing     Spark / MapReduce                             Aggregate query frequencies hourly
Stream Processing    Flink / Kafka Streams                         Detect trending queries in real time
Storage              S3 (trie snapshots), Cassandra (query logs)   Persist trie builds and raw data
Personalization      Redis per-user history                        Blend personal history into suggestions
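One detail the serving layer implies but the pipeline code glosses over is how a node picks up a freshly built trie without downtime. A minimal sketch, assuming a hypothetical snapshot loader and a message-bus callback for the pipeline's "trie-update" broadcast:

```typescript
// Sketch: zero-downtime trie swap on a serving node. The snapshot download
// and message-bus subscription are assumptions, not a specific library API.
interface SuggestionTrie {
  getSuggestions(prefix: string, limit?: number): string[];
}

class TrieHotSwapper {
  private activeTrie: SuggestionTrie;

  constructor(initialTrie: SuggestionTrie) {
    this.activeTrie = initialTrie;
  }

  // Invoked when the pipeline broadcasts a "trie-update" message
  async onTrieUpdate(version: number): Promise<void> {
    const next = await this.loadTrieSnapshot(version);
    this.activeTrie = next; // single reference swap: in-flight reads keep the old trie
  }

  getSuggestions(prefix: string): string[] {
    return this.activeTrie.getSuggestions(prefix);
  }

  // Hypothetical: download the versioned snapshot from shared storage and deserialize
  protected loadTrieSnapshot(version: number): Promise<SuggestionTrie> {
    throw new Error(`snapshot loading for v${version} not implemented in this sketch`);
  }
}
```

Because the swap is a single reference assignment, requests never see a half-built trie; the old one is garbage-collected once its last in-flight lookup finishes.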

Interview Tips

  • Start with the trie: Explain the data structure first, then optimize with pre-computed top-K
  • Emphasize latency: Every optimization should focus on sub-100ms response times
  • Discuss the data pipeline: How suggestions are updated is as important as how they are served
  • Mention client-side optimizations: Debouncing and client caching sharply cut request volume, since most keystrokes never have to reach the server
  • Address filtering: Always mention content moderation for autocomplete suggestions
