Problem Statement
Design a search autocomplete (typeahead) system like Google Search suggestions. As the user types each character, the system returns the most relevant suggestions in real time. This problem tests your understanding of data structures (tries), ranking algorithms, caching, and low-latency system design.
Step 1: Requirements
Functional Requirements
- Return top 5-10 suggestions as the user types
- Suggestions ranked by popularity (search frequency)
- Support for personalized suggestions based on user history
- Handle misspellings and fuzzy matching
- Multi-language support
- Filter inappropriate/offensive suggestions
Non-Functional Requirements
- Response time under 100ms (users expect instant suggestions)
- High availability (autocomplete failure degrades search UX significantly)
- Scale to 10 billion search queries per day
- Suggestions should reflect trending topics within minutes
Scale Estimates
| Metric | Estimate |
|---|---|
| Search queries per day | 10 billion |
| Unique search terms | ~5 billion |
| Autocomplete requests per search | ~6 (avg query length 6 chars) |
| Autocomplete QPS | ~700,000 |
| Target latency | <100ms p99 |
Step 2: Trie Data Structure
A trie (prefix tree) is the foundational data structure for autocomplete. Each node represents a character, and a path from the root to a node marked end-of-word spells out a complete search term. The trie allows efficient prefix lookups in O(p) time, where p is the prefix length.
// Basic Trie implementation for autocomplete
interface TrieNode {
children: Map<string, TrieNode>;
isEndOfWord: boolean;
frequency: number; // How many times this term was searched
topSuggestions: string[]; // Pre-computed top K suggestions for this prefix
}
class AutocompleteTrie {
private root: TrieNode;
constructor() {
this.root = this.createNode();
}
private createNode(): TrieNode {
return {
children: new Map(),
isEndOfWord: false,
frequency: 0,
topSuggestions: [],
};
}
// Insert a search term with its frequency
insert(term: string, frequency: number): void {
let node = this.root;
for (const char of term.toLowerCase()) {
if (!node.children.has(char)) {
node.children.set(char, this.createNode());
}
node = node.children.get(char)!;
}
node.isEndOfWord = true;
node.frequency = frequency; // aggregated count; overwritten on each rebuild
}
// Get suggestions for a prefix
getSuggestions(prefix: string, limit = 10): string[] {
let node = this.root;
// Navigate to the prefix node
for (const char of prefix.toLowerCase()) {
if (!node.children.has(char)) {
return []; // No matches
}
node = node.children.get(char)!;
}
// If pre-computed suggestions exist, return them
if (node.topSuggestions.length > 0) {
return node.topSuggestions.slice(0, limit);
}
// Otherwise, collect all words under this prefix
const results: Array<{ term: string; frequency: number }> = [];
this.collectWords(node, prefix, results);
return results
.sort((a, b) => b.frequency - a.frequency)
.slice(0, limit)
.map((r) => r.term);
}
private collectWords(
node: TrieNode,
prefix: string,
results: Array<{ term: string; frequency: number }>
): void {
if (node.isEndOfWord) {
results.push({ term: prefix, frequency: node.frequency });
}
for (const [char, childNode] of node.children) {
this.collectWords(childNode, prefix + char, results);
}
}
}
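A quick usage sketch of the trie above. For a runnable, self-contained demo this condenses the class into standalone functions; the terms and frequencies are invented for illustration:

```typescript
// Condensed, self-contained version of the trie above, just to demo usage.
interface DemoNode {
  children: Map<string, DemoNode>;
  isEnd: boolean;
  freq: number;
}

const makeNode = (): DemoNode => ({ children: new Map(), isEnd: false, freq: 0 });
const root = makeNode();

function insert(term: string, freq: number): void {
  let node = root;
  for (const ch of term.toLowerCase()) {
    if (!node.children.has(ch)) node.children.set(ch, makeNode());
    node = node.children.get(ch)!;
  }
  node.isEnd = true;
  node.freq = freq;
}

function suggest(prefix: string, limit = 10): string[] {
  let node = root;
  for (const ch of prefix.toLowerCase()) {
    const next = node.children.get(ch);
    if (!next) return []; // no term starts with this prefix
    node = next;
  }
  // Collect every term under the prefix node, then rank by frequency.
  const out: Array<{ term: string; freq: number }> = [];
  const walk = (n: DemoNode, p: string) => {
    if (n.isEnd) out.push({ term: p, freq: n.freq });
    for (const [ch, child] of n.children) walk(child, p + ch);
  };
  walk(node, prefix.toLowerCase());
  return out.sort((a, b) => b.freq - a.freq).slice(0, limit).map((r) => r.term);
}

insert("how to tie a tie", 5_000);
insert("how to train your dragon", 12_000);
insert("how to cook rice", 8_000);

console.log(suggest("how to"));
// → ["how to train your dragon", "how to cook rice", "how to tie a tie"]
```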
Optimization: Pre-compute Top Suggestions
Traversing the subtree at query time to collect every matching term is too slow for production. Instead, pre-compute the top K suggestions at each node during the build phase; then an entire lookup, suggestions included, costs O(p) where p is the prefix length, regardless of how many terms share the prefix.
// Optimized: Pre-compute top suggestions at each node
class OptimizedTrie {
private root: TrieNode = { children: new Map(), isEndOfWord: false, frequency: 0, topSuggestions: [] };
private K = 10; // Number of suggestions to pre-compute
// insert() is identical to AutocompleteTrie.insert above and is omitted here.
// After inserting all terms, pre-compute top K at each node
precomputeTopSuggestions(node: TrieNode = this.root, prefix = ""): Array<{ term: string; freq: number }> {
const allTerms: Array<{ term: string; freq: number }> = [];
// If this node is end of word, include it
if (node.isEndOfWord) {
allTerms.push({ term: prefix, freq: node.frequency });
}
// Collect from all children
for (const [char, child] of node.children) {
const childTerms = this.precomputeTopSuggestions(child, prefix + char);
allTerms.push(...childTerms);
}
// Store top K at this node
allTerms.sort((a, b) => b.freq - a.freq);
node.topSuggestions = allTerms.slice(0, this.K).map((t) => t.term);
return allTerms;
}
// Now getSuggestions is O(prefix_length) - just walk to the node
getSuggestions(prefix: string): string[] {
let node = this.root;
for (const char of prefix.toLowerCase()) {
if (!node.children.has(char)) return [];
node = node.children.get(char)!;
}
return node.topSuggestions;
}
}
Step 3: Ranking and Personalization
Pure frequency-based ranking is a starting point, but modern autocomplete systems use multiple signals to rank suggestions.
interface SuggestionCandidate {
term: string;
globalFrequency: number;
recentFrequency: number; // Frequency in last 24 hours (trending)
userFrequency: number; // How often THIS user searched this
score: number;
}
interface SearchContext {
location?: string; // e.g., user's city or region, used for the location boost
}
class SuggestionRanker {
rank(
candidates: SuggestionCandidate[],
context: SearchContext
): SuggestionCandidate[] {
return candidates
.map((c) => ({
...c,
score: this.calculateScore(c, context),
}))
.sort((a, b) => b.score - a.score);
}
private calculateScore(
candidate: SuggestionCandidate,
context: SearchContext
): number {
let score = 0;
// Global popularity (log scale to prevent domination)
score += Math.log10(candidate.globalFrequency + 1) * 30;
// Trending boost (recent frequency vs historical)
const trendingRatio = candidate.recentFrequency / (candidate.globalFrequency / 365 + 1);
if (trendingRatio > 2) {
score += Math.min(trendingRatio * 10, 50); // Cap trending bonus
}
// Personalization: user's own search history
score += Math.log10(candidate.userFrequency + 1) * 40;
// Freshness: prefer recently created terms
// (for news events, new products, etc.)
// Location relevance
if (context.location && candidate.term.includes(context.location)) {
score += 15;
}
return score;
}
}
Step 4: Data Collection and Processing
The autocomplete system needs a continuous pipeline to collect search queries, aggregate frequencies, and update the trie.
Data Pipeline Architecture
- Query Logging: Every search query is logged to Kafka with timestamp, user ID, and location
- Aggregation (Batch): A MapReduce/Spark job runs periodically to aggregate query frequencies
- Aggregation (Real-time): A stream processor (Flink/Spark Streaming) updates trending queries in near real-time
- Trie Builder: Periodically rebuilds the trie from aggregated data and deploys to serving nodes
- Filtering: Remove offensive, dangerous, or inappropriate suggestions before building the trie
// Data pipeline for building the autocomplete trie.
// kafka, s3, offensiveFilter, serializeTrie, and broadcast are assumed
// injected clients/helpers; their calls are sketched here, not real SDK APIs.
class AutocompleteDataPipeline {
// Step 1: Aggregate search queries (batch - runs hourly)
async aggregateQueries(): Promise<Map<string, number>> {
const frequencies = new Map<string, number>();
// Process query logs from the last time window
const queryLogs = await this.kafka.consume("search-queries");
for (const log of queryLogs) {
const normalized = this.normalizeQuery(log.query);
if (this.isValidSuggestion(normalized)) {
frequencies.set(normalized, (frequencies.get(normalized) || 0) + 1);
}
}
return frequencies;
}
// Step 2: Build trie from aggregated data
async buildTrie(frequencies: Map<string, number>): Promise<AutocompleteTrie> {
const trie = new AutocompleteTrie();
for (const [term, freq] of frequencies) {
trie.insert(term, freq);
}
trie.precomputeTopSuggestions(); // assumes the precompute step from OptimizedTrie above
return trie;
}
// Step 3: Deploy to serving nodes
async deployTrie(trie: AutocompleteTrie): Promise<void> {
// Serialize trie to a compact binary format
const serialized = this.serializeTrie(trie);
// Upload to shared storage
await this.s3.upload("autocomplete-tries", `trie-${Date.now()}.bin`, serialized);
// Notify serving nodes to load new trie
await this.broadcast("trie-update", { version: Date.now() });
}
private normalizeQuery(query: string): string {
return query
.toLowerCase()
.trim()
.replace(/\s+/g, " "); // Collapse whitespace
}
private isValidSuggestion(query: string): boolean {
if (query.length < 2 || query.length > 100) return false;
if (this.offensiveFilter.isOffensive(query)) return false;
return true;
}
}
Step 5: Caching Strategies
At ~700K QPS, routing every request all the way to the trie servers is wasteful even with pre-computed suggestions. Multi-level caching keeps most requests from ever reaching them.
Caching Layers
- Browser Cache: Cache suggestions on the client. If user types "how t" then deletes "t", re-typing "t" should not hit the server.
- CDN/Edge Cache: Popular prefixes ("how to", "what is") can be cached at the CDN edge; prefix traffic roughly follows the 80/20 rule, so a small share of prefixes accounts for most requests.
- Application Cache (Redis): Cache prefix -> suggestions mapping in Redis. TTL of 1-5 minutes for balance between freshness and performance.
- In-Memory Trie: The trie itself is loaded entirely into each serving node's memory for sub-millisecond lookups.
// trie and cache are assumed to be injected; mergePersonalized (not shown)
// blends the user's own search history into the global suggestions.
class AutocompleteServer {
private trie: AutocompleteTrie;
private cache: RedisClient;
async getSuggestions(prefix: string, userId?: string): Promise<string[]> {
const normalizedPrefix = prefix.toLowerCase().trim();
// Layer 1: Check Redis cache
const cacheKey = `ac:${normalizedPrefix}`;
const cached = await this.cache.get(cacheKey);
if (cached) {
const suggestions = JSON.parse(cached);
// Optionally merge with personalized results
if (userId) {
return this.mergePersonalized(suggestions, userId, normalizedPrefix);
}
return suggestions;
}
// Layer 2: Query the in-memory trie
const suggestions = this.trie.getSuggestions(normalizedPrefix);
// Cache for 2 minutes
await this.cache.setex(cacheKey, 120, JSON.stringify(suggestions));
if (userId) {
return this.mergePersonalized(suggestions, userId, normalizedPrefix);
}
return suggestions;
}
// Client-side optimization: debounce and batch requests
// Don't send a request on every keystroke
// Wait 100-200ms after the last keystroke before requesting
}
// Client-side implementation (renderSuggestions is assumed to update the UI)
class AutocompleteClient {
private debounceTimer: ReturnType<typeof setTimeout> | null = null;
private cache = new Map<string, string[]>();
onInput(prefix: string): void {
// Check client-side cache first
if (this.cache.has(prefix)) {
this.renderSuggestions(this.cache.get(prefix)!);
return;
}
// Debounce: wait 150ms after last keystroke
if (this.debounceTimer) clearTimeout(this.debounceTimer);
this.debounceTimer = setTimeout(() => {
this.fetchSuggestions(prefix);
}, 150);
}
private async fetchSuggestions(prefix: string): Promise<void> {
const response = await fetch(`/api/autocomplete?q=${encodeURIComponent(prefix)}`);
const suggestions = await response.json();
// Cache on the client
this.cache.set(prefix, suggestions);
this.renderSuggestions(suggestions);
}
}
Step 6: Multi-Language Support
Supporting multiple languages introduces challenges around character sets, tokenization, and culturally appropriate ranking.
- Separate tries per language: Build and serve different tries based on the user's locale
- Unicode normalization: Normalize characters to handle accents and diacritics (e.g., "café" should match "cafe")
- CJK languages: Chinese, Japanese, and Korean don't use spaces between words, requiring different tokenization strategies
- RTL languages: Arabic and Hebrew need special handling for prefix matching direction
- Transliteration: Allow searching in one script and suggesting in another (e.g., typing in Romanized Japanese to get Kanji suggestions)
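Accent-insensitive matching (the second point) is commonly implemented by decomposing strings to Unicode NFD and stripping the combining marks before they are used as trie keys; a minimal sketch:

```typescript
// Accent-insensitive normalization: decompose to NFD, then drop combining
// marks so "café" and "cafe" map to the same trie key.
function normalizeForTrie(term: string): string {
  return term
    .toLowerCase()
    .normalize("NFD")                 // é → e + U+0301 (combining acute accent)
    .replace(/[\u0300-\u036f]/g, ""); // strip combining diacritical marks
}

console.log(normalizeForTrie("Café"));         // → "cafe"
console.log(normalizeForTrie("Crème Brûlée")); // → "creme brulee"
```

Note that stripping diacritics is language-dependent (it is wrong for some locales, e.g. German umlauts conventionally expand to "ue"/"oe"), which is one more argument for the per-language tries above.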
System Architecture Overview
Component Summary
| Component | Technology | Purpose |
|---|---|---|
| Serving Layer | In-memory trie on app servers | Sub-ms prefix lookups |
| Cache | Redis + CDN edge cache | Reduce trie lookups for popular prefixes |
| Batch Processing | Spark / MapReduce | Aggregate query frequencies hourly |
| Stream Processing | Flink / Kafka Streams | Detect trending queries in real time |
| Storage | S3 (trie snapshots), Cassandra (query logs) | Persist trie builds and raw data |
| Personalization | Redis per-user history | Blend personal history into suggestions |
Interview Tips
- Start with the trie: Explain the data structure first, then optimize with pre-computed top-K
- Emphasize latency: Every optimization should focus on sub-100ms response times
- Discuss the data pipeline: How suggestions are updated is as important as how they are served
- Mention client-side optimizations: Debouncing and client caching can cut server load substantially, often by half or more
- Address filtering: Always mention content moderation for autocomplete suggestions