Google - Design Search Autocomplete System

1. Think Process

1.1. Problem Goal

The autocomplete system must:

Quickly find all sentences that start with a user’s current prefix (like “i l” → “i love”, “i like”, …)
Rank them by frequency (most used first), and lexicographically for ties.

So we need both:

Fast prefix lookup
Efficient ranking by frequency

1.2. Data structure choice

1.2.1. Trie

Root
 ├── 'i' → "i"
 │     ├── ' ' → "i "
 │     │     ├── 'l' → "i l"
 │     │     │     ├── 'o' → "i lo"
 

1.2.2. Min-Heap at Each Node

Each Trie node stores a min-heap (priority queue) of size ≤ k, containing the top k most frequent sentences passing through that node.

Each heap entry = (-freq, sentence)

-freq → using negative frequency so the smallest heap value is least frequent.
This ensures the heap root is the lowest-ranked among top k, making it easy to maintain.

1.3. Operations

1.3.1. Insertion (Build/Update)

For every sentence:

Walk through the Trie for each character in the sentence.
Update frequency count in a global dictionary: count[sentence] += 1

For every Trie node along the path:

Push (-freq, sentence) into its heap.
If heap size > k → pop the smallest (to keep only top k).

Complexity:

Each character processed once → O(L log k) per sentence.

1.3.2. Query (User Typing)

As user types each character, move through the Trie.
If at any point a node doesn’t exist → return empty list.
When reaching the node for current prefix:
- Retrieve its heap.
- Convert heap to a sorted list (descending frequency, lexicographically).
Complexity: O(L) to traverse + O(k log k) to sort small heap.

2. Code Walk-through

import heapq

class TrieNode:
    def __init__(self):
        self.children = {}
        self.heap = []  # stores (-freq, sentence)

class AutocompleteSystem:
    def __init__(self, sentences, times, k=3):
        self.root = TrieNode()
        self.k = k
        self.count = {}  # global frequency map
        self.prefix = ""
        self.curr = self.root

        for s, t in zip(sentences, times):
            self.count[s] = t
            self.insert(s, t)

    def insert(self, sentence, freq):
        node = self.root
        for c in sentence:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]

            # push into node heap
            heapq.heappush(node.heap, (-freq, sentence))
            # trim to top k
            if len(node.heap) > self.k:
                heapq.heappop(node.heap)

    def input(self, c):
        # Case 1: when user finishes a sentence
        if c == '#':
            self.count[self.prefix] = self.count.get(self.prefix, 0) + 1
            self.insert(self.prefix, self.count[self.prefix])
            self.prefix = ""
            self.curr = self.root
            return []

        self.prefix += c

        # Case 2: node do not exists
        if c not in self.curr.children:
            self.curr.children[c] = TrieNode()
            self.curr = self.curr.children[c]
            return []

        # Case 3: Happy case return top k items.
        self.curr = self.curr.children[c]
        return self._get_top(self.curr)

    def _get_top(self, node):
        # Convert heap to sorted list
        result = sorted(node.heap, key=lambda x: (x[0], x[1]))  # because freq is negative
        return [s for _, s in result[::-1]]  # reverse for descending frequency

Code first.
Explain the logic later.
Time Complexity:
- Insert: O(L log k)
- Search: O(L + k log k)
Space Complexity:
- Insert: O(total_characters + total_heaps)
- Search: O(L + k)

October 23, 2025