Google - Design Search Autocomplete System

1. Think Process

1.1. Problem Goal

The autocomplete system must:

  • Quickly find all sentences that start with a user’s current prefix (like “i l” → “i love”, “i like”, …)

  • Rank them by frequency (most used first), and lexicographically for ties.

So we need both:

  • Fast prefix lookup

  • Efficient ranking by frequency

1.2. Data structure choice

1.2.1. Trie

Root
 ├── 'i'  "i"
      ├── ' '  "i "
           ├── 'l'  "i l"
                ├── 'o'  "i lo"
 

1.2.2. Min-Heap at Each Node

Each Trie node stores a min-heap (priority queue) of size ≤ k, containing the top k most frequent sentences passing through that node.

Each heap entry = (-freq, sentence)

  • -freq → using negative frequency so the smallest heap value is least frequent.

  • This ensures the heap root is the lowest-ranked among top k, making it easy to maintain.

1.3. Operations

1.3.1. Insertion (Build/Update)

For every sentence:

  • Walk through the Trie for each character in the sentence.

  • Update frequency count in a global dictionary: count[sentence] += 1

For every Trie node along the path:

  • Push (-freq, sentence) into its heap.

  • If heap size > k → pop the smallest (to keep only top k).

Complexity:

  • Each character processed once → O(L log k) per sentence.

1.3.2. Query (User Typing)

  • As user types each character, move through the Trie.

  • If at any point a node doesn’t exist → return empty list.

  • When reaching the node for current prefix:

    • Retrieve its heap.
    • Convert heap to a sorted list (descending frequency, lexicographically).
  • Complexity: O(L) to traverse + O(k log k) to sort small heap.

2. Code Walk-through

import heapq

class TrieNode:
    def __init__(self):
        self.children = {}
        self.heap = []  # stores (-freq, sentence)

class AutocompleteSystem:
    def __init__(self, sentences, times, k=3):
        self.root = TrieNode()
        self.k = k
        self.count = {}  # global frequency map
        self.prefix = ""
        self.curr = self.root

        for s, t in zip(sentences, times):
            self.count[s] = t
            self.insert(s, t)

    def insert(self, sentence, freq):
        node = self.root
        for c in sentence:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]

            # push into node heap
            heapq.heappush(node.heap, (-freq, sentence))
            # trim to top k
            if len(node.heap) > self.k:
                heapq.heappop(node.heap)

    def input(self, c):
        # Case 1: when user finishes a sentence
        if c == '#':
            self.count[self.prefix] = self.count.get(self.prefix, 0) + 1
            self.insert(self.prefix, self.count[self.prefix])
            self.prefix = ""
            self.curr = self.root
            return []

        self.prefix += c

        # Case 2: node do not exists
        if c not in self.curr.children:
            self.curr.children[c] = TrieNode()
            self.curr = self.curr.children[c]
            return []

        # Case 3: Happy case return top k items.
        self.curr = self.curr.children[c]
        return self._get_top(self.curr)

    def _get_top(self, node):
        # Convert heap to sorted list
        result = sorted(node.heap, key=lambda x: (x[0], x[1]))  # because freq is negative
        return [s for _, s in result[::-1]]  # reverse for descending frequency

  • Code first.

  • Explain the logic later.

  • Time Complexity:
    • Insert: O(L log k)
    • Search: O(L + k log k)
  • Space Complexity:
    • Insert: O(total_characters + total_heaps)
    • Search: O(L + k)
October 23, 2025