Rust · Algorithms · Search · Performance · Featured

Sub-Millisecond Intent Detection with Finite State Transducers

14 min read

How we built a Rust command palette that achieves sub-millisecond intent detection using FST-based pattern matching, federated search across data sources, and adaptive ranking that learns from user behavior.

The Lookup Problem

Natural language is messy. Users type "whats the weather" and "what is the whether" and "weather?" expecting the same response. Traditional approaches—regex patterns, keyword matching, embedding similarity—each have failure modes that matter at scale.

We needed something different: a system that could recognize thousands of intent patterns with fuzzy matching, sub-millisecond latency, and predictable memory usage. The answer came from an unexpected place: finite state transducers.

Finite Automata and Their Limits

Deterministic finite automata (DFAs) are the workhorses of pattern matching. Given a set of strings, you can build a DFA that recognizes exactly those strings in O(n) time, where n is the input length. Regular expression engines, lexers, and fast string matchers all build on this foundation.
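
To make the O(n) claim concrete, here is a toy hand-built DFA (the states and patterns are invented for illustration, not taken from our system) that recognizes exactly {"hi", "hey"} with one table lookup per input byte:

```rust
// Toy DFA over bytes that accepts exactly {"hi", "hey"}.
// Hand-built for illustration; a real matcher would generate this table.
const REJECT: usize = usize::MAX;

fn step(state: usize, byte: u8) -> usize {
    match (state, byte) {
        (0, b'h') => 1,
        (1, b'i') => 2, // accept "hi"
        (1, b'e') => 3,
        (3, b'y') => 4, // accept "hey"
        _ => REJECT,    // dead state: no pattern can match anymore
    }
}

fn accepts(input: &str) -> bool {
    let mut state = 0;
    for &b in input.as_bytes() {
        state = step(state, b);
        if state == REJECT {
            return false;
        }
    }
    matches!(state, 2 | 4) // accepting states
}

fn main() {
    assert!(accepts("hi") && accepts("hey"));
    assert!(!accepts("he") && !accepts("hello"));
}
```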

But DFAs have a limitation: they only answer yes or no. They can tell you whether the input matches some pattern, but not which pattern, or what to do about it.

Transducers: Automata That Compute

A finite state transducer (FST) extends the automaton concept. Instead of just accepting or rejecting, each transition can emit output. The transducer reads input symbols and produces output symbols, transforming one sequence into another.

This transformation is the key insight. We're not just recognizing "check my balance"—we're mapping it to an intent identifier, an action code, a response template. The FST becomes a function from strings to values, computed in linear time.
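
In Rust, the fst crate models exactly this shape as a `Map` from byte strings to `u64` values. A minimal sketch, assuming that crate (the phrases and IDs are placeholders, not our production vocabulary; `Map::from_iter` requires keys in lexicographic order):

```rust
use fst::Map;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Canonical phrase -> intent ID. Keys must be in lexicographic order.
    let intents = Map::from_iter(vec![
        ("check my balance", 101u64),
        ("show recent transactions", 102),
        ("what is the weather", 103),
    ])?;

    // One linear pass over the input bytes, emitting the associated value.
    assert_eq!(intents.get("what is the weather"), Some(103));
    assert_eq!(intents.get("unknown phrase"), None);
    Ok(())
}
```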

The mathematical elegance is that FSTs compose. You can chain them: one FST normalizes text, another matches patterns, a third extracts entities. The composition of two FSTs is itself an FST, and that composite machine can be computed ahead of time.

Levenshtein Automata: Fuzzy Matching

Exact matching isn't enough. Users make typos. The Levenshtein automaton, discovered by Schulz and Mihov in 2002, addresses this beautifully.

Given a string S and a distance bound k, there exists a DFA that accepts exactly those strings within edit distance k of S. This automaton can be constructed without explicitly enumerating all strings in the language—a crucial property when k=2 and the alphabet is large.

The construction is intricate. You track not just your position in S, but all possible "states of error"—how many edits have occurred and where. For small k, this remains tractable. The resulting automaton is deterministic and runs in linear time.
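
One way to see what those error states track is the edit-distance row the automaton encodes. The sketch below simulates it online, purely for illustration (the real construction determinizes this ahead of time), and prunes as soon as every entry exceeds k:

```rust
// Simulates the "states of error" for pattern `s` against streaming input.
// dp[i] = minimum edits between s[..i] and the input read so far.
// A real Levenshtein automaton determinizes this offline; this is the online view.
fn within_distance(s: &str, input: &str, k: usize) -> bool {
    let s: Vec<char> = s.chars().collect();
    let mut dp: Vec<usize> = (0..=s.len()).collect();
    for c in input.chars() {
        let mut next = vec![dp[0] + 1]; // empty prefix of s: drop every input char seen
        for i in 1..=s.len() {
            let substitute = dp[i - 1] + usize::from(s[i - 1] != c);
            let delete = dp[i] + 1;
            let insert = next[i - 1] + 1;
            next.push(substitute.min(delete).min(insert));
        }
        dp = next;
        if dp.iter().all(|&d| d > k) {
            return false; // dead state: no continuation can get back under k
        }
    }
    dp[s.len()] <= k
}

fn main() {
    assert!(within_distance("weather", "wether", 1));   // one missing letter
    assert!(!within_distance("weather", "whether", 0)); // exact match required
}
```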

Combining the Ideas

Here's where it gets interesting. FSTs support intersection and composition. A Levenshtein automaton can be intersected with a dictionary FST, producing a new FST that maps misspelled inputs to their corrections.

For intent detection, we build an FST mapping canonical phrases to intent IDs. We then compose this with Levenshtein automata for each phrase. The result is a single structure that accepts fuzzy input and emits structured output—all in one linear-time pass.
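
One way to get this behavior off the shelf is the fst crate, which intersects a query automaton with the map lazily at search time rather than materializing the composed machine. A sketch, assuming the crate's levenshtein feature is enabled (phrases and IDs again invented for illustration):

```rust
use fst::automaton::Levenshtein;
use fst::{IntoStreamer, Map};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Canonical phrase -> intent ID, keys in lexicographic order.
    let intents = Map::from_iter(vec![
        ("check my balance", 101u64),
        ("what is the weather", 103),
    ])?;

    // DFA accepting everything within edit distance 2 of the (typo'd) query.
    let query = Levenshtein::new("whats the weather", 2)?;

    // Intersection of the query automaton with the intent map:
    // fuzzy input in, (canonical phrase, intent ID) pairs out.
    let matches: Vec<(String, u64)> = intents.search(query).into_stream().into_str_vec()?;
    assert_eq!(matches, vec![("what is the weather".to_string(), 103)]);
    Ok(())
}
```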

The construction happens offline. At runtime, we're just following transitions in a precomputed graph.
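
Concretely, that offline/online split can look like the sketch below, again assuming the fst crate: `MapBuilder` streams sorted (phrase, ID) pairs to disk at build time, and the runtime just loads the bytes (the file name and entries are placeholders):

```rust
use fst::{Map, MapBuilder};
use std::fs::File;
use std::io::BufWriter;

// Offline: stream sorted (phrase, intent ID) pairs straight to disk.
fn build(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let writer = BufWriter::new(File::create(path)?);
    let mut builder = MapBuilder::new(writer)?;
    builder.insert("check my balance", 101)?;
    builder.insert("what is the weather", 103)?;
    builder.finish()?;
    Ok(())
}

// Runtime: load the precomputed graph and just follow transitions.
fn load(path: &str) -> Result<Map<Vec<u8>>, Box<dyn std::error::Error>> {
    let bytes = std::fs::read(path)?; // or memory-map the file to avoid the copy
    Ok(Map::new(bytes)?)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    build("intents.fst")?;
    let intents = load("intents.fst")?;
    assert_eq!(intents.get("check my balance"), Some(101));
    Ok(())
}
```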

Memory and Cache Efficiency

FSTs have another property that matters for production systems: they compress beautifully. The automaton represents a set of strings, and strings share prefixes and suffixes. A minimal FST exploits both, sharing structure wherever possible.

A dictionary of 100,000 phrases occupies a few megabytes as raw text. As a minimal FST, it can shrink to a small fraction of that, because shared prefixes and suffixes are stored only once. The compression isn't lossy—the FST represents exactly the same mapping, just without redundancy.

This matters for cache efficiency. The entire automaton can fit in the CPU's last-level cache, and traversal walks a compact, contiguous byte array, so lookups stay inside a small, cache-friendly working set. Compare this to hash tables, where every lookup is effectively a random access into a larger, uncompressed structure.

The Limits of Finite State

FSTs aren't magic. They represent regular relations—mappings that can be computed with a fixed, finite amount of state. Some patterns can't be captured: matching balanced parentheses to arbitrary depth, counting unbounded occurrences, context-sensitive grammars.

For intent detection, this is fine. We're not parsing natural language in full generality. We're recognizing patterns from a predefined vocabulary, with bounded fuzziness. The regular restriction is a feature, not a bug—it guarantees linear time.

More fundamentally, FSTs require knowing your patterns ahead of time. They excel at recognizing what you expect, not at handling the unexpected. For open-domain understanding, you need different tools.

Theory Meets Practice

What I find satisfying about this approach is how directly the theory translates to practice. Automata theory feels abstract until you need sub-millisecond latency on a dictionary of 50,000 phrases with typo tolerance.

The classic algorithms—subset construction, minimization, composition—become tools with concrete performance implications. The mathematical properties—closure under intersection, linear-time recognition—become engineering constraints you can rely on.

The gap between computer science education and computer science practice is often wide. FSTs are one of those cases where the gap closes: the theory is exactly what you need, and the practice is exactly what the theory predicts.
