What is a k-d tree and why is it used for vector search?

A k-d tree is a binary tree that partitions multi-dimensional data by alternating which dimension it splits on at each level. This lets you discard large portions of the dataset without checking every entry. For vector search with hundreds of dimensions, that pruning is what makes sub-100ms latency possible.

What is Voy and how does TraceMind use it?

Voy is a WebAssembly k-d tree library for nearest-neighbor search. TraceMind uses it to index 384-dimensional vector embeddings from the all-MiniLM-L6-v2 model. When you search, Voy traverses the k-d tree to find the closest vectors to your query without comparing against every stored embedding.

What is the curse of dimensionality and does it affect browser search?

The curse of dimensionality means k-d trees get less efficient as dimensions increase because more branches need to be explored. At 384 dimensions, this is a real concern. TraceMind uses approximate nearest neighbor search via Voy's WASM implementation, which accepts a small accuracy trade-off to maintain sub-100ms latency.

How does Reciprocal Rank Fusion combine semantic and keyword search?

RRF takes ranked result lists from two separate retrieval systems — dense vector search and FlexSearch full-text — and merges them using a formula that weights rank position rather than raw scores. This means a result ranked high in both lists beats one ranked high in only one, giving you the best of semantic and keyword retrieval.

Does all of this ML processing slow down my browser?

TraceMind runs inference via WebGPU where available, which offloads work to the GPU and avoids blocking the main thread. On devices without WebGPU, it falls back to WASM. Indexing happens asynchronously in the background, and search latency stays under 100ms even with tens of thousands of entries.

The Math Behind Millisecond Search: K-d Trees Explained | TraceMind Blog

Three tabs deep into a rabbit hole about vector search databases, I stumbled upon an interesting fact: the Voy library uses k-d trees to keep spatial queries fast without burning CPU. What does that even mean? Honestly, I had to dig deeper to understand it.

The short version is this: when TraceMind gives you search results in under 100 milliseconds across tens of thousands of indexed pages, k-d trees are a big part of why. Understanding how they work also helps explain why building a fast semantic search system inside a browser is harder than it looks, and why the engineering choices matter.

What is a k-d tree, actually?

K-d trees, or k-dimensional trees, are a binary tree data structure for organizing points in k-dimensional space. The name is a bit intimidating, but the core idea is simple: partition your data recursively by alternating which dimension you split on at each level of the tree.

Here's the intuition with a 2D example. Say you have a scatter plot of 10,000 points. You want to find the 5 nearest neighbors to a new query point. Brute force means computing the distance from the query to all 10,000 points. That's slow.

A k-d tree instead builds a binary tree during index time:

Pick a dimension (say, the x-axis). Find the median x value. Split the dataset into points left and right of that median. This becomes the root node.
For each half, pick the next dimension (y-axis), find that subset's median, split again.
Continue alternating dimensions until each leaf node contains a small number of points.

Now when you query, you traverse the tree from the root. At each node, you check which side of the splitting hyperplane your query point falls on, and go down that branch first. The key insight is that you can often skip entire branches entirely, because the splitting hyperplane tells you no points on the other side could possibly be closer than the nearest neighbor you've already found. That pruning is where the speed comes from.

Why does this matter for 384-dimensional vectors?

TraceMind uses the all-MiniLM-L6-v2 model, which produces 384-dimensional vectors for every page you visit. That's 384 numbers per page. When you search, your query also becomes a 384-dimensional vector, and the task is to find the stored vectors closest to it in that 384-dimensional space.

The "closeness" is usually measured with cosine similarity or Euclidean distance. Either way, you're comparing vectors across 384 dimensions.

Without any tree structure, finding the nearest neighbors means computing distance against every stored embedding. If you've indexed 50,000 pages, that's 50,000 distance calculations, each involving 384 multiplications and additions. This is where understanding how vector embeddings work in the browser becomes practically relevant.

A k-d tree lets you prune large chunks of the search space without doing those calculations. You check a branch boundary, determine no points there can beat your current best, and skip the whole subtree.

The curse of dimensionality (and why it matters here)

There's a well-known problem with k-d trees called the curse of dimensionality. As the number of dimensions grows, k-d trees lose their efficiency because fewer branches can be pruned. At very high dimensions, you end up exploring almost the entire tree anyway.

384 dimensions sits in an awkward middle ground. It's high enough that a perfectly exact k-d tree starts to feel the curse, but low enough that approximate nearest neighbor (ANN) search with a k-d tree is still very fast in practice.

ANN search accepts a small trade-off: instead of guaranteeing you find the absolute closest vectors, it finds vectors that are extremely likely to be the closest, with a tiny probability of missing one. For semantic search over browser history, this is a completely acceptable trade-off. You're not running a medical diagnosis algorithm. You're trying to find an article you read last week. Being 99.5% accurate is more than enough.

How Voy implements this in WASM

TraceMind uses Voy, a WebAssembly library that implements approximate nearest neighbor search using k-d trees. The WASM compilation means the heavy indexing and search work runs in a sandboxed module separate from the browser's main JavaScript thread.

Here's why that matters: if the k-d tree traversal ran in the main thread, it could block the browser's UI while you're typing a query. With WASM, the search happens in its own execution context. Combined with WebGPU for ML inference (or WASM as fallback), the whole pipeline from keystroke to results can stay under 100ms.

The practical sequence when you install TraceMind and visit a page:

Mozilla's Readability library strips the page down to clean text.
That text is passed to all-MiniLM-L6-v2 running on WebGPU/WASM, which produces a 384-dimensional embedding.
The embedding, along with the page metadata and compressed text, is stored in IndexedDB.
Voy's k-d tree index is updated with the new vector.

When you search:

Your query becomes a 384-dimensional vector via the same model.
Voy traverses the k-d tree to find the top-k nearest neighbor vectors.
FlexSearch does a parallel full-text keyword search across indexed content.
Reciprocal Rank Fusion merges both result lists by rank position.
Results appear, usually in under 100ms.

What is Reciprocal Rank Fusion and why use two systems?

I want to spend a moment on RRF because it's genuinely clever and often overlooked.

Pure semantic search is excellent at understanding meaning but can miss exact-match queries. If you search for a specific error code like "ERR_CONNECTION_TIMED_OUT," a semantic model might decide it's conceptually similar to other network errors and return vaguely related results, when what you want is the exact page that mentioned that string.

Pure keyword search (like FlexSearch or BM25) is excellent at exact matches but fails completely on conceptual queries. Search for "slow network response" and it won't find a page that said "high latency connection."

RRF runs both in parallel and combines them. The formula takes each result's rank position in each list and computes a combined score. A document ranked 1st in the semantic list and 5th in the keyword list scores higher than a document ranked 1st in only one list. The result is a merged ranking that captures both precision (keywords) and recall (semantics).

You can see a practical discussion of this approach in semantic search versus keyword search for knowledge workers, which covers the trade-offs from a less technical angle.

Bloom filters: the fast pre-filter layer

The original post mentioned Bloom filters, and they're worth explaining properly.

A Bloom filter is a probabilistic data structure that answers the question "have I seen this before?" extremely fast, using very little memory. It doesn't store the actual items. It stores a compact bit array, where bits are set by running items through multiple hash functions.

When you query a Bloom filter:

If any of the corresponding bits are not set, the item is definitely not in the set.
If all bits are set, the item is probably in the set (there's a small false positive rate).

For TraceMind, a Bloom filter can serve as a deduplication layer: before computing a full SHA-256 hash and doing a proper IndexedDB lookup, a quick Bloom filter check can rule out pages you've definitely never seen, saving unnecessary database reads. The false positive rate is configurable; for history deduplication, even a 1% false positive rate is acceptable because the next layer (SHA-256 check) catches any mismatches.

The combination of Bloom filter pre-filtering, k-d tree nearest-neighbor search, and Voy's WASM execution is what makes the search feel instantaneous even on mid-range hardware.

The trade-offs are real

I want to be honest about the limitations.

K-d trees require index time. Every time a new page is indexed, the tree needs to be updated or rebuilt. Voy handles incremental updates efficiently, but if you're indexing pages at high frequency, there's background CPU work happening. On low-powered devices, you might notice it.

ANN search has a non-zero error rate. For most use cases this is irrelevant, but it's worth knowing that semantic search is not a database query with guaranteed exact results. It's an approximation.

And k-d trees are not the only approach. Hierarchical Navigable Small World (HNSW) graphs are increasingly popular for ANN search and often outperform k-d trees at high dimensionality. TraceMind's current architecture uses Voy's k-d tree because it has a smaller WASM binary size and is well-suited to the browser environment, but this is an active area of development in the vector database space.

Why this matters for browser search specifically

Most browser history tools, including Chrome's built-in Ctrl+H, use simple string matching. They're fast because the problem is easy: find pages where the URL or title contains your query string. No ML, no vectors, no tree traversal.

The reason TraceMind's approach is worth the engineering complexity is that string matching doesn't understand what you were looking for. If you search for "that article about improving JavaScript bundle size," Chrome needs the page title or URL to literally contain those words. TraceMind understands that a page about "webpack code splitting and tree shaking" is exactly what you're looking for.

That understanding, implemented efficiently enough to run in a browser extension, is the whole product. The k-d tree is what makes it fast enough to be usable.

If you want to see what this looks like in practice, you can try TraceMind on any Chromium-based browser, including Chrome, Brave, and Edge.

What is a k-d tree, actually?

A k-d tree instead builds a binary tree during index time:

Pick a dimension (say, the x-axis). Find the median x value. Split the dataset into points left and right of that median. This becomes the root node.
For each half, pick the next dimension (y-axis), find that subset's median, split again.
Continue alternating dimensions until each leaf node contains a small number of points.

Why does this matter for 384-dimensional vectors?

The "closeness" is usually measured with cosine similarity or Euclidean distance. Either way, you're comparing vectors across 384 dimensions.

The curse of dimensionality (and why it matters here)

How Voy implements this in WASM

The practical sequence when you install TraceMind and visit a page:

Mozilla's Readability library strips the page down to clean text.
That text is passed to all-MiniLM-L6-v2 running on WebGPU/WASM, which produces a 384-dimensional embedding.
The embedding, along with the page metadata and compressed text, is stored in IndexedDB.
Voy's k-d tree index is updated with the new vector.

When you search:

Your query becomes a 384-dimensional vector via the same model.
Voy traverses the k-d tree to find the top-k nearest neighbor vectors.
FlexSearch does a parallel full-text keyword search across indexed content.
Reciprocal Rank Fusion merges both result lists by rank position.
Results appear, usually in under 100ms.

What is Reciprocal Rank Fusion and why use two systems?

I want to spend a moment on RRF because it's genuinely clever and often overlooked.

You can see a practical discussion of this approach in semantic search versus keyword search for knowledge workers, which covers the trade-offs from a less technical angle.

Bloom filters: the fast pre-filter layer

The original post mentioned Bloom filters, and they're worth explaining properly.

When you query a Bloom filter:

If any of the corresponding bits are not set, the item is definitely not in the set.
If all bits are set, the item is probably in the set (there's a small false positive rate).

The combination of Bloom filter pre-filtering, k-d tree nearest-neighbor search, and Voy's WASM execution is what makes the search feel instantaneous even on mid-range hardware.

The trade-offs are real

I want to be honest about the limitations.

ANN search has a non-zero error rate. For most use cases this is irrelevant, but it's worth knowing that semantic search is not a database query with guaranteed exact results. It's an approximation.

Why this matters for browser search specifically

That understanding, implemented efficiently enough to run in a browser extension, is the whole product. The k-d tree is what makes it fast enough to be usable.

If you want to see what this looks like in practice, you can try TraceMind on any Chromium-based browser, including Chrome, Brave, and Edge.

The Math Behind Millisecond Search: K-d Trees Explained

What is a k-d tree, actually?

Why does this matter for 384-dimensional vectors?

The curse of dimensionality (and why it matters here)

How Voy implements this in WASM

What is Reciprocal Rank Fusion and why use two systems?

Bloom filters: the fast pre-filter layer

The trade-offs are real

Why this matters for browser search specifically

Related Posts

Ready to try TraceMind?

The Math Behind Millisecond Search: K-d Trees Explained

What is a k-d tree, actually?

Why does this matter for 384-dimensional vectors?

The curse of dimensionality (and why it matters here)

How Voy implements this in WASM

What is Reciprocal Rank Fusion and why use two systems?

Bloom filters: the fast pre-filter layer

The trade-offs are real

Why this matters for browser search specifically

Related Posts

Ready to try TraceMind?