Why Search Needs Versioning

Vector indexes are almost always mutable. You insert embeddings, update them, delete them, and the index reflects only the current state. That's fine when you'll never need to query or audit a past state, but it breaks down when retrieval feeds into reasoning.

The moment search results enter an LLM’s context window or guide an agent’s action, the index becomes memory. Memory that overwrites itself cannot be trusted.

The problem

Consider a retrieval-augmented system in production: embeddings are indexed, queries retrieve context, responses are generated. A week later, someone asks why the system returned a particular result. In a mutable index, there's no answer. The index changed. The embedding model may have been updated. The retrieval state that produced that response no longer exists.

This isn’t theoretical. Any system where retrieval influences outcomes—recommendations, classifications, agent actions—is subject to this failure. The more autonomous the system, the more consequential the gap.

Vectory takes the copy-on-write model that powers Datahike and applies it to HNSW (Hierarchical Navigable Small World) vector indexes. Every insert returns a new index version. Previous versions remain valid and queryable:

import java.util.UUID;

// Create and populate. embeddings, moreEmbeddings, and query
// are assumed to be defined elsewhere.
var idx = PersistentVectorIndex.builder()
    .dimensions(1536)
    .storagePath("/var/data/vectors")
    .build();

idx.addBatch(embeddings);
UUID v1 = idx.createSnapshot();   // pin the state after the first batch

idx.addBatch(moreEmbeddings);
UUID v2 = idx.createSnapshot();

// Both versions remain searchable
var oldIndex = idx.asOf(v1);
oldIndex.search(query, 10);  // original state
idx.search(query, 10);       // current state

The branch() operation is O(1)—it shares structure with the original. Two branches diverge independently without copying data. This makes A/B testing embeddings, bisecting regressions, and maintaining reproducible baselines cheap.
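
In code, the workflow might look like this (branch() comes from Vectory, but its exact signature, candidateEmbeddings, and the surrounding plumbing are illustrative):

// Fork the index in O(1); both handles share all existing chunks.
var experiment = idx.branch();

// Try a candidate embedding model on the fork only.
experiment.addBatch(candidateEmbeddings);

// Compare side by side; production is untouched.
var baseline = idx.search(query, 10);
var variant  = experiment.search(query, 10);

// Discarding the experiment is just dropping the reference;
// chunks shared with the original remain live.
experiment = null;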

How it works

The core data structure is a PersistentEdgeStore: chunked copy-on-write arrays that hold HNSW graph edges. Layer 0 (the dense bottom layer) uses fixed-size chunks; upper layers use sparse per-node arrays. When you modify the graph, only affected chunks are copied. Unchanged structure is shared.
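
A minimal self-contained sketch of the idea, in the spirit of PersistentEdgeStore (the class, names, and chunk size here are illustrative, not Vectory's actual code):

// Illustrative chunked copy-on-write array: set() copies only the
// affected chunk plus the chunk table; everything else is shared.
final class CowIntArray {
    private static final int CHUNK = 1024;   // illustrative chunk size
    private final int[][] chunks;            // the chunk table

    private CowIntArray(int[][] chunks) { this.chunks = chunks; }

    static CowIntArray of(int size) {
        int n = (size + CHUNK - 1) / CHUNK;
        int[][] c = new int[n][];
        for (int i = 0; i < n; i++) c[i] = new int[CHUNK];
        return new CowIntArray(c);
    }

    int get(int i) { return chunks[i / CHUNK][i % CHUNK]; }

    // Returns a new version; the old one stays valid and shares
    // every chunk except the one that changed.
    CowIntArray set(int i, int value) {
        int[][] table = chunks.clone();          // shallow copy of the table
        int[] chunk = table[i / CHUNK].clone();  // copy only this chunk
        chunk[i % CHUNK] = value;
        table[i / CHUNK] = chunk;
        return new CowIntArray(table);
    }
}

An edge update in layer 0 becomes a handful of set() calls, so a new version differs from its parent only by the chunks those edges live in.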

Vectors themselves live in a memory-mapped store backed by Konserve, so the same index can be persisted to disk, S3, or any pluggable backend. The combination gives you SIMD-accelerated search with full version history and portable storage.
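
The SIMD part is ordinary JDK machinery. A squared-L2 kernel along these lines shows the shape of the hot loop; this is a generic sketch using the incubating jdk.incubator.vector API (run with --add-modules jdk.incubator.vector), not Vectory's actual kernel:

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Generic SIMD squared-L2 distance between two equal-length vectors.
final class Distance {
    private static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

    static float l2(float[] a, float[] b) {
        var acc = FloatVector.zero(S);
        int i = 0;
        for (int bound = S.loopBound(a.length); i < bound; i += S.length()) {
            var d = FloatVector.fromArray(S, a, i)
                     .sub(FloatVector.fromArray(S, b, i));
            acc = d.fma(d, acc);              // acc += d * d, fused
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) {           // scalar tail
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}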

What this enables

Reproducible evaluation: Run the same query against the same index state, get the same results. Compare retrieval quality across embedding models with stable baselines (see the sketch below).

Safe experimentation: Fork an index, test a new chunking strategy or embedding model, merge or discard. Production state is never at risk.

Auditability: Query the index as it existed at any past instant. Answer “what could the system have retrieved when it made that decision?”

Concurrent access: Readers never block writers. A snapshot is a value you can hand to any number of workers—across threads, processes, or machines—without coordination. Read scaling comes free.
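
Concretely, a snapshot UUID is all an evaluation harness needs to pin state. A sketch reusing the API from above (the thread-pool plumbing and evalQueries are illustrative):

import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Pin a baseline by snapshot ID and fan reads out across threads.
// The snapshot is immutable, so no locks, no coordination.
UUID baseline = idx.createSnapshot();
var frozen = idx.asOf(baseline);   // a value, not a lock

try (ExecutorService pool = Executors.newFixedThreadPool(8)) {
    for (var query : evalQueries) {
        pool.submit(() -> frozen.search(query, 10));
    }
}
// Writers keep appending to idx in the meantime; frozen never
// changes, so tonight's run returns exactly what last week's did.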

The cost

Immutable indexes have write amplification: inserting a vector touches multiple graph edges, each potentially triggering a chunk copy. With, say, 16 neighbors per node, a single fully persistent insert can dirty a dozen or more chunks. Storage grows with history.

In practice, this cost is amortized. You don’t create a snapshot for every vector added during a bulk load. The PersistentEdgeStore supports transient mode—mutable during batch insert, immutable at the boundary. Snapshots are created only when a batch commits, and only those become visible to readers. The system can adaptively coarse-grain batches to balance throughput against snapshot granularity.
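
The pattern is the same as Clojure's transients. A sketch building on the chunked-array example above (method names are hypothetical, not the actual PersistentEdgeStore API):

import java.util.BitSet;

// Illustrative transient: each shared chunk is copied at most once,
// then mutated in place; freeze() marks the commit boundary.
final class TransientIntArray {
    private static final int CHUNK = 1024;
    private final int[][] chunks;               // starts shared with parent
    private final BitSet owned = new BitSet();  // chunks copied so far

    TransientIntArray(int[][] parentChunks) {
        this.chunks = parentChunks.clone();     // shallow: chunks still shared
    }

    void set(int i, int value) {
        int c = i / CHUNK;
        if (!owned.get(c)) {                    // first write to this chunk
            chunks[c] = chunks[c].clone();      // copy once...
            owned.set(c);
        }
        chunks[c][i % CHUNK] = value;           // ...then mutate in place
    }

    // Commit boundary: snapshot the chunk table for an immutable view.
    // Readers see one new version per batch, not one per edge write.
    int[][] freeze() {
        owned.clear();          // later writes must re-copy their chunk
        return chunks.clone();  // the frozen view owns its own table
    }
}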

For RAG, semantic search, and ML experimentation where retrieval must be reproducible and auditable, versioned indexes are the right foundation.