Proximum

Version-controlled vector indexing

Immutable HNSW indexes with snapshots, branching, and Merkle verification. Pure JVM. No native dependencies.

Contact: contact@datahike.io

Why Proximum

  • Snapshots - Take immutable index snapshots in O(1) via structural sharing.
  • Scalable reads - A snapshot is a value. Hand it to any number of workers. No connection pool, no coordinator.
  • Reproducibility - Query historical index states for audits, debugging, and compliance workflows.
  • Verification - Merkle-hashed indexes for verifiable, content-addressed retrieval states.
  • Storage - Pluggable persistence via Konserve with backends tailored to your environment.
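The snapshot and verification ideas above can be illustrated with a generic persistent tree. The Java sketch below is not Proximum's internal data structure. It assumes a tiny fixed-depth binary trie (`PersistentMerkleTrie`, `put`, `get`, and `rootHash` are hypothetical names) in which an update copies only the nodes on the root-to-leaf path. Every other node is shared, so keeping an old root is an O(1) snapshot, and each node stores a SHA-256 hash of its children, which yields a Merkle root that changes whenever the content changes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Illustrative only: a tiny persistent (immutable) binary trie keyed by int id.
// Updates copy the root-to-leaf path and share every other node, so a
// "snapshot" is just a reference to a root and costs O(1) to take.
public final class PersistentMerkleTrie {
    static final int DEPTH = 4; // hypothetical: 2^4 = 16 slots, enough for a demo

    // Immutable node: an inner node (left/right) or a leaf (value), plus its hash.
    record Node(Node left, Node right, String value, byte[] hash) {}

    static byte[] sha256(byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (byte[] p : parts) md.update(p);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static final byte[] EMPTY = sha256(new byte[0]);

    static byte[] hashOf(Node n) { return n == null ? EMPTY : n.hash(); }

    // Prefix bytes 0 (leaf) and 1 (inner) keep leaf and inner hashes distinct.
    static Node leaf(String value) {
        return new Node(null, null, value,
                sha256(new byte[] {0}, value.getBytes(StandardCharsets.UTF_8)));
    }

    static Node inner(Node left, Node right) {
        return new Node(left, right, null, sha256(new byte[] {1}, hashOf(left), hashOf(right)));
    }

    // Returns a NEW root; the old root (and every snapshot holding it) is untouched.
    static Node put(Node node, int key, String value, int level) {
        if (level == DEPTH) return leaf(value);
        boolean goRight = ((key >>> (DEPTH - 1 - level)) & 1) == 1;
        Node left = node == null ? null : node.left();
        Node right = node == null ? null : node.right();
        return goRight
                ? inner(left, put(right, key, value, level + 1))
                : inner(put(left, key, value, level + 1), right);
    }

    static String get(Node node, int key) {
        for (int level = 0; level < DEPTH && node != null; level++) {
            boolean goRight = ((key >>> (DEPTH - 1 - level)) & 1) == 1;
            node = goRight ? node.right() : node.left();
        }
        return node == null ? null : node.value();
    }

    static String rootHash(Node root) { return HexFormat.of().formatHex(hashOf(root)); }

    public static void main(String[] args) {
        Node v1 = put(put(null, 3, "vec-3", 0), 9, "vec-9", 0);
        Node snapshot = v1;                        // O(1): just keep the reference
        Node v2 = put(v1, 3, "vec-3-updated", 0);  // path copy; v1 is unchanged

        System.out.println(get(snapshot, 3));                   // vec-3
        System.out.println(get(v2, 3));                         // vec-3-updated
        System.out.println(v1.right() == v2.right());           // true: untouched subtree is shared
        System.out.println(rootHash(v1).equals(rootHash(v2)));  // false: content changed
    }
}
```

Two roots with equal hashes hold identical content, which is what makes a past retrieval state checkable after the fact.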

Performance

SIFT-1M benchmark (1M vectors, 128-dim, Intel Core Ultra 7):

Implementation    | Insert (vec/s) | Search QPS | p50 latency | Recall@10
Proximum          |         13,392 |      3,844 |      264 µs |    98.63%
jvector           |          9,771 |      3,609 |      277 µs |    95.95%
datalevin/usearch |          2,492 |      3,616 |      268 µs |    96.96%
Lucene HNSW       |          2,395 |      3,036 |      340 µs |    98.53%
hnswlib-java      |          4,260 |      1,007 |    1,033 µs |    98.29%
  • Fastest insert - 37% faster than jvector, 5.6× faster than Lucene HNSW
  • Best recall - Highest recall@10 in this benchmark (98.63%), including among pure JVM implementations
  • Pure JVM - SIMD via Java Vector API, no native dependencies
  • Immutable - Zero-cost branching on top of this performance
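Recall@10 is commonly defined as the fraction of each query's exact 10 nearest neighbors that the approximate index returns, averaged over all queries. The sketch below assumes that standard definition (the benchmark harness may compute it differently) and also reproduces the speedup arithmetic in the first bullet from the table's insert throughputs. `BenchmarkMath` and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public final class BenchmarkMath {
    // Recall@k for one query: fraction of the exact top-k ids that the ANN index returned.
    static double recallAtK(int[] exactTopK, int[] approxTopK, int k) {
        Set<Integer> truth = new HashSet<>();
        for (int i = 0; i < k; i++) truth.add(exactTopK[i]);
        int hits = 0;
        for (int i = 0; i < k; i++) if (truth.contains(approxTopK[i])) hits++;
        return (double) hits / k;
    }

    // Mean recall@k over a query set, the figure usually reported in benchmark tables.
    static double meanRecallAtK(int[][] exact, int[][] approx, int k) {
        double sum = 0;
        for (int q = 0; q < exact.length; q++) sum += recallAtK(exact[q], approx[q], k);
        return sum / exact.length;
    }

    // Relative speedup as a ratio of throughputs (vectors per second).
    static double speedup(double fasterVecPerSec, double slowerVecPerSec) {
        return fasterVecPerSec / slowerVecPerSec;
    }

    public static void main(String[] args) {
        int[] exact  = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int[] approx = {1, 2, 3, 4, 5, 6, 7, 8, 9, 42}; // one miss
        System.out.println(recallAtK(exact, approx, 10));                      // 0.9
        System.out.printf(Locale.ROOT, "%.2f%n", speedup(13_392, 9_771));     // 1.37 (37% faster)
        System.out.printf(Locale.ROOT, "%.1f%n", speedup(13_392, 2_395));     // 5.6
    }
}
```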

Full benchmarks, including DBpedia-OpenAI-100K and GloVe-100, are in the GitHub README.

Integrations

First-class JVM integrations ship alongside the core API.

  • Spring AI - Drop-in integration for Spring-based RAG and retrieval pipelines.
  • LangChain4j - Adapter for LangChain4j-based applications and evaluators.
  • APIs - Complete Java and Clojure APIs.

Deployment

Embedded JVM library

Run Proximum where your services run. No separate cluster required unless you want one.
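Because a snapshot is an immutable value, threads in the same JVM can read it without locks or a coordinator. The sketch below assumes a hypothetical `Snapshot` class and uses brute-force L2 search as a stand-in for HNSW traversal. It is not Proximum's API, only an illustration of sharing one immutable value across an `ExecutorService`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public final class SharedSnapshotReads {
    // Hypothetical stand-in for an index snapshot: defensively copied, never mutated.
    static final class Snapshot {
        private final float[][] vectors;

        Snapshot(float[][] vectors) {
            this.vectors = new float[vectors.length][];
            for (int i = 0; i < vectors.length; i++) this.vectors[i] = vectors[i].clone();
        }

        // Brute-force nearest neighbor by squared L2 distance (not HNSW; illustration only).
        int nearest(float[] query) {
            int best = -1;
            float bestDist = Float.POSITIVE_INFINITY;
            for (int i = 0; i < vectors.length; i++) {
                float d = 0;
                for (int j = 0; j < query.length; j++) {
                    float diff = vectors[i][j] - query[j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = i; }
            }
            return best;
        }
    }

    // Every worker reads the same snapshot; no synchronization is needed because nothing mutates it.
    static List<Integer> searchInParallel(Snapshot snap, List<float[]> queries, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (float[] q : queries) futures.add(pool.submit(() -> snap.nearest(q)));
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) results.add(f.get());
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Snapshot snap = new Snapshot(new float[][] {{0, 0}, {10, 10}, {-5, 3}});
        List<float[]> queries = List.of(new float[] {1, 1}, new float[] {9, 8}, new float[] {-4, 2});
        System.out.println(searchInParallel(snap, queries, 4)); // [0, 1, 2]
    }
}
```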

Pluggable storage

Choose the persistence backend that matches your needs (local, cloud, or replicated); the index behaves the same on all of them.
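One way to picture a pluggable storage layer under an immutable index: if nodes are never mutated, a backend only has to write and read blobs by key, and keying each blob by its content hash makes writes idempotent. The Java sketch below is a hypothetical interface (`BlobStore`, `MemoryStore`, and `FileStore` are illustrative names). It is not Konserve's API, which is a Clojure library with its own protocols, and it assumes, rather than documents, that Proximum keys stored nodes by hash.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public final class PluggableStores {
    // Hypothetical backend contract: immutable blobs addressed by their content hash.
    interface BlobStore {
        void write(String key, byte[] data);
        Optional<byte[]> read(String key);

        default String put(byte[] data) {
            String key = contentKey(data);
            write(key, data);
            return key;
        }
    }

    static String contentKey(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // In-memory backend: copies on the way in and out so callers cannot mutate stored blobs.
    static final class MemoryStore implements BlobStore {
        private final Map<String, byte[]> map = new ConcurrentHashMap<>();
        public void write(String key, byte[] data) { map.putIfAbsent(key, data.clone()); }
        public Optional<byte[]> read(String key) {
            return Optional.ofNullable(map.get(key)).map(byte[]::clone);
        }
    }

    // File backend: one file per key inside a directory.
    static final class FileStore implements BlobStore {
        private final Path dir;
        FileStore(Path dir) { this.dir = dir; }
        public void write(String key, byte[] data) {
            Path p = dir.resolve(key);
            try {
                if (!Files.exists(p)) Files.write(p, data); // same key means same bytes, so skip rewrites
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        public Optional<byte[]> read(String key) {
            Path p = dir.resolve(key);
            try {
                return Files.exists(p) ? Optional.of(Files.readAllBytes(p)) : Optional.empty();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] node = "hnsw-node-42".getBytes(StandardCharsets.UTF_8);
        BlobStore[] stores = {new MemoryStore(), new FileStore(Files.createTempDirectory("blobs"))};
        for (BlobStore store : stores) {
            String key = store.put(node);
            System.out.println(new String(store.read(key).orElseThrow(), StandardCharsets.UTF_8)); // hnsw-node-42
        }
    }
}
```

Swapping backends then means swapping the `BlobStore` implementation; the index code above it does not change.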

Work with us

If you are taking Proximum into production, we offer integration help, custom development, and support contracts.