Scriptum

Full-text search with branching

Git-like branching for Apache Lucene. Fork a 100GB index by sharing immutable segment files. Time-travel queries, branch isolation, and safe experimentation on your search indices.

Why Scriptum

  • Zero-cost forking - Branch any index in a few ms regardless of size. Copies metadata, not data.
  • Structural sharing - Branches share immutable Lucene segments via copy-on-write overlay directories.
  • Time travel - Open readers at any historical commit point. Query past index states.
  • Full Lucene 10.x - Text search, KNN vectors, facets, highlighting - all branch-aware.
  • Apache-2.0 - Open source, permissive license.
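To make "copies metadata, not data" concrete, here is a small, self-contained Java toy model. It is hypothetical: `BranchModel` and `Commit` are illustrative names, not Scriptum classes. It assumes a commit is an immutable list of segment names and that segment contents are write-once, the property Lucene segments have. Forking copies only the list of commits. Every old commit stays readable, which is the time-travel idea in miniature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of copy-on-write branching (not Scriptum's API).
class BranchModel {
    // Shared, write-once segment store: segment name -> contents.
    // All branches read from it; nothing in it is ever overwritten.
    static final Map<String, String> SEGMENTS = new HashMap<>();
    private static int nextSegment = 0;

    final String name;
    // Commit history; each Commit only lists segment names (metadata).
    final List<Commit> history = new ArrayList<>();

    BranchModel(String name, List<Commit> history) {
        this.name = name;
        this.history.addAll(history);
    }

    Commit head() {
        return history.get(history.size() - 1);
    }

    // A commit writes one new segment and records head's segments plus the new one.
    void commit(String message, String docs) {
        String segment = "_" + nextSegment++;
        SEGMENTS.put(segment, docs);
        List<String> segments = new ArrayList<>();
        if (!history.isEmpty()) {
            segments.addAll(head().segments());
        }
        segments.add(segment);
        history.add(new Commit(message, List.copyOf(segments)));
    }

    // Forking copies the list of commit references only; no segment contents are copied.
    BranchModel fork(String branchName) {
        return new BranchModel(branchName, history);
    }

    // Time travel: read the documents visible at any earlier commit of this branch.
    List<String> read(int commitIndex) {
        List<String> docs = new ArrayList<>();
        for (String segment : history.get(commitIndex).segments()) {
            docs.add(SEGMENTS.get(segment));
        }
        return docs;
    }

    public static void main(String[] args) {
        BranchModel trunk = new BranchModel("main", List.of());
        trunk.commit("Initial commit", "doc-1");
        BranchModel experiment = trunk.fork("experiment");
        experiment.commit("Added experimental doc", "doc-2");
        System.out.println(trunk.read(0));      // [doc-1]
        System.out.println(experiment.read(1)); // [doc-1, doc-2]
        System.out.println(SEGMENTS.size());    // 2 (doc-1's segment is shared, not copied)
    }
}

// An immutable commit point: a message plus the names of the segments it references.
record Commit(String message, List<String> segments) {}
```

In this model a fork costs time proportional to the length of the commit history, not to the size of the segments. That is the intuition behind forking "regardless of size".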

How it works

Scriptum extends Lucene with four components that enable copy-on-write branching:

  • BranchedDirectory - Overlay pattern: reads fall back to base, writes go to branch overlay.
  • BranchDeletionPolicy - Retains all commit points until explicit garbage collection.
  • BranchAwareMergePolicy - Prevents merging shared segments that would break other branches.
  • BranchIndexWriter - Main API for create, fork, commit, merge, and GC operations.
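The overlay read/write rule and the reason shared segments must not be merged can be sketched with a toy in-memory directory. This is an assumption-laden model, not the real BranchedDirectory or BranchAwareMergePolicy. `OverlayDir`, its map-backed storage, and `mergeCandidates` are hypothetical, and real Lucene directories stream files rather than handing out byte arrays.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the overlay idea (not Scriptum's BranchedDirectory).
class OverlayDir {
    private final Map<String, byte[]> base;                        // shared with other branches, read-only here
    private final Map<String, byte[]> overlay = new HashMap<>();   // branch-private writes

    OverlayDir(Map<String, byte[]> base) {
        this.base = Collections.unmodifiableMap(base);
    }

    // Reads check the branch overlay first, then fall back to the shared base.
    byte[] read(String file) {
        byte[] data = overlay.get(file);
        if (data == null) data = base.get(file);
        if (data == null) throw new IllegalArgumentException("no such file: " + file);
        return data;
    }

    // Writes only ever land in the overlay; shared files are treated as immutable.
    void write(String file, byte[] data) {
        if (base.containsKey(file)) {
            throw new IllegalStateException("shared file is immutable: " + file);
        }
        overlay.put(file, data);
    }

    // The branch sees the union of shared and private files.
    Set<String> listAll() {
        Set<String> all = new TreeSet<>(base.keySet());
        all.addAll(overlay.keySet());
        return all;
    }

    // Merge-policy idea: only branch-owned files are merge candidates, because merging
    // (and later deleting) a shared file would break every branch that still reads it.
    List<String> mergeCandidates() {
        return new ArrayList<>(new TreeSet<>(overlay.keySet()));
    }

    public static void main(String[] args) {
        Map<String, byte[]> shared = new HashMap<>();
        shared.put("_0.cfs", "segment 0".getBytes());
        OverlayDir a = new OverlayDir(shared);
        OverlayDir b = new OverlayDir(shared);
        a.write("_1.cfs", "branch-a segment".getBytes());
        System.out.println(new String(b.read("_0.cfs"))); // segment 0
        System.out.println(a.listAll());                  // [_0.cfs, _1.cfs]
        System.out.println(b.listAll());                  // [_0.cfs]
        System.out.println(a.mergeCandidates());          // [_1.cfs]
    }
}
```

Two branches built on the same base both see the shared segment. A write on one branch stays invisible to the other, and only that branch's own files are offered for merging.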

See LUCENE_EXTENSION.md for the full technical deep-dive.

Clojure API

(require '[scriptum.core :as sc])

;; Create an index
(def writer (sc/create-index "/tmp/my-index"))

;; Add documents
(sc/add-doc writer {:title {:type :text :value "Hello World"}
                    :id    {:type :string :value "doc-1"}})
(sc/commit! writer "Initial commit")

;; Fork a branch (3-5ms regardless of index size)
(def experiment (sc/fork writer "experiment"))

;; Add to branch (doesn't affect main)
(sc/add-doc experiment {:title {:type :text :value "Branch only"}
                        :id    {:type :string :value "doc-2"}})
(sc/commit! experiment "Added experimental doc")

;; Main still has 1 doc, branch has 2
(count (sc/search writer {:match-all {}} 100))      ;; => 1
(count (sc/search experiment {:match-all {}} 100))  ;; => 2

;; Merge back when ready
(sc/merge-from! writer experiment)

Java API

import org.replikativ.scriptum.BranchIndexWriter;
import org.apache.lucene.document.*;
import java.nio.file.Path;

// Create an index
BranchIndexWriter main = BranchIndexWriter.create(
    Path.of("/tmp/my-index"), "main");

// Add documents
Document doc = new Document();
doc.add(new TextField("title", "Hello World", Field.Store.YES));
main.addDocument(doc);
main.commit("Initial commit");

// Fork (takes a few ms regardless of index size)
BranchIndexWriter feature = main.fork("experiment");

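// Branches evolve independently
Document anotherDoc = new Document();
anotherDoc.add(new TextField("title", "Branch only", Field.Store.YES));
feature.addDocument(anotherDoc);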
feature.commit("Feature work");

// Merge back
main.mergeFrom(feature);

When to use Scriptum vs Proximum

Scriptum

Full-text search with Lucene

  • Keyword search, facets, highlighting
  • Text analysis pipelines
  • Document-oriented indices
  • When you need Lucene's query language

Proximum

Vector similarity search

  • Embedding-based retrieval (RAG)
  • Semantic search
  • Faster parallelized insertion than Lucene's HNSW
  • Advanced vector search features

Both have branching, snapshots, and time travel. Choose based on your search workload.

Requirements

  • Java 21+ - Required for Lucene 10.x (Foreign Memory API, Vector API)
  • Lucene 10.3.2 - Pulled from Maven Central
  • Clojure 1.12.0+ - For the Clojure API
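To fail fast on an older JVM, you can use a check like the following, which relies on the standard `Runtime.version()` API. The class name, the `MIN_JAVA` constant, and the error message are illustrative, not part of Scriptum.

```java
// Illustrative startup check for the Java 21+ requirement (not part of Scriptum).
class RequirementsCheck {
    static final int MIN_JAVA = 21;

    // True if the given Java feature release meets the minimum.
    static boolean javaVersionOk(int feature) {
        return feature >= MIN_JAVA;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is the major release number, e.g. 21.
        int feature = Runtime.version().feature();
        if (!javaVersionOk(feature)) {
            throw new IllegalStateException("Java " + MIN_JAVA + "+ required, found Java " + feature);
        }
        System.out.println("Java " + feature + " OK");
    }
}
```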

Install

Available on Clojars. See the GitHub repository for current version and installation instructions.

Maven/Gradle users: add the Clojars repository to your build configuration.