Stratum

SQL that branches

Fork any table in O(1). Query past snapshots. Pure JVM, no native dependencies. Faster than DuckDB on 35 of 46 queries via the Java Vector API. PostgreSQL wire protocol - connect with psql, JDBC, or DBeaver.

GitHub Benchmarks Read the announcement →

35/46 queries faster than DuckDB (10M rows, 1T)

8.7× faster on H2O db-benchmark Q10

O(1) table fork

Pure JVM no native dependencies

Performance

Fused SIMD execution over copy-on-write columnar data. Single-threaded comparison vs DuckDB v1.4.4 (JDBC in-process) on 10M rows, Intel Core Ultra 7 258V, JVM 25. Median of 10 iterations, 5 warmup:

Query	Stratum	DuckDB	Ratio
TPC-H Q6 (filter + sum-product)	13ms	28ms	2.2x faster
Filtered COUNT (NEQ pred)	3ms	12ms	4.0x faster
H2O Q3 (100K string groups)	71ms	362ms	5.1x faster
LIKE '%search%' (string scan)	47ms	240ms	5.1x faster
H2O Q6 (STDDEV group-by)	30ms	81ms	2.7x faster
H2O Q10 (10M groups, 6 cols)	832ms	7056ms	8.5x faster
AVG(LENGTH(URL))	38ms	170ms	4.5x faster

Stratum wins 35 of 46 queries at 10M rows (single-threaded, median of 10 runs, Intel Core Ultra 7). DuckDB wins on sparse-selectivity filters, high-cardinality hash group-by, and global COUNT(DISTINCT). Full methodology and raw results in the benchmark docs.

Branch anything

st/fork creates an O(1) copy-on-write branch - no data copied, only a pointer to shared chunks. st/sync! persists a branch to storage. st/load restores any named branch. Pass column data as a table map to SQL queries, or register live storage-backed tables in the server with register-live-table!.

What is this syntax?

require('[stratum.api :as st]
  '[konserve.file-store :as fs]
  '[clojure.core.async :refer [<!!]])

;; Open storage and load the orders dataset (10M rows)
def store: <!!(fs/new-fs-store("/data/stratum"))
def orders: <!!(st/load(store "orders"))

;; Fork in O(1) - structural sharing, zero data copied
def experiment: st/fork(orders)

;; Persist fork as a named branch - shares chunks with main
<!!(st/sync!(experiment store "experiment"))

;; Query both branches via SQL - pass column data as table map
st/q("SELECT SUM(price * qty) FROM t" {"t" st/columns(orders)})
;; => {:SUM(price * qty) 4821903.40}   ← main branch unchanged

st/q("SELECT SUM(price * qty) FROM t" {"t" st/columns(experiment)})
;; => {:SUM(price * qty) 4401238.66}   ← experiment branch

;; Time-travel: load any historical branch from storage
def baseline: <!!(st/load(store "orders-baseline"))
st/q("SELECT COUNT(*) FROM t" {"t" st/columns(baseline)})
;; => {:COUNT(*) 9847233}

(require '[stratum.api :as st]
         '[konserve.file-store :as fs]
         '[clojure.core.async :refer [<!!]])

;; Open storage and load the orders dataset (10M rows)
(def store  (<!! (fs/new-fs-store "/data/stratum")))
(def orders (<!! (st/load store "orders")))

;; Fork in O(1) - structural sharing, zero data copied
(def experiment (st/fork orders))

;; Persist fork as a named branch - shares chunks with main
(<!! (st/sync! experiment store "experiment"))

;; Query both branches via SQL - pass column data as table map
(st/q "SELECT SUM(price * qty) FROM t" {"t" (st/columns orders)})
;; => {:SUM(price * qty) 4821903.40}   ← main branch unchanged

(st/q "SELECT SUM(price * qty) FROM t" {"t" (st/columns experiment)})
;; => {:SUM(price * qty) 4401238.66}   ← experiment branch

;; Time-travel: load any historical branch from storage
(def baseline (<!! (st/load store "orders-baseline")))
(st/q "SELECT COUNT(*) FROM t" {"t" (st/columns baseline)})
;; => {:COUNT(*) 9847233}

Yggdrasil extends branching across your whole stack - fork Datahike, Stratum, and Proximum together for consistent snapshot isolation across SQL, Datalog, and vector search. Yggdrasil → · Dataset API docs →

Why Stratum

Datahike is the system-of-record: Datalog queries, immutable transactions, time travel. Stratum is the SIMD SQL engine for scans and analytics over those same snapshots - fast on group-bys, joins, and window functions where a triple index isn't the right memory layout.

Fast

SIMD-accelerated via Java Vector API. Fused single-pass execution, dense group-by indexing, zone-map pruning. 35 of 46 queries faster than DuckDB at 10M rows.

Branchable

Fork any table in O(1) via copy-on-write structural sharing. Named branches, time-travel, CoW snapshots - built into the engine, not bolted on.

Pure JVM

No JNI, no native compilation. Runs anywhere a JVM does. SIMD acceleration without deployment complexity or platform lock-in.

SQL-complete

PostgreSQL wire protocol. Full DML, CTEs, window functions, aggregates, FROM read_csv / read_parquet. psql, JDBC, DBeaver, psycopg2.

Clojure-native

Datasets implement IEditableCollection, ILookup, IPersistentCollection. tablecloth and tech.ml.dataset work directly. DSL or SQL strings.

Ecosystem

Branch Datahike, Stratum, and Proximum together via Yggdrasil. Consistent snapshots across SQL, Datalog, and vector search.

Quick start

Standalone server

# Java 21+, no Clojure needed
java --add-modules jdk.incubator.vector \
     --enable-native-access=ALL-UNNAMED \
     - jar stratum-standalone.jar \
     --index orders:/data/orders.csv

# Or try the built-in demo tables (lineitem, taxi -100K rows each)
java --add-modules jdk.incubator.vector \
     --enable-native-access=ALL-UNNAMED \
     - jar stratum-standalone.jar --demo

# Connect with any PostgreSQL client
psql - h localhost - p 5432 - U stratum

Clojure API

What is this syntax?

require('[stratum.api :as st])

;; Query with DSL
st/q({:from {:price prices, :qty quantities}
      :where [[:> :price 100]]
      :group [:region]
      :agg [[:sum [:* :price :qty]] [:count]]})

;; Or with SQL
st/q(
  "SELECT region, SUM(price * qty), COUNT(*)\n       FROM orders WHERE price > 100 GROUP BY region" {"orders" {:price prices, :qty quantities, :region regions}})

(require '[stratum.api :as st])

;; Query with DSL
(st/q {:from {:price prices :qty quantities}
       :where [[:> :price 100]]
       :group [:region]
       :agg   [[:sum [:* :price :qty]]
               [:count]]})

;; Or with SQL
(st/q "SELECT region, SUM(price * qty), COUNT(*)
       FROM orders WHERE price > 100 GROUP BY region"
      {"orders" {:price prices :qty quantities
                 :region regions}})

How it works

Execution

Fused SIMD - predicate evaluation + aggregation in a single pass, no intermediate arrays
Dense group-by - direct array indexing for low-cardinality groups, hash tables for high-cardinality
Zone map pruning - skip entire chunks based on per-chunk min/max statistics
Parallel execution - cache-friendly work partitioning across all cores

Storage

Chunked B-tree - copy-on-write chunks with structural sharing across snapshots
Konserve backend - pluggable storage: filesystem, S3, or custom
Lazy loading - only accessed chunks are loaded from disk on demand
Mark-and-sweep GC - prune unreachable snapshots without manual cleanup

SQL capabilities

DML - SELECT, INSERT, UPDATE, DELETE, UPSERT (INSERT ON CONFLICT), UPDATE FROM, CREATE TABLE, DROP TABLE
Aggregates - SUM, COUNT, AVG, MIN, MAX, STDDEV, VARIANCE, CORR, MEDIAN, PERCENTILE_CONT, APPROX_QUANTILE, COUNT(DISTINCT), FILTER clause
Group-by - any number of columns, string and numeric, with HAVING
Joins - INNER, LEFT, RIGHT, FULL with multi-column keys
Window functions - ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST, LAG, LEAD, SUM/COUNT/AVG/MIN/MAX OVER with frame clauses
Composition - CTEs (WITH), subqueries, IN/EXISTS, UNION/INTERSECT/EXCEPT
Expressions - CASE WHEN, COALESCE, NULLIF, GREATEST, LEAST, CAST
Date/time - DATE_TRUNC, EXTRACT, DATE_ADD, DATE_DIFF
String - LIKE/ILIKE, UPPER/LOWER, LENGTH, SUBSTR
Files - FROM read_csv('file.csv'), FROM read_parquet('file.parquet')
Analytics - ANOMALY_SCORE, ANOMALY_PREDICT, ANOMALY_PROBA, ANOMALY_CONFIDENCE (isolation forest via SQL; online rotation for concept drift - docs)
Other - EXPLAIN, SELECT DISTINCT, IS NULL/IS NOT NULL, LIMIT/OFFSET

Work with us

If you need help getting Stratum into production, we can help with integration, custom development, and support contracts.

Get in touch About us

Installation

What is this syntax?

; deps.edn (Clojure CLI)
; check https://clojars.org/org.replikativ/stratum for latest version
{:deps {org.replikativ/stratum {:mvn/version "0.1.114"}}}

; JVM flags required (add to :jvm-opts or alias)
:jvm-opts

["--add-modules=jdk.incubator.vector" "--enable-native-access=ALL-UNNAMED"]

;; Leiningen - project.clj
;; [org.replikativ/stratum "0.1.114"]

; deps.edn (Clojure CLI)
; check https://clojars.org/org.replikativ/stratum for latest version
{:deps {org.replikativ/stratum {:mvn/version "0.1.114"}}}

; JVM flags required (add to :jvm-opts or alias)
:jvm-opts ["--add-modules=jdk.incubator.vector"
           "--enable-native-access=ALL-UNNAMED"]

;; Leiningen - project.clj
;; [org.replikativ/stratum "0.1.114"]

Requires JDK 21+. Clojure 1.12+. Apache 2.0 license. Latest version on Clojars.

GitHub Benchmark docs SQL interface Dataset API