Stratum

SQL that branches

Fork any table in O(1). Query past snapshots. Pure JVM, no native dependencies. Faster than DuckDB on 35 of 46 queries via the Java Vector API. PostgreSQL wire protocol - connect with psql, JDBC, or DBeaver.

35/46 queries faster than DuckDB (10M rows, 1T)
8.7× faster on H2O db-benchmark Q10
O(1) table fork
Pure JVM no native dependencies

Performance

Fused SIMD execution over copy-on-write columnar data. Single-threaded comparison vs DuckDB v1.4.4 (JDBC in-process) on 10M rows, Intel Core Ultra 7 258V, JVM 25. Median of 10 iterations, 5 warmup:

Query Stratum DuckDB Ratio
TPC-H Q6 (filter + sum-product) 13ms 28ms 2.2x faster
Filtered COUNT (NEQ pred) 3ms 12ms 4.0x faster
H2O Q3 (100K string groups) 71ms 362ms 5.1x faster
LIKE '%search%' (string scan) 47ms 240ms 5.1x faster
H2O Q6 (STDDEV group-by) 30ms 81ms 2.7x faster
H2O Q10 (10M groups, 6 cols) 832ms 7056ms 8.5x faster
AVG(LENGTH(URL)) 38ms 170ms 4.5x faster

Stratum wins 35 of 46 queries at 10M rows (single-threaded, median of 10 runs, Intel Core Ultra 7). DuckDB wins on sparse-selectivity filters, high-cardinality hash group-by, and global COUNT(DISTINCT). Full methodology and raw results in the benchmark docs.

Branch anything

st/fork creates an O(1) copy-on-write branch - no data copied, only a pointer to shared chunks. st/sync! persists a branch to storage. st/load restores any named branch. Pass column data as a table map to SQL queries, or register live storage-backed tables in the server with register-live-table!.

(require '[stratum.api :as st]
         '[konserve.file-store :as fs]
         '[clojure.core.async :refer [<!!]])

;; Open storage and load the orders dataset (10M rows)
(def store  (<!! (fs/new-fs-store "/data/stratum")))
(def orders (<!! (st/load store "orders")))

;; Fork in O(1) - structural sharing, zero data copied
(def experiment (st/fork orders))

;; Persist fork as a named branch - shares chunks with main
(<!! (st/sync! experiment store "experiment"))

;; Query both branches via SQL - pass column data as table map
(st/q "SELECT SUM(price * qty) FROM t" {"t" (st/columns orders)})
;; => {:SUM(price * qty) 4821903.40}   ← main branch unchanged

(st/q "SELECT SUM(price * qty) FROM t" {"t" (st/columns experiment)})
;; => {:SUM(price * qty) 4401238.66}   ← experiment branch

;; Time-travel: load any historical branch from storage
(def baseline (<!! (st/load store "orders-baseline")))
(st/q "SELECT COUNT(*) FROM t" {"t" (st/columns baseline)})
;; => {:COUNT(*) 9847233}

Yggdrasil extends branching across your whole stack - fork Datahike, Stratum, and Proximum together for consistent snapshot isolation across SQL, Datalog, and vector search. Yggdrasil → · Dataset API docs →

Why Stratum

Datahike is the system-of-record: Datalog queries, immutable transactions, time travel. Stratum is the SIMD SQL engine for scans and analytics over those same snapshots - fast on group-bys, joins, and window functions where a triple index isn't the right memory layout.

Fast

SIMD-accelerated via Java Vector API. Fused single-pass execution, dense group-by indexing, zone-map pruning. 35 of 46 queries faster than DuckDB at 10M rows.

Branchable

Fork any table in O(1) via copy-on-write structural sharing. Named branches, time-travel, CoW snapshots - built into the engine, not bolted on.

Pure JVM

No JNI, no native compilation. Runs anywhere a JVM does. SIMD acceleration without deployment complexity or platform lock-in.

SQL-complete

PostgreSQL wire protocol. Full DML, CTEs, window functions, aggregates, FROM read_csv / read_parquet. psql, JDBC, DBeaver, psycopg2.

Clojure-native

Datasets implement IEditableCollection, ILookup, IPersistentCollection. tablecloth and tech.ml.dataset work directly. DSL or SQL strings.

Ecosystem

Branch Datahike, Stratum, and Proximum together via Yggdrasil. Consistent snapshots across SQL, Datalog, and vector search.

Quick start

Standalone server

# Java 21+, no Clojure needed
java --add-modules jdk.incubator.vector \
     --enable-native-access=ALL-UNNAMED \
     - jar stratum-standalone.jar \
     --index orders:/data/orders.csv

# Or try the built-in demo tables (lineitem, taxi -100K rows each)
java --add-modules jdk.incubator.vector \
     --enable-native-access=ALL-UNNAMED \
     - jar stratum-standalone.jar --demo

# Connect with any PostgreSQL client
psql - h localhost - p 5432 - U stratum

Clojure API

(require '[stratum.api :as st])

;; Query with DSL
(st/q {:from {:price prices :qty quantities}
       :where [[:> :price 100]]
       :group [:region]
       :agg   [[:sum [:* :price :qty]]
               [:count]]})

;; Or with SQL
(st/q "SELECT region, SUM(price * qty), COUNT(*)
       FROM orders WHERE price > 100 GROUP BY region"
      {"orders" {:price prices :qty quantities
                 :region regions}})

How it works

Execution

  • Fused SIMD - predicate evaluation + aggregation in a single pass, no intermediate arrays
  • Dense group-by - direct array indexing for low-cardinality groups, hash tables for high-cardinality
  • Zone map pruning - skip entire chunks based on per-chunk min/max statistics
  • Parallel execution - cache-friendly work partitioning across all cores

Storage

  • Chunked B-tree - copy-on-write chunks with structural sharing across snapshots
  • Konserve backend - pluggable storage: filesystem, S3, or custom
  • Lazy loading - only accessed chunks are loaded from disk on demand
  • Mark-and-sweep GC - prune unreachable snapshots without manual cleanup

SQL capabilities

  • DML - SELECT, INSERT, UPDATE, DELETE, UPSERT (INSERT ON CONFLICT), UPDATE FROM, CREATE TABLE, DROP TABLE
  • Aggregates - SUM, COUNT, AVG, MIN, MAX, STDDEV, VARIANCE, CORR, MEDIAN, PERCENTILE_CONT, APPROX_QUANTILE, COUNT(DISTINCT), FILTER clause
  • Group-by - any number of columns, string and numeric, with HAVING
  • Joins - INNER, LEFT, RIGHT, FULL with multi-column keys
  • Window functions - ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST, LAG, LEAD, SUM/COUNT/AVG/MIN/MAX OVER with frame clauses
  • Composition - CTEs (WITH), subqueries, IN/EXISTS, UNION/INTERSECT/EXCEPT
  • Expressions - CASE WHEN, COALESCE, NULLIF, GREATEST, LEAST, CAST
  • Date/time - DATE_TRUNC, EXTRACT, DATE_ADD, DATE_DIFF
  • String - LIKE/ILIKE, UPPER/LOWER, LENGTH, SUBSTR
  • Files - FROM read_csv('file.csv'), FROM read_parquet('file.parquet')
  • Analytics - ANOMALY_SCORE, ANOMALY_PREDICT, ANOMALY_PROBA, ANOMALY_CONFIDENCE (isolation forest via SQL; online rotation for concept drift - docs)
  • Other - EXPLAIN, SELECT DISTINCT, IS NULL/IS NOT NULL, LIMIT/OFFSET

Work with us

If you need help getting Stratum into production, we can help with integration, custom development, and support contracts.

Installation

; deps.edn (Clojure CLI)
; check https://clojars.org/org.replikativ/stratum for latest version
{:deps {org.replikativ/stratum {:mvn/version "0.1.114"}}}

; JVM flags required (add to :jvm-opts or alias)
:jvm-opts ["--add-modules=jdk.incubator.vector"
           "--enable-native-access=ALL-UNNAMED"]

;; Leiningen - project.clj
;; [org.replikativ/stratum "0.1.114"]

Requires JDK 21+. Clojure 1.12+. Apache 2.0 license. Latest version on Clojars.