Stratum
SQL that branches
Fork any table in O(1). Query past snapshots. Pure JVM, no native dependencies. Faster than DuckDB on 35 of 46 queries via the Java Vector API. PostgreSQL wire protocol - connect with psql, JDBC, or DBeaver.
Performance
Fused SIMD execution over copy-on-write columnar data. Single-threaded comparison vs DuckDB v1.4.4 (JDBC in-process) on 10M rows, Intel Core Ultra 7 258V, JVM 25. Median of 10 iterations, 5 warmup:
| Query | Stratum | DuckDB | Ratio |
|---|---|---|---|
| TPC-H Q6 (filter + sum-product) | 13ms | 28ms | 2.2x faster |
| Filtered COUNT (NEQ pred) | 3ms | 12ms | 4.0x faster |
| H2O Q3 (100K string groups) | 71ms | 362ms | 5.1x faster |
| LIKE '%search%' (string scan) | 47ms | 240ms | 5.1x faster |
| H2O Q6 (STDDEV group-by) | 30ms | 81ms | 2.7x faster |
| H2O Q10 (10M groups, 6 cols) | 832ms | 7056ms | 8.5x faster |
| AVG(LENGTH(URL)) | 38ms | 170ms | 4.5x faster |
Stratum wins 35 of 46 queries at 10M rows (single-threaded, median of 10 runs, Intel Core Ultra 7). DuckDB wins on sparse-selectivity filters, high-cardinality hash group-by, and global COUNT(DISTINCT). Full methodology and raw results in the benchmark docs.
Branch anything
st/fork creates an O(1) copy-on-write branch - no data copied, only a pointer
to shared chunks. st/sync! persists a branch to storage. st/load
restores any named branch. Pass column data as a table map to SQL queries, or register
live storage-backed tables in the server with register-live-table!.
(require '[stratum.api :as st]
'[konserve.file-store :as fs]
'[clojure.core.async :refer [<!!]])
;; Open storage and load the orders dataset (10M rows)
(def store (<!! (fs/new-fs-store "/data/stratum")))
(def orders (<!! (st/load store "orders")))
;; Fork in O(1) - structural sharing, zero data copied
(def experiment (st/fork orders))
;; Persist fork as a named branch - shares chunks with main
(<!! (st/sync! experiment store "experiment"))
;; Query both branches via SQL - pass column data as table map
(st/q "SELECT SUM(price * qty) FROM t" {"t" (st/columns orders)})
;; => {:SUM(price * qty) 4821903.40} ← main branch unchanged
(st/q "SELECT SUM(price * qty) FROM t" {"t" (st/columns experiment)})
;; => {:SUM(price * qty) 4401238.66} ← experiment branch
;; Time-travel: load any historical branch from storage
(def baseline (<!! (st/load store "orders-baseline")))
(st/q "SELECT COUNT(*) FROM t" {"t" (st/columns baseline)})
;; => {:COUNT(*) 9847233} Yggdrasil extends branching across your whole stack - fork Datahike, Stratum, and Proximum together for consistent snapshot isolation across SQL, Datalog, and vector search. Yggdrasil → · Dataset API docs →
Why Stratum
Datahike is the system-of-record: Datalog queries, immutable transactions, time travel. Stratum is the SIMD SQL engine for scans and analytics over those same snapshots - fast on group-bys, joins, and window functions where a triple index isn't the right memory layout.
Fast
SIMD-accelerated via Java Vector API. Fused single-pass execution, dense group-by indexing, zone-map pruning. 35 of 46 queries faster than DuckDB at 10M rows.
Branchable
Fork any table in O(1) via copy-on-write structural sharing. Named branches, time-travel, CoW snapshots - built into the engine, not bolted on.
Pure JVM
No JNI, no native compilation. Runs anywhere a JVM does. SIMD acceleration without deployment complexity or platform lock-in.
SQL-complete
PostgreSQL wire protocol. Full DML, CTEs, window functions, aggregates, FROM read_csv / read_parquet. psql, JDBC, DBeaver, psycopg2.
Clojure-native
Datasets implement IEditableCollection, ILookup, IPersistentCollection. tablecloth and tech.ml.dataset work directly. DSL or SQL strings.
Ecosystem
Branch Datahike, Stratum, and Proximum together via Yggdrasil. Consistent snapshots across SQL, Datalog, and vector search.
Quick start
Standalone server
# Java 21+, no Clojure needed
java --add-modules jdk.incubator.vector \
--enable-native-access=ALL-UNNAMED \
- jar stratum-standalone.jar \
--index orders:/data/orders.csv
# Or try the built-in demo tables (lineitem, taxi -100K rows each)
java --add-modules jdk.incubator.vector \
--enable-native-access=ALL-UNNAMED \
- jar stratum-standalone.jar --demo
# Connect with any PostgreSQL client
psql - h localhost - p 5432 - U stratum Clojure API
(require '[stratum.api :as st])
;; Query with DSL
(st/q {:from {:price prices :qty quantities}
:where [[:> :price 100]]
:group [:region]
:agg [[:sum [:* :price :qty]]
[:count]]})
;; Or with SQL
(st/q "SELECT region, SUM(price * qty), COUNT(*)
FROM orders WHERE price > 100 GROUP BY region"
{"orders" {:price prices :qty quantities
:region regions}}) How it works
Execution
- Fused SIMD - predicate evaluation + aggregation in a single pass, no intermediate arrays
- Dense group-by - direct array indexing for low-cardinality groups, hash tables for high-cardinality
- Zone map pruning - skip entire chunks based on per-chunk min/max statistics
- Parallel execution - cache-friendly work partitioning across all cores
Storage
- Chunked B-tree - copy-on-write chunks with structural sharing across snapshots
- Konserve backend - pluggable storage: filesystem, S3, or custom
- Lazy loading - only accessed chunks are loaded from disk on demand
- Mark-and-sweep GC - prune unreachable snapshots without manual cleanup
SQL capabilities
- DML - SELECT, INSERT, UPDATE, DELETE, UPSERT (INSERT ON CONFLICT), UPDATE FROM, CREATE TABLE, DROP TABLE
- Aggregates - SUM, COUNT, AVG, MIN, MAX, STDDEV, VARIANCE, CORR, MEDIAN, PERCENTILE_CONT, APPROX_QUANTILE, COUNT(DISTINCT), FILTER clause
- Group-by - any number of columns, string and numeric, with HAVING
- Joins - INNER, LEFT, RIGHT, FULL with multi-column keys
- Window functions - ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST, LAG, LEAD, SUM/COUNT/AVG/MIN/MAX OVER with frame clauses
- Composition - CTEs (WITH), subqueries, IN/EXISTS, UNION/INTERSECT/EXCEPT
- Expressions - CASE WHEN, COALESCE, NULLIF, GREATEST, LEAST, CAST
- Date/time - DATE_TRUNC, EXTRACT, DATE_ADD, DATE_DIFF
- String - LIKE/ILIKE, UPPER/LOWER, LENGTH, SUBSTR
- Files -
FROM read_csv('file.csv'),FROM read_parquet('file.parquet') - Analytics - ANOMALY_SCORE, ANOMALY_PREDICT, ANOMALY_PROBA, ANOMALY_CONFIDENCE (isolation forest via SQL; online rotation for concept drift - docs)
- Other - EXPLAIN, SELECT DISTINCT, IS NULL/IS NOT NULL, LIMIT/OFFSET
Work with us
If you need help getting Stratum into production, we can help with integration, custom development, and support contracts.
Installation
; deps.edn (Clojure CLI)
; check https://clojars.org/org.replikativ/stratum for latest version
{:deps {org.replikativ/stratum {:mvn/version "0.1.114"}}}
; JVM flags required (add to :jvm-opts or alias)
:jvm-opts ["--add-modules=jdk.incubator.vector"
"--enable-native-access=ALL-UNNAMED"]
;; Leiningen - project.clj
;; [org.replikativ/stratum "0.1.114"] Requires JDK 21+. Clojure 1.12+. Apache 2.0 license. Latest version on Clojars.