versioned · fast · scalable
The Memory Model for Intelligence
Datahike is an open-source immutable database. Every transaction creates a snapshot you can query later, fork without copying data, or verify through the Merkle structure of the storage itself. We built it because production systems, auditable pipelines, and long-running agents need a database that remembers everything. Readers connect directly to S3 or the filesystem; no server is required for reads.
The ecosystem
One branching model across Datalog, SQL, vectors, and search. Fork your entire world-state in O(1).
Datahike
Never forget
Immutable Datalog database. Every transaction is preserved - query any past state, audit any decision.
- Clojure, Java, JS, Python, C/C++, CLI, HTTP
- Readers connect directly to storage - no server required
- Pluggable storage - filesystem, S3, JDBC
- GDPR excision with verifiable audit trail
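Time-travel is part of the core API. A minimal sketch in Clojure, assuming a local file store; the path, attribute names, and timestamp are illustrative:

```clojure
(require '[datahike.api :as d])

;; :keep-history? true retains every past state for auditing
(def cfg {:store {:backend :file :path "/tmp/example"}
          :keep-history? true
          :schema-flexibility :read})
(d/create-database cfg)
(def conn (d/connect cfg))

(d/transact conn [{:user/name "Ada"}])

;; Query the database as it existed at a point in time...
(d/q '[:find ?name :where [?e :user/name ?name]]
     (d/as-of @conn #inst "2024-01-01"))

;; ...or the full log of assertions and retractions
(d/q '[:find ?name ?added :where [?e :user/name ?name _ ?added]]
     (d/history @conn))
```

`as-of` returns a regular database value, so the same query runs unchanged against past and present states.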
Stratum
SQL that branches
Columnar SQL engine, faster than DuckDB on 36 of 46 benchmarks. Every table forks in O(1).
- PostgreSQL wire protocol
- Full DML, window functions, CTEs
- CoW snapshots and time-travel
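Because Stratum speaks the PostgreSQL wire protocol, any Postgres client or driver can talk to it. A sketch using next.jdbc from Clojure; the host, port, database name, and table are assumptions for illustration:

```clojure
(require '[next.jdbc :as jdbc])

;; Any PostgreSQL-compatible driver works - no Stratum-specific client needed
(def ds (jdbc/get-datasource
          {:dbtype "postgresql" :host "localhost"
           :port 5432 :dbname "analytics"}))

(jdbc/execute! ds
  ["SELECT sku, SUM(qty) AS total
    FROM sales GROUP BY sku ORDER BY total DESC"])
```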
Proximum
Vectors with memory
HNSW vector search with immutable snapshots. 37% faster insertion than jvector, pure JVM.
- Spring AI & LangChain4j integrations
- Merkle-verified index snapshots
- No native dependencies
Scriptum
Search that time-travels
Git-like branching for Apache Lucene. Fork a 100 GB index in milliseconds via segment sharing.
- Full Lucene 10.x - text, facets, KNN
- Zero-cost forks, no data copying
- Query any historical commit point
Collaborate without infrastructure
A Datahike database is a value - an immutable snapshot you can hold, share, and query anywhere. Readers connect directly to storage: no server to start, no API to negotiate, no ETL pipeline to maintain. If two teams share an S3 bucket, they can join their databases in a single Datalog expression.
; Two teams, two S3 buckets - no servers, no ETL pipeline
(def catalog (d/connect {:store {:backend :s3 :bucket "team-a"}}))
(def inventory (d/connect {:store {:backend :s3 :bucket "team-b"}}))
; Join across databases in a single Datalog expression
(d/q '[:find ?name ?stock
       :in $cat $inv
       :where [$cat ?p :product/sku ?sku]
              [$cat ?p :product/name ?name]
              [$inv ?i :stock/sku ?sku]
              [$inv ?i :stock/count ?stock]]
     @catalog @inventory)
; => #{["Widget A" 142] ["Widget B" 88]}
Datalog natively supports multi-database joins via :in.
Both values are immutable snapshots - no locking, no coordination required.
Learn more →
Show me
Examples in Java. Also: Clojure, JavaScript, Python, C/C++ (libdatahike), CLI (dthk), Babashka pod, HTTP REST.
Connect
import datahike.java.*;
import java.time.Instant;
import java.util.*;
var cfg = Database.file("/tmp/db")
    .keepHistory(true)
    .build();
Datahike.createDatabase(cfg);
var conn = Datahike.connect(cfg);
Transact
Datahike.transact(conn, List.of(
    Map.of(":user/name", "Ada",
           ":user/email", "ada@example.com")));
Query
// Datalog query (EDN syntax)
var results = Datahike.q(
    "[:find ?e ?name :where [?e :user/name ?name]]",
    Datahike.deref(conn));
// => #{[1 "Ada"]}
Time-travel
// Query a past snapshot
var oldDb = Datahike.asOf(
    Datahike.deref(conn),
    Date.from(Instant.parse("2024-01-01T00:00:00Z")));
var history = Datahike.q(
    "[:find ?name :where [_ :user/name ?name]]",
    oldDb);
Full Datalog - joins, aggregates, pull expressions, rules.
JavaScript / Node.js (beta)
Install: npm install datahike@next
const d = require('datahike');
const crypto = require('crypto');
const config = {
  store: {
    backend: ':memory',
    id: crypto.randomUUID()
  },
  'schema-flexibility': ':read' // Allow schemaless data (use kebab-case)
};
await d.createDatabase(config);
const conn = await d.connect(config);
await d.transact(conn, [{ name: 'Alice' }]);
const db = await d.db(conn); // db() is async for async backends
const results = await d.q('[:find ?n :where [?e :name ?n]]', db);
console.log(results);
// => [['Alice']]
TypeScript definitions included. Same Datalog queries, Promise-based API. Try in your browser →
Notes
Occasional writing on databases, immutability, and semantic search.
Stratum: SQL that branches
How we built a SIMD-accelerated columnar SQL engine on the JVM with copy-on-write branching - faster than DuckDB on 35 of 46 queries via the Java Vector API.
Why We Built Datahike
A personal story about functional values, long-lived systems, and the memory layer AI needs.
Yggdrasil - Branching Protocols
A protocol stack that brings Git-like branching to any storage system.
The Git Model for Databases
Copy-on-write, structural sharing, and branching - applied to your data.
Why Search Needs Versioning
Immutable search indexes for reproducible retrieval and systems that can explain themselves.
In production
Used by developers and government agencies who need data they can trust.
"Datahike is a foundational part of the stub story - going from a rough prototype all the way to finding product-market fit, generating revenue, and raising capital. It's been a critical part of our journey, and if I had to do this all again, you best believe I'd use Datahike again."
The Swedish Public Employment Service has used Datahike in production since 2024 for the JobTech Taxonomy - 40,000+ labour market concepts (occupations, skills, education standards) accessed daily by thousands of caseworkers. In their evaluation, Datahike performed competitively in a benchmark against Datomic.
Heidelberg University built emotrack on Datahike - a longitudinal emotion tracking application for psychological research, capturing and querying time-series self-report data across study participants.
Get started
Runs on the JVM. Distributed via Clojars.
Maven - add Clojars repository and dependency:
<!-- Enable Clojars -->
<repository>
  <id>clojars</id>
  <url>https://repo.clojars.org/</url>
</repository>

<!-- Datahike dependency -->
<dependency>
  <groupId>org.replikativ</groupId>
  <artifactId>datahike</artifactId>
  <version>LATEST</version>
</dependency>
Clojure CLI, Leiningen, Gradle, and JavaScript: see the README on GitHub. Or try it in your browser →
Work with us
If you need help getting Datahike into production, we can help with integration, custom development, and support contracts.