versioned · fast · scalable

The Memory Model for Intelligence

Datahike is an open-source immutable database. Every transaction creates a snapshot you can query later, fork without copying data, or verify through the Merkle structure of the storage itself. We built it because we needed a database that remembers everything, whether for production systems, auditable pipelines, or long-running agents. Readers connect directly to S3 or the filesystem, so no server is required for reads.
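To make "verify through the Merkle structure" concrete, here is a minimal sketch in plain Java (17+). It is a conceptual model only, not Datahike's storage format: the `MerkleSketch` class, its `Node` record, and the choice of SHA-256 are all assumptions made for illustration. Each node's hash covers its payload and its children's hashes, so the root hash identifies the whole snapshot.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;

// Hypothetical illustration; none of these names come from Datahike.
class MerkleSketch {
    /** Immutable node whose hash covers its payload and its children's hashes. */
    record Node(String payload, List<Node> children, String hash) {
        static Node of(String payload, List<Node> children) {
            List<Node> frozen = List.copyOf(children);
            return new Node(payload, frozen, hashOf(payload, frozen));
        }
    }

    static String hashOf(String payload, List<Node> children) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(payload.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator between payload and child hashes
            for (Node child : children) {
                md.update(child.hash().getBytes(StandardCharsets.UTF_8));
            }
            return HexFormat.of().formatHex(md.digest());
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Recomputes each hash bottom-up and compares it with the stored one. */
    static boolean verify(Node node) {
        for (Node child : node.children()) {
            if (!verify(child)) return false;
        }
        return node.hash().equals(hashOf(node.payload(), node.children()));
    }

    public static void main(String[] args) {
        Node leaf = Node.of("tx-1", List.of());
        Node root = Node.of("root", List.of(leaf));
        System.out.println(verify(root)); // true

        // A node whose payload was altered but whose stored hash was kept fails verification.
        Node forged = new Node("tx-1-altered", leaf.children(), leaf.hash());
        System.out.println(verify(forged)); // false
    }
}
```

Because a parent hash depends on every child hash, changing any stored value changes the root hash, which is what lets a reader check a snapshot without trusting the party that served it.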

The ecosystem

One branching model across Datalog, SQL, vectors, and search. Fork your entire world-state in O(1).
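An O(1) fork follows from immutability: if nothing is ever modified in place, a fork only needs a reference to the current root. The sketch below shows the idea with a path-copying binary search tree in plain Java (16+). It is an illustration under simplifying assumptions, not the data structure any of these projects use, and `ForkSketch`, `Node`, and `Snapshot` are hypothetical names.

```java
// Hypothetical illustration of copy-on-write with structural sharing.
class ForkSketch {
    /** Immutable tree node; never modified after construction. */
    record Node(String key, String value, Node left, Node right) {}

    /** A snapshot is just a reference to an immutable root. */
    record Snapshot(Node root) {
        /** Forking copies one reference, regardless of how much data the tree holds. */
        Snapshot fork() {
            return new Snapshot(root);
        }

        /** Returns a new snapshot; only the nodes on the path to the key are copied. */
        Snapshot put(String key, String value) {
            return new Snapshot(insert(root, key, value));
        }

        String get(String key) {
            Node n = root;
            while (n != null) {
                int c = key.compareTo(n.key());
                if (c == 0) return n.value();
                n = c < 0 ? n.left() : n.right();
            }
            return null;
        }
    }

    static Node insert(Node n, String key, String value) {
        if (n == null) return new Node(key, value, null, null);
        int c = key.compareTo(n.key());
        if (c == 0) return new Node(key, value, n.left(), n.right());
        if (c < 0) return new Node(n.key(), n.value(), insert(n.left(), key, value), n.right());
        return new Node(n.key(), n.value(), n.left(), insert(n.right(), key, value));
    }

    public static void main(String[] args) {
        Snapshot trunk = new Snapshot(null).put("m", "1").put("a", "2").put("z", "3");
        Snapshot branch = trunk.fork().put("z", "changed");

        System.out.println(trunk.get("z"));  // 3
        System.out.println(branch.get("z")); // changed
        // The untouched left subtree is the same object in both snapshots.
        System.out.println(trunk.root().left() == branch.root().left()); // true
    }
}
```

Writes on the branch copy only the path from the root to the changed key; everything else stays shared, which is why forking does not copy data.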

Datahike

Never forget

Immutable Datalog database. Every transaction is preserved - query any past state, audit any decision.

Clojure, Java, JS, Python, C/C++, CLI, HTTP
Readers connect directly to storage - no server required
Pluggable storage - filesystem, S3, JDBC
GDPR excision with verifiable audit trail

Stratum

SQL that branches

Columnar SQL engine, faster than DuckDB on 36 of 46 benchmarks. Every table forks in O(1).

PostgreSQL wire protocol
Full DML, window functions, CTEs
CoW snapshots and time-travel

Proximum

Vectors with memory

HNSW vector search with immutable snapshots. 37% faster insertion than jvector, pure JVM.

Spring AI & LangChain4j integrations
Merkle-verified index snapshots
No native dependencies

Scriptum

Search that time-travels

Git-like branching for Apache Lucene. Fork a 100 GB index in milliseconds via segment sharing.

Full Lucene 10.x - text, facets, KNN
Zero-cost forks, no data copying
Query any historical commit point

Collaborate without infrastructure

A Datahike database is a value - an immutable snapshot you can hold, share, and query anywhere. Readers connect directly to storage: no server to start, no API to negotiate, no ETL pipeline to maintain. If two teams share an S3 bucket, they can join their databases in a single Datalog expression.

The same query in two notations: a parenthesis-light rendering first, then standard Clojure.

; Two teams, two S3 buckets - no servers, no ETL pipeline
def catalog := d/connect({:store {:backend :s3, :bucket "team-a"}})
def inventory := d/connect({:store {:backend :s3, :bucket "team-b"}})

; Join across databases in a single Datalog expression
d/q('[:find ?name ?stock
       :in $cat $inv
       :where [$cat ?p :product/sku  ?sku]
              [$cat ?p :product/name ?name]
              [$inv ?i :stock/sku    ?sku]
              [$inv ?i :stock/count  ?stock]], @catalog, @inventory)
; => #{["Widget A" 142] ["Widget B" 88]}

(require '[datahike.api :as d])

; Two teams, two S3 buckets - no servers, no ETL pipeline
(def catalog   (d/connect {:store {:backend :s3 :bucket "team-a"}}))
(def inventory (d/connect {:store {:backend :s3 :bucket "team-b"}}))

; Join across databases in a single Datalog expression
(d/q '[:find ?name ?stock
       :in $cat $inv
       :where [$cat ?p :product/sku  ?sku]
              [$cat ?p :product/name ?name]
              [$inv ?i :stock/sku    ?sku]
              [$inv ?i :stock/count  ?stock]]
  @catalog @inventory)
; => #{["Widget A" 142] ["Widget B" 88]}

Datalog natively supports multi-database joins via :in. Both values are immutable snapshots - no locking, no coordination required.
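For readers new to Datalog, the query above matches products and stock rows that share a SKU and returns name/count pairs. A rough plain-Java equivalent over two in-memory lists is sketched below. The `JoinSketch` class, its records, and the SKU values are hypothetical, and the sketch models only what the join computes, not how Datahike evaluates it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the join semantics; not Datahike code.
class JoinSketch {
    record Product(String sku, String name) {}
    record Stock(String sku, int count) {}
    record Row(String name, int count) {}

    /** Pairs every product name with every stock count that shares its SKU. */
    static Set<Row> join(List<Product> catalog, List<Stock> inventory) {
        Set<Row> rows = new HashSet<>(); // Datalog results are sets, so duplicates collapse
        for (Product p : catalog) {
            for (Stock s : inventory) {
                if (p.sku().equals(s.sku())) {
                    rows.add(new Row(p.name(), s.count()));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Product> catalog = List.of(
            new Product("sku-a", "Widget A"),
            new Product("sku-b", "Widget B"),
            new Product("sku-c", "Widget C")); // no stock row, so it drops out of the join
        List<Stock> inventory = List.of(
            new Stock("sku-a", 142),
            new Stock("sku-b", 88));

        // Prints the two matching rows; set iteration order is unspecified.
        System.out.println(join(catalog, inventory));
    }
}
```

Both inputs are immutable lists, so the join can run while other code holds the same values, which mirrors why no locking is needed between the two snapshots.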

Show me

Examples in Java. Also: Clojure, JavaScript, Python, C/C++ (libdatahike), CLI (dthk), Babashka pod, HTTP REST.

Connect

import datahike.java.*;
import java.time.Instant;  // used by the time-travel example
import java.util.*;

var cfg = Database.file("/tmp/db")
    .keepHistory(true)
    .build();
Datahike.createDatabase(cfg);
var conn = Datahike.connect(cfg);

Transact

Datahike.transact(conn, List.of(
    Map.of(":user/name", "Ada",
           ":user/email", "ada@example.com")));

Query

// Datalog query (EDN syntax)
var results = Datahike.q(
    "[:find ?e ?name :where [?e :user/name ?name]]",
    Datahike.deref(conn));
// => #{[1 "Ada"]}

Time-travel

// Query a past snapshot
var oldDb = Datahike.asOf(
    Datahike.deref(conn),
    Date.from(Instant.parse("2024-01-01T00:00:00Z")));
var history = Datahike.q(
    "[:find ?name :where [_ :user/name ?name]]",
    oldDb);

Full Datalog - joins, aggregates, pull expressions, rules.
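As a mental model for the time-travel example, think of the database as an append-only log in which every fact carries its transaction time; an as-of view ignores everything transacted after the chosen instant. The plain-Java sketch below (16+) assumes exactly that and nothing more. `AsOfSketch` and `Datom` are hypothetical names, retractions are left out, and Datahike's actual indexes are not modeled.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of as-of semantics; not Datahike code.
class AsOfSketch {
    /** One assertion in an append-only log. */
    record Datom(long entity, String attribute, String value, Instant txTime) {}

    /**
     * Returns the latest value per entity for one attribute, ignoring anything
     * transacted after the given instant. Assumes the log is ordered by txTime.
     */
    static Map<Long, String> asOf(List<Datom> log, String attribute, Instant instant) {
        Map<Long, String> view = new TreeMap<>();
        for (Datom d : log) {
            if (d.attribute().equals(attribute) && !d.txTime().isAfter(instant)) {
                view.put(d.entity(), d.value()); // later assertions overwrite earlier ones
            }
        }
        return view;
    }

    public static void main(String[] args) {
        List<Datom> log = List.of(
            new Datom(1, ":user/name", "Ada", Instant.parse("2023-06-01T00:00:00Z")),
            new Datom(1, ":user/name", "Ada L.", Instant.parse("2024-03-01T00:00:00Z")));

        System.out.println(asOf(log, ":user/name", Instant.parse("2024-01-01T00:00:00Z"))); // {1=Ada}
        System.out.println(asOf(log, ":user/name", Instant.parse("2025-01-01T00:00:00Z"))); // {1=Ada L.}
    }
}
```

Nothing is deleted when the name changes; the older assertion stays in the log, which is what makes the earlier state queryable.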

JavaScript / Node.js (beta)

Install: npm install datahike@next

const d = require('datahike');
const crypto = require('crypto');

const config = {
  store: {
    backend: ':memory',
    id: crypto.randomUUID()
  },
  'schema-flexibility': ':read'  // Allow schemaless data (use kebab-case)
};

// CommonJS modules have no top-level await, so run inside an async function
async function main() {
  await d.createDatabase(config);
  const conn = await d.connect(config);
  await d.transact(conn, [{ name: 'Alice' }]);
  const db = await d.db(conn);  // db() is async for async backends
  const results = await d.q('[:find ?n :where [?e :name ?n]]', db);
  console.log(results);
  // => [['Alice']]
}

main().catch(console.error);

TypeScript definitions included. Same Datalog queries, Promise-based API. Try in your browser →

Notes

Occasional writing on databases, immutability, and semantic search.

Stratum: SQL that branches

How we built a SIMD-accelerated columnar SQL engine on the JVM with copy-on-write branching - faster than DuckDB on 35 of 46 queries via the Java Vector API.

Why We Built Datahike

A personal story about functional values, long-lived systems, and the memory layer AI needs.

Yggdrasil - Branching Protocols

A protocol stack that brings Git-like branching to any storage system.

The Git Model for Databases

Copy-on-write, structural sharing, and branching - applied to your data.

Why Search Needs Versioning

Immutable search indexes for reproducible retrieval and systems that can explain themselves.

In production

Used by developers and government agencies who need data they can trust.

"Datahike is a foundational part of the stub story - going from a rough prototype all the way to finding product-market fit, generating revenue, and raising capital. It's been a critical part of our journey, and if I had to do this all again, you best believe I'd use Datahike again."

Alex Oloo, Cofounder & CTO, Stub - accounting platform for 5,000+ SMBs across South Africa

The Swedish Public Employment Service has used Datahike in production since 2024 for the JobTech Taxonomy - 40,000+ labour market concepts (occupations, skills, education standards) accessed by thousands of caseworkers daily. In their evaluation, Datahike performed competitively in a benchmark against Datomic.

Arbetsförmedlingen (Swedish Public Employment Service) - government production deployment

Heidelberg University built emotrack on Datahike - a longitudinal emotion tracking application for psychological research, capturing and querying time-series self-report data across study participants.

Heidelberg University, psychological research - emotion tracking application

Get started

Runs on the JVM. Distributed via Clojars.

Maven - add Clojars repository and dependency:

<!-- Enable Clojars -->
<repository>
  <id>clojars</id>
  <url>https://repo.clojars.org/</url>
</repository>

<!-- Datahike dependency -->
<dependency>
  <groupId>org.replikativ</groupId>
  <artifactId>datahike</artifactId>
  <version>LATEST</version>
</dependency>

Clojure CLI, Leiningen, Gradle, and JavaScript: see the README on GitHub. Or try it in your browser →

Work with us

If you need help getting Datahike into production, we can help with integration, custom development, and support contracts.