versioned · fast · scalable
The Memory Model for Intelligence
Datahike is an open-source immutable database. Every transaction creates a snapshot you can query later, fork without copying data, or verify through the Merkle structure of the storage itself. We built it because production systems, auditable pipelines, and long-running agents need a database that remembers everything. Readers connect directly to S3 or the filesystem; no server is required for reads.
The ecosystem
One branching model across Datalog, SQL, vectors, and search. Fork your entire world-state in O(1).
Datahike
Never forget
Immutable Datalog database. Every transaction is preserved - query any past state, audit any decision.
- Clojure, Java, JS, Python, C/C++, CLI, HTTP
- Readers connect directly to storage - no server required
- Pluggable storage - filesystem, S3, JDBC
- GDPR excision with verifiable audit trail
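Time-travel is part of the core API. A minimal sketch in Clojure, assuming a local file store; the path, attribute names, and timestamp are illustrative:

```clojure
(require '[datahike.api :as d])

;; :keep-history? true retains every past state for auditing
(def cfg {:store {:backend :file :path "/tmp/example"}
          :keep-history? true
          :schema-flexibility :read})
(d/create-database cfg)
(def conn (d/connect cfg))

(d/transact conn [{:user/name "Ada"}])

;; Query the database as it existed at a point in time...
(d/q '[:find ?name :where [?e :user/name ?name]]
     (d/as-of @conn #inst "2024-01-01"))

;; ...or the full log of assertions and retractions
(d/q '[:find ?name ?added :where [?e :user/name ?name _ ?added]]
     (d/history @conn))
```

`as-of` returns a regular database value, so the same query runs unchanged against past and present states.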
Stratum
SQL that branches
Columnar SQL engine, faster than DuckDB on 36 of 46 benchmarks. Every table forks in O(1).
- PostgreSQL wire protocol
- Full DML, window functions, CTEs
- CoW snapshots and time-travel
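Because Stratum speaks the PostgreSQL wire protocol, any Postgres client or driver can talk to it. A sketch using next.jdbc from Clojure; the host, port, database name, and table are assumptions for illustration:

```clojure
(require '[next.jdbc :as jdbc])

;; Any PostgreSQL-compatible driver works - no Stratum-specific client needed
(def ds (jdbc/get-datasource
          {:dbtype "postgresql" :host "localhost"
           :port 5432 :dbname "analytics"}))

(jdbc/execute! ds
  ["SELECT sku, SUM(qty) AS total
    FROM sales GROUP BY sku ORDER BY total DESC"])
```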
Proximum
Vectors with memory
HNSW vector search with immutable snapshots. 37% faster insertion than jvector, pure JVM.
- Spring AI & LangChain4j integrations
- Merkle-verified index snapshots
- No native dependencies
Scriptum
Search that time-travels
Git-like branching for Apache Lucene. Fork a 100 GB index in milliseconds via segment sharing.
- Full Lucene 10.x - text, facets, KNN
- Zero-cost forks, no data copying
- Query any historical commit point
Collaborate without infrastructure
A Datahike database is a value - an immutable snapshot you can hold, share, and query anywhere. Readers connect directly to storage: no server to start, no API to negotiate, no ETL pipeline to maintain. If two teams share an S3 bucket, they can join their databases in a single Datalog expression.
; Two teams, two S3 buckets - no servers, no ETL pipeline
(def catalog (d/connect {:store {:backend :s3 :bucket "team-a"}}))
(def inventory (d/connect {:store {:backend :s3 :bucket "team-b"}}))
; Join across databases in a single Datalog expression
(d/q '[:find ?name ?stock
       :in $cat $inv
       :where [$cat ?p :product/sku ?sku]
              [$cat ?p :product/name ?name]
              [$inv ?i :stock/sku ?sku]
              [$inv ?i :stock/count ?stock]]
     @catalog @inventory)
; => #{["Widget A" 142] ["Widget B" 88]}
Datalog natively supports multi-database joins via :in.
Both values are immutable snapshots - no locking, no coordination required.
Learn more →
Show me
Examples in Java. Also: Clojure, JavaScript, Python, C/C++ (libdatahike), CLI (dthk), Babashka pod, HTTP REST.
Connect
import datahike.java.*;
import java.time.Instant;
import java.util.*;
var cfg = Database.file("/tmp/db")
    .keepHistory(true)
    .build();
Datahike.createDatabase(cfg);
var conn = Datahike.connect(cfg);
Transact
Datahike.transact(conn, List.of(
    Map.of(":user/name", "Ada",
           ":user/email", "ada@example.com")));
Query
// Datalog query (EDN syntax)
var results = Datahike.q(
    "[:find ?e ?name :where [?e :user/name ?name]]",
    Datahike.deref(conn));
// => #{[1 "Ada"]}
Time-travel
// Query a past snapshot
var oldDb = Datahike.asOf(
    Datahike.deref(conn),
    Date.from(Instant.parse("2024-01-01T00:00:00Z")));
var history = Datahike.q(
    "[:find ?name :where [_ :user/name ?name]]",
    oldDb);
Full Datalog - joins, aggregates, pull expressions, rules.
JavaScript / Node.js (beta)
Install: npm install datahike@next
const d = require('datahike');
const crypto = require('crypto');
const config = {
  store: {
    backend: ':memory',
    id: crypto.randomUUID()
  },
  'schema-flexibility': ':read' // Allow schemaless data (use kebab-case)
};
await d.createDatabase(config);
const conn = await d.connect(config);
await d.transact(conn, [{ name: 'Alice' }]);
const db = await d.db(conn); // db() is async for async backends
const results = await d.q('[:find ?n :where [?e :name ?n]]', db);
console.log(results);
// => [['Alice']]
TypeScript definitions included. Same Datalog queries, Promise-based API. Try in your browser →
Notes
Occasional writing on databases, immutability, and semantic search.
Stratum: SQL that branches
How we built a SIMD-accelerated columnar SQL engine on the JVM with copy-on-write branching - faster than DuckDB on 35 of 46 queries via the Java Vector API.
Why We Built Datahike
A personal story about functional values, long-lived systems, and the memory layer AI needs.
Yggdrasil - Branching Protocols
A protocol stack that brings Git-like branching to any storage system.
The Git Model for Databases
Copy-on-write, structural sharing, and branching - applied to your data.
Why Search Needs Versioning
Immutable search indexes for reproducible retrieval and systems that can explain themselves.
In production
Used by developers and government agencies who need data they can trust.
"Datahike is a foundational part of the stub story - going from a rough prototype all the way to finding product-market fit, generating revenue, and raising capital. It's been a critical part of our journey, and if I had to do this all again, you best believe I'd use Datahike again."
The Swedish Public Employment Service has used Datahike in production since 2024 for the JobTech Taxonomy - 40,000+ labour market concepts (occupations, skills, education standards) accessed daily by thousands of caseworkers. In their evaluation, Datahike performed competitively in a benchmark against Datomic.
Heidelberg University built emotrack on Datahike - a longitudinal emotion tracking application for psychological research, capturing and querying time-series self-report data across study participants.
Get started
Runs on the JVM. Distributed via Clojars.
Maven - add Clojars repository and dependency:
<!-- Enable Clojars -->
<repository>
  <id>clojars</id>
  <url>https://repo.clojars.org/</url>
</repository>

<!-- Datahike dependency -->
<dependency>
  <groupId>org.replikativ</groupId>
  <artifactId>datahike</artifactId>
  <version>LATEST</version>
</dependency>
Clojure CLI, Leiningen, Gradle, and JavaScript: see the README on GitHub. Or try it in your browser →
Work with us
If you need help getting Datahike into production, we can help with integration, custom development, and support contracts.