Datahike Speaks Postgres
May 2026
Open psql. Connect. Run a query. Switch branches. Run it again — same connection, same wire protocol, different version of the database.
$ psql postgresql://localhost:5432/inventory
inventory=> SELECT count(*) FROM widget;
count
-------
4218
inventory=> SET datahike.branch = 'pricing-experiment';
SET
inventory=> SELECT count(*) FROM widget;
count
-------
4221
inventory=> RESET datahike.branch;
RESET
That’s not a feature toggle on a Postgres replica. It’s the same database — addressed through standard pgwire — viewed through two different commits. The implementation is pg-datahike, a beta we’re shipping today.
What it is
pg-datahike embeds a PostgreSQL-compatible adapter inside a Datahike process: wire protocol, SQL translator, virtual pg_* and information_schema catalogs, constraint enforcement, schema hints. Clients that speak Postgres talk to Datahike without a Postgres install — pgjdbc, Hibernate, SQLAlchemy, Odoo 19, and Metabase bootstrap unmodified against it. The migration path is round-trippable: pg_dump output replays into pg-datahike via psql, and the standalone jar dumps Datahike databases back out as portable PG SQL. Detailed test results at the end of this post.
A 60-second tour
The operator runs one jar. Everything else is psql.
$ java -jar pg-datahike-VERSION-standalone.jar
pg-datahike VERSION ready on 127.0.0.1:5432
backend: file (~/.local/share/pg-datahike)
history: off
CREATE DATABASE: enabled
databases: ["datahike"]
Connect with: psql -h 127.0.0.1 -p 5432 -U datahike datahike
Press Ctrl+C to stop.
JDK 17+ is the only prerequisite; the jar is on GitHub releases. --memory for an ephemeral run; --help covers the rest.
The rest is psql — provision a fresh database, populate it, pin a session to a historical commit, drop it.
$ psql postgresql://localhost:5432/datahike
datahike=> CREATE DATABASE inventory;
CREATE DATABASE
datahike=> \c inventory
You are now connected to database "inventory".
inventory=> CREATE TABLE widget (sku TEXT PRIMARY KEY, weight INT);
CREATE TABLE
inventory=> INSERT INTO widget VALUES ('A', 10), ('B', 20);
INSERT 0 2
inventory=> SELECT datahike.commit_id();
commit_id
---------------------------------------
b4f2e1c0-2feb-5b61-be14-5590b9e01e48 ← copy this
inventory=> INSERT INTO widget VALUES ('C', 30);
INSERT 0 1
inventory=> SELECT count(*) FROM widget;
count
-------
3
inventory=> SET datahike.commit_id = 'b4f2e1c0-2feb-5b61-be14-5590b9e01e48';
SET
inventory=> SELECT count(*) FROM widget; -- the database before the third insert
count
-------
2
inventory=> RESET datahike.commit_id;
RESET
inventory=> \c datahike
datahike=> DROP DATABASE inventory;
DROP DATABASE
SET datahike.commit_id pins the session to a historical commit; everything else is plain Postgres. Sixty seconds, one jar, no Postgres install, no Clojure.
Architecture in one minute
What happens when you SET datahike.branch = 'feature'?
Datahike stores its database as a tree of immutable nodes in konserve, a key-value abstraction over filesystems, S3, JDBC, IndexedDB, and others. Every transaction writes new nodes for changed paths and shares unchanged subtrees with the previous version — the trick behind Clojure’s persistent vectors and Git’s object store. A commit is a small map listing the root pointers for each index; a branch is a named pointer at a commit.
So on SET datahike.branch = 'feature', the handler updates a session variable, and the next query loads that branch’s commit pointer from konserve, walks the tree, returns rows. No coordination with a transactor; storage is the source of truth. SET datahike.commit_id = '<uuid>' works the same way one level deeper — the session points at a specific commit instead of a branch head.
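As a mental model (a deliberate simplification, not Datahike's actual konserve key layout), the pointer structure and the two lookups reduce to plain map access:

;; Simplified mental model; illustrative only, not the real storage layout.
(def store
  {:branches {"db"      #uuid "b4f2e1c0-2feb-5b61-be14-5590b9e01e48"
              "feature" #uuid "69ea6ee1-2feb-5b61-be14-5590b9e01e48"}
   :commits  {#uuid "69ea6ee1-2feb-5b61-be14-5590b9e01e48"
              {:parent #uuid "b4f2e1c0-2feb-5b61-be14-5590b9e01e48"
               :roots  {:eavt "node-17" :aevt "node-18" :avet "node-19"}}}})

;; SET datahike.branch = 'feature' resolves one pointer ...
(get-in store [:branches "feature"])

;; ... and SET datahike.commit_id = '<uuid>' indexes :commits directly,
;; skipping the branch lookup.
(get-in store [:commits #uuid "69ea6ee1-2feb-5b61-be14-5590b9e01e48"])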
Two consequences worth flagging:
- Branching is one konserve write. Creating a branch from any commit is constant time, regardless of database size, because structural sharing means the new branch points at existing nodes.
- Reads don’t go through a transactor. Every node is content-addressable; any process that can read the storage can run queries against it. In principle, read fanout is bounded by storage bandwidth, not replica capacity — we’ll publish numbers in a follow-up. See Memory That Collaborates for more.
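In practice, a second JVM can open the same store and query it without coordinating with the writer. A sketch, with a hypothetical file path (single-writer discipline still applies to transactions):

(require '[datahike.api :as d])

;; Hypothetical path; any konserve backend the reader can reach works.
(def cfg {:store {:backend :file :path "/var/lib/pg-datahike/inventory"}})
(def reader-conn (d/connect cfg))

;; Queries walk the immutable tree straight from storage.
(d/q '[:find (count ?e) .
       :where [?e :widget/sku _]]
     @reader-conn)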
Integration patterns
1. Multi-database server
A single start-server call serves many Datahike connections. Clients route on the JDBC URL’s database name:
(pg/start-server {"prod" prod-conn
"staging" staging-conn
"reports" reports-conn}
{:port 5432})
Same shape on the standalone jar with repeatable --db flags: java -jar pg-datahike.jar --db prod --db staging --db reports.
jdbc:postgresql://localhost:5432/prod → prod-conn
jdbc:postgresql://localhost:5432/staging → staging-conn
jdbc:postgresql://localhost:5432/nonsuch → 3D000 invalid_catalog_name
SELECT current_database() returns the connected name; pg_database enumerates the registry. Useful for multi-tenant deployments, or when ops wants one pgwire endpoint serving many independent stores.
2. Schema hints
Existing Datahike schemas don’t always look the way you’d want them to over SQL. :datahike.pg/* meta-attributes customize the SQL view without touching the underlying schema:
(pg/set-hint! conn :person/full_name {:column "name"}) ; rename the column
(pg/set-hint! conn :person/ssn {:hidden true}) ; exclude from SQL
(pg/set-hint! conn :person/company {:references :company/id}) ; FK target
After set-hint!, SELECT name FROM person works, ssn is invisible to SELECT * and information_schema.columns, and JOIN company c ON p.company = c.id resolves on Datahike’s native ref semantics.
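A quick way to sanity-check a hint without a wire round-trip is the embedded handler covered later in this post (result shapes here are sketched, not exact):

(def h (pg/make-query-handler conn))

(.execute h "SELECT * FROM person")
;; => columns id, name; ssn does not appear
(.execute h "SELECT column_name FROM information_schema.columns WHERE table_name = 'person'")
;; => no row for ssn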
3. Time-travel via SET
Datahike’s temporal primitives are exposed as session variables. The client doesn’t need to know what as-of means — it just sets a variable:
SET datahike.as_of = '2024-01-15T00:00:00Z'; -- d/as-of
SET datahike.since = '2024-01-01T00:00:00Z'; -- d/since
SET datahike.history = 'true'; -- d/history
RESET datahike.as_of;
Every subsequent query in the session sees the chosen view. A reporting tool that doesn’t know about Datahike can produce point-in-time reports by setting one variable.
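For example, a JDBC caller, sketched here with next.jdbc (an ordinary Clojure JDBC wrapper, not part of pg-datahike), pins the session before running its report:

(require '[next.jdbc :as jdbc])

(def ds (jdbc/get-datasource "jdbc:postgresql://localhost:5432/inventory"))

(with-open [c (jdbc/get-connection ds)]
  ;; one session variable does the time travel
  (jdbc/execute! c ["SET datahike.as_of = '2024-01-15T00:00:00Z'"])
  ;; every subsequent query on this connection sees the point-in-time view
  (jdbc/execute! c ["SELECT count(*) FROM widget"]))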
4. Git-like branching
Branching is cheap in Datahike: every transaction produces a new immutable commit, so a branch is just a named pointer at a commit UUID. Creation is O(1) — one konserve write, no data copy, no WAL replay. pgwire exposes the read side and the admin operations through standard PG mechanisms:
-- Introspect
SELECT datahike.branches();
SELECT datahike.current_branch();
SELECT datahike.commit_id();
-- Admin (konserve-level writes — they don't go through the tx writer)
SELECT datahike.create_branch('preview', 'db'); -- 'db' is Datahike's default branch name
SELECT datahike.create_branch('from-cid', '69ea6ee1-…');
SELECT datahike.delete_branch('preview');
-- Session view: three cuts on the same immutable log.
-- They compose — a feature branch's state as of yesterday is two SETs.
SET datahike.branch = 'feature';
SET datahike.commit_id = '69ea6ee1-2feb-5b61-be14-5590b9e01e48';
SET datahike.as_of = '2024-01-15T00:00:00Z';
Or pin a branch at connect time via the JDBC URL:
jdbc:postgresql://localhost:5432/prod:feature → prod-conn, pinned to :feature
jdbc:postgresql://localhost:5432/prod → prod-conn, default branch
SET datahike.commit_id = '<uuid>' is, to our knowledge, unique to pg-datahike: we have not found another PG-compatible database that lets a session pin to an exact commit identifier.
We’ll cover the structural-sharing model that makes branching this cheap in a follow-up post — including how it works across all the Datahike bindings, not just pgwire.
5. SQL-driven database provisioning
Set a :database-template on the server and pgwire clients self-provision and tear down databases over plain SQL. The template is a partial Datahike config; each CREATE DATABASE produces a fresh store with a generated UUID:
(pg/start-server {"datahike" boot-conn}
{:port 5432
:database-template {:store {:backend :memory}
:schema-flexibility :write
:keep-history? true}})
WITH clauses override the template per-database, and the SQL surface accepts both the Postgres-style key = value form and the Yugabyte-style parenthesized form:
CREATE DATABASE myapp; -- inherits the template
CREATE DATABASE histdb WITH KEEP_HISTORY = true; -- override per database
CREATE DATABASE memdb WITH (BACKEND = 'memory', -- Yugabyte-style paren form
INDEX = 'persistent-set');
DROP DATABASE myapp;
DROP DATABASE IF EXISTS old_one;
Accepted WITH keys map case-insensitively to Datahike config:
| WITH option | Datahike config | Notes |
|---|---|---|
| BACKEND | [:store :backend] | 'memory', 'file' built-in; 'jdbc', 's3', 'redis', 'lmdb', 'rocksdb', 'dynamodb' via external konserve libraries |
| STORE_ID | [:store :id] | Defaults to a fresh UUID per CREATE |
| PATH | [:store :path] | File backend; {{name}} interpolation supported |
| HOST / PORT / USER / PASSWORD / DBNAME | [:store :*] | jdbc / redis backends |
| SCHEMA_FLEXIBILITY | :schema-flexibility | 'read' or 'write' |
| KEEP_HISTORY | :keep-history? | |
| INDEX | :index | 'persistent-set' → :datahike.index/persistent-set |
| OWNER / TEMPLATE / ENCODING / LOCALE / TABLESPACE / … | — | Postgres-only; silently accepted with a NOTICE so pg_dump round-trips work |
The standalone jar enables this by default (use --no-create-database to disable). Embedded servers opt in via :database-template (or explicit :on-create-database / :on-delete-database hooks). Without one, CREATE / DROP DATABASE return SQLSTATE 0A000 feature_not_supported; mismatched preconditions return the standard PG SQLSTATEs.
Migrating from PostgreSQL
Wire compatibility extends to pg_dump SQL on both sides. Three workflows.
Real PostgreSQL → pg-datahike
pg_dump output replays straight into pg-datahike via psql or any JDBC client. Schema-side coverage: CREATE TABLE with FK constraints, CREATE SEQUENCE, DEFAULT nextval(…), CREATE TYPE … AS ENUM, CREATE DOMAIN, partitioned tables. Data-side: INSERT (single + multi-VALUES) and COPY … FROM stdin (text and CSV).
Run with the :pg-dump compat preset to silently accept constructs pg-datahike doesn’t model — triggers, functions, materialized views, ALTER OWNER:
java -jar pg-datahike.jar --compat pg-dump
psql -h localhost -p 5432 -U datahike -d datahike -f my_pg_dump.sql
Validated end-to-end against Chinook (15.6k rows, 11 tables, FKs, NUMERIC, TIMESTAMP) — full byte-identical bidirectional roundtrip — and Pagila (50k rows, 22 tables, ENUM, DOMAIN, partitioning, triggers, functions) — schema parses end-to-end, data loads.
pg-datahike → portable PG SQL
The standalone jar’s dump subcommand walks a Datahike database and emits pg_dump-shaped SQL. The output replays into either pg-datahike or real PostgreSQL via psql:
java -jar pg-datahike.jar dump --data-dir DIR --db NAME --out out.sql
java -jar pg-datahike.jar dump --config datahike-config.edn --copy
Flags cover INSERT-vs-COPY output, schema-only / data-only, and table exclusion. --config accepts a full Datahike config EDN, so any konserve backend works; store-id is auto-discovered.
What the resulting Datahike schema looks like
A native Datahike database — created with d/transact, never touched by SQL — also dumps as clean PG SQL. The inverse mapping is well-defined:
- :db.unique/identity → PRIMARY KEY NOT NULL
- :db.unique/value → UNIQUE
- :db.cardinality/many T → T[] with PG array literals
- :db.type/ref → bigint (the entity id; opt in to FK constraints with set-hint! :references)
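For instance, with illustrative attribute names (the emitted DDL in the comment is approximate):

;; Native Datahike schema; attribute names are illustrative.
(d/transact conn
  [{:db/ident :article/slug :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one :db/unique :db.unique/identity}
   {:db/ident :article/tag :db/valueType :db.type/string
    :db/cardinality :db.cardinality/many}])

;; dump emits roughly:
;;   CREATE TABLE article (
;;     slug TEXT PRIMARY KEY NOT NULL,
;;     tag TEXT[]
;;   );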
So whether you start from a real PostgreSQL dump or from native Datahike, both sides translate cleanly through the same shape. The resulting schema is correct and queryable as both SQL relations and Datalog datoms. It isn’t always what you’d hand-design for entity-shaped Datalog queries — many apps stay with the relational shape, others evolve incrementally as they reach for Datalog’s strengths (pull patterns, rules, multi-source joins).
What it isn’t
This is a 0.1 beta and we want to be specific about the gaps:
- PL/pgSQL, stored functions, triggers, rules, and materialized views are accepted under the :pg-dump compat preset (loaded but not executed); strict mode rejects them
- No LISTEN/NOTIFY
- No COPY … TO STDOUT (COPY … FROM stdin is supported in text and CSV formats)
- FK ON DELETE enforced for NO ACTION / RESTRICT / CASCADE; SET NULL / SET DEFAULT and any ON UPDATE action are rejected at DDL
- Single public schema — CREATE SCHEMA is silently accepted but a no-op
- Cursor materialization is eager (entire result set held in memory)
- No deferrable constraints
- Generated columns parse but aren’t enforced
- Writes always land on the connection’s default branch in 0.1, even when SET datahike.branch is active. Reads respect the pinned branch; writes don’t yet. Use datahike.versioning/branch! and merge! from Clojure for branch-targeted writes, or open a second connection on /<db>:<branch>.
- Constraint enforcement is one-directional. SQL constraints declared via DDL (NOT NULL, CHECK, UNIQUE, FK RESTRICT) are enforced by the pgwire handler; direct (d/transact) writes from Clojure bypass them because Datahike’s schema doesn’t yet carry the constraint vocabulary. A future release will lift enforcement into the tx layer so both paths are gated.
- Bulk-insert throughput is ~5,000 rows/sec on JDBC batch (Pagila replays in ~12 s, Chinook in ~3 s) — Datahike maintains EAVT/AEVT/AVET live, so a 10-column row costs ~10× a single index write. Tuned bulk paths in vanilla PG (COPY, pg_restore -j) are an order of magnitude faster, partly via deferred index construction; an analogous bulk-load fast path is a future item. Large migrations are overnight-cutover territory today.
The conformance posture is: pass for the workloads we’ve measured against, fail fast and loud everywhere else. We’d rather reject a stored procedure than execute it incorrectly.
Where this fits
If you’ve used Neon or Xata, the goal will look familiar — branchable Postgres. The mechanism is different. Their branches are control-plane operations: call the API, get a new compute instance over copy-on-write storage. pg-datahike’s branches are session-level — SET datahike.branch = 'feature' inside an open psql connection switches what you’re reading. No provisioning, no compute. An agent or a query planner can switch branches mid-session.
Commit pinning — SET datahike.commit_id = '<uuid>' — is the part where we don’t know of a peer. Neon’s time-travel is bounded by a 6h–1d restore window; pg-datahike pins to any historical commit, indefinitely. We have not seen another PG-compatible database expose this directly through the wire protocol.
Dolt is the closest in spirit — git-like semantics, commit pinning, time-travel — but Dolt is MySQL with a custom storage engine. pg-datahike rides on the standard Postgres wire protocol; every PG client works without modification.
The honest tradeoff: we are a compatibility layer over Datahike’s storage, not a fork of Postgres. Some features tied to the Postgres codebase — PL/pgSQL, the extension ecosystem, procedural languages — aren’t on our roadmap today. If you need those, use Postgres. If your bottleneck is versioning, branching, or reproducibility, this gets you there without leaving the wire protocol your tools already speak.
Datahike has been a Datalog database with a Clojure API and growing language bindings; pg-datahike isn’t a separate database, just another front end on the same store. There’s a sibling: Stratum, a SIMD-accelerated columnar engine that speaks the same wire protocol over an analytical column store with the same fork-as-pointer semantics. Both fit into a shared branching model — see Yggdrasil: Branching Protocols for how a Datahike database, a Stratum dataset, and a vector index can fork together at a single snapshot.
The rest of this post is for callers who do speak Clojure — the same data accessible as relations and as datoms, in-process queries that skip the wire, embedded mode without TCP, and configuration knobs that aren’t exposed over SQL.
Bidirectional view
The pgwire layer is a view onto Datahike’s datom store, not a separate representation. Tables you create over SQL show up as normal Datahike schemas, queryable from Clojure with (d/q …). Existing Datahike schemas show up as SQL tables with no setup.
;; Plain Datahike schema, transacted from Clojure
(d/transact conn
[{:db/ident :person/id :db/valueType :db.type/long
:db/cardinality :db.cardinality/one :db/unique :db.unique/identity}
{:db/ident :person/name :db/valueType :db.type/string
:db/cardinality :db.cardinality/one}])
(d/transact conn [{:person/id 1 :person/name "Alice"}])
-- Same database, over psql:
SELECT * FROM person;
-- id | name
-- ----+-------
-- 1 | Alice
The reverse holds too — CREATE TABLE over pgwire transacts a normal Datahike schema, and the next (d/q …) from Clojure sees the rows you just inserted. There is no shadow representation, no separate metadata. One datom store, two query languages.
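Concretely, assuming the person table above was created and populated over pgwire, the Clojure side reads the same data back as datoms:

;; Attribute names follow the table/column mapping shown above.
(d/q '[:find ?id ?name
       :where
       [?e :person/id ?id]
       [?e :person/name ?name]]
     @conn)
;; => #{[1 "Alice"]}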
Using the library directly
Two ways to skip the standalone jar — start a server from your own JVM application, or bypass the wire layer entirely.
Start a server in-process
;; deps.edn
{:deps {org.replikativ/datahike {:mvn/version "LATEST"}
org.replikativ/pg-datahike {:mvn/version "LATEST"}}}
(require '[datahike.api :as d]
'[datahike.pg :as pg])
(let [boot {:store {:backend :memory :id (random-uuid)}
:schema-flexibility :write}]
(d/create-database boot)
(pg/start-server {"datahike" (d/connect boot)}
{:port 5432
:database-template {:store {:backend :memory}
:schema-flexibility :write
:keep-history? true}}))
;; => :running on :5432
Same pgwire surface, in-process. The integration patterns earlier in this post are the embedded-library API; the standalone jar wraps the same calls behind CLI flags.
Bypass the wire entirely
Tests and in-process applications don’t need the wire layer at all:
(def h (pg/make-query-handler conn))
(.execute h "CREATE TABLE person (id INT PRIMARY KEY, name TEXT)")
(.execute h "INSERT INTO person VALUES (1, 'Alice')")
(.execute h "SELECT * FROM person")
Same SQL surface, no socket. Useful for property-based testing of SQL workloads, or for embedding the SQL interface inside a Clojure or ClojureScript application without exposing a port.
Permissive vs. strict compat
By default the handler rejects unsupported DDL — GRANT, REVOKE, CREATE POLICY, ROW LEVEL SECURITY, CREATE EXTENSION, COPY — with SQLSTATE 0A000 feature_not_supported. Most ORMs emit some of these unconditionally. Two ways to relax:
;; silently accept every auth/RLS/extension no-op (Hibernate, Odoo)
(pg/make-query-handler conn {:compat :permissive})
;; accept specific kinds only
(pg/make-query-handler conn {:silently-accept #{:grant :policy}})
The named presets in datahike.pg.server/compat-presets cover the common ORM patterns.
SQL or Datalog?
Both interfaces see the same datoms, the same indexes, the same history. The choice is about how the query reaches the engine.
Reach for SQL when callers don’t share a runtime with the database — services over the wire, analysts in Metabase, tools that only speak the wire protocol — or when you want existing tooling: ORMs, migration runners, BI dashboards.
Reach for Datalog when the query runs in the same process as the database. Datahike’s Datalog API is a Clojure function: pass values in, get values out, no parsing, no serialization, no socket. Even pg-datahike’s embedded mode (the make-query-handler path shown above) still goes through the SQL parser and the translator; Datalog skips both. You can invoke arbitrary Clojure functions inside predicates, return live data structures without copying, and join across multiple databases on different storage backends in a single query.
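A sketch of both capabilities in one query: a plain Clojure predicate filters mid-query, and two independent databases join on a shared value (attribute names hypothetical):

;; clojure.string/starts-with? runs as an ordinary predicate; $inv and $crm
;; are two independent Datahike databases, possibly on different backends.
(d/q '[:find ?sku ?contact
       :in $inv $crm
       :where
       [$inv ?w :widget/sku ?sku]
       [(clojure.string/starts-with? ?sku "A")]
       [$crm ?s :supplier/sku ?sku]
       [$crm ?s :supplier/contact ?contact]]
     @inventory-conn @crm-conn)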
The two paths compose. DDL via Flyway over SQL, then reads in Datalog from your Clojure backend. Or: Datahike schema in Clojure, ORM-driven CRUD over SQL. Both stay coherent because they’re views of the same datom store.
Compatibility evidence
We test pg-datahike against the same suites the Postgres ecosystem uses on itself. If a suite passes here, the apps that depend on it generally work here.
| Layer | Test suite | Result | What this proves |
|---|---|---|---|
| JDBC driver | pgjdbc 42.7.5 — ResultSetTest | 80 / 80 | Cursors, type decoding, and metadata behave the way every JVM Postgres client expects. |
| Java ORM | Hibernate 6 — DatahikeHibernateTest | 13 / 13 | JPA stacks — Spring, Quarkus, Jakarta — talk to pg-datahike the same way they talk to Postgres. |
| Python ORM | SQLAlchemy 2.0 dialect | 16 / 16 across 7 phases | The Python data ecosystem — Django, Flask, FastAPI, Airflow, dbt — connects via the standard dialect path. |
| SQL semantics | sqllogictest | 779 assertions, 61 files | Cases derived from PostgreSQL's regression suite, expressed in the sqllogictest format SQLite, CockroachDB, and DuckDB use for their own correctness work. |
| Real application | Odoo 19 — --init=base --test-tags=:TestORM | 11 / 11 cases, ~38k queries, zero translator errors | A 200-table ERP with one of the most demanding open-source ORM layers boots and passes its own test suite. |
| BI tool | Metabase native SQL | 20-probe MBQL sweep | Schema introspection, prepared statements, and result handling work for the paths real BI tools depend on. |
| Migration roundtrip | Chinook + Pagila pg_dump fixtures | Chinook: byte-equal roundtrip. Pagila: schema parses, data loads. | A real Postgres database can be exported, replayed in pg-datahike, and dumped back — schema and data preserved through the round-trip. |
| Internal | Unit suite | 544 tests, 1603 assertions | Standard regression coverage. |
Per-commit suites run on CircleCI. Odoo, Metabase, and psql / libpq (\d, \dt, \df family) are run on a manual harness before each release. A dedicated compatibility page with linked test artifacts and a published gaps registry is in flight.
Try it
Download the jar from GitHub releases, java -jar pg-datahike-VERSION-standalone.jar, point psql at it. To embed in a JVM app, the coordinate is org.replikativ/pg-datahike on Clojars. Repo, docs, and issues at github.com/replikativ/pg-datahike; feedback to contact@datahike.io.
A follow-up post will cover the structural-sharing model that makes branching O(1), what merge! does, and the same workflow across every Datahike binding (Clojure, Java, JavaScript, Python, the C library, the CLI, and SQL). Subscribe to the RSS feed.