Collaborate Without Infrastructure
The standard way to share data between teams involves a pipeline: extract, transform, load. Or an API: design the contract, implement the server, manage versioning. Either way, there is infrastructure to build and maintain before any query can happen.
Datahike works differently because of two properties that compound.
Databases are values. When you dereference a Datahike connection, you get a database value - an immutable snapshot frozen at a point in time. It does not change. You can pass it to a function, store it in a variable, serialize it, or hand it to a colleague. It behaves like data, not like a service.
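A small sketch makes the value semantics concrete (the in-memory backend, schema-on-read setting, and attribute names here are illustrative, not prescribed):

```clojure
(require '[datahike.api :as d])

;; Illustrative config: in-memory store, schema-on-read for brevity.
(def cfg {:store {:backend :mem :id "example"}
          :schema-flexibility :read})
(d/create-database cfg)
(def conn (d/connect cfg))

(def snapshot @conn)                     ; immutable value, frozen now
(d/transact conn [{:user/name "Ada"}])

;; snapshot is unchanged; only a fresh @conn sees the new datom.
(d/q '[:find ?n :where [_ :user/name ?n]] snapshot)
(d/q '[:find ?n :where [_ :user/name ?n]] @conn)
```

The first query against `snapshot` returns an empty set, the second against `@conn` finds `"Ada"`: the value captured before the transaction never changes.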
Readers connect directly to storage. Datahike’s Distributed Index Space means that to read a database, you only need access to its storage backend - an S3 bucket, a filesystem path, a JDBC connection. No server process to start, no connection protocol to negotiate, no port to expose.
Together: if you can access the storage, you can query the database. And Datalog’s native multi-database join lets you combine databases from different sources in a single expression.
Cross-org join in practice
Team A maintains a product catalog. Team B maintains inventory. Both use Datahike on S3. A third team can combine their data directly:
(def catalog (d/connect {:store {:backend :s3 :bucket "team-a"}}))
(def inventory (d/connect {:store {:backend :s3 :bucket "team-b"}}))
;; Datalog's :in clause accepts multiple database values
(d/q '[:find ?name ?price ?stock
       :in $cat $inv
       :where [$cat ?p :product/sku ?sku]
              [$cat ?p :product/name ?name]
              [$cat ?p :product/price ?price]
              [$inv ?i :stock/sku ?sku]
              [$inv ?i :stock/count ?stock]
              [(> ?stock 0)]]
     @catalog @inventory)
@catalog and @inventory dereference to immutable database values. Datahike joins them locally. No network round-trips for reads, no server coordinating the join, no data copied between systems.
Time travel across databases
Because both databases are values, you can join historical snapshots just as easily:
;; Join last quarter's catalog against current inventory
(def old-catalog (d/as-of @catalog #inst "2025-11-01"))
(d/q '[:find ?name ?stock
       :in $cat $inv
       :where [$cat ?p :product/sku ?sku]
              [$cat ?p :product/name ?name]
              [$inv ?i :stock/sku ?sku]
              [$inv ?i :stock/count ?stock]]
     old-catalog @inventory)
Useful for audits, regulatory reproducibility, and debugging: “what would this report have shown against last quarter’s data?”
Multi-tenant patterns
The same model supports efficient multi-tenant architectures. Instead of one database with a tenant-id column - with the performance and isolation problems that brings - you create one database per tenant:
;; Full isolation per tenant, no schema pollution
(defn tenant-conn [tenant-id]
  (d/connect {:store {:backend :s3 :bucket (str "tenant-" tenant-id)}}))
;; Cross-tenant analytics (admin): Datalog sources can't be destructured
;; from collection bindings, so query each tenant's database and
;; aggregate the results - join as needed, fully explicit.
(defn completed-orders [db]
  (or (ffirst (d/q '[:find (count ?order)
                     :where [?order :order/status :completed]]
                   db))
      0))

(def orders-per-tenant
  (into {}
        (map (fn [id] [id (completed-orders @(tenant-conn id))])
             tenant-ids)))
Each tenant’s data is isolated. Cross-tenant queries are opt-in and explicit. Tenants can share a read-only reference database (e.g., a product catalog) without data duplication - they connect to the same storage.
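As a sketch of the shared-catalog pattern (the bucket name, tenant id, and attribute names are illustrative), the reference database is just another value in a join:

```clojure
;; Every tenant connects to the same catalog storage - no copy made.
(def shared-catalog
  (d/connect {:store {:backend :s3 :bucket "shared-catalog"}}))

;; Join one tenant's orders against the shared read-only catalog.
(d/q '[:find ?order ?product-name
       :in $tenant $catalog
       :where [$tenant ?order :order/sku ?sku]
              [$catalog ?p :product/sku ?sku]
              [$catalog ?p :product/name ?product-name]]
     @(tenant-conn "acme")
     @shared-catalog)
```

Because the catalog is read directly from storage, adding a tenant adds no load to whoever maintains it.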
What this replaces
The traditional alternatives all add infrastructure. Federated SQL needs database links and foreign server setup, often with write access on both sides. ETL pipelines duplicate data, introduce staleness, and need ongoing maintenance. API contracts mean versioned endpoints and schema negotiation. Data mesh adds a governance framework on top.
None of those are wrong. But for teams that both use Datahike and share storage access, this infrastructure simply does not exist - there is nothing to build.
What you need
Read access to the storage backend. That’s it. The database you are joining does not need a running server, an exposed port, or any configuration changes on the owning team’s side.
Write operations go through a single writer endpoint (HTTP or Kabel WebSocket), but reading - including cross-database joining - is purely local and requires no coordination.
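A read-mostly client might combine both halves along these lines. The `:writer` keys shown here follow Datahike's distributed-writer configuration, but the exact values (URL, token) are placeholders - treat this as an illustrative sketch and consult the configuration docs for the authoritative keys:

```clojure
;; Illustrative: reads go straight to storage, writes route to the
;; single writer endpoint.
(def cfg
  {:store  {:backend :s3 :bucket "team-a"}
   :writer {:backend :datahike-server
            :url     "https://writer.example.com"
            :token   "secret"}})

(def conn (d/connect cfg))

@conn                                    ; read: local, no coordination
(d/transact conn [{:product/sku "A-1"}]) ; write: forwarded to the writer
```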
The same model extends to browsers. Using Kabel WebSocket sync and konserve-sync, a browser client can replicate a database locally into IndexedDB and query it with zero network round-trips - updates sync differentially as they happen, transmitting only the changed chunks. See the JavaScript API docs for the full setup.
If you are building data infrastructure where multiple teams or systems need to share and combine data, get in touch.