All you need is Docker — no git checkout, no build tools. The installer pulls
the pre-built images from Docker Hub, fetches the few runtime files Compose
needs into a ./datris directory, seeds a .env, and starts the full stack:
curl -fsSL https://get.datris.ai/install.sh | sh
It prompts for an AI provider key (Anthropic recommended; press Enter to use
OpenAI instead), then runs everything. On first start, vault-init seeds your
key into Vault. That’s it — the platform is running.
The install.sh installer is a POSIX shell script, so it runs on macOS and
Linux. On Windows, run it from a POSIX shell — WSL2 (recommended) or
Git Bash — or skip the installer and use the single-file Compose option
below, which works natively in PowerShell.
Alternatively, use the fully self-contained single file (no installer): the init
scripts and config are inlined, so nothing else is needed (requires Docker Compose
≥ 2.23). This is the simplest path on Windows:
curl -O https://get.datris.ai/docker-compose.standalone.yml
ANTHROPIC_API_KEY=sk-ant-... docker compose -f docker-compose.standalone.yml up -d
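In PowerShell, use curl.exe (in Windows PowerShell the built-in curl alias maps to
Invoke-WebRequest and takes different flags), and set the key with $env: before
running Compose (for example, $env:ANTHROPIC_API_KEY = "sk-ant-..."), since the
inline KEY=value command prefix is POSIX shell syntax that PowerShell does not support.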
If you’d rather work from a checked-out repository — for example to track the
source, customize docker-compose.yml, or contribute — clone the repo and start
from the checked-out compose file:
Upgrading means two things: pulling the latest pre-built images and
refreshing the Compose file itself, since a new release can rename, add, or drop
services. How you do it depends on how you installed. In every case your data is
preserved: it lives in Docker volumes that survive an upgrade.

Installer (the curl … | sh Quick Start): just re-run the install command.
It re-downloads the latest compose file and runtime scripts, leaves your .env
and data untouched, pulls the new images, and restarts.
curl -fsSL https://get.datris.ai/install.sh | sh
Single-file (docker-compose.standalone.yml): re-download the file (it
carries the latest topology and inlined scripts), then pull and restart.
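The single-file upgrade described above might be sketched as follows. This is a minimal sketch rather than an official script: the download URL and file name are taken from the Quick Start, --remove-orphans follows the upgrade advice in this guide, and the run helper, the DRY_RUN switch, and the STANDALONE_FILE variable are illustrative names that do not come from Datris. It also assumes your provider key is still available to Compose in the same way as at install time.

```shell
# Sketch of a single-file upgrade. Set DRY_RUN=1 to print the commands
# instead of running them (run and DRY_RUN are illustrative helpers).
STANDALONE_FILE=docker-compose.standalone.yml

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_standalone() {
  # 1. Re-download the compose file (latest topology and inlined scripts).
  run curl -fsSLO "https://get.datris.ai/$STANDALONE_FILE" || return 1
  # 2. Pull the new pre-built images.
  run docker compose -f "$STANDALONE_FILE" pull || return 1
  # 3. Restart, removing containers for services no longer in the file.
  run docker compose -f "$STANDALONE_FILE" up -d --remove-orphans
}
```

Running DRY_RUN=1 upgrade_standalone first prints the three commands so you can review them before running upgrade_standalone for real from the directory that holds the standalone file.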
From source (cloned repo): pull the latest compose file from git, then pull
images and restart.
cd datris-platform-oss
git pull origin main
docker compose pull
docker compose up -d --remove-orphans
No build tools required for any of these — the images come pre-built from Docker Hub.
Always pass --remove-orphans when upgrading manually. (The installer does this for you.) Releases occasionally rename, remove, or repurpose service blocks in docker-compose.yml (e.g. v1.6.15 replaced the bundled Ollama service with TEI on the same host port 11434). Without --remove-orphans, the previous version’s container keeps running and holding the port, causing the new container to fail with Bind for 0.0.0.0:<port> failed: port is already allocated. The flag is safe — it only removes containers that are no longer defined in your current compose file. Volumes (and your data) are untouched.
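If an upgrade has already failed with the bind error, a small helper like the one below can show which container is still holding the port. It is a sketch under stated assumptions: the function names are illustrative, the error text is matched in the IPv4 form quoted above, and the lookup relies on the publish filter of docker ps.

```shell
# Extract the host port from Docker's "port is already allocated" error.
# Prints nothing if the message does not match the IPv4 form.
port_from_bind_error() {
  printf '%s\n' "$1" |
    sed -n 's/.*Bind for [0-9.]*:\([0-9][0-9]*\) failed: port is already allocated.*/\1/p'
}

# List running containers that publish the given host port.
# (Requires a running Docker daemon; not called automatically.)
containers_on_port() {
  docker ps --filter "publish=$1" --format '{{.Names}}\t{{.Ports}}'
}
```

For example, containers_on_port "$(port_from_bind_error "$err")" lists the leftover container, where err holds the error message. Re-running docker compose up -d --remove-orphans then removes it if its service is no longer defined in the compose file.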
Your data is preserved across upgrades — Postgres pipelines, Vault secrets, MinIO buckets, MongoDB collections, and Kafka state all live in Docker volumes that survive docker compose pull and docker compose up -d.
Datris occasionally deprecates secret paths between versions (for example, v1.5.6 split AI configuration into three new Vault slots and stopped reading the old single-slot path). Because Vault data persists across upgrades, deprecated entries stay in Vault and continue to appear in the Configuration → Secrets list even though the server ignores them. To clean them up:
Targeted (preserves your data): delete the stale entries one at a time from Configuration → Secrets in the UI, or run docker compose exec -e VAULT_TOKEN=root-token vault vault kv delete secret/oss/<name> for each deprecated path.
Total reset (destroys all data): see the next section.
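For the targeted cleanup, the per-path command can be wrapped in a loop. The sketch below assumes the same root-token and secret/oss/ prefix shown above; the secret names in STALE_SECRETS are hypothetical placeholders (not real Datris paths), and run and DRY_RUN are illustrative helpers for previewing the commands.

```shell
# Hypothetical names: replace with the deprecated entries you see under
# Configuration -> Secrets. These are NOT real Datris secret paths.
STALE_SECRETS="old-example-one old-example-two"

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

delete_stale_secrets() {
  for name in $STALE_SECRETS; do
    # -T disables TTY allocation, which suits non-interactive loops.
    run docker compose exec -T -e VAULT_TOKEN=root-token vault \
      vault kv delete "secret/oss/$name" || return 1
  done
}
```

Running DRY_RUN=1 delete_stale_secrets prints one delete command per name, which makes it easy to check the list before deleting anything.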
If you don’t care about anything on this machine and want a completely fresh install — same data layout as a brand-new clone — wipe all volumes:
cd datris # your install dir: ./datris (installer), the standalone file's dir, or your clone
docker compose down -v
docker compose up -d
Warning: docker compose down -v is destructive. The -v flag removes every Docker volume the project owns (both named volumes and the anonymous volumes the data services create automatically). You will lose:
All your pipelines, runs, taps, and metadata (Postgres datris database)
All Vault secrets — API keys, database credentials, AI configuration (will be re-seeded from your .env on next start, but only the defaults — any UI-edited overrides are gone)
All MinIO object storage — raw uploads, configs, temp files, processed outputs
MongoDB destination data
Queued messages and offsets in Kafka, Zookeeper, ActiveMQ
The bundled bge-m3 embedding model in the tei-data volume (will re-download ~2.2 GB on first start)
Cached pip wheels in the pip-cache volume (taps that need extras like yfinance will re-download on first run)
Only run down -v if you are certain none of the above matters, e.g. on a brand-new dev machine, after exporting anything you needed, or on a CI runner. Never run down -v on a production or shared instance.

After docker compose up -d, vault-init.sh re-seeds the AI configuration secrets from your .env, MinIO buckets are recreated, Postgres starts empty, and the bundled embedding service re-downloads bge-m3 (a few minutes one-time).
Note for production deployments: the deploy/docker-compose.prod.yml file used by managed/dedicated installs uses bind mounts to host directories under /data/* instead of Docker volumes, so docker compose -f docker-compose.prod.yml down -v does not wipe the data — the host directories survive. To reset a prod install, you’d need to also delete the relevant /data/* directories on the host, which is a much riskier operation and not recommended outside of disaster recovery.
The datris service uses one named Docker volume that is created automatically by docker compose up — no user action required:
| Volume | Mounted at | Purpose |
|---|---|---|
| pip-cache | /root/.cache/pip | Caches downloaded pip wheel files so tap-installed packages don’t have to be re-downloaded |
Commonly used packages (requests, beautifulsoup4, pandas, lxml, feedparser, boto3, google-cloud-storage, azure-storage-blob, openpyxl, pyyaml, python-dateutil, pytz) are baked into the image. When a tap needs something extra (e.g. yfinance), pip downloads it on first run (~30 seconds) and caches the wheel in pip-cache. Subsequent container restarts re-run pip install for those extras, but the install is near-instant because the wheel is already cached locally.
| Service | Port | Purpose |
|---|---|---|
| Kafka | | Streaming; optional, disabled by default (uncomment in docker-compose.yml to run) |
| Zookeeper | 2181 | Kafka coordination; optional, disabled by default (uncomment with Kafka) |
| Kafka UI | 8085 | Kafka topic browser; optional, disabled by default (uncomment with Kafka) |
Kafka, Zookeeper, and Kafka UI ship commented out in docker-compose.yml. Datris bundles kafka-clients, so pipelines that produce to or consume from an external Kafka work without these. Uncomment the Kafka/Zookeeper/kafka-ui service blocks (and their volumes) only if you want a local broker.
Datris supports three AI providers. Set your keys in .env:
| Provider | Environment Variable | Used For |
|---|---|---|
| Anthropic Claude | ANTHROPIC_API_KEY | AI data quality, transformations, error explanation, schema generation, profiling |
| OpenAI | OPENAI_API_KEY | Same as above, plus embeddings for vector database / RAG |
| Ollama (local) | OLLAMA_MODEL | Same as above; no API key needed, runs locally |
At least one AI provider key is required for AI features. The embedding provider for RAG is configured via Vault secrets — see AI Configuration for details.
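As an illustration, a .env that selects Anthropic might look like the fragment below. The variable names are the ones listed above; every value is a placeholder, and the OLLAMA_MODEL value in particular is a hypothetical model name, not a recommendation.

```shell
# .env (all values are placeholders)
ANTHROPIC_API_KEY=sk-ant-...

# Alternatives: uncomment the one you want to use instead.
# OPENAI_API_KEY=sk-...
# OLLAMA_MODEL=your-model-name
```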
On first boot, the vault-init container seeds Vault with default secrets for the bundled services (MinIO, ActiveMQ, MongoDB, PostgreSQL) plus your AI provider API keys from .env. Vault uses durable file storage on the vault-data volume, so secrets — including any you add later in the Configuration tab or via taps — persist across restarts and rebuilds. .env is the first-boot seed only; after that the Configuration tab is the source of truth. See How Configuration Persists for details and the clean-reset path.
The pipeline server reads configuration from application.yaml, mounted from docker/config/application.yaml. See Configuration Reference for the full list of properties.
The datris service runs a Spring Boot JVM. Its heap is governed by the JAVA_OPTS environment variable, passed in via docker-compose.yml. The default is sized to fit comfortably on an 8 GB host alongside the bundled TEI embedder, Postgres, MongoDB, MinIO, ActiveMQ, Vault, the UI, and the MCP server:
JAVA_OPTS=-Xms512m -Xmx2g # default
Override in your .env file on larger hosts. Suggested sizings:
| Host RAM | Suggested -Xmx |
|---|---|
| 16 GB | -Xmx4g |
| 24 GB | -Xmx8g |
| 32 GB+ | -Xmx12g |
Example for a 24 GB host:
# .env
JAVA_OPTS=-Xms1g -Xmx8g
Then restart the service:
docker compose up -d datris
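The sizing table above can be expressed as a small lookup, shown here as a sketch. The function name suggest_xmx is illustrative, and treating each table row as a lower bound for in-between sizes (for example 20 GB) is an assumption rather than documented guidance; hosts below 16 GB keep the documented default.

```shell
# Map host RAM (whole GB) to the suggested -Xmx from the sizing table.
# Rows are treated as lower bounds (an assumption); below 16 GB the
# documented default of -Xmx2g is kept.
suggest_xmx() {
  ram_gb=$1
  if [ "$ram_gb" -ge 32 ]; then
    echo "-Xmx12g"
  elif [ "$ram_gb" -ge 24 ]; then
    echo "-Xmx8g"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "-Xmx4g"
  else
    echo "-Xmx2g"
  fi
}
```

For instance, suggest_xmx 24 prints -Xmx8g, which matches the 24 GB example above.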
Docker Desktop note (macOS / Windows): Docker Desktop runs containers inside a Linux VM with its own RAM allocation, which may be much lower than your host’s total RAM. Check the VM ceiling with:
docker info --format '{{.MemTotal}}' | awk '{printf "%.1f GB\n", $1/1024/1024/1024}'
If the VM is capped below what you need, raise it in Docker Desktop → Settings → Resources → Memory. The JVM -Xmx must be smaller than the Docker VM’s allocation, otherwise the kernel inside the VM will OOM-kill containers under load.
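The rule that -Xmx must stay below the Docker VM's allocation can be checked before restarting. The following is a minimal sketch: size_to_bytes and xmx_fits are illustrative names, only the k, m, and g suffixes are handled, and passing the check is necessary rather than sufficient, because the other bundled services also need memory.

```shell
# Convert a JVM size such as 8g, 512m, or 2048k to bytes.
# A value without a suffix is returned unchanged (taken as bytes).
size_to_bytes() {
  num=${1%?}
  case $1 in
    *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
    *[mM]) echo $((num * 1024 * 1024)) ;;
    *[kK]) echo $((num * 1024)) ;;
    *)     echo "$1" ;;
  esac
}

# Succeed only when -Xmx is strictly below the memory available to Docker.
#   $1: total bytes, e.g. from: docker info --format '{{.MemTotal}}'
#   $2: heap size, written as "-Xmx8g" or "8g"
xmx_fits() {
  mem_total_bytes=$1
  xmx=${2#-Xmx}
  [ "$(size_to_bytes "$xmx")" -lt "$mem_total_bytes" ]
}
```

A possible use is xmx_fits "$(docker info --format '{{.MemTotal}}')" -Xmx8g || echo "raise the Docker Desktop memory limit or lower -Xmx".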