
Prerequisites

Quick Start

All you need is Docker — no git checkout, no build tools. The installer pulls the pre-built images from Docker Hub, fetches the few runtime files Compose needs into a ./datris directory, seeds a .env, and starts the full stack:
curl -fsSL https://get.datris.ai/install.sh | sh
It prompts for an AI provider key (Anthropic recommended; press Enter to use OpenAI instead), then runs everything. On first start, vault-init seeds your key into Vault. That’s it — the platform is running.
The install.sh installer is a POSIX shell script, so it runs on macOS and Linux. On Windows, run it from a POSIX shell — WSL2 (recommended) or Git Bash — or skip the installer and use the single-file Compose option below, which works natively in PowerShell.
Alternatively, use the fully self-contained single-file Compose option (no installer). The init scripts and config are inlined, so nothing else is needed; it requires Docker Compose 2.23 or later. This is the simplest path on Windows:
curl -O https://get.datris.ai/docker-compose.standalone.yml
ANTHROPIC_API_KEY=sk-ant-... docker compose -f docker-compose.standalone.yml up -d
In PowerShell, call curl.exe explicitly (the bundled curl alias maps to Invoke-WebRequest, which takes different flags), and set the key first with $env:ANTHROPIC_API_KEY = "sk-ant-...", since the inline KEY=value prefix syntax works only in POSIX shells such as bash.

Verify

curl http://localhost:8080/api/v1/version
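Right after docker compose up -d the containers may still be starting, so a single curl can fail even though the install is fine. A small polling loop avoids that false negative. This is a sketch: the function name wait_for_datris, the 30-attempt limit, and the 2-second delay are illustrative assumptions, not part of Datris.

```shell
#!/bin/sh
# Poll the Datris version endpoint until it responds or the attempt budget runs out.
# wait_for_datris, the default of 30 attempts, and the 2 s delay are hypothetical choices.
wait_for_datris() {
  url="${1:-http://localhost:8080/api/v1/version}"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors (4xx/5xx), not only on connection errors
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "Datris API is up at $url"
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep 2
    fi
    i=$((i + 1))
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

Usage: wait_for_datris && curl http://localhost:8080/api/v1/version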

Alternative: install from a git clone

If you’d rather work from a checked-out repository — for example to track the source, customize docker-compose.yml, or contribute — clone the repo and start from the checked-out compose file:
git clone https://github.com/datris/datris-platform-oss.git
cd datris-platform-oss
cp .env.example .env       # add ANTHROPIC_API_KEY and/or OPENAI_API_KEY
docker compose up -d
This still pulls the pre-built images from Docker Hub — no build tools required. To build the images yourself, see Building from Source below.

Upgrading

Upgrading means two things: pulling the latest pre-built images and refreshing the Compose file itself, since a new release can rename, add, or drop services. How you do it depends on how you installed. In every case your data is preserved, because it lives in Docker volumes that survive an upgrade.
Installer (the curl … | sh Quick Start): just re-run the install command. It re-downloads the latest compose file and runtime scripts, leaves your .env and data untouched, pulls the new images, and restarts.
curl -fsSL https://get.datris.ai/install.sh | sh
Single-file (docker-compose.standalone.yml): re-download the file (it carries the latest topology and inlined scripts), then pull and restart.
curl -O https://get.datris.ai/docker-compose.standalone.yml
docker compose -f docker-compose.standalone.yml pull
docker compose -f docker-compose.standalone.yml up -d --remove-orphans
Git clone (see Alternative: install from a git clone above): pull the latest compose file from git, then pull images and restart.
cd datris-platform-oss
git pull origin main
docker compose pull
docker compose up -d --remove-orphans
No build tools required for any of these — the images come pre-built from Docker Hub.
Always pass --remove-orphans when upgrading manually. (The installer does this for you.) Releases occasionally rename, remove, or repurpose service blocks in docker-compose.yml (e.g. v1.6.15 replaced the bundled Ollama service with TEI on the same host port 11434). Without --remove-orphans, the previous version’s container keeps running and holding the port, causing the new container to fail with Bind for 0.0.0.0:<port> failed: port is already allocated. The flag is safe — it only removes containers that are no longer defined in your current compose file. Volumes (and your data) are untouched.
Your data is preserved across upgrades — Postgres pipelines, Vault secrets, MinIO buckets, MongoDB collections, and Kafka state all live in Docker volumes that survive docker compose pull and docker compose up -d.
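If you upgrade manually often, the pull-then-restart sequence can be wrapped in a small function so --remove-orphans is never forgotten. This is a hypothetical helper, not shipped with Datris. It assumes you run it after re-downloading the single file or running git pull in your clone, from the directory that holds the compose file. The DOCKER variable exists only so the commands can be dry-run (DOCKER=echo prints them instead of executing them).

```shell
#!/bin/sh
# Hypothetical helper: pull new images and restart, always removing orphaned containers.
# Usage: upgrade_datris [compose-file]   (defaults to docker-compose.yml)
# Set DOCKER=echo for a dry run that only prints the commands.
upgrade_datris() {
  compose_file="${1:-docker-compose.yml}"
  docker_cmd="${DOCKER:-docker}"
  if [ ! -f "$compose_file" ]; then
    echo "Compose file not found: $compose_file" >&2
    return 1
  fi
  # Pull first so the restart picks up the new images
  "$docker_cmd" compose -f "$compose_file" pull || return 1
  # --remove-orphans stops containers for services the new file no longer defines
  "$docker_cmd" compose -f "$compose_file" up -d --remove-orphans
}
```

Usage for the single-file install: upgrade_datris docker-compose.standalone.yml. The installer path does not need this, since it already passes --remove-orphans.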

Stale secrets after an upgrade

Datris occasionally deprecates secret paths between versions (for example, v1.5.6 split AI configuration into three new Vault slots and stopped reading the old single-slot path). Because Vault data persists across upgrades, deprecated entries stay in Vault and continue to appear in the Configuration → Secrets list even though the server ignores them. To clean them up:
  • Targeted (preserves your data): delete the stale entries one at a time from Configuration → Secrets in the UI, or run docker compose exec -e VAULT_TOKEN=root-token vault vault kv delete secret/oss/<name> for each deprecated path.
  • Total reset (destroys all data): see the next section.
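For the targeted cleanup, it helps to list what is actually stored under the oss path before deleting anything. The sketch below wraps the documented vault kv delete command and the standard vault kv list command. The function names are hypothetical, and the names you pass must be the deprecated paths for your version (this page does not enumerate them). It adds -T to docker compose exec so the loop also works non-interactively, and DOCKER=echo gives a dry run.

```shell
#!/bin/sh
# Hypothetical helpers around the Vault CLI inside the vault container.
# VAULT_TOKEN=root-token matches the default token shown on this page.
# Set DOCKER=echo to print the commands instead of running them.
list_datris_secrets() {
  "${DOCKER:-docker}" compose exec -T -e VAULT_TOKEN=root-token vault \
    vault kv list secret/oss
}

delete_datris_secrets() {
  if [ "$#" -eq 0 ]; then
    echo "usage: delete_datris_secrets <name> [<name> ...]" >&2
    return 1
  fi
  for name in "$@"; do
    # -T: no TTY allocation, so this works from scripts and CI
    "${DOCKER:-docker}" compose exec -T -e VAULT_TOKEN=root-token vault \
      vault kv delete "secret/oss/$name" || return 1
  done
}
```

Run list_datris_secrets first, then delete_datris_secrets <name> for each entry you have confirmed is deprecated.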

Reset everything (destroys all data)

If you don’t care about anything on this machine and want a completely fresh install — same data layout as a brand-new clone — wipe all volumes:
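cd datris            # your install dir: ./datris (installer) or your clone
docker compose down -v
docker compose up -d
# Single-file install: run from that file's directory and name the file explicitly
# docker compose -f docker-compose.standalone.yml down -v
# docker compose -f docker-compose.standalone.yml up -d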
Warning: docker compose down -v is destructive. The -v flag removes every Docker volume the project owns (both named volumes and the anonymous volumes the data services create automatically). You will lose:
  • All your pipelines, runs, taps, and metadata (Postgres datris database)
  • All Vault secrets — API keys, database credentials, AI configuration (will be re-seeded from your .env on next start, but only the defaults — any UI-edited overrides are gone)
  • All MinIO object storage — raw uploads, configs, temp files, processed outputs
  • MongoDB destination data
  • Queued messages and offsets in Kafka, Zookeeper, ActiveMQ
  • The bundled bge-m3 embedding model in the tei-data volume (will re-download ~2.2 GB on first start)
  • Cached pip wheels in the pip-cache volume (taps that need extras like yfinance will re-download on first run)
Only run down -v if you are certain none of the above matters, e.g. on a brand-new dev machine, after exporting anything you needed, or on a CI runner. Never run down -v on a production or shared instance. After docker compose up -d, vault-init.sh re-seeds the AI configuration secrets from your .env, MinIO buckets are recreated, Postgres starts empty, and the bundled embedding service re-downloads bge-m3 (a few minutes one-time).
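Because down -v cannot be undone, you might wrap the reset in a confirmation prompt, as in this hypothetical sketch. It assumes you run it from the install directory, passes any extra arguments (such as -f docker-compose.standalone.yml) straight to docker compose, and refuses anything other than a literal yes. DOCKER=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical guarded reset: wipes all project volumes only after typing "yes".
# Extra arguments (e.g. -f docker-compose.standalone.yml) are passed to docker compose.
# Set DOCKER=echo to print the commands instead of running them.
reset_datris() {
  printf 'This deletes ALL Datris data (pipelines, secrets, buckets). Type yes to continue: ' >&2
  read -r answer || answer=""
  if [ "$answer" != "yes" ]; then
    echo "Aborted; nothing was deleted." >&2
    return 1
  fi
  "${DOCKER:-docker}" compose "$@" down -v || return 1
  "${DOCKER:-docker}" compose "$@" up -d
}
```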
Note for production deployments: the deploy/docker-compose.prod.yml file used by managed/dedicated installs uses bind mounts to host directories under /data/* instead of Docker volumes, so docker compose -f docker-compose.prod.yml down -v does not wipe the data — the host directories survive. To reset a prod install, you’d need to also delete the relevant /data/* directories on the host, which is a much riskier operation and not recommended outside of disaster recovery.

Volumes

The datris service uses one named Docker volume that is created automatically by docker compose up — no user action required:
| Volume | Mounted at | Purpose |
|---|---|---|
| pip-cache | /root/.cache/pip | Caches downloaded pip wheel files so tap-installed packages don't have to be re-downloaded |
Commonly used packages (requests, beautifulsoup4, pandas, lxml, feedparser, boto3, google-cloud-storage, azure-storage-blob, openpyxl, pyyaml, python-dateutil, pytz) are baked into the image. When a tap needs something extra (e.g. yfinance), pip downloads it on first run (~30 seconds) and caches the wheel in pip-cache. Subsequent container restarts re-run pip install for those extras, but the install is near-instant because the wheel is already cached locally.

Services

| Service | Port | Purpose |
|---|---|---|
| Pipeline Server | 8080 | REST API and data processing |
| Pipeline UI | 4200 | Web dashboard |
| MCP Server | 3000 | AI agent integration (MCP protocol) |
| MinIO | 9000 (API), 9001 (Console) | Object storage |
| MongoDB | 27017 | Configuration and status store |
| ActiveMQ | 61616 (broker), 8161 (console) | Message queue and notifications |
| Vault | 8200 | Secrets management |
| PostgreSQL | 5432 | Database destination + pgvector |
| Kafka | 9092 | Streaming (optional, disabled by default; uncomment in docker-compose.yml to run) |
| Zookeeper | 2181 | Kafka coordination (optional, disabled by default; uncomment with Kafka) |
| Kafka UI | 8085 | Kafka topic browser (optional, disabled by default; uncomment with Kafka) |
Kafka, Zookeeper, and Kafka UI ship commented out in docker-compose.yml. Datris bundles kafka-clients, so pipelines that produce to or consume from an external Kafka work without these. Uncomment the Kafka/Zookeeper/kafka-ui service blocks (and their volumes) only if you want a local broker.

Web UIs

| UI | URL | Credentials |
|---|---|---|
| Datris Platform UI | http://localhost:4200 | none |
| Datris Platform API | http://localhost:8080 | none |
| MCP Server (SSE) | http://localhost:3000/sse | none |
| MinIO Console | http://localhost:9001 | minioadmin / minioadmin |
| ActiveMQ Console | http://localhost:8161 | admin / admin |
| Kafka UI (optional) | http://localhost:8085 | none (available only when the optional Kafka services are uncommented in docker-compose.yml) |
| Vault UI | http://localhost:8200 | Token: root-token |
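To confirm the endpoints in the table are reachable, you could loop over them with curl and print each HTTP status code. This is a sketch: check_datris_uis is a hypothetical name, the URLs come from the table (using the /api/v1/version path from the Verify step for the API), and the optional Kafka UI is left out. A code of 000 means no connection; redirects (3xx) or a 401 from the password-protected ActiveMQ console still mean the service answered. --max-time matters because the MCP /sse endpoint is a long-lived stream that would otherwise keep curl waiting.

```shell
#!/bin/sh
# Print the HTTP status of each Datris web endpoint listed in the table.
# 000 = no connection; anything else means the service answered.
check_datris_uis() {
  for url in \
    http://localhost:4200 \
    http://localhost:8080/api/v1/version \
    http://localhost:3000/sse \
    http://localhost:9001 \
    http://localhost:8161 \
    http://localhost:8200
  do
    # -o /dev/null discards the body; -w prints only the status code
    code=$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "$url" || true)
    echo "$url ${code:-000}"
  done
}
```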

API Keys and AI Providers

Datris supports three AI providers. Set your API keys (or, for Ollama, the model name) in .env:

| Provider | Environment Variable | Used For |
|---|---|---|
| Anthropic Claude | ANTHROPIC_API_KEY | AI data quality, transformations, error explanation, schema generation, profiling |
| OpenAI | OPENAI_API_KEY | Same as above, plus embeddings for vector database / RAG |
| Ollama (local) | OLLAMA_MODEL | Same as above; no API key needed, runs locally |

At least one AI provider must be configured to use the AI features. The embedding provider for RAG is configured via Vault secrets; see AI Configuration for details.
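A quick way to confirm, before starting the stack, that .env configures at least one provider is to look for a non-empty value of any of the three variables in the provider table. This hypothetical check reads only .env itself, so it cannot see values you later change in the Configuration tab, which becomes the source of truth after first boot.

```shell
#!/bin/sh
# Hypothetical pre-flight check: does .env set at least one AI provider variable?
# Matches e.g. ANTHROPIC_API_KEY=sk-ant-... with a non-empty (optionally quoted) value;
# commented-out lines and empty values such as KEY= or KEY="" do not count.
has_ai_provider() {
  env_file="${1:-.env}"
  [ -f "$env_file" ] || { echo "missing $env_file" >&2; return 1; }
  if grep -Eq "^[[:space:]]*(ANTHROPIC_API_KEY|OPENAI_API_KEY|OLLAMA_MODEL)=(\"[^\"]|'[^']|[^[:space:]#\"'])" "$env_file"; then
    return 0
  fi
  echo "No AI provider configured in $env_file" >&2
  return 1
}
```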

Infrastructure Details

MinIO

The minio-init container automatically creates the required buckets:
  • {env}-raw - File upload staging
  • {env}-raw-plus - Processed file staging
  • {env}-temp - Temporary processing files
  • {env}-data - Pipeline output (object store destination)
  • {env}-config - Configuration files (validation schemas)
Where {env} is the environment name (default: oss). See Configuration Reference for the environment setting.
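Because every bucket name is the environment name plus a fixed suffix, the full set for a given {env} is easy to derive, for example when checking the MinIO Console. The function below only expands the five suffixes listed above; its name is hypothetical, and oss is the default environment named on this page.

```shell
#!/bin/sh
# Expand the five minio-init bucket names for an environment (default: oss).
datris_buckets() {
  env_name="${1:-oss}"
  for suffix in raw raw-plus temp data config; do
    echo "${env_name}-${suffix}"
  done
}
```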

Vault

On first boot, the vault-init container seeds Vault with default secrets for the bundled services (MinIO, ActiveMQ, MongoDB, PostgreSQL) plus your AI provider API keys from .env. Vault uses durable file storage on the vault-data volume, so secrets — including any you add later in the Configuration tab or via taps — persist across restarts and rebuilds. .env is the first-boot seed only; after that the Configuration tab is the source of truth. See How Configuration Persists for details and the clean-reset path.

Vector Databases

pgvector is included by default via the PostgreSQL service. To add other vector databases, uncomment the relevant sections in docker-compose.yml:
  • Qdrant — high-performance vector database
  • Weaviate — open-source vector database
  • Chroma — lightweight, single container
  • Milvus — scalable vector database (requires separate setup)

Configuration

The pipeline server reads configuration from application.yaml, mounted from docker/config/application.yaml. See Configuration Reference for the full list of properties.

JVM Heap Sizing

The datris service runs a Spring Boot JVM. Its heap is governed by the JAVA_OPTS environment variable, passed in via docker-compose.yml. The default is sized to fit comfortably on an 8 GB host alongside the bundled TEI embedder, Postgres, MongoDB, MinIO, ActiveMQ, Vault, the UI, and the MCP server:
JAVA_OPTS=-Xms512m -Xmx2g   # default
Override in your .env file on larger hosts. Suggested sizings:
| Host RAM | Suggested -Xmx |
|---|---|
| 16 GB | -Xmx4g |
| 24 GB | -Xmx8g |
| 32 GB+ | -Xmx12g |
Example for a 24 GB host:
# .env
JAVA_OPTS=-Xms1g -Xmx8g
Then restart the service:
docker compose up -d datris
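If you script host setup, the sizing table can be turned into a small lookup. Treating each row as a lower bound, and keeping the 2 GB default below 16 GB, is an interpretation of the table rather than an official rule; suggest_xmx is a hypothetical name.

```shell
#!/bin/sh
# Map host RAM (whole GB) to the suggested -Xmx from the sizing table.
# Rows are treated as lower bounds; below 16 GB the -Xmx2g default is kept.
suggest_xmx() {
  ram_gb="$1"
  case "$ram_gb" in
    ''|*[!0-9]*) echo "usage: suggest_xmx <host-ram-in-GB>" >&2; return 1 ;;
  esac
  # printf instead of echo, since the output starts with a dash
  if [ "$ram_gb" -ge 32 ]; then printf '%s\n' "-Xmx12g"
  elif [ "$ram_gb" -ge 24 ]; then printf '%s\n' "-Xmx8g"
  elif [ "$ram_gb" -ge 16 ]; then printf '%s\n' "-Xmx4g"
  else printf '%s\n' "-Xmx2g"
  fi
}
```

For example, suggest_xmx 24 prints -Xmx8g, matching the 24 GB .env example.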
Docker Desktop note (macOS / Windows): Docker Desktop runs containers inside a Linux VM with its own RAM allocation, which may be much lower than your host’s total RAM. Check the VM ceiling with:
docker info --format '{{.MemTotal}}' | awk '{printf "%.1f GB\n", $1/1024/1024/1024}'
If the VM is capped below what you need, raise it in Docker Desktop → Settings → Resources → Memory. The JVM -Xmx must be smaller than the Docker VM's allocation, with room to spare for the other containers; otherwise the kernel inside the VM can OOM-kill containers under load.
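To catch that mismatch early, you can compare the -Xmx value against the MemTotal that docker info reports. The sketch below parses a -Xmx flag with an optional k, m, or g suffix into bytes and requires it to stay below a chosen percentage of the VM memory. The function name and the 50% default are assumptions (the default merely leaves headroom for the other containers and is consistent with the table above), not a Datris recommendation.

```shell
#!/bin/sh
# Hypothetical check: is the JVM -Xmx below max_pct percent of the Docker VM memory?
# Usage: xmx_fits <xmx-flag> <vm-bytes> [max_pct]
#   e.g. xmx_fits -Xmx8g "$(docker info --format '{{.MemTotal}}')"
# Returns 0 if it fits, 1 if it does not, 2 on bad input.
xmx_fits() {
  flag="$1"; vm_bytes="$2"; max_pct="${3:-50}"
  size="${flag#-Xmx}"          # strip the -Xmx prefix, e.g. 8g
  num="${size%[kKmMgG]}"       # numeric part without the unit suffix
  case "$num" in ''|*[!0-9]*) echo "cannot parse $flag" >&2; return 2 ;; esac
  case "$vm_bytes" in ''|*[!0-9]*) echo "bad VM size: $vm_bytes" >&2; return 2 ;; esac
  case "$size" in
    *[gG]) bytes=$((num * 1024 * 1024 * 1024)) ;;
    *[mM]) bytes=$((num * 1024 * 1024)) ;;
    *[kK]) bytes=$((num * 1024)) ;;
    *)     bytes=$num ;;
  esac
  # Compare bytes*100 with vm_bytes*max_pct to avoid fractional arithmetic
  [ $((bytes * 100)) -lt $((vm_bytes * max_pct)) ]
}
```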

Building from Source

For development or contributing:

Prerequisites

Build and run

# Build the server JAR
sbt clean assembly

# Start with local builds (edit docker-compose.yml to uncomment build: lines)
docker compose up --build
In docker-compose.yml, uncomment the build: lines and comment out the image: lines for the services you want to build locally:
datris:
  # image: datris/datris-server:latest
  build: .  # Build from source