Installation

Prerequisites

Docker

Quick Start

1. Clone the repository

git clone https://github.com/datris/datris-platform-oss.git
cd datris-platform-oss

2. Set your API keys

cp .env.example .env

Edit .env and add your API keys (at least one required for AI features):

ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-proj-...
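
Before starting the stack, it can help to confirm that .env really contains a non-empty value for one of the two keys above. The following is a minimal sketch: the function name has_ai_key is hypothetical and not part of Datris, and it only checks that a value is present, not that the key is valid.

```shell
# has_ai_key FILE
# Succeeds if FILE sets ANTHROPIC_API_KEY or OPENAI_API_KEY to a non-empty value.
# Hypothetical helper for illustration; it is not shipped with Datris.
# An optional opening quote is allowed, so KEY="" and KEY= both count as empty.
has_ai_key() {
  grep -Eq "^(ANTHROPIC_API_KEY|OPENAI_API_KEY)=[\"']?[^\"'[:space:]]" "$1"
}
```

For example, has_ai_key .env || echo "no AI key set in .env" >&2 prints a reminder when both values are still blank.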

3. Start all services

docker compose up -d

Docker pulls the pre-built images from Docker Hub and starts the full stack. On first run, vault-init automatically seeds your API keys into Vault.

4. Verify

curl http://localhost:8080/api/v1/version
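
The API may not answer immediately after docker compose up -d returns, so a single curl can fail even on a healthy install. One way to handle this is a small retry wrapper. This is a sketch only: retry is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary choices, not values taken from Datris.

```shell
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on the first success, 1 if every try fails.
# Hypothetical helper; variables are global because POSIX sh has no "local".
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep only if another attempt remains.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Usage: retry 30 2 curl -fsS http://localhost:8080/api/v1/version tries for roughly a minute. The -f flag makes curl exit non-zero on an HTTP error status, which is what lets the loop detect a not-yet-ready server.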

If the request returns a version response, the platform is running.

Upgrading

If you already have Datris installed and want to upgrade to the latest version:

cd datris-platform-oss
git pull origin main
docker compose pull
docker compose up -d --remove-orphans

This pulls the latest pre-built images from Docker Hub and restarts the services. No build tools required.

Always pass --remove-orphans when upgrading. Releases occasionally rename, remove, or repurpose service blocks in docker-compose.yml (e.g. v1.6.15 replaced the bundled Ollama service with TEI on the same host port 11434). Without --remove-orphans, the previous version’s container keeps running and holding the port, causing the new container to fail with Bind for 0.0.0.0:<port> failed: port is already allocated. The flag is safe — it only removes containers that are no longer defined in your current compose file. Volumes (and your data) are untouched.
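
If you do hit the port error, the port number in the message tells you which leftover container to look for. The sketch below pulls the port out of an error line of the form quoted above; port_from_bind_error is a hypothetical name, and the pattern assumes the message wording shown in this section.

```shell
# port_from_bind_error MESSAGE
# Prints the port number found in a "Bind for <address>:<port> failed" message.
# Prints nothing if the message does not contain that pattern.
# Hypothetical helper for illustration only.
port_from_bind_error() {
  printf '%s\n' "$1" | sed -n 's/.*Bind for [0-9.]*:\([0-9][0-9]*\) failed.*/\1/p'
}
```

With the port in hand, docker ps --filter "publish=11434" (substituting your port) lists the running containers that publish it, which should show the orphan from the previous version.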

Your data is preserved across upgrades — Postgres pipelines, Vault secrets, MinIO buckets, MongoDB collections, and Kafka state all live in Docker volumes that survive docker compose pull and docker compose up -d.

Stale secrets after an upgrade

Datris occasionally deprecates secret paths between versions (for example, v1.5.6 split AI configuration into three new Vault slots and stopped reading the old single-slot path). Because Vault data persists across upgrades, deprecated entries stay in Vault and continue to appear in the Secrets tab even though the server ignores them. To clean them up:

Targeted (preserves your data): delete the stale entries one at a time from the Secrets tab UI, or run docker compose exec -e VAULT_TOKEN=root-token vault vault kv delete secret/oss/<name> for each deprecated path.
Total reset (destroys all data): see the next section.
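
When several deprecated paths need the targeted cleanup, generating the delete commands first and reviewing them is a cautious approach. The sketch below only prints commands and does not run them. The function name print_stale_secret_deletes is hypothetical, and the secret names in the usage line are placeholders, not real Datris paths.

```shell
# print_stale_secret_deletes NAME...
# Prints one "vault kv delete" command per secret name so the list can be
# reviewed before anything is deleted. Nothing is executed here.
# Hypothetical helper; the command template is the one given in this section.
print_stale_secret_deletes() {
  for name in "$@"; do
    printf 'docker compose exec -e VAULT_TOKEN=root-token vault vault kv delete secret/oss/%s\n' "$name"
  done
}
```

Usage: print_stale_secret_deletes old-name-a old-name-b prints two commands. Check each path against the Secrets tab, then run the ones you want by hand.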

Reset everything (destroys all data)

If you don’t care about anything on this machine and want a completely fresh install — same data layout as a brand-new clone — wipe all volumes:

cd datris-platform-oss
docker compose down -v
docker compose up -d

Warning: docker compose down -v is destructive. The -v flag removes every Docker volume the project owns (both named volumes and the anonymous volumes the data services create automatically). You will lose:

All your pipelines, runs, taps, and metadata (Postgres datris database)
All Vault secrets — API keys, database credentials, AI configuration (will be re-seeded from your .env on next start, but only the defaults — any UI-edited overrides are gone)
All MinIO object storage — raw uploads, configs, temp files, processed outputs
MongoDB destination data
Queued messages and offsets in Kafka, Zookeeper, ActiveMQ
The bundled bge-m3 embedding model in the tei-data volume (will re-download ~2.2 GB on first start)
Cached pip wheels in the pip-cache volume (taps that need extras like yfinance will re-download on first run)

Only run down -v if you are certain none of the above matters, for example on a brand-new dev machine, after exporting anything you need, or on a CI runner. Never run down -v on a production or shared instance. After docker compose up -d, vault-init.sh re-seeds the AI configuration secrets from your .env, MinIO buckets are recreated, Postgres starts empty, and the bundled embedding service re-downloads bge-m3 (a one-time download that takes a few minutes).
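
Because the reset cannot be undone, it may be worth putting a typed confirmation in front of it in any script that wraps these commands. This is a minimal sketch: confirm_reset is a hypothetical helper, and the confirmation phrase is an arbitrary choice.

```shell
# confirm_reset
# Reads one line from standard input and succeeds only if it is exactly
# "destroy all data". The prompt goes to stderr so it never mixes with output.
# Hypothetical helper; the phrase is an arbitrary choice for this sketch.
confirm_reset() {
  answer=
  printf 'Type "destroy all data" to wipe every volume: ' >&2
  IFS= read -r answer || true   # tolerate EOF; answer stays empty
  [ "$answer" = "destroy all data" ]
}
```

Usage: confirm_reset && docker compose down -v && docker compose up -d. Anything other than the exact phrase, including an empty line or closed input, leaves the volumes alone.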

Note for production deployments: the deploy/docker-compose.prod.yml file used by managed/dedicated installs uses bind mounts to host directories under /data/* instead of Docker volumes, so docker compose -f docker-compose.prod.yml down -v does not wipe the data: the host directories survive. To reset a prod install, you would also need to delete the relevant /data/* directories on the host, which is a much riskier operation and is not recommended outside of disaster recovery.

Volumes

The datris service uses one named Docker volume that is created automatically by docker compose up — no user action required:

Volume	Mounted at	Purpose
`pip-cache`	`/root/.cache/pip`	Caches downloaded pip wheel files so tap-installed packages don’t have to be re-downloaded

Commonly used packages (requests, beautifulsoup4, pandas, lxml, feedparser, boto3, google-cloud-storage, azure-storage-blob, openpyxl, pyyaml, python-dateutil, pytz) are baked into the image. When a tap needs something extra (e.g. yfinance), pip downloads it on first run (~30 seconds) and caches the wheel in pip-cache. Subsequent container restarts re-run pip install for those extras, but the install is near-instant because the wheel is already cached locally.

Services

Service	Port	Purpose
Pipeline Server	8080	REST API and data processing
Pipeline UI	4200	Web dashboard
MCP Server	3000	AI agent integration (MCP protocol)
MinIO	9000 (API), 9001 (Console)	Object storage
MongoDB	27017	Configuration and status store
ActiveMQ	61616 (broker), 8161 (console)	Message queue and notifications
Vault	8200	Secrets management
Kafka	9092	Streaming (optional)
Kafka UI	8085	Kafka topic browser
PostgreSQL	5432	Database destination + pgvector
Zookeeper	2181	Kafka coordination

Web UIs

UI	URL	Credentials
Datris Platform UI	http://localhost:4200	none
Datris Platform API	http://localhost:8080	none
MCP Server (SSE)	http://localhost:3000/sse	none
MinIO Console	http://localhost:9001	`minioadmin` / `minioadmin`
ActiveMQ Console	http://localhost:8161	`admin` / `admin`
Kafka UI	http://localhost:8085	none
Vault UI	http://localhost:8200	Token: `root-token`

API Keys and AI Providers

Datris supports three AI providers. Set your keys in .env:

Provider	Environment Variable	Used For
Anthropic Claude	`ANTHROPIC_API_KEY`	AI data quality, transformations, error explanation, schema generation, profiling
OpenAI	`OPENAI_API_KEY`	Same as above, plus embeddings for vector database / RAG
Ollama (local)	`OLLAMA_MODEL`	Same as above — no API key needed, runs locally

At least one AI provider key is required for AI features. The embedding provider for RAG is configured via Vault secrets — see AI Configuration for details.

Infrastructure Details

MinIO

The minio-init container automatically creates the required buckets:

{env}-raw - File upload staging
{env}-raw-plus - Processed file staging
{env}-temp - Temporary processing files
{env}-data - Pipeline output (object store destination)
{env}-config - Configuration files (validation schemas)

Here, {env} is the environment name (default: oss). See Configuration Reference for the environment setting.
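
The naming pattern above can be expanded mechanically, which is handy when checking a non-default environment. A minimal sketch follows; bucket_names is a hypothetical helper that only prints names and does not talk to MinIO.

```shell
# bucket_names [ENV]
# Prints the five bucket names for the given environment name, one per line.
# ENV defaults to "oss", the default environment named in this section.
# Hypothetical helper for illustration only.
bucket_names() {
  env_name=${1:-oss}
  for suffix in raw raw-plus temp data config; do
    printf '%s-%s\n' "$env_name" "$suffix"
  done
}
```

Usage: bucket_names prints oss-raw through oss-config, and bucket_names dev prints the same five names with a dev- prefix. The list can be compared with what the MinIO Console shows.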

Vault

The vault-init container seeds Vault with default secrets for all services (MinIO, ActiveMQ, MongoDB, PostgreSQL, Kafka) plus your AI provider API keys from .env. Vault runs in dev mode with root token root-token.

Vector Databases

pgvector is included by default via the PostgreSQL service. To add other vector databases, uncomment the relevant sections in docker-compose.yml:

Qdrant — high-performance vector database
Weaviate — open-source vector database
Chroma — lightweight, single container
Milvus — scalable vector database (requires separate setup)

Configuration

The pipeline server reads configuration from application.yaml, mounted from docker/config/application.yaml. See Configuration Reference for the full list of properties.

Building from Source

To build from source for development or to contribute:

Prerequisites

Java 17+
SBT
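
To check the Java requirement before building, the major version can be read from the first line of java -version. The sketch below is one way to parse it, assuming the usual version "..." format in that line; java_major is a hypothetical helper.

```shell
# java_major VERSION_LINE
# Prints the major version found in a line such as
#   openjdk version "17.0.9" 2023-10-17
# It handles the legacy "1.8.0_292" scheme as well as the modern "17.0.9" one.
# Hypothetical helper for illustration only.
java_major() {
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case $v in
    1.*) v=${v#1.} ;;            # legacy scheme: 1.8.0_292 becomes 8.0_292
  esac
  printf '%s\n' "${v%%[!0-9]*}"  # keep only the leading digits
}
```

Usage: [ "$(java_major "$(java -version 2>&1 | head -n 1)")" -ge 17 ] succeeds on Java 17 or newer. The 2>&1 is needed because java -version writes to standard error.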

Build and run

# Build the server JAR
sbt clean assembly

# Start with local builds (edit docker-compose.yml to uncomment build: lines)
docker compose up --build

In docker-compose.yml, uncomment the build: lines and comment out the image: lines for the services you want to build locally:

datris:
  # image: datris/datris-server:latest
  build: .  # Build from source
