

Prerequisites

Git (to clone the repository) and Docker with the Compose plugin (every service runs as a container). No other local tooling is needed when using the pre-built images.

Quick Start

1. Clone the repository

git clone https://github.com/datris/datris-platform-oss.git
cd datris-platform-oss

2. Set your API keys

cp .env.example .env
Edit .env and add your API keys (at least one required for AI features):
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-proj-...

3. Start all services

docker compose up -d
Docker pulls the pre-built images from Docker Hub and starts the full stack. On first run, vault-init automatically seeds your API keys into Vault.
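If you want to watch the stack come up, standard Docker Compose commands work here (vault-init is the seeding container mentioned above):
docker compose ps                  # list services and their status
docker compose logs -f vault-init  # follow the one-time key seeding (Ctrl-C to stop)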

4. Verify

curl http://localhost:8080/api/v1/version
That’s it. The platform is running.
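On slower machines the server can take a little while to accept connections after docker compose up -d returns. A small poll loop against the same version endpoint (the 60-second ceiling is arbitrary) saves re-running the curl by hand:
# Wait up to ~60s for the API to come up
for i in $(seq 1 30); do
  curl -sf http://localhost:8080/api/v1/version && break
  sleep 2
done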

Upgrading

If you already have Datris installed and want to upgrade to the latest version:
cd datris-platform-oss
git pull origin main
docker compose pull
docker compose up -d --remove-orphans
This pulls the latest pre-built images from Docker Hub and restarts the services. No build tools required.
Always pass --remove-orphans when upgrading. Releases occasionally rename, remove, or repurpose service blocks in docker-compose.yml (e.g. v1.6.15 replaced the bundled Ollama service with TEI on the same host port 11434). Without --remove-orphans, the previous version’s container keeps running and holding the port, causing the new container to fail with Bind for 0.0.0.0:<port> failed: port is already allocated. The flag is safe — it only removes containers that are no longer defined in your current compose file. Volumes (and your data) are untouched.
Your data is preserved across upgrades — Postgres pipelines, Vault secrets, MinIO buckets, MongoDB collections, and Kafka state all live in Docker volumes that survive docker compose pull and docker compose up -d.
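If an upgrade was already run without --remove-orphans and a port is stuck, you can find the orphaned container by the port it holds and remove it by hand (11434 is just the example port from the note above):
docker ps --filter publish=11434   # identify the container holding the port
docker rm -f <container-name>      # remove it, then re-run docker compose up -d --remove-orphans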

Stale secrets after an upgrade

Datris occasionally deprecates secret paths between versions (for example, v1.5.6 split AI configuration into three new Vault slots and stopped reading the old single-slot path). Because Vault data persists across upgrades, deprecated entries stay in Vault and continue to appear in the Secrets tab even though the server ignores them. To clean them up:
  • Targeted (preserves your data): delete the stale entries one at a time from the Secrets tab UI, or run docker compose exec -e VAULT_TOKEN=root-token vault vault kv delete secret/oss/<name> for each deprecated path (a loop version is sketched after this list).
  • Total reset (destroys all data): see the next section.
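A sketch of the targeted cleanup as a loop; the secret names below are placeholders, not real Datris paths, so substitute whatever deprecated entries appear in your Secrets tab:
# Names are illustrative; use the deprecated paths you actually see
for name in old-ai-config stale-entry; do
  docker compose exec -e VAULT_TOKEN=root-token vault vault kv delete secret/oss/$name
done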

Reset everything (destroys all data)

If you don’t care about anything on this machine and want a completely fresh install — same data layout as a brand-new clone — wipe all volumes:
cd datris-platform-oss
docker compose down -v
docker compose up -d
Warning: docker compose down -v is destructive. The -v flag removes every Docker volume the project owns (both named volumes and the anonymous volumes the data services create automatically). You will lose:
  • All your pipelines, runs, taps, and metadata (Postgres datris database)
  • All Vault secrets — API keys, database credentials, AI configuration (will be re-seeded from your .env on next start, but only the defaults — any UI-edited overrides are gone)
  • All MinIO object storage — raw uploads, configs, temp files, processed outputs
  • MongoDB destination data
  • Queued messages and offsets in Kafka, Zookeeper, ActiveMQ
  • The bundled bge-m3 embedding model in the tei-data volume (will re-download ~2.2 GB on first start)
  • Cached pip wheels in the pip-cache volume (taps that need extras like yfinance will re-download on first run)
Only run down -v if you are certain none of the above matters, e.g. on a brand-new dev machine, after exporting anything you needed, or on a CI runner. Never run down -v on a production or shared instance. After docker compose up -d, vault-init.sh re-seeds the AI configuration secrets from your .env, MinIO buckets are recreated, Postgres starts empty, and the bundled embedding service re-downloads bge-m3 (a few minutes one-time).
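If anything in the list might still matter, export it first. A minimal sketch for the Postgres side, assuming the compose service is called postgres with the default postgres superuser (both names are assumptions; check your docker-compose.yml):
# Dump the datris database to the host before wiping volumes (-T avoids TTY noise in redirected output)
docker compose exec -T postgres pg_dump -U postgres datris > datris-backup.sql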
Note for production deployments: the deploy/docker-compose.prod.yml file used by managed/dedicated installs uses bind mounts to host directories under /data/* instead of Docker volumes, so docker compose -f docker-compose.prod.yml down -v does not wipe the data — the host directories survive. To reset a prod install, you’d need to also delete the relevant /data/* directories on the host, which is a much riskier operation and not recommended outside of disaster recovery.

Volumes

The datris service uses one named Docker volume that is created automatically by docker compose up — no user action required:
| Volume | Mounted at | Purpose |
| --- | --- | --- |
| pip-cache | /root/.cache/pip | Caches downloaded pip wheel files so tap-installed packages don't have to be re-downloaded |
Commonly used packages (requests, beautifulsoup4, pandas, lxml, feedparser, boto3, google-cloud-storage, azure-storage-blob, openpyxl, pyyaml, python-dateutil, pytz) are baked into the image. When a tap needs something extra (e.g. yfinance), pip downloads it on first run (~30 seconds) and caches the wheel in pip-cache. Subsequent container restarts re-run pip install for those extras, but the install is near-instant because the wheel is already cached locally.
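You can check that the cache is working from inside the running container; the service name datris and the mount path come from the table above:
docker compose exec datris du -sh /root/.cache/pip   # total size of cached wheels
docker compose exec datris ls /root/.cache/pip       # pip-managed cache layout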

Services

| Service | Port | Purpose |
| --- | --- | --- |
| Pipeline Server | 8080 | REST API and data processing |
| Pipeline UI | 4200 | Web dashboard |
| MCP Server | 3000 | AI agent integration (MCP protocol) |
| MinIO | 9000 (API), 9001 (Console) | Object storage |
| MongoDB | 27017 | Configuration and status store |
| ActiveMQ | 61616 (broker), 8161 (console) | Message queue and notifications |
| Vault | 8200 | Secrets management |
| Kafka | 9092 | Streaming (optional) |
| Kafka UI | 8085 | Kafka topic browser |
| PostgreSQL | 5432 | Database destination + pgvector |
| Zookeeper | 2181 | Kafka coordination |

Web UIs

| UI | URL | Credentials |
| --- | --- | --- |
| Datris Platform UI | http://localhost:4200 | none |
| Datris Platform API | http://localhost:8080 | none |
| MCP Server (SSE) | http://localhost:3000/sse | none |
| MinIO Console | http://localhost:9001 | minioadmin / minioadmin |
| ActiveMQ Console | http://localhost:8161 | admin / admin |
| Kafka UI | http://localhost:8085 | none |
| Vault UI | http://localhost:8200 | Token: root-token |

API Keys and AI Providers

Datris supports three AI providers. Set your keys in .env:
| Provider | Environment Variable | Used For |
| --- | --- | --- |
| Anthropic Claude | ANTHROPIC_API_KEY | AI data quality, transformations, error explanation, schema generation, profiling |
| OpenAI | OPENAI_API_KEY | Same as above, plus embeddings for vector database / RAG |
| Ollama (local) | OLLAMA_MODEL | Same as above; no API key needed, runs locally |
At least one AI provider must be configured for AI features: an API key for Anthropic or OpenAI, or a locally running Ollama model (no key needed). The embedding provider for RAG is configured via Vault secrets; see AI Configuration for details.
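A sample .env fragment covering all three providers (the Ollama model name is illustrative; use whichever model you have pulled locally):
# At least one provider enables AI features
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-proj-...
OLLAMA_MODEL=llama3.1    # local Ollama; no API key, model name is an example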

Infrastructure Details

MinIO

The minio-init container automatically creates the required buckets:
  • {env}-raw - File upload staging
  • {env}-raw-plus - Processed file staging
  • {env}-temp - Temporary processing files
  • {env}-data - Pipeline output (object store destination)
  • {env}-config - Configuration files (validation schemas)
Where {env} is the environment name (default: oss). See Configuration Reference for the environment setting.
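To check the buckets from the command line instead of the console, the MinIO client (mc, installed separately on the host) can target the API port with the default credentials listed under Web UIs:
mc alias set datris-local http://localhost:9000 minioadmin minioadmin
mc ls datris-local/   # with the default env: oss-raw, oss-raw-plus, oss-temp, oss-data, oss-config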

Vault

The vault-init container seeds Vault with default secrets for all services (MinIO, ActiveMQ, MongoDB, PostgreSQL, Kafka) plus your AI provider API keys from .env. Vault runs in dev mode with root token root-token.
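To inspect what was seeded, list the paths under the secret/oss mount using the same exec pattern as the cleanup example above:
docker compose exec -e VAULT_TOKEN=root-token vault vault kv list secret/oss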

Vector Databases

pgvector is included by default via the PostgreSQL service. To add other vector databases, uncomment the relevant sections in docker-compose.yml:
  • Qdrant — high-performance vector database
  • Weaviate — open-source vector database
  • Chroma — lightweight, single container
  • Milvus — scalable vector database (requires separate setup)
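After uncommenting a block, only the new service needs to be started; assuming the compose service is named qdrant (check the block you uncommented for the actual name):
docker compose up -d qdrant   # starts the newly enabled service alongside the running stack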

Configuration

The pipeline server reads configuration from application.yaml, mounted from docker/config/application.yaml. See Configuration Reference for the full list of properties.
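Because application.yaml is mounted from the host rather than baked into the image, editing docker/config/application.yaml and restarting the server should be enough to apply changes:
docker compose restart datris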

Building from Source

For development or contributing:

Prerequisites

A JDK and sbt (the server JAR is built with sbt assembly), plus Docker and Docker Compose to build and run the images locally.

Build and run

# Build the server JAR
sbt clean assembly

# Start with local builds (edit docker-compose.yml to uncomment build: lines)
docker compose up --build
In docker-compose.yml, uncomment the build: lines and comment out the image: lines for the services you want to build locally:
datris:
  # image: datris/datris-server:latest
  build: .  # Build from source
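With the build: line enabled, a typical iteration loop rebuilds and restarts only that service rather than the whole stack:
# Rebuild the JAR, rebuild the image, restart just the datris service
sbt clean assembly
docker compose build datris
docker compose up -d datris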