## Documentation Index

Fetch the complete documentation index at: https://docs.datris.ai/llms.txt

Use this file to discover all available pages before exploring further.
## Prerequisites

- Docker with the Docker Compose plugin (the stack runs entirely from pre-built images; no build tools required)
- Git, to clone the repository
## Quick Start

### 1. Clone the repository

```shell
git clone https://github.com/datris/datris-platform-oss.git
cd datris-platform-oss
```
### 2. Set your API keys

Edit `.env` and add your API keys (at least one required for AI features):

```shell
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-proj-...
```
### 3. Start all services

Docker pulls the pre-built images from Docker Hub and starts the full stack. On first run, vault-init automatically seeds your API keys into Vault.
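The start command itself is not spelled out here; assuming the same Compose workflow used in the upgrade and reset sections below, it is presumably:

```shell
# Start every service in the background (detached mode)
docker compose up -d
```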
### 4. Verify

```shell
curl http://localhost:8080/api/v1/version
```

That’s it. The platform is running.
## Upgrading

If you already have Datris installed and want to upgrade to the latest version:

```shell
cd datris-platform-oss
git pull origin main
docker compose pull
docker compose up -d --remove-orphans
```

This pulls the latest pre-built images from Docker Hub and restarts the services. No build tools required.
Always pass `--remove-orphans` when upgrading. Releases occasionally rename, remove, or repurpose service blocks in `docker-compose.yml` (e.g. v1.6.15 replaced the bundled Ollama service with TEI on the same host port 11434). Without `--remove-orphans`, the previous version's container keeps running and holds the port, causing the new container to fail with `Bind for 0.0.0.0:<port> failed: port is already allocated`. The flag is safe: it only removes containers that are no longer defined in your current compose file. Volumes (and your data) are untouched.
Your data is preserved across upgrades: Postgres pipelines, Vault secrets, MinIO buckets, MongoDB collections, and Kafka state all live in Docker volumes that survive `docker compose pull` and `docker compose up -d`.
### Stale secrets after an upgrade

Datris occasionally deprecates secret paths between versions (for example, v1.5.6 split AI configuration into three new Vault slots and stopped reading the old single-slot path). Because Vault data persists across upgrades, deprecated entries stay in Vault and continue to appear in the Secrets tab even though the server ignores them. To clean them up:

- Targeted (preserves your data): delete the stale entries one at a time from the Secrets tab UI, or run `docker compose exec -e VAULT_TOKEN=root-token vault vault kv delete secret/oss/<name>` for each deprecated path.
- Total reset (destroys all data): see the next section.
## Reset everything (destroys all data)

If you don’t care about anything on this machine and want a completely fresh install (the same data layout as a brand-new clone), wipe all volumes:

```shell
cd datris-platform-oss
docker compose down -v
docker compose up -d
```
Warning: `docker compose down -v` is destructive. The `-v` flag removes every Docker volume the project owns (both named volumes and the anonymous volumes the data services create automatically). You will lose:

- All your pipelines, runs, taps, and metadata (the Postgres `datris` database)
- All Vault secrets: API keys, database credentials, AI configuration (re-seeded from your `.env` on next start, but only the defaults; any UI-edited overrides are gone)
- All MinIO object storage: raw uploads, configs, temp files, processed outputs
- MongoDB destination data
- Queued messages and offsets in Kafka, Zookeeper, and ActiveMQ
- The bundled `bge-m3` embedding model in the `tei-data` volume (re-downloads ~2.2 GB on first start)
- Cached pip wheels in the `pip-cache` volume (taps that need extras like `yfinance` will re-download them on first run)

Only run `down -v` if you are certain none of the above matters, e.g. on a brand-new dev machine, after exporting anything you needed, or on a CI runner. Never run `down -v` on a production or shared instance.
After `docker compose up -d`, `vault-init.sh` re-seeds the AI configuration secrets from your `.env`, MinIO buckets are recreated, Postgres starts empty, and the bundled embedding service re-downloads `bge-m3` (a one-time download taking a few minutes).
Note for production deployments: the `deploy/docker-compose.prod.yml` file used by managed/dedicated installs uses bind mounts to host directories under `/data/*` instead of Docker volumes, so `docker compose -f docker-compose.prod.yml down -v` does not wipe the data; the host directories survive. To reset a prod install, you’d need to also delete the relevant `/data/*` directories on the host, which is a much riskier operation and not recommended outside of disaster recovery.
## Volumes

The datris service uses one named Docker volume that is created automatically by `docker compose up` (no user action required):

| Volume | Mounted at | Purpose |
|---|---|---|
| `pip-cache` | `/root/.cache/pip` | Caches downloaded pip wheel files so tap-installed packages don’t have to be re-downloaded |
Commonly used packages (`requests`, `beautifulsoup4`, `pandas`, `lxml`, `feedparser`, `boto3`, `google-cloud-storage`, `azure-storage-blob`, `openpyxl`, `pyyaml`, `python-dateutil`, `pytz`) are baked into the image. When a tap needs something extra (e.g. `yfinance`), pip downloads it on first run (~30 seconds) and caches the wheel in `pip-cache`. Subsequent container restarts re-run `pip install` for those extras, but the install is near-instant because the wheel is already cached locally.
## Services

| Service | Port | Purpose |
|---|---|---|
| Pipeline Server | 8080 | REST API and data processing |
| Pipeline UI | 4200 | Web dashboard |
| MCP Server | 3000 | AI agent integration (MCP protocol) |
| MinIO | 9000 (API), 9001 (Console) | Object storage |
| MongoDB | 27017 | Configuration and status store |
| ActiveMQ | 61616 (broker), 8161 (console) | Message queue and notifications |
| Vault | 8200 | Secrets management |
| Kafka | 9092 | Streaming (optional) |
| Kafka UI | 8085 | Kafka topic browser |
| PostgreSQL | 5432 | Database destination + pgvector |
| Zookeeper | 2181 | Kafka coordination |
## Web UIs

| UI | URL | Credentials |
|---|---|---|
| Datris Platform UI | http://localhost:4200 | none |
| Datris Platform API | http://localhost:8080 | none |
| MCP Server (SSE) | http://localhost:3000/sse | none |
| MinIO Console | http://localhost:9001 | minioadmin / minioadmin |
| ActiveMQ Console | http://localhost:8161 | admin / admin |
| Kafka UI | http://localhost:8085 | none |
| Vault UI | http://localhost:8200 | Token: root-token |
## API Keys and AI Providers

Datris supports three AI providers. Set your keys in `.env`:

| Provider | Environment Variable | Used For |
|---|---|---|
| Anthropic Claude | ANTHROPIC_API_KEY | AI data quality, transformations, error explanation, schema generation, profiling |
| OpenAI | OPENAI_API_KEY | Same as above, plus embeddings for vector database / RAG |
| Ollama (local) | OLLAMA_MODEL | Same as above — no API key needed, runs locally |
At least one AI provider key is required for AI features. The embedding provider for RAG is configured via Vault secrets — see AI Configuration for details.
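As a quick pre-flight sanity check, you can verify which provider variables are actually set before starting the stack. The helper below is purely illustrative (it is not part of Datris); it only checks the three environment variables from the table above:

```python
import os

def configured_providers(env=os.environ):
    """Return the names of AI providers whose environment variable is set."""
    keys = {
        "Anthropic Claude": "ANTHROPIC_API_KEY",
        "OpenAI": "OPENAI_API_KEY",
        "Ollama (local)": "OLLAMA_MODEL",
    }
    return [name for name, var in keys.items() if env.get(var)]

# Example: with only an OpenAI key set, one provider is reported
print(configured_providers({"OPENAI_API_KEY": "sk-proj-..."}))
```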
## Infrastructure Details

### MinIO

The minio-init container automatically creates the required buckets:

- `{env}-raw` - File upload staging
- `{env}-raw-plus` - Processed file staging
- `{env}-temp` - Temporary processing files
- `{env}-data` - Pipeline output (object store destination)
- `{env}-config` - Configuration files (validation schemas)

Where `{env}` is the environment name (default: `oss`). See Configuration Reference for the environment setting.
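For example, with the default environment name, the `{env}-<suffix>` pattern above yields the following bucket names (a simple illustration of the naming scheme, not Datris code):

```python
# Bucket names follow the "{env}-<suffix>" pattern described above.
env = "oss"  # default environment name
suffixes = ["raw", "raw-plus", "temp", "data", "config"]
buckets = [f"{env}-{suffix}" for suffix in suffixes]
print(buckets)
```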
### Vault

The vault-init container seeds Vault with default secrets for all services (MinIO, ActiveMQ, MongoDB, PostgreSQL, Kafka) plus your AI provider API keys from `.env`. Vault runs in dev mode with root token `root-token`.
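To inspect what was seeded, you can list the secret paths through the bundled Vault CLI. This assumes the `secret/oss` mount path shown in the stale-secrets cleanup command earlier in this guide:

```shell
# List seeded secret paths using the dev-mode root token
docker compose exec -e VAULT_TOKEN=root-token vault vault kv list secret/oss
```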
### Vector Databases

pgvector is included by default via the PostgreSQL service. To add other vector databases, uncomment the relevant sections in `docker-compose.yml`:
- Qdrant — high-performance vector database
- Weaviate — open-source vector database
- Chroma — lightweight, single container
- Milvus — scalable vector database (requires separate setup)
## Configuration

The pipeline server reads configuration from `application.yaml`, mounted from `docker/config/application.yaml`.

See Configuration Reference for the full list of properties.
## Building from Source

For development or contributing:

### Prerequisites

- A JDK and sbt (the server JAR is built with `sbt clean assembly`)
- Docker, to run the stack with locally built images

### Build and run

```shell
# Build the server JAR
sbt clean assembly

# Start with local builds (edit docker-compose.yml to uncomment build: lines)
docker compose up --build
```
In `docker-compose.yml`, uncomment the `build:` lines and comment out the `image:` lines for the services you want to build locally:

```yaml
datris:
  # image: datris/datris-server:latest
  build: .  # Build from source
```