Prerequisites
- Docker and Docker Compose
Quick Start
1. Clone the repository
2. Set your API keys
In .env, add your API keys (at least one is required for AI features).
3. Start all services
vault-init automatically seeds your API keys into Vault.
4. Verify
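One way to confirm the stack is up is to request the HTTP endpoints listed in this guide and look at the status code. This is a minimal sketch that assumes curl is installed. probe_url is a hypothetical helper name, and because no health-check path is documented here it only requests the base URLs of the Pipeline API and Pipeline UI.

```shell
#!/usr/bin/env bash
# Sketch: print the HTTP status of the Datris endpoints listed in this guide.
# probe_url is a hypothetical helper, not part of Datris.

probe_url() {
  # Print the HTTP status code for a URL, or 000 if nothing answers.
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1" 2>/dev/null) || true
  echo "${code:-000}"
}

for url in http://localhost:8080 http://localhost:4200; do
  echo "$url -> $(probe_url "$url")"
done
```

A status of 000 means nothing answered on that port. Any other code means a service is listening there.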
Upgrading
If you already have Datris installed and want to upgrade to the latest version:
Services
| Service | Port | Purpose |
|---|---|---|
| Pipeline Server | 8080 | REST API and data processing |
| Pipeline UI | 4200 | Web dashboard |
| MCP Server | 3000 | AI agent integration (MCP protocol) |
| MinIO | 9000 (API), 9001 (Console) | Object storage |
| MongoDB | 27017 | Configuration and status store |
| ActiveMQ | 61616 (broker), 8161 (console) | Message queue and notifications |
| Vault | 8200 | Secrets management |
| Kafka | 9092 | Streaming (optional) |
| Kafka UI | 8085 | Kafka topic browser |
| PostgreSQL | 5432 | Database destination + pgvector |
| Zookeeper | 2181 | Kafka coordination |
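Several of these services do not speak HTTP, so a plain TCP check is a quick way to see which ports are accepting connections. The sketch below assumes bash, since /dev/tcp is a bash feature rather than a real file. port_open is a hypothetical helper name, and the list covers a subset of the ports from the table.

```shell
#!/usr/bin/env bash
# Sketch: check which service ports from the table accept TCP connections.
# port_open is a hypothetical helper; /dev/tcp is handled by bash itself.

port_open() {
  # Succeed if localhost:$1 accepts a TCP connection.
  (exec 3<>"/dev/tcp/localhost/$1") 2>/dev/null
}

for entry in "Pipeline Server:8080" "Pipeline UI:4200" "MCP Server:3000" \
             "MinIO:9000" "MongoDB:27017" "ActiveMQ:61616" "Vault:8200" \
             "Kafka:9092" "PostgreSQL:5432"; do
  name=${entry%%:*}   # text before the colon
  port=${entry##*:}   # text after the colon
  if port_open "$port"; then
    echo "$name ($port): open"
  else
    echo "$name ($port): closed"
  fi
done
```

An open port only shows that something is listening. It does not prove the service behind it has finished starting.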
Web UIs
| UI | URL | Credentials |
|---|---|---|
| Pipeline UI | http://localhost:4200 | none |
| Pipeline API | http://localhost:8080 | none |
| MCP Server (SSE) | http://localhost:3000/sse | none |
| MinIO Console | http://localhost:9001 | minioadmin / minioadmin |
| ActiveMQ Console | http://localhost:8161 | admin / admin |
| Kafka UI | http://localhost:8085 | none |
| Vault UI | http://localhost:8200 | Token: root-token |
API Keys and AI Providers
Datris supports three AI providers. Set your keys in .env:
| Provider | Environment Variable | Used For |
|---|---|---|
| Anthropic Claude | ANTHROPIC_API_KEY | AI data quality, transformations, error explanation, schema generation, profiling |
| OpenAI | OPENAI_API_KEY | Same as above, plus embeddings for vector database / RAG |
| Ollama (local) | OLLAMA_MODEL | Same as above — no API key needed, runs locally |
Select the embedding provider with EMBEDDING_PROVIDER in .env.
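As a sketch, a .env that configures the providers from the table might look like the following. All values are placeholders. llama3 is only an example Ollama model name, and the EMBEDDING_PROVIDER value is an assumption, because the accepted values are not listed in this guide.

```shell
# .env sketch: placeholder values, replace with your own.
# At least one provider is needed for AI features.
ANTHROPIC_API_KEY=your-anthropic-api-key
OPENAI_API_KEY=your-openai-api-key

# Ollama runs locally and takes a model name instead of a key.
# "llama3" is only an example model name.
OLLAMA_MODEL=llama3

# Assumed value; check the project documentation for the accepted names.
# EMBEDDING_PROVIDER=openai
```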
Infrastructure Details
MinIO
The minio-init container automatically creates the required buckets:
- oss-raw - File upload staging
- oss-raw-plus - Processed file staging
- oss-temp - Temporary processing files
- oss-data - Pipeline output (object store destination)
- oss-config - Configuration files (validation schemas)
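To check that the buckets exist, the MinIO Client (mc) can list them. This sketch assumes mc is installed and that the S3 API on port 9000 accepts the same minioadmin / minioadmin credentials as the console. The alias name datris-local is made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: confirm the buckets created by minio-init are present.
# "datris-local" is a hypothetical alias name; credentials are assumed to match the console login.
EXPECTED_BUCKETS="oss-raw oss-raw-plus oss-temp oss-data oss-config"

if command -v mc >/dev/null 2>&1; then
  # Register the local MinIO endpoint under an alias.
  mc alias set datris-local http://localhost:9000 minioadmin minioadmin >/dev/null 2>&1
  for bucket in $EXPECTED_BUCKETS; do
    if mc ls "datris-local/$bucket" >/dev/null 2>&1; then
      echo "$bucket: present"
    else
      echo "$bucket: missing"
    fi
  done
else
  echo "mc (MinIO Client) not found; use the console at http://localhost:9001 instead"
fi
```

On some systems the mc command is Midnight Commander rather than the MinIO Client, so check which one is on your PATH before relying on the output.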
Vault
The vault-init container seeds Vault with default secrets for all services (MinIO, ActiveMQ, MongoDB, PostgreSQL, Kafka) plus your AI provider API keys from .env. Vault runs in dev mode with root token root-token.
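Vault's HTTP API can confirm that the server is running and show which secrets engines are mounted. This is a minimal sketch that assumes curl is installed and uses the address and dev root token given in this guide. vault_get is a hypothetical helper name, and the paths where vault-init stores each secret are not shown because they are not documented here.

```shell
#!/usr/bin/env bash
# Sketch: query the dev-mode Vault over its HTTP API.
# vault_get is a hypothetical helper; address and token are the defaults from this guide.
VAULT_ADDR=${VAULT_ADDR:-http://localhost:8200}
VAULT_TOKEN=${VAULT_TOKEN:-root-token}

vault_get() {
  # GET /v1/<path> and print the JSON response body.
  curl -s --max-time 5 -H "X-Vault-Token: $VAULT_TOKEN" "$VAULT_ADDR/v1/$1"
}

# Seal and initialization status.
vault_get sys/health || echo "Vault not reachable at $VAULT_ADDR"
echo
# Mounted secrets engines.
vault_get sys/mounts || echo "Vault not reachable at $VAULT_ADDR"
echo
```

The root token is suitable for local development only, which matches the dev-mode setup described above.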
Vector Databases
pgvector is included by default via the PostgreSQL service. To add other vector databases, uncomment the relevant sections in docker-compose.yml:
- Qdrant — high-performance vector database
- Weaviate — open-source vector database
- Chroma — lightweight, single container
- Milvus — scalable vector database (requires separate setup)
Configuration
The pipeline server reads configuration from application.yaml, mounted from docker/config/application.yaml.
See Configuration Reference for the full list of properties.
Building from Source
For development or contributing:
Prerequisites
- Java 17+
- SBT
Build and run
In docker-compose.yml, uncomment the build: lines and comment out the image: lines for the services you want to build locally.
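Once the build: lines are active, Compose can rebuild those images before starting the containers. This is a minimal sketch, assuming Docker Compose v2 (the docker compose subcommand) and that it is run from the directory containing docker-compose.yml. pipeline-server is a hypothetical service name, so substitute the names used in the file.

```shell
#!/usr/bin/env bash
# "pipeline-server" is a hypothetical service name; use the names from docker-compose.yml.
SERVICES="pipeline-server"

if command -v docker >/dev/null 2>&1; then
  # --build rebuilds the images for the listed services before starting them.
  docker compose up -d --build $SERVICES || echo "build or start failed; see the output above"
else
  echo "docker not found on PATH"
fi
```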