Configuration Reference

The pipeline server is configured via application.yaml (or application.properties). This page documents all available properties.

Full Reference

Spring Boot

Property	Default	Description
`spring.servlet.multipart.max-file-size`	`1GB`	Maximum upload file size
`spring.servlet.multipart.max-request-size`	`1GB`	Maximum request size
`spring.server.tomcat.connection-timeout`	`600000`	Tomcat connection timeout (ms)

Logging

Property	Default	Description
`logging.level.root`	`INFO`	Root log level
`logging.level.org.springframework.web`	`INFO`	Spring web log level
`logging.level.ai.datris`	`INFO`	Pipeline log level

Scheduling

Property	Default	Description
`schedule.checkFileNotifierQueue`	`5000`	Polling interval for file notification queue (ms)
`schedule.findJobsToStart`	`5000`	Interval to check for queued jobs (ms)
`schedule.checkDatabaseSourceQueries`	`30000`	Interval to check for database pulls (ms)
`schedule.checkTapSchedules`	`30000`	Interval to check for taps with CRON schedules due to run (ms)

Pipeline

Property	Default	Description
`environment`	`oss`	Environment name. Used as prefix for bucket names (`oss-raw`, `oss-data`, etc.) and table names
`useApiKeys`	`false`	Enable API key authentication
`multiTenant`	`false`	Enable per-request tenant resolution. When true, `postgres.database` is overridden per request with the tenant name
`sendPipelineNotifications`	`true`	Enable pipeline event notifications
`ttlFileNotifierQueueMessages`	`60`	Days to retain processed message IDs for deduplication
`tapScriptTimeoutSeconds`	`300`	Maximum tap script execution time in seconds

CORS

Cross-Origin Resource Sharing (CORS) controls which browser origins can call the Datris API directly. The default `*` allows any origin and is appropriate for local development. In production, restrict this to your real frontend origin(s).

Property	Default	Description
`cors.allowedOrigins`	`*`	Comma-separated list of allowed origins. Use `*` for any origin (development only), or specific URLs like `https://app.example.com,https://admin.example.com`. Applied globally to all `/api/*` endpoints.

In the deploy config, this reads from the `CORS_ALLOWED_ORIGINS` environment variable so you can change it without rebuilding the image:

cors:
  allowedOrigins: ${CORS_ALLOWED_ORIGINS:*}
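
To make the comma-separated format concrete, here is a minimal Python sketch of how such a value could be split and matched against a request's `Origin` header. The function names `parse_allowed_origins` and `is_origin_allowed` are hypothetical, and exact string matching is an assumption. This illustrates the format only and is not the server's actual implementation.

```python
def parse_allowed_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, dropping blanks and stray whitespace."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def is_origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Allow the origin if the wildcard is configured or the origin matches exactly."""
    return "*" in allowed or origin in allowed


# Hypothetical production value for CORS_ALLOWED_ORIGINS
allowed = parse_allowed_origins("https://app.example.com, https://admin.example.com")
print(is_origin_allowed("https://app.example.com", allowed))
print(is_origin_allowed("https://other.example.com", allowed))
```

Under this reading, scheme, host, and port must all match, so `http://app.example.com` and `https://app.example.com` count as different origins.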

Date / Timezone

All display timestamps across the platform (pipeline status, tap run history, etc.) are formatted using these settings.

Property	Default	Description
`dateFormat`	`yyyy-MM-dd HH:mm:ss z`	Java `SimpleDateFormat` pattern. Use `z` to print the timezone abbreviation (e.g., `UTC`, `EDT`, `EST`)
`dateTimezone`	`America/New_York`	IANA timezone ID (e.g., `UTC`, `America/New_York`, `Europe/London`). Daylight saving is applied automatically from the zone's rules; when the format includes `z`, the printed abbreviation switches accordingly (e.g., `EST` to `EDT`)

Example — Eastern time with auto-DST:

dateFormat: "yyyy-MM-dd HH:mm:ss z"
dateTimezone: "America/New_York"
# Displays: 2026-04-05 14:30:00 EDT (summer) or 2026-11-05 14:30:00 EST (winter)

MinIO (Object Store)

Property	Description
`minio.server`	MinIO endpoint URL (e.g., `http://localhost:9000`)

MinIO credentials are stored in Vault under the secret specified by `secrets.minIOSecretName`:

{
  "accessKey": "minioadmin",
  "secretKey": "minioadmin"
}

Secrets (HashiCorp Vault)

Property	Description
`secrets.apiKeysSecretName`	Vault path for API keys
`secrets.postgresSecretName`	Vault path for PostgreSQL credentials
`secrets.minIOSecretName`	Vault path for MinIO credentials
`secrets.activeMQSecretName`	Vault path for ActiveMQ credentials
`secrets.mongoDbSecretName`	Vault path for MongoDB credentials
`secrets.kafkaProducerSecretName`	Vault path for Kafka producer credentials
`secrets.qdrantSecretName`	Vault path for Qdrant connection
`secrets.weaviateSecretName`	Vault path for Weaviate connection
`secrets.milvusSecretName`	Vault path for Milvus connection
`secrets.chromaSecretName`	Vault path for Chroma connection
`secrets.pgvectorSecretName`	Vault path for pgvector PostgreSQL connection

Vault connection is configured via environment variables:

VAULT_ADDR - Vault server URL (e.g., http://vault:8200)
VAULT_TOKEN - Authentication token
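
As a sketch of how these two variables combine when checking a secret by hand, the Python helper below builds the URL and header for a Vault HTTP read. It assumes the secrets live in a KV version 2 engine mounted at `secret/`, which is the usual dev-mode layout but is not confirmed by this page. `vault_read_request` is a hypothetical name, not part of the platform.

```python
import os


def vault_read_request(secret_path: str, mount: str = "secret") -> tuple[str, dict[str, str]]:
    """Build the URL and headers for reading a secret from a KV v2 mount (assumed)."""
    addr = os.environ["VAULT_ADDR"].rstrip("/")  # e.g. http://vault:8200
    token = os.environ["VAULT_TOKEN"]
    # KV v2 reads go through /v1/<mount>/data/<path>
    url = f"{addr}/v1/{mount}/data/{secret_path.strip('/')}"
    return url, {"X-Vault-Token": token}
```

The returned pair can be passed to any HTTP client. With a KV v2 engine, the secret's fields sit under `data.data` in the JSON response.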

ActiveMQ (Queue & Notifications)

Property	Description
`activemq.server`	ActiveMQ broker URL (e.g., `tcp://localhost:61616`)

ActiveMQ credentials are stored in Vault under `secrets.activeMQSecretName`:

{
  "username": "admin",
  "password": "admin"
}

MongoDB (NoSQL Store)

Property	Description
`mongodb.connectionString`	MongoDB connection URI (e.g., `mongodb://localhost:27017`)
`mongodb.database`	User-facing database name (default: `datris`). Pipelines write here, the UI Search tab reads here, and tap scripts get this as `DATRIS_MONGODB_DATABASE`. In multi-tenant mode the tenant environment name is used instead.
`mongodb.internalDatabase`	Platform-internal database name (default: `oss`). Holds pipeline/tap configs, run status, job queues — never surfaced in the UI. Keep this distinct from `mongodb.database` so user data and platform state don’t mix.

PostgreSQL

Property	Default	Description
`postgres.database`	`datris`	Default database name used by `/api/v1/query/postgres` and `/api/v1/metadata/postgres/*` when no `database` parameter is supplied. Also injected into tap scripts as the `DATRIS_POSTGRES_DATABASE` environment variable. In multi-tenant mode this value is automatically overridden per-request with the tenant name.

PostgreSQL connection details (username, password, jdbcUrl) are stored in Vault under secrets.postgresSecretName — see the Vault Secret Formats section below.
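
Tap scripts receive the configured database names through the `DATRIS_POSTGRES_DATABASE` and `DATRIS_MONGODB_DATABASE` environment variables. As a minimal sketch, assuming a tap script written in Python (this page does not say which languages tap scripts use), they could be read as follows. `resolve_database_names` is a hypothetical helper, and the `datris` fallbacks simply mirror the documented defaults.

```python
import os
from typing import Mapping


def resolve_database_names(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the injected database names, falling back to the documented defaults."""
    return {
        "postgres": env.get("DATRIS_POSTGRES_DATABASE", "datris"),
        "mongodb": env.get("DATRIS_MONGODB_DATABASE", "datris"),
    }


names = resolve_database_names()
print(names["postgres"], names["mongodb"])
```

Reading the names from the environment rather than hard-coding them keeps the script valid in multi-tenant mode, where the values change per tenant.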

Kafka Consumer (Optional)

Property	Default	Description
`kafkaConsumer.enabled`	`false`	Enable Kafka topic consumption
`kafkaConsumer.bootstrapServers`		Kafka broker address
`kafkaConsumer.groupId`		Consumer group ID
`kafkaConsumer.topicPollingInterval`	`500`	Topic polling interval (ms)
`kafkaConsumer.topicPrefix`		Prefix for topic names

AI (Required)

AI configuration is split into three independent slots, each pointing at its own self-describing Vault secret. The resolver reads whatever it finds in the secret — provider, endpoint, model, apiKey, and (optionally) version — so the YAML side never needs a provider field.

Property	Default	Description
`ai.enabled`	`true`	Enable AI features (required for the platform to start)
`ai.aiPrimary.secretName`	`oss/ai-primary`	Vault secret for the main AI model used for general reasoning (NL→SQL, search answers, etc.).
`ai.codegen.secretName`	`oss/codegen`	Vault secret for the code-generation model (tap scripts, AI DQ, AI transformations, schema generation). Seeded with the strongest available model — Anthropic gets `claude-opus-4-7`, OpenAI gets `gpt-5.4`.
`ai.embedding.secretName`	`oss/embedding`	Vault secret for the embedding model used by vector destinations (Chroma, Qdrant, Milvus, Weaviate, pgvector) and search. For Anthropic-only deployments this is seeded to point at the bundled TEI sidecar serving `BAAI/bge-m3` (1024-dim), so vector destinations work out of the box without an OpenAI key.

Each Vault secret is self-describing and looks like:

vault kv put secret/oss/ai-primary \
  provider="anthropic" \
  endpoint="https://api.anthropic.com/v1/messages" \
  model="claude-sonnet-4-6" \
  apiKey="sk-ant-..." \
  version="2023-06-01"

`docker/vault-init.sh` seeds all three secrets automatically based on which key is present in `.env` (`ANTHROPIC_API_KEY` or `OPENAI_API_KEY`). For multi-tenant deployments, per-tenant override secrets live at `{env}/ai-primary`, `{env}/codegen`, and `{env}/embedding`.

All AI calls (`callAI`, `callAIWithSystem`, `callAIWithMessages`) share a unified retry helper that automatically retries on transient 429 (rate limited), 503 (service unavailable), and 529 (overloaded) responses with linear backoff (5s, 10s, 15s, 20s, 25s) for up to 5 attempts. This applies uniformly across all configured providers.

See AI Configuration for full setup details.
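
The retry policy described above can be sketched in Python as follows. This is an illustration, not the platform's helper. `call_with_retry` and its `send` callback are hypothetical names. The sketch treats the five listed delays as five retries after the initial call, because the text does not say whether the first call counts toward the five attempts.

```python
import time
from typing import Callable

RETRYABLE_STATUSES = {429, 503, 529}   # rate limited, unavailable, overloaded
BACKOFF_SECONDS = [5, 10, 15, 20, 25]  # linear backoff schedule from the text


def call_with_retry(
    send: Callable[[], tuple[int, str]],
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Call send() and retry on transient statuses, waiting longer each time."""
    status, body = send()
    for delay in BACKOFF_SECONDS:
        if status not in RETRYABLE_STATUSES:
            break  # success or a non-transient error: stop retrying
        sleep(delay)
        status, body = send()
    return status, body
```

Passing `sleep` as a parameter lets tests record the delays without actually waiting. In the worst case the schedule adds 75 seconds of waiting (5 + 10 + 15 + 20 + 25) before the last response is returned to the caller.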

Vault Secret Formats

PostgreSQL

{
  "username": "postgres",
  "password": "password",
  "jdbcUrl": "jdbc:postgresql://localhost:5432"
}

MySQL

{
  "username": "root",
  "password": "password",
  "jdbcUrl": "jdbc:mysql://localhost:3306"
}

Kafka Producer

{
  "bootstrapServers": "kafka:9092",
  "username": null,
  "password": null
}

MongoDB

{
  "connectionString": "mongodb://localhost:27017"
}

MinIO Buckets

The following buckets are created automatically by the minio-init container:

Bucket	Purpose
`{environment}-raw`	File upload staging
`{environment}-raw-plus`	Processed file staging
`{environment}-temp`	Temporary processing files
`{environment}-data`	Object store destination output
`{environment}-config`	Configuration files (validation schemas)

MongoDB Collections

Collection	Purpose
`{environment}-pipeline`	Pipeline configurations
`{environment}-pipeline-status`	Job processing status
`{environment}-archived-metadata`	File ingestion metadata
`{environment}-file-notifier-message`	Processed message deduplication
`{environment}-data-pull`	Database pull scheduling state
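
The `{environment}` placeholder in the bucket and collection tables expands from the `environment` property (default `oss`). The short Python sketch below shows the resulting names. `prefixed_names` is a hypothetical helper, and the suffix lists are copied from the two tables. It assumes collections use the same `environment` value as buckets.

```python
BUCKET_SUFFIXES = ["raw", "raw-plus", "temp", "data", "config"]
COLLECTION_SUFFIXES = [
    "pipeline",
    "pipeline-status",
    "archived-metadata",
    "file-notifier-message",
    "data-pull",
]


def prefixed_names(environment: str, suffixes: list[str]) -> list[str]:
    """Join the environment name and each suffix with a hyphen."""
    return [f"{environment}-{suffix}" for suffix in suffixes]


print(prefixed_names("oss", BUCKET_SUFFIXES))
print(prefixed_names("oss", COLLECTION_SUFFIXES))
```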
