

The pipeline server is configured via application.yaml (or application.properties). This page documents all available properties.

Full Reference

Spring Boot

| Property | Default | Description |
| --- | --- | --- |
| spring.servlet.multipart.max-file-size | 1GB | Maximum upload file size |
| spring.servlet.multipart.max-request-size | 1GB | Maximum request size |
| spring.server.tomcat.connection-timeout | 600000 | Tomcat connection timeout (ms) |
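For example, an application.yaml fragment that raises the upload limits (the 2GB values here are illustrative, not defaults):

```yaml
spring:
  servlet:
    multipart:
      max-file-size: 2GB       # allow larger file uploads
      max-request-size: 2GB
  server:
    tomcat:
      connection-timeout: 600000   # keep the 10-minute default
```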

Logging

| Property | Default | Description |
| --- | --- | --- |
| logging.level.root | INFO | Root log level |
| logging.level.org.springframework.web | INFO | Spring web log level |
| logging.level.ai.datris | INFO | Pipeline log level |
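A typical tweak while troubleshooting is to raise only the pipeline logger (DEBUG here is an illustrative choice):

```yaml
logging:
  level:
    root: INFO
    org.springframework.web: INFO
    ai.datris: DEBUG   # verbose pipeline logging while debugging
```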

Scheduling

| Property | Default | Description |
| --- | --- | --- |
| schedule.checkFileNotifierQueue | 5000 | Polling interval for file notification queue (ms) |
| schedule.findJobsToStart | 5000 | Interval to check for queued jobs (ms) |
| schedule.checkDatabaseSourceQueries | 30000 | Interval to check for database pulls (ms) |
| schedule.checkTapSchedules | 30000 | Interval to check for taps with CRON schedules due to run (ms) |
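As a sketch, a quieter deployment might slow the pollers down (the values below are illustrative overrides, assuming these keys nest under a top-level schedule block as the property names suggest):

```yaml
schedule:
  checkFileNotifierQueue: 10000      # poll file notifications every 10s
  findJobsToStart: 10000
  checkDatabaseSourceQueries: 60000  # check database pulls every minute
  checkTapSchedules: 60000
```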

Pipeline

| Property | Default | Description |
| --- | --- | --- |
| environment | oss | Environment name. Used as a prefix for bucket names (oss-raw, oss-data, etc.) and table names |
| useApiKeys | false | Enable API key authentication |
| multiTenant | false | Enable per-request tenant resolution. When true, the postgres database is overridden per-request with the tenant name |
| sendPipelineNotifications | true | Enable pipeline event notifications |
| ttlFileNotifierQueueMessages | 60 | Days to retain processed message IDs for deduplication |
| tapScriptTimeoutSeconds | 300 | Maximum tap script execution time in seconds |
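Putting these together, a hardened single-tenant setup might look like the following (assuming these are top-level keys, as the bare property names suggest; the non-default values are illustrative):

```yaml
environment: oss
useApiKeys: true                 # require API keys in production
multiTenant: false
sendPipelineNotifications: true
ttlFileNotifierQueueMessages: 60
tapScriptTimeoutSeconds: 600     # allow longer-running tap scripts
```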

CORS

Cross-Origin Resource Sharing controls which browser origins can call the Datris API directly. The default * allows any origin and is appropriate for local development. In production, lock this down to your real frontend origin(s).
| Property | Default | Description |
| --- | --- | --- |
| cors.allowedOrigins | * | Comma-separated list of allowed origins. Use * for any origin (development only), or specific URLs like https://app.example.com,https://admin.example.com. Applied globally to all /api/** endpoints. |
In the deploy config, this reads from the CORS_ALLOWED_ORIGINS environment variable so you can change it without rebuilding the image:
cors:
  allowedOrigins: ${CORS_ALLOWED_ORIGINS:*}
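So in production you would typically pin the origins in the deployment environment rather than the image (the .env-file convention below is an assumption about how your deployment passes environment variables):

```shell
# .env — read at container start; no image rebuild needed
CORS_ALLOWED_ORIGINS=https://app.example.com,https://admin.example.com
```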

Date / Timezone

All display timestamps across the platform (pipeline status, tap run history, etc.) are formatted using these settings.
| Property | Default | Description |
| --- | --- | --- |
| dateFormat | yyyy-MM-dd HH:mm:ss z | Java SimpleDateFormat pattern. Use z to print the timezone abbreviation (e.g., UTC, EDT, EST) |
| dateTimezone | America/New_York | IANA timezone ID (e.g., UTC, America/New_York, Europe/London). When the format includes z, daylight saving is handled automatically |
Example — Eastern time with auto-DST:
dateFormat: "yyyy-MM-dd HH:mm:ss z"
dateTimezone: "America/New_York"
# Displays: 2026-04-05 14:30:00 EDT (summer) or 2026-11-05 14:30:00 EST (winter)

MinIO (Object Store)

| Property | Description |
| --- | --- |
| minio.server | MinIO endpoint URL (e.g., http://localhost:9000) |
MinIO credentials are stored in Vault under the secret specified by secrets.minIOSecretName:
{
  "accessKey": "minioadmin",
  "secretKey": "minioadmin"
}

Secrets (HashiCorp Vault)

| Property | Description |
| --- | --- |
| secrets.apiKeysSecretName | Vault path for API keys |
| secrets.postgresSecretName | Vault path for PostgreSQL credentials |
| secrets.minIOSecretName | Vault path for MinIO credentials |
| secrets.activeMQSecretName | Vault path for ActiveMQ credentials |
| secrets.mongoDbSecretName | Vault path for MongoDB credentials |
| secrets.kafkaProducerSecretName | Vault path for Kafka producer credentials |
| secrets.qdrantSecretName | Vault path for Qdrant connection |
| secrets.weaviateSecretName | Vault path for Weaviate connection |
| secrets.milvusSecretName | Vault path for Milvus connection |
| secrets.chromaSecretName | Vault path for Chroma connection |
| secrets.pgvectorSecretName | Vault path for pgvector PostgreSQL connection |
Vault connection is configured via environment variables:
  • VAULT_ADDR - Vault server URL (e.g., http://vault:8200)
  • VAULT_TOKEN - Authentication token
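Set both before starting the pipeline server, for example (the token value is obviously illustrative):

```shell
export VAULT_ADDR=http://vault:8200
export VAULT_TOKEN=hvs.example-dev-token   # replace with your real token
```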

ActiveMQ (Queue & Notifications)

| Property | Description |
| --- | --- |
| activemq.server | ActiveMQ broker URL (e.g., tcp://localhost:61616) |
ActiveMQ credentials are stored in Vault under secrets.activeMQSecretName:
{
  "username": "admin",
  "password": "admin"
}

MongoDB (NoSQL Store)

| Property | Description |
| --- | --- |
| mongodb.connectionString | MongoDB connection URI (e.g., mongodb://localhost:27017) |
| mongodb.database | User-facing database name (default: datris). Pipelines write here, the UI Search tab reads here, and tap scripts get this as DATRIS_MONGODB_DATABASE. In multi-tenant mode the tenant environment name is used instead. |
| mongodb.internalDatabase | Platform-internal database name (default: oss). Holds pipeline/tap configs, run status, and job queues; never surfaced in the UI. Keep this distinct from mongodb.database so user data and platform state don't mix. |

PostgreSQL

| Property | Default | Description |
| --- | --- | --- |
| postgres.database | datris | Default database name used by /api/v1/query/postgres and /api/v1/metadata/postgres/* when no database parameter is supplied. Also injected into tap scripts as the DATRIS_POSTGRES_DATABASE environment variable. In multi-tenant mode this value is automatically overridden per-request with the tenant name. |
PostgreSQL connection details (username, password, jdbcUrl) are stored in Vault under secrets.postgresSecretName — see the Vault Secret Formats section below.

Kafka Consumer (Optional)

| Property | Default | Description |
| --- | --- | --- |
| kafkaConsumer.enabled | false | Enable Kafka topic consumption |
| kafkaConsumer.bootstrapServers | | Kafka broker address |
| kafkaConsumer.groupId | | Consumer group ID |
| kafkaConsumer.topicPollingInterval | 500 | Topic polling interval (ms) |
| kafkaConsumer.topicPrefix | | Prefix for topic names |
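For example, enabling the consumer might look like this (the broker address, group ID, and prefix values are illustrative):

```yaml
kafkaConsumer:
  enabled: true
  bootstrapServers: kafka:9092
  groupId: datris-pipeline      # illustrative group ID
  topicPollingInterval: 500
  topicPrefix: datris-          # illustrative prefix
```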

AI (Required)

AI configuration is split into three independent slots, each pointing at its own self-describing Vault secret. The resolver reads whatever it finds in the secret — provider, endpoint, model, apiKey, and (optionally) version — so the YAML side never needs a provider field.
| Property | Default | Description |
| --- | --- | --- |
| ai.enabled | true | Enable AI features (required for the platform to start) |
| ai.aiPrimary.secretName | oss/ai-primary | Vault secret for the main AI model used for general reasoning (NL→SQL, search answers, etc.) |
| ai.codegen.secretName | oss/codegen | Vault secret for the code-generation model (tap scripts, AI DQ, AI transformations, schema generation). Seeded with the strongest available model: Anthropic gets claude-opus-4-7, OpenAI gets gpt-5.4 |
| ai.embedding.secretName | oss/embedding | Vault secret for the embedding model used by vector destinations (Chroma, Qdrant, Milvus, Weaviate, pgvector) and search. For Anthropic-only deployments this is seeded to point at the bundled TEI sidecar serving BAAI/bge-m3 (1024-dim), so vector destinations work out of the box without an OpenAI key |
Each Vault secret is self-describing and looks like:
vault kv put secret/oss/ai-primary \
  provider="anthropic" \
  endpoint="https://api.anthropic.com/v1/messages" \
  model="claude-sonnet-4-6" \
  apiKey="sk-ant-..." \
  version="2023-06-01"
docker/vault-init.sh seeds all three secrets automatically based on which key is present in .env (ANTHROPIC_API_KEY or OPENAI_API_KEY). For multi-tenant deployments, per-tenant override secrets live at {env}/ai-primary, {env}/codegen, {env}/embedding.

All AI calls (callAI, callAIWithSystem, callAIWithMessages) share a unified retry helper that automatically retries on transient 429 (rate limited), 503 (service unavailable), and 529 (overloaded) responses with linear backoff (5s, 10s, 15s, 20s, 25s) for up to 5 attempts. This applies uniformly across all configured providers.

See AI Configuration for full setup details.
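The retry behavior can be sketched roughly as follows (a minimal Python sketch of the linear-backoff logic, not the actual Java helper; call_with_retry, request_fn, and the sleep injection are illustrative names):

```python
import time

RETRYABLE = {429, 503, 529}  # rate limited, unavailable, overloaded
MAX_ATTEMPTS = 5

def call_with_retry(request_fn, sleep=time.sleep):
    """Retry request_fn on transient statuses with linear backoff.

    request_fn returns (status, body); delays grow 5s, 10s, 15s, 20s
    between the up-to-5 attempts (assumed shape of the backoff).
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, body = request_fn()
        if status not in RETRYABLE:
            return status, body
        if attempt < MAX_ATTEMPTS:
            sleep(5 * attempt)  # linear backoff: 5s, 10s, 15s, 20s
    return status, body  # exhausted attempts; surface the last response
```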

Vault Secret Formats

PostgreSQL

{
  "username": "postgres",
  "password": "password",
  "jdbcUrl": "jdbc:postgresql://localhost:5432"
}
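Assuming secrets.postgresSecretName points at oss/postgres (an illustrative path; use whatever your config specifies), this secret can be written in the same style as the AI secrets above:

```shell
vault kv put secret/oss/postgres \
  username="postgres" \
  password="password" \
  jdbcUrl="jdbc:postgresql://localhost:5432"
```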

MySQL

{
  "username": "root",
  "password": "password",
  "jdbcUrl": "jdbc:mysql://localhost:3306"
}

Kafka Producer

{
  "bootstrapServers": "kafka:9092",
  "username": null,
  "password": null
}

MongoDB

{
  "connectionString": "mongodb://localhost:27017"
}

MinIO Buckets

The following buckets are created automatically by the minio-init container:
| Bucket | Purpose |
| --- | --- |
| {environment}-raw | File upload staging |
| {environment}-raw-plus | Processed file staging |
| {environment}-temp | Temporary processing files |
| {environment}-data | Object store destination output |
| {environment}-config | Configuration files (validation schemas) |

MongoDB Collections

| Collection | Purpose |
| --- | --- |
| {environment}-pipeline | Pipeline configurations |
| {environment}-pipeline-status | Job processing status |
| {environment}-archived-metadata | File ingestion metadata |
| {environment}-file-notifier-message | Processed message deduplication |
| {environment}-data-pull | Database pull scheduling state |