Full Configuration Example
Configuration Fields
Top Level
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Pipeline name (max 80 characters) |
source | object | Yes | Source configuration |
preprocessor | object | No | Optional REST endpoint called before processing |
dataQuality | object | No | Data validation rules |
transformation | object | No | Data transformation settings |
destination | object | Yes | One or more output destinations |
catalog | string | No | Free-form label that groups related pipelines and taps in the Data Catalog. An empty or null value means the pipeline is Uncataloged. |
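The following minimal Python sketch makes the required and optional split concrete. It checks a pipeline config held as a plain dict, assuming the config is JSON-shaped, which this page does not state outright. It enforces only the rules in the table above: name, source, and destination must be present, name may not exceed 80 characters, and an empty or null catalog counts as Uncataloged. The helper names and the objectStore key in the usage lines are hypothetical.

```python
from typing import Any

# Required top-level fields, per the Top Level table.
REQUIRED_FIELDS = ("name", "source", "destination")
MAX_NAME_LENGTH = 80


def validate_top_level(config: dict[str, Any]) -> list[str]:
    """Return a list of problems with a pipeline config (empty if it passes)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if config.get(field) is None:
            errors.append(f"missing required field: {field}")
    name = config.get("name")
    if isinstance(name, str) and len(name) > MAX_NAME_LENGTH:
        errors.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    return errors


def catalog_label(config: dict[str, Any]) -> str:
    """An empty or null catalog means the pipeline is Uncataloged."""
    return config.get("catalog") or "Uncataloged"


# Usage; the "objectStore" destination key is hypothetical.
pipeline = {
    "name": "orders-daily",
    "source": {"schemaProperties": {"fields": [{"name": "id", "type": "int"}]}},
    "destination": {"objectStore": {"prefixKey": "orders"}},
}
print(validate_top_level(pipeline))  # []
print(catalog_label(pipeline))       # Uncataloged
```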
Source
| Field | Type | Required | Description |
|---|---|---|---|
schemaProperties | object | Yes* | Schema definition with field names and types. *Required for structured/semi-structured data |
fileAttributes | object | No | File format configuration (CSV, JSON, XML, XLS, unstructured) |
databaseAttributes | object | No | Database pull configuration |
streamAttributes | object | No | Stream source configuration (Kafka) |
Source > Schema Properties
| Field | Type | Description |
|---|---|---|
dbName | string | Database name associated with this schema (used by some destinations) |
fields | array | List of {"name": "...", "type": "..."} objects |
schemaVersion | int | Schema version (default 1) |
Supported field types: boolean, int, tinyint, smallint, bigint, float, double, decimal(p,s), string, varchar(n), char(n), date, timestamp
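The parameterized types decimal(p,s), varchar(n), and char(n) need a little parsing before they can be checked. The sketch below shows one way to validate a fields list against the types listed above. parse_type and validate_schema are hypothetical helper names, and the exact spellings Datris accepts are an assumption. For example, this sketch tolerates a space after the comma in decimal but does not accept uppercase.

```python
import re
from typing import Any

# Types from the list above that take no parameters.
SIMPLE_TYPES = {
    "boolean", "int", "tinyint", "smallint", "bigint", "float",
    "double", "string", "date", "timestamp",
}
# Parameterized types: decimal(p,s), varchar(n), char(n).
PARAM_PATTERNS = {
    "decimal": re.compile(r"decimal\((\d+),\s*(\d+)\)"),
    "varchar": re.compile(r"varchar\((\d+)\)"),
    "char": re.compile(r"char\((\d+)\)"),
}


def parse_type(type_str: str) -> tuple[str, tuple[int, ...]]:
    """Split a type into (base type, integer parameters); raise ValueError if unsupported."""
    t = type_str.strip()
    if t in SIMPLE_TYPES:
        return t, ()
    for base, pattern in PARAM_PATTERNS.items():
        match = pattern.fullmatch(t)  # fullmatch keeps "char" from matching "varchar(10)"
        if match:
            return base, tuple(int(g) for g in match.groups())
    raise ValueError(f"unsupported type: {type_str!r}")


def validate_schema(schema_properties: dict[str, Any]) -> list[str]:
    """Return one error message per field whose type is not supported."""
    errors = []
    for field in schema_properties.get("fields", []):
        try:
            parse_type(field["type"])
        except ValueError as exc:
            errors.append(f"{field['name']}: {exc}")
    return errors
```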
Source > File Attributes
CSV:
Source > Database Attributes
| Field | Type | Required | Description |
|---|---|---|---|
type | string | Yes | postgres, mysql, or mssql |
postgresSecretsName | string | Conditional | Vault secret name for Postgres credentials |
mssqlSecretsName | string | Conditional | Vault secret name for MSSQL credentials |
mysqlSecretsName | string | Conditional | Vault secret name for MySQL credentials |
cronExpression | string | Yes | Cron schedule for polling (e.g., 0 */5 * * * ?) |
database | string | No | Database name |
schema | string | No | Schema name |
table | string | Yes* | Table to query (*unless sqlOverride is set) |
includeFields | array | No | Column whitelist |
timestampFieldName | string | Yes* | Column for incremental pulls (*unless sqlOverride is set) |
sqlOverride | string | No | Custom SELECT query (replaces auto-generated query) |
outputDelimiter | string | No | Delimiter for CSV output (default: comma) |
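The Required column above has two conditional rules. First, table and timestampFieldName can be omitted when sqlOverride is set. Second, a secret name must be present. The table only marks the secret fields as Conditional, so this minimal validation sketch assumes the one that must be set is the field matching type. validate_database_attributes is a hypothetical name.

```python
from typing import Any

# Assumed mapping from database type to the secret-name field it requires.
SECRET_FIELD = {
    "postgres": "postgresSecretsName",
    "mysql": "mysqlSecretsName",
    "mssql": "mssqlSecretsName",
}


def validate_database_attributes(attrs: dict[str, Any]) -> list[str]:
    """Return a list of problems with a databaseAttributes block (empty if it passes)."""
    errors = []
    db_type = attrs.get("type")
    if db_type not in SECRET_FIELD:
        errors.append("type must be postgres, mysql, or mssql")
    elif not attrs.get(SECRET_FIELD[db_type]):
        errors.append(f"{SECRET_FIELD[db_type]} is required when type is {db_type}")
    if not attrs.get("cronExpression"):
        errors.append("cronExpression is required")
    # table and timestampFieldName are only required without a sqlOverride.
    if not attrs.get("sqlOverride"):
        for key in ("table", "timestampFieldName"):
            if not attrs.get(key):
                errors.append(f"{key} is required unless sqlOverride is set")
    return errors
```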
Preprocessor
| Field | Type | Default | Description |
|---|---|---|---|
endpoint | string | Required | URL of the preprocessing service |
async | boolean | false | If true, the call returns immediately without waiting for a response |
bearerToken | string | null | Authorization bearer token |
timeoutMs | int | 300000 | Request timeout (milliseconds) |
Data Quality
See Data Quality for detailed documentation.
| Field | Type | Description |
|---|---|---|
aiRule | object | AI rule — plain-English validation instruction |
validateFileHeader | boolean | Validate CSV header matches schema field order |
validationSchema | string | Path to JSON Schema file for JSON/XML validation |
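For validateFileHeader, the check compares the first CSV row with the schema's field names, in order. The following is a minimal sketch using Python's csv module. header_matches_schema is a hypothetical name. This version requires an exact, case-sensitive match with no whitespace trimming, which may be stricter or looser than the real check.

```python
import csv
import io
from typing import Any


def header_matches_schema(csv_text: str, schema_properties: dict[str, Any],
                          delimiter: str = ",") -> bool:
    """True if the first CSV row lists exactly the schema field names, in order."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])  # an empty file yields no header at all
    expected = [field["name"] for field in schema_properties["fields"]]
    return header == expected
```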
Transformation
See AI Transformation for detailed documentation.
| Field | Type | Description |
|---|---|---|
aiTransformation | object | AI transformation — plain-English transformation instruction |
Destination > Object Store
The object store destination writes Parquet or ORC files either to the built-in MinIO (the default) or to AWS S3. The provider field selects the target. All other fields apply to both providers except where a description notes otherwise.
| Field | Type | Default | Description |
|---|---|---|---|
prefixKey | string | Required | Path prefix for output files under the bucket |
fileFormat | string | parquet | Output format: parquet or orc |
partitionBy | array | null | Column names for partitioning |
destinationBucketOverride | string | null | Custom bucket (default: {environment}-data for MinIO; required for S3) |
deleteBeforeWrite | boolean | false | Delete existing data at path before writing |
writeToTemporaryLocation | boolean | false | Write to temp location first |
writeMode | string | append | append, overwrite, ignore, or errorifexists |
provider | string | minio | minio (built-in, default) or s3 (AWS S3) |
endpoint | string | null | S3 endpoint URL override. Must use https://. Leave unset for the AWS regional default. Ignored for minio. |
credentialsSecret | string | null | For provider=s3: name of a Platform-tab Vault secret containing accessKey, secretKey, region (and optionally sessionToken). Leave unset only when Datris runs on an AWS instance role. |
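The defaults for the bucket and endpoint depend on provider. The sketch below resolves them according to the table above. The environment argument stands in for however the deployment's {environment} name is supplied, which is an assumption. resolve_object_store is a hypothetical helper.

```python
from typing import Any, Optional


def resolve_object_store(dest: dict[str, Any], environment: str) -> tuple[str, Optional[str]]:
    """Return (bucket, endpoint) for an object store destination."""
    provider = dest.get("provider", "minio")
    bucket = dest.get("destinationBucketOverride")
    if provider == "minio":
        # MinIO: bucket defaults to {environment}-data and endpoint is ignored.
        return bucket or f"{environment}-data", None
    if provider == "s3":
        if not bucket:
            raise ValueError("destinationBucketOverride is required for provider=s3")
        endpoint = dest.get("endpoint")
        if endpoint is not None and not endpoint.startswith("https://"):
            raise ValueError("endpoint must use https://")
        # None means: use the AWS regional default endpoint.
        return bucket, endpoint
    raise ValueError(f"unknown provider: {provider!r}")
```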
Destination > Database
| Field | Type | Default | Description |
|---|---|---|---|
dbName | string | Required | Database name |
schema | string | Required | Schema name |
table | string | Required | Table name |
keyFields | array | null | Primary key columns (enables upsert for MongoDB) |
usePostgres | boolean | false | Write to PostgreSQL |
useMongoDB | boolean | false | Write to MongoDB |
manageTableManually | boolean | false | If false, auto-creates tables |
truncateBeforeWrite | boolean | false | Truncate table before loading |
useTransaction | boolean | true | Wrap in a transaction |
options | array | null | Custom COPY options (e.g., ["FORMAT csv", "DELIMITER ','"]) |
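The options array resembles the option list of PostgreSQL's COPY ... WITH (...) clause. Assuming the array is passed through that way, a PostgreSQL destination could translate into statements like the ones this sketch builds. The helper names, the statement order for truncateBeforeWrite and useTransaction, and the choice to double-quote identifiers are all assumptions for illustration. They do not describe what Datris actually executes, and the MongoDB path is not covered.

```python
from typing import Any


def quote_ident(identifier: str) -> str:
    """Double-quote a PostgreSQL identifier, doubling any embedded quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def load_statements(dest: dict[str, Any]) -> list[str]:
    """Hypothetical statement sequence for loading one batch into a PostgreSQL table."""
    target = f"{quote_ident(dest['schema'])}.{quote_ident(dest['table'])}"
    copy = f"COPY {target} FROM STDIN"
    if dest.get("options"):
        # e.g. ["FORMAT csv", "DELIMITER ','"] -> WITH (FORMAT csv, DELIMITER ',')
        copy += f" WITH ({', '.join(dest['options'])})"
    statements = [copy]
    if dest.get("truncateBeforeWrite", False):
        statements.insert(0, f"TRUNCATE {target}")
    if dest.get("useTransaction", True):
        statements = ["BEGIN", *statements, "COMMIT"]
    return statements
```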
Destination > Kafka
| Field | Type | Default | Description |
|---|---|---|---|
topic | string | Required | Kafka topic name |
keyField | string | null | Column to use as message key |
overrideBootstrapServers | string | null | Custom bootstrap servers |
timeoutMs | int | 10000 | Producer timeout (milliseconds) |
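keyField determines which column becomes the Kafka message key. Under Kafka's default partitioner, records with the same key land in the same partition. The following minimal sketch turns one record into key and value bytes. JSON encoding of the value and the to_kafka_message name are assumptions, since the page does not specify the message format.

```python
import json
from typing import Any, Optional


def to_kafka_message(record: dict[str, Any],
                     key_field: Optional[str]) -> tuple[Optional[bytes], bytes]:
    """Turn one record into a (key, value) pair of bytes for a Kafka producer."""
    # With no keyField the key is None, and the producer's partitioner chooses a partition.
    key = None if key_field is None else str(record[key_field]).encode("utf-8")
    # Compact JSON for the value is an assumption of this sketch.
    value = json.dumps(record, separators=(",", ":")).encode("utf-8")
    return key, value
```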
Destination > ActiveMQ
| Field | Type | Description |
|---|---|---|
queueName | string | ActiveMQ queue name |
Destination > REST Endpoint
| Field | Type | Default | Description |
|---|---|---|---|
endpoint | string | Required | URL to POST data to |
async | boolean | false | If true, doesn’t wait for response |
bearerToken | string | null | Authorization token |
apiKey | string | null | API key for authentication |
timeoutSeconds | int | 0 | Request timeout (seconds); 0 means unset |
timeoutMs | int | 300000 | Request timeout (milliseconds) |
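Here is a sketch of the request a REST endpoint destination might send. It is built with Python's standard urllib.request but not sent. The sketch assumes the records are posted as a JSON array and uses only timeoutMs. The page does not say how timeoutSeconds and timeoutMs interact, or which header carries apiKey, so both are left out. build_rest_request is a hypothetical name.

```python
import json
import urllib.request
from typing import Any


def build_rest_request(dest: dict[str, Any],
                       records: list[dict[str, Any]]) -> tuple[urllib.request.Request, float]:
    """Build a POST request and a timeout in seconds for a REST endpoint destination."""
    body = json.dumps(records).encode("utf-8")  # JSON array body is an assumption
    request = urllib.request.Request(dest["endpoint"], data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    if dest.get("bearerToken"):
        request.add_header("Authorization", f"Bearer {dest['bearerToken']}")
    # timeoutMs defaults to 300000 ms, i.e. 300 seconds.
    timeout_seconds = dest.get("timeoutMs", 300000) / 1000
    return request, timeout_seconds
```

To actually send the request, pass both values to urllib.request.urlopen(request, timeout=timeout_seconds).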
Destination > Qdrant
| Field | Type | Description |
|---|---|---|
collectionName | string | Qdrant collection name |
chunking | object | Chunking config (see below) |
metadata | object | Key-value metadata to attach to each chunk |
embeddingSecretName | string | Optional override of the embedding Vault secret. Defaults to the server-level ai.embedding.secretName. |
qdrantSecretName | string | Vault secret for Qdrant connection |
Destination > Weaviate
| Field | Type | Description |
|---|---|---|
className | string | Weaviate class name |
chunking | object | Chunking config (see below) |
metadata | object | Key-value metadata to attach to each chunk |
embeddingSecretName | string | Optional override of the embedding Vault secret. Defaults to the server-level ai.embedding.secretName. |
weaviateSecretName | string | Vault secret for Weaviate connection |
Destination > pgvector
| Field | Type | Description |
|---|---|---|
tableName | string | PostgreSQL table name |
schemaName | string | PostgreSQL schema name |
chunking | object | Chunking config (see below) |
metadata | object | Key-value metadata to attach to each chunk |
embeddingSecretName | string | Optional override of the embedding Vault secret. Defaults to the server-level ai.embedding.secretName. |
postgresSecretName | string | Vault secret for PostgreSQL connection |
Destination > Milvus
| Field | Type | Description |
|---|---|---|
collectionName | string | Milvus collection name |
chunking | object | Chunking config (see below) |
metadata | object | Key-value metadata to attach to each chunk |
embeddingSecretName | string | Optional override of the embedding Vault secret. Defaults to the server-level ai.embedding.secretName. |
milvusSecretName | string | Vault secret for Milvus connection |
Destination > Chroma
| Field | Type | Description |
|---|---|---|
collectionName | string | Chroma collection name |
chunking | object | Chunking config (see below) |
metadata | object | Key-value metadata to attach to each chunk |
embeddingSecretName | string | Optional override of the embedding Vault secret. Defaults to the server-level ai.embedding.secretName. |
chromaSecretName | string | Vault secret for Chroma connection |
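All five vector store destinations share embeddingSecretName and metadata. The sketch below shows the fallback rule for the secret and one way of attaching metadata to each chunk. The nested server_config dict standing in for ai.embedding.secretName and the {"text", "metadata"} record shape are assumptions, as are both function names.

```python
from typing import Any


def resolve_embedding_secret(dest: dict[str, Any], server_config: dict[str, Any]) -> str:
    """Per-destination embeddingSecretName wins; otherwise use ai.embedding.secretName."""
    override = dest.get("embeddingSecretName")
    if override:
        return override
    return server_config["ai"]["embedding"]["secretName"]


def attach_metadata(chunks: list[str], dest: dict[str, Any]) -> list[dict[str, Any]]:
    """Pair each chunk with its own copy of the destination's metadata map."""
    metadata = dest.get("metadata") or {}
    return [{"text": chunk, "metadata": dict(metadata)} for chunk in chunks]
```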
Chunking Config
Used by all vector store destinations.
| Field | Type | Default | Description |
|---|---|---|---|
strategy | string | recursive | Chunking strategy: fixed, sentence, paragraph, recursive |
chunkSize | int | 500 | Maximum characters per chunk |
chunkOverlap | int | 50 | Overlap between consecutive chunks |
maxChunkTokens | int | 0 (off) | Optional token-count cap. When set, the chunker refuses to emit any chunk over this estimate, splitting oversized output before it ever reaches the embedding API. Recommended: ~80% of the embedding model’s input cap (e.g. 6500 for text-embedding-3-small). Without it, the server-side embedding guard catches oversized chunks as a fallback. |
tokensPerCharRatio | float | 2.0 | Chars-per-token ratio for the chunker’s heuristic counter. Lower is more conservative (more splits); raise to ~3.5 for predictable Latin prose. |
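The following minimal Python sketch shows how chunkSize, chunkOverlap, maxChunkTokens, and tokensPerCharRatio interact. It covers only the fixed strategy; sentence, paragraph, and recursive are not shown. Despite its name, tokensPerCharRatio is described above as a chars-per-token ratio, so the sketch divides the character count by it to estimate tokens. The function names are hypothetical, and the real chunker may split differently.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 2.0) -> int:
    """Heuristic token count: characters divided by the chars-per-token ratio, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def fixed_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50,
                 max_chunk_tokens: int = 0, chars_per_token: float = 2.0) -> list[str]:
    """Fixed-size character chunks with overlap, plus an optional token-estimate cap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    if max_chunk_tokens > 0:
        # Longest chunk whose estimate cannot exceed max_chunk_tokens.
        max_chars = int(max_chunk_tokens * chars_per_token)
        if max_chars < 1:
            raise ValueError("max_chunk_tokens is too small for this ratio")
        # Split any oversized chunk into pieces of at most max_chars characters.
        chunks = [chunk[i:i + max_chars]
                  for chunk in chunks
                  for i in range(0, len(chunk), max_chars)]
    return chunks
```

Splitting oversized chunks after the fact, without overlap, keeps the cap a hard guarantee under this heuristic. A production chunker might instead re-split on sentence or paragraph boundaries.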
