The pipeline can automatically generate a ready-to-use pipeline configuration from an uploaded file using an AI model. After you fill in a few placeholder values, the generated config can be POSTed to /api/v1/pipeline to register the pipeline without writing any JSON by hand.
How It Works
    Uploaded file
         |
         v
    File type detection (CSV / JSON / XML)
         |
         +-- JSON / XML  --> Fixed schema (_json or _xml field)
         |                   No AI call needed
         |
         +-- CSV / other --> First 100 lines sent to AI model
         |                   AI infers column names and data types
         |                   Returns JSON array of field definitions
         |
         v
    Config builder assembles full PipelineConfig JSON
      - source.schemaProperties.fields (AI-inferred or fixed)
      - source.fileAttributes (csvAttributes / jsonAttributes / xmlAttributes)
      - destination (Postgres for CSV, MongoDB for JSON/XML)
         |
         v
    Response: complete PipelineConfig JSON ready to register
The response is not registered automatically — it is returned to the caller so you can review it, fill in the placeholder values, and then POST it to /api/v1/pipeline.
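In practice this is two HTTP calls with a placeholder-filling step in between. Below is a minimal Python sketch of that step; the placeholder values match the example responses in this document, but the helper function itself is illustrative, not part of the pipeline's API:

```python
import json

# Placeholder values the generator leaves in the destination block,
# as shown in the example responses in this document.
PLACEHOLDERS = {
    "dbName": "DATABASE_NAME",
    "schema": "SCHEMA_NAME",
    "table": "TABLE_NAME",
}

def fill_placeholders(config: dict, values: dict) -> dict:
    """Return a copy of the generated config with real destination values."""
    filled = json.loads(json.dumps(config))  # deep copy via round-trip
    db = filled["destination"]["database"]
    for key, placeholder in PLACEHOLDERS.items():
        # Only replace keys the generator actually emitted as placeholders
        # (e.g. JSON/XML configs have no "schema" key).
        if db.get(key) == placeholder:
            db[key] = values[key]
    return filled

# Usage (network calls sketched as comments):
# 1. generated = POST /api/v1/pipeline/generate  (multipart file upload)
# 2. config = fill_placeholders(generated, {"dbName": "analytics",
#                                           "schema": "public",
#                                           "table": "stock_price"})
# 3. POST config as JSON to /api/v1/pipeline
```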
Endpoint
POST /api/v1/pipeline/generate
Content-Type: multipart/form-data
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | form-data (file) | Yes | The file to analyze |
| pipeline | query | No | Pipeline name. If omitted, derived from the filename (lowercased, non-alphanumeric characters replaced with _) |
| delimiter | query | No | Column delimiter for delimited files. Defaults to , |
| header | query | No | Whether the file has a header row |
| allStrings | query | No | If true, all fields are typed as string (default: false) |
| x-api-key | header | No | API key (required if useApiKeys: true) |
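The default pipeline-name derivation can be approximated in a few lines. This sketch implements the documented rule (lowercase, non-alphanumeric characters replaced with _); stripping the file extension first is inferred from the JSON example below, where events.json yields the name events, and is an assumption:

```python
import re
from pathlib import Path

def derive_pipeline_name(filename: str) -> str:
    # Drop the extension (assumed from the events.json example),
    # lowercase, then replace every non-alphanumeric character with _.
    stem = Path(filename).stem
    return re.sub(r"[^a-z0-9]", "_", stem.lower())
```

For example, derive_pipeline_name("Stock Price (2024).csv") returns "stock_price__2024_".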
Schema Rules by File Type
| File type | Schema | Default destination |
|---|---|---|
| CSV / delimited | AI infers field names and types from file content | PostgreSQL (usePostgres: true) |
| JSON (.json) | Single field: _json (type string) | MongoDB (useMongoDB: true) |
| XML (.xml) | Single field: _xml (type string) | MongoDB (useMongoDB: true) |
JSON and XML files use a fixed schema because the pipeline stores them as raw documents — no column inference is needed.
Valid AI-inferred types: boolean, int, bigint, float, double, string, date, timestamp
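The schema rules above can be sketched as a small dispatch function. The branches and the type allowlist mirror the table and the fixed-schema note; the function name and return shape are illustrative, not the pipeline's actual internals:

```python
VALID_TYPES = {"boolean", "int", "bigint", "float", "double",
               "string", "date", "timestamp"}

def build_schema(filename: str, infer_fields) -> dict:
    """Pick the schema and default destination flag by file type.

    infer_fields: callable standing in for the AI call that receives the
    file and returns [{"name": ..., "type": ...}, ...] (CSV path only).
    """
    name = filename.lower()
    if name.endswith(".json"):
        # Fixed schema: the raw document is stored in a single field.
        return {"fields": [{"name": "_json", "type": "string"}],
                "useMongoDB": True}
    if name.endswith(".xml"):
        return {"fields": [{"name": "_xml", "type": "string"}],
                "useMongoDB": True}
    # CSV / other: AI infers the columns; reject types outside the list.
    fields = infer_fields(filename)
    for f in fields:
        if f["type"] not in VALID_TYPES:
            raise ValueError(f"invalid AI-inferred type: {f['type']}")
    return {"fields": fields, "usePostgres": True}
```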
Example: CSV File
curl -X POST http://localhost:8080/api/v1/pipeline/generate \
-H "x-api-key: your-api-key" \
-F "file=@./stock_price.csv" \
-F "pipeline=stock_price"
Response:
{
"name": "stock_price",
"source": {
"schemaProperties": {
"fields": [
{ "name": "symbol", "type": "string" },
{ "name": "date", "type": "string" },
{ "name": "open", "type": "double" },
{ "name": "high", "type": "double" },
{ "name": "low", "type": "double" },
{ "name": "close", "type": "double" },
{ "name": "volume", "type": "int" },
{ "name": "adj_close", "type": "double" }
]
},
"fileAttributes": {
"csvAttributes": { "delimiter": ",", "header": true, "encoding": "UTF-8" }
}
},
"destination": {
"database": {
"dbName": "DATABASE_NAME",
"schema": "SCHEMA_NAME",
"table": "TABLE_NAME",
"usePostgres": true
}
}
}
Replace DATABASE_NAME, SCHEMA_NAME, and TABLE_NAME with real values, then register the pipeline:
curl -X POST http://localhost:8080/api/v1/pipeline \
-H "Content-Type: application/json" \
-d '<paste response here>'
Example: JSON File
curl -X POST http://localhost:8080/api/v1/pipeline/generate \
-H "x-api-key: your-api-key" \
-F "file=@./events.json"
Response:
{
"name": "events",
"source": {
"schemaProperties": {
"fields": [
{ "name": "_json", "type": "string" }
]
},
"fileAttributes": {
"jsonAttributes": { "everyRowContainsObject": false, "encoding": "UTF-8" }
}
},
"destination": {
"database": {
"dbName": "DATABASE_NAME",
"table": "TABLE_NAME",
"useMongoDB": true
}
}
}
Configuration
AI schema generation is disabled by default. To enable it:
Schema generation uses the codegen AI slot. As of v1.5.6, AI configuration is split into three independent self-describing Vault secrets — see AI Configuration for the full picture.
application.yaml (or docker/config/application.yaml for Docker deployments):
ai:
enabled: "true"
aiPrimary:
secretName: "oss/ai-primary"
codegen:
secretName: "oss/codegen"
embedding:
secretName: "oss/embedding"
Vault secret (each secret is self-describing — provider, endpoint, model, apiKey, and optionally version, all inline):
vault kv put secret/oss/codegen \
provider="anthropic" \
endpoint="https://api.anthropic.com/v1/messages" \
model="claude-opus-4-7" \
apiKey="sk-ant-..." \
version="2023-06-01"
For OpenAI:
vault kv put secret/oss/codegen \
provider="openai" \
endpoint="https://api.openai.com/v1/chat/completions" \
model="gpt-5.4" \
apiKey="sk-..."
The Vault secret keys:
| Key | Description |
|---|---|
| provider | anthropic, openai, or ollama |
| endpoint | The AI provider API URL |
| model | The model name to use |
| apiKey | The API key for authentication (omit for local Ollama) |
| version | Optional API version header (e.g. Anthropic’s anthropic-version) |
The pipeline reads the secret at startup. If ai.enabled: true and the codegen secret is missing or malformed, startup will fail with a descriptive error. docker/vault-init.sh seeds this automatically from ANTHROPIC_API_KEY or OPENAI_API_KEY in .env.
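The startup check can be sketched as: the required keys must be present, apiKey is optional only for Ollama, and version is always optional. This is an illustration of the documented behavior, not the pipeline's code:

```python
REQUIRED = ("provider", "endpoint", "model")
PROVIDERS = {"anthropic", "openai", "ollama"}

def validate_codegen_secret(secret: dict) -> None:
    """Fail fast with a descriptive error if the secret is malformed."""
    missing = [k for k in REQUIRED if not secret.get(k)]
    if missing:
        raise ValueError(f"codegen secret missing keys: {missing}")
    if secret["provider"] not in PROVIDERS:
        raise ValueError(f"unsupported provider: {secret['provider']}")
    # apiKey may be omitted only for local Ollama.
    if secret["provider"] != "ollama" and not secret.get("apiKey"):
        raise ValueError("apiKey is required for non-Ollama providers")
```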
Supported Providers
| Provider | provider value | Auth header |
|---|---|---|
| Anthropic Claude | anthropic | x-api-key + anthropic-version: 2023-06-01 |
| OpenAI | openai | Authorization: Bearer |
| Ollama (local) | ollama | none |
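The auth-header differences in the table above can be expressed as a small builder. The header names come from the table; the function itself is a hypothetical sketch, not the pipeline's implementation:

```python
def auth_headers(provider: str, api_key, version=None) -> dict:
    """Build the per-provider auth headers described in the table."""
    if provider == "anthropic":
        return {"x-api-key": api_key,
                "anthropic-version": version or "2023-06-01"}
    if provider == "openai":
        return {"Authorization": f"Bearer {api_key}"}
    if provider == "ollama":
        return {}  # local Ollama needs no auth
    raise ValueError(f"unsupported provider: {provider}")
```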
Any other value for provider will cause startup to fail with an unsupported provider error.