Documentation Index

Fetch the complete documentation index at: https://docs.datris.ai/llms.txt

Use this file to discover all available pages before exploring further.

The pipeline can automatically generate a ready-to-use pipeline configuration from an uploaded file using an AI model. The generated config can be pasted directly into POST /api/v1/pipeline to register the pipeline without writing any JSON by hand.

How It Works

Uploaded file
  |
  v
File type detection (CSV / JSON / XML)
  |
  +-- JSON / XML --> Fixed schema (_json or _xml field)
  |                  No AI call needed
  |
  +-- CSV / other --> First 100 lines sent to AI model
                      AI infers column names and data types
                      Returns JSON array of field definitions
  |
  v
Config builder assembles full PipelineConfig JSON
  - source.schemaProperties.fields  (AI-inferred or fixed)
  - source.fileAttributes            (csvAttributes / jsonAttributes / xmlAttributes)
  - destination                      (Postgres for CSV, MongoDB for JSON/XML)
  |
  v
Response: complete PipelineConfig JSON ready to register

The response is not registered automatically: it is returned to the caller so you can review it, fill in the placeholder values, and then POST it to /api/v1/pipeline.
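The branching in the diagram can be sketched in a few lines of Python. This is a hypothetical approximation of the server-side dispatch, not the actual implementation; the function and key names (`plan_schema`, `mode`, `sample`) are illustrative:

```python
def plan_schema(filename: str, lines: list[str]) -> dict:
    """Decide how the schema will be produced for an uploaded file:
    JSON/XML get a fixed single-field schema (no AI call), while
    everything else is sampled (first 100 lines) for AI inference."""
    name = filename.lower()
    if name.endswith(".json"):
        return {"mode": "fixed", "fields": [{"name": "_json", "type": "string"}]}
    if name.endswith(".xml"):
        return {"mode": "fixed", "fields": [{"name": "_xml", "type": "string"}]}
    # CSV / other: only the first 100 lines are sent to the AI model
    return {"mode": "ai", "sample": lines[:100]}
```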

Endpoint

POST /api/v1/pipeline/generate
Content-Type: multipart/form-data
Parameters:
Parameter   Type              Required  Description
file        form-data (file)  Yes       The file to analyze
pipeline    query             No        Pipeline name. If omitted, derived from the filename (lowercased, non-alphanumeric characters replaced with _)
delimiter   query             No        Column delimiter for delimited files. Defaults to ,
header      query             No        Whether the file has a header row
allStrings  query             No        If true, all fields are typed as string (default: false)
x-api-key   header            No        API key (required if useApiKeys: true)
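The default pipeline-name derivation (lowercased, non-alphanumeric characters replaced with _) can be approximated in a couple of lines. A sketch, not the server's actual code; whether the file extension is stripped before the substitution is an assumption:

```python
import re
from pathlib import Path

def default_pipeline_name(filename: str) -> str:
    """Approximate the documented fallback: lowercase the file's stem
    and replace every non-alphanumeric character with an underscore."""
    stem = Path(filename).stem  # assumption: the extension is dropped first
    return re.sub(r"[^a-z0-9]", "_", stem.lower())
```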

Schema Rules by File Type

File type        Schema                                            Default destination
CSV / delimited  AI infers field names and types from file content  PostgreSQL (usePostgres: true)
JSON (.json)     Single field: _json (type string)                  MongoDB (useMongoDB: true)
XML (.xml)       Single field: _xml (type string)                   MongoDB (useMongoDB: true)

JSON and XML files use a fixed schema because the pipeline stores them as raw documents; no column inference is needed.

Valid AI-inferred types: boolean, int, bigint, float, double, string, date, timestamp
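A quick way to sanity-check an AI-returned field list against the valid type set above. This is a client-side sketch; the server presumably performs its own validation:

```python
# The documented set of valid AI-inferred field types
VALID_TYPES = {"boolean", "int", "bigint", "float", "double",
               "string", "date", "timestamp"}

def check_fields(fields: list[dict]) -> None:
    """Raise if any field uses a type outside the documented set."""
    for f in fields:
        if f.get("type") not in VALID_TYPES:
            raise ValueError(f"unsupported field type: {f.get('type')!r}")
```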

Example: CSV File

curl -X POST http://localhost:8080/api/v1/pipeline/generate \
  -H "x-api-key: your-api-key" \
  -F "file=@./stock_price.csv" \
  -F "pipeline=stock_price"
Response:
{
  "name": "stock_price",
  "source": {
    "schemaProperties": {
      "fields": [
        { "name": "symbol",    "type": "string" },
        { "name": "date",      "type": "string" },
        { "name": "open",      "type": "double" },
        { "name": "high",      "type": "double" },
        { "name": "low",       "type": "double" },
        { "name": "close",     "type": "double" },
        { "name": "volume",    "type": "int"    },
        { "name": "adj_close", "type": "double" }
      ]
    },
    "fileAttributes": {
      "csvAttributes": { "delimiter": ",", "header": true, "encoding": "UTF-8" }
    }
  },
  "destination": {
    "database": {
      "dbName": "DATABASE_NAME",
      "schema": "SCHEMA_NAME",
      "table": "TABLE_NAME",
      "usePostgres": true
    }
  }
}
Replace DATABASE_NAME, SCHEMA_NAME, and TABLE_NAME with real values, then register the pipeline:
curl -X POST http://localhost:8080/api/v1/pipeline \
  -H "Content-Type: application/json" \
  -d '<paste response here>'
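Filling in the placeholders before registering can be scripted. A minimal sketch: the placeholder names match the generated response, while the helper itself and the replacement values are illustrative:

```python
import json

def fill_placeholders(config_json: str, values: dict[str, str]) -> dict:
    """Replace the DATABASE_NAME / SCHEMA_NAME / TABLE_NAME placeholders
    in a generated PipelineConfig before POSTing it to /api/v1/pipeline."""
    config = json.loads(config_json)
    db = config["destination"]["database"]
    for key, placeholder in (("dbName", "DATABASE_NAME"),
                             ("schema", "SCHEMA_NAME"),
                             ("table", "TABLE_NAME")):
        # MongoDB configs have no "schema" key, so only touch keys present
        if db.get(key) == placeholder:
            db[key] = values[placeholder]
    return config
```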

Example: JSON File

curl -X POST http://localhost:8080/api/v1/pipeline/generate \
  -H "x-api-key: your-api-key" \
  -F "file=@./events.json"
Response:
{
  "name": "events",
  "source": {
    "schemaProperties": {
      "fields": [
        { "name": "_json", "type": "string" }
      ]
    },
    "fileAttributes": {
      "jsonAttributes": { "everyRowContainsObject": false, "encoding": "UTF-8" }
    }
  },
  "destination": {
    "database": {
      "dbName": "DATABASE_NAME",
      "table": "TABLE_NAME",
      "useMongoDB": true
    }
  }
}
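The everyRowContainsObject attribute distinguishes newline-delimited JSON (one object per line) from a single multi-line document. A hypothetical heuristic for setting it when assembling a config by hand; this is not how the server detects it, just one plausible check:

```python
import json

def every_row_contains_object(lines: list[str]) -> bool:
    """True if each non-empty line parses as a standalone JSON object,
    i.e. the file looks like NDJSON rather than one big document."""
    rows = [ln for ln in lines if ln.strip()]
    if not rows:
        return False
    for ln in rows:
        try:
            if not isinstance(json.loads(ln), dict):
                return False
        except json.JSONDecodeError:
            return False
    return True
```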

Configuration

AI schema generation is disabled by default. It uses the codegen AI slot; as of v1.5.6, AI configuration is split into three independent, self-describing Vault secrets (see AI Configuration for the full picture). To enable it, set the following in application.yaml (or docker/config/application.yaml for Docker deployments):
ai:
  enabled: "true"
  aiPrimary:
    secretName: "oss/ai-primary"
  codegen:
    secretName: "oss/codegen"
  embedding:
    secretName: "oss/embedding"
Vault secret (each secret is self-describing — provider, endpoint, model, apiKey, and optionally version, all inline):
vault kv put secret/oss/codegen \
  provider="anthropic" \
  endpoint="https://api.anthropic.com/v1/messages" \
  model="claude-opus-4-7" \
  apiKey="sk-ant-..." \
  version="2023-06-01"
For OpenAI:
vault kv put secret/oss/codegen \
  provider="openai" \
  endpoint="https://api.openai.com/v1/chat/completions" \
  model="gpt-5.4" \
  apiKey="sk-..."
The Vault secret keys:
Key       Description
provider  anthropic, openai, or ollama
endpoint  The AI provider API URL
model     The model name to use
apiKey    The API key for authentication (omit for local Ollama)
version   Optional API version header (e.g. Anthropic's anthropic-version)
The pipeline reads the secret at startup. If ai.enabled: true and the codegen secret is missing or malformed, startup will fail with a descriptive error. docker/vault-init.sh seeds this automatically from ANTHROPIC_API_KEY or OPENAI_API_KEY in .env.

Supported Providers

Provider          provider value  Auth header
Anthropic Claude  anthropic       x-api-key + anthropic-version: 2023-06-01
OpenAI            openai          Authorization: Bearer
Ollama (local)    ollama          none
Any other value for provider will cause startup to fail with an unsupported provider error.