The pipeline can automatically generate a ready-to-use pipeline configuration from an uploaded file using an AI model. The generated config can be submitted directly to POST /api/v1/pipeline to register the pipeline without writing any JSON by hand.

How It Works

Uploaded file
  |
  v
File type detection (CSV / JSON / XML)
  |
  +-- JSON / XML --> Fixed schema (_json or _xml field)
  |                  No AI call needed
  |
  +-- CSV / other --> First 100 lines sent to AI model
                      AI infers column names and data types
                      Returns JSON array of field definitions
  |
  v
Config builder assembles full PipelineConfig JSON
  - source.schemaProperties.fields  (AI-inferred or fixed)
  - source.fileAttributes            (csvAttributes / jsonAttributes / xmlAttributes)
  - destination                      (Postgres for CSV, MongoDB for JSON/XML)
  |
  v
Response: complete PipelineConfig JSON ready to register
The response is not registered automatically; it is returned to the caller so you can review it, fill in the placeholder values, and then POST it to /api/v1/pipeline.
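The branching in the diagram above can be sketched roughly as follows. This is an illustrative sketch, not the actual implementation; `infer_with_ai` is a hypothetical stand-in for the model call.

```python
def build_schema_fields(filename, content, infer_with_ai):
    """Sketch of the schema step: fixed single-field schema for JSON/XML,
    AI inference on the first 100 lines for everything else."""
    name = filename.lower()
    if name.endswith(".json"):
        return [{"name": "_json", "type": "string"}]
    if name.endswith(".xml"):
        return [{"name": "_xml", "type": "string"}]
    # CSV / other: only a sample of the file is sent to the model
    sample = "\n".join(content.splitlines()[:100])
    return infer_with_ai(sample)  # expected to return a list of {name, type} dicts
```

Keeping the JSON/XML path free of AI calls means those uploads are deterministic and cost nothing per request.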

Endpoint

POST /api/v1/pipeline/generate
Content-Type: multipart/form-data
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | form-data (file) | Yes | The file to analyze |
| pipeline | query | No | Pipeline name. If omitted, derived from the filename (lowercased, non-alphanumeric characters replaced with _) |
| delimiter | query | No | Column delimiter for delimited files. Defaults to , |
| header | query | No | Whether the file has a header row. Defaults to false |
| x-api-key | header | No | API key (required if useApiKeys: true) |
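The default pipeline-name derivation can be sketched as below. The rule (lowercase, replace non-alphanumeric characters with _) is from the table; stripping the extension first is an assumption of this sketch.

```python
import re

def derive_pipeline_name(filename):
    """Sketch of the documented default pipeline name:
    lowercase the filename and replace non-alphanumerics with '_'.
    Dropping the extension first is an assumption, not documented behavior."""
    stem = filename.rsplit(".", 1)[0]
    return re.sub(r"[^a-z0-9]", "_", stem.lower())
```

For example, an upload named Stock Price.csv with no pipeline parameter would be registered as stock_price.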

Schema Rules by File Type

| File type | Schema | Default destination |
|---|---|---|
| CSV / delimited | AI infers field names and types from file content | PostgreSQL (usePostgres: true) |
| JSON (.json) | Single field: _json (type string) | MongoDB (useMongoDB: true) |
| XML (.xml) | Single field: _xml (type string) | MongoDB (useMongoDB: true) |

JSON and XML files use a fixed schema because the pipeline stores them as raw documents, so no column inference is needed.

Valid AI-inferred types: boolean, int, bigint, float, double, string, date, timestamp
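Since the model's output is free-form text, a consumer may want to check the inferred fields against the documented type set before registering the config. A minimal illustrative check (not the service's actual validation):

```python
# The valid AI-inferred types listed above.
VALID_TYPES = frozenset(
    {"boolean", "int", "bigint", "float", "double", "string", "date", "timestamp"}
)

def validate_fields(fields):
    """Reject any field whose type falls outside the documented set."""
    for f in fields:
        if f["type"] not in VALID_TYPES:
            raise ValueError(f"invalid inferred type {f['type']!r} for field {f['name']!r}")
    return fields
```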

Example: CSV File

curl -X POST http://localhost:8080/api/v1/pipeline/generate \
  -H "x-api-key: your-api-key" \
  -F "file=@./stock_price.csv" \
  -F "pipeline=stock_price"
Response:
{
  "name": "stock_price",
  "source": {
    "schemaProperties": {
      "fields": [
        { "name": "symbol",    "type": "string" },
        { "name": "date",      "type": "string" },
        { "name": "open",      "type": "double" },
        { "name": "high",      "type": "double" },
        { "name": "low",       "type": "double" },
        { "name": "close",     "type": "double" },
        { "name": "volume",    "type": "int"    },
        { "name": "adj_close", "type": "double" }
      ]
    },
    "fileAttributes": {
      "csvAttributes": { "delimiter": ",", "header": true, "encoding": "UTF-8" }
    }
  },
  "destination": {
    "database": {
      "dbName": "DATABASE_NAME",
      "schema": "SCHEMA_NAME",
      "table": "TABLE_NAME",
      "usePostgres": true
    }
  }
}
Replace DATABASE_NAME, SCHEMA_NAME, and TABLE_NAME with real values, then register the pipeline:
curl -X POST http://localhost:8080/api/v1/pipeline \
  -H "Content-Type: application/json" \
  -d '<paste response here>'
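The placeholder-filling step before registration can also be scripted. A sketch, assuming the /generate response shape shown above (the function name is illustrative):

```python
import json

def fill_placeholders(config_json, db_name, schema, table):
    """Take the /api/v1/pipeline/generate response, replace the placeholder
    values, and return the JSON body to POST to /api/v1/pipeline."""
    config = json.loads(config_json)
    db = config["destination"]["database"]
    db["dbName"] = db_name
    if "schema" in db:  # present for Postgres destinations only
        db["schema"] = schema
    db["table"] = table
    return json.dumps(config)
```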

Example: JSON File

curl -X POST http://localhost:8080/api/v1/pipeline/generate \
  -H "x-api-key: your-api-key" \
  -F "file=@./events.json"
Response:
{
  "name": "events",
  "source": {
    "schemaProperties": {
      "fields": [
        { "name": "_json", "type": "string" }
      ]
    },
    "fileAttributes": {
      "jsonAttributes": { "everyRowContainsObject": false, "encoding": "UTF-8" }
    }
  },
  "destination": {
    "database": {
      "dbName": "DATABASE_NAME",
      "table": "TABLE_NAME",
      "useMongoDB": true
    }
  }
}

Configuration

AI schema generation is disabled by default. To enable it, add the following to application.yaml (or docker/config/application.yaml for Docker deployments):
ai:
  enabled: "true"
  provider: "anthropic"       # anthropic or openai
  aiSecretName: "oss/anthropic"
Vault secret (stored at secret/<aiSecretName>):
vault kv put secret/oss/anthropic \
  endpoint="https://api.anthropic.com/v1/messages" \
  model="claude-sonnet-4-6" \
  apiKey="sk-ant-..."
For OpenAI:
vault kv put secret/oss/openai \
  endpoint="https://api.openai.com/v1/chat/completions" \
  model="gpt-4o" \
  apiKey="sk-..."
The Vault secret must contain three keys:
| Key | Description |
|---|---|
| endpoint | The AI provider API URL |
| model | The model name to use |
| apiKey | The API key for authentication |
The pipeline reads the secret at startup. If ai.enabled: true and the secret is missing or any key is absent, startup will fail with a descriptive error.
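The fail-fast startup check can be sketched like this (an illustration of the documented behavior, not the pipeline's actual code):

```python
REQUIRED_KEYS = ("endpoint", "model", "apiKey")

def check_ai_secret(secret):
    """With ai.enabled true, fail startup if the Vault secret is missing
    or any of the three required keys is absent or empty."""
    if secret is None:
        raise RuntimeError("AI secret not found in Vault")
    missing = [k for k in REQUIRED_KEYS if not secret.get(k)]
    if missing:
        raise RuntimeError(f"AI secret is missing required keys: {', '.join(missing)}")
```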

Supported Providers

| Provider | provider value | Auth header |
|---|---|---|
| Anthropic Claude | anthropic | x-api-key + anthropic-version: 2023-06-01 |
| OpenAI | openai | Authorization: Bearer |
Any other value for provider will cause startup to fail with an unsupported provider error.
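The per-provider auth headers from the table translate to something like the following sketch (header names match the docs; the function itself is illustrative):

```python
def auth_headers(provider, api_key):
    """Build the auth headers for the configured AI provider;
    any other provider value is rejected, mirroring the startup behavior."""
    if provider == "anthropic":
        return {"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    if provider == "openai":
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"unsupported provider: {provider}")
```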