The pipeline can automatically generate a ready-to-use pipeline configuration from an uploaded file using an AI model. The generated config can be submitted directly to POST /api/v1/pipeline to register the pipeline without writing any JSON by hand.

How It Works

Uploaded file
  |
  v
File type detection (CSV / JSON / XML)
  |
  +-- JSON / XML --> Fixed schema (_json or _xml field)
  |                  No AI call needed
  |
  +-- CSV / other --> First 100 lines sent to AI model
                      AI infers column names and data types
                      Returns JSON array of field definitions
  |
  v
Config builder assembles full PipelineConfig JSON
  - source.schemaProperties.fields  (AI-inferred or fixed)
  - source.fileAttributes            (csvAttributes / jsonAttributes / xmlAttributes)
  - destination                      (Postgres for CSV, MongoDB for JSON/XML)
  |
  v
Response: complete PipelineConfig JSON ready to register
The response is not registered automatically; it is returned to the caller so you can review it, fill in the placeholder values, and then POST it to /api/v1/pipeline.
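The branching in the diagram above can be sketched roughly as follows. This is an illustrative sketch, not the actual implementation; `infer_with_ai` is a hypothetical stand-in for the model call.

```python
def build_schema_fields(filename, content, infer_with_ai):
    """Sketch of the schema step: fixed single-field schema for JSON/XML,
    AI inference on the first 100 lines for everything else."""
    name = filename.lower()
    if name.endswith(".json"):
        return [{"name": "_json", "type": "string"}]
    if name.endswith(".xml"):
        return [{"name": "_xml", "type": "string"}]
    # CSV / other: only a sample of the file is sent to the model
    sample = "\n".join(content.splitlines()[:100])
    return infer_with_ai(sample)  # expected to return a list of {name, type} dicts
```

Keeping the JSON/XML path free of AI calls means those uploads are deterministic and cost nothing per request.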

Endpoint

POST /api/v1/pipeline/generate
Content-Type: multipart/form-data
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | form-data (file) | Yes | The file to analyze |
| pipeline | query | No | Pipeline name. If omitted, derived from the filename (lowercased, non-alphanumeric characters replaced with _) |
| delimiter | query | No | Column delimiter for delimited files. Defaults to , |
| header | query | No | Whether the file has a header row. Defaults to false |
| x-api-key | header | No | API key (required if useApiKeys: true) |
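The default pipeline-name derivation can be sketched as below. The rule (lowercase, replace non-alphanumeric characters with _) is from the table; stripping the extension first is an assumption of this sketch.

```python
import re

def derive_pipeline_name(filename):
    """Sketch of the documented default pipeline name:
    lowercase the filename and replace non-alphanumerics with '_'.
    Dropping the extension first is an assumption, not documented behavior."""
    stem = filename.rsplit(".", 1)[0]
    return re.sub(r"[^a-z0-9]", "_", stem.lower())
```

For example, an upload named Stock Price.csv with no pipeline parameter would be registered as stock_price.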

Schema Rules by File Type

| File type | Schema | Default destination |
|---|---|---|
| CSV / delimited | AI infers field names and types from file content | PostgreSQL (usePostgres: true) |
| JSON (.json) | Single field: _json (type string) | MongoDB (useMongoDB: true) |
| XML (.xml) | Single field: _xml (type string) | MongoDB (useMongoDB: true) |

JSON and XML files use a fixed schema because the pipeline stores them as raw documents, so no column inference is needed.

Valid AI-inferred types: boolean, int, bigint, float, double, string, date, timestamp
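Since the model's output is free-form text, a consumer may want to check the inferred fields against the documented type set before registering the config. A minimal illustrative check (not the service's actual validation):

```python
# The valid AI-inferred types listed above.
VALID_TYPES = frozenset(
    {"boolean", "int", "bigint", "float", "double", "string", "date", "timestamp"}
)

def validate_fields(fields):
    """Reject any field whose type falls outside the documented set."""
    for f in fields:
        if f["type"] not in VALID_TYPES:
            raise ValueError(f"invalid inferred type {f['type']!r} for field {f['name']!r}")
    return fields
```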

Example: CSV File

curl -X POST http://localhost:8080/api/v1/pipeline/generate \
  -H "x-api-key: your-api-key" \
  -F "file=@./stock_price.csv" \
  -F "pipeline=stock_price"
Response:
{
  "name": "stock_price",
  "source": {
    "schemaProperties": {
      "fields": [
        { "name": "symbol",    "type": "string" },
        { "name": "date",      "type": "string" },
        { "name": "open",      "type": "double" },
        { "name": "high",      "type": "double" },
        { "name": "low",       "type": "double" },
        { "name": "close",     "type": "double" },
        { "name": "volume",    "type": "int"    },
        { "name": "adj_close", "type": "double" }
      ]
    },
    "fileAttributes": {
      "csvAttributes": { "delimiter": ",", "header": true, "encoding": "UTF-8" }
    }
  },
  "destination": {
    "database": {
      "dbName": "DATABASE_NAME",
      "schema": "SCHEMA_NAME",
      "table": "TABLE_NAME",
      "usePostgres": true
    }
  }
}
Replace DATABASE_NAME, SCHEMA_NAME, and TABLE_NAME with real values, then register the pipeline:
curl -X POST http://localhost:8080/api/v1/pipeline \
  -H "Content-Type: application/json" \
  -d '<paste response here>'
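The placeholder-filling step before registration can also be scripted. A sketch, assuming the /generate response shape shown above (the function name is illustrative):

```python
import json

def fill_placeholders(config_json, db_name, schema, table):
    """Take the /api/v1/pipeline/generate response, replace the placeholder
    values, and return the JSON body to POST to /api/v1/pipeline."""
    config = json.loads(config_json)
    db = config["destination"]["database"]
    db["dbName"] = db_name
    if "schema" in db:  # present for Postgres destinations only
        db["schema"] = schema
    db["table"] = table
    return json.dumps(config)
```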

Example: JSON File

curl -X POST http://localhost:8080/api/v1/pipeline/generate \
  -H "x-api-key: your-api-key" \
  -F "file=@./events.json"
Response:
{
  "name": "events",
  "source": {
    "schemaProperties": {
      "fields": [
        { "name": "_json", "type": "string" }
      ]
    },
    "fileAttributes": {
      "jsonAttributes": { "everyRowContainsObject": false, "encoding": "UTF-8" }
    }
  },
  "destination": {
    "database": {
      "dbName": "DATABASE_NAME",
      "table": "TABLE_NAME",
      "useMongoDB": true
    }
  }
}

Configuration

AI schema generation is disabled by default. To enable it, add the following to application.yaml (or docker/config/application.yaml for Docker deployments):
ai:
  enabled: "true"
  provider: "anthropic"       # anthropic or openai
  aiSecretName: "oss/anthropic"
Vault secret (stored at secret/<aiSecretName>):
vault kv put secret/oss/anthropic \
  endpoint="https://api.anthropic.com/v1/messages" \
  model="claude-sonnet-4-6" \
  apiKey="sk-ant-..."
For OpenAI:
vault kv put secret/oss/openai \
  endpoint="https://api.openai.com/v1/chat/completions" \
  model="gpt-4o" \
  apiKey="sk-..."
The Vault secret must contain three keys:
| Key | Description |
|---|---|
| endpoint | The AI provider API URL |
| model | The model name to use |
| apiKey | The API key for authentication |
The pipeline reads the secret at startup. If ai.enabled: true and the secret is missing or any key is absent, startup will fail with a descriptive error.
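The fail-fast startup check can be sketched like this (an illustration of the documented behavior, not the pipeline's actual code):

```python
REQUIRED_KEYS = ("endpoint", "model", "apiKey")

def check_ai_secret(secret):
    """With ai.enabled true, fail startup if the Vault secret is missing
    or any of the three required keys is absent or empty."""
    if secret is None:
        raise RuntimeError("AI secret not found in Vault")
    missing = [k for k in REQUIRED_KEYS if not secret.get(k)]
    if missing:
        raise RuntimeError(f"AI secret is missing required keys: {', '.join(missing)}")
```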

Supported Providers

| Provider | provider value | Auth header |
|---|---|---|
| Anthropic Claude | anthropic | x-api-key + anthropic-version: 2023-06-01 |
| OpenAI | openai | Authorization: Bearer |
Any other value for provider will cause startup to fail with an unsupported provider error.
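The per-provider auth headers from the table translate to something like the following sketch (header names match the docs; the function itself is illustrative):

```python
def auth_headers(provider, api_key):
    """Build the auth headers for the configured AI provider;
    any other provider value is rejected, mirroring the startup behavior."""
    if provider == "anthropic":
        return {"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    if provider == "openai":
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"unsupported provider: {provider}")
```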