Upload a File

Upload a data file for processing by a configured pipeline.
POST /api/v1/pipeline/upload
Content-Type: multipart/form-data
Parameters:
Parameter       Type       Required  Description
file            form-data  Yes       The file to upload
pipeline        form-data  Yes       Target pipeline name
publishertoken  form-data  No        Publisher identifier for tracking
Behavior:
  • Compressed files (.zip, .gz, .tar, .jar): Staged to the MinIO raw bucket for asynchronous processing
  • Uncompressed files: Processed immediately in-memory
Example:
curl -X POST http://localhost:8080/api/v1/pipeline/upload \
  -F "file=@/path/to/data.csv" \
  -F "pipeline=sales_data" \
  -F "publishertoken=batch-001"
Response: 200 OK with the pipeline token (for uncompressed files):
pt-abc12345-6789-...
For compressed files, the response is 200 OK with no body. The file is processed asynchronously when the pipeline detects it in the raw bucket.
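Because the response body depends on whether the file is compressed, a client can decide up front whether to expect a token. A minimal sketch of that branch; the helper name is hypothetical, only the extension list comes from the behavior above:

```python
from pathlib import Path

# Compressed archives are staged to the raw bucket and processed
# asynchronously, so the 200 response has an empty body; anything else is
# processed in-memory and the body carries the pipeline token.
COMPRESSED = {".zip", ".gz", ".tar", ".jar"}

def expects_token(filename: str) -> bool:
    """True when the upload response body should contain a pipeline token."""
    return Path(filename).suffix not in COMPRESSED

print(expects_token("data.csv"))     # True: token in body
print(expects_token("dump.tar.gz"))  # False: empty body, async processing
```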

Generate Pipeline Schema

Upload a CSV file to automatically infer the schema and generate a partial pipeline configuration.
POST /api/v1/pipeline/generate
Content-Type: multipart/form-data
Parameters:
Parameter  Type       Required  Description
file       form-data  Yes       CSV file to analyze
pipeline   form-data  Yes       Pipeline name
delimiter  form-data  No        CSV delimiter (default: auto-detect)
header     form-data  No        Whether file has header row (default: true)
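Since delimiter and header are optional, a client should only send them to override the server defaults. A hypothetical helper that assembles the text fields of the multipart request (the CSV itself goes in the separate "file" part):

```python
def generate_form(pipeline, delimiter=None, header=None):
    """Text fields for POST /api/v1/pipeline/generate.

    Omitted optionals fall back to the server defaults: auto-detected
    delimiter, header assumed true.
    """
    fields = {"pipeline": pipeline}
    if delimiter is not None:
        fields["delimiter"] = delimiter
    if header is not None:
        # form-data values are strings, so encode the boolean explicitly
        fields["header"] = "true" if header else "false"
    return fields

print(generate_form("my_pipeline", delimiter=";"))
```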
Example:
curl -X POST http://localhost:8080/api/v1/pipeline/generate \
  -F "file=@/path/to/sample.csv" \
  -F "pipeline=my_pipeline"
Response: 200 OK with a partial PipelineConfig JSON:
{
  "name": "my_pipeline",
  "source": {
    "fileAttributes": {
      "csvAttributes": {
        "delimiter": ",",
        "header": true
      }
    },
    "schemaProperties": {
      "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "created_at", "type": "string"}
      ]
    }
  },
  "destination": {
    "database": {
      "dbName": "DATABASE_NAME",
      "schema": "SCHEMA_NAME",
      "table": "TABLE_NAME",
      "redshift": {
        "_comment": "remove redshift section if not used",
        "keyFields": ["KEY_FIELD1", "KEY_FIELD2"]
      }
    }
  }
}
Inferred types: int, bigint, float, double, char, string.
Note: This endpoint only analyzes CSV files; JSON and XML files return null for the generated config. Edit the generated JSON to add your destination configuration before registering it with POST /pipeline.
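The destination placeholders (DATABASE_NAME, SCHEMA_NAME, TABLE_NAME, the key fields) must be filled in before registration. A hypothetical helper that patches the generated partial config and drops the redshift section when it is not used, per the _comment hint:

```python
import json

def finish_config(partial, db_name, schema, table, redshift_keys=None):
    """Fill the destination placeholders in a generated partial PipelineConfig.

    Removes the redshift section when no key fields are given.
    """
    cfg = json.loads(json.dumps(partial))  # deep copy, leave the input intact
    db = cfg["destination"]["database"]
    db["dbName"], db["schema"], db["table"] = db_name, schema, table
    if redshift_keys:
        db["redshift"] = {"keyFields": list(redshift_keys)}
    else:
        db.pop("redshift", None)
    return cfg

# Minimal partial config as returned by /pipeline/generate (trimmed)
partial = {
    "name": "my_pipeline",
    "destination": {"database": {
        "dbName": "DATABASE_NAME",
        "schema": "SCHEMA_NAME",
        "table": "TABLE_NAME",
        "redshift": {"keyFields": ["KEY_FIELD1", "KEY_FIELD2"]},
    }},
}
print(json.dumps(finish_config(partial, "analytics", "public", "sales")))
```

The completed JSON can then be registered with POST /pipeline as described above.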