Upload data files directly to the pipeline through the REST API. The pipeline accepts any file type (CSV, JSON, XML, Excel, PDF, Word, PowerPoint, HTML, email, EPUB, plain text, and archives), handles decompression automatically, and returns a tracking token.

Endpoint

POST /api/v1/pipeline/upload

Request

The request uses multipart/form-data encoding with the following parts:
| Part | Required | Description |
| --- | --- | --- |
| file | Yes | The data file to upload |
| pipeline | Yes | Name of the target pipeline configuration |
| publishertoken | No | Opaque token identifying the data publisher; used for lineage tracking |

Example

curl -X POST "http://localhost:9000/api/v1/pipeline/upload" \
  -F "file=@transactions_2025.csv" \
  -F "pipeline=transactions" \
  -F "publishertoken=finance-team-a"
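The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_multipart` and `upload` are hypothetical, and the base URL is the local one from the curl example above.

```python
import os
import urllib.request
import uuid


def build_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one 'file' part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The file part is named "file", matching the request table above.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def upload(base_url, path, pipeline, publisher_token=None):
    """POST a local file to the upload endpoint and return the raw text body."""
    with open(path, "rb") as fh:
        data = fh.read()
    fields = {"pipeline": pipeline}
    if publisher_token is not None:
        fields["publishertoken"] = publisher_token  # optional part
    body, content_type = build_multipart(fields, os.path.basename(path), data)
    request = urllib.request.Request(
        f"{base_url}/api/v1/pipeline/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()
```

A call equivalent to the curl example would be `upload("http://localhost:9000", "transactions_2025.csv", "transactions", "finance-team-a")`.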

Response

A successful upload returns HTTP 200 with a bare string body — the pipeline token itself (a unique UUID generated for the job), not a JSON object:
d4f8e1a2-7b3c-4e9f-a5d6-1c2b3e4f5a6b
Use this token to track the job’s processing status, view errors, or cancel the job. When a compressed archive contains multiple inner files that are submitted as separate jobs, the response is instead a string of the form "N file(s) submitted", where N is the number of files submitted. For example, with three inner files:
3 file(s) submitted
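Because the body is plain text in one of two shapes, a client has to tell them apart before using the value as a token. The following is a minimal sketch of that check; `UploadResult` and `parse_upload_response` are hypothetical names, and the pattern assumes the multi-file string appears exactly as documented above.

```python
import re
import uuid
from dataclasses import dataclass
from typing import Optional

# Matches the documented multi-file response, e.g. "3 file(s) submitted".
_SUBMITTED = re.compile(r"^(\d+) file\(s\) submitted$")


@dataclass
class UploadResult:
    token: Optional[str]   # pipeline token, or None for the multi-file response
    submitted_count: int   # number of jobs the response reports


def parse_upload_response(body: str) -> UploadResult:
    """Classify a 200 response body as a pipeline token or a submitted count."""
    text = body.strip()
    match = _SUBMITTED.match(text)
    if match:
        return UploadResult(token=None, submitted_count=int(match.group(1)))
    # uuid.UUID raises ValueError if the text is not a well-formed UUID.
    token = str(uuid.UUID(text))
    return UploadResult(token=token, submitted_count=1)
```

Note that the multi-file response carries no token, so a client that needs to track each inner file individually cannot do so from this response alone.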

Compressed Files

When the uploaded file has a compressed extension, the archive is extracted in memory and processed synchronously — nothing is staged to MinIO:
| Extension | Handling |
| --- | --- |
| .zip | Extracted in memory |
| .gz | Decompressed in memory |
| .tar | Extracted in memory |
| .jar | Extracted in memory |
Compressed archives may contain multiple data files. Handling depends on the pipeline type:
  • CSV pipelines: when an archive contains multiple inner files, they are concatenated into a single job. The header is kept from the first file and stripped from the second and subsequent files. The response is the single pipeline token.
  • Non-CSV pipelines, or single-file archives: each inner file is submitted as its own job. With more than one file, the response is the "N file(s) submitted" string.
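The CSV concatenation rule can be illustrated with a short sketch. This is not the pipeline's implementation, only a model of the described behavior: `concat_csv` is a hypothetical name, and it assumes the header occupies exactly the first physical line of each file (no quoted line breaks inside the header).

```python
def concat_csv(files):
    """Join CSV payloads, keeping the header line from the first file only.

    `files` is a list of bytes objects, one per inner file, in archive order.
    """
    out = []
    for index, payload in enumerate(files):
        lines = payload.splitlines(keepends=True)
        if index > 0:
            lines = lines[1:]  # strip the header from later files
        # Make sure the last row ends with a newline so rows do not run together.
        if lines and not lines[-1].endswith((b"\n", b"\r")):
            lines[-1] += b"\n"
        out.extend(lines)
    return b"".join(out)
```

For example, joining `b"id,amount\n1,10\n"` and `b"id,amount\n2,20"` yields `b"id,amount\n1,10\n2,20\n"`. The sketch does not check that the headers of the inner files agree, and the text above does not say whether the pipeline does.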

Uncompressed Files

Files without a recognized compressed extension (such as .csv, .json, .xml, .xls, .pdf, .docx, or .txt) are read into memory and submitted directly. They are not staged to MinIO.

Processing Flow

  1. The client sends the multipart request.
  2. The endpoint validates that the named pipeline is registered.
  3. The file bytes are read into memory and the extension is inspected.
  4. Compressed path: the archive is decompressed in memory. For CSV pipelines with multiple inner files, the files are concatenated (headers stripped after the first) and submitted as one job; otherwise each inner file is submitted individually.
  5. Uncompressed path: the file contents are passed directly to the ingestion pipeline.
  6. The endpoint returns the pipeline token (or the "N file(s) submitted" string).
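The branching in the steps above can be modeled as a small decision function, which is useful when a client wants to predict which response shape to expect. This is a sketch under stated assumptions: `is_compressed` and `plan_upload` are hypothetical names, the extension match is case-insensitive (the text does not say how the server compares extensions), and a single-file archive is assumed to return a token.

```python
# Extensions listed in the Compressed Files table.
COMPRESSED_EXTENSIONS = (".zip", ".gz", ".tar", ".jar")


def is_compressed(filename):
    """Return True if the file name ends in a recognized compressed extension."""
    # Assumption: case-insensitive comparison.
    return filename.lower().endswith(COMPRESSED_EXTENSIONS)


def plan_upload(filename, pipeline_is_csv, inner_file_count=1):
    """Predict (path, job_count, response) for an upload.

    `response` is "token" for a single pipeline token, or the literal
    "N file(s) submitted" string for multiple separate jobs.
    """
    if not is_compressed(filename):
        # Step 5: contents go straight to the ingestion pipeline.
        return ("uncompressed", 1, "token")
    if pipeline_is_csv and inner_file_count > 1:
        # Step 4, CSV case: inner files are concatenated into one job.
        return ("compressed", 1, "token")
    if inner_file_count > 1:
        # Step 4, other case: one job per inner file.
        return (
            "compressed",
            inner_file_count,
            f"{inner_file_count} file(s) submitted",
        )
    # Assumption: a single inner file yields a single token.
    return ("compressed", 1, "token")
```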

Error Responses

| HTTP Status | Cause |
| --- | --- |
| 400 | Missing pipeline request parameter (Spring binding error) |
| 500 | Pipeline not registered, empty file, internal error, or processing failure |
The 500 response covers all errors handled by the controller, including an empty file and an unregistered pipeline. There is no explicit 413 handler; exceeding the configured multipart size limit produces Spring’s default multipart-size error.
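Since a 500 does not distinguish a client mistake from a server fault, it can help to check the obvious cases before sending. The sketch below is illustrative only; `preflight` and `describe_status` are hypothetical names, and the messages simply restate the table above.

```python
def preflight(pipeline, file_bytes):
    """Return a list of problems that can be detected before uploading."""
    problems = []
    if not pipeline:
        problems.append("pipeline name is missing")
    if len(file_bytes) == 0:
        problems.append("file is empty")
    return problems


def describe_status(status):
    """Map a documented HTTP status to its likely cause (None on success)."""
    if status == 200:
        return None
    if status == 400:
        return "missing pipeline request parameter"
    if status == 500:
        return (
            "pipeline not registered, empty file, internal error, "
            "or processing failure"
        )
    # Anything else, such as a multipart size error, is not covered by the table.
    return f"undocumented status {status}"
```

An unregistered pipeline name cannot be detected this way, because only the server knows which pipelines are registered.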

Size Limits

The maximum upload size is controlled by spring.servlet.multipart.max-file-size in application.yaml. The default is 1 GB. Adjust this value if your files exceed the limit:
spring:
  servlet:
    multipart:
      max-file-size: 1GB
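A client can compare a file's size against the configured limit before uploading. The following sketch parses size strings such as `1GB` using binary multiples (1 KB = 1024 bytes), which is how Spring's `DataSize` interprets these suffixes. `parse_size` and `fits_limit` are hypothetical helpers; the parser is deliberately more lenient than Spring (it ignores case and surrounding whitespace), and it treats the limit as inclusive, which is an assumption.

```python
import re

# Binary multiples for each suffix.
_UNITS = {
    "B": 1,
    "KB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
    "TB": 1024 ** 4,
}


def parse_size(text):
    """Convert a size string such as '1GB' or '512' into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?B)?\s*", text.upper())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    # A bare number is interpreted as bytes.
    return int(number) * _UNITS[unit or "B"]


def fits_limit(file_size, limit="1GB"):
    """Return True if file_size (in bytes) does not exceed the limit."""
    return file_size <= parse_size(limit)
```

For a local file, `fits_limit(os.path.getsize(path))` gives a quick check against the 1 GB default before the request is sent.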