The object store destination uses an embedded Apache Spark engine to write columnar files to MinIO-compatible object storage. Output is written in Parquet or ORC format to an S3A path.
Output Path
Files are written to:

```
s3a://{bucket}/{prefixKey}
```
The default bucket is {environment}-raw. You can override this per pipeline with the destinationBucketOverride field.
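For example, in a hypothetical prod environment using the default bucket and a prefixKey of sales/daily, files would land under:

```
s3a://prod-raw/sales/daily/
```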
File Formats

| Format | Description |
|---|---|
| parquet | Apache Parquet columnar format (default) |
| orc | Apache ORC columnar format |
Set the format with the fileFormat field.
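For instance, a minimal destination fragment that switches the output to ORC (the prefixKey value is illustrative) might look like:

```json
{
  "destination": {
    "objectStore": {
      "prefixKey": "sales/daily",
      "fileFormat": "orc"
    }
  }
}
```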
Write Modes
| Mode | Behavior |
|---|---|
| append | Append new files to the output path (default) |
| overwrite | Replace all existing files at the output path |
| ignore | Do nothing if data already exists at the output path |
| errorifexists | Fail the job if data already exists at the output path |
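For example, rerunning a pipeline with overwrite replaces everything under s3a://{bucket}/{prefixKey} with the new output, while the same rerun with errorifexists fails because files are already present at that path.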
Partitioning
Partition output files by one or more columns using the partitionBy array. Spark creates a directory structure based on the distinct values of the specified columns.
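As an illustration, with partitionBy set to ["region", "year"] (as in the configuration example below) and assuming hypothetical column values, the output is laid out as:

```
s3a://{bucket}/{prefixKey}/region=EMEA/year=2024/part-*.parquet
s3a://{bucket}/{prefixKey}/region=EMEA/year=2025/part-*.parquet
s3a://{bucket}/{prefixKey}/region=APAC/year=2025/part-*.parquet
```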
Delete Before Write
Set deleteBeforeWrite to true to remove all existing objects under the output prefix before writing new data. This is useful when you need a clean target path but want finer control than the overwrite write mode provides.
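For example, to clear the target prefix on every run while keeping append semantics for the write itself, a destination fragment (values illustrative) could combine:

```json
{
  "destination": {
    "objectStore": {
      "prefixKey": "sales/daily",
      "writeMode": "append",
      "deleteBeforeWrite": true
    }
  }
}
```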
Type Casting
Column types from the source schema are mapped to Spark types before writing:
| Schema Type | Spark Type |
|---|---|
| tinyint | ByteType |
| smallint | ShortType |
| int | IntegerType |
| bigint | LongType |
| float | FloatType |
| double | DoubleType |
| string | StringType |
| boolean | BooleanType |
| date | DateType |
| timestamp | TimestampType |
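For example, a source column declared as bigint is written with Spark's LongType, which both Parquet and ORC store as a 64-bit integer.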
Configuration Example
```json
{
  "name": "sales_pipeline",
  "source": { "..." : "..." },
  "destination": {
    "objectStore": {
      "fileFormat": "parquet",
      "writeMode": "append",
      "prefixKey": "sales/daily",
      "destinationBucketOverride": "analytics-data",
      "partitionBy": ["region", "year"],
      "deleteBeforeWrite": false
    }
  }
}
```
Field Reference
| Field | Required | Default | Description |
|---|---|---|---|
| prefixKey | yes | | Key prefix under the bucket |
| fileFormat | no | parquet | Output file format: parquet or orc |
| writeMode | no | append | Spark write mode: append, overwrite, ignore, or errorifexists |
| destinationBucketOverride | no | {environment}-raw | Override the default bucket name |
| partitionBy | no | | Array of column names to partition output by |
| deleteBeforeWrite | no | false | Delete existing objects under the prefix before writing |
Completion Notification
A pipeline notification is published to ActiveMQ on completion. See Notifications for details.