The object store destination uses an embedded Apache Spark engine to write columnar files to MinIO-compatible object storage. Output is written in Parquet or ORC format to an S3A path.

Output Path

Files are written to:
s3a://{bucket}/{prefixKey}
The default bucket is {environment}-data. You can override this per pipeline with the destinationBucketOverride field.
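
As an illustration of the path rules above, here is a minimal sketch of how the output path is derived. The helper name `output_path` is hypothetical, not part of the destination's actual code:

```python
def output_path(prefix_key, environment, bucket_override=None):
    """Build the s3a output path; the bucket defaults to {environment}-data
    unless destinationBucketOverride is set on the pipeline."""
    bucket = bucket_override or f"{environment}-data"
    return f"s3a://{bucket}/{prefix_key}"

# Default bucket derived from the environment name:
print(output_path("sales/daily", "prod"))                    # s3a://prod-data/sales/daily
# Bucket overridden per pipeline:
print(output_path("sales/daily", "prod", "analytics-data"))  # s3a://analytics-data/sales/daily
```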

File Formats

| Format  | Description                              |
|---------|------------------------------------------|
| parquet | Apache Parquet columnar format (default) |
| orc     | Apache ORC columnar format               |
Set the format with the fileFormat field.

Write Modes

| Mode          | Behavior                                               |
|---------------|--------------------------------------------------------|
| append        | Append new files to the output path (default)          |
| overwrite     | Replace all existing files at the output path          |
| ignore        | Do nothing if data already exists at the output path   |
| errorifexists | Fail the job if data already exists at the output path |

Partitioning

Partition output files by one or more columns using the partitionBy array. Spark creates a directory structure based on the distinct values of the specified columns.
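
To make the resulting layout concrete, the sketch below enumerates the Hive-style `column=value` directories Spark creates for a `partitionBy` setting. The helper and its inputs are illustrative only:

```python
from itertools import product

def partition_dirs(base, partition_by, values):
    """List the column=value directories Spark would create under the output
    path for each combination of distinct partition-column values."""
    dirs = []
    for combo in product(*(values[col] for col in partition_by)):
        segments = "/".join(f"{col}={val}" for col, val in zip(partition_by, combo))
        dirs.append(f"{base}/{segments}")
    return dirs

for d in partition_dirs("s3a://prod-data/sales/daily",
                        ["region", "year"],
                        {"region": ["emea", "amer"], "year": [2023, 2024]}):
    print(d)
```

Each leaf directory holds the data files for that value combination; the partition columns themselves are encoded in the path rather than repeated in every file.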

Delete Before Write

Set deleteBeforeWrite to true to remove all existing objects under the output prefix before writing new data. This is useful when you need a clean target path but want finer control than the overwrite write mode provides.
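
A minimal sketch of the delete step, assuming a boto3-style S3 client (the function name and the single-page listing are simplifications; a real implementation would paginate):

```python
def delete_prefix(client, bucket, prefix):
    """Delete every object under `prefix` in `bucket` before writing.
    `client` is assumed to expose boto3-style list_objects_v2/delete_objects."""
    # List objects under the prefix (a real implementation would paginate).
    resp = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    keys = [{"Key": obj["Key"]} for obj in resp.get("Contents", [])]
    if keys:
        client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
    return len(keys)
```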

Type Casting

Column types from the source schema are mapped to Spark types before writing:
| Schema Type | Spark Type    |
|-------------|---------------|
| tinyint     | ByteType      |
| smallint    | ShortType     |
| int         | IntegerType   |
| bigint      | LongType      |
| float       | FloatType     |
| double      | DoubleType    |
| string      | StringType    |
| boolean     | BooleanType   |
| date        | DateType      |
| timestamp   | TimestampType |
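
The mapping above can be expressed as a simple lookup table; the Spark type names correspond to the classes in `pyspark.sql.types` (the dict itself is an illustration, not the engine's internal representation):

```python
# Schema-to-Spark type mapping, as listed in the table above.
SPARK_TYPES = {
    "tinyint": "ByteType",
    "smallint": "ShortType",
    "int": "IntegerType",
    "bigint": "LongType",
    "float": "FloatType",
    "double": "DoubleType",
    "string": "StringType",
    "boolean": "BooleanType",
    "date": "DateType",
    "timestamp": "TimestampType",
}

print(SPARK_TYPES["bigint"])  # LongType
```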

Completion Notification

A notification is sent when the write completes, indicating success or failure along with the number of records written and the output path.

Configuration Example

{
  "name": "sales_pipeline",
  "source": { "..." : "..." },
  "destination": {
    "objectStore": {
      "fileFormat": "parquet",
      "writeMode": "append",
      "prefixKey": "sales/daily",
      "destinationBucketOverride": "analytics-data",
      "partitionBy": ["region", "year"],
      "deleteBeforeWrite": false
    }
  }
}
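
The configuration above maps onto Spark's `DataFrameWriter` chain roughly as sketched below. The `write_dataframe` helper is hypothetical; the method names (`format`, `mode`, `partitionBy`, `save`) are the standard pyspark `DataFrameWriter` API:

```python
def write_dataframe(df, cfg, environment):
    """Apply an objectStore destination config to a Spark DataFrame.
    `cfg` uses the field names from the configuration example above."""
    bucket = cfg.get("destinationBucketOverride") or f"{environment}-data"
    path = f"s3a://{bucket}/{cfg['prefixKey']}"
    writer = (df.write
                .format(cfg.get("fileFormat", "parquet"))
                .mode(cfg.get("writeMode", "append")))
    if cfg.get("partitionBy"):
        writer = writer.partitionBy(*cfg["partitionBy"])
    writer.save(path)
    return path
```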

Field Reference

| Field                     | Required | Default            | Description                            |
|---------------------------|----------|--------------------|----------------------------------------|
| prefixKey                 | yes      | —                  | Key prefix under the bucket            |
| fileFormat                | no       | parquet            | Output file format: parquet or orc     |
| writeMode                 | no       | append             | Spark write mode                       |
| destinationBucketOverride | no       | {environment}-data | Override the default bucket name       |
| partitionBy               | no       | —                  | Array of column names to partition by  |
| deleteBeforeWrite         | no       | false              | Delete existing objects before writing |
| writeToTemporaryLocation  | no       | false              | Write to a temp path first, then move  |
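
The `writeToTemporaryLocation` behavior can be sketched as a write-then-move pattern: the full output lands at a temporary prefix first, so readers of the final path never observe a partial write. The helper, the temp-prefix naming, and the `write`/`move` callbacks below are all illustrative assumptions, not the engine's actual implementation:

```python
import uuid

def staged_write(prefix_key, write, move):
    """Write to a unique temporary prefix, then move the completed files to
    the final prefix. `write` and `move` stand in for the Spark write and
    the object-store move step."""
    tmp_prefix = f"{prefix_key}-tmp-{uuid.uuid4().hex}"
    write(tmp_prefix)            # full write happens out of readers' view
    move(tmp_prefix, prefix_key)  # only the move is visible at the target
    return tmp_prefix
```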