Schema validation checks incoming JSON or XML files against a formal schema definition before processing. This catches structural issues, missing required fields, and type mismatches early in the pipeline.

Requirements

  • The source file must be in JSON or XML format.
  • The validation schema file must be stored in MinIO in the {environment}-config bucket under validation-schema/{filename}.

Configuration

Set the validationSchema field in the dataQuality block to the filename of the schema. The pipeline resolves the file from the MinIO config bucket at {environment}-config/validation-schema/{filename}.
{
  "pipelineName": "product_catalog",
  "sourceFileFormat": "JSON",
  "dataQuality": {
    "validationSchema": "product_catalog_schema.json"
  }
}
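
The path resolution described above can be sketched in Python. This is only an illustration of how the bucket and object key follow from the configuration: the function name resolve_schema_location and the environment name "dev" are assumptions made for the example, not part of the pipeline.

```python
def resolve_schema_location(environment: str, pipeline_config: dict) -> tuple[str, str]:
    """Return the (bucket, object key) pair for a pipeline's validation schema."""
    filename = pipeline_config["dataQuality"]["validationSchema"]
    bucket = f"{environment}-config"
    object_key = f"validation-schema/{filename}"
    return bucket, object_key


config = {
    "pipelineName": "product_catalog",
    "sourceFileFormat": "JSON",
    "dataQuality": {"validationSchema": "product_catalog_schema.json"},
}

# "dev" is a hypothetical environment name used only for this example.
bucket, object_key = resolve_schema_location("dev", config)
print(bucket, object_key)
```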

Example Schema

A JSON Schema file stored in MinIO at {environment}-config/validation-schema/product_catalog_schema.json:
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["product_id", "name", "price"],
  "properties": {
    "product_id": {
      "type": "string",
      "pattern": "^PRD-[0-9]{6}$"
    },
    "name": {
      "type": "string",
      "minLength": 1
    },
    "price": {
      "type": "number",
      "minimum": 0
    },
    "tags": {
      "type": "array",
      "items": { "type": "string" }
    }
  }
}
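
To make the effect of this schema concrete, the following Python sketch checks two sample records against it. It is not the pipeline's validator: it is a hand-written checker that understands only the keywords used in the example (type, required, properties, items, pattern, minLength, minimum), and the sample records and the check function are invented for illustration.

```python
import re

SCHEMA = {
    "type": "object",
    "required": ["product_id", "name", "price"],
    "properties": {
        "product_id": {"type": "string", "pattern": "^PRD-[0-9]{6}$"},
        "name": {"type": "string", "minLength": 1},
        "price": {"type": "number", "minimum": 0},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

# Python types for the JSON Schema type names used in the example schema.
TYPES = {"object": dict, "array": list, "string": str, "number": (int, float)}


def check(value, schema, path="$"):
    """Return a list of error messages; an empty list means the value is valid."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, but JSON booleans are not numbers.
        if isinstance(value, bool) or not isinstance(value, TYPES[expected]):
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(value, dict):
        for field in schema.get("required", []):
            if field not in value:
                errors.append(f"{path}: missing required field '{field}'")
        for field, subschema in schema.get("properties", {}).items():
            if field in value:
                errors += check(value[field], subschema, f"{path}.{field}")
    elif isinstance(value, list):
        if "items" in schema:
            for i, item in enumerate(value):
                errors += check(item, schema["items"], f"{path}[{i}]")
    elif isinstance(value, str):
        if len(value) < schema.get("minLength", 0):
            errors.append(f"{path}: shorter than minLength {schema['minLength']}")
        # JSON Schema patterns are unanchored unless they use ^ and $.
        if "pattern" in schema and not re.search(schema["pattern"], value):
            errors.append(f"{path}: does not match pattern {schema['pattern']}")
    elif isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: below minimum {schema['minimum']}")
    return errors


good = {"product_id": "PRD-000123", "name": "Desk lamp", "price": 19.99, "tags": ["lighting"]}
bad = {"product_id": "123", "price": -5, "tags": ["ok", 7]}

print(check(good, SCHEMA))  # no errors
for message in check(bad, SCHEMA):
    print(message)
```

The second record fails on four counts: name is missing, product_id does not match the pattern, price is negative, and one entry in tags is not a string.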

Behavior

  1. The pipeline loads the schema file from the configured path in the environment’s config bucket.
  2. Each record in the source file is validated against the schema.
  3. If any record fails validation, the file is rejected and processing stops.
  4. Validation errors are logged with details about which fields failed and why.
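
The steps above can be sketched as a small Python loop. This is a minimal illustration under stated assumptions, not the pipeline's code: validate_file, require_fields, and the logger name are hypothetical, and the sketch checks every record before rejecting the file so that all errors reach the log (whether the pipeline stops at the first failing record is not specified here).

```python
import logging

logger = logging.getLogger("schema_validation")  # hypothetical logger name


def validate_file(records, check_record):
    """Return True if every record passes; log each failure and return False otherwise."""
    failed = 0
    for index, record in enumerate(records):
        errors = check_record(record)
        for message in errors:
            logger.error("record %d: %s", index, message)
        if errors:
            failed += 1
    if failed:
        logger.error("file rejected: %d of %d records failed validation",
                     failed, len(records))
        return False
    return True


def require_fields(record):
    """Stand-in check: report required fields that are absent from the record."""
    return [f"missing required field '{name}'"
            for name in ("product_id", "name", "price") if name not in record]


records = [
    {"product_id": "PRD-000123", "name": "Desk lamp", "price": 19.99},
    {"product_id": "PRD-000124", "price": 4.5},  # "name" is missing
]

if not validate_file(records, require_fields):
    print("processing stopped")
```

Passing the record check in as a function keeps the reject-and-log logic separate from the schema rules themselves, so a full JSON Schema or XML validator could be substituted for the stand-in.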

When to Use

Use schema validation when you need to enforce a strict contract on the structure of incoming JSON or XML data. This is particularly useful when data is provided by external systems where you do not control the format directly.