CSV Header Validation

Header validation uses an AI model to check that the header row of a CSV file matches the field names defined in the pipeline schema. Matching is fuzzy: it tolerates differences in case, underscores versus spaces, abbreviations, and minor naming variations. Column order does not matter. Every schema field must be present in the header; extra columns in the file are accepted.

Requirements

The source file must be CSV format.
The pipeline must be configured with header: true (i.e., the first row contains column names).
ai.enabled: true must be set in application.yaml.
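These three requirements can be checked before a run. The sketch below is a hypothetical pre-flight helper, not part of the pipeline. It makes three assumptions. First, the pipeline configuration has been loaded into a dict with header as a top-level key, which is an assumption about where that setting lives. Second, application.yaml has been parsed into nested dicts, so ai.enabled becomes {"ai": {"enabled": ...}}. Third, "CSV format" is judged by file extension alone.

```python
from pathlib import Path
from typing import Any


def header_validation_problems(
    source_path: str,
    pipeline_config: dict[str, Any],
    app_config: dict[str, Any],
) -> list[str]:
    """Return reasons header validation cannot run (an empty list means OK).

    Hypothetical helper: assumes `header` is a top-level key of the pipeline
    config and that application.yaml was parsed into nested dicts.
    """
    problems = []
    # Extension check only; the file contents are not inspected.
    if Path(source_path).suffix.lower() != ".csv":
        problems.append("source file must be CSV")
    # The first row must hold column names.
    if pipeline_config.get("header") is not True:
        problems.append("pipeline must set header: true")
    # ai.enabled in application.yaml maps to app_config["ai"]["enabled"];
    # `or {}` also covers an empty `ai:` key that YAML parses as None.
    if (app_config.get("ai") or {}).get("enabled") is not True:
        problems.append("application.yaml must set ai.enabled: true")
    return problems
```

For example, header_validation_problems("orders.csv", {"header": True}, {"ai": {"enabled": True}}) returns an empty list. The file name orders.csv is illustrative.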

Configuration

Enable header validation by setting validateFileHeader to true in the dataQuality block:

{
  "dataQuality": {
    "validateFileHeader": true
  }
}
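If pipeline definitions are generated or edited by a script, you can set the flag without disturbing other dataQuality settings. The following is a minimal sketch. The function name and the someOtherCheck key are hypothetical; only dataQuality and validateFileHeader come from the configuration above.

```python
import json
from typing import Any


def enable_header_validation(pipeline: dict[str, Any]) -> dict[str, Any]:
    """Set dataQuality.validateFileHeader to true, keeping other keys.

    Hypothetical helper; mutates `pipeline` in place and returns it.
    """
    # Create the dataQuality block if absent, otherwise reuse the existing one.
    data_quality = pipeline.setdefault("dataQuality", {})
    data_quality["validateFileHeader"] = True
    return pipeline


# "someOtherCheck" stands in for any existing dataQuality setting.
config = enable_header_validation({"dataQuality": {"someOtherCheck": True}})
print(json.dumps(config, indent=2))
```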

Behavior

The pipeline reads the first row of the CSV file (the header) and the schema field names.
Both are sent to the AI model, which evaluates whether they match.
The AI allows fuzzy matching: "First Name" matches "first_name", "qty" matches "quantity", etc.
Column order does not matter: the pipeline uses the header to map columns to schema fields by name.
All schema columns must be present in the header. Missing columns fail validation.
Extra columns in the header beyond the schema are accepted.
If validation fails, the error message lists the schema columns that are missing or could not be matched.
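The pipeline delegates matching to an AI model, so there is no fixed rule set to reproduce. For local testing, or to reason about which headers are likely to pass, you can approximate the behavior above deterministically. The sketch below is such a stand-in, not the pipeline's implementation. It normalizes case, spaces, underscores and hyphens. It resolves abbreviations through a hand-written alias table (the ALIASES entries are assumptions). It maps header columns to schema fields by name, so order is ignored, accepts extra columns, and raises an error naming any missing fields. Unlike the model, it will not recognize abbreviations that are absent from the table.

```python
import csv
import io
import re

# Hypothetical abbreviation table; the AI model needs no such list.
ALIASES = {"qty": "quantity", "amt": "amount", "desc": "description"}


def normalize(name: str) -> str:
    """Lowercase, collapse runs of spaces/underscores/hyphens, then apply ALIASES."""
    key = re.sub(r"[\s_\-]+", "_", name.strip().lower()).strip("_")
    return ALIASES.get(key, key)


def match_header(
    header: list[str], schema_fields: list[str]
) -> tuple[dict[str, int], list[str]]:
    """Map each schema field to a header column index; list fields with no match.

    Extra header columns are ignored and column order does not matter.
    """
    positions = {}
    for index, column in enumerate(header):
        # Keep the first occurrence if two columns normalize to the same key.
        positions.setdefault(normalize(column), index)
    mapping, missing = {}, []
    for field in schema_fields:
        index = positions.get(normalize(field))
        if index is None:
            missing.append(field)
        else:
            mapping[field] = index
    return mapping, missing


def validate_csv_header(csv_text: str, schema_fields: list[str]) -> dict[str, int]:
    """Read the first CSV row; raise ValueError if any schema field is unmatched."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    mapping, missing = match_header(header, schema_fields)
    if missing:
        raise ValueError("Missing or unmatched schema columns: " + ", ".join(missing))
    return mapping
```

With schema fields first_name, last_name and quantity, the header Qty,Last Name,First Name,Notes yields {'first_name': 2, 'last_name': 1, 'quantity': 0}. The extra Notes column is ignored, and the mapping records where each schema field sits in the file. A header with no last-name column raises ValueError naming last_name.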

When to Use

Use header validation to reject files that are missing schema columns or whose column names cannot be matched to the schema. Because matching is fuzzy and order-independent, minor formatting differences such as case or underscores versus spaces won't cause false rejections.
