The aiRule validates data using a plain-English instruction. Datris sends your instruction to an LLM in a single API call to generate a Python validation script, then runs that script in the container against the full data set.

Configuration

Define the AI rule in the aiRule field within the dataQuality block of the pipeline configuration.

| Field | Type | Description |
|---|---|---|
| instruction | string | A natural language description of all validation criteria. Multiple checks can be combined in one instruction. |
| onFailureIsError | boolean | If true, failures abort processing. If false, failures are logged as warnings. |

How it works

  1. Prompt generation — Datris extracts column names and a few sample rows from the data, combines them with your instruction, and sends a single prompt to the configured AI model.
  2. Script generation — The AI model generates a self-contained Python 3 validation script (stdlib only, no pip packages).
  3. Local execution — The generated script is written to a temp file and executed via python3 against the full data file. Processing cost is zero after the initial API call.
  4. Result parsing — The script outputs a JSON array of failures: [{"index": <row_number>, "reason": "..."}]. Datris parses this and maps failures back to the original data.
  5. Cleanup — Temp files are deleted after execution.
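
Steps 3 to 5 can be sketched roughly as follows. This is an illustration only, not Datris's actual implementation: the function name `run_generated_script`, passing the data file path as the first command-line argument, and reading the JSON array from standard output are all assumptions. The 300-second default mirrors the 5-minute timeout listed under Behavior, and `sys.executable` stands in for `python3` so the sketch runs on any platform.

```python
import json
import os
import subprocess
import sys
import tempfile


def run_generated_script(script_source: str, data_path: str,
                         timeout_seconds: int = 300) -> list:
    """Hypothetical helper: run a generated script and return its failures."""
    # Step 3: write the generated script to a temp file.
    fd, script_path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(script_source)
        # Execute against the full data file (path passed as argv[1] by assumption).
        completed = subprocess.run(
            [sys.executable, script_path, data_path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # raises subprocess.TimeoutExpired on overrun
            check=True,               # raises CalledProcessError on non-zero exit
        )
        # Step 4: parse the JSON array of {"index": ..., "reason": ...} objects.
        return json.loads(completed.stdout)
    finally:
        # Step 5: delete the temp file whether or not the run succeeded.
        os.remove(script_path)
```

The `try`/`finally` keeps cleanup unconditional, so a timeout or a malformed script does not leave temp files behind.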

Example

{
  "dataQuality": {
    "aiRule": {
      "instruction": "Validate that all email addresses are properly formatted, all phone numbers contain 7-15 digits, all dates are in YYYY-MM-DD format, and prices are positive and not exceeding $1,000,000",
      "onFailureIsError": true
    }
  }
}
The generated Python script handles all checks in one pass.
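
For a sense of what such a script could look like, here is a hand-written sketch for the instruction above. The real script is produced by the LLM and will differ. The column names (`email`, `phone`, `date`, `price`), the regular expressions, the reason strings, and the zero-based row index are all assumptions made for this illustration.

```python
import csv
import io
import json
import re
from datetime import datetime

# Deliberately loose patterns; a generated script may be stricter or looser.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def check_row(row: dict) -> list:
    """Return the list of failure reasons for one record (empty if valid)."""
    reasons = []
    if not EMAIL_RE.match(row.get("email") or ""):
        reasons.append("invalid email")
    digits = sum(ch.isdigit() for ch in (row.get("phone") or ""))
    if not 7 <= digits <= 15:
        reasons.append("phone must contain 7-15 digits")
    date_text = row.get("date") or ""
    try:
        if not DATE_RE.match(date_text):
            raise ValueError("wrong shape")
        datetime.strptime(date_text, "%Y-%m-%d")  # rejects e.g. month 13
    except ValueError:
        reasons.append("date is not YYYY-MM-DD")
    try:
        price = float(row.get("price") or "")
        if not 0 < price <= 1_000_000:
            reasons.append("price out of range")
    except ValueError:
        reasons.append("price is not a number")
    return reasons


def validate(csv_text: str) -> list:
    """Run every check on every row in one pass and collect failures."""
    failures = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        for reason in check_row(row):
            failures.append({"index": index, "reason": reason})
    return failures


if __name__ == "__main__":
    sample = (
        "email,phone,date,price\n"
        "ann@example.com,555-123-4567,2024-03-01,19.99\n"
        "bad-email,12345,2024-13-01,-5\n"
    )
    print(json.dumps(validate(sample), indent=2))
```

In the sample, the first row passes every check and the second row fails all four, so each failure in the output carries index 1.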

Works with all file types

This approach works for:
  • CSV/delimited files — The script reads the CSV with the appropriate delimiter and validates each row.
  • JSON files — The script parses the JSON array and validates each record.
  • XML files — The script uses xml.etree.ElementTree to parse and validate each element.
For all formats, the instruction is the same plain-English description. The LLM generates the appropriate parser for the file type.
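
The per-format parsing might look something like the sketch below, which normalizes each file type into a list of records so the same checks can run on all of them. The function name `load_records`, the `file_type` labels, and the assumed XML layout (a root element whose children are records, each with one child element per field) are illustrative assumptions; a generated script would adapt to the actual data.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def load_records(text: str, file_type: str, delimiter: str = ",") -> list:
    """Hypothetical helper: parse CSV, JSON, or XML text into a list of dicts."""
    if file_type == "csv":
        # The delimiter is configurable to cover other delimited formats.
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if file_type == "json":
        records = json.loads(text)
        if not isinstance(records, list):
            raise ValueError("expected a top-level JSON array")
        return records
    if file_type == "xml":
        # Assumed layout: <rows><row><field>value</field>...</row>...</rows>
        root = ET.fromstring(text)
        return [{child.tag: child.text for child in element} for element in root]
    raise ValueError(f"unsupported file type: {file_type}")
```

Note that CSV and XML values arrive as strings while JSON values keep their types, so type-sensitive checks need to account for both.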

Behavior

  • The AI rule generates the Python script once per rule evaluation
  • The generated script runs in the container via python3 with a 5-minute timeout
  • Scripts are constrained to Python standard library only (no external packages)
  • Failures from rules with onFailureIsError: true are counted as errors; failures from rules with onFailureIsError: false are counted as warnings
  • Processing aborts immediately if the error count exceeds 100
  • After all rows are evaluated, if any errors exist, processing is aborted
  • Warnings are summarized at the end of validation
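
The error and warning accounting described in this list could be modeled along these lines. This is a sketch of the stated rules only, not Datris's code: the names `ValidationTally`, `ValidationAborted`, and `MAX_ERRORS` are hypothetical, and the summary format is invented for illustration.

```python
from dataclasses import dataclass, field

MAX_ERRORS = 100  # threshold taken from the list above


class ValidationAborted(Exception):
    """Raised when processing must stop because of errors."""


@dataclass
class ValidationTally:
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def record(self, failure: dict, on_failure_is_error: bool) -> None:
        """Count one failure as an error or a warning, per the rule's setting."""
        if on_failure_is_error:
            self.errors.append(failure)
            # Abort immediately once the error count exceeds the threshold.
            if len(self.errors) > MAX_ERRORS:
                raise ValidationAborted(f"more than {MAX_ERRORS} errors")
        else:
            self.warnings.append(failure)

    def finish(self) -> str:
        """Call after all rows are evaluated: abort on any error, else summarize."""
        if self.errors:
            raise ValidationAborted(f"{len(self.errors)} error(s) found")
        return f"{len(self.warnings)} warning(s)"
```

With this model, a run with only warnings completes and returns a summary, while a run with even one error is aborted at the end, and a run that passes 100 errors is aborted as soon as the 101st is recorded.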

Requirements

  • ai.enabled: true must be set in application.yaml
  • The Vault secret for the AI provider must be configured (see AI Configuration)