## Endpoint

`POST /api/v1/pipeline/profile`
## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `file` | multipart file | (required) | The data file to profile (CSV, JSON, or XML) |
| `delimiter` | string | `,` | CSV delimiter character |
| `header` | boolean | `true` | Whether the CSV file has a header row |
| `sampleSize` | int | `200` | Number of rows to sample for large files |
## Example
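An illustrative request, assuming the optional parameters are sent as multipart form fields alongside the file (host, port, and file name are placeholders):

```shell
curl -X POST http://localhost:8080/api/v1/pipeline/profile \
  -F "file=@orders.csv" \
  -F "delimiter=," \
  -F "header=true" \
  -F "sampleSize=500"
```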
## Response

The endpoint returns a JSON object with the following sections:

| Section | Description |
|---|---|
| `summary` | Row count, column count, and per-column statistics (inferred type, null count, unique count, sample values) |
| `qualityIssues` | Data quality problems detected in the sample: missing values, outliers, inconsistent formats, suspicious patterns |
| `recommendations` | Human-readable suggestions for validation rules and transformations |
| `suggestedDataQuality` | A ready-to-use `dataQuality` JSON block that can be copied directly into a pipeline configuration. Contains an `aiRule` with a comprehensive plain-English instruction covering all validation checks |
## Suggested data quality rules

The `suggestedDataQuality` section provides a complete, copy-paste-ready `dataQuality` configuration based on what the AI observed in the data:

- `aiRule`: a single comprehensive plain-English instruction covering all validation checks: format patterns (emails, phone numbers, dates), value ranges, cross-column relationships (e.g., `high >= low`), and business logic. Datris generates a Python validation script from this instruction and runs it locally. If no validation rule is appropriate, this field is omitted.
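For illustration, a returned block might look like this (the wrapping key and rule text are invented; the real content depends on the profiled data):

```json
{
  "dataQuality": {
    "aiRule": "Reject rows where email is not a valid email address, price is negative or non-numeric, high is less than low, or created_at is not an ISO-8601 date."
  }
}
```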
## How it works

- You upload a file; no pipeline registration is needed
- For large CSV files, the pipeline randomly samples `sampleSize` rows (keeping the header)
- The sampled content is sent to the AI model with a profiling prompt
- The AI analyzes the data and returns a structured JSON profile
## Use cases
- Explore new data — understand the structure, types, and quality of an unfamiliar file before writing a pipeline configuration
- Discover quality issues — find missing values, outliers, format inconsistencies, and suspicious patterns
- Generate rule ideas — the AI suggests aiRule instructions and transformations based on what it observes
- Validate assumptions — confirm that a file matches expected schema and data quality before loading
## Sampling

For files larger than `sampleSize` rows, the profiling endpoint automatically samples a random subset. The header row is always included. This keeps profiling fast and within AI context-window limits regardless of file size.

The default sample of 200 rows is typically sufficient to detect patterns, types, and quality issues. Increase `sampleSize` for more thorough profiling at the cost of slower response times.
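The sampling behaviour described above can be sketched as follows (a minimal illustration of the idea, not Datris's actual implementation):

```python
import random

def sample_csv_lines(lines, sample_size=200, header=True, seed=None):
    """Randomly sample up to sample_size data rows, always keeping the header."""
    rng = random.Random(seed)
    head = lines[:1] if header else []
    body = lines[1:] if header else lines
    if len(body) <= sample_size:
        return head + body  # small file: no sampling needed
    # Sample row indices without replacement, preserving original row order.
    picked = sorted(rng.sample(range(len(body)), sample_size))
    return head + [body[i] for i in picked]
```

Sampling by index and sorting keeps the sampled rows in their original order, which preserves any ordering patterns the AI might otherwise misread.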
## Requirements

- `ai.enabled: true` must be set in `application.yaml`
- A configured AI provider (see AI Configuration)
- Cloud providers (Anthropic, OpenAI) recommended for best results
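A minimal `application.yaml` fragment might look like this (the `provider` key is illustrative; see AI Configuration for the actual options):

```yaml
ai:
  enabled: true
  provider: anthropic  # illustrative; any configured provider works
```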