Documentation Index

Fetch the complete documentation index at: https://docs.datris.ai/llms.txt

Use this file to discover all available pages before exploring further.
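
To pull the index from a terminal:

curl -s https://docs.datris.ai/llms.txt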

Pipeline Tokens

Every pipeline processing job is assigned a unique pipeline token — a UUID that identifies the job throughout its lifecycle. You can monitor jobs via the REST API, the MCP server, the CLI, or the Datris UI. Pipeline tokens are returned:
  • In the response body of POST /api/v1/pipeline/upload (for uncompressed files; see the sketch after this list)
  • In the job status when querying by pipeline name
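
A minimal sketch of capturing the token at upload time; the multipart field name and the use of jq here are assumptions, not confirmed by this page:

# Hypothetical upload call: the "file" form field is an assumption to verify
# against the upload API reference; jq extracts the token from the JSON response
PIPELINE_TOKEN=$(curl -s -X POST "http://localhost:8080/api/v1/pipeline/upload" \
  -F "file=@sales_data.csv" \
  | jq -r '.pipelineToken')
echo "$PIPELINE_TOKEN"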

Job Status

Query by Pipeline Token

curl "http://localhost:8080/api/v1/pipeline/status?pipelinetoken=pt-abc12345-..."
Returns an array of status entries for the job; each stage logs begin, processing, and end messages (abridged below to one entry per stage):
[
  {
    "id": 1,
    "dateTime": "2026-03-15T10:00:00Z",
    "pipeline": "sales_data",
    "processName": "StreamNotifier",
    "publisherToken": null,
    "pipelineToken": "pt-abc12345-...",
    "filename": "sales_data",
    "state": "begin",
    "code": "begin",
    "description": "Process started",
    "epoch": 1710500400000
  },
  {
    "id": 2,
    "dateTime": "2026-03-15T10:00:01Z",
    "pipeline": "sales_data",
    "processName": "DataQuality",
    "publisherToken": null,
    "pipelineToken": "pt-abc12345-...",
    "filename": "sales_data",
    "state": "processing",
    "code": "processing",
    "description": "Running CodeGen data quality rule",
    "epoch": 1710500401000
  },
  {
    "id": 3,
    "dateTime": "2026-03-15T10:00:03Z",
    "pipeline": "sales_data",
    "processName": "PostgresLoader",
    "publisherToken": null,
    "pipelineToken": "pt-abc12345-...",
    "filename": "sales_data",
    "state": "end",
    "code": "end",
    "description": "Process completed",
    "epoch": 1710500403000
  }
]
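
A minimal polling sketch against this endpoint, assuming jq is installed; it treats the job as done once the newest status entry reports state "end", which holds when the destination loader is the last stage to log:

TOKEN="pt-abc12345-..."
while true; do
  # Inspect the state of the most recent status entry
  STATE=$(curl -s "http://localhost:8080/api/v1/pipeline/status?pipelinetoken=$TOKEN" \
    | jq -r '.[-1].state')
  [ "$STATE" = "end" ] && break
  sleep 2
done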

Query by Publisher Token

curl "http://localhost:8080/api/v1/pipeline/status?publishertoken=pub-abc12345-..."
Returns every status row whose publisherToken matches, covering all ingestion jobs a single caller submitted. Tap runs set a publisherToken on every job they spawn, so one query covers a structured tap (1 job) or a document tap (N jobs, one per file) in a single call. Use publisherToken when you need to watch "this entire run", and pipelineToken when you need the detail of one specific job.

Add &withrollup=true to wrap the response in a {rollup, events} object that classifies each job (success, warning, error, processing, timed_out) and exposes rollup.allDone as a single boolean to poll on. See the status API reference for the full shape.

Agents call the same query via the MCP get_pipeline_status tool: pass publisher_token from a run_tap response and poll until rollup.allDone is true. The MCP tool sets withrollup=true automatically. For an upload_data flow, get_job_status does the same with the pipelineToken returned from the upload.
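
The same poll from the shell, assuming jq is installed:

PUB="pub-abc12345-..."
while true; do
  # rollup.allDone flips to true once every job in the run has finished
  DONE=$(curl -s "http://localhost:8080/api/v1/pipeline/status?publishertoken=$PUB&withrollup=true" \
    | jq -r '.rollup.allDone')
  [ "$DONE" = "true" ] && break
  sleep 2
done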

Query by Pipeline Name

curl "http://localhost:8080/api/v1/pipeline/status?pipelinename=sales_data&page=1"
Returns an array of job summaries for the pipeline (20 per page):
[
  {
    "createdAtTimestamp": "2026-03-15T10:00:00Z",
    "createdAt": 1710500400000,
    "updatedAt": 1710500403000,
    "pipeline": "sales_data",
    "pipelineToken": "pt-abc12345-...",
    "process": "PostgresLoader",
    "startTime": "2026-03-15T10:00:00Z",
    "endTime": "2026-03-15T10:00:03Z",
    "totalTime": "3s",
    "status": "end"
  }
]
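
To walk older jobs, increment the page parameter until a page comes back empty; a sketch assuming jq is installed:

PAGE=1
while true; do
  BODY=$(curl -s "http://localhost:8080/api/v1/pipeline/status?pipelinename=sales_data&page=$PAGE")
  # An empty array means there are no more pages
  [ "$(echo "$BODY" | jq 'length')" -eq 0 ] && break
  echo "$BODY" | jq -r '.[] | "\(.pipelineToken)\t\(.status)\t\(.totalTime)"'
  PAGE=$((PAGE + 1))
done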

Job Lifecycle

Jobs progress through these states:
State         Description
INITIALIZED   Job created, queued for processing
PROCESSING    Running in a dedicated thread
COMPLETED     Finished (check status messages for success or error)
CANCELLED     Job was killed via the kill_job API or MCP tool
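
This page does not document the kill_job route itself; purely as an illustration, a call might look like the following, with the path and parameter name as assumptions to verify against the API reference:

# Hypothetical endpoint and parameter name; confirm against the kill_job docs
curl -X POST "http://localhost:8080/api/v1/pipeline/kill?pipelinetoken=pt-abc12345-..."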

Processing Stages

Each job logs status messages as it progresses through stages:
  1. FileNotifier / StreamNotifier - Initial file or stream intake
  2. DataQuality - Validation (if configured)
  3. Transformation - Data transformation (if configured)
  4. JobRunner - Orchestration of destination loaders
  5. [LoaderName] - Each destination loader (e.g., PostgresLoader, SparkObjectStoreLoader)
Each stage logs begin, processing (with details), and end messages.
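
To see that progression at a glance, the status array can be flattened into one line per message, assuming jq is installed:

curl -s "http://localhost:8080/api/v1/pipeline/status?pipelinetoken=pt-abc12345-..." \
  | jq -r '.[] | "\(.dateTime)\t\(.processName)\t\(.state)\t\(.description)"'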

Status Storage

Job statuses are stored in MongoDB in the {environment}-pipeline-status collection. Each entry contains the pipeline token, process name, status, message, and timestamp.
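
With direct database access, the same rows can be inspected in mongosh; the database name and the environment prefix below are placeholders for your deployment:

# "datris" and the "prod" prefix are assumptions; substitute your own values
mongosh --eval '
  db.getSiblingDB("datris")
    .getCollection("prod-pipeline-status")
    .find({ pipelineToken: "pt-abc12345-..." })
'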

Concurrent Job Handling

  • All destination loaders for a single job execute in parallel on a 20-thread pool
  • Jobs targeting the same database table are serialized (only one runs at a time)
  • Multiple jobs for different pipelines run concurrently

Datris UI

The Datris UI provides a visual interface for managing your entire Datris platform without needing to use the API directly. It includes tabs for:
  • MCP server status and tools
  • Live AI agent activity (the Agents tab)
  • Pipeline management
  • Ingestion monitoring with job history and error details
  • Semantic search across vector databases
  • Secrets management

Agent Monitor

The Agents tab shows a live view of every AI agent currently connected to the platform’s MCP server, along with a streaming log of the tool calls each agent is making. The visualization pane draws one icon per active MCP session on the right of the MCP server icon, connected by a line that pulses whenever a call is in flight. Idle sessions fade out automatically once they disconnect. Each agent label uses the most descriptive identifier available, in this order:
  1. The MCP clientInfo.name supplied during the client’s handshake (e.g. claude-ai, claude-code, cursor)
  2. The tenant name (multi-tenant deployments)
  3. The API-key name from the api-keys secret (single-tenant deployments with named keys)
  4. The API-key prefix
  5. The session short-id (last-resort fallback)
The activity log below the visualization lists every tool call as it happens, timestamped per the platform's configured date format and timezone, with the calling agent, tool name, argument preview, record count, response size, status, and latency.

Clicking a row expands it to reveal the full arguments and response bodies as pretty-printed JSON (capped at 2 KB per blob). A header toolbar lets you copy the full log (with per-row detail) to the clipboard or clear the on-screen history.

Activity is held in an in-memory ring buffer on the MCP server (the most recent 200 calls) and served via the UI's internal /api/v1/mcp/activity proxy. It is not persisted; restarts clear the history.
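
The proxy can also be hit directly for scripting; the UI host and port below are assumptions for your deployment:

# Host and port are placeholders for wherever the Datris UI is served
curl -s "http://localhost:3000/api/v1/mcp/activity"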

CLI

Check job status from the terminal:
datris status my_pipeline
datris health