The pipeline includes a built-in MCP (Model Context Protocol) server that lets AI agents interact with the platform natively. Any MCP-compatible agent — Claude Desktop, Claude Code, Cursor, or custom agentic frameworks — can discover database metadata, create and manage pipelines, upload files, monitor jobs, profile data, search vector databases, query structured data, answer questions with AI-powered RAG, and upload configuration files, all without custom integration code.

The MCP server is a lightweight Python service that routes all operations through the pipeline's REST API. It runs alongside the pipeline in Docker, or locally for development.

Resources

The MCP server exposes resources that agents can read on demand for detailed documentation.
| Resource URI | Description |
| --- | --- |
| `datris://pipeline-config-reference` | Complete reference for building pipeline configurations: all source types, data quality rules, transformations, and destination types, with JSON examples |

Available Tools

Pipeline Management

| Tool | Description |
| --- | --- |
| `list_pipelines` | List all registered pipeline configurations |
| `get_pipeline` | Get a specific pipeline configuration by name |
| `create_pipeline` | Create a pipeline from sample data (base64-encoded). Schema is auto-detected. Specify destination type. |
| `delete_pipeline` | Delete a pipeline and its destination data |
| `upload_data` | Upload data (base64-encoded) to a pipeline for processing (returns a pipeline token) |
| `get_job_status` | Get job status by pipeline token or pipeline name |
| `kill_job` | Kill a running job by pipeline token |
| `profile_data` | AI-profile data (base64-encoded) with summary stats and suggested DQ rules |
| `get_version` | Get pipeline server version |
| `check_service_health` | Check which backend services are up, down, or not configured (slow; use for diagnostics only) |
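Several of these tools (`create_pipeline`, `upload_data`, `profile_data`) take file contents as base64. A minimal sketch of preparing such a payload in Python; the argument names below are illustrative, not the tools' published schema:

```python
import base64
import json

# Read the raw CSV bytes and base64-encode them, as the
# create_pipeline / upload_data / profile_data tools expect.
csv_bytes = b"ticker,price\nAAPL,189.5\nMSFT,415.2\n"
encoded = base64.b64encode(csv_bytes).decode("ascii")

# Illustrative tool arguments; the real parameter names may differ.
arguments = {
    "name": "sales_pipeline",
    "sample_data": encoded,
    "destination": "postgres",
}
print(json.dumps(arguments, indent=2))
```

Decoding `arguments["sample_data"]` on the server side recovers the original bytes exactly, which is why base64 is used to carry binary file content through JSON tool calls.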

Vector Search

Semantic search across any of the pipeline's supported vector databases. Each tool takes a natural-language query and returns the most similar document chunks with scores and metadata.
| Tool | Description |
| --- | --- |
| `search_qdrant` | Search a Qdrant collection |
| `search_weaviate` | Search a Weaviate class |
| `search_milvus` | Search a Milvus collection |
| `search_chroma` | Search a Chroma collection |
| `search_pgvector` | Search a pgvector PostgreSQL table |

Database Queries

Read-only queries against the pipeline’s backend databases.
| Tool | Description |
| --- | --- |
| `query_postgres` | Execute a read-only SQL SELECT query against PostgreSQL |
| `query_mongodb` | Query a MongoDB collection with a filter and projection |
| `query_natural` | Ask a question in natural language; AI generates and executes the SQL |
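As a sketch of what an agent would pass to these tools, here are illustrative argument payloads for `query_postgres` and `query_mongodb`; the exact parameter names are assumptions, not the tools' published schema:

```python
import json

# Illustrative arguments for the read-only query tools.
postgres_args = {
    "query": "SELECT ticker, AVG(price) AS avg_price FROM public.trades GROUP BY ticker",
}
mongodb_args = {
    "collection": "trades",
    "filter": {"ticker": "AAPL"},          # match documents by field
    "projection": {"_id": 0, "price": 1},  # return only the price field
}
print(json.dumps(postgres_args))
print(json.dumps(mongodb_args))
```

Since `query_postgres` is read-only, only SELECT statements succeed; writes are rejected by the server.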

Metadata Discovery

Explore the structure of PostgreSQL, MongoDB, and vector databases managed by the platform. Use these tools to understand what data is available before writing queries or running searches.
| Tool | Description |
| --- | --- |
| `list_postgres_databases` | List all PostgreSQL databases |
| `list_postgres_schemas` | List schemas in a PostgreSQL database |
| `list_postgres_tables` | List tables in a schema (supports a vector-only filter) |
| `list_postgres_columns` | List columns and types for a specific table |
| `list_mongodb_databases` | List all MongoDB databases |
| `list_mongodb_collections` | List collections (optionally filtered by database) |
| `list_qdrant_collections` | List all collections in Qdrant |
| `list_weaviate_classes` | List all classes in Weaviate |
| `list_milvus_collections` | List all collections in Milvus |
| `list_chroma_collections` | List all collections in Chroma |
| `list_pgvector_collections` | List all pgvector tables in PostgreSQL |

AI

| Tool | Description |
| --- | --- |
| `ai_answer` | Answer a question using AI based on provided context (RAG) |

Configuration

| Tool | Description |
| --- | --- |
| `upload_config` | Upload a JSON Schema config file (base64-encoded content) |
| `update_secret` | Update an AI provider secret (`anthropic`, `openai`, `ollama`, `embedding`) to configure API keys |
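Because `upload_config` takes base64-encoded content, a config file must be encoded before upload. A minimal sketch; the argument names and the schema contents here are illustrative (the full config format is documented at `datris://pipeline-config-reference`):

```python
import base64
import json

# An illustrative JSON Schema config file.
config = {
    "type": "object",
    "properties": {"price": {"type": "number"}},
    "required": ["price"],
}

# Serialize, then base64-encode for the upload_config tool.
raw = json.dumps(config).encode("utf-8")
arguments = {
    "filename": "trades_schema.json",   # illustrative parameter name
    "content": base64.b64encode(raw).decode("ascii"),
}

# Round-trip check: decoding the payload yields the original config.
assert json.loads(base64.b64decode(arguments["content"])) == config
```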

Setup

Docker (automatic)

The MCP server starts automatically with `docker-compose up`, running in SSE mode on port 3000. No additional setup is required.

Local (for Claude Desktop / Claude Code)

The MCP server is published on PyPI. Use `uvx` to run it directly:

```shell
uvx datris-mcp-server
```

Transport Modes

| Mode | Use Case | Command |
| --- | --- | --- |
| stdio | Claude Desktop, Claude Code, local agents | `python server.py` |
| SSE | Docker, remote agents, web clients | `python server.py --sse --port 3000` |

Configuring Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:

```json
{
    "mcpServers": {
        "datris": {
            "command": "uvx",
            "args": ["datris-mcp-server"],
            "env": {
                "PIPELINE_URL": "http://localhost:8080"
            }
        }
    }
}
```

Configuring Claude Code

Add to `.mcp.json` in your project root:

```json
{
    "mcpServers": {
        "datris": {
            "command": "uvx",
            "args": ["datris-mcp-server"],
            "env": {
                "PIPELINE_URL": "http://localhost:8080"
            }
        }
    }
}
```

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `PIPELINE_URL` | `http://localhost:8080` | Pipeline server URL |
| `PIPELINE_API_KEY` | (empty) | API key, if the pipeline has key validation enabled |
All database connections, vector search, and embedding are handled by the pipeline server. The MCP server only needs the pipeline URL.

Example Agent Workflows

Profile and ingest a CSV file

An AI agent could autonomously:
  1. Profile the data: call `profile_data` with the CSV file
  2. Review suggested rules: the agent reads the AI-suggested DQ rules
  3. Create the pipeline: call `create_pipeline` with the profiled config
  4. Upload the file: call `upload_data` to trigger processing
  5. Monitor status: call `get_job_status` to track completion
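The loop above can be sketched against a hypothetical `call_tool` helper, stubbed here with canned responses; a real agent would route each call through its MCP client session:

```python
import base64

def call_tool(name, arguments):
    """Stub standing in for an MCP tool invocation; responses are canned."""
    canned = {
        "profile_data": {"suggested_rules": [{"column": "price", "rule": "not_null"}]},
        "create_pipeline": {"status": "created"},
        "upload_data": {"token": "tok-123"},
        "get_job_status": {"state": "completed"},
    }
    return canned[name]

csv_b64 = base64.b64encode(b"price,date\n10.5,2024-01-02\n").decode("ascii")

profile = call_tool("profile_data", {"data": csv_b64})           # 1. profile
rules = profile["suggested_rules"]                               # 2. review rules
call_tool("create_pipeline", {"data": csv_b64, "rules": rules})  # 3. create
token = call_tool("upload_data", {"data": csv_b64})["token"]     # 4. upload
status = call_tool("get_job_status", {"token": token})           # 5. monitor
print(status["state"])
```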

Build and query a RAG knowledge base

  1. Create the pipeline: call `create_pipeline` with a Qdrant/Weaviate/Milvus/pgvector destination
  2. Upload documents: call `upload_data` for each PDF/document
  3. Monitor: call `get_job_status` until all documents are processed
  4. Search: call `search_qdrant` to find relevant chunks
  5. Answer: call `ai_answer` with the retrieved chunks as context and the user's question

Discover and query data

  1. List databases: call `list_postgres_databases` to see available databases
  2. List schemas: call `list_postgres_schemas` to explore a database
  3. List tables: call `list_postgres_tables` to find relevant tables
  4. Inspect columns: call `list_postgres_columns` to understand table structure
  5. Query: call `query_postgres` with a well-formed SELECT query

Cross-database analysis

  1. Search documents: call `search_pgvector` for relevant financial document chunks
  2. Query structured data: call `query_postgres` to get related financial metrics
  3. Combine: the agent merges unstructured and structured data in its response
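The merge in step 3 can be as simple as concatenating both result sets into one context string before handing it to `ai_answer`. The values below are placeholders standing in for real `search_pgvector` and `query_postgres` output:

```python
# Placeholder results standing in for search_pgvector / query_postgres output.
chunks = [
    {"text": "Q3 revenue grew 12% year over year.", "score": 0.91},
    {"text": "Operating margin held at 21%.", "score": 0.87},
]
rows = [("Q3", "revenue", 4.2e9), ("Q3", "margin", 0.21)]

# Merge unstructured chunks and structured rows into one RAG context.
context_parts = [c["text"] for c in chunks]
context_parts += [f"{period} {metric}: {value}" for period, metric, value in rows]
context = "\n".join(context_parts)
print(context)
```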

AI validation and transformation

  1. Create a pipeline with AI rules: call `create_pipeline` with `codegen_rule` for validation and/or `codegen_transform` for transformation
  2. Upload data: call `upload_data` to process data through the AI-powered pipeline
  3. Datris generates Python scripts from your instructions and runs them in the container

Automated data quality monitoring

  1. List pipelines: call `list_pipelines` to discover all registered pipelines
  2. Upload new data: call `upload_data` with the latest data files
  3. Check results: call `get_job_status` to see DQ failures
  4. Diagnose: the AI reads error explanations and suggests fixes

CLI Examples

The Datris CLI connects to the MCP server and provides the same capabilities from the terminal. See CLI for the full reference.
```shell
# Install
brew tap datris/tap
brew install datris

# Ingest a CSV into PostgreSQL
datris ingest sales-data.csv --dest postgres

# Ingest with AI validation and transformation
datris ingest trades.csv --dest postgres \
  --ai-validate "all prices must be positive and dates must be YYYY-MM-DD" \
  --ai-transform "convert dates to YYYY/MM/DD and uppercase all ticker symbols"

# Ingest into a vector store for RAG
datris ingest manual.pdf --dest pgvector

# Query PostgreSQL
datris query "SELECT * FROM public.sales LIMIT 10"

# Natural language query
datris ask-sql "what are the top 5 stocks by volume?" --table trades

# Semantic search
datris search "quarterly revenue" --store pgvector --collection financial_docs

# RAG — search + AI answer
datris ask "What is the return policy?" --store pgvector --collection support_docs

# List pipelines, check health, get status
datris pipelines
datris health
datris status my_pipeline
```