Database lockdown, tap script hardening, Create Tap UX.
- Server-controlled database name. UI no longer edits the Postgres or MongoDB database name; `/api/v1/version` returns `postgresDatabase` and `mongodbDatabase`, and the UI submits them unchanged.
- MongoDB internal vs user split. `mongodb.database` (user-facing, default `datris`) holds pipeline data; `mongodb.internalDatabase` (default `oss`) holds platform state. Existing installs keep `oss` for platform state; new user pipelines land in `datris`.
- Unlimited reads for tap scripts. `limit: -1` on `/api/v1/query/mongodb` and `/api/v1/query/postgres` returns every matching row. Preview defaults (20 Mongo / 100 Postgres) are unchanged for UI/MCP callers.
- Tap test sampling. New “Limit test sample to 20 records” checkbox in Create Tap step 2 injects `DATRIS_TAP_TEST_LIMIT=20` into the script. Cron and manual runs read everything.
- Codegen + diagnosis hardening. Generated scripts treat platform response shapes as contractual (no shape-probing, no candidate-key iteration). Diagnosis quotes the actual traceback and respects in-script guards.
- Create Tap UX. Ask button, auto-apply diagnosis (capped at 2 attempts), Stop Test, copy-to-clipboard, scrollable-JSON test results, destination collision check on Generate Pipeline, full destination shown on step 5.
- Data Catalog. One-click delete of the Uncataloged group; per-item trash icon on taps and pipelines inside every catalog.
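The read-limit behavior in this release can be sketched from a tap script's point of view. This is a minimal illustration, not the generated code: it assumes the script sees `DATRIS_TAP_TEST_LIMIT` as an environment variable, and the `collection` field of the request body is a hypothetical name. Only `limit` and its `-1` meaning come from the notes above.

```python
import os


def build_query_body(collection, env=None):
    """Build a request body for /api/v1/query/mongodb.

    `collection` is a hypothetical field name used for illustration;
    only `limit` (and -1 meaning "every matching row") is documented.
    """
    env = os.environ if env is None else env
    raw = env.get("DATRIS_TAP_TEST_LIMIT")
    # Test runs with the checkbox ticked see 20; cron and manual runs
    # have no variable set and fall back to -1 (read everything).
    limit = int(raw) if raw else -1
    return {"collection": collection, "limit": limit}
```

With the checkbox ticked the body carries `limit: 20`; otherwise it carries `limit: -1`.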
Input sanitization hardening and minor UI polish.
- Pipeline creation sanitizes destination identifiers before writing the config: Postgres `dbName`/`schema`/`table`, MongoDB `dbName`/`table`, Kafka `topic`, ActiveMQ `queueName`, vector `collectionName`/`tableName`/`schemaName`, and the DQ schema name.
- Four duplicated `sanitizeName` helpers consolidated into `ui/src/app/shared/sanitize.ts` (`sanitizeLabel` and `sanitizeIdentifier`).
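The kind of rule an identifier sanitizer applies can be sketched as follows. The real helpers live in TypeScript and their exact rules are not stated here, so everything below (lowercasing, collapsing other characters to `_`, prefixing a leading digit) is an assumption for illustration, and `sanitize_identifier` is a hypothetical Python name.

```python
import re


def sanitize_identifier(name):
    """Illustrative sanitizer; the rules are assumed, not taken from sanitize.ts."""
    # Lowercase, then collapse every run of other characters into one underscore.
    cleaned = re.sub(r"[^a-z0-9_]+", "_", name.strip().lower()).strip("_")
    if not cleaned:
        raise ValueError("identifier is empty after sanitization")
    # Many databases reject identifiers that start with a digit.
    if cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned
```

Under these assumed rules, `"Daily Prices!"` becomes `daily_prices`.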
Discovery wizard, Data Catalog, and trial BYO AI keys.
- Discovery wizard. Six-step AI-guided onboarding that turns “yfinance daily prices for the S&P 500” into running taps and pipelines. New endpoints `POST /api/v1/discover` and `POST /api/v1/discover/build`, plus `discover_source` MCP tool.
- Data Catalog. Group related taps and pipelines into named catalogs. New Data Catalog tab with expandable contents and an Uncataloged group.
- Per-pipeline DQ + transformation editor. Inline editor inside the pipeline view.
- Trial BYO keys. Trial tenants supply their own Anthropic or OpenAI key at signup; keys land in the tenant’s Vault path and override the shared Datris-managed key on the next AI call.
Dedicated instance support and hosted platform improvements.
- Hosted-aware Configuration UI. Hides Ollama for AI Primary/CodeGen on hosted, locks embedding to bundled Ollama `bge-m3` when the provider is Anthropic, and hides the advanced toggle.
- Improved multi-user session handling on shared instances.
Full tap MCP tool suite and user-supplied tap scripts.
- Four new MCP tools: `get_tap`, `test_tap`, `update_tap`, `get_tap_logs`.
- `create_tap` accepts an optional `script` parameter for user-supplied Python `fetch()`, plus `secret_name` for Vault-injected credentials.
- CLI: `datris tap create --script path/to/script.py` and new `datris tap show`.
Ollama for all AI slots and hot-reload on save.
- Configuration UI offers Ollama (local) as a provider for AI Primary, CodeGen, and Embedding. Bundled Ollama sidecar option pre-fills `bge-m3` for embedding.
- Saving AI configuration takes effect immediately; no container restart required.
Trial-instance hardening and Configuration tab polish.
- Trial codegen now defaults to `claude-haiku-4-5-20251001` instead of Opus 4.6, dramatically lowering per-trial cost. Opus remains the recommended codegen model for self-hosted customers using their own keys.
- Trial banner copy and link styling refinements on the Configuration tab.
Breaking: AI configuration restructured into three independent Vault secrets.
- `ai.aiSecretName` and `ai.provider` are replaced by three top-level slots: `ai.aiPrimary.secretName`, `ai.codegen.secretName`, `ai.embedding.secretName`.
- Each Vault secret is self-describing (`provider`, `endpoint`, `model`, `apiKey`, optionally `version`). No path derivation from YAML.
- v1.5.x deployments must update `application.yaml` and re-seed Vault on upgrade.
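When re-seeding Vault, a quick pre-flight check on each secret can catch a missing field before the server does. The sketch below treats the four non-optional fields listed above as required; the helper name and the example values (endpoint URL, placeholder key) are assumptions for illustration.

```python
REQUIRED_FIELDS = ("provider", "endpoint", "model", "apiKey")
OPTIONAL_FIELDS = ("version",)


def missing_secret_fields(secret):
    """Return the required fields that are absent or empty in a secret dict."""
    return [field for field in REQUIRED_FIELDS if not secret.get(field)]


# Example payload for one slot; endpoint and key are placeholders.
example_secret = {
    "provider": "anthropic",
    "endpoint": "https://api.anthropic.com",
    "model": "claude-haiku-4-5-20251001",
    "apiKey": "<your-key>",
}
```

Running `missing_secret_fields` over the three secrets (AI Primary, CodeGen, Embedding) before writing them gives an empty list for each complete secret.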
Onboarding, tap scheduling UX, and trial BYO key path.
- Getting Started tab as the new first-run landing page; Docs ↗ top-nav link.
- Inline cron editor on the Taps page: preset buttons, AI prompt field (“Every weekday at 4pm”), Quartz field validation, human-readable description.
- Truncate-before-run toggle in the tap wizard (sets `truncateBeforeWrite: true` on the generated destination).
- Configuration tab for trial users with BYO Anthropic/OpenAI key banner and “Datris-managed (default)” vs “Your own key” status.
- Fix: `TapScheduler` honors the configured `dateTimezone` from `application.yaml`.
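For orientation, a prompt like “Every weekday at 4pm” corresponds to the Quartz expression `0 0 16 ? * MON-FRI` (seconds, minutes, hours, day-of-month, month, day-of-week). The function below is a rough structural check only, not the editor's actual validation; its name is hypothetical.

```python
def check_quartz_shape(expr):
    """Return None if the expression has a plausible Quartz shape, else a message."""
    fields = expr.split()
    # Quartz uses 6 fields plus an optional 7th (year), unlike 5-field Unix cron.
    if len(fields) not in (6, 7):
        return "expected 6 or 7 fields, got %d" % len(fields)
    day_of_month, day_of_week = fields[3], fields[5]
    # Quartz requires '?' in exactly one of the two day fields.
    if (day_of_month == "?") == (day_of_week == "?"):
        return "exactly one of day-of-month and day-of-week must be '?'"
    return None
```

A five-field Unix-style expression such as `0 16 * * 1-5` is rejected by the field-count check.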
Taps (Beta). AI-generated Python scripts that fetch data on a schedule and stream it into a pipeline.
- 4-step tap creation wizard (Describe → Edit & Test → Schedule → Review) with brainstorm chat and proactive env-var suggestions.
- AI script generation with JSON-parse retry and raw-script fallback; AI diagnosis with explicit (a)/(b)/(c) options; “Apply Diagnosis” rewrites the script in place.
- Tap secrets stored in Vault tagged `_type=tap`; run history via `TapRunLog` and `GET /tap/logs`.
- CRON scheduling (Quartz format), AI-generated from natural language.
- JSON-to-CSV pipeline feed (union of keys) with column-name normalization via spell-out table (`%` → `percent`, `#` → `num`, `&` → `and`, …).
- Pipelines page shows the feeding tap; pipeline wizard has “Create from Tap”.
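The JSON-to-CSV feed can be pictured with a short sketch. Only the union-of-keys idea and the three spell-out entries come from the notes above; the lowercasing, the underscore handling, and the function names are assumptions, and the rest of the spell-out table is not reproduced.

```python
import csv
import io
import re

# Only these three entries are documented; the full table is longer.
SPELL_OUT = {"%": "percent", "#": "num", "&": "and"}


def normalize_column(name):
    """Spell out known symbols, then reduce the name to lowercase snake case."""
    for symbol, word in SPELL_OUT.items():
        name = name.replace(symbol, " " + word + " ")
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")


def records_to_csv(records):
    """Write records as CSV using the union of keys in first-seen order."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([normalize_column(c) for c in columns])
    for record in records:
        # Records missing a key get an empty cell for that column.
        writer.writerow([record.get(c, "") for c in columns])
    return out.getvalue()
```

This sketch does not handle two source keys that normalize to the same column name; a real implementation would need a rule for that case.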
Schema Evolution.
- Additive evolution: new CSV columns are auto-added as `string`, `schemaVersion` increments, and `ALTER TABLE` runs on Postgres.
- Dropped columns are excluded from the `COPY` command so Postgres defaults them to `NULL` (previously failed on typed columns).
- Missing key fields raise a clear error.
- Shared `DataUtil.evolveSchema()` now used by both `StreamNotifier` and `FileNotifier`.
- Query endpoints use `serializeNulls()` so `NULL` columns appear in results.
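The three evolution rules above can be sketched in a few lines. This is not `DataUtil.evolveSchema()`; the schema dictionary layout, the `key_fields` argument, and the function name are hypothetical, chosen only to show the behavior described.

```python
def evolve_schema(schema, csv_header, key_fields=()):
    """Return (schema, copy_columns) for an incoming CSV header.

    The schema layout {"schemaVersion": int, "fields": [{"name", "type"}]}
    is assumed for illustration.
    """
    missing_keys = [k for k in key_fields if k not in csv_header]
    if missing_keys:
        raise ValueError("missing key fields: " + ", ".join(missing_keys))

    known = [field["name"] for field in schema["fields"]]
    added = [col for col in csv_header if col not in known]
    # Dropped columns are left out of the COPY column list so the
    # database fills them with NULL.
    copy_columns = [col for col in known if col in csv_header] + added
    if not added:
        return schema, copy_columns

    evolved = {
        "schemaVersion": schema["schemaVersion"] + 1,
        # New columns always arrive typed as string.
        "fields": schema["fields"] + [{"name": c, "type": "string"} for c in added],
    }
    return evolved, copy_columns
```

A header that drops `close` and adds `volume` yields a bumped `schemaVersion`, a new `volume` string field, and a copy list without `close`.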
Remote MCP endpoint and managed service support.
- Per-session API key forwarding on the MCP server (header or `api_key` query param); `REQUIRE_API_KEY=true` rejects unauthenticated SSE/streamable-HTTP connections.
- New MCP tools for managed service: `signup_trial`, `upgrade_to_dedicated`, `check_upgrade_status`.
- Remote SSE registry entry for `https://mcp.trial.datris.ai/sse`.
- Website APIs accept `x-api-key` as an alternative to cookie JWT; `/api/provision/agent-trial` combines signup + provisioning with rate limiting.
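Per-session key resolution on the MCP server might look like the sketch below. The header name is an assumption (the notes name `x-api-key` only for the website APIs), as are the header-first precedence and the function name; only the `api_key` query parameter and the `REQUIRE_API_KEY` behavior come from the notes.

```python
from urllib.parse import parse_qs, urlsplit


def resolve_api_key(headers, url, require_api_key=False):
    """Pick the session API key from a header, falling back to ?api_key=."""
    lowered = {name.lower(): value for name, value in headers.items()}
    key = lowered.get("x-api-key")  # assumed header name
    if not key:
        values = parse_qs(urlsplit(url).query).get("api_key", [])
        key = values[0] if values else None
    # Mirrors REQUIRE_API_KEY=true: refuse connections that carry no key.
    if require_api_key and not key:
        raise PermissionError("API key required")
    return key
```

A client that cannot set headers on an SSE connection can append `?api_key=...` to the endpoint URL instead.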
- Pipeline creation wizard hides PostgreSQL and MongoDB database name fields on trial instances; the name is auto-populated from the environment or defaults to `datris`.
Multi-tenant hosting on a shared instance.
- Per-tenant PostgreSQL databases (auto-created on first use) and per-tenant MongoDB databases scoped via the connection string environment name.
- Tenant-scoped metadata and query endpoints; new `TenantInterceptor` sets `DatrisEnvironment.current` per request based on API key.
- Vector DB secret isolation across Qdrant, Weaviate, Milvus, Chroma, pgvector.
- Batch upload for compressed files (`.zip`, `.gz`, `.tar`, `.jar`) processes contents inline, with no MinIO webhook dependency.
- Default AI provider changed to Anthropic (Claude Sonnet 4.6). Customers can still switch to OpenAI or Ollama via `application.yaml`.
- Generated Python scripts for DQ and transformation are logged in full to the server logs for debugging.
- New unified `datris analyze` command replaces `ask-sql` and `ask`. Auto-picks the right approach based on `--dest` (Postgres → SQL generation, MongoDB → document fetch, vector stores → RAG).
- `--ai-analyze` flag on ingest.
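The `--dest` routing can be read as a simple dispatch. The sketch below is illustrative only: the destination strings and mode labels are assumptions, not the CLI's accepted values, and the vector-store list is borrowed from the stores named elsewhere in these notes.

```python
# Assumed destination names; the CLI's real --dest values may differ.
VECTOR_STORES = {"qdrant", "weaviate", "milvus", "chroma", "pgvector"}


def pick_analysis_mode(dest):
    """Map a destination to the analysis approach described for `datris analyze`."""
    dest = dest.lower()
    if dest == "postgres":
        return "sql-generation"
    if dest == "mongodb":
        return "document-fetch"
    if dest in VECTOR_STORES:
        return "rag"
    raise ValueError("unsupported destination: " + dest)
```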
Mintlify documentation and accuracy pass.
- Migrated docs from `.md` to `.mdx` with a two-tab Mintlify layout.
- Removed deprecated docs (row rules, column rules, JavaScript row functions, REST transformations, deduplication, column trimming).
- Page-by-page accuracy review against the codebase.
- Server fails fast at startup with a clear error if an AI provider is not configured (CodeGen DQ and transformation require it).
- Removed stale NEVER rules and deprecated-feature references from the MCP server instructions.
CodeGen DQ and transformation.
- `aiRule` replaces the prior AI DQ approach: the LLM generates a self-contained Python validation script from a plain-English instruction, which runs locally. Cost drops from 0.003/rule.
- `aiTransformation` uses the same CodeGen approach.
- Works for CSV, JSON, and XML.
- Removed: `columnRules` (regex), JavaScript row rules, REST endpoint row rules, `AIDataQualityUtil`. JavaScript row functions and REST transformations removed from UI and CLI.
- Atomic `create_pipeline`. `generate_schema` is removed; `create_pipeline` accepts base64-encoded sample data, auto-detects schema, and creates the pipeline in one call.
- Content-based uploads across `upload_data` (renamed from `upload_file`), `profile_data`, `upload_config`.
- `update_secret` MCP tool for AI provider keys only.
- All-string schema on MCP pipeline creation; pipeline registration verified by read-back.
- `update_secret` MCP tool added (scoped to AI secrets: anthropic, openai, ollama, embedding).
- Published to the MCP Registry (`io.github.datris/datris`) and PyPI (`datris-mcp-server`).
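Preparing sample data for the atomic `create_pipeline` call comes down to base64-encoding the raw content. In the sketch below the argument names `name` and `sample_data` are hypothetical; only the base64 requirement comes from the notes.

```python
import base64


def build_create_pipeline_args(name, sample_csv):
    """Encode sample CSV text as base64 for a create_pipeline call.

    `name` and `sample_data` are illustrative argument names, not the
    tool's documented parameters.
    """
    encoded = base64.b64encode(sample_csv.encode("utf-8")).decode("ascii")
    return {"name": name, "sample_data": encoded}
```

Because the content travels inside the call, the agent does not need filesystem access on the server side.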
New UI tabs.
- MCP Tab — agent view, service health for 10 backend services, browsable tool grid, config generator for Claude Desktop/Code and Cursor, and a Tool Playground that executes against the live API.
- Secrets Tab — full CRUD for Vault secrets with sensitive-field masking.
- New REST endpoints for vector store metadata discovery (Qdrant collections and friends).
