Ask your data a question — conversational search comes to the Search tab.
- Chat search. The Search tab has a new Chat mode, with a Traditional toggle for the structured query UI you already know. Ask a question in plain language and Datris finds the answer across all your pipelines and taps — cataloged or not — querying tables, searching documents, and replying with citations to where each answer came from. It’s read-only: it looks, it never changes anything.
- Scope to a catalog. Narrow a chat to a single catalog (or to Uncataloged data) from the dropdown, or leave it on All to search everything.
- Conversations survive a refresh. Your Search chat and Assistant conversations now persist across a browser refresh, so reloading the page no longer clears the transcript.
docker compose pull && docker compose up -d --force-recreate datris ui mcp-server. No data migration needed. The CLI: brew upgrade datris.

Orchestrate Datris taps from Apache Airflow.
- Run taps from Airflow. A new airflow-provider-datris package adds an operator that triggers a tap, waits for the pipeline to finish, streams Datris logs into the Airflow task log, and reports run tokens and row counts back to Airflow. Cancelling the DAG run cancels the Datris job.
- Date-windowed backfills. Taps can now take per-run parameters, so an Airflow DAG can pass its logical date (or any window) into the tap for that run; backfills and incremental loads work without editing the tap.
- No double-firing. A tap is scheduled by Datris or Airflow, never both: if a tap has a Datris cron, the Airflow operator declines to trigger it. To drive a tap from Airflow, leave its cron empty.
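
The scheduling rule and the date-window idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tap is modelled as a plain dict with a `cron` field, and the parameter names `start` and `end` are hypothetical. It does not use or describe the actual airflow-provider-datris operator API.

```python
from datetime import datetime, timedelta, timezone


def should_trigger_from_airflow(tap: dict) -> bool:
    """Return True only when the tap has no Datris cron of its own.

    The `cron` key is an assumed field name for this sketch.
    """
    cron = (tap.get("cron") or "").strip()
    return cron == ""


def run_params_for(logical_date: datetime, window: timedelta = timedelta(days=1)) -> dict:
    """Build per-run parameters covering [logical_date, logical_date + window).

    `start` and `end` are hypothetical parameter names.
    """
    end = logical_date + window
    return {"start": logical_date.isoformat(), "end": end.isoformat()}


if __name__ == "__main__":
    tap = {"name": "weather", "cron": ""}
    if should_trigger_from_airflow(tap):
        print(run_params_for(datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

Keeping the window computation in one small function means a backfill is just the same call repeated with different logical dates.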
docker compose pull && docker compose up -d --force-recreate datris ui mcp-server. No data migration needed. Install the provider where Airflow runs: pip install airflow-provider-datris.

Write to AWS S3, query Parquet and ORC from the Assistant and Search, and stopped chats actually stop.
- AWS S3 as a first-class destination. The Object Store destination now writes Parquet or ORC directly to S3 alongside the built-in MinIO. Pick Object Store (MinIO or S3) in the pipeline wizard, point at your bucket, and reference a credentials secret you’ve created in Configuration → Secrets → Platform. Multiple S3 destinations with different IAM keys coexist in the same deployment — each pipeline carries its own credential reference, applied per bucket at write time.
- Region lives with the credential. AWS credentials and region travel together in the credentials secret rather than on the pipeline config. One source of truth, one place to rotate, no more silent us-east-1-says-the-config-but-the-key-is-us-west-2 failures. Field names are flexible: accessKey / AWS_ACCESS_KEY / AWS_ACCESS_KEY_ID all work.
- Query Parquet and ORC from the Assistant. “Show me the weather data” now works against pipelines whose destination is Object Store. The Assistant resolves the bucket and credentials from the pipeline config, reads the columnar files, and returns rows in chat.
- Search tab gains an Object Store option. Pick a pipeline from the dropdown, see the resolved bucket/prefix/format, set a limit, hit Execute. Works for both MinIO and S3 destinations.
- Assistant offers all three structured destinations. When you ask for structured data and no pipeline yet covers it, the Assistant now mentions MongoDB, PostgreSQL, and Object Store as choices instead of silently defaulting to one.
- Assistant can discover destination credentials it shouldn’t create. Platform-tab secrets are now visible to the Assistant for reading (names and field shape only — never values). When a pipeline destination needs a credentials reference, it lists what’s available, verifies the field shape, and points you at the Secrets tab to create one when nothing fits.
- Stop actually stops. Clicking Stop in the Assistant now halts the in-flight chat within a fraction of a second instead of waiting for the upstream model to finish generating. Cancelled responses cost only what was already streamed.
- Pipeline failures surface fast. A failed pipeline now flips to Error in the Ops dashboard within seconds instead of staying stuck in Processing for up to ten minutes.
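
To make the "flexible field names" idea concrete, here is a minimal Python sketch of normalizing a credentials secret into one canonical shape. Only the three access-key spellings come from these notes; the aliases listed for the secret key, region, and session token are assumptions for illustration, and the function name is hypothetical. It is not the platform's actual resolution logic.

```python
# Canonical field -> accepted spellings. Only the accessKey aliases are
# taken from the release notes; the others are assumed for this sketch.
ALIASES = {
    "accessKey": ("accessKey", "AWS_ACCESS_KEY", "AWS_ACCESS_KEY_ID"),
    "secretKey": ("secretKey", "AWS_SECRET_KEY", "AWS_SECRET_ACCESS_KEY"),
    "region": ("region", "AWS_REGION"),
    "sessionToken": ("sessionToken", "AWS_SESSION_TOKEN"),
}
REQUIRED = ("accessKey", "secretKey", "region")  # sessionToken is optional


def normalize_s3_secret(secret: dict) -> dict:
    """Map whichever alias is present onto the canonical field name."""
    out = {}
    for canonical, names in ALIASES.items():
        for name in names:
            if secret.get(name):
                out[canonical] = secret[name]
                break
    missing = [field for field in REQUIRED if field not in out]
    if missing:
        raise ValueError("secret is missing: " + ", ".join(missing))
    return out


if __name__ == "__main__":
    print(normalize_s3_secret(
        {"AWS_ACCESS_KEY_ID": "AKIA-example", "secretKey": "example", "region": "us-west-2"}
    ))
```

Because the region is resolved from the same secret as the keys, a mismatch between the two cannot arise from the pipeline config.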
docker compose pull && docker compose up -d --force-recreate datris ui mcp-server. Existing MinIO pipelines work unchanged. For AWS S3, create a Platform-tab secret with accessKey, secretKey, region (and optionally sessionToken), then reference it by name from your pipeline.

Ask the Ops assistant about a failing pipeline without leaving the dashboard.
- New Ops chat side panel. A collapsible chat lives on the right side of Ops → Activity. The assistant has the current failures, stale taps, and volume anomalies in mind: ask “why did X fail?” and it pulls the root cause; ask it to re-run a tap and it runs and reports the outcome. The panel stays mounted as you switch between Activity and Ingestion, so the conversation survives the tab change.
- “Ask” buttons on failure and volume rows. Click “Ask” next to a row to seed the chat with a row-specific question so you don’t have to retype the tap or pipeline name.
- Successes are expandable like Failures. Click any row in the Successes pane to see the same event trail (begin → processing → end) you get from the Failures pane. Only one row across either pane is open at a time so the layout stays compact.
- Claude Opus 4.8 is the new recommended CodeGen model. New installs seed Opus 4.8 as the codegen default. Existing tenants can pick it from the model dropdown in Configuration. The older Opus versions remain selectable for anyone who wants to pin a specific version.
docker compose pull && docker compose up -d --force-recreate datris ui mcp-server. No data migration needed. The CLI: brew upgrade datris.

The Assistant stops second-guessing itself when the data is already there.
- No more apology loops. When the Assistant verifies platform state and the pipeline / tap it created earlier is present in the list, it now treats that as evidence the work was done — instead of retracting a prior turn’s “done” claim and rebuilding from scratch.
- “Show me X” goes straight to the data. When you ask to see / list / show data and a matching pipeline already exists, the Assistant now jumps to the destination’s query tool (Mongo / Postgres / vector search) and returns the actual rows or documents. It no longer asks you which sources, providers, or schedules to use for a pipeline you already have.
- Long catalogs no longer hide existing resources. The tools the Assistant uses to inventory pipelines and taps now return a compact summary with names at the top. Previously, on environments with many pipelines, an entry near the end of the list could slip past the Assistant’s scan — the new shape makes every name impossible to miss.
docker compose pull && docker compose up -d --force-recreate datris ui mcp-server. No data migration needed. The CLI: brew upgrade datris.

Ops Activity gains a Successes pane, long lists scroll in place, and the dashboard’s auto-refresh no longer yanks you back to the top.
- Successes pane on Ops Activity. A new pane below Failures lists every pipeline that ran successfully in the selected window, with the run count, items processed, and last-run time. Click a row to jump to that pipeline.
- Long lists scroll inline. Failures, Successes, and the Per-pipeline volume table all cap at ~10 rows of height and scroll internally instead of stretching the page. Per-pipeline volume column headers stick to the top so you don’t lose context as you scroll.
- Auto-refresh preserves scroll position and expansion state. The 30-second refresh no longer scrolls a long pipeline-volume list back to the top or collapses an open failure detail. Expand a failure, scroll where you want, leave the tab open — it stays put.
- Numeric column headers aligned with their data on the Per-pipeline volume table — Today, 7d avg, vs avg now line up with the numbers underneath.
docker compose pull && docker compose up -d --force-recreate datris ui mcp-server. No data migration needed. The CLI: brew upgrade datris.

A new Ops Activity dashboard, Postgres pipelines learn upsert, per-run tap parameters, and a safer agent workflow.
- Ops Activity dashboard — at-a-glance ingestion health. A new tab under Ops pulls every tap and pipeline run in a rolling window (24h / 7d / 30d) into KPI tiles, time-series charts of runs and items ingested, a per-pipeline 7-day volume table, and a Failures pane that dedupes by item with attempt counts. Each unrecovered row has a Re-run button — for pipeline failures with an upstream tap, the button re-runs that tap to retry the load so you don’t have to hunt for it in the Catalog.
- Postgres pipelines upsert on conflict when keyFields is set. Matches the semantics Mongo has always had. Backfills over already-loaded dates, incremental taps with overlap, and “load again with the same key” flows now upsert instead of failing with a duplicate-key error. If you retrofit keyFields onto an existing table, the platform adds the matching unique index for you on the next load, or surfaces a clear remediation message if existing data violates the proposed key.
- Per-run tap parameters. Run a tap with caller-supplied values (date ranges, ticker lists, page cursors, batch sizes) without rewriting the script or the secret. Pass params to run_tap and the script reads them as env vars for that one run only. Scheduled cron runs see an empty params bag and fall back to script defaults, so the same script handles both manual ad-hoc calls and unattended schedules without branching.
- “No records” is no longer a failure. A tap that runs cleanly and returns zero rows (a polling tap on a quiet day, an incremental tap that’s caught up, a market tap on a weekend) now records a distinct no_records status with a neutral badge. Doesn’t count in the Failures tile, doesn’t fire bogus “recovered” badges, doesn’t train agents to interpret “no new data” as “platform broken.”
- Large tap outputs fail fast with an actionable error. A backfill that exceeds the size cap (default 100 MB) now stops with a clear message telling you and the Assistant to chunk the source range smaller, instead of OOM-killing the server. Multiple smaller runs all land in the same destination pipeline; with keyFields set, overlapping ranges upsert safely.
- Concurrent tap runs no longer race. Two pipelines that loaded data in the same millisecond previously risked landing each other’s records in the wrong destination under specific timing. Fixed at the source; no action required.
- Secret values stay masked in Configuration. The Secrets tab now masks any field whose name suggests a credential: passwords, tokens, API keys, signing keys, certificates, and named variants of those (e.g. *_API_KEY, *_SECRET_*). No action required.
- Tighter Assistant workflow. Three new disciplines:
  - Scheduling lives on the tap. Say “every morning” or “at market open” and the Assistant sets the CRON expression on the tap itself, instead of handing back a cron line for you to wire up yourself.
  - Test before first run. A newly-created or just-edited script is validated before any real run, and before being put on a schedule; no more “guaranteed-bad nightly run” the next time the cron fires.
  - No confabulated progress. If the Assistant intended to do N things and only did M of them, it tells you which M happened rather than narrating all N as complete.
  The credential form the Assistant pops up now asks only for true secrets, not for configuration values you already typed into chat.
- Agent Monitor stays responsive during long tool calls. Running a slow tool no longer freezes the Connections viz or the Activity Log — they keep streaming. Returning to the Agent Monitor tab refreshes the log to current server state instead of showing a stale empty view.
- Configurable JVM heap. The bundled datris service has a sensible default heap size for small hosts (8 GB) and a clean way to bump it on larger machines via .env. See Installation → JVM Heap Sizing for the suggested sizings per host RAM.
- Ops tab moved after Search in the top navigation. Mostly cosmetic; route URLs are unchanged so bookmarks and deep links still work.
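
As an illustration of the per-run parameters item in this release, the following Python sketch shows one way a tap script might read caller-supplied values from environment variables and fall back to its own defaults on scheduled runs. The variable names (`START_DATE`, `END_DATE`, `BATCH_SIZE`), the default values, and the helper name are all hypothetical; the notes only say that parameters arrive as env vars for that one run.

```python
import os
from typing import Mapping

# Script defaults used when a scheduled run supplies no parameters.
# Names and values are illustrative assumptions, not platform conventions.
DEFAULTS = {"START_DATE": "2024-01-01", "END_DATE": "2024-01-02", "BATCH_SIZE": "500"}


def load_run_params(env: Mapping[str, str] = os.environ) -> dict:
    """Read per-run parameters from the environment, else use defaults."""
    values = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    return {
        "start_date": values["START_DATE"],
        "end_date": values["END_DATE"],
        "batch_size": int(values["BATCH_SIZE"]),
    }


if __name__ == "__main__":
    # A scheduled run sees no parameters and gets the defaults.
    print(load_run_params({}))
    # An ad-hoc run overrides only what the caller passed.
    print(load_run_params({"START_DATE": "2023-06-01", "BATCH_SIZE": "50"}))
```

Reading every parameter through one helper is what lets the same script serve both manual and unattended runs without branching.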
docker compose pull && docker compose up -d --force-recreate datris ui mcp-server. No data migration needed. Existing Postgres pipelines with keyFields automatically pick up the upsert path on the next load. The CLI: brew upgrade datris.

Embedding pipelines stop failing on oversized chunks, the Assistant carries long runs to completion, and the Catalog gets the missing delete buttons and run-history detail.
- Embedding pipelines no longer fail when a single chunk is too big. Every embedding call is now guarded by a token-aware safety net that splits any chunk over the model’s input cap before sending — so a 10-Q ingest with one dense table can’t take down the whole batch. Works the same for OpenAI, Cohere, Voyage, BGE-M3, Nomic, Mistral, or anything else, with built-in caps for the common models and a conservative default for the rest. OpenAI families get exact token counts; everyone else gets a heuristic.
- Token-aware chunking. A new maxChunkTokens option on the chunking config tells the chunker to stop merging segments before they cross a token estimate, so the safety net above becomes a true last resort. Recommended for any new vector pipeline.
- The Assistant carries pipeline runs to completion. When the agent kicks off a tap or upload, it now polls the run to a terminal outcome with exponential backoff instead of summarizing “still running, check back later.” You’ll see a one-line progress update each cycle and the final outcome reported in chat, including any per-document failures.
- Run History now shows per-document outcomes. Expanding a tap run that fed a vector pipeline lists every document the pipeline processed with its own status, elapsed time, and — for failures — the specific stage and error. A run that fetched 28 documents but failed on 1 now shows the failure inline instead of a misleading green “success.”
- Catalog: delete buttons work on individual taps and pipelines. The trash icon now shows an inline confirm right in the row — Delete / × for taps, Config & Data / Data Only / × for pipelines (Data Only wipes rows but keeps the config so the next ingest fills it again).
- Catalog: last-run timestamp is back above the status badge.
- Agent Monitor fits the viewport. Both Connections and Activity Log panes now size to the visible area on first load and reflow when you resize the window, not just in the pop-out.
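
The token-aware safety net described in this release can be pictured with a small Python sketch. It assumes a rough heuristic of about four characters per token, which is an illustrative choice rather than the platform's estimator, and the function names are hypothetical. The idea shown is only the general technique: leave a chunk alone if it fits, otherwise split it on word boundaries and hard-split any single word that is itself too long.

```python
def estimate_tokens(text: str) -> int:
    """Heuristic token estimate: roughly 4 characters per token (assumption)."""
    return max(1, (len(text) + 3) // 4)


def split_to_cap(chunk: str, max_tokens: int) -> list[str]:
    """Split a chunk so every piece is estimated at or under max_tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if estimate_tokens(chunk) <= max_tokens:
        return [chunk]
    pieces, current = [], ""
    max_chars = max_tokens * 4
    for word in chunk.split(" "):
        candidate = word if not current else current + " " + word
        if estimate_tokens(candidate) <= max_tokens:
            current = candidate
            continue
        if current:
            pieces.append(current)
        # A single word over the cap is cut into fixed-size slices.
        while estimate_tokens(word) > max_tokens:
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        current = word
    if current:
        pieces.append(current)
    return pieces


if __name__ == "__main__":
    print(split_to_cap("alpha beta gamma delta epsilon zeta", 3))
```

Splitting on word boundaries first keeps pieces readable for retrieval; the hard split only exists so that pathological input cannot exceed the cap.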
docker compose pull && docker compose up -d --force-recreate datris ui mcp-server. No data migration needed. The CLI: brew upgrade datris.

Streamlined navigation and a Catalog-centric workflow: fewer tabs, richer Catalog, and the Assistant front-and-center.
- Five top-level tabs instead of ten. Datris now opens to Assistant, MCP, Catalog, Data, Configuration — plus the Help dropdown. Same capabilities, organized around how you actually work.
- Catalog is the home for taps and pipelines. Each catalog card embeds the full tap and pipeline tables with inline rename and a move-to-catalog dropdown on every row. Uncataloged is always shown so day-1 users have a place to start.
- Bulk move at the catalog level. Move every tap and pipeline from one catalog into another in one click. The destination auto-expands so you see the items land.
- Describe to Assistant from any catalog. A button on each catalog card opens the Assistant with a fresh chat and the catalog pre-filled — the Assistant assigns the right catalog to whatever it creates.
- Wizards link back to where you came from. Tap and pipeline edit screens now show the item’s name in the title, a “Back to Catalog” link at the top, and primary action buttons at both top and bottom of the form. Pipeline wizard’s JSON-review step is gone — Save fires from the Destination step.
- Pop out the Agent Monitor. A new icon opens both Connections and Activity Log in a separate browser window — park it on a second monitor and watch tool calls stream while you work in the Assistant.
- Catalog state persists across navigation. Open catalogs and expanded sub-sections survive refreshes and tab switches.
- Fixed: tap rename no longer breaks scripts. Renaming a tap inline used to leave the new tap pointing at a deleted script file.
- Fixed: catalogs containing only pipelines could lose their assignment when the pipeline edit wizard was opened.
- Discovery tab removed. The Assistant covers its workflow.
- Getting Started tab removed. First-run guidance lives in the Assistant’s starter prompts and inline empty states.
docker compose pull && docker compose up -d --force-recreate datris ui. No data migration needed. The CLI: brew upgrade datris.

Scoped API keys with per-agent permissions, plus Assistant resilience and smarter onboarding.
- New API-Keys tab in Configuration. Issue a dedicated key per agent, CLI, or integration with an explicit list of what it’s allowed to do: read pipelines, run taps, upload documents, query data, and so on. Each key is its own identity in the request log and can be rotated or revoked independently. Five starting templates: read-only, rag-builder, reporting, ops, and full-access. The tab only appears when you’ve set USE_API_KEYS=true in .env; issuing keys is pointless until the validation layer is on.
- Keys actually constrain. When an external agent (Claude Desktop, Cursor, the CLI) connects with a scoped key and tries something outside its bundle, the platform refuses the call and tells the agent why in plain JSON, so the agent doesn’t keep retrying alternate paths. Agents stay productive within their lane; you don’t have to trust them not to wander.
- UI no longer asks for a key when user authentication is on. With login enabled, your session cookie is the only thing the browser needs — paste-the-key flow goes away. The Assistant runs under your identity, audit logs show you as the actor, and your role determines what it can do (admin = full access, editor = data writes, viewer = read-only).
- Assistant rides through Anthropic overload. When Claude Opus is overloaded or rate-limited, the Assistant retries with backoff and, if needed, transparently switches to Sonnet for the rest of the turn, with a small inline note so you know it happened. Conversations that previously errored out now keep moving.
- Assistant checks the platform before suggesting external sources. On any data-related ask (“I’m looking for X”), the agent now lists your existing pipelines and taps first, then either points you to what already exists or asks before adding more. Avoids the “let me enumerate seven public APIs” detour.
- Assistant auto-runs newly created taps that have no schedule. When you build a one-shot tap, the Assistant kicks off the first run so you see real data instead of an empty pipeline. For scheduled taps it asks first, since the cron will fire on its own.
- Health and version endpoints are public. Container orchestrators, status pages, and the UI’s connection check no longer trip 500s when API-key auth is required.
- Configuration → Taps sub-tab removed. Prompt Fragments are unchanged and still apply to tap creation, brainstorm, auto-fix, and Discovery — they’re now managed via the API instead of a dedicated UI page.
- Tap wizard pipeline link is clickable. The “Linked to: <pipeline>” pill in step 4 now navigates straight to the pipeline editor.
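
The scoped-key behaviour above (refuse the call and explain why in plain JSON) can be sketched as follows. The template names come from these notes, but the permission strings inside each bundle, the response fields, and the function name are invented for illustration; the real bundles and response format may differ.

```python
import json

# Hypothetical permission bundles. Template names are from the release
# notes; the permission strings are made up for this sketch.
TEMPLATES = {
    "read-only": {"pipelines:read", "data:query"},
    "ops": {"pipelines:read", "taps:run"},
    "full-access": {"*"},
}


def authorize(template: str, action: str) -> str:
    """Return a JSON decision for one action under one key template."""
    allowed = TEMPLATES.get(template, set())
    if "*" in allowed or action in allowed:
        return json.dumps({"allowed": True})
    return json.dumps({
        "allowed": False,
        "error": "forbidden",
        "reason": f"key template '{template}' does not include '{action}'",
    })


if __name__ == "__main__":
    print(authorize("read-only", "taps:run"))
    print(authorize("ops", "taps:run"))
```

Returning a structured reason rather than a bare failure is what lets an agent stop retrying alternate paths.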
docker compose pull && docker compose up -d --force-recreate datris ui. Existing API keys keep working; they’re treated as full-access until you replace them with scoped keys from the new tab. The CLI: brew upgrade datris.

Polish for the Assistant, AI Configuration, and MCP tabs.
- Assistant: keep typing without clicking. After you send a question, the cursor stays in the composer — so as soon as the agent finishes (or even while it’s still working), you can type your next prompt without reaching for the mouse.
- AI Configuration: provider switches no longer wipe your overrides. Switching the primary AI, codegen, or embedding provider used to clear the saved provider/model/endpoint on the next page load, so you had to re-pick them every time. Your selections now persist correctly through a provider switch (you still re-enter the API key when changing providers — that’s intentional).
- Connect Your Agent: paste-and-go for local. The MCP tab’s generated config no longer asks for an API key for the default local setup — paste the snippet into your agent and it just works. A key is still required when you point the snippet at a hosted, trial, or dedicated instance.
- Trial signup and dedicated upgrade move fully to the website. The corresponding agent tools have been removed — users finish those flows in a browser anyway. The Getting Started tab has also been refreshed to point at the Assistant for the common onboarding path.
docker compose pull && docker compose up -d. No data migration needed. The CLI: brew upgrade datris.

A new in-product Assistant: chat your way from “I need data” to a working pipeline.
- The Assistant tab. A new top-level tab opens an in-product agent that finds an external data source, builds the tap, creates the pipeline, runs it, and shows you the result — all in one chat. You watch the model’s reasoning, every tool it calls, and the live status, then click straight into the tap or pipeline it created. See Assistant.
- Real-time visibility while the agent works. Streaming reasoning, inline tool cards with friendly labels (“Searching the web for …”, “Creating tap …”), live success/error status, and a Stop button that aborts the loop at the next checkpoint. Conversations survive navigating to other tabs and back.
- Credentials never enter the chat. When the agent needs an API key or other credentials, it opens an inline credentials form right in the chat. Values go straight to the vault — they don’t appear in the conversation log, your screenshots, or the model’s context. The form also lets you reuse an existing tap secret instead of creating a new one.
- Pipelines can be edited in place. Calling pipeline-create again with the same name now upserts the configuration without dropping the destination data. Two new knobs in the same call: a natural-key list for dedupe/upsert on every run, and a flag to wipe the destination before each run for full-snapshot workflows. Both work on PostgreSQL and MongoDB destinations.
- External agents get the same new tools. Claude Desktop, Cursor, and any other MCP client connected to your Datris server now see two new tools for discovering existing tap secrets (names and field shapes, never values) plus the new dedupe and reset knobs on pipeline create.
- Pipeline delete now describes what it actually does. The tool description used to claim it kept your destination data; it never has. It now correctly says it removes both the configuration and the destination data, with an explicit opt-in flag for the “keep the schema, clear the data” reset case.
- MCP transport upgraded — both protocols at once. The bundled MCP server now serves the streamable-HTTP transport alongside the existing SSE transport on the same port, so the in-product Assistant and external agents connect to one running process. No configuration change needed.
- Discovery tab is hidden in this release. The Assistant supersedes the Discovery tab for the common path.
docker compose pull && docker compose up -d. No data migration needed. The CLI: brew upgrade datris. The Assistant uses your existing codegen AI configuration: on Anthropic tenants you’ll see full chain-of-thought reasoning streamed inline; on OpenAI you’ll see reasoning summaries instead.

Web search for AI tap workflows, plus simpler MCP authentication.
- AI tap workflows can consult the live web. When enabled in Configuration → AI Providers, tap brainstorm, dataset discovery, tap diagnosis, and tap auto-fix look up current API documentation, free-tier limits, current package names, and recent deprecation notices before recommending sources or generating fixes. Pick your web-search provider independently of AI Primary — the platform uses each provider’s native search tool and routes accordingly. First-pass tap script generation stays fast and uses the model’s training data only. See AI Configuration for setup.
- AI Configuration changes survive Docker restarts. Saving from the Configuration UI now mirrors the relevant keys back to your .env file so changes aren’t lost when the local Vault container restarts. Provider switches also clear any stale credential preserved from the prior provider, so a wrong-provider key can’t silently 401 the next call.
- Cleaner MCP authentication. The bundled MCP server is now a transparent forwarder: each connecting agent provides its own API key per session and the MCP server passes it through to the Datris REST API on every tool call. The Configuration → Connect Your Agent panel generates the new configuration snippet automatically; paste your key from the Configuration UI into your agent’s MCP config. Existing trial and managed-service users see no change in behavior.
- Tap brainstorm asks about sources first. When you describe data without naming where to fetch it from, the AI now lists 3-5 candidate sources (with free vs paid and key-required info) before drilling into parameters — instead of asking for filtering details up front.
- Tap script generation is more resilient. The platform now validates that a generated script actually defines a fetch() function before storing it, and the JSON extractor handles model responses that contain narrative braces without falling back to the raw-text path.
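
Both checks in the last item can be illustrated with standard-library Python. This is a sketch of the general techniques, not the platform's implementation: it assumes generated tap scripts are Python, and the helper names are hypothetical. The first function parses the script and looks for a top-level `fetch` definition; the second scans for the first brace that starts a valid JSON object, so stray braces in prose are skipped.

```python
import ast
import json


def defines_fetch(source: str) -> bool:
    """True if the script parses and defines a top-level fetch function."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == "fetch"
        for node in tree.body
    )


def extract_first_json_object(text: str):
    """Return the first valid JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char != "{":
            continue
        try:
            obj, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue  # a narrative brace, keep scanning
        if isinstance(obj, dict):
            return obj
    return None


if __name__ == "__main__":
    reply = 'Use braces {like this} in prose. {"script": "def fetch():\\n    return []"}'
    payload = extract_first_json_object(reply)
    print(payload is not None and defines_fetch(payload["script"]))
```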
docker compose pull && docker compose up -d. No data migration needed. The CLI: brew upgrade datris. If you run the MCP server standalone outside Docker, the connection-target environment variable was renamed for consistency; see the MCP server docs for the new variable name.

Reliable job-status polling, re-ingest that doesn’t overwrite your config, and first-class catalog tooling.
- Polling an upload’s job status now gives you a clear answer. Previously, when a file finished processing successfully, the response was a raw stream of progress events with no terminal status, so agents and scripts couldn’t tell “still running” from “done.” Job status now returns a rollup with a single allDone flag and an aggregate outcome (success, warning, error), plus per-job error detail when something fails. Poll the rollup; act on the outcome.
- Re-ingesting a file preserves your pipeline’s config. datris ingest against an existing pipeline used to silently rewrite the config from CLI flags only, wiping out the catalog, custom validation rules, and any other fields you’d set through the UI or via an agent. Re-ingest now uploads into the existing pipeline as-is. To start over with a different config, delete the pipeline first.
- Catalogs without the read-modify-write dance. New --catalog flag on datris ingest for new pipelines, new optional catalog argument on the create_pipeline MCP tool, and a new set_catalog MCP tool that retags an existing pipeline or tap in one call. Empty catalog clears the label back to Uncataloged. See Data Catalog.
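
"Poll the rollup; act on the outcome" might look like the Python sketch below. The `allDone` flag and the three outcome values come from these notes; the `outcome` field name, the timing defaults, and the function name are assumptions. The status call is injected as a plain callable so that no endpoint or client API is implied.

```python
import time
from typing import Callable


def wait_for_outcome(
    get_rollup: Callable[[], dict],
    interval: float = 2.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the rollup reports allDone, then return its outcome.

    `get_rollup` is any function returning the job-status rollup as a dict.
    The "outcome" key is an assumed field name for this sketch.
    """
    deadline = time.monotonic() + timeout
    delay = interval
    while True:
        rollup = get_rollup()
        if rollup.get("allDone"):
            return rollup.get("outcome", "error")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_interval)  # exponential backoff with a ceiling


if __name__ == "__main__":
    fake = iter([{"allDone": False}, {"allDone": True, "outcome": "success"}])
    print(wait_for_outcome(lambda: next(fake), sleep=lambda _seconds: None))
```

Treating a missing outcome as an error is a deliberately conservative choice for the sketch; a caller could equally raise instead.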
docker compose pull && docker compose up -d. No data migration needed. The CLI: brew upgrade datris.

Choose your embedding provider independently of your chat provider.
- Mix-and-match AI providers. The embedding slot is now configured separately from the chat and code-generation slots, so you can keep Claude for chat and code generation while pointing embeddings at OpenAI (or vice-versa). Useful when the bundled embedder is too heavy for your host, or when you want a different model family for vector quality vs chat quality. See AI Configuration for the full list of options.
- Existing installs keep their current behavior. If you don’t set an embedding override, the embedding slot continues to follow your chat provider exactly as before — Claude installs keep using the bundled embedder, OpenAI installs keep using OpenAI embeddings. The override is purely opt-in.
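
The opt-in fallback described above reduces to a small lookup. In this Python sketch the provider identifiers (`anthropic`, `openai`, `bundled`) and the function name are hypothetical labels chosen for illustration; only the behaviour (an override wins, otherwise embeddings follow the chat provider, with Claude installs using the bundled embedder) is taken from these notes.

```python
from typing import Optional

# What the embedding slot follows when no override is set.
# Identifier strings are assumptions for this sketch.
FOLLOWS_CHAT = {"anthropic": "bundled", "openai": "openai"}


def resolve_embedding_provider(chat_provider: str, override: Optional[str] = None) -> str:
    """Pick the embedding provider: explicit override first, else follow chat."""
    if override:
        return override
    return FOLLOWS_CHAT.get(chat_provider, chat_provider)


if __name__ == "__main__":
    print(resolve_embedding_provider("anthropic"))            # no override set
    print(resolve_embedding_provider("anthropic", "openai"))  # mix-and-match
```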
docker compose pull && docker compose up -d --remove-orphans. No data migration needed. If you switch embedding providers on an existing deployment, vector destinations built on the previous embedder will fail fast with a dimension-mismatch message on the next run; drop the affected destination tables or collections and re-ingest.

User authentication, roles, and an admin-only Configuration tab.
- Optional username/password login. Datris can now require a login before any tab is reachable. Three roles ship out of the box — admin (full access), editor (read + edit pipelines, taps, secrets), and viewer (read-only). Off by default, so existing single-tenant installs are unaffected. See the new User Authentication doc to enable it.
- Configuration is admin-only. When auth is on, only admins see the Configuration tab (Secrets, AI Providers, Taps, Users, Environment). Editors and viewers continue to use everything else.
- New Users sub-tab. Admins can add, remove, and reassign roles. A built-in 16-character password generator with a reveal toggle makes handing out credentials painless. The last admin can’t be deleted.
- Self-service password change. Users can change their own password from the top-right user menu.
- Reveal toggle on the login screen. Easier to see what you’re typing on a new device.
- Clear the Agents activity log. The trash icon now wipes the server-side activity buffer (with an inline confirm) so the cleared state survives a refresh.
docker compose pull && docker compose up -d. No data migration needed; auth defaults to off.

AI-agent RAG ingestion no longer burns through the conversation context.
- Creating a vector-store pipeline no longer requires sample content. Asking an agent to ingest a PDF into pgvector previously forced it to base64-encode the entire document just to register the pipeline — wasting tens of thousands of tokens before any work began. Vector pipelines now register from a name plus destination alone, freeing budget for the actual upload.
- Agents are guided to send each document in a single upload. The MCP server now makes explicit that vector destinations chunk server-side, preventing agents from needlessly splitting documents into many small uploads.
- Clearer Claude setup docs. The “Configuring Claude” guide shows the recommended SSE / mcp-remote setup first, and steers large-file ingestion to the CLI rather than dragging files directly into the chat — which can overflow the conversation context on sizable PDFs.
- README accuracy pass. Corrected tool counts, AI-provider model defaults, and license badge.
docker compose pull && docker compose up -d. No data migration needed.

UI cleanup, friendlier Help menu, and a more reliable bundled embedding service.
- Bundled embedding handles large ingest batches without errors. The bundled embedding service no longer rejects large batches submitted by the platform, which previously surfaced as ingestion failures on long documents.
- Secrets is now a tab inside Configuration. Instead of a separate top-level tab, Secrets lives under Configuration alongside AI Providers, Taps, and Environment. Existing /secrets links continue to work; they redirect to the new location.
- Help menu in the top bar. The Docs link is replaced with a Help dropdown that exposes both the docs and a direct link to file an issue on GitHub.
- Easier setup with Claude. A new “Configuring Claude” page in the docs walks through Claude Desktop and Claude Code setup end-to-end, including a first-prompts walkthrough for an empty install.
- Structured issue reporting. GitHub issues now use forms that capture version, component, deployment mode, and reproduction steps, making bug reports easier to triage and faster to fix.
docker compose pull && docker compose up -d. No data migration needed.

Smaller, faster default install. Lighter download, opt-in Kafka, vector ingestion fixes.
- ~58% smaller docker compose pull. The bundled platform now downloads roughly 11 GB less out of the box. Fresh installs come up dramatically faster.
- Bundled embeddings are faster on the same bge-m3 model. No configuration changes needed; existing vector collections built with the previous bundled embedder continue to work without re-embedding.
- Kafka is now opt-in. Most local installs don’t need it, so Kafka, Zookeeper, and the Kafka UI ship commented out in docker-compose.yml. Uncomment the bundled blocks (and the related volumes at the top of the file) to enable them. Pipelines that point at external Kafka brokers are unaffected.
- Vector ingestion no longer fails with a “duplicate key” error when many documents land at once. Concurrent document loaders previously raced on creating the pgvector extension; the race is now serialized.
- Vector ingestion no longer fails on embedding providers that limit batch size. The chunk batch size is now configurable per embedding secret (batchSize), with a cross-provider-safe default. OpenAI users who want to maximize throughput can set this higher.
- Configuration tab clarifies optional providers. The bundled embedding option is labeled bge-m3 (bundled). The AI Provider, CodeGen Provider, and Embedding Provider dropdowns each indicate that local Ollama is opt-in.
- Service Health no longer shows “Down” for optional services that were never enabled. Kafka and the optional vector databases (Qdrant, Weaviate, Milvus, Chroma) now correctly report “Not Configured” until you turn them on. Existing installs may still show “Down” until their stale Vault secrets are removed.
11434, so existing installs must pass --remove-orphans to release the port: docker compose pull && docker compose up -d --remove-orphans. Without it the upgrade fails with Bind for 0.0.0.0:11434 failed: port is already allocated. Your data and the cached Ollama model are preserved.
Reliable run-completion signals, smarter agent calls, cleaner run history.
- Agents get a single “are we done?” signal when watching a pipeline load. Polling a publisher token returns a rollup with a clear allDone boolean and a per-job outcome (success / warning / error / processing / timed out), so agents no longer have to interpret the raw event stream to figure out whether a run is complete.
- Pipeline status by publisher token works reliably for completed runs. A storage path that occasionally hid completed runs from the publisher-token query is fixed; the query is now backed by an indexed top-level field, with a fallback for older rows so existing data resolves without a migration.
- run_tap no longer ships the records array back to the agent. A push run returns recordCount, publisherToken, and the persisted/persistedReason flags, which is enough to verify ingestion via get_pipeline_status without bloating the agent’s context. Use test_tap to preview a script’s output (capped at 20 sample rows with a recordsTruncated flag).
- Duplicate run_tap calls are suppressed. The agent skips a run_tap for a tap that’s already in flight in the same session, and the platform debounces push runs to one per tap per 5 seconds. Prevents accidental duplicate ingestion from parallel tool calls, double-clicks, and transport retries (persistedReason: already_running or debounced).
- Every tap run now produces visible logs. The script wrapper emits start / fetch / record-count lifecycle lines on every run, so run history shows useful output even when the user’s script never calls print().
- Secret values are masked in stored tap logs and exception messages. A tap script that incidentally printed an API key or Vault-loaded credential would previously have surfaced the raw value in run history; those values are now redacted in the persisted log and the error string.
- Deleting a tap now cleans up its run history. Previously, run-history rows accumulated indefinitely and could resurface under a recreated tap with the same name.
- Agent activity log shows full requests and responses. The expanded view in the agent monitor no longer truncates request arguments or response bodies.
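A minimal sketch of how an agent might consume the rollup described in this release. Only the allDone flag and the outcome values come from the notes above; the jobs and outcome key names, the fetch_status callable, and the polling parameters are assumptions made for illustration, not the documented response shape.

```python
import time
from typing import Any, Callable, Dict


def wait_for_run(fetch_status: Callable[[str], Dict[str, Any]],
                 publisher_token: str,
                 interval_s: float = 2.0,
                 max_polls: int = 30) -> Dict[str, int]:
    """Poll until the rollup reports allDone, then count jobs per outcome."""
    for _ in range(max_polls):
        # fetch_status is a hypothetical wrapper around get_pipeline_status.
        rollup = fetch_status(publisher_token)
        if rollup.get("allDone"):
            counts: Dict[str, int] = {}
            # "jobs" and "outcome" are assumed key names.
            for job in rollup.get("jobs", []):
                outcome = job.get("outcome", "unknown")
                counts[outcome] = counts.get(outcome, 0) + 1
            return counts
        time.sleep(interval_s)
    raise TimeoutError(
        f"run {publisher_token} did not finish after {max_polls} polls")
```

Passing the status call in as a function keeps the loop independent of the transport, so the same logic can be exercised with canned responses.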
MCP tab and documentation sync.
- MCP tab now lists every agent tool. The in-app MCP reference was missing get_pipeline_status, create_tap_secret, delete_tap_secret, get_tap_ledger, and query_natural. All five are now in the catalog and the Try-It playground, and create_tap exposes the tap_type parameter.
- Recommended Agent Workflow rewritten to match the platform’s actual flow. The in-app workflow and the MCP docs had drifted: they started with profile_data (which the platform explicitly says not to use for pipeline generation), skipped the persisted/persistedReason check after run_tap, and didn’t show the publisherToken poll that confirms records actually landed. All three are now canonical: check-before-create, verify-via-publisher-token, and tap credentials managed via create_tap_secret.
- Documentation: updated agent workflow examples. RAG over external documents is now shown as a document tap (tap_type="document" + get_tap_ledger), onboarding an external source uses create_tap_secret for credentials, and quality monitoring of scheduled taps shows the get_tap_logs → get_pipeline_status(publisher_token=...) pivot.
Tap wizard reliability, iteration history, and cleaner vector-search errors.
- Tap wizards learn from their own retries. When the AI fixes, optimizes, or reviews a tap script, it now carries forward up to the last three attempts — what was tried, what went wrong, and what changed — into the next call. The wizard stops cycling through the same failed approaches.
- Saved tap scripts always match the tap. Saving a tap now pushes the in-memory script to object storage before writing the tap config, and the create/update call verifies the stored script is actually there. No more “missing script” banners from an interrupted save, and auto-revert no longer strands a tap with a deleted script.
- Run Tap stays on the page when nothing was ingested. If a manual run finishes without persisting records, the wizard keeps you on the run step and shows an inline reason (test mode, no records, run error) instead of navigating away and hiding the diagnostic.
- Tap logs now carry the publisher token. Every tap run that submitted records now writes its publisher token to the log. Agents reading get_tap_logs can pivot directly to get_pipeline_status to confirm a scheduled run actually landed in the destination, not just that the script ran.
- Vector search fails cleanly when the embedding dimension doesn’t match the collection. If you change embedding providers on a pipeline whose vector collection already has vectors of a different dimension, search queries now return a clear 400 with a user-actionable message instead of leaking a JVM stack trace.
- Sturdier local-dev startup. Kafka and Zookeeper now use named volumes (no more corruption races on rebuild), Kafka waits for Zookeeper’s request processor to actually be ready (not just its listener bound), and Vault init picks up an explicit AI-provider override so a stray shell env var can’t silently flip providers.
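The "last three attempts" behaviour can be pictured as a bounded history that is rendered into the next prompt. The sketch below is illustrative only: the Attempt and AttemptHistory names and the rendered text format are hypothetical, and the notes do not say how the wizard actually stores or formats its history.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Attempt:
    tried: str    # what was tried
    failure: str  # what went wrong
    change: str   # what changed in response


class AttemptHistory:
    """Keeps only the most recent attempts (three by default)."""

    def __init__(self, max_attempts: int = 3) -> None:
        # A deque with maxlen drops the oldest entry automatically.
        self._attempts = deque(maxlen=max_attempts)

    def record(self, attempt: Attempt) -> None:
        self._attempts.append(attempt)

    def as_prompt_context(self) -> str:
        """Render the retained attempts as lines for the next AI call."""
        lines = []
        for i, a in enumerate(self._attempts, start=1):
            lines.append(
                f"Attempt {i}: tried {a.tried}; "
                f"failed with {a.failure}; changed {a.change}")
        return "\n".join(lines)
```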
Agent-native tap observability, scheduler fix, and agent-owned tap secrets.
- Agents can watch a tap load to completion. Running a tap now reports back whether the data was actually persisted, and names the reason when it wasn’t (test mode, no target pipeline, no records, run error). Every persisted run returns a single publisher token covering the whole run, even for document taps that spawn many ingestion jobs. A new get_pipeline_status MCP tool lets an agent poll that one token until the entire load reaches its final state, so it can report “done” with real numbers instead of guessing from a response body.
- Scheduled taps no longer need a manual kickoff. Taps saved with a cron schedule now fire on their next scheduled time automatically. Previously, a newly saved scheduled tap would wait indefinitely until you ran it once by hand.
- Self-diagnosing tap scripts. If a tap’s generated script goes missing from object storage, the Edit Tap page now shows an amber banner explaining the state and pointing you to Regenerate, instead of a cryptic mid-run “key does not exist” error. Test Tap surfaces the same state with actionable wording.
- Agents can manage their own tap secrets. Via MCP, agents can now create and delete the secrets their taps need (API keys, tokens). Scope is strictly tap-owned — agents cannot create, overwrite, or delete human-owned Platform secrets (DB creds, AI keys, vector-store creds).
- Secrets page split into Platform and Taps sub-tabs. Platform lists the built-in Datris secret slots; Taps lists agent-authored tap secrets. Creating a secret from the Taps sub-tab auto-tags it so it stays agent-editable, and Tap secrets are fully manageable on trial tenants.
- Honest Test Tap banner. The run-result banner on Test Tap now reflects what actually happened on the server — “sent to pipeline” only when the run was truly persisted, otherwise “not persisted” with the reason — rather than whatever the pre-request checkbox said.
- BYO-code taps can declare pip dependencies. If you paste your own fetch script into Create Tap, you can now list the Python packages it needs. Previously only AI-generated taps could declare dependencies.
- Example agent refactored onto taps. The bundled market-macro-agent example now drives ingestion through taps instead of ad-hoc fetch scripts, demonstrating the full agent-native tap flow (provisioning, secrets, publisher-token watching).
Tap prompt fragments, post-run script review, BYO code, and Configuration page reorg.
- Tap prompt fragments (new). A new Configuration → Taps sub-tab lets you save reusable, per-tenant context snippets — things like API conventions, required headers, rate limits, and preferred libraries for a given source. When the key or any of its aliases appears in a Create Tap description, brainstorm, auto-fix, optimize, or Discovery chat, the fragment’s content is automatically added to the system prompt. Includes an AI Suggest button, a Load Examples catalog (AWS, Polygon, Stripe, SEC EDGAR), JSON import/export, and an “Extra context applied” chip row in the tap wizard showing which fragments hit.
- Post-run script review. After a tap’s first successful test, the AI now scans the captured stderr/stdout for signals that the script should change — rate-limit or burst warnings, deprecation hints, pagination cues, schema drift, auth warnings — and regenerates the script if needed. On a rewrite the wizard auto-retests; the performance optimizer runs only when the logs are clean. The optimizer’s prompt was also tightened so rate-limit markers push it toward throttling instead of more concurrency.
- I Have My Own Code (new). A third Tap Type on the Create Tap wizard lets you paste a fetch() script directly instead of having AI generate one. Step 1 switches to a code textarea with a Use My Code button; after upload the button flips to Re-upload My Code and Step 2 gates on the text matching what’s on disk, so edits force a fresh upload before running the test.
- Configuration page reorganized into three sub-tabs — Environment, AI Providers, and Taps — with a prominent “Highly recommended: Anthropic with the latest coding model” tip on the CodeGen Provider section.
- Tap Name collision warning. If the name you type in the Create Tap wizard matches an existing tap, an amber banner appears under the field warning that continuing will overwrite the existing tap’s configuration and script.
- Auto-fix retries bumped to 3. When a tap script fails its first test, the AI now gets up to three repair attempts (was two) before giving up.
- Cron Custom preset no longer blocked by AI formatting. AI-generated cron expressions wrapped in code fences, brackets, or quotes are now cleaned automatically, so the Next button is enabled on valid output.
- OpenAI Codex models now work for code generation. Previously, selecting a codex-family model for CodeGen (e.g., the recommended GPT-5.3-Codex) caused tap-script generation and AI data quality / transformation to fail immediately with a 404. Datris now routes codex models to the right OpenAI endpoint automatically.
- Vector-store dimension changes now fail fast with a clear message. If you switch embedding providers (for example, Ollama bge-m3 → OpenAI text-embedding-3-small) on a pipeline whose destination table or collection already has vectors of the old dimension, the job stops up front and tells you exactly what to do instead of blowing up mid-ingest with a cryptic database error. Applies to pgvector, Qdrant, Weaviate, Milvus, and Chroma destinations.
- Configuration save is honest about missing API keys. Changing the AI Provider to Anthropic or OpenAI without entering that provider’s API key no longer silently skips the save while reporting success — the Configuration page now flags the missing key and tells you which one to add.
- Create Tap brainstorm wraps up sooner. For document taps, the AI assistant no longer drills for optional date filters once you’ve supplied the source, auth, and a broad scope — the tap ledger already dedupes by content, so those extra questions were just noise.
- OpenAI GPT-5 and reasoning-model support. AI-backed features (Discovery chat, Create Tap brainstorm, AI data quality, AI transformation, script generation) now work with OpenAI’s newer model families. Previously, selecting one of these models caused chat panels and wizards to hang without a response.
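To make the prompt-fragment matching in this release concrete, here is a small sketch. It assumes case-insensitive substring matching on the key and aliases, and a simple concatenation onto the base prompt; the notes state neither. The Fragment class, both function names, and the sample fragment contents are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fragment:
    key: str
    content: str
    aliases: List[str] = field(default_factory=list)


def matching_fragments(text: str, fragments: List[Fragment]) -> List[Fragment]:
    """Return fragments whose key or any alias appears in the text."""
    haystack = text.lower()
    hits = []
    for frag in fragments:
        names = [frag.key, *frag.aliases]
        # Substring matching is an assumption; the platform may tokenize.
        if any(name.lower() in haystack for name in names if name):
            hits.append(frag)
    return hits


def build_system_prompt(base: str, text: str,
                        fragments: List[Fragment]) -> str:
    """Append the content of every matching fragment to the base prompt."""
    extra = [f.content for f in matching_fragments(text, fragments)]
    return "\n\n".join([base, *extra])
```

The list returned by matching_fragments is also what a UI could use to show which fragments were applied.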
Document Taps — a new tap type for feeding vector stores with unstructured files.
- Document Taps (new). A purpose-built tap for ingesting PDFs, Word docs, HTML, and other unstructured files into a vector-store pipeline. Describe the source in plain English (“ingest all PDFs from our SharePoint legal folder”, “pull every DOCX from legal-contracts/2026/ in S3”) and the generated tap discovers the files and hands their raw bytes to the pipeline. Text extraction, chunking, embedding, and loading are all handled downstream, so you don’t configure any of that on the tap.
- Tap type toggle in Create Tap. Choose Document Ingestion or Structured/Semi-Structured on the first step of the wizard; the prompts, placeholders, and example instructions adapt to the choice.
- Ingestion ledger. Every discovered file is tracked by URI and content hash. Re-running the tap skips files that are already up to date — no re-embedding, no duplicates. Changed files flow through normally.
- Pre-flight validation. Document taps linked to a pipeline are checked at save time to confirm the pipeline is shaped for document ingestion (unstructured source, vector-store destination). Misconfigurations surface as an actionable error instead of a cryptic mid-run failure.
- Safe defaults for local paths. Document taps refuse to silently walk arbitrary host directories if a requested path isn’t mounted into the container — they fail loudly instead of ingesting unintended files.
- Faster embeddings. The bundled Ollama sidecar now handles concurrent embedding requests in parallel and keeps the embedding model warm between pipeline runs, eliminating cold-start delays when a pipeline resumes after an idle period.
- More Python libraries pre-installed for taps. AWS S3, Google Cloud Storage, Azure Blob, Excel, YAML, and date/timezone helpers are now baked into the image. Taps that fetch from these sources no longer need a per-run pip install.
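The ingestion ledger's skip logic can be sketched as a lookup keyed by URI that compares content hashes. This is an illustration under assumptions: SHA-256 as the hash, an in-memory dict as the ledger, and the function names are all hypothetical, since the notes only say files are tracked by URI and content hash.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def content_hash(data: bytes) -> str:
    # SHA-256 is an assumed choice of hash.
    return hashlib.sha256(data).hexdigest()


def select_files_to_ingest(ledger: Dict[str, str],
                           discovered: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Return URIs that are new or changed, updating the ledger in place."""
    to_ingest = []
    for uri, data in discovered:
        digest = content_hash(data)
        if ledger.get(uri) == digest:
            continue  # unchanged since the last run: skip re-embedding
        ledger[uri] = digest
        to_ingest.append(uri)
    return to_ingest
```

On a second run with the same inputs the function returns an empty list, which is the "no re-embedding, no duplicates" behaviour described above.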
Live MCP agent monitor and pipeline status self-healing.
- Agents tab — new live view of connected MCP agents. See every tool call as it happens, with agent name, arguments, record count, response size, status, and latency. Click any row to expand the full request and response.
- Pipeline status now self-heals when a job completes but the summary gets stuck showing “processing” — completed, warned, and errored jobs resolve to their correct final state.
- Example agent (market-macro-agent) automatically reconnects with backoff if the MCP connection drops, and degrades gracefully while offline instead of crashing.
- New AI models from OpenAI and Anthropic now appear in the Configuration dropdowns automatically, without a Datris upgrade.
- Claude Opus 4.7 is the recommended Anthropic model for CodeGen (tap script generation, AI data quality rules, AI transformations, JSON Schema / XSD generation, and natural-language → SQL).
- New trial signups are seeded with the latest recommended models by default.
Two-pass AI tap optimization and Data Catalog UX.
- AI optimize pass. After a tap script passes its initial test, the platform sends it back to the LLM for a performance rewrite and re-tests. Auto-reverts on regression (>=20% slower) or failure. New POST /api/v1/tap/optimize endpoint; /tap/test now returns durationMs.
- Discovery wizard. Auto-optimize per tap with before/after timing banner, per-row stop button, on-demand AI fix panel for failed items, chat auto-scroll, and re-shown Discover Datasets button after new messages.
- Create Tap. Configurable test sample size (defaults to 20 records).
- Data Catalog. Kebab menu on Uncataloged items with Edit, Delete, and Move to Catalog (with name-clash detection).
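The auto-revert rule for the optimize pass can be written as a small decision function. This sketch assumes the regression is measured as the optimized durationMs relative to the baseline durationMs; the function name and signature are hypothetical.

```python
from typing import Optional

REGRESSION_PERCENT = 20  # ">=20% slower" triggers a revert


def keep_optimized(baseline_ms: int,
                   optimized_ms: Optional[int],
                   optimized_passed: bool) -> bool:
    """Return True to keep the rewritten script, False to revert."""
    if not optimized_passed or optimized_ms is None:
        return False  # the rewrite failed its test: revert
    if baseline_ms <= 0:
        return True  # no meaningful baseline to compare against
    # Integer arithmetic avoids floating-point edge cases at exactly 20%.
    slowdown_x100 = (optimized_ms - baseline_ms) * 100
    return slowdown_x100 < REGRESSION_PERCENT * baseline_ms
```

For example, a 1000 ms baseline keeps a rewrite that runs in 1199 ms and reverts one that runs in 1200 ms.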
Database lockdown, tap script hardening, Create Tap UX.
- Server-controlled database name. UI no longer edits the Postgres or MongoDB database name: /api/v1/version returns postgresDatabase and mongodbDatabase and the UI submits them unchanged.
- MongoDB internal vs user split. mongodb.database (user-facing, default datris) holds pipeline data; mongodb.internalDatabase (default oss) holds platform state. Existing installs keep oss for platform state; new user pipelines land in datris.
- Unlimited reads for tap scripts. limit: -1 on /api/v1/query/mongodb and /api/v1/query/postgres returns every matching row. Preview defaults (20 Mongo / 100 Postgres) are unchanged for UI/MCP callers.
- Tap test sampling. New “Limit test sample to 20 records” checkbox in Create Tap step 2 injects DATRIS_TAP_TEST_LIMIT=20 into the script. Cron and manual runs read everything.
- Codegen + diagnosis hardening. Generated scripts treat platform response shapes as contractual (no shape-probing, no candidate-key iteration). Diagnosis quotes the actual traceback and respects in-script guards.
- Create Tap UX. Ask button, auto-apply diagnosis (capped at 2 attempts), Stop Test, copy-to-clipboard, scrollable-JSON test results, destination collision check on Generate Pipeline, full destination shown on step 5.
- Data Catalog. One-click delete of the Uncataloged group; per-item trash icon on taps and pipelines inside every catalog.
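A hedged sketch of how a tap script might combine the unlimited-read option with the test sample cap. It assumes DATRIS_TAP_TEST_LIMIT reaches the script as an environment variable, which the notes do not confirm, and both function names are hypothetical.

```python
import os
from typing import Any, Dict, List, Mapping


def query_limit(env: Mapping[str, str] = os.environ) -> int:
    """Row limit to send with a query: the test cap if set, else -1."""
    raw = env.get("DATRIS_TAP_TEST_LIMIT", "")
    try:
        value = int(raw)
    except ValueError:
        return -1  # unset or malformed: read every matching row
    return value if value > 0 else -1


def cap_records(records: List[Dict[str, Any]],
                env: Mapping[str, str] = os.environ) -> List[Dict[str, Any]]:
    """Trim fetched records to the test cap when one is present."""
    limit = query_limit(env)
    return records if limit < 0 else records[:limit]
```

With the variable unset, as on cron and manual runs, both helpers leave the data uncapped.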
Input sanitization hardening and minor UI polish.
- Pipeline creation sanitizes destination identifiers before writing the config: Postgres dbName/schema/table, MongoDB dbName/table, Kafka topic, ActiveMQ queueName, vector collectionName/tableName/schemaName, and the DQ schema name.
- Four duplicated sanitizeName helpers consolidated into ui/src/app/shared/sanitize.ts (sanitizeLabel and sanitizeIdentifier).
Discovery wizard, Data Catalog, and trial BYO AI keys.
- Discovery wizard. Six-step AI-guided onboarding that turns “yfinance daily prices for the S&P 500” into running taps and pipelines. New endpoints POST /api/v1/discover and POST /api/v1/discover/build, plus discover_source MCP tool.
- Data Catalog. Group related taps and pipelines into named catalogs. New Data Catalog tab with expandable contents and an Uncataloged group.
- Per-pipeline DQ + transformation editor. Inline editor inside the pipeline view.
- Trial BYO keys. Trial tenants supply their own Anthropic or OpenAI key at signup; keys land in the tenant’s Vault path and override the shared Datris-managed key on the next AI call.
Dedicated instance support and hosted platform improvements.
- Hosted-aware Configuration UI. Hides Ollama for AI Primary/CodeGen on hosted, locks embedding to bundled Ollama bge-m3 when the provider is Anthropic, and hides the advanced toggle.
- Improved multi-user session handling on shared instances.
Full tap MCP tool suite and user-supplied tap scripts.
- Four new MCP tools: get_tap, test_tap, update_tap, get_tap_logs.
- create_tap accepts an optional script parameter for user-supplied Python fetch(), plus secret_name for Vault-injected credentials.
- CLI: datris tap create --script path/to/script.py and new datris tap show.
Ollama for all AI slots and hot-reload on save.
- Configuration UI offers Ollama (local) as a provider for AI Primary, CodeGen, and Embedding. Bundled Ollama sidecar option pre-fills bge-m3 for embedding.
- Saving AI configuration takes effect immediately; no container restart required.
Trial-instance hardening and Configuration tab polish.
- Trial codegen now defaults to claude-haiku-4-5-20251001 instead of Opus 4.6, dramatically lowering per-trial cost. Opus remains the recommended codegen model for self-hosted customers using their own keys.
- Trial banner copy and link styling refinements on the Configuration tab.
Breaking: AI configuration restructured into three independent Vault secrets.
- ai.aiSecretName and ai.provider are replaced by three top-level slots: ai.aiPrimary.secretName, ai.codegen.secretName, ai.embedding.secretName.
- Each Vault secret is self-describing (provider, endpoint, model, apiKey, optionally version). No path derivation from YAML.
- v1.5.x deployments must update application.yaml and re-seed Vault on upgrade.
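When re-seeding Vault for this change, a quick completeness check over the three slots may help. The sketch below is not part of Datris: the slot names and field names come from the notes above, while the function names, the idea of checking key presence rather than non-empty values, and the sample values in the usage are assumptions.

```python
from typing import Dict, List, Optional

SLOTS = ("aiPrimary", "codegen", "embedding")
REQUIRED_FIELDS = ("provider", "endpoint", "model", "apiKey")
# "version" is optional, so it is not checked.


def missing_fields(secret: Dict[str, str]) -> List[str]:
    """List required fields absent from one self-describing secret."""
    return [name for name in REQUIRED_FIELDS if name not in secret]


def check_slots(
        secrets_by_slot: Dict[str, Optional[Dict[str, str]]]) -> Dict[str, List[str]]:
    """Map each incomplete slot to its missing fields; empty means all good."""
    problems: Dict[str, List[str]] = {}
    for slot in SLOTS:
        secret = secrets_by_slot.get(slot)
        missing = list(REQUIRED_FIELDS) if secret is None else missing_fields(secret)
        if missing:
            problems[slot] = missing
    return problems
```

An empty result means every slot has a secret carrying all four required fields.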
Onboarding, tap scheduling UX, and trial BYO key path.
- Getting Started tab as the new first-run landing page; Docs ↗ top-nav link.
- Inline cron editor on the Taps page: preset buttons, AI prompt field (“Every weekday at 4pm”), Quartz field validation, human-readable description.
- Truncate-before-run toggle in the tap wizard (sets truncateBeforeWrite: true on the generated destination).
- Configuration tab for trial users with BYO Anthropic/OpenAI key banner and “Datris-managed (default)” vs “Your own key” status.
- Fix: TapScheduler honors the configured dateTimezone from application.yaml.
Taps (Beta). AI-generated Python scripts that fetch data on a schedule and stream it into a pipeline.
- 4-step tap creation wizard (Describe → Edit & Test → Schedule → Review) with brainstorm chat and proactive env-var suggestions.
- AI script generation with JSON-parse retry and raw-script fallback; AI diagnosis with explicit (a)/(b)/(c) options; “Apply Diagnosis” rewrites the script in place.
- Tap secrets stored in Vault tagged _type=tap; run history via TapRunLog and GET /tap/logs.
- CRON scheduling (Quartz format), AI-generated from natural language.
- JSON-to-CSV pipeline feed (union of keys) with column-name normalization via spell-out table (% → percent, # → num, & → and, …).
- Pipelines page shows the feeding tap; pipeline wizard has “Create from Tap”.
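The JSON-to-CSV feed can be pictured as follows. The union-of-keys header and the three spell-out entries come from the notes above; lowercasing, underscore separators, first-seen column order, empty strings for missing values, and the function names are assumptions for illustration.

```python
import csv
import io
import re
from typing import Any, Dict, List

# Only the entries named in the release notes; the real table is longer.
SPELL_OUT = {"%": "percent", "#": "num", "&": "and"}


def normalize_column(name: str) -> str:
    """Spell out known symbols, then collapse other punctuation to '_'."""
    for symbol, word in SPELL_OUT.items():
        name = name.replace(symbol, f" {word} ")
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_")
    return name.lower()


def records_to_csv(records: List[Dict[str, Any]]) -> str:
    """Write records as CSV using the union of keys across all records."""
    columns: List[str] = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)  # keep first-seen order
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([normalize_column(c) for c in columns])
    for rec in records:
        writer.writerow([rec.get(c, "") for c in columns])
    return buf.getvalue()
```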
Schema Evolution.
- Additive evolution: new CSV columns are auto-added as string, schemaVersion increments, and ALTER TABLE runs on Postgres.
- Dropped columns are excluded from the COPY command so Postgres defaults them to NULL (previously failed on typed columns).
- Missing key fields raise a clear error.
- Shared DataUtil.evolveSchema() now used by both StreamNotifier and FileNotifier.
- Query endpoints use serializeNulls() so NULL columns appear in results.
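The evolution rules above can be summarized in a short Python sketch. This is not DataUtil.evolveSchema(); the schema dictionary layout (fields, keyFields) and the function name are hypothetical, chosen only to show the three rules side by side.

```python
from copy import deepcopy
from typing import List, Tuple


def evolve_schema(schema: dict, incoming: List[str]) -> Tuple[dict, List[str]]:
    """Return (evolved schema, columns to load) for an incoming CSV header."""
    missing_keys = [k for k in schema.get("keyFields", []) if k not in incoming]
    if missing_keys:
        # Missing key fields are an error, not something to evolve around.
        raise ValueError(f"missing key fields: {', '.join(missing_keys)}")

    known = [f["name"] for f in schema["fields"]]
    evolved = deepcopy(schema)
    added = [c for c in incoming if c not in known]
    for name in added:
        # New columns are added as strings.
        evolved["fields"].append({"name": name, "type": "string"})
    if added:
        evolved["schemaVersion"] = schema.get("schemaVersion", 1) + 1

    # Columns absent from the file are left out of the load list,
    # so the database fills them with NULL.
    load_columns = [f["name"] for f in evolved["fields"]
                    if f["name"] in incoming]
    return evolved, load_columns
```

The input schema is copied rather than mutated, so a failed load can fall back to the previous version.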
Remote MCP endpoint and managed service support.
- Per-session API key forwarding on the MCP server (header or api_key query param); REQUIRE_API_KEY=true rejects unauthenticated SSE/streamable-HTTP connections.
- New MCP tools for managed service: signup_trial, upgrade_to_dedicated, check_upgrade_status.
- Remote SSE registry entry for https://mcp.trial.datris.ai/sse.
- Website APIs accept x-api-key as an alternative to cookie JWT; /api/provision/agent-trial combines signup + provisioning with rate limiting.
- Pipeline creation wizard hides PostgreSQL and MongoDB database name fields on trial instances; they are auto-populated from the environment or default to datris.
Multi-tenant hosting on a shared instance.
- Per-tenant PostgreSQL databases (auto-created on first use) and per-tenant MongoDB databases scoped via the connection string environment name.
- Tenant-scoped metadata and query endpoints; new TenantInterceptor sets DatrisEnvironment.current per request based on API key.
- Vector DB secret isolation across Qdrant, Weaviate, Milvus, Chroma, pgvector.
- Batch upload for compressed files (.zip, .gz, .tar, .jar) processes contents inline, with no MinIO webhook dependency.
- Default AI provider changed to Anthropic (Claude Sonnet 4.6). Customers can still switch to OpenAI or Ollama via application.yaml.
- Generated Python scripts for DQ and transformation are logged in full to the server logs for debugging.
- New unified datris analyze command replaces ask-sql and ask. Auto-picks the right approach based on --dest (Postgres → SQL generation, MongoDB → document fetch, vector stores → RAG).
- --ai-analyze flag on ingest.
Mintlify documentation and accuracy pass.
- Migrated docs from .md to .mdx with a two-tab Mintlify layout.
- Removed deprecated docs (row rules, column rules, JavaScript row functions, REST transformations, deduplication, column trimming).
- Page-by-page accuracy review against the codebase.
- Server fails fast at startup with a clear error if an AI provider is not configured (CodeGen DQ and transformation require it).
- Removed stale NEVER rules and deprecated-feature references from the MCP server instructions.
CodeGen DQ and transformation.
- aiRule replaces the prior AI DQ approach: the LLM generates a self-contained Python validation script from a plain-English instruction, which runs locally. Cost drops from 0.003/rule.
- aiTransformation uses the same CodeGen approach.
- Works for CSV, JSON, and XML.
- Removed: columnRules (regex), JavaScript row rules, REST endpoint row rules, AIDataQualityUtil. JavaScript row functions and REST transformations removed from UI and CLI.
- Atomic create_pipeline. generate_schema is removed; create_pipeline accepts base64-encoded sample data, auto-detects schema, and creates the pipeline in one call.
- Content-based uploads across upload_data (renamed from upload_file), profile_data, upload_config.
- update_secret MCP tool for AI provider keys only.
- All-string schema on MCP pipeline creation; pipeline registration verified by read-back.
- update_secret MCP tool added (scoped to AI secrets: anthropic, openai, ollama, embedding).
- Published to the MCP Registry (io.github.datris/datris) and PyPI (datris-mcp-server).
New UI tabs.
- MCP Tab — agent view, service health for 10 backend services, browsable tool grid, config generator for Claude Desktop/Code and Cursor, and a Tool Playground that executes against the live API.
- Secrets Tab — full CRUD for Vault secrets with sensitive-field masking.
- New REST endpoints for vector store metadata discovery (Qdrant collections and friends).
