---
name: datris-memory
description: Long-term semantic memory layer for AI agents, built on the Datris Platform via MCP. Local markdown files (MEMORY.md, memory/*.md) are the source of truth; Datris is the queryable index, rebuilt from them with strict one-file-per-upload provenance and incremental mtime-based sync (background timer, post-write flush, on-demand). Triggers on memory-shaped questions — saving, recalling, ingesting, syncing, searching — even when the user doesn't say "Datris."
version: 1.0.0
metadata:
  openclaw:
    requires:
      bins:
        - datris
        - docker
    envVars:
      - name: MCP_SERVER_URL
        required: false
        description: Datris MCP server SSE endpoint. Defaults to http://localhost:3000/sse.
    install:
      - id: brew
        kind: brew
        formula: datris/tap/datris
        bins: [datris]
        label: Install Datris CLI (brew)
---
# Datris memory layer
Datris is the long-term semantic memory layer. Local memory files are the source of truth. Datris is rebuilt from them — never the other way around.
## Prerequisites
This skill assumes a local Datris install:
- **Datris Platform** (includes the MCP server) — Docker-based, see [docs.datris.ai/installation](https://docs.datris.ai/installation). The MCP server runs at `http://localhost:3000/sse` by default; override with `MCP_SERVER_URL`.
- **Datris CLI** — `brew tap datris/tap && brew install datris`. See [docs.datris.ai/cli](https://docs.datris.ai/cli) for command reference.
Confirm both are working before relying on this skill: `datris health` pings every backend service the memory pipeline needs.
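
As a small illustration of the endpoint override, an agent-side helper might resolve the URL as below. This is a sketch: the helper name `mcp_server_url` is hypothetical, and treating a blank value as unset is an assumption. Only the variable name and the default come from the text above.

```python
import os

DEFAULT_MCP_SERVER_URL = "http://localhost:3000/sse"  # default stated above


def mcp_server_url(environ=os.environ):
    """Return the MCP SSE endpoint, honoring the MCP_SERVER_URL override."""
    # Assumption of this sketch: an empty or whitespace-only value counts as unset.
    value = environ.get("MCP_SERVER_URL", "").strip()
    return value or DEFAULT_MCP_SERVER_URL
```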
## Rules
- Prefer Datris MCP tools over the CLI for memory operations. The CLI is a fallback for the cases listed under "When to use the CLI" below.
- **One file in = one `upload_data` call.** Never concatenate, bundle, or consolidate multiple memory files into a single corpus upload. Each file's name is its provenance — it must round-trip cleanly into Datris and back out in retrieval results. Bundling looks like an optimization and is not one: it permanently breaks provenance, blocks per-file incremental sync, and makes resets harder.
- Upload each file as-is. Let Datris chunk server-side. Do not pre-chunk. Do not use a document tap for local files. The only time to split a single file is when that one file genuinely exceeds the upload limit; in that case, split it with explicit provenance markers in the chunk filenames (e.g. `MEMORY.md.part1`, `MEMORY.md.part2`).
- Upload files in parallel. Each `upload_data` call returns a job token immediately — fire all uploads first, then poll. Do not wait for one job to finish before starting the next.
- Poll job status to completion before claiming any ingestion succeeded. Polling is for verification, not for gating subsequent uploads.
- Verify retrieval with a semantic search after every ingestion run.
- Memory pipelines must target a vector destination (pgvector, Qdrant, Weaviate, Milvus, or Chroma).
- The embedding model is pinned at pipeline-creation time. Vector dimensions cannot change after the fact. Confirm the embedder matches the pipeline before ingesting. Switching embedders means dropping and recreating the destination collection.
- **Place every created resource in the `openclaw` data catalog by default.** Pipelines, taps, secrets, destination collections — anything the agent creates in Datris on OpenClaw's behalf goes in the catalog named `openclaw` unless the user explicitly directs otherwise. This keeps OpenClaw's footprint cleanly separated from other Datris workloads on the same instance, makes cleanup and auditing trivial, and lets multiple OpenClaw users share an instance without colliding. If an existing resource the agent wants to reuse lives in a different catalog, do not migrate it silently — surface the mismatch to the user and ask before continuing. Set the catalog at creation time (`create_pipeline` accepts `catalog: "openclaw"`; `create_tap` writes the field through) or after the fact via the `set_catalog` MCP tool.
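
For the one exception in the rules above, a single file that exceeds the upload limit, a split that keeps provenance in the part names could look roughly like this. It is a sketch under assumptions: `split_with_provenance` is a hypothetical helper, and `max_bytes` stands in for the real upload limit, which is not stated here.

```python
def split_with_provenance(filename, text, max_bytes):
    """Split one oversized file into parts named <filename>.partN.

    max_bytes is a hypothetical stand-in for the real upload limit.
    Splits fall on line boundaries; a single line longer than max_bytes
    is kept whole in its own part rather than cut mid-line.
    """
    if len(text.encode("utf-8")) <= max_bytes:
        return [(filename, text)]  # fits: upload as-is under its real name

    parts, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        line_bytes = len(line.encode("utf-8"))
        if current and size + line_bytes > max_bytes:
            parts.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += line_bytes
    if current:
        parts.append("".join(current))
    return [(f"{filename}.part{i}", body) for i, body in enumerate(parts, start=1)]
```

Each returned pair is still uploaded with its own `upload_data` call. The function never merges content from different files.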
## When to use the CLI
MCP tools are the default. Reach for the `datris` CLI in these specific situations:
- **Health checks during bootstrap or troubleshooting.** `datris health` confirms every backend service is up. Use it as a more thorough cross-check when MCP `check_service_health` returns ambiguous results, or when diagnosing a stuck ingestion.
- **MCP server unavailable.** If the MCP connection is down, `datris ingest <file> --dest pgvector` and `datris search "<query>" --store pgvector --collection <name>` are the equivalent fallbacks for ingestion and retrieval. Use these only to keep the user unblocked; restore MCP-based operation as soon as the server is reachable.
- **Pipeline status when MCP polling stalls.** `datris status <pipeline>` is a clean way to read the latest job state if the MCP `get_job_status` loop has lost track of which token to follow.
- **Spot-checks the user runs themselves.** When the user wants to verify an ingestion by hand, point them at `datris search` against the destination collection rather than walking them through MCP tool calls.
Log every CLI invocation in the audit log (`memory/<today>.md`) the same way an MCP call would be logged — same provenance discipline applies.
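
A minimal sketch of the fallback commands and their audit entries follows. The argument lists mirror the two commands quoted above. The bullet format produced by `audit_line` is an assumption, since this document does not prescribe an audit log format, and all three function names are hypothetical.

```python
import shlex
from datetime import datetime, timezone


def ingest_argv(path, dest="pgvector"):
    """argv for: datris ingest <file> --dest pgvector"""
    return ["datris", "ingest", path, "--dest", dest]


def search_argv(query, collection, store="pgvector"):
    """argv for: datris search <query> --store pgvector --collection <name>"""
    return ["datris", "search", query, "--store", store, "--collection", collection]


def audit_line(argv, when=None):
    """One markdown bullet for memory/<today>.md (hypothetical entry format)."""
    when = when or datetime.now(timezone.utc)
    # shlex.join quotes arguments so the logged command can be pasted into a shell.
    return f"- {when.isoformat(timespec='seconds')} cli: {shlex.join(argv)}"
```

Passing the list form to `subprocess.run` avoids shell quoting problems with free-text queries.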
## First-run bootstrap
1. Read the Datris MCP resources and tool descriptions. Understand the pipeline, upload, job-status, and search workflows before acting.
2. Check service health via the MCP `check_service_health` tool. If it returns ambiguous or partial results, fall back to `datris health` for a more detailed per-service view.
3. Reuse an existing vector pipeline for memory in the `openclaw` catalog if one exists. Otherwise create one in the `openclaw` catalog — pgvector is fine. If a memory pipeline exists outside the `openclaw` catalog, surface that to the user before reusing or migrating it.
4. Ingest `MEMORY.md` and each `memory/*.md` file via its own `upload_data` call — one upload per file, no exceptions. Do not concatenate them into a single corpus document, even if the total set is small. Fire the uploads in parallel and collect all the job tokens before polling.
5. Poll all jobs concurrently until every one is done. Call `get_job_status(pipeline_token=...)` once per job token and treat that job as complete when `rollup.allDone` is `true`; its outcome is `rollup.status` (`success` / `warning` / `error`), with failure detail in `rollup.jobs[].lastError`.
6. Verify with two or three representative semantic queries. Confirm both that (a) the expected content comes back, and (b) results show real source filenames (`MEMORY.md`, `memory/2026-05-06.md`, etc.), not a consolidated corpus filename or any other synthetic name. The filename round-trip check is the early-warning signal for violations of the one-file-per-upload rule; if results show a corpus filename, stop and apply the reset workflow under "Inheriting a consolidated-corpus pipeline".
7. Record the run in `memory/<today>.md`: pipeline used, files ingested, verification queries and results, any failures.
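8. Propose an incremental sync strategy for future edits, covering the three modes under "Ongoing sync": background timer cadence, post-write flush, and on-demand sync.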
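
Steps 4 and 5 (fire every upload, then poll every job) can be sketched as follows. The `upload` and `get_status` callables are hypothetical stand-ins for the MCP `upload_data` and `get_job_status` tool calls, to be wired to whatever MCP client is in use. The `rollup` fields are the ones named in step 5. The worker count, poll interval, and timeout are arbitrary illustrative values.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def ingest_all(paths, upload, get_status, poll_seconds=2.0, timeout_seconds=600.0):
    """Upload each file on its own, then poll every job to completion.

    upload(path) -> job token; get_status(token) -> dict with a "rollup" key.
    Both callables are hypothetical wrappers around the MCP tools.
    """
    paths = list(paths)
    # Fire every upload first: one call per file, never a combined corpus.
    with ThreadPoolExecutor(max_workers=8) as pool:
        tokens = dict(zip(paths, pool.map(upload, paths)))

    results, pending = {}, dict(tokens)
    deadline = time.monotonic() + timeout_seconds
    while pending:
        # Each round checks every outstanding job; no job gates another.
        for path, token in list(pending.items()):
            rollup = get_status(token)["rollup"]
            if rollup["allDone"]:
                errors = [j["lastError"] for j in rollup.get("jobs", []) if j.get("lastError")]
                results[path] = {"status": rollup["status"], "errors": errors}
                del pending[path]
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"jobs still running: {sorted(pending)}")
            time.sleep(poll_seconds)
    return results
```

A result whose `status` is anything other than `success` belongs in the audit log entry for the run, along with its `errors`.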
## Ongoing sync
Memory files change continuously — the agent writes to them during sessions, the user edits them in their editor between sessions, and new dated files appear over time. Sync is incremental and runs lazily, in three modes:
### When to sync
1. **Periodic background sync.** A timer-driven sweep runs every 30 minutes by default — diff memory files by `mtime` against the last sync record, upload anything stale. This runs out-of-band: it never blocks an agent response or a user query. Cadence is configurable: faster (5–10 min) for users actively editing memory between sessions, slower (hourly or more) for read-mostly use. The right value is whatever keeps the staleness window short enough that the user rarely needs to force a sync.
2. **End of any agent response that wrote to memory.** When the agent edits or creates a memory file during a turn, flush those uploads before the response is considered complete — including polling to completion. This puts the cost on the response that did the writing, not on later queries, and guarantees that a follow-up question in the same conversation can retrieve what the agent just wrote. Never let an agent-authored memory write wait for the next timer tick.
3. **On explicit user request.** Phrases like "sync memory," "save what we discussed," "update Datris with my recent notes" — full diff sweep across all memory files, immediate sync. This is the user's escape valve for the case where they just edited a file in their editor and want it queryable right now rather than waiting for the next timer tick.
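
The periodic mode could be driven by a small background thread like the one below. This is a sketch, not a prescribed mechanism: `start_background_sync` and the `sweep` callable are hypothetical, and only the 30-minute default comes from the list above.

```python
import logging
import threading


def start_background_sync(sweep, interval_seconds=30 * 60):
    """Run sweep() every interval_seconds on a daemon thread.

    Returns a threading.Event; set it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout (time to sweep) and True once stopped.
        while not stop.wait(interval_seconds):
            try:
                sweep()
            except Exception:
                # A failed sweep should not kill the timer; the next tick retries.
                logging.exception("background memory sync failed")

    threading.Thread(target=loop, name="memory-sync", daemon=True).start()
    return stop
```

Because the sweep runs on its own thread, it stays out of the response path. The post-write flush and the on-demand sync can call the same `sweep` function directly.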
### Staleness window — and why it's acceptable
Memory edits made outside of an agent session (for example, the user editing `MEMORY.md` in their editor between conversations) may be up to one timer interval behind in retrieval results. That is the explicit trade-off: query latency stays predictable, ingestion is invisible, and the user has a one-line escape hatch ("sync memory") when freshness matters. Do not try to close the staleness window by syncing on every retrieval; that path puts ingestion cost on the user's wait time and is the design this skill replaces.
### Detecting what changed
Compare each memory file's filesystem `mtime` against the most recent sync timestamp recorded for that file in the `memory/<date>.md` audit logs. Three cases:
- **No sync record exists** — treat as a new file, ingest it.
- **`mtime` newer than last-sync timestamp** — re-upload.
- **`mtime` not newer than last-sync timestamp** — skip. Do not re-ingest unchanged files; the point of incremental sync is to do less work.
A content hash is more reliable than `mtime` if the user touches files without editing them (some editors do this on save), but `mtime` is sufficient as a default. Switch to hashing only if redundant uploads start showing up in the audit log.
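
The three cases, plus the optional hash, can be sketched as below. The function names are hypothetical. How the last-sync timestamp is read back from the audit log is left out, since no log format is fixed here, so it is passed in as a plain POSIX timestamp.

```python
import hashlib
import os


def classify(path, last_sync_ts):
    """Return "new", "stale", or "skip" for one memory file.

    last_sync_ts is the file's last recorded sync time as a POSIX
    timestamp, or None when no sync record exists.
    """
    if last_sync_ts is None:
        return "new"
    return "stale" if os.path.getmtime(path) > last_sync_ts else "skip"


def content_hash(path):
    """SHA-256 of the file bytes, for the optional hash-based comparison."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()
```

With hashing enabled, a file classified as `stale` would be uploaded only if its hash also differs from the one stored at the last sync.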
### Sync workflow
Classify each changed file into one of four cases and act accordingly:
- **Edited file** — re-upload via `upload_data`. Pipelines upsert on source filename, so this overwrites the file's existing chunks cleanly.
- **New file** (no prior sync record) — upload via `upload_data` like any other.
- **Renamed file** — treat as `delete + add`, never as `update`. Delete chunks for the old filename from the destination collection, then upload the new file. Otherwise the index keeps orphan chunks under the old name.
- **Deleted file** — delete its chunks from the destination collection. Do not leave orphans.
Then:
- Fire all uploads in parallel, collect the job tokens, poll concurrently to completion.
- Verify with one or two semantic queries that touch the changed content. Confirm filenames in results reflect the post-sync state — no stale entries from before a rename or delete, no consolidated-corpus names.
- Append a per-file entry to today's audit log: filename, change type (edit / new / rename / delete), timestamp, verification result.
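
One way to derive the work list for a sweep is to compare the files on disk with the sync records, as in this sketch. `plan_sync` and both of its inputs are hypothetical shapes chosen for illustration. A rename needs no special detection here: the old name falls out as a delete and the new name as an upload.

```python
def plan_sync(current_mtimes, synced_at):
    """Return (uploads, deletes) as sorted filename lists.

    current_mtimes: filename -> mtime for memory files on disk now.
    synced_at: filename -> last sync timestamp taken from the audit log.
    """
    uploads = sorted(
        name
        for name, mtime in current_mtimes.items()
        if name not in synced_at or mtime > synced_at[name]  # new or edited
    )
    # Synced earlier but gone from disk: deleted, or the old half of a rename.
    deletes = sorted(name for name in synced_at if name not in current_mtimes)
    return uploads, deletes
```

Running the deletes before the uploads keeps orphan chunks from an old filename out of the verification queries.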
## Inheriting a consolidated-corpus pipeline
If the existing memory pipeline was bootstrapped with a single consolidated upload (a `*-corpus-*.md` source file, or any upload whose filename doesn't match a real memory file), the pipeline has broken provenance and cannot be incrementally synced cleanly. Do not patch around it. Reset and re-ingest:
1. Confirm with the user before resetting — destination collections may contain manual edits.
2. Drop the destination collection (or recreate the pipeline with the same name).
3. Re-ingest each canonical memory file individually per the bootstrap workflow above.
4. Verify with semantic queries that retrieval results now show real source filenames (`MEMORY.md`, `memory/2026-05-06.md`, etc.) rather than a corpus filename.
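
The filename check in step 4 can be automated along these lines. This is a sketch that assumes the caller has already pulled the source filenames out of the search results. How results expose them is not specified here, and `synthetic_sources` is a hypothetical helper.

```python
from fnmatch import fnmatch


def synthetic_sources(result_filenames, memory_files):
    """Return result filenames that do not map to a real memory file."""
    real = set(memory_files)
    return sorted(
        name
        for name in set(result_filenames)
        # Flag unknown names, and anything shaped like a consolidated corpus upload.
        if name not in real or fnmatch(name, "*-corpus-*.md")
    )
```

An empty list means provenance round-trips cleanly. Any entry means the reset is incomplete or another bundled upload has slipped in.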
## Retrieval
When the user asks a memory-shaped question, reach for `vector_search` (or `ai_answer` for synthesis) against the memory pipeline before grepping local files. Substring search on local markdown is a fallback, not the default. If the MCP layer is unreachable, `datris search "<query>" --store pgvector --collection <name>` is the equivalent fallback against the same destination.
## Reporting
After any bootstrap or sync, report:
- Pipeline used or created.
- Files ingested.
- Verification queries and whether they returned the expected content.
- Anything that failed and why.