A tap’s fetch() is Python that Datris executes on your server. By default it runs in-process inside the server. For stronger separation, Datris can instead run each tap in a dedicated, isolated runner container, cut off from the platform’s credentials and internal services.
The isolated runner is opt-in (off by default). Enable it by setting USE_TAP_RUNNER=true. Tap behavior is identical either way — the same fetch() contract, the same environment variables, the same packages. The only difference is where the code runs and what it can reach.

Execution modes

| | In-process (default) | Isolated runner (opt-in) |
|---|---|---|
| Where tap code runs | Inside the datris server container | A separate datris-tap-runner container |
| Platform secrets reachable | Limited (see always-on protections) | No |
| Internal services (database, object store, secrets store) reachable | Yes | No |
| Internet + platform API reachable | Yes | Yes |
| Extra packages | Installed in an isolated, throwaway environment | Installed in an isolated, throwaway environment |
In-process mode (the default) runs tap code inside the server. Tap code in this mode shares the server’s filesystem and network, so it can reach internal services. This is appropriate for a trusted, single-operator install where you author all taps yourself.

In isolated runner mode (USE_TAP_RUNNER=true), tap code runs in a container that holds no platform credentials and sits on a network with no route to the platform’s internal services. Even a misbehaving tap has nothing to read and nowhere to pivot: it can fetch from its source, install the packages it declares, call back into the Datris API, and return records, and nothing more. Enable it when you want defense-in-depth beyond the always-on protections.
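To see the network difference between the two modes from a tap’s point of view, a diagnostic tap can try opening TCP connections to internal hostnames. This is a sketch, not a Datris tool: postgres:5432 is the only internal address this page mentions, so the target list is illustrative and you would add the hostnames your own deployment uses. In-process, the connection would typically succeed when that service is up; in the isolated runner it should fail with a DNS or connection error.

```python
import socket

# Hypothetical target list: "postgres:5432" comes from this page; add your own hostnames.
INTERNAL_TARGETS = [("postgres", 5432)]


def can_reach(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections, and timeouts.
        return False


def fetch():
    # One record per target, so the result lands in the pipeline like any other data.
    return [
        {"host": host, "port": port, "reachable": can_reach(host, port)}
        for host, port in INTERNAL_TARGETS
    ]
```

Returning the probe results as records, rather than printing them, keeps the diagnostic inside the normal tap contract.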

The runner container

The isolated runner is a dedicated datris-tap-runner sidecar — a small, standalone container built from its own minimal Python image (not the server image). It is hardened to hold and grant as little as possible:
  • No platform secrets. The image carries no secrets-store credentials, no .env file, and no AI/database credentials — there is nothing sensitive inside it to read.
  • Network-isolated. It runs on a dedicated network with no route to the platform’s internal services (database, object store, secrets store). It can reach the public internet and the Datris API — nothing else internal.
  • Non-root, read-only filesystem. Tap code runs as an unprivileged user on a read-only root filesystem, with all Linux capabilities dropped and privilege escalation disabled.
  • Ephemeral scratch. The only writable area is a small in-memory scratch space that is wiped after every run, so nothing a tap writes — temp files, installed packages — survives into the next run.
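If you want to confirm the hardening from inside a tap, a small diagnostic fetch() can report what the process actually sees. This is a sketch under assumptions: it reads Linux’s /proc/self/status (the CapEff and NoNewPrivs fields), uses os.access as a cheap writability check, and assumes the in-memory scratch space is where tempfile.gettempdir() points, which this page does not state. In the runner you would expect a non-root user, zero effective capabilities, no-new-privileges set, and a root filesystem the tap cannot write; in-process results depend on how the server container is run.

```python
import os
import tempfile


def _proc_status():
    """Parse /proc/self/status into a dict; empty on hosts without /proc."""
    try:
        with open("/proc/self/status") as f:
            return {k: v.strip() for k, v in (line.split(":", 1) for line in f if ":" in line)}
    except OSError:
        return {}


def probe_sandbox():
    status = _proc_status()
    cap_eff = status.get("CapEff")           # effective capability mask, in hex
    no_new_privs = status.get("NoNewPrivs")  # "1" when privilege escalation is disabled
    scratch = tempfile.gettempdir()          # assumption: points at the scratch area
    return {
        "non_root": os.geteuid() != 0 if hasattr(os, "geteuid") else None,
        "capabilities_dropped": int(cap_eff, 16) == 0 if cap_eff else None,
        "no_new_privileges": no_new_privs == "1" if no_new_privs else None,
        # False for a read-only root filesystem or for a user without write permission on "/".
        "root_writable": os.access("/", os.W_OK),
        "scratch_dir": scratch,
        "scratch_writable": os.access(scratch, os.W_OK),
    }


def fetch():
    # A tap returns records; here, a single record describing the sandbox.
    return [probe_sandbox()]
```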
This is operating-system container isolation: separate process, filesystem, and network namespaces sharing the host kernel. It is the right boundary for a self-hosted, single-operator deployment. It is not a virtual-machine boundary, though. A kernel-level container escape would cross it, so running fully untrusted, multi-tenant tap code calls for stronger per-run isolation, such as a user-space kernel like gVisor (runsc) or a microVM (Firecracker / Kata).

Datris does not ship or configure these. Because the runner is a standard OCI container, however, you can run it under a hardened runtime yourself if your host provides one. For example, with gVisor installed, run the datris-tap-runner container under runsc (Docker’s --runtime=runsc, or runtime: runsc on the datris-tap-runner service in docker-compose.yml). Datris hasn’t validated this path, so treat it as an advanced, self-managed hardening step.

Always-on protections

These apply in both modes, regardless of the runner setting:
  • Platform secrets are never placed in a tap’s environment. The credentials the platform uses for itself (secrets-store token, AI provider keys, database passwords) are stripped from the tap’s environment. A tap receives only Datris-injected variables (DATRIS_*), any per-run parameters, and its own configured secret — nothing else.
  • Extra packages install in isolation. Packages a tap declares are installed into a throwaway virtual environment, never into the system Python. The pre-installed common packages remain available; the per-tap environment is discarded after the run.
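Both protections can be sketched in a few lines of Python. This illustrates the idea rather than Datris’s implementation: build_tap_env and throwaway_env are hypothetical names, the environment filtering is written as an allowlist (start empty, copy in only what is permitted) rather than as stripping known credentials, the tap’s secret is assumed to arrive as a single name/value pair, and passing system_site_packages=True to keep the pre-installed common packages importable is an assumption about how that part is achieved.

```python
import contextlib
import os
import subprocess
import tempfile
import venv


def build_tap_env(server_env, params, secret_name=None, secret_value=None):
    """Hypothetical: build a tap's environment from an allowlist, not the server's env."""
    # Datris-injected variables only (assumes platform credentials never use this prefix).
    env = {k: v for k, v in server_env.items() if k.startswith("DATRIS_")}
    env.update(params)  # per-run parameters
    if secret_name is not None:
        env[secret_name] = secret_value  # the tap's own configured secret
    return env


@contextlib.contextmanager
def throwaway_env(packages):
    """Hypothetical: yield a temporary venv's python; the venv is deleted on exit."""
    with tempfile.TemporaryDirectory(prefix="tap-env-") as env_dir:
        venv.create(
            env_dir,
            system_site_packages=True,  # keep pre-installed packages importable
            symlinks=(os.name != "nt"),
            with_pip=bool(packages),    # pip is only needed if there is something to install
        )
        bin_dir, exe = ("Scripts", "python.exe") if os.name == "nt" else ("bin", "python")
        python = os.path.join(env_dir, bin_dir, exe)
        if packages:
            # Installs land in env_dir, never in the system interpreter.
            subprocess.run([python, "-m", "pip", "install", "--quiet", *packages], check=True)
        yield python
```

The allowlist direction is the safer default for this kind of filter: a newly added platform credential is excluded automatically instead of leaking until someone remembers to strip it.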

Configuration

Set these on the datris service (for example in your .env). Defaults work out of the box.
| Variable | Default | Purpose |
|---|---|---|
| USE_TAP_RUNNER | false | Run taps in the isolated runner. Leave unset or false to run in-process. |
| TAP_RUNNER_URL | http://datris-tap-runner:8090 | Address of the runner service. |
| TAP_RUNNER_TOKEN | (compose default) | Shared token authenticating the server → runner call. Set a strong value. |
| TAP_RUNNER_CALLBACK_HOST | datris | Host a tap uses to call back into the platform API from the runner: the datris service name on the shared network. |
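The defaults in the table can be mirrored in a small loader, for example in a health-check script that reports how a deployment is configured. This sketch is hypothetical: Datris’s own parsing of USE_TAP_RUNNER may accept only true/false (the loader below accepts a few common truthy spellings), and the compose default for TAP_RUNNER_TOKEN isn’t listed on this page, so an unset token is reported as None rather than guessed.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}  # assumption: accepted spellings of "enabled"


def load_tap_runner_config(env=None):
    """Read the tap-runner settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    use_runner = env.get("USE_TAP_RUNNER", "false").strip().lower() in TRUTHY
    config = {
        "use_tap_runner": use_runner,
        "url": env.get("TAP_RUNNER_URL", "http://datris-tap-runner:8090"),
        "token": env.get("TAP_RUNNER_TOKEN"),  # compose default not documented here
        "callback_host": env.get("TAP_RUNNER_CALLBACK_HOST", "datris"),
    }
    # The table says to set a strong token; flag the obvious gap when the runner is on.
    config["token_missing"] = use_runner and not config["token"]
    return config
```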
The runner is defined as the datris-tap-runner service in docker-compose.yml; a normal docker compose up starts it alongside the server, but taps only route to it when USE_TAP_RUNNER=true. To enable isolation, set the flag and recreate the datris container:
# enable the isolated runner, then recreate
USE_TAP_RUNNER=true docker compose up -d datris

What changes for tap authors

Almost nothing. Taps fetch data and return records; the platform writes them to the destination pipeline. Reading a source over the internet, installing declared packages, using the tap’s own secret, and calling the Datris API via DATRIS_PLATFORM_HOST / DATRIS_PLATFORM_PORT all work the same in both modes. The one thing to know: in isolated runner mode a tap cannot open a direct connection to the platform’s internal database / object store by hostname — those live on a network the runner can’t reach. This is intentional, and it’s not how taps are meant to work anyway: a tap returns records and lets the pipeline handle the destination. If you have a legacy tap that connected directly to an internal service, run it in-process or rework it to return records.
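The intended shape is a fetch() that reads its source and returns records, with no database client anywhere in the tap. In this sketch the source URL and the payload layout (for example {"data": [{"symbol": "AAPL", "price": "1.5"}]}) are hypothetical, and records are assumed to be a list of flat dicts. The parsing lives in a separate function so it can be tested without network access.

```python
import json
import urllib.request

SOURCE_URL = "https://example.com/api/prices"  # hypothetical public source


def to_records(payload):
    """Flatten a (hypothetical) source payload into a list of flat dict records."""
    return [
        {"ticker": item["symbol"], "price": float(item["price"])}
        for item in payload.get("data", [])
    ]


def fetch():
    with urllib.request.urlopen(SOURCE_URL, timeout=30) as resp:
        payload = json.load(resp)
    # No connection to the destination here: the pipeline writes the returned records.
    return to_records(payload)
```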

Reading platform data from a tap

A tap can read data already in Datris — for example a stock_tickers table — to drive its fetch. Do this through the query API, using the injected platform host and database name. The same code works in both execution modes, because DATRIS_PLATFORM_HOST is set correctly for each:
import json
import os
import urllib.request

def fetch():
    host = os.environ["DATRIS_PLATFORM_HOST"]   # "localhost" in-process, "datris" in the runner
    port = os.environ["DATRIS_PLATFORM_PORT"]
    db   = os.environ["DATRIS_POSTGRES_DATABASE"]
    # POST the SQL to the platform's query API rather than connecting to the database directly
    req = urllib.request.Request(
        f"http://{host}:{port}/api/v1/query/postgres",
        data=json.dumps({"sql": "SELECT ticker FROM stock_tickers", "database": db}).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Close the response when done, and fail instead of hanging if the API doesn't answer
    with urllib.request.urlopen(req, timeout=30) as resp:
        tickers = json.load(resp)["results"]
    # ...use tickers to drive the rest of fetch()...
    return tickers
Use /api/v1/query/mongodb (or /api/v1/query/natural) the same way for Mongo-backed data. In the isolated runner, a tap cannot instead open a direct database connection (e.g. to postgres:5432) — internal services aren’t reachable from the runner’s network. Querying through the API is the supported path and works regardless of mode.
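If a tap issues more than one query, the request code can be factored into a helper that adds a timeout and surfaces the API’s error body. This is a sketch: query_platform and postgres_rows are hypothetical names, the response is assumed to carry a results key as in the example above, and only the postgres request body shown on this page is used. The request shape for the Mongo and natural-language endpoints isn’t documented here, so check it before reusing the helper for them.

```python
import json
import os
import urllib.error
import urllib.request


def query_platform(path, body, host=None, port=None, timeout=30):
    """POST a JSON body to a Datris query endpoint and return the parsed reply."""
    host = host or os.environ["DATRIS_PLATFORM_HOST"]
    port = port or os.environ["DATRIS_PLATFORM_PORT"]
    req = urllib.request.Request(
        f"http://{host}:{port}{path}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Include the server's error text so a failed tap run is easy to diagnose.
        detail = err.read().decode(errors="replace")
        raise RuntimeError(f"query to {path} failed: HTTP {err.code}: {detail}") from err


def postgres_rows(sql):
    """Run SQL through the query API against the tap's injected database."""
    reply = query_platform(
        "/api/v1/query/postgres",
        {"sql": sql, "database": os.environ["DATRIS_POSTGRES_DATABASE"]},
    )
    return reply["results"]
```

Usage inside a tap would then be a single line, such as tickers = postgres_rows("SELECT ticker FROM stock_tickers").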

When to use which

  • In-process (default) — fine for a trusted single-operator install where you author all taps yourself, or for a tap that needs direct access to an internal service. The always-on protections still apply; internal services are reachable.
  • Isolated runner (USE_TAP_RUNNER=true) — enable for defense-in-depth: tap code stays separated from platform credentials and internal services. Recommended if you run taps you don’t fully control, or simply want the extra boundary.