Supported Databases
| Database | JDBC Driver |
|---|---|
| PostgreSQL | org.postgresql.Driver |
| MySQL | com.mysql.cj.jdbc.Driver |
| MSSQL | com.microsoft.sqlserver.jdbc.SQLServerDriver |
Configuration
Database sources are configured in the databaseAttributes section of a pipeline configuration. Connection credentials (JDBC URL, username, password) are stored in Vault, not in the pipeline config itself.
Configuration Reference
| Property | Required | Description |
|---|---|---|
| type | Yes | One of postgres, mysql, mssql |
| postgresSecretsName | Conditional | Vault secret name for PostgreSQL credentials |
| mysqlSecretsName | Conditional | Vault secret name for MySQL credentials |
| mssqlSecretsName | Conditional | Vault secret name for MSSQL credentials |
| cronExpression | Yes | Quartz-format cron expression controlling the pull schedule |
| database | No | Database name (if not in the JDBC URL) |
| schema | No | Schema within the database |
| table | Yes* | Table to pull data from (*unless sqlOverride is set) |
| timestampFieldName | Yes* | Column used for incremental pulls. Effectively required: the generated query always appends this column and orders by it. |
| includeFields | No | Array of column names to select; when omitted, all columns are selected |
| sqlOverride | No | Custom SQL query that replaces the generated SELECT statement |
| outputDelimiter | No | Delimiter for CSV output (default ,) |
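
As an illustration of how these properties fit together, the sketch below models a databaseAttributes block as a Python dictionary and checks the requirements from the table. The table, column, and secret names are placeholders, and missing_properties is a hypothetical helper. The rule that the secret property matching type is the one that must be present is an assumption drawn from the "Conditional" entries, not confirmed behavior. The on-disk format of the pipeline configuration is not shown in this section.

```python
# Hypothetical databaseAttributes block, written as a Python dict for illustration.
database_attributes = {
    "type": "postgres",
    "postgresSecretsName": "sales-db-credentials",  # placeholder Vault secret name
    "cronExpression": "0 */15 * * * ?",             # every 15 minutes
    "schema": "public",
    "table": "orders",                              # placeholder table
    "timestampFieldName": "updated_at",             # placeholder column
    "includeFields": ["id", "status", "total"],
    "outputDelimiter": ",",
}

# Assumption: each database type requires its own <type>SecretsName property.
SECRET_PROPERTY_BY_TYPE = {
    "postgres": "postgresSecretsName",
    "mysql": "mysqlSecretsName",
    "mssql": "mssqlSecretsName",
}


def missing_properties(attrs: dict) -> list:
    """Return the names of required properties that are absent from attrs."""
    missing = [name for name in ("type", "cronExpression", "timestampFieldName")
               if name not in attrs]
    secret_property = SECRET_PROPERTY_BY_TYPE.get(attrs.get("type"))
    if secret_property and secret_property not in attrs:
        missing.append(secret_property)
    # table is required unless sqlOverride is set
    if "table" not in attrs and "sqlOverride" not in attrs:
        missing.append("table")
    return missing
```

Calling missing_properties(database_attributes) on the block above returns an empty list; dropping table without setting sqlOverride would report table as missing.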
Secrets in Vault
Database credentials are never stored in the pipeline configuration. Instead, the pipeline reads them from HashiCorp Vault using the secret name configured above. The Vault secret must contain username, password, and jdbcUrl keys.
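
The expected shape of the secret can be sketched as a mapping with those three keys. The values below are placeholders, and missing_secret_keys is a hypothetical helper for checking a payload before use; it is not part of the pipeline or of Vault.

```python
REQUIRED_SECRET_KEYS = {"username", "password", "jdbcUrl"}

# Placeholder values; a real secret holds the actual connection details.
example_secret = {
    "username": "pipeline_reader",
    "password": "change-me",
    "jdbcUrl": "jdbc:postgresql://db.example.internal:5432/sales",
}


def missing_secret_keys(secret: dict) -> set:
    """Return the required keys that are absent from a secret payload."""
    return REQUIRED_SECRET_KEYS - set(secret)
```

An empty result means all three keys are present; it says nothing about whether the credentials themselves are valid.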
Cron-Based Scheduling
The cronExpression field accepts a Quartz cron string with six fields (seconds, minutes, hours, day-of-month, month, day-of-week):
| Expression | Schedule |
|---|---|
| 0 */15 * * * ? | Every 15 minutes |
| 0 0 * * * ? | Every hour on the hour |
| 0 0 2 * * ? | Daily at 02:00 |
| 0 0 0 ? * MON | Every Monday at midnight |
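
To make the field order concrete, this small sketch splits an expression into its six named fields. It is not a Quartz parser: it only checks the field count and the Quartz rule that day-of-month or day-of-week must be ?. The optional seventh (year) field that Quartz allows is not handled, and the function name is hypothetical.

```python
CRON_FIELDS = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week")


def label_cron_fields(expression: str) -> dict:
    """Split a six-field Quartz cron string into a name -> value mapping."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    fields = dict(zip(CRON_FIELDS, parts))
    # Quartz does not accept concrete values in both day fields at once.
    if "?" not in (fields["day_of_month"], fields["day_of_week"]):
        raise ValueError("one of day-of-month or day-of-week must be '?'")
    return fields
```

For example, label_cron_fields("0 0 0 ? * MON") maps day_of_month to ? and day_of_week to MON.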
Incremental Pulls
Incremental extraction relies on timestampFieldName, which is effectively required: the generated SELECT always appends that column to the projection and orders the result by it (ORDER BY {timestampFieldName}). After each pull, the pipeline records the value of that column from the last (highest-ordered) row returned; this value is the high-water mark. It is not computed with a MAX() aggregate. It is simply the timestamp of the final row in the ordered result set.
On the next execution, the pipeline adds a WHERE {timestampFieldName} > {lastValue} clause to fetch only new or updated rows. The high-water mark is stored in MongoDB in the {environment}-data-pull collection.
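
The cycle described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's implementation: the function names are hypothetical, the ? bind parameter is an assumption about how the last value is passed, and keeping the previous mark when no rows come back is also an assumption.

```python
def build_incremental_query(table, timestamp_field, last_value=None):
    """Return (sql, params) for one pull, filtering on the stored high-water mark."""
    sql = f"SELECT * FROM {table}"
    params = []
    if last_value is not None:
        # Only fetch rows newer than the previous high-water mark.
        sql += f" WHERE {timestamp_field} > ?"
        params.append(last_value)
    # Ascending order means the final row carries the highest timestamp.
    sql += f" ORDER BY {timestamp_field}"
    return sql, params


def next_high_water_mark(rows, timestamp_field, previous=None):
    """Take the timestamp of the last row; keep the previous mark if no rows came back."""
    if not rows:
        return previous
    return rows[-1][timestamp_field]
```

On the first run, last_value is None and the query has no WHERE clause. Each later run passes the value returned by next_high_water_mark from the previous pull.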
Custom SQL
Set sqlOverride to run an arbitrary query instead of the generated SELECT ... FROM table.
Field Filtering
Use includeFields to select a subset of columns from the source table.
When sqlOverride is set, includeFields is ignored because the SQL query already defines the column list.
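
The precedence between sqlOverride and includeFields can be sketched as follows. The select_statement function and the sample table and column names are hypothetical. Appending the timestamp column only when it is not already listed is an assumption, and the sketch does not cover how the incremental filter interacts with a custom query.

```python
def select_statement(attrs: dict) -> str:
    """Return the SQL for a pull, letting sqlOverride win over includeFields."""
    if attrs.get("sqlOverride"):
        # The custom query already defines its own column list.
        return attrs["sqlOverride"]
    timestamp_field = attrs["timestampFieldName"]
    fields = list(attrs.get("includeFields") or [])
    if fields and timestamp_field not in fields:
        # The timestamp column must be in the projection for incremental pulls.
        fields.append(timestamp_field)
    columns = ", ".join(fields) if fields else "*"
    return f"SELECT {columns} FROM {attrs['table']} ORDER BY {timestamp_field}"


filtered = select_statement({
    "table": "orders",
    "timestampFieldName": "updated_at",
    "includeFields": ["id", "status"],
})
overridden = select_statement({
    "table": "orders",
    "timestampFieldName": "updated_at",
    "includeFields": ["id", "total"],  # ignored because sqlOverride is set
    "sqlOverride": "SELECT id FROM orders WHERE status = 'OPEN'",
})
```

Here filtered selects id, status, and updated_at, while overridden is the custom query unchanged.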
Troubleshooting
| Symptom | Check |
|---|---|
| Connection refused | Verify the jdbcUrl in Vault is correct and the database allows connections from the pipeline host. |
| Authentication failed | Confirm the Vault secret name is correct and contains valid username/password/jdbcUrl keys. |
| Zero rows returned on incremental pull | The high-water mark in MongoDB may already be ahead of the data. Delete the entry in the {environment}-data-pull collection to reset it. |
| SQL syntax error | When using sqlOverride, test the query directly against the database first. |
