## Type Reference

This reference lists every data type the pipeline supports, along with the PostgreSQL and Spark types each one maps to during storage and transformation.
| Pipeline Type | Description | PostgreSQL Type | Spark Type |
|---|---|---|---|
| boolean | True or false | boolean | BooleanType |
| tinyint | 8-bit signed integer (-128 to 127) | smallint | ByteType |
| smallint | 16-bit signed integer (-32,768 to 32,767) | smallint | ShortType |
| int | 32-bit signed integer | integer | IntegerType |
| bigint | 64-bit signed integer | bigint | LongType |
| float | 32-bit IEEE 754 floating point | real | FloatType |
| double | 64-bit IEEE 754 floating point | double precision | DoubleType |
| decimal(p,s) | Fixed-precision number with p total digits and s fractional digits | numeric(p,s) | DecimalType(p,s) |
| string | Variable-length text, unbounded | text | StringType |
| varchar(n) | Variable-length text up to n characters | varchar(n) | StringType |
| char(n) | Fixed-length text of exactly n characters | char(n) | StringType |
| date | Calendar date without time | date | DateType |
| timestamp | Date and time with microsecond precision | timestamp | TimestampType |
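As an illustration, a schema built from these types translates into a Spark StructType under the mappings above. The field names and schema below are hypothetical, not part of the pipeline's API:

```scala
import org.apache.spark.sql.types._

// A hypothetical "orders" schema expressed with the Spark types
// from the table above. Names are illustrative only.
val ordersSchema = StructType(Seq(
  StructField("order_id", LongType),             // bigint
  StructField("quantity", IntegerType),          // int
  StructField("unit_price", DecimalType(10, 2)), // decimal(10,2)
  StructField("sku", StringType),                // varchar(32) maps to StringType
  StructField("shipped", BooleanType),           // boolean
  StructField("created_at", TimestampType)       // timestamp
))
```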
## Precision and Scale for Decimal
The decimal(p,s) type requires two parameters:
- p (precision): total number of digits, range 1 to 38.
- s (scale): number of digits to the right of the decimal point, range 0 to p.
Examples:
| Declaration | Stores | Max Value |
|---|---|---|
| decimal(5,2) | Up to 5 digits, 2 after the decimal | 999.99 |
| decimal(10,0) | Up to 10 integer digits, no fractional part | 9999999999 |
| decimal(18,6) | 18 total digits, 6 fractional | 999999999999.999999 |
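To sanity-check whether a literal fits a given decimal(p,s), the arithmetic is: at most p - s digits before the decimal point and at most s after. A minimal Scala sketch, not pipeline code:

```scala
import java.math.BigDecimal

// Does a literal fit decimal(p,s)? Total digits are precision();
// fractional digits are scale(); integer digits are the difference.
def fitsDecimal(value: String, p: Int, s: Int): Boolean = {
  val d = new BigDecimal(value)
  d.scale <= s && (d.precision - d.scale) <= (p - s)
}

fitsDecimal("999.99", 5, 2)  // true: 3 integer digits, 2 fractional
fitsDecimal("1000.00", 5, 2) // false: 4 integer digits exceed p - s = 3
fitsDecimal("1.234", 5, 2)   // false: 3 fractional digits exceed s = 2
```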
## Integer Type Selection
Choose the narrowest integer type that fits your data to reduce storage and improve Spark performance:
| Type | Byte Size | Range |
|---|---|---|
| tinyint | 1 | -128 to 127 |
| smallint | 2 | -32,768 to 32,767 |
| int | 4 | -2,147,483,648 to 2,147,483,647 |
| bigint | 8 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
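These sizes line up with Spark's own per-value size accounting, which you can confirm directly. A small illustrative snippet, not pipeline code:

```scala
import org.apache.spark.sql.types._

// Spark reports the per-value storage footprint of each integral
// type, matching the byte sizes in the table above.
Seq(ByteType, ShortType, IntegerType, LongType).foreach { t =>
  println(s"${t.simpleString}: ${t.defaultSize} byte(s)")
}
// tinyint: 1 byte(s)
// smallint: 2 byte(s)
// int: 4 byte(s)
// bigint: 8 byte(s)
```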
## String Type Selection
| Type | Use When |
|---|---|
| string | Maximum length is unknown or varies widely |
| varchar(n) | A known upper bound exists and you want the database to enforce it |
| char(n) | Values are always exactly n characters (e.g., ISO country codes, fixed-format identifiers) |
All three string types map to StringType in Spark. The length constraint is enforced only at the PostgreSQL layer.
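Because the constraint lives only in PostgreSQL, an overlong value surfaces as an error at write time. If you prefer to catch overflows during the Spark stage, one option is a pre-write check along these lines; the helper below is a hypothetical sketch, not part of the pipeline:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, length}

// Hypothetical pre-write guard: fail fast in Spark if any value
// would violate a varchar(n) bound at the PostgreSQL layer.
def assertMaxLength(df: DataFrame, column: String, n: Int): Unit = {
  val overflow = df.filter(length(col(column)) > n).count()
  require(overflow == 0, s"$overflow rows exceed varchar($n) in '$column'")
}
```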
## JSON and XML Special Types
Fields whose names end with _json or _xml receive special handling during ingestion:
| Suffix | Behavior |
|---|---|
| _json | The field value is treated as a nested JSON document. Stored as a document in MongoDB (the default destination for JSON), or as a text column in PostgreSQL. No schema validation is applied to the nested content. |
| _xml | The field value is treated as an XML fragment. Stored as a text column in PostgreSQL or as a string in MongoDB. |
Declare these fields with type string in the schema:
```json
{
  "fields": [
    { "name": "id", "type": "bigint" },
    { "name": "payload_json", "type": "string" },
    { "name": "metadata_xml", "type": "string" }
  ]
}
```
During ingestion, the pipeline detects the _json and _xml suffixes and preserves the raw content without attempting to parse it into individual columns. This is useful for semi-structured data that should be queried with JSON or XML functions downstream.
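Note that the rule is a check on how the field name ends, not a substring search. A minimal sketch of the check as described above, not the pipeline's actual implementation:

```scala
// Fields ending in _json or _xml are passed through untouched
// instead of being parsed into individual columns.
def isRawDocumentField(name: String): Boolean =
  name.endsWith("_json") || name.endsWith("_xml")

isRawDocumentField("payload_json") // true
isRawDocumentField("json_payload") // false: suffix only, not substring
```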
## Type Coercion
During ingestion, the pipeline converts field values from strings to the declared type:
- Null or empty string: stored as an empty string. The destination database handles null conversion.
- Numeric types (int, bigint, float, double, decimal, etc.): the value is parsed using Scala's built-in conversion. If parsing fails (e.g., "abc" in an int column), the job fails with an error.
- String types (string, varchar, char): stored as-is with no conversion.
- Date and timestamp: stored as the raw string value. The destination (PostgreSQL or Spark) handles date/timestamp parsing according to its own rules.
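A minimal Scala sketch of these rules, assuming raw values arrive as strings; it mirrors the description above rather than the pipeline's actual code:

```scala
// Coerce a raw string to the declared pipeline type. Scala's numeric
// conversions throw on malformed input (e.g. "abc".toInt), which is
// what causes the job to fail with an error.
def coerce(raw: String, declaredType: String): Any =
  if (raw == null || raw.isEmpty) "" // stored as empty string
  else declaredType match {
    case "tinyint"  => raw.toByte
    case "smallint" => raw.toShort
    case "int"      => raw.toInt
    case "bigint"   => raw.toLong
    case "float"    => raw.toFloat
    case "double"   => raw.toDouble
    case t if t.startsWith("decimal") => BigDecimal(raw)
    case _ => raw // string, varchar, char, date, timestamp: kept as-is
  }
```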