
This reference lists every data type the pipeline supports, along with mappings to PostgreSQL and Spark types used during storage and transformation.

Type Reference

| Pipeline Type | Description | PostgreSQL Type | Spark Type |
| --- | --- | --- | --- |
| boolean | True or false | boolean | BooleanType |
| tinyint | 8-bit signed integer (-128 to 127) | smallint | ByteType |
| smallint | 16-bit signed integer (-32,768 to 32,767) | smallint | ShortType |
| int | 32-bit signed integer | integer | IntegerType |
| bigint | 64-bit signed integer | bigint | LongType |
| float | 32-bit IEEE 754 floating point | real | FloatType |
| double | 64-bit IEEE 754 floating point | double precision | DoubleType |
| decimal(p,s) | Fixed-precision number with p total digits and s fractional digits | numeric(p,s) | DecimalType(p,s) |
| string | Variable-length text, unbounded | text | StringType |
| varchar(n) | Variable-length text up to n characters | varchar(n) | StringType |
| char(n) | Fixed-length text of exactly n characters | char(n) | StringType |
| date | Calendar date without time | date | DateType |
| timestamp | Date and time with microsecond precision | timestamp | TimestampType |

Precision and Scale for Decimal

The decimal(p,s) type requires two parameters:
  • p (precision): total number of digits, range 1 to 38.
  • s (scale): number of digits to the right of the decimal point, range 0 to p.
Examples:
| Declaration | Stores | Max Value |
| --- | --- | --- |
| decimal(5,2) | Up to 5 digits, 2 after the decimal point | 999.99 |
| decimal(10,0) | Up to 10 integer digits, no fractional part | 9999999999 |
| decimal(18,6) | 18 total digits, 6 fractional | 999999999999.999999 |
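The Max Value column follows directly from p and s: the largest representable value is (10^p - 1) / 10^s. A quick illustrative check in Python (not part of the pipeline):

```python
from decimal import Decimal

def decimal_max(p: int, s: int) -> Decimal:
    """Largest value representable by decimal(p, s):
    p total digits, s of them after the decimal point."""
    if not (1 <= p <= 38) or not (0 <= s <= p):
        raise ValueError("precision must be 1-38 and 0 <= scale <= precision")
    return Decimal(10**p - 1) / Decimal(10**s)

print(decimal_max(5, 2))   # 999.99
print(decimal_max(10, 0))  # 9999999999
```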

Integer Type Selection

Choose the narrowest integer type that fits your data to reduce storage and improve Spark performance:
| Type | Byte Size | Range |
| --- | --- | --- |
| tinyint | 1 | -128 to 127 |
| smallint | 2 | -32,768 to 32,767 |
| int | 4 | -2,147,483,648 to 2,147,483,647 |
| bigint | 8 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
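Choosing the narrowest type amounts to comparing a column's observed minimum and maximum against these ranges. A small illustrative helper (not part of the pipeline API):

```python
# Signed integer ranges for each pipeline integer type, narrowest first.
RANGES = [
    ("tinyint",  -2**7,  2**7 - 1),
    ("smallint", -2**15, 2**15 - 1),
    ("int",      -2**31, 2**31 - 1),
    ("bigint",   -2**63, 2**63 - 1),
]

def narrowest_int_type(values):
    """Return the narrowest pipeline integer type covering every value."""
    lo, hi = min(values), max(values)
    for name, tmin, tmax in RANGES:
        if tmin <= lo and hi <= tmax:
            return name
    raise ValueError("values exceed bigint range")

print(narrowest_int_type([0, 42, 120]))         # tinyint
print(narrowest_int_type([-40000, 1_000_000]))  # int
```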

String Type Selection

| Type | Use When |
| --- | --- |
| string | Maximum length is unknown or varies widely |
| varchar(n) | A known upper bound exists and you want the database to enforce it |
| char(n) | Values are always exactly n characters (e.g., ISO country codes, fixed-format identifiers) |
All three string types map to StringType in Spark. The length constraint is enforced only at the PostgreSQL layer.
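Because Spark does not enforce the length constraint, it can be useful to validate lengths before writing. A sketch that roughly mirrors the PostgreSQL-side behavior (the helper and its error messages are hypothetical, not part of the pipeline):

```python
def check_string(value: str, declared: str) -> str:
    """Validate a value against string, varchar(n), or char(n),
    roughly mirroring PostgreSQL's length rules."""
    if declared == "string":
        return value                      # unbounded, nothing to enforce
    kind, n = declared.rstrip(")").split("(")
    n = int(n)
    if len(value) > n:
        raise ValueError(f"value longer than {kind}({n})")
    if kind == "char":
        return value.ljust(n)             # char(n) pads short values with spaces
    return value                          # varchar(n): stored as-is if it fits

print(check_string("US", "char(2)"))      # US
print(check_string("abc", "varchar(5)"))  # abc
```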

JSON and XML Special Types

Fields whose names end with _json or _xml receive special handling during ingestion:
| Suffix | Behavior |
| --- | --- |
| _json | The field value is treated as a nested JSON document. Stored as a document in MongoDB (default destination for JSON), or as a text column in PostgreSQL. No schema validation is applied to the nested content. |
| _xml | The field value is treated as an XML fragment. Stored as a text column in PostgreSQL or as a string in MongoDB. |
Declare these fields with type string in the schema:
{
  "fields": [
    { "name": "id", "type": "bigint" },
    { "name": "payload_json", "type": "string" },
    { "name": "metadata_xml", "type": "string" }
  ]
}
During ingestion, the pipeline detects the _json and _xml suffixes and preserves the raw content without attempting to parse it into individual columns. This is useful for semi-structured data that should be queried with JSON or XML functions downstream.
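The suffix detection described above can be sketched as follows (the routing labels are illustrative, not the pipeline's actual internals):

```python
def classify_field(name: str, value: str) -> dict:
    """Illustrative sketch of suffix-based routing: _json and _xml
    fields are passed through raw, never parsed into columns."""
    if name.endswith("_json"):
        return {"raw": value, "treat_as": "json-document"}
    if name.endswith("_xml"):
        return {"raw": value, "treat_as": "xml-fragment"}
    return {"raw": value, "treat_as": "scalar"}

print(classify_field("payload_json", '{"a": 1}')["treat_as"])  # json-document
print(classify_field("id", "42")["treat_as"])                  # scalar
```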

Type Coercion

During ingestion, the pipeline converts field values from strings to the declared type:
  1. Null or empty string — stored as an empty string. The destination database handles null conversion.
  2. Numeric types (int, bigint, float, double, decimal, etc.) — the value is parsed using Scala’s built-in conversion. If parsing fails (e.g., "abc" in an int column), the job fails with an error.
  3. String types (string, varchar, char) — stored as-is with no conversion.
  4. Date and timestamp — stored as the raw string value. The destination (PostgreSQL, Spark) handles date/timestamp parsing according to its own rules.
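The pipeline itself implements these rules in Scala; a minimal Python sketch of the same four rules (the function and its error behavior are illustrative):

```python
INTEGER_TYPES = {"tinyint", "smallint", "int", "bigint"}
FLOAT_TYPES = {"float", "double", "decimal"}

def coerce(value, declared: str):
    """Sketch of the coercion rules: empty stays empty, numerics must
    parse or the job fails, everything else passes through unchanged."""
    if value is None or value == "":
        return ""                              # rule 1: destination handles nulls
    base = declared.split("(")[0]              # decimal(5,2) -> decimal
    if base in INTEGER_TYPES or base in FLOAT_TYPES:
        try:                                   # rule 2: parse or fail
            return int(value) if base in INTEGER_TYPES else float(value)
        except ValueError:
            raise ValueError(f"cannot coerce {value!r} to {declared}")
    return value                               # rules 3-4: stored as-is

print(coerce("42", "int"))          # 42
print(coerce("", "bigint"))         # (empty string)
print(coerce("2024-01-01", "date")) # 2024-01-01
```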