This reference lists every data type the pipeline supports, along with mappings to PostgreSQL and Spark types used during storage and transformation.

Type Reference

| Pipeline Type | Description | PostgreSQL Type | Spark Type |
| --- | --- | --- | --- |
| boolean | True or false | boolean | BooleanType |
| tinyint | 8-bit signed integer (-128 to 127) | smallint | ByteType |
| smallint | 16-bit signed integer (-32,768 to 32,767) | smallint | ShortType |
| int | 32-bit signed integer | integer | IntegerType |
| bigint | 64-bit signed integer | bigint | LongType |
| float | 32-bit IEEE 754 floating point | real | FloatType |
| double | 64-bit IEEE 754 floating point | double precision | DoubleType |
| decimal(p,s) | Fixed-precision number with p total digits and s fractional digits | numeric(p,s) | DecimalType(p,s) |
| string | Variable-length text, unbounded | text | StringType |
| varchar(n) | Variable-length text up to n characters | varchar(n) | StringType |
| char(n) | Fixed-length text of exactly n characters | char(n) | StringType |
| date | Calendar date without time | date | DateType |
| timestamp | Date and time with microsecond precision | timestamp | TimestampType |
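The mapping above can be sketched as a small lookup. This is an illustrative helper, not a pipeline API: the function name `map_type` and the string-based return values are assumptions made for the example; parameterized types (decimal, varchar, char) carry their arguments through.

```python
import re

# Non-parameterized pipeline types and their (PostgreSQL, Spark) mappings,
# taken directly from the reference table above.
SIMPLE_TYPES = {
    "boolean":   ("boolean", "BooleanType"),
    "tinyint":   ("smallint", "ByteType"),
    "smallint":  ("smallint", "ShortType"),
    "int":       ("integer", "IntegerType"),
    "bigint":    ("bigint", "LongType"),
    "float":     ("real", "FloatType"),
    "double":    ("double precision", "DoubleType"),
    "string":    ("text", "StringType"),
    "date":      ("date", "DateType"),
    "timestamp": ("timestamp", "TimestampType"),
}

def map_type(decl: str) -> tuple[str, str]:
    """Return (postgresql_type, spark_type) for a pipeline type declaration."""
    if decl in SIMPLE_TYPES:
        return SIMPLE_TYPES[decl]
    m = re.fullmatch(r"decimal\((\d+),(\d+)\)", decl)
    if m:
        p, s = m.groups()
        return (f"numeric({p},{s})", f"DecimalType({p},{s})")
    m = re.fullmatch(r"(varchar|char)\((\d+)\)", decl)
    if m:
        kind, n = m.groups()
        return (f"{kind}({n})", "StringType")
    raise ValueError(f"unsupported type: {decl}")

print(map_type("decimal(10,2)"))  # → ('numeric(10,2)', 'DecimalType(10,2)')
```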

Precision and Scale for Decimal

The decimal(p,s) type requires two parameters:
  • p (precision): total number of digits, range 1 to 38.
  • s (scale): number of digits to the right of the decimal point, range 0 to p.
Examples:
| Declaration | Stores | Max Value |
| --- | --- | --- |
| decimal(5,2) | Up to 5 digits, 2 after the decimal | 999.99 |
| decimal(10,0) | Up to 10 integer digits, no fractional part | 9999999999 |
| decimal(18,6) | 18 total digits, 6 fractional | 999999999999.999999 |
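The digit-counting rule behind these examples can be checked with Python's standard decimal module. This is an illustrative sketch, not pipeline code; the helper name `fits_decimal` is an assumption.

```python
from decimal import Decimal

# Illustrative check: does a value fit a decimal(p,s) column?
# A value fits when it has at most s fractional digits and at most
# p - s integer digits, mirroring the table above.
def fits_decimal(value: str, p: int, s: int) -> bool:
    d = Decimal(value)
    _sign, digits, exponent = d.as_tuple()
    frac_digits = max(0, -exponent)               # digits after the point
    int_digits = max(0, len(digits) + exponent)   # digits before the point
    return frac_digits <= s and int_digits <= p - s

print(fits_decimal("999.99", 5, 2))   # True
print(fits_decimal("1000.00", 5, 2))  # False: 4 integer digits > p - s = 3
```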

Integer Type Selection

Choose the narrowest integer type that fits your data to reduce storage and improve Spark performance:
| Type | Size (bytes) | Range |
| --- | --- | --- |
| tinyint | 1 | -128 to 127 |
| smallint | 2 | -32,768 to 32,767 |
| int | 4 | -2,147,483,648 to 2,147,483,647 |
| bigint | 8 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
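Choosing the narrowest type can be automated by comparing a column's observed minimum and maximum against these ranges. A minimal sketch; the function name `narrowest_int_type` is an assumption, not a pipeline API:

```python
# Integer type ranges from the table above, narrowest first.
RANGES = [
    ("tinyint",  -2**7,  2**7 - 1),
    ("smallint", -2**15, 2**15 - 1),
    ("int",      -2**31, 2**31 - 1),
    ("bigint",   -2**63, 2**63 - 1),
]

def narrowest_int_type(lo: int, hi: int) -> str:
    """Return the narrowest integer type covering [lo, hi]."""
    for name, tmin, tmax in RANGES:
        if tmin <= lo and hi <= tmax:
            return name
    raise OverflowError("value range exceeds bigint")

print(narrowest_int_type(-100, 100))  # tinyint
print(narrowest_int_type(0, 200))     # smallint (200 > 127)
```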

String Type Selection

| Type | Use When |
| --- | --- |
| string | Maximum length is unknown or varies widely |
| varchar(n) | A known upper bound exists and you want the database to enforce it |
| char(n) | Values are always exactly n characters (e.g., ISO country codes, fixed-format identifiers) |
All three string types map to StringType in Spark. The length constraint is enforced only at the PostgreSQL layer.

JSON and XML Special Types

Fields whose names end with _json or _xml receive special handling during ingestion:
| Suffix | Behavior |
| --- | --- |
| _json | The field value is treated as a nested JSON document. It is stored as-is in a text column in PostgreSQL and as a StringType in Spark. No schema validation is applied to the nested content. |
| _xml | The field value is treated as an XML fragment. It is stored as-is in a text column in PostgreSQL and as a StringType in Spark. |
Declare these fields with type string in the schema:
```json
{
  "fields": [
    { "name": "id", "type": "bigint" },
    { "name": "payload_json", "type": "string" },
    { "name": "metadata_xml", "type": "string" }
  ]
}
```
During ingestion, the pipeline detects the _json and _xml suffixes and preserves the raw content without attempting to parse it into individual columns. This is useful for semi-structured data that should be queried with JSON or XML functions downstream.
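The suffix check described above amounts to a simple name test. A hedged sketch; the function name `ingestion_mode` and its return labels are assumptions made for illustration:

```python
# Sketch of the suffix-based dispatch: fields ending in _json or _xml
# are passed through as raw strings; everything else is parsed per its
# declared type.
def ingestion_mode(field_name: str) -> str:
    if field_name.endswith("_json"):
        return "raw-json"   # stored as-is in a text column, no validation
    if field_name.endswith("_xml"):
        return "raw-xml"    # stored as-is in a text column
    return "typed"          # parsed according to the declared type

print(ingestion_mode("payload_json"))  # raw-json
print(ingestion_mode("metadata_xml"))  # raw-xml
print(ingestion_mode("id"))            # typed
```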

Type Coercion

When a value cannot be parsed as the declared type, the pipeline applies these rules:
  1. Null or empty string — stored as NULL regardless of the target type.
  2. Numeric overflow — if a value exceeds the range of the declared integer type, ingestion records a data quality error for that row.
  3. Invalid format — a value like "abc" in an int column is rejected and recorded as a data quality error.
  4. Timestamp parsing — the pipeline accepts ISO-8601 strings, epoch milliseconds, and common date formats (yyyy-MM-dd, MM/dd/yyyy). Ambiguous formats are resolved using the dateFormat property in the pipeline configuration when present.
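Rules 1 through 3 can be sketched for a 32-bit int column as follows. This is an assumption-laden illustration: the real pipeline's error reporting will look different, but the decisions are the same.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

def coerce_int(raw):
    """Return (value, error) applying coercion rules 1-3 for an int column."""
    if raw is None or raw == "":
        return None, None                      # rule 1: stored as NULL
    try:
        value = int(raw)
    except ValueError:
        return None, f"invalid int: {raw!r}"   # rule 3: bad format
    if not (INT_MIN <= value <= INT_MAX):
        return None, f"overflow: {raw!r}"      # rule 2: out of range
    return value, None

print(coerce_int(""))            # (None, None)
print(coerce_int("42"))          # (42, None)
print(coerce_int("abc"))         # (None, "invalid int: 'abc'")
print(coerce_int("9999999999"))  # (None, "overflow: '9999999999'")
```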