Type Reference
The PostgreSQL Type column shows the literal type Datris emits in the generatedCREATE TABLE DDL. Datris remaps only a few types to their canonical Postgres aliases (tinyint/smallint → int2, float → float4, double → float8, string → text); every other type is emitted verbatim.
| Pipeline Type | Description | PostgreSQL Type | Spark Type |
|---|---|---|---|
boolean | True or false | boolean | BooleanType |
tinyint | 8-bit signed integer (-128 to 127) | int2 | ByteType |
smallint | 16-bit signed integer (-32768 to 32767) | int2 | ShortType |
int | 32-bit signed integer | int | IntegerType |
bigint | 64-bit signed integer | bigint | LongType |
float | 32-bit IEEE 754 floating point | float4 | FloatType |
double | 64-bit IEEE 754 floating point | float8 | DoubleType |
decimal(p,s) | Fixed-precision number with p total digits and s fractional digits | decimal(p,s) | DecimalType(p,s) |
string | Variable-length text, unbounded | text | StringType |
varchar(n) | Variable-length text up to n characters | varchar(n) | StringType |
char(n) | Fixed-length text of exactly n characters | char(n) | StringType |
date | Calendar date without time | date | DateType |
timestamp | Date and time with microsecond precision | timestamp | TimestampType |
Precision and Scale for Decimal
Thedecimal(p,s) type requires two parameters:
- p (precision): total number of digits, range 1 to 38.
- s (scale): number of digits to the right of the decimal point, range 0 to p.
| Declaration | Stores | Max Value |
|---|---|---|
decimal(5,2) | Up to 5 digits, 2 after the decimal | 999.99 |
decimal(10,0) | Up to 10 integer digits, no fractional part | 9999999999 |
decimal(18,6) | 18 total digits, 6 fractional | 999999999999.999999 |
Integer Type Selection
Choose the narrowest integer type that fits your data to reduce storage and improve Spark performance:| Type | Byte Size | Range |
|---|---|---|
tinyint | 1 | -128 to 127 |
smallint | 2 | -32,768 to 32,767 |
int | 4 | -2,147,483,648 to 2,147,483,647 |
bigint | 8 | -9.2 x 10^18 to 9.2 x 10^18 |
String Type Selection
| Type | Use When |
|---|---|
string | Maximum length is unknown or varies widely |
varchar(n) | A known upper bound exists and you want the database to enforce it |
char(n) | Values are always exactly n characters (e.g., ISO country codes, fixed-format identifiers) |
StringType in Spark. The length constraint is enforced only at the PostgreSQL layer.
JSON and XML Special Types
JSON and XML pipelines are semi-structured: the source schema must contain exactly one field, named literally_json or _xml (matching the source file type), with type string. The validator rejects any schema with a different field count or a different field name.
| Field name | Behavior |
|---|---|
_json | The whole record is treated as a nested JSON document. Stored as a document in MongoDB (default destination for JSON), or as a native json column in PostgreSQL. No schema validation is applied to the nested content. |
_xml | The whole record is treated as an XML document. Stored as a native xml column in PostgreSQL, or as a string in MongoDB. |
string in the schema:
_xml field. During ingestion, the pipeline preserves the raw content without attempting to parse it into individual columns. This is useful for semi-structured data that should be queried with JSON or XML functions downstream.
Type Coercion
During ingestion, the pipeline converts field values from strings to the declared type:- Null or empty string — passed through unchanged (no type conversion is applied). The Postgres
COPYlayer then converts an empty field toNULL. - Numeric types (
int,bigint,float,double,decimal, etc.) — the value is parsed using Scala’s built-in conversion.decimalis parsed viatoDoublein the row-map path. If parsing fails (e.g.,"abc"in anintcolumn), the job fails with an error. - String types (
string,varchar,char) — stored as-is with no conversion. - Date and timestamp — stored as the raw string value. The destination (PostgreSQL, Spark) handles date/timestamp parsing according to its own rules.
