Types
Tenzir’s type system is a superset of JSON. That is, every valid JSON object is
a valid Tenzir value, but there also additional types available, such as ip
and subnet
.
Terminology
Section titled “Terminology”The diagram below illustrates the type system at a glance:
Basic Types
Section titled “Basic Types”Basic types are stateless types with a static structure. The following basic types exist:
Type | Description |
---|---|
none | Denotes an absent or invalid value |
bool | A boolean value |
int64 | A 64-bit signed integer |
uint64 | A 64-bit unsigned integer |
double | A 64-bit double (IEEE 754) |
duration | A time span (nanosecond granularity) |
time | A time point (nanosecond granularity) |
string | A UTF-8 encoded string |
blob | An arbitrary sequence of bytes |
ip | An IPv4 or IPv6 address |
subnet | An IPv4 or IPv6 subnet |
Complex Types
Section titled “Complex Types”Complex types are stateful types that carry additional runtime information.
Enumeration
Section titled “Enumeration”The enum
type is a list of predefined string values. It comes in handy for
low-cardinality values from a fixed set of options.
Tenzir implements an enum
as an Arrow Dictionary.
The list
type is an ordered sequence of values with a fixed element type.
Lists have zero or more elements.
Record
Section titled “Record”The record
type consists of an ordered sequence fields, each of which have a
name and type. Records must have at least one field.
The field name is an arbitrary UTF-8 string.
The field type is any Tenzir type.
Optionality
Section titled “Optionality”All types are optional in that there exists an additional null
data point in
every value domain. Consequently, Tenzir does not have a special type to
indicate optionality.
Attributes
Section titled “Attributes”Every type has zero or more attributes, which are free-form key-value pairs to enrich types with custom semantics.
Comparison to Arrow
Section titled “Comparison to Arrow”All Tenzir types have a lossless mapping to Arrow types, however, not all Arrow types have a Tenzir equivalent. As a result, it is currently not yet possible to import arbitrary Arrow data. In the future, we plan to extend our support for Arrow-native types and also offer conversion options for seamless data handover.
Tenzir has a few domain-specific types that map to Arrow extension
types. These are currently enum
, ip
, and subnet
. Tenzir
and Arrow attach type metadata to different entities: Tenzir attaches metadata
to a type instance, whereas Arrow attaches metadata to a schema or record field.