Glossary
This page defines central terms in the Tenzir ecosystem.
App
Web user interface to access the platform at app.tenzir.com.
The app is a web application that partially runs in the user’s browser. It is written in Svelte.
Catalog
Maintains partition ownership and metadata.
The catalog is a component in the node that owns the partitions, keeps metadata about them, and maintains a set of sparse secondary indexes to identify relevant partitions for a given query. It offers a transactional interface for adding and removing partitions.
Connector
Manages chunks of raw bytes by interacting with a resource.
A connector is either a loader that acquires bytes from a resource, or a saver that sends bytes to a resource. Loaders are implemented as ordinary operators prefixed with load_*, while savers are prefixed with save_*.
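As an illustration, a pipeline that uses an explicit loader and saver might look like this (the file paths are hypothetical, and operator availability may vary by version):

```tql
// Acquire bytes from a file, parse them into events,
// print them back to bytes, and write them to another file.
load_file "/tmp/input.json"
read_json
write_json
save_file "/tmp/output.json"
```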
Context
A stateful object used for in-band enrichment.
Contexts come in various types, such as a lookup table, Bloom filter, and GeoIP database. They live inside a node and you can enrich with them in other pipelines.
- Read more about enrichment
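As a sketch, a pipeline might enrich events against a previously created lookup-table context. The context name `threats`, the field `src_ip`, and the exact `context::enrich` signature are assumptions and may differ in your version:

```tql
// Enrich flow events with a lookup-table context named "threats".
subscribe "flows"
context::enrich "threats", key=src_ip
publish "enriched-flows"
```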
Destination
A pipeline ending with an output operator, preceded by a subscribe input operator.
- Learn more about pipelines
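For example, a destination might subscribe to a topic and write matching events to a file (the topic name and path are hypothetical):

```tql
// Drain the "alerts" topic into a JSON file.
subscribe "alerts"
write_json
save_file "/tmp/alerts.json"
```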
Edge Storage
The indexed storage that pipelines can use at the node. Every node has a light-weight storage engine for importing and exporting events. You must mount the storage into the node so that pipelines can use it via the import and export operators. The storage engine comes with a catalog that tracks partitions and keeps sparse indexes to accelerate historical queries.
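As a sketch, one pipeline can persist events into the edge storage with import while another later retrieves them with export. The schema name in the filter is hypothetical:

```tql
// Query the edge storage for previously imported events.
export
where @name == "suricata.alert"
```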
Event
A record of typed data. Think of events as JSON objects, but with a richer type system that also has timestamps, durations, IP addresses, and more. Events have fields, and the top-level record type describing an event’s shape is its schema.
- Learn more about pipelines
Format
Translates between bytes and events.
A format is implemented by a parser that converts bytes to events, by a printer that converts events to bytes, or both. Example formats are JSON, CEF, or PCAP.
- See available operators for parsing
- See available operators for printing
- See available functions for parsing
- See available functions for printing
Function
Computes something over a value in an event. Unlike operators, which work on streams of events, functions only act on single values.
- See available functions
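For example, a function call can assign a computed value to a field of each event, while the surrounding operators handle the stream. The input path is hypothetical; a built-in now() function is assumed:

```tql
from "/tmp/events.json"
// Function call: computes a single value per event.
processed_at = now()
```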
Index
Optional data structures for accelerating queries involving the node’s edge storage.
Tenzir features in-memory sparse indexes that point to partitions.
Input
An operator that only produces data, without consuming anything.
- Learn more about pipelines
Integration
A set of pipelines to integrate with a third-party product.
An integration describes use cases in combination with a specific product or tool. Depending on the depth of the integration, this may require configuration on either end.
Library
A collection of packages.
Our community library is freely available at GitHub.
Loader
A connector that acquires bytes.
A loader is the dual to a saver. It has no input and only performs a side effect that acquires bytes. Use a loader implicitly with the from operator or explicitly with the load_* operators.
- Learn more about pipelines
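As a sketch, the explicit form pairs a load_* operator with a parser (the URL is hypothetical):

```tql
// Explicitly acquire bytes over HTTP, then parse them as JSON.
load_http "https://example.org/data.json"
read_json
```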
Node
A host for pipelines and storage reachable over the network.
The tenzir-node binary starts a node in a dedicated server process that listens on TCP port 5158.
- Deploy a node
- Use the REST API to manage a node
- Import into a node
- Export from a node
Metrics
Runtime statistics about the node and pipeline execution.
OCSF
The Open Cybersecurity Schema Framework (OCSF) is a cross-vendor schema for security event data. Our community library contains packages that map data sources to OCSF.
Operator
The building block of a pipeline.
An operator is an input, a transformation, or an output.
- See all available operators
Output
An operator consuming data, without producing anything.
- Learn more about pipelines
PaC
The acronym PaC stands for Pipelines as Code. It adapts the idea of Infrastructure as Code (IaC), with pipelines representing the (data) infrastructure that is provisioned as code.
- Learn how to provision pipelines as code.
Package
A collection of pipelines and contexts.
- Read more about packages
- Learn how to write a package
Parser
A bytes-to-events operator.
A parser is the dual to a printer. You use a parser implicitly in the from operator, or via the read_* operators. There also exist functions for applying parsers to string values.
- Learn more about pipelines
- See available operators for parsing
- See available functions for parsing
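For instance, a parser can turn raw syslog bytes into structured events (the file path is hypothetical):

```tql
// Parse raw syslog lines into structured events.
load_file "/tmp/firewall.log"
read_syslog
```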
Partition
The horizontal scaling unit of the storage attached to a node.
A partition contains the raw data and optionally a set of indexes. Supported formats are Parquet or Feather.
- Control the partition size
- Configure catalog and partition indexes
- Select the store format
- Adjust the store compression
- Rebuild partitions
Pipeline
Combines a set of operators into a dataflow graph.
- Learn more about pipelines
- Run a pipeline
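As an illustration, a pipeline chains an input, several transformations, and an output (the field name and path are hypothetical):

```tql
// Input → transformations → output.
export
where severity == "high"
head 10
write_json
save_file "/tmp/top-alerts.json"
```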
Platform
The control plane for nodes and pipelines, accessible at app.tenzir.com.
- Understand the Tenzir architecture
Printer
An events-to-bytes operator.
A printer is the dual to a parser; it translates events into bytes. Use a printer implicitly in the to operator.
- Learn more about pipelines
- See available operators for printing
- See available functions for printing
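For example, a printer turns stored events back into bytes before a saver writes them out (the output path is hypothetical):

```tql
// Print events as newline-delimited JSON and save them.
export
write_ndjson
save_file "/tmp/export.ndjson"
```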
Saver
A connector that emits bytes.
A saver is the dual to a loader. It has no output and only performs a side effect that emits bytes. Use a saver implicitly with the to operator or explicitly with the save_* operators.
- Learn more about pipelines
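As a sketch, the explicit form pairs a printer with a save_* operator (the topic and path are hypothetical):

```tql
// Print subscribed events as CSV and emit the bytes to a file.
subscribe "metrics"
write_csv
save_file "/tmp/metrics.csv"
```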
Schema
A top-level record type of an event.
Source
A pipeline starting with an input operator, followed by a publish output operator.
- Learn more about pipelines
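For example, a source can fetch events from a remote endpoint and hand them to other pipelines via publish (the URL and topic are hypothetical):

```tql
// Fetch a feed and make it available under the "feed" topic.
from "https://example.org/feed.json"
publish "feed"
```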
TQL
An acronym for Tenzir Query Language.
TQL is the language in which users write pipelines.
- Learn more about the language
Transformation
An operator that both consumes input and produces output.
- Learn more about pipelines
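For example, where and select are transformations that filter and reshape the event stream (the field names and topics are hypothetical):

```tql
// Filter and project events mid-pipeline.
subscribe "flows"
where dest_port == 443
select src_ip, dest_ip, dest_port
publish "https-flows"
```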