Node
A node is a running process that manages and executes pipelines.
When a node starts, it will automatically attempt to connect to the platform, giving you a seamless way to manage and deploy pipelines through a web interface. However, using the platform is optional—you can also be manually manage pipelines via the node’s REST API.
Standalone vs. Managed Pipeline Execution
To understand the benefits of a node, let’s first consider how you can run
pipelines without one. You run a single pipeline directly from the command line
using the tenzir
binary:
This standalone execution mode of a pipeline is ideal for ad-hoc data
transformations akin to how one would use jq
, but with much broader data
handling capabilities.
For continuous and more dependable dataflow processing, you will quickly realize tphat you also need scheduled execution, automatic restarting, monitoring of warnings/errors, and more advanced execution capabilities, like real-time enrichment with contextual data or correlation with historical data.
This is where a node comes into play, offering a vehicle to execute one or more
pipelines in a managed fashion. You can spawn a node with the tenzir-node
binary or by running the Docker container that contains this binary: