Is Capillaries ETL or ELT?
Capillaries is much more about the "T" than the "E" or "L":
- simple transformations and filtering can be performed as the data is being loaded, while
complex transformations are performed after the data is loaded
- the data is intended to be stored only until all transformations are complete and the result
files are produced
Capillaries is probably best described as "etlT".
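The "etlT" idea can be illustrated with a minimal sketch in plain Python. This is not Capillaries code or script syntax; the sample data, column names, and the functions load and transform are hypothetical and only show the split between light filtering at load time and heavier work afterwards.

```python
import csv
import io

# Hypothetical input; column names are illustrative only.
RAW = "id,price,qty\n1,10.0,2\n2,,5\n3,4.0,3\n"

def load(text):
    """Load rows, applying a light filter and type conversion (the small 't')."""
    for rec in csv.DictReader(io.StringIO(text)):
        if rec["price"] == "":  # drop incomplete rows while loading
            continue
        yield {"id": int(rec["id"]), "price": float(rec["price"]), "qty": int(rec["qty"])}

def transform(rows):
    """Heavier transformation applied to already-loaded data (the big 'T')."""
    return [{"id": r["id"], "total": r["price"] * r["qty"]} for r in rows]

loaded = list(load(RAW))
result = transform(loaded)
```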
Is Capillaries "low-code" or "no-code"?
Capillaries is definitely "some-code" because data transformation rules may include Go expressions
and/or complex Python formulas. The "code" part applies only to the business logic, while the
"orchestration" part does not require coding at all.
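To give a feel for the "some-code" part, here is a minimal sketch of the kind of row-level business rule that could be expressed as a Python formula. The function name, parameters, and discount tiers are hypothetical and chosen only for illustration; this is plain Python, not Capillaries script syntax.

```python
from decimal import Decimal

# Hypothetical tiered-discount rule: illustrative business logic only.
def order_discount(quantity: int, unit_price: Decimal) -> Decimal:
    """Return the discount amount for a single order row."""
    total = unit_price * quantity
    if quantity >= 100:
        rate = Decimal("0.10")
    elif quantity >= 10:
        rate = Decimal("0.05")
    else:
        rate = Decimal("0")
    # Round the result to cents.
    return (total * rate).quantize(Decimal("0.01"))
```

Logic like this is the only part that needs to be coded; how and when it is applied to each row is left to the orchestration.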
Why should I prefer Capillaries over my custom data pipelines?
Capillaries handles orchestration, scalability, and intermediate data storage, so you can focus
solely on the transformation logic.
Why should I prefer Capillaries over other distributed processing systems?
- it's free and open-source
- it can be quickly deployed on private or public VM or container infrastructure and disposed of
when no longer needed
- it's better than no-code systems because it allows you to perform complex Python calculations at
the row level
- it's better than code-heavy systems because it doesn't require deep knowledge of any programming
language
- with intermediate data stored in Cassandra tables, all data processing steps are extremely
transparent, making troubleshooting easier
What do I need to run Capillaries?
To set up a Capillaries environment, you need to provide:
- a Cassandra cluster
- a few VMs/containers running Capillaries workers
- a VM/container running Capillaries Webapi and UI
- a VM/container running RabbitMQ server
- monitoring and logging infrastructure (optional, but recommended)
To run data processing for a specific dataset, you need to provide data in files, served from an NFS
drive, HTTP(S) server, or S3 bucket, and either a browser to use the Capillaries UI or a REST API
client to call the Capillaries Webapi directly. After a Capillaries run is complete, you get a set
of files (NFS or S3) containing the transformed data.
Do I need to know SQL or a similar query language to define Capillaries transforms?
No. Capillaries implements some transformations that use relational algebra concepts like lookups,
grouping, and denormalization, but users specify these transformations declaratively in the
script file.
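For readers unfamiliar with these relational concepts, the following plain-Python sketch shows what a lookup, denormalization, and grouping do to data. It is only a conceptual illustration, not Capillaries script syntax; the sample rows and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data; names are illustrative only.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 30.0},
    {"order_id": 2, "customer_id": "c2", "amount": 12.5},
    {"order_id": 3, "customer_id": "c1", "amount": 7.5},
]
customers = {"c1": {"name": "Alice"}, "c2": {"name": "Bob"}}

# Lookup + denormalization: copy the customer name onto each order row.
denormalized = [
    {**o, "customer_name": customers[o["customer_id"]]["name"]} for o in orders
]

# Grouping: total order amount per customer.
totals = defaultdict(float)
for row in denormalized:
    totals[row["customer_name"]] += row["amount"]
```

In Capillaries, the equivalent operations are declared in the script file rather than written out as loops like these.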