Is Capillaries ETL or ELT?
Capillaries is much more about the "T" (Transform) than about the "E" (Extract) or "L" (Load):
- Simple filters and field-level transformations can occur during loading
- More complex transformations are performed after the data is fully loaded
- Data is stored only temporarily, just long enough to complete all transformations and output the results
Capillaries is probably best described as "etlT".
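
The split between load-time and post-load work can be illustrated with a minimal Python sketch. This is not Capillaries code or configuration: the sample data, the field names, and the functions `load_rows` and `add_share_of_total` are hypothetical, and an in-memory list stands in for the temporary storage.

```python
import csv
import io

# Hypothetical input data, used only for this illustration.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,-3.00
3,alice,4.25
"""

def load_rows(text):
    """Load step: apply a simple filter and field-level conversions per row."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        amount = float(rec["amount"])          # field-level transformation
        if amount <= 0:                        # simple filter during loading
            continue
        rows.append({
            "order_id": int(rec["order_id"]),
            "customer": rec["customer"].upper(),
            "amount": amount,
        })
    return rows

def add_share_of_total(rows):
    """Post-load step: needs the grand total, so all rows must be loaded first."""
    total = sum(row["amount"] for row in rows)
    return [{**row, "share": row["amount"] / total} for row in rows]

loaded = load_rows(RAW_CSV)            # temporary, intermediate data
result = add_share_of_total(loaded)    # final output
for row in result:
    print(row)
```

The filter and type conversions need only one row at a time, so they can run while loading. The share calculation depends on every row, which is why it belongs to the post-load stage.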
Is Capillaries "low-code" or "no-code"?
Capillaries is definitely "some-code" because data transformation rules may involve Go expressions and/or complex Python formulas. The "code" part applies only to the business logic, while the "orchestration" part does not require any coding at all.
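
To give a sense of what the "code" part might look like, here is a hypothetical row-level Python formula. The function name, its parameters, and the tax rates are invented for illustration. How such a function is referenced from a Capillaries script is not shown here, and the loop below only simulates applying it to each row.

```python
def taxed_amount(amount: float, country: str) -> float:
    """Hypothetical business rule: add a country-specific tax to an amount."""
    rates = {"US": 0.07, "CA": 0.05}    # made-up rates for the example
    rate = rates.get(country, 0.0)      # unknown countries are not taxed
    return round(amount * (1 + rate), 2)

# Sample rows; in a real pipeline these would come from loaded data.
rows = [
    {"amount": 100.0, "country": "US"},
    {"amount": 200.0, "country": "CA"},
    {"amount": 50.0, "country": "FR"},
]

# Apply the formula to every row independently.
for row in rows:
    row["taxed_amount"] = taxed_amount(row["amount"], row["country"])
    print(row)
```

The function contains only business logic: it knows nothing about where rows come from, how they are distributed across workers, or where results are written.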
Why choose Capillaries over custom data pipelines?
Capillaries handles orchestration, scalability, and intermediate data storage, so you can focus entirely on your transformation logic.
Why choose Capillaries over other distributed processing systems?
- it's free and open-source
- it's fast to deploy on any VM/container environment (and easy to tear down)
- it's better than no-code systems because it allows you to perform complex Python calculations at the row level
- it's better than code-heavy systems because it doesn't require a thorough knowledge of any programming language
- with all intermediate data stored in Cassandra tables, data processing steps are extremely transparent, making troubleshooting easier
What do I need to run Capillaries?
To set up Capillaries, you will need:
- a Cassandra cluster
- a few VMs/containers running Capillaries workers
- a VM/container running Capillaries Webapi and UI
- a VM/container running RabbitMQ server
- monitoring and logging infrastructure (optional, but recommended)
To run data processing jobs for a specific dataset, you need configuration and data files served from an NFS drive, HTTP(S) server, or S3 bucket, plus either a browser to use the Capillaries UI or a REST API client to call Capillaries Webapi directly.
After a Capillaries run is complete, the output files containing the transformed data are available on NFS or S3.
Do I need to know SQL or a similar query language to define Capillaries transforms?
No. Capillaries supports transformations inspired by relational algebra (e.g., lookups, grouping, and denormalization), but they are declared in a JSON script, not written as SQL.
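
For readers unfamiliar with these terms, the sketch below shows what a lookup, denormalization, and grouping mean, using plain Python on in-memory data. It does not use Capillaries or its JSON script format. The tables, field names, and variables are hypothetical.

```python
from collections import defaultdict

# Hypothetical reference table, keyed by customer id.
customers = {
    1: {"name": "Alice", "country": "US"},
    2: {"name": "Bob", "country": "CA"},
}

# Hypothetical fact table.
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 2, "amount": 5.0},
    {"order_id": 12, "customer_id": 1, "amount": 7.5},
]

# Lookup + denormalization: find each order's customer and copy the
# customer fields onto the order row, producing one wide row per order.
denormalized = [
    {**order, **customers[order["customer_id"]]}
    for order in orders
    if order["customer_id"] in customers
]

# Grouping: aggregate order amounts per country.
totals = defaultdict(float)
for row in denormalized:
    totals[row["country"]] += row["amount"]
totals_by_country = dict(totals)

print(denormalized)
print(totals_by_country)
```

In SQL, the same result would take a join and a `GROUP BY`. The point of the answer above is that Capillaries expresses operations of this kind declaratively rather than through a query language.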