Capillaries

It processes data

Capillaries takes care of scalability issues and intermediate data storage, so users can focus on data transforms and data quality control.

Capillaries fills the gap between
  • distributed, scalable data processing/integration solutions, and
  • the need to produce enriched, customer-ready, production-quality, human-curated data within SLA time limits

Highlights

Incremental computing

Allows splitting the whole data processing pipeline into separate runs that can be started independently and re-run if needed.
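As an illustration only, the idea of splitting a pipeline into independently started runs can be sketched as follows. This is not Capillaries code or its API: the node names, the `DEPENDENCIES` map, and the `execute_run` helper are all hypothetical, and the sketch assumes a run may start only when every dependency outside the run has already completed.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each node maps to the set of nodes it depends on.
DEPENDENCIES = {
    "load": set(),
    "enrich": {"load"},
    "validate": {"enrich"},
    "export": {"validate"},
}


def execute_run(run_nodes, completed, dependencies=DEPENDENCIES):
    """Execute one run (a subset of DAG nodes) and return the execution order.

    `completed` is the set of nodes finished by earlier runs; it is updated
    in place. Re-running a run simply recomputes its nodes.
    """
    run_nodes = set(run_nodes)
    for node in run_nodes:
        # Dependencies must be either part of this run or already completed.
        missing = dependencies[node] - run_nodes - completed
        if missing:
            raise ValueError(f"{node} is waiting for {sorted(missing)}")
    # Order the nodes inside the run by their in-run dependencies.
    in_run = {node: dependencies[node] & run_nodes for node in run_nodes}
    order = list(TopologicalSorter(in_run).static_order())
    completed.update(order)
    return order
```

With this sketch, a first run could cover `load` and `enrich`, and a second run covering `validate` and `export` could be started later, for example after someone has reviewed the intermediate results.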

Parallel processing

Splits large data volumes into smaller batches processed in parallel. Executes multiple data processing tasks (DAG nodes) in parallel.
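The batching idea can be shown with a minimal sketch. It is a generic illustration rather than the way Capillaries distributes work: `split_into_batches`, `process_batch`, and the `amount` field are hypothetical, and a local thread pool stands in for whatever workers a real deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(rows, batch_size):
    """Split a list of rows into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def process_batch(batch):
    """Hypothetical transform: total of the 'amount' field in one batch."""
    return sum(row["amount"] for row in batch)


def process_in_parallel(rows, batch_size, max_workers=4):
    """Process all batches concurrently; results keep the batch order."""
    batches = split_into_batches(rows, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_batch, batches))
```

Because each batch is processed independently, a failed or slow batch can be retried without touching the others.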

Operator interaction

Allows human data validation for selected data processing stages.

Fault tolerance

Survives temporary underlying database connectivity issues and processing node failures.
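One common way to ride out temporary connectivity problems is to retry with exponential backoff. The sketch below shows that general technique under stated assumptions; it is not taken from Capillaries, and `TransientError` and `call_with_retries` are hypothetical names.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff.

    The delay doubles after each failed attempt. The last failure is
    re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so that the backoff schedule can be checked without actually waiting.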

Works with structured data artifacts

Consumes and produces delimited text files and uses database tables internally. Provides ETL/ELT capabilities. Implements a subset of relational algebra.

Use scenarios

Capable of processing large amounts of data within SLA time limits, making efficient use of powerful computational (hardware, VMs, containers) and storage (Cassandra) resources, with or without human monitoring, validation, or intervention.