Capillaries: scale up vs scale out

2024-08-10

It's all about cost and scalability

This blog post continues the scalability discussion started earlier. There are a few technical changes this time:

Cassandra nodes use local NVMe drives for storage (RAM drives were used in past experiments)
Daemon instances are now 4x less powerful than Cassandra nodes (previously, they were 2x less powerful)
Graviton-based instances are used (previously, AMD instances were used)

This time, the focus is on scaling the deployment up (using more powerful instances) and out (using more instances), and comparing the results.

Deployments and pricing

14 Capillaries deployments were tested. Each deployment consists of:

4/8/16/32 Cassandra nodes
4/8/16/32 Capillaries Daemon instances
one RabbitMQ server
one Prometheus server
one Bastion host (running Capillaries UI and Webapi)

The pricing table for all used deployments is below. All AWS instance prices are for us-east-1 as of 2024. All instances were running Ubuntu 24.04. Scaling up is about the number of cores, color-coded. Scaling out is about the number of nodes.

Deployment flavor	AWS instance					4 Cassandra nodes		8 Cassandra nodes		16 Cassandra nodes		32 Cassandra nodes
	Cassandra	Daemon	Rabbitmq	Prometheus	Bastion	Total cores	Hourly cost	Total cores	Hourly cost	Total cores	Hourly cost	Total cores	Hourly cost
aws.arm64.c7g.8	c7gd.2xlarge	c7g.large	c7g.medium	c7g.medium	c7g.large	43	$1.8491	83	$3.5899	163	$7.0715	323	$14.0347
aws.arm64.c7g.16	c7gd.4xlarge	c7g.xlarge	c7g.medium	c7g.medium	c7g.large	83	$3.5895	163	$7.0707	323	$14.0331	643	$27.9579
aws.arm64.c7g.32	c7gd.8xlarge	c7g.2xlarge	c7g.medium	c7g.medium	c7g.large	163	$7.0703	323	$14.0323	643	$27.9563
aws.arm64.c7g.64	c7gd.16xlarge	c7g.4xlarge	c7g.medium	c7g.medium	c7g.large	323	$14.0327	643	$27.9571	1283	$55.8059
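
The hourly costs in the pricing table look linear in the number of nodes: a fixed part (presumably RabbitMQ, Prometheus and the bastion host) plus a per-node part (one Cassandra node and one Daemon instance). Here is a minimal sketch, assuming that linear model holds, that recovers both parts from two columns of the aws.arm64.c7g.8 row and checks them against the other columns. The function name `split_hourly_cost` is made up for this illustration.

```python
# Hourly costs from the aws.arm64.c7g.8 row of the pricing table,
# keyed by the number of Cassandra nodes (the Daemon count is the same).
HOURLY_COST_C7G_8 = {4: 1.8491, 8: 3.5899, 16: 7.0715, 32: 14.0347}


def split_hourly_cost(costs, n_small=4, n_large=8):
    """Assume cost = fixed + nodes * per_node and solve for both terms."""
    per_node = (costs[n_large] - costs[n_small]) / (n_large - n_small)
    fixed = costs[n_small] - n_small * per_node
    return fixed, per_node


fixed, per_node = split_hourly_cost(HOURLY_COST_C7G_8)
print(f"fixed: ${fixed:.4f}/h, per Cassandra+Daemon pair: ${per_node:.4f}/h")

# Compare the linear model with every column of the same table row.
for nodes, table_cost in HOURLY_COST_C7G_8.items():
    predicted = fixed + nodes * per_node
    print(f"{nodes:>2} nodes: predicted ${predicted:.4f}, table ${table_cost:.4f}")
```

Under this assumption, the fixed part is roughly $0.11 per hour, so almost the entire bill scales with the number of Cassandra/Daemon pairs.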

portfolio_bigtest runs

These are the timings for the full run of the portfolio_bigtest in different test deployments. For those who missed previous blog posts, this test calculates detailed yearly performance for 996 stock portfolios using 14,683,696 transactions and 713,800 end-of-month holding records.

Deployment	Cores per Cassandra node	Cassandra nodes	Run time, s	Cost of run, $
aws.arm64.c7g.8	8	4	1408	$0.72
		8	776	$0.77
		16	469	$0.92
		32	301	$1.17
aws.arm64.c7g.16	16	4	634	$0.63
		8	344	$0.68
		16	263	$1.03
		32	214	$1.66
aws.arm64.c7g.32	32	4	336	$0.66
		8	231	$0.90
		16	179	$1.39
aws.arm64.c7g.64	64	4	236	$0.92
		8	147	$1.14
		16	115	$1.78
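
The "Cost of run" column is consistent with multiplying the run time by the deployment's hourly cost from the pricing table. A minimal sketch of that arithmetic, using three rows from the tables above (the helper name `run_cost` is an assumption made for this example):

```python
def run_cost(run_time_s: float, hourly_cost: float) -> float:
    """Cost of one run in dollars: run time in hours times the hourly rate."""
    return run_time_s / 3600.0 * hourly_cost


# (deployment flavor, Cassandra nodes) -> (run time in s, hourly cost in $)
SAMPLE_RUNS = {
    ("aws.arm64.c7g.8", 4): (1408, 1.8491),
    ("aws.arm64.c7g.16", 4): (634, 3.5895),
    ("aws.arm64.c7g.64", 16): (115, 55.8059),
}

for (flavor, nodes), (run_time_s, hourly_cost) in SAMPLE_RUNS.items():
    cost = run_cost(run_time_s, hourly_cost)
    print(f"{flavor}, {nodes} nodes: ${cost:.2f}")
```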

Below is the CPU usage diagram for the 16-node aws.arm64.c7g.64 run. The previous blog post discusses the CPU usage patterns of portfolio_bigtest in detail. It is worth noting that during this run, Cassandra writes peaked at around 2.3 million writes per second.

Run times

After creating a pivot table from the raw timing series, we get this summary. "Theoretical" measurements are the values that we would expect for a perfectly scalable (up and out) setup, using the "4 nodes" x "8 cores" measurement of 1408 seconds as the starting point.

Cassandra nodes	Cores per Cassandra node
	8 (actual)	8 (theoretical)	16 (actual)	16 (theoretical)	32 (actual)	32 (theoretical)	64 (actual)	64 (theoretical)
4	1408	1408	634	704	336	352	236	176
8	776	704	344	352	231	176	147	88
16	469	352	263	176	179	88	115	44
32	301	176	214	88
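
The "theoretical" columns follow from dividing the 1408-second baseline by the growth in total Cassandra cores. A short sketch of that calculation, with a scaling efficiency (theoretical time divided by actual time) added as an illustrative metric that the table itself does not list:

```python
BASE_NODES, BASE_CORES, BASE_TIME_S = 4, 8, 1408


def theoretical_time(nodes: int, cores_per_node: int) -> float:
    """Run time of a perfectly scalable setup, relative to the 4x8 baseline."""
    scale = (nodes / BASE_NODES) * (cores_per_node / BASE_CORES)
    return BASE_TIME_S / scale


# A few actual run times from the table: (nodes, cores per node) -> seconds
ACTUAL_TIME_S = {(4, 16): 634, (8, 32): 231, (32, 8): 301, (16, 64): 115}


def efficiency(nodes: int, cores_per_node: int) -> float:
    """1.0 means perfect scaling; above 1.0 means better than theoretical."""
    return theoretical_time(nodes, cores_per_node) / ACTUAL_TIME_S[(nodes, cores_per_node)]


for nodes, cores in ACTUAL_TIME_S:
    print(f"{nodes} nodes x {cores} cores: "
          f"theoretical {theoretical_time(nodes, cores):.0f} s, "
          f"efficiency {efficiency(nodes, cores):.2f}")
```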

Cost of run

The pivot table for the cost of each run:

Cassandra nodes	Cores per Cassandra node
	8 (actual)	8 (theoretical)	16 (actual)	16 (theoretical)	32 (actual)	32 (theoretical)	64 (actual)	64 (theoretical)
4	$0.72	$0.72	$0.63	$0.70	$0.66	$0.69	$0.92	$0.69
8	$0.77	$0.70	$0.68	$0.69	$0.90	$0.69	$1.14	$0.68
16	$0.92	$0.69	$1.03	$0.69	$1.39	$0.68	$1.78	$0.68
32	$1.17	$0.69	$1.66	$0.68
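
The theoretical costs stay almost flat because, in a perfectly scalable setup, doubling the hardware roughly doubles the hourly rate while halving the run time. A sketch of that calculation, assuming the theoretical cost is the theoretical run time multiplied by the actual hourly cost of the deployment:

```python
BASE_NODES, BASE_CORES, BASE_TIME_S = 4, 8, 1408

# (Cassandra nodes, cores per Cassandra node) -> hourly cost in $,
# copied from the pricing table.
HOURLY_COST = {
    (4, 8): 1.8491, (8, 8): 3.5899, (16, 8): 7.0715, (32, 8): 14.0347,
    (4, 16): 3.5895, (8, 16): 7.0707, (16, 16): 14.0331, (32, 16): 27.9579,
    (4, 32): 7.0703, (8, 32): 14.0323, (16, 32): 27.9563,
    (4, 64): 14.0327, (8, 64): 27.9571, (16, 64): 55.8059,
}


def theoretical_cost(nodes: int, cores_per_node: int) -> float:
    """Cost of a run that scales perfectly from the 4x8 baseline."""
    ideal_time_s = BASE_TIME_S * (BASE_NODES / nodes) * (BASE_CORES / cores_per_node)
    return ideal_time_s / 3600.0 * HOURLY_COST[(nodes, cores_per_node)]


for nodes, cores in sorted(HOURLY_COST):
    print(f"{nodes:>2} nodes x {cores:>2} cores: ${theoretical_cost(nodes, cores):.2f}")
```

The slight downward drift from $0.72 to $0.68 is presumably due to the part of the hourly bill that does not grow with the deployment (RabbitMQ, Prometheus, bastion host).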

Findings

Scaling up yields decent results for the 4-node setup (surprisingly good for the 8->16->32 core upgrades, where actual run times beat the theoretical ones, but not as good for 64 cores) and for the 8-node setup (surprisingly good for the 8->16 core upgrade, but not as good for 32 and 64 cores). Starting from 8-node setups, using more powerful 32- and 64-core boxes becomes an expensive venture.

Scaling out works reasonably well for 8-core setups: see the green curve. For 16, 32, and 64-core setups, adding more Cassandra nodes makes runs visibly more expensive.

In both cases, the customer has to pay extra for shorter SLA time limits - the red/black cost curves are higher than the "basic" green one, and all curves tend to go up on the right end.
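
To make the SLA trade-off concrete, here is a small sketch that picks the cheapest tested deployment whose run time fits a given time limit. It uses the measured run times and costs from the portfolio_bigtest table; the function name `cheapest_within` and the sample limits are assumptions chosen for illustration.

```python
# (cores per Cassandra node, Cassandra nodes, run time in s, cost of run in $)
RUNS = [
    (8, 4, 1408, 0.72), (8, 8, 776, 0.77), (8, 16, 469, 0.92), (8, 32, 301, 1.17),
    (16, 4, 634, 0.63), (16, 8, 344, 0.68), (16, 16, 263, 1.03), (16, 32, 214, 1.66),
    (32, 4, 336, 0.66), (32, 8, 231, 0.90), (32, 16, 179, 1.39),
    (64, 4, 236, 0.92), (64, 8, 147, 1.14), (64, 16, 115, 1.78),
]


def cheapest_within(sla_s: float):
    """Return the cheapest tested run that finishes within sla_s seconds, or None."""
    candidates = [run for run in RUNS if run[2] <= sla_s]
    if not candidates:
        return None
    # Cheapest first; break cost ties by the shorter run time.
    return min(candidates, key=lambda run: (run[3], run[2]))


for sla_s in (1500, 600, 300, 200, 120):
    best = cheapest_within(sla_s)
    cores, nodes, run_time_s, cost = best
    print(f"SLA {sla_s:>4} s: {nodes} nodes x {cores} cores, "
          f"{run_time_s} s, ${cost:.2f}")
```

Among the tested deployments, the cheapest qualifying run costs more each time the limit is tightened, which is the "pay extra for a shorter SLA" effect described above.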