Graph Analytics with Billions of Daily Financial Transaction Events
In this second blog of the series, we update the performance results to include numbers from running on a 32-node Amazon EC2 cluster.
As a reminder, the requirement for this use case was to ingest one billion financial transaction events in under 24 hours (about 12,000 transactions a second) while simultaneously performing complex query and graph operations. ThingSpan has proven able to meet this ingest requirement while running these complex queries at the same time, at speed and scale. It is important to note that the graph in this case grows incrementally as new events are ingested. In this way the graph is always up to date with the latest data (nodes) and connections (edges), and it is always available for query in a transactionally consistent state, what we call a living graph.
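The stated rate follows directly from the requirement. A quick sketch of the arithmetic:

```python
# Sustained ingest rate needed to load one billion events in 24 hours.
EVENTS = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

required_rate = EVENTS / SECONDS_PER_DAY
print(f"{required_rate:,.0f} events/second")  # ~11,574, i.e. "about 12,000"
```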
We used Amazon EC2 nodes for the tests and found that the "m4.2xlarge" instance type gave the best balance of resources for scalability, performance, and throughput.
We surpassed the performance requirements using a configuration of 16 "m4.2xlarge" nodes. For this test we used the native POSIX file system.
Full details of the proof of concept can be downloaded from:
Below are the updated results when running on a 32 node Amazon EC2 cluster.
We can use these results to show near linear scale up as the volume of data increases and scale out to achieve better throughput performance.
The blue bars represent the time to ingest 250 million events: doubling the number of compute nodes roughly halves the ingest time. You can see the same effect in the red bars for 500 million events.
Also, comparing the red and blue bars at each node count (4, 8, 16, and 32) shows that, for a fixed number of nodes, doubling the number of events roughly doubles the time to process them.
The results show that ThingSpan continues to scale near linearly as the volume of data increases and as the number of compute nodes is increased.
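The "near linear" claim above can be quantified as parallel efficiency: the speedup gained from adding nodes, divided by the factor by which the node count grew. A value of 1.0 means perfectly linear scaling. The timings below are hypothetical placeholders for illustration, not the measured ThingSpan results:

```python
# Illustrative scaling check; the run times here are hypothetical,
# not the measured results from the charts above.
def speedup(t_base: float, t_scaled: float) -> float:
    """Speedup from a baseline run time to a scaled-out run time."""
    return t_base / t_scaled

def parallel_efficiency(t_base: float, t_scaled: float, node_factor: float) -> float:
    """Speedup divided by the increase in node count; 1.0 = perfectly linear."""
    return speedup(t_base, t_scaled) / node_factor

# e.g. going from 16 to 32 nodes (factor of 2): linear scaling halves the time.
t16, t32 = 4.0, 2.1  # hypothetical hours for the same workload
print(f"efficiency: {parallel_efficiency(t16, t32, 2):.2f}")  # 0.95 here
```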
ThingSpan exceeded the requirement of ingesting one billion transaction events in under 24 hours (actually 17 hours and 30 minutes) while simultaneously querying the graph.
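For comparison with the roughly 12,000-events-per-second requirement, the measured 17.5-hour run implies a noticeably higher sustained rate:

```python
# Sustained rate implied by ingesting one billion events
# in 17 hours 30 minutes.
EVENTS = 1_000_000_000
elapsed_seconds = 17 * 3600 + 30 * 60  # 63,000 seconds

achieved_rate = EVENTS / elapsed_seconds
print(f"{achieved_rate:,.0f} events/second")  # ~15,873 events/second
```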
We also show that doubling the number of nodes from 16 to 32 almost halved the ingest time.
Stay tuned for future blogs that will highlight ThingSpan’s performance as we continue to scale out the number of compute nodes and the number of transaction events.