On the Cloud Database Performance Test. Q&A with Jeff Healey
Q1. McKnight Consulting Group recently published a Cloud Database Performance Test. What is the focus of this report?
Jeff Healey: The recently published Cloud Database Performance Test paper from the McKnight Consulting Group is a product profile and evaluation focused on performance in managing cloud-enabled, enterprise-ready, relational analytical workloads.
Q2. What data cloud platforms were considered in the study?
Jeff Healey: The report compared the performance, scale, and price of three leading cloud data warehouses – Vertica Analytics Platform, Amazon Redshift, and a third, unnamed data cloud platform offered as a managed service.
Q3. What is the intent of the test’s design?
Jeff Healey: The intent of the test’s design was to simulate a set of basic analytics scenarios to answer fundamental business questions that an organization from nearly any industry sector might encounter and ask.
Q4. What are the business questions that this report is trying to answer?
Jeff Healey: There are an increasing number of cloud data warehouse options that businesses consider to serve as the backbone of their predictive analytics initiatives. At some point in the sales cycle, companies tend to request performance benchmarks to ensure that they are choosing the right cloud data warehouse to meet their performance needs as their data volumes and users grow. That’s always been important – will the cloud data warehouse that I choose maintain consistently high performance as data volumes grow and more and more concurrent users access the system (i.e., performance at scale)? But, now more than ever, organizations also want to know – which platform will provide me with the top performance at scale for the best overall cost? That’s what the McKnight Consulting Group calls price per performance. This paper answers those questions with detailed benchmarks.
Q5. What metrics were used to compare products?
Jeff Healey: The benchmark comprises the following tests and calculations to determine the results:
- Elapsed time of the longest-running query threads when executing queries on 10, 50, and 250 TB data sets across 10, 30, and 60 concurrent users
- Throughput of each platform during those tests – the total number of queries completed in an hour, measured in QPH (Queries Per Hour)
- Price per performance – Elapsed time of test (seconds) x Cost of platform ($/hour) / 3,600 (seconds/hour); a worked sketch of this calculation follows the list
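As a concrete illustration of the last metric, here is a minimal sketch of the price-per-performance calculation. The elapsed time and hourly cost are hypothetical placeholders, not figures from the report:

```python
# Price per performance = elapsed time (s) x platform cost ($/hr) / 3,600 (s/hr).
# The inputs below are hypothetical, chosen only to illustrate the formula.

def price_per_performance(elapsed_seconds: float, cost_per_hour: float) -> float:
    """Dollar cost of one benchmark run; lower is better."""
    return elapsed_seconds * cost_per_hour / 3600.0

# A run taking 5,400 seconds on a cluster billed at $96/hour
# costs 5400 * 96 / 3600 = $144 for that run.
print(price_per_performance(5400, 96.0))  # 144.0
```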
Q6. What type of data and what size were used to test scalability?
Jeff Healey: The tests were based on the University of California Berkeley AMPLab Big Data Benchmark (https://amplab.cs.berkeley.edu/benchmark/). To broaden the reach of the test to meet the goal of representing everyday business questions, McKnight Consulting Group added an advanced analytics query in the domain of session identification, as they have done in prior benchmarks (a sketch of what such a query computes follows below). The tests were executed on 10, 50, and 250 TB data sets across 10, 30, and 60 concurrent users.
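For readers unfamiliar with sessionization, the sketch below shows the kind of computation a session identification query performs: grouping each user's events into sessions separated by an inactivity gap. The column names, sample data, and 30-minute threshold are illustrative assumptions, not details taken from the benchmark:

```python
import pandas as pd

# Hypothetical clickstream events; the schema and timestamps are
# assumptions for illustration, not data from the McKnight benchmark.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event_ts": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:10", "2024-01-01 11:00",
        "2024-01-01 09:05", "2024-01-01 09:20",
    ]),
}).sort_values(["user_id", "event_ts"])

# Start a new session whenever a user is idle for more than 30 minutes.
new_session = (
    events.groupby("user_id")["event_ts"].diff() > pd.Timedelta(minutes=30)
).astype(int)

# A running count of session starts per user yields a session ID.
events["session_id"] = new_session.groupby(events["user_id"]).cumsum()
print(events)
```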
Q7. Testing hardware and software across cloud vendors is very challenging. How did the report manage this challenge?
Jeff Healey: McKnight Consulting Group has many years of experience in conducting these types of performance benchmarks, so they understand what it takes to load data, manage the various environments, execute the queries, and report the results in a consumable way for readers. All platforms were tested on the same cloud in the same region. McKnight Consulting matched, to the best of their ability, hardware nodes, cores, and RAM across platforms, and when that information wasn’t available from the cloud provider, matched cost-equivalent infrastructure. The methodology is explained in detail in the report, so the reasoning used to find the closest equivalency is clear.
Q8. What are the overall test results?
Jeff Healey: By a significant margin, Vertica had better query response performance across the board, at every level of concurrency and scale. For example, on the largest test, the 250 TB workload, Vertica in Eon Mode completed 2.5x the queries per hour (QPH) of the next-highest database (Redshift) at 10 concurrent users, 2x the QPH of the next-highest database (Redshift) at 30 concurrent users, and 1.14x the QPH of the next-highest database (the unnamed cloud data platform) at 60 concurrent users.
In terms of price-performance, Vertica in Eon Mode was the least expensive. For example, on the 250 TB workload with 10 concurrent users, Vertica in Eon Mode was 2.9x less expensive than Redshift; at 30 concurrent users, 2.4x less expensive; and at 60 concurrent users, 1.8x less expensive. The unnamed data cloud platform’s price-performance was between 6.4 and 13.1 times higher than Vertica’s. So, by an even bigger margin than performance alone, Vertica had better price per performance than the other cloud data analytics platforms across the board. Combined with our multiple third-party ROI case studies showing this level of performance at low TCO in practice, it makes a compelling case for Vertica’s value.
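For clarity on how an “Nx less expensive” figure is derived, it is simply the ratio of two price-per-performance values; the inputs below are hypothetical placeholders, not the report’s measurements:

```python
# Ratio of two price-per-performance values; hypothetical inputs only.
vertica_ppp = 50.0    # hypothetical $ per benchmark run
redshift_ppp = 145.0  # hypothetical $ per benchmark run
print(f"{redshift_ppp / vertica_ppp:.1f}x less expensive")  # "2.9x less expensive"
```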
Q9. The issue of fairness of the test was left for the reader to determine. Do you have any comment on this?
Jeff Healey: All benchmarks were conducted with the core capabilities of the three cloud databases, following documented best practices from each of the vendors. Also, comparable hardware instances were chosen for the three cloud databases to provide an apples-to-apples comparison. To ensure complete fairness, we even recommended that the McKnight Consulting Group use the unnamed cloud platform’s autoscale capability, since its query performance would have been orders of magnitude slower than both Vertica and Redshift without it. Ironically, with autoscaling, that data cloud platform is far more expensive, but that is the same tradeoff anyone using that platform will face.
Q10. Anything else you wish to add?
Jeff Healey: Third-party performance benchmark studies are always an important part of any evaluation process when considering cloud databases. That way, you can validate vendors’ claims about providing the best value for analytical performance. In addition, organizations should always speak to end customers and understand whether the cloud database meets their specific needs and prepares them for future growth and inevitable change.
_____________________________
Jeff Healey, Senior Director of Vertica Marketing, Micro Focus
Jeff leads the marketing team supporting the Vertica Analytics Platform, the industry’s unified analytical warehouse purpose-built to manage massive volumes of Big Data at extreme scale with the highest levels of performance. With more than 20 years of high-tech marketing experience and deep knowledge of product marketing, go-to-market planning, and execution, Jeff previously led product marketing for Axeda Corporation (now PTC), the leading Internet of Things platform with millions of connected assets under management. Prior to Axeda, Jeff held product marketing, customer success, and lead editorial roles at MathWorks, Macromedia (now Adobe), Sybase (now SAP), and The Boston Globe.
Sponsored by Vertica