Discussion of BigBench: A Proposed Industry Standard Performance Benchmark for Big Data
Chaitan Baru, Milind Bhandarkar, Carlo Curino, Manuel Danisch, Michael Frank, Bhaskar Gowda, Hans-Arno Jacobsen, Huang Jie, Dileep Kumar, Raghu Nambiar, Meikel Poess, Francois Raab, Tilmann Rabl, Nishkam Ravi, Kai Sachs, Saptak Sen, Lan Yi, and Choonhan Youn
Abstract. There is huge interest in mining the treasures that can be found in big data. New storage systems and processing paradigms allow ever larger data sets to be collected and analyzed. The high demand and rapid development have led to a sizable ecosystem of big data processing systems. Due to the lack of standards and standard benchmarks, users have a hard time choosing the right systems for their requirements. To solve this problem, we have developed BigBench, the first end-to-end big data analytics benchmark suite. In this paper, we analyze the BigBench workload from a technical as well as a business point of view. We categorize the queries along different dimensions and analyze their runtime behavior. Furthermore, we discuss the relevance of the workload from an industrial point of view and propose extensions to achieve broader coverage of typical big data processing use cases.
Published in the Sixth TPC Technology Conference on Performance Evaluation & Benchmarking, 2014. Springer Berlin Heidelberg.