From TPC-C to Big Data Benchmarks: A Functional Workload Model
Authors: Yanpei Chen, Francois Raab, Randy H. Katz.
Electrical Engineering and Computer Sciences, University of California at Berkeley
Technical Report No. UCB/EECS-2012-174
July 1, 2012
Abstract. Big data systems help organizations store, manipulate, and derive value from vast amounts of data. Relational databases and MapReduce are two arguably competing implementations of such systems. These systems are characterized by very large data volumes, diverse and unconventional data types, and complex data analysis functions. These properties make it challenging to develop big data benchmarks that reflect real-life use cases and cover multiple implementation options. In this position paper, we combine experience from the TPC-C benchmark with emerging insights from MapReduce application domains to argue for using a model based on functions of abstraction to construct future benchmarks for big data systems. In particular, this model describes several components of the targeted workloads: the functional goals that the system must achieve, the representative data access patterns, the scheduling and load variations over time, and the computation required to achieve the functional goals. We show that the TPC-C benchmark already applies such a model to benchmarking transactional systems. A similar model can be developed for other big data systems, such as MapReduce, once additional empirical studies are performed. Identifying the functions of abstraction for a big data application domain is the first step towards building truly representative big data benchmarks.
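The four workload components listed in the abstract can be pictured as a small data structure. What follows is a minimal Python sketch under assumptions that are not taken from the paper. Each functional goal records its data access pattern, its computation, and its share of the mix. A per-interval load profile stands in for scheduling and load variation over time. The names `Function`, `WorkloadModel`, and `schedule`, along with the TPC-C-like weights, access descriptions, and load profile, are hypothetical and only illustrative.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class Function:
    """One functional goal of the workload (hypothetical encoding)."""
    name: str          # what the system must achieve
    data_access: str   # representative data access pattern, free-form text
    computation: str   # computation required to achieve the goal
    weight: float      # relative share of this function in the mix


@dataclass
class WorkloadModel:
    functions: list[Function]
    load_profile: list[int]  # requests issued in each time interval (load variation)

    def validate(self) -> None:
        # Mix weights must form a probability distribution.
        total = sum(f.weight for f in self.functions)
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"function weights must sum to 1, got {total}")

    def schedule(self, seed: int = 0) -> list[list[str]]:
        """Return, per interval, the function names to invoke, drawn by weight."""
        self.validate()
        rng = random.Random(seed)
        names = [f.name for f in self.functions]
        weights = [f.weight for f in self.functions]
        return [rng.choices(names, weights=weights, k=n) for n in self.load_profile]


# Illustrative instance using TPC-C's five transaction names. The weights,
# access descriptions, and load profile are assumptions for this sketch only.
tpcc_like = WorkloadModel(
    functions=[
        Function("New-Order", "read-write, a few rows per table", "insert order", 0.45),
        Function("Payment", "read-write, single customer", "update balances", 0.43),
        Function("Order-Status", "read-only, single customer", "lookup", 0.04),
        Function("Delivery", "read-write, batch of orders", "batch update", 0.04),
        Function("Stock-Level", "read-only, recent orders", "join and count", 0.04),
    ],
    load_profile=[2, 5, 3],
)
```

Calling `tpcc_like.schedule(seed=1)` returns one list of function names per time interval, with list lengths following the load profile. Keeping the functional goals separate from the load profile lets the same mix be replayed under different load shapes. For a MapReduce domain, the same structure would need empirically derived functions and weights, which the abstract says still require further study.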
Copyright © 2012, by the author(s). All rights reserved.