The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems
Zhen Jia1,2, Runlin Zhou3, Chunge Zhu3, Lei Wang1,2, Wanling Gao1,2, Yingjie Shi1, Jianfeng Zhan ⋆1, and Lixin Zhang1
1State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China
2University of Chinese Academy of Sciences, China
3National Computer Network Emergency Response Technical Team Coordination Center of China
Abstract. We live in an era of big data, and big data applications are becoming increasingly pervasive. How to benchmark data center computer systems running big data applications (in short, big data systems) is a hot topic. In this paper, we focus on measuring the performance impacts of diverse applications and scalable volumes of data sets on big data systems. For four typical data analysis applications, an important class of big data applications, we find two major results through experiments. First, the data scale has a significant impact on the performance of big data systems, so we must provide scalable volumes of data sets in big data benchmarks. Second, even though all four applications use simple algorithms, their performance trends differ as the data scale increases, and hence we must consider not only a variety of data sets but also a variety of applications in benchmarking big data systems.
Keywords: Big Data, Benchmarking, Scalable Data