Towards a Collective Layer in the Big Data Stack

Thilina Gunarathne
Department of Computer Science, Indiana University, Bloomington
Judy Qiu
Department of Computer Science, Indiana University, Bloomington
Dennis Gannon
Microsoft Research, Redmond, WA

Abstract
We generalize MapReduce, Iterative MapReduce, and data-intensive MPI runtimes as a layered Map-Collective architecture, with the Map-AllGather, Map-AllReduce, MapReduceMergeBroadcast, and Map-ReduceScatter patterns as the initial focus. Map-collectives improve the performance and efficiency of computations while also making them easier for users to write. These collective primitives can be applied to multiple runtimes, and we propose building high-performance, robust implementations that span cluster and cloud systems. Here we present results for two collectives shared between Hadoop on clusters (where we term our extension H-Collectives) and the Twister4Azure Iterative MapReduce for the Azure Cloud. Our prototype implementations of the Map-AllGather and Map-AllReduce primitives achieved up to 33% performance improvement for K-means Clustering and up to 50% improvement for Multi-Dimensional Scaling, while also improving user friendliness. In some cases, the use of Map-collectives eliminated almost all of the overheads of the computations.

Keywords: MapReduce, Twister, Collectives, Cloud, HPC, Performance, K-means, MDS

