Distributed Graph Partitioning Made Feasible
Distributed Graph Partitioning Made Feasible
Authors: Fatemeh Rahimian (fatemeh at sics.se) and Amir H. Payberah (amir at sics.se)
Affiliation: SICS Swedish ICT (www.sics.se)
Our world is increasingly connected, and this is changing our lives in ways that we still do not fully understand. New tools and technologies have given us the unprecedented potential to make sense of the huge inter-connected datasets that exist in various fields of science, from social networks to biological networks. Relations can be mathematically described as “graphs”; for example, a network of friends on Facebook can be described by a graph, which shows a user’s closest friends and who they are related to.
The increasing size of datasets means that graph sizes are also increasing. This implies that they must be partitioned into smaller clusters so they can be more easily managed on multiple machines in a distributed fashion. In its classical form, graph partitioning usually refers to edge-cut partitioning; that is, to divide vertices of a graph into disjoint clusters of nearly equal size (called balanced partitioning), whereas the number of edges that span separated clusters is minimal.
Balanced graph partitioning is a well known NP-complete problem with a wide range of applications and solutions. However, in very large-scale distributed scenarios, state-of-the-art algorithms are not directly applicable, because they typically involve frequent global operations over the entire graph. An alternative solution is a distributed approach, where no global knowledge is required and simultaneous access to the entire graph is not assumed. Ja-be-Ja [1, 3] is a distributed heuristic-based solution that efficiently partitions big graphs into a given number of clusters of equal size or any given size. The algorithm is inherently a local search optimization enhanced using a simulated annealing technique.
Ja-be-Ja starts by randomly assigning vertices to partitions, and over the course of the algorithm, vertices from different partitions iteratively negotiate with each other and exchange their partition assignments, if it results in a better partitioning. When this iterative optimization converges, neighbouring vertices of the graph will mostly end up in the same partition, while there will be very few edges that cross different partitions. A variant of this work has been specifically designed for graphs with power-law degree distribution, where vertex-cut partitioning has proved more efficient than edge-cut partitioning [2, 3].
Ja-be-Ja is kept as simple as possible, so that the algorithm can be easily adopted by various real world applications. With the recent advances in graph database technologies, it is anticipated that leading companies in this field will soon move to make use of graph partitioning for scaling up. Successful partitioning is an important ingredient in efficient big data analysis and that is why this solution has attracted a lot of attention in the community.
[1] Fatemeh Rahimian, Amir H. Payberah, Sarunas Girdzijauskas, Mark Jelasity, and Seif Haridi, “Ja-be-ja: A distributed algorithm for balanced graph partitioning (.PDF)”, In the 7th IEEE Conference on Self-Adaptive and Self-Organizing Systems (SASO), 2013.
[2] Fatemeh Rahimian, Amir H. Payberah, Sarunas Girdzijauskas, and Seif Haridi, “Distributed vertex-cut partitioning”, In the 14th IFIP International conference on Distributed Applications and Interoperable Systems (DAIS), 2014.
[3] Fatemeh Rahimian, Amir H. Payberah, Sarunas Girdzijauskas, Mark Jelasity, and Seif Haridi, “A Distributed Algorithm For Large-Scale Graph Partitioning”, In the ACM Transactions on Autonomous and Adaptive Systems (TAAS), 2015.”