Abstract
Balanced graph partitioning is an NP-complete problem with a wide range of applications, many of them large-scale and distributed, such as the optimal storage of large sets of graph-structured data over several hosts. However, in very large-scale distributed scenarios, state-of-the-art algorithms are not directly applicable because they typically involve frequent global operations over the entire graph. In this article, we propose a fully distributed algorithm called JA-BE-JA that uses local search and simulated annealing techniques for two types of graph partitioning: edge-cut partitioning and vertex-cut partitioning. The algorithm is massively parallel: there is no central coordination, each vertex is processed independently, and only the direct neighbors of a vertex and a small subset of random vertices in the graph need to be known locally. Strict synchronization is not required. These features allow JA-BE-JA to be easily adapted to any distributed graph-processing system, from data centers to fully distributed networks. We show that the minimal edge-cut value empirically achieved by JA-BE-JA is comparable to that of state-of-the-art centralized algorithms such as Metis. In particular, on large social networks, JA-BE-JA outperforms Metis. We also show that JA-BE-JA computes very low vertex-cuts, which have been shown to be significantly more effective than edge-cuts for processing most real-world graphs.
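As a rough illustration of how local search combined with simulated annealing might drive edge-cut partitioning using only local knowledge, the following Python sketch lets each vertex swap its partition label ("color") with a neighbor or a randomly sampled vertex. The abstract does not specify the swap rule, the cooling schedule, or any parameter values, so everything here is an assumption: the function names, the acceptance test `new * temperature > old`, the linear cooling, and the defaults are hypothetical and should not be read as the published algorithm. The sketch also runs sequentially in a single process, whereas the article describes a fully distributed setting.

```python
import random


def edge_cut(adj, color):
    """Count edges whose two endpoints carry different colors (partitions)."""
    return sum(1 for u in adj for v in adj[u] if u < v and color[u] != color[v])


def same_color_degree(adj, color, node, c):
    """Number of neighbors of `node` that currently have color `c`."""
    return sum(1 for n in adj[node] if color[n] == c)


def local_swap_partition(adj, k, rounds=100, t0=2.0, delta=0.05,
                         sample_size=3, seed=0):
    """Hypothetical color-swapping local search with a cooling temperature.

    adj: dict mapping each vertex to the set of its neighbors (undirected).
    k:   number of partitions.
    All other parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    # Balanced start: colors are dealt round-robin. Swapping two colors never
    # changes how many vertices hold each color, so balance is preserved.
    color = {u: i % k for i, u in enumerate(nodes)}
    temperature = t0
    for _ in range(rounds):
        for p in rng.sample(nodes, len(nodes)):  # visit vertices in random order
            # Local knowledge only: direct neighbors plus a few random vertices.
            candidates = sorted(adj[p]) + rng.sample(
                nodes, min(sample_size, len(nodes)))
            old_p = same_color_degree(adj, color, p, color[p])
            best, best_new = None, 0
            for q in candidates:
                if color[q] == color[p]:
                    continue  # swapping identical colors changes nothing
                old = old_p + same_color_degree(adj, color, q, color[q])
                # Approximate benefit: if p and q are adjacent, each counts the
                # other's pre-swap color, so `new` is slightly optimistic.
                new = (same_color_degree(adj, color, p, color[q])
                       + same_color_degree(adj, color, q, color[p]))
                # temperature > 1 tolerates some non-improving swaps early on.
                if new * temperature > old and new > best_new:
                    best, best_new = q, new
            if best is not None:
                color[p], color[best] = color[best], color[p]
        temperature = max(1.0, temperature - delta)  # assumed linear cooling
    return color
```

Calling `local_swap_partition(adj, 2)` on a small undirected graph, given as a dict of neighbor sets, returns a vertex-to-color mapping whose partition sizes equal those of the initial round-robin assignment. `edge_cut(adj, color)` can then be used to inspect the resulting cut.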
A Distributed Algorithm for Large-Scale Graph Partitioning