research-article

A Distributed Algorithm for Large-Scale Graph Partitioning

Published: 09 June 2015

Abstract

Balanced graph partitioning is an NP-complete problem with a wide range of applications, including many large-scale distributed problems such as the optimal storage of large sets of graph-structured data over several hosts. In very large-scale distributed scenarios, however, state-of-the-art algorithms are not directly applicable, because they typically involve frequent global operations over the entire graph. In this article, we propose JA-BE-JA, a fully distributed algorithm that uses local search and simulated annealing for two types of graph partitioning: edge-cut partitioning and vertex-cut partitioning. The algorithm is massively parallel: there is no central coordination, each vertex is processed independently, and only the direct neighbors of a vertex and a small subset of random vertices in the graph need to be known locally. Strict synchronization is not required. These features allow JA-BE-JA to be easily adapted to any distributed graph-processing system, from data centers to fully distributed networks. We show that the minimal edge-cut value empirically achieved by JA-BE-JA is comparable to that of state-of-the-art centralized algorithms such as Metis; in particular, on large social networks, JA-BE-JA outperforms Metis. We also show that JA-BE-JA computes very low vertex-cuts, which have been shown to be significantly more effective than edge-cuts for processing most real-world graphs.
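The abstract describes each vertex improving its partition through local swaps guided by simulated annealing, using only its neighbors and a few random vertices. The following is a minimal single-process Python sketch of that idea for edge-cut partitioning, under stated assumptions: every vertex holds a color (partition id), a vertex tries to swap colors with a neighbor or a randomly sampled vertex, and a swap is accepted when the sum of same-colored-neighbor counts, raised to an exponent `alpha` and scaled by a temperature that cools linearly toward 1, improves. The function names, the parameter values (`t0`, `delta`, `alpha`, `sample_size`), the first-acceptable-partner rule and the best-so-far bookkeeping are hypothetical choices for illustration, not the authors' implementation; the sequential loop only stands in for independent, asynchronous per-vertex processing.

```python
import random


def edge_cut(adj, color):
    """Count undirected edges whose endpoints lie in different partitions."""
    return sum(1 for u in adj for v in adj[u] if u < v and color[u] != color[v])


def same_color_degree(adj, color, node, c):
    """Number of neighbours of `node` whose current color is `c`."""
    return sum(1 for v in adj[node] if color[v] == c)


def try_swap(adj, color, p, q, temperature, alpha):
    """Tentatively swap the colors of p and q; keep the swap only if it pays off.

    Utility (an assumption of this sketch) is the sum over both vertices of
    (neighbours sharing its color) ** alpha. Scaling the new utility by a
    temperature > 1 lets some non-improving swaps through early on
    (simulated-annealing style); at temperature 1 only strict gains pass.
    """
    cp, cq = color[p], color[q]
    if cp == cq:
        return False  # swapping equal colors changes nothing
    old = (same_color_degree(adj, color, p, cp) ** alpha
           + same_color_degree(adj, color, q, cq) ** alpha)
    color[p], color[q] = cq, cp  # tentative swap
    new = (same_color_degree(adj, color, p, cq) ** alpha
           + same_color_degree(adj, color, q, cp) ** alpha)
    if new * temperature > old:
        return True
    color[p], color[q] = cp, cq  # reject: restore original colors
    return False


def partition(adj, k, rounds=100, t0=2.0, delta=0.01, alpha=2.0,
              sample_size=3, seed=0):
    """Return (best coloring seen, its edge-cut) for graph `adj` and k partitions."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    # Balanced random start: colors 0..k-1 dealt round-robin, then shuffled.
    colors = [i % k for i in range(len(nodes))]
    rng.shuffle(colors)
    color = dict(zip(nodes, colors))
    best, best_cut = dict(color), edge_cut(adj, color)
    temperature = t0
    for _ in range(rounds):
        for p in nodes:
            # Neighbours first, then a small uniform sample standing in for a
            # peer-sampling service (hypothetical simplification).
            candidates = list(adj[p]) + rng.sample(nodes, min(sample_size, len(nodes)))
            for q in candidates:
                if q != p and try_swap(adj, color, p, q, temperature, alpha):
                    break  # simplification: take the first acceptable partner
        temperature = max(1.0, temperature - delta)  # linear cooling toward 1
        cut = edge_cut(adj, color)
        if cut < best_cut:
            best, best_cut = dict(color), cut
    return best, best_cut


if __name__ == "__main__":
    # Two 4-cliques joined by a single bridge edge (3, 4).
    edges = [(a, b) for g in (range(0, 4), range(4, 8))
             for a in g for b in g if a < b] + [(3, 4)]
    adj = {n: set() for n in range(8)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    coloring, cut = partition(adj, k=2)
    print(cut, coloring)
```

Swaps only exchange colors between two vertices, so the partition sizes fixed by the round-robin start never change; this is how the sketch keeps the partitioning balanced without any global coordination.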

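In vertex-cut partitioning, edges rather than vertices are assigned to partitions, and a vertex whose edges land in several partitions must be replicated. One common way to quantify this, sketched below in Python under that assumption, is to count the cut vertices and the replication factor (the average number of partitions each vertex appears in). The function name `vertex_cut_stats` and the edge-to-partition dictionary format are hypothetical choices for illustration; the abstract does not specify which metric the article reports.

```python
from collections import Counter, defaultdict


def vertex_cut_stats(edge_partition):
    """Summarise an edge-to-partition assignment.

    edge_partition maps each undirected edge (u, v) to a partition id.
    Returns (number of vertices spanning more than one partition,
             replication factor = mean partitions per vertex,
             number of edges per partition).
    """
    spans = defaultdict(set)  # vertex -> partitions holding one of its edges
    for (u, v), part in edge_partition.items():
        spans[u].add(part)
        spans[v].add(part)
    cut_vertices = sum(1 for parts in spans.values() if len(parts) > 1)
    replication = sum(len(parts) for parts in spans.values()) / len(spans)
    edge_load = Counter(edge_partition.values())  # edges per partition, for balance checks
    return cut_vertices, replication, edge_load


if __name__ == "__main__":
    # Triangle 0-1-2 in partition 0, pendant edge 2-3 in partition 1.
    assignment = {(0, 1): 0, (1, 2): 0, (0, 2): 0, (2, 3): 1}
    print(vertex_cut_stats(assignment))
```

Running the module prints `(1, 1.25, Counter({0: 3, 1: 1}))`: only vertex 2 has edges in both partitions, so four vertices carry five partition memberships.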
