research-article

Streaming graph partitioning: an experimental study

Published: 01 July 2018

Abstract

Graph partitioning is an essential yet challenging task for massive graph analysis in distributed computing. Common graph partitioning methods scan the complete graph offline to obtain its structural characteristics before partitioning. However, the emerging need for low-latency, continuous graph analysis has led to the development of online partitioning methods. Online methods ingest edges or vertices as a stream and make partitioning decisions on the fly, based on partial knowledge of the graph. Prior studies have compared offline graph partitioning techniques across different systems. Yet little effort has been put into investigating the characteristics of online graph partitioning strategies.

In this work, we describe and categorize online graph partitioning techniques based on their assumptions, objectives and costs. Furthermore, we conduct an experimental comparison across different applications and datasets, using a unified distributed runtime based on Apache Flink. Our experimental results show that model-dependent online partitioning techniques, such as low-cut algorithms, offer better performance for communication-intensive applications such as bulk synchronous iterative algorithms, albeit at higher partitioning costs. In contrast, model-agnostic techniques trade off data locality for lower partitioning costs and balanced workloads, which is beneficial when executing data-parallel single-pass graph algorithms.
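The contrast between the two families can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use the Flink-based runtime. The function names (`hash_partition`, `greedy_partition`, `edge_cut`), the `capacity` parameter, and the toy graph are all assumptions made for illustration. The greedy score (placed neighbors, damped by how full a partition is) follows the general shape of linear deterministic greedy heuristics and stands in for a low-cut technique; the hash placement stands in for a model-agnostic one. Vertices arrive as a stream of `(vertex, neighbors)` pairs, and each vertex is placed once, using only what has been seen so far.

```python
def hash_partition(vertex_stream, k):
    """Model-agnostic sketch: place each vertex by hashing its id.

    Ignores graph structure entirely, so it keeps no state and is cheap.
    Integer ids are assumed so that hash() is stable across runs.
    """
    return {v: hash(v) % k for v, _ in vertex_stream}


def greedy_partition(vertex_stream, k, capacity):
    """Low-cut sketch: prefer the partition holding most placed neighbors.

    The neighbor count is damped by the partition's fill level so that
    partitions stay balanced. `capacity` is an assumed per-partition limit.
    """
    assignment = {}
    sizes = [0] * k
    for v, neighbors in vertex_stream:
        # Count neighbors that have already been assigned, per partition.
        placed = [0] * k
        for u in neighbors:
            if u in assignment:
                placed[assignment[u]] += 1

        def score(p):
            return placed[p] * (1.0 - sizes[p] / capacity)

        # Highest score wins; ties go to the least loaded partition.
        best = max(range(k), key=lambda p: (score(p), -sizes[p]))
        assignment[v] = best
        sizes[best] += 1
    return assignment


def edge_cut(edges, assignment):
    """Number of edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
stream = [
    (0, [1, 2]), (1, [0, 2]), (2, [0, 1, 3]),
    (3, [2, 4, 5]), (4, [3, 5]), (5, [3, 4]),
]

print("hash cut:  ", edge_cut(edges, hash_partition(stream, k=2)))
print("greedy cut:", edge_cut(edges, greedy_partition(stream, k=2, capacity=3)))
```

On this toy input the greedy placement keeps each triangle together and cuts only the bridging edge, while the hash placement splits both triangles. The price is that the greedy variant must keep the assignment table and partition sizes as state and inspect every neighbor on arrival, which mirrors the trade-off between partitioning cost and data locality described above.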



Published in

Proceedings of the VLDB Endowment, Volume 11, Issue 11
July 2018
507 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
