Hieroglyph: Locally-Sufficient Graph Processing via Compute-Sync-Merge

Published: 13 June 2017

Abstract

Despite their widespread adoption, large-scale graph processing systems do not fully decouple computation and communication, often yielding suboptimal performance. Locally-sufficient computation (computation that relies only on the graph state local to a computing host) can mitigate the effects of this coupling. In this paper, we present Compute-Sync-Merge (CSM), a new programming abstraction that achieves efficient locally-sufficient computation. CSM enforces local sufficiency at the programming abstraction level and enables the activation of vertex-centric computation on all vertex replicas, thus supporting vertex-cut partitioning. We demonstrate the simplicity of expressing several fundamental graph algorithms in CSM. Hieroglyph, our implementation of a graph processing system with CSM support, outperforms the state of the art by up to 53x, with a median speedup of 3.5x and an average speedup of 6x across a wide range of datasets.
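The abstract does not show the CSM interface, so the following is only an illustrative sketch of the general idea, not Hieroglyph's API. It simulates, in a single process, connected components over a vertex-cut partitioning: each host runs to a local fixpoint using only its own edges and vertex replicas, then replicas exchange labels and are reconciled. The function names `compute`, `sync`, and `merge` echo the abstraction's name, but their signatures, the min-label rule, and the driver loop are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sketch: none of these signatures come from the paper.

def compute(edges, labels):
    """Local phase: spread the minimum label along this host's edges
    until nothing changes, touching only host-local state."""
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            low = min(labels[u], labels[v])
            if labels[u] != low or labels[v] != low:
                labels[u] = labels[v] = low
                changed = True

def sync(hosts_labels):
    """Communication phase: collect every replica's label per vertex."""
    gathered = defaultdict(list)
    for labels in hosts_labels:
        for vertex, label in labels.items():
            gathered[vertex].append(label)
    return gathered

def merge(hosts_labels, gathered):
    """Reconcile replicas by keeping the minimum label.
    Returns True if any replica was updated."""
    updated = False
    for labels in hosts_labels:
        for vertex in labels:
            low = min(gathered[vertex])
            if labels[vertex] != low:
                labels[vertex] = low
                updated = True
    return updated

def connected_components(partitions):
    """partitions: one edge list per simulated host (vertex-cut)."""
    # Each host holds a replica of every vertex its edges touch.
    hosts_labels = [{v: v for e in edges for v in e} for edges in partitions]
    while True:
        for edges, labels in zip(partitions, hosts_labels):
            compute(edges, labels)
        if not merge(hosts_labels, sync(hosts_labels)):
            break
    result = {}
    for labels in hosts_labels:
        result.update(labels)
    return result

if __name__ == "__main__":
    # Vertices 2 and 3 are replicated on both simulated hosts.
    print(connected_components([[(1, 2), (3, 4)], [(2, 3), (5, 6)]]))
```

In this toy setup, each outer round performs as much work as possible locally before any exchange, which is the intuition behind local sufficiency; how Hieroglyph actually schedules and overlaps these phases is not stated in the abstract.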
