Abstract
Despite their widespread adoption, large-scale graph processing systems do not fully decouple computation and communication, often yielding suboptimal performance. Locally-sufficient computation (computation that relies only on the graph state local to a computing host) can mitigate the effects of this coupling. In this paper, we present Compute-Sync-Merge (CSM), a new programming abstraction that achieves efficient locally-sufficient computation. CSM enforces local sufficiency at the programming abstraction level and enables the activation of vertex-centric computation on all vertex replicas, thus supporting vertex-cut partitioning. We demonstrate the simplicity of expressing several fundamental graph algorithms in CSM. Hieroglyph, our implementation of a graph processing system with CSM support, outperforms the state of the art by up to 53x, with a median speedup of 3.5x and an average speedup of 6x across a wide range of datasets.
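The abstract does not give the CSM interface, so the following is only a toy sketch of the idea it names, under stated assumptions. Connected components are computed by minimum-label propagation over a vertex-cut partitioning: edges are split across hosts and boundary vertices are replicated. The function names `compute`, `sync`, `merge`, and `connected_components` are hypothetical, the single-process loop stands in for a distributed runtime, and none of this is Hieroglyph's actual API or scheduling.

```python
from collections import defaultdict


def compute(edges, labels):
    """Local phase: propagate the minimum label along this host's own edges
    until nothing changes, reading only this host's replica state."""
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            low = min(labels[u], labels[v])
            if labels[u] != low or labels[v] != low:
                labels[u] = labels[v] = low
                changed = True


def sync(hosts):
    """Exchange phase (assumed semantics): collect every replica's label per vertex."""
    gathered = defaultdict(list)
    for labels in hosts:
        for vertex, label in labels.items():
            gathered[vertex].append(label)
    return gathered


def merge(hosts, gathered):
    """Reconcile phase (assumed semantics): every replica adopts the minimum
    label seen for its vertex. Returns True if any replica changed."""
    updated = False
    for labels in hosts:
        for vertex in labels:
            best = min(gathered[vertex])
            if labels[vertex] != best:
                labels[vertex] = best
                updated = True
    return updated


def connected_components(partitions):
    """partitions: one edge list per host (a vertex-cut: a vertex may appear
    on several hosts). Returns (labels, number of sync rounds)."""
    # Each host holds a replica of every vertex that touches one of its edges.
    hosts = [{v: v for edge in edges for v in edge} for edges in partitions]
    rounds = 0
    while True:
        rounds += 1
        for edges, labels in zip(partitions, hosts):
            compute(edges, labels)
        if not merge(hosts, sync(hosts)):
            break
    result = {}
    for labels in hosts:
        result.update(labels)  # replicas agree once merge reports no change
    return result, rounds


# Vertex 3 is replicated on both hosts.
labels, rounds = connected_components([[(1, 2), (2, 3)], [(3, 4), (5, 6)]])
print(labels, rounds)
```

In this sketch each host runs to a local fixed point before any exchange, so the number of synchronization rounds depends on how labels cross replicated vertices rather than on the length of paths inside a partition.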
Hieroglyph: Locally-Sufficient Graph Processing via Compute-Sync-Merge