research-article

CGraph: A Distributed Storage and Processing System for Concurrent Iterative Graph Analysis Jobs

Published: 20 April 2019

Abstract

Distributed graph processing platforms usually need to handle massive Concurrent iterative Graph Processing (CGP) jobs for different purposes. However, existing distributed systems face a high ratio of data access cost to computation for CGP jobs, which results in low throughput. We observed that there are strong spatial and temporal correlations among the data accesses issued by different CGP jobs, because these concurrently running jobs usually need to repeatedly traverse the shared graph structure for the iterative processing of each vertex. Based on this observation, this article proposes CGraph, a distributed storage and processing system that lets CGP jobs efficiently handle the underlying static or evolving graph for high throughput. It uses a data-centric load-trigger-pushing model, together with several optimizations, to enable the CGP jobs to efficiently share the graph structure data in the cache/memory, as well as their accesses to it, by fully exploiting such correlations. In this model, the graph structure data is decoupled from the vertex state associated with each job. CGraph can deliver much higher throughput for CGP jobs by effectively reducing their average ratio of data access cost to computation. Experimental results show that CGraph improves the throughput of CGP jobs by up to 3.47× in comparison with existing solutions on distributed platforms.
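As a rough illustration of the idea in the abstract (not CGraph's actual implementation), the sketch below keeps one shared, read-only copy of the graph structure and gives each job its own private vertex state. Each partition is "loaded" once and every job is then triggered on it, pushing updates along its edges. The names `Job`, `process_shared`, and the toy partition layout are hypothetical, and the load counter only stands in for a real fetch into cache or memory.

```python
from typing import Callable, Dict, List

# Shared, read-only graph structure: one dict per partition,
# mapping a vertex to its list of out-neighbours.
Partition = Dict[int, List[int]]


class Job:
    """One iterative job with private vertex state (hypothetical helper)."""

    def __init__(self, name: str, init_state: Dict[int, object],
                 push: Callable[[object, object], object]) -> None:
        self.name = name
        self.state = dict(init_state)  # per-job state, decoupled from structure
        self.push = push               # (src_value, dst_value) -> new dst_value


def process_shared(partitions: List[Partition], jobs: List[Job]) -> int:
    """One pass over the graph: load each partition once, trigger all jobs on it."""
    loads = 0
    for part in partitions:
        loads += 1  # stands in for one fetch of this partition's structure
        for job in jobs:
            for src, neighbours in part.items():
                for dst in neighbours:
                    # Push the source value along the edge into the job's own state.
                    job.state[dst] = job.push(job.state[src], job.state[dst])
    return loads


# Toy chain 0 -> 1 -> 2 -> 3 split into two partitions.
partitions = [{0: [1], 1: [2]}, {2: [3], 3: []}]
INF = float("inf")

reach = Job("reachability", {0: True, 1: False, 2: False, 3: False},
            lambda s, d: d or s)
hops = Job("hop-count", {0: 0, 1: INF, 2: INF, 3: INF},
           lambda s, d: min(d, s + 1))

loads = process_shared(partitions, [reach, hops])
print(loads, reach.state, hops.state)
```

In this toy run the two jobs share 2 partition loads, whereas running each job separately over the same partitions would take 2 × 2 = 4 loads. A single pass happens to be enough here because the chain is visited in order; a real iterative job would repeat passes until its state stops changing.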



Published in

ACM Transactions on Storage, Volume 15, Issue 2
Systor 2018 Special Section on ATC 2018, Special Section on OSDI 2018 and Regular Papers
May 2019
187 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3326597
Editor: Sam H. Noh

Copyright © 2019 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 September 2018
• Accepted: 1 March 2019
• Published: 20 April 2019


Qualifiers

• research-article
• Research
• Refereed
