research-article

Toward Fast and Scalable Random Walks over Disk-Resident Graphs via Efficient I/O Management

Published: 11 November 2022

Abstract

Traditional graph systems mainly use an iteration-based model, which iteratively loads graph blocks into memory for analysis so as to reduce random I/Os. However, this model limits the efficiency and scalability of running random walks, a fundamental technique for analyzing large graphs. In this article, we first propose a state-aware I/O model to improve the I/O efficiency of random walks; we then develop a block-centric indexing and buffering scheme for managing walk data and leverage an asynchronous walk updating strategy to improve random walk efficiency. We implement an I/O-efficient graph system, GraphWalker, which efficiently handles very large disk-resident graphs and scales to tens of billions of random walks on a single commodity machine. Experiments show that GraphWalker achieves more than an order of magnitude speedup over DrunkardMob, a random walk system built on the classical graph system GraphChi, as well as over two state-of-the-art single-machine graph systems, Graphene and GraFSoft. Furthermore, compared with the most recent distributed system, KnightKing, GraphWalker still achieves comparable performance with only a single machine, making it a more cost-effective alternative.
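The three ideas named in the abstract can be illustrated with a small in-memory sketch. It is not GraphWalker's implementation: the function `run_walks`, the fixed-size vertex-range blocks, and the "pick the block holding the most walks" rule are assumptions chosen for illustration, and block loads are only counted, not performed against a disk.

```python
import random
from collections import defaultdict


def run_walks(adj, block_size, starts, walk_len, seed=0):
    """Toy simulation of block-at-a-time random walks (hypothetical API).

    adj maps each vertex id to a list of out-neighbors. Vertices are
    grouped into blocks of block_size consecutive ids (an assumption).
    Returns the end vertices of all walks and the number of block loads.
    """
    rng = random.Random(seed)

    def block_of(v):
        return v // block_size

    # Block-centric walk pools: block id -> list of (vertex, steps left).
    pools = defaultdict(list)
    for v in starts:
        pools[block_of(v)].append((v, walk_len))

    finished = []
    block_loads = 0
    while pools:
        # State-aware choice (assumed rule): load the block that
        # currently holds the largest number of walks.
        b = max(pools, key=lambda k: len(pools[k]))
        walks = pools.pop(b)
        block_loads += 1
        for v, left in walks:
            # Asynchronous updating: advance this walk as far as it can
            # go while it stays inside the loaded block.
            while left > 0 and block_of(v) == b and adj[v]:
                v = rng.choice(adj[v])
                left -= 1
            if left == 0 or not adj[v]:
                finished.append(v)  # walk is done or hit a dead end
            else:
                pools[block_of(v)].append((v, left))  # moved to another block
    return finished, block_loads
```

For example, on a directed ring of 8 vertices split into two blocks of 4, `run_walks({i: [(i + 1) % 8] for i in range(8)}, 4, [0, 4], 3)` finishes both walks with one load per block, because each walk takes all three steps before the scheduler moves on. An iteration-based scheduler would instead advance every walk by one step per pass over the blocks.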

        • Published in

          ACM Transactions on Storage, Volume 18, Issue 4
          November 2022
          279 pages
          ISSN:1553-3077
          EISSN:1553-3093
          DOI:10.1145/3570642

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 11 November 2022
          • Online AM: 27 September 2022
          • Accepted: 25 April 2022
          • Revised: 19 January 2022
          • Received: 30 August 2021
          Qualifiers

          • research-article
          • Refereed
