Abstract
Traditional graph systems mainly use the iteration-based model, which iteratively loads graph blocks into memory for analysis so as to reduce random I/Os. However, this model limits the efficiency and scalability of running random walks, a fundamental technique for analyzing large graphs. In this article, we first propose a state-aware I/O model to improve the I/O efficiency of running random walks. We then develop a block-centric indexing and buffering scheme for managing walk data, and leverage an asynchronous walk updating strategy to improve random walk efficiency. We implement an I/O-efficient graph system,
Toward Fast and Scalable Random Walks over Disk-Resident Graphs via Efficient I/O Management