Abstract
Personalized PageRank (PPR) is widely used in search engines, social recommendation, community detection, and other applications. Graphs are now massive, and many IT companies must process graphs that do not fit into the memory of most commodity servers. However, most state-of-the-art solutions for PPR computation are designed for a single machine and are inefficient in distributed frameworks, because they either (i) require an excessively large number of communication rounds or (ii) incur high communication costs in each round.
Motivated by this, we present Delta-Push, an efficient framework for single-source and top-k PPR queries in distributed settings. Our goal is to reduce the number of rounds while guaranteeing that the load, i.e., the maximum number of messages an executor sends or receives in a round, is bounded by the capacity of each executor. We first present a non-trivial combination of a redesigned parallel push algorithm and the Monte-Carlo method to answer single-source PPR queries. The solution uses pre-sampled random walks to reduce the number of rounds needed by the push algorithm. Theoretical analysis under the Massively Parallel Computing (MPC) model shows that our proposed solution bounds the number of communication rounds to [EQUATION] under a load of O(m/p), where m is the number of edges of the input graph, p is the number of executors, and ϵ is a user-defined error parameter. Meanwhile, as the number of executors increases to p' = γ · p, the load constraint can be relaxed, since each executor can hold O(γ · m/p') messages with the same local memory. In such scenarios, multiple queries can be processed simultaneously in batches. We show that with a load of O(γ · m/p'), Delta-Push can process γ queries in a batch with [EQUATION] rounds, while the baseline solutions keep the same round cost for each batch. We further present a new top-k algorithm that is friendly to the distributed framework and reduces the number of rounds required in practice. Extensive experiments show that our proposed solution is more efficient than the alternatives.
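To make the push-plus-Monte-Carlo idea concrete, the following is a minimal single-machine sketch of the general technique, not the Delta-Push algorithm itself. It runs a round-based (synchronous) forward push and then spends random walks on the leftover residue. The function names, the parameters `alpha`, `r_max`, and `walks_per_unit`, and the toy graph are all illustrative assumptions. Walks are sampled on the fly here rather than pre-sampled, and every node is assumed to have at least one out-neighbor.

```python
import math
import random
from collections import defaultdict


def forward_push(graph, source, alpha, r_max):
    """Round-based forward push; returns (reserve, residue, rounds).

    Assumption: `graph` maps each node to a non-empty list of out-neighbors.
    """
    reserve = defaultdict(float)
    residue = defaultdict(float)
    residue[source] = 1.0
    rounds = 0
    while True:
        # All nodes whose residue exceeds the threshold push in the same round.
        frontier = [v for v, r in residue.items() if r > r_max * len(graph[v])]
        if not frontier:
            break
        rounds += 1
        messages = defaultdict(float)
        for v in frontier:
            r = residue.pop(v)
            reserve[v] += alpha * r
            share = (1.0 - alpha) * r / len(graph[v])
            for u in graph[v]:
                messages[u] += share
        # Messages are delivered only after every frontier node has pushed.
        for u, m in messages.items():
            residue[u] += m
    return reserve, residue, rounds


def random_walk(graph, start, alpha, rng):
    """Walk that stops with probability alpha at each step; returns the end node."""
    v = start
    while rng.random() >= alpha:
        v = rng.choice(graph[v])
    return v


def single_source_ppr(graph, source, alpha=0.2, r_max=1e-2,
                      walks_per_unit=1000, seed=0):
    """Estimate PPR from `source`: push first, then walks on the residue."""
    rng = random.Random(seed)
    reserve, residue, rounds = forward_push(graph, source, alpha, r_max)
    estimate = dict(reserve)
    for v, r in residue.items():
        if r <= 0.0:
            continue
        n_walks = math.ceil(r * walks_per_unit)
        for _ in range(n_walks):
            t = random_walk(graph, v, alpha, rng)
            # Each walk carries an equal share of the residue at v.
            estimate[t] = estimate.get(t, 0.0) + r / n_walks
    return estimate, rounds


if __name__ == "__main__":
    toy_graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}  # hypothetical example
    scores, rounds = single_source_ppr(toy_graph, source=0)
    print(rounds, sorted(scores.items()))
```

In this sketch a larger `r_max` stops the push phase after fewer rounds and leaves more residue for the walks, which is the trade-off the abstract alludes to. In a distributed setting each push round would correspond to one communication round.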