research-article

Massively parallel algorithms for personalized pagerank

Published: 01 May 2021

Abstract

Personalized PageRank (PPR) is widely used in search engines, social recommendation, and community detection, among other tasks. Graphs are becoming massive, and many IT companies must handle graphs that do not fit in the memory of most commodity servers. However, most state-of-the-art solutions for PPR computation work only on a single machine and are inefficient in a distributed framework, because they either (i) require an excessively large number of communication rounds or (ii) incur high communication costs in each round.

Motivated by this, we present Delta-Push, an efficient framework for single-source and top-k PPR queries in distributed settings. Our goal is to reduce the number of rounds while guaranteeing that the load, i.e., the maximum number of messages an executor sends or receives in a round, is bounded by the capacity of each executor. We first present a non-trivial combination of a redesigned parallel push algorithm and the Monte-Carlo method to answer single-source PPR queries. The solution uses pre-sampled random walks to reduce the number of rounds for the push algorithm. Theoretical analysis under the Massively Parallel Computing (MPC) model shows that our proposed solution bounds the communication rounds to [EQUATION] under a load of O(m/p), where m is the number of edges of the input graph, p is the number of executors, and ϵ is a user-defined error parameter. Moreover, as the number of executors increases to p' = γ · p, the load constraint can be relaxed, since each executor, with unchanged local memory, can still hold O(γ · m/p') = O(m/p) messages. In such scenarios, multiple queries can be processed simultaneously in batches. We show that with a load of O(γ · m/p'), Delta-Push can process γ queries in a batch with [EQUATION] rounds, while other baseline solutions keep the same round cost for each batch. We further present a new top-k algorithm that is friendly to the distributed framework and reduces the number of rounds required in practice. Extensive experiments show that our proposed solution is more efficient than the alternatives.
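To make the push-plus-Monte-Carlo idea concrete, the following is a minimal single-machine sketch of the general technique: a forward push phase that stops once every residue is small, followed by random walks that convert the leftover residue into estimates. It is not the Delta-Push algorithm itself. It is neither distributed nor parallel, it samples walks on the fly rather than using pre-sampled walks, and the function names, the adjacency-list format, and the parameters `alpha`, `r_max`, and `walks_per_unit` are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict, deque


def forward_push(graph, source, alpha, r_max):
    """Sequential forward push; returns (reserve, residue) for `source`.

    `graph` maps each node to a list of out-neighbours (illustrative format).
    A node is pushed while residue / out-degree exceeds `r_max`.
    """
    reserve = defaultdict(float)
    residue = defaultdict(float)
    residue[source] = 1.0
    queue = deque([source])
    queued = {source}
    while queue:
        v = queue.popleft()
        queued.discard(v)
        out = graph.get(v, [])
        mass = residue[v]
        if out and mass / len(out) <= r_max:
            continue  # residue already below the push threshold
        residue[v] = 0.0
        if not out:
            # Assumption: a walk that reaches a dangling node stops there.
            reserve[v] += mass
            continue
        reserve[v] += alpha * mass
        share = (1.0 - alpha) * mass / len(out)
        for u in out:
            residue[u] += share
            deg = len(graph.get(u, []))
            if u not in queued and (deg == 0 or residue[u] / deg > r_max):
                queue.append(u)
                queued.add(u)
    return reserve, residue


def random_walk(graph, start, alpha, rng):
    """Walk that stops with probability `alpha` at each step; returns the end node."""
    v = start
    while True:
        out = graph.get(v, [])
        if not out or rng.random() < alpha:
            return v
        v = rng.choice(out)


def approx_ppr(graph, source, alpha=0.2, r_max=1e-3, walks_per_unit=10_000, seed=0):
    """Estimate single-source PPR: push first, then spread residues by random walks."""
    rng = random.Random(seed)
    reserve, residue = forward_push(graph, source, alpha, r_max)
    estimate = defaultdict(float, reserve)
    for v, mass in residue.items():
        if mass <= 0.0:
            continue
        n_walks = math.ceil(mass * walks_per_unit)
        for _ in range(n_walks):
            # Each walk from v carries an equal share of v's leftover residue.
            estimate[random_walk(graph, v, alpha, rng)] += mass / n_walks
    return dict(estimate)


# Hypothetical toy graph: node 3 points to 0 but is unreachable from 0.
toy_graph = {0: [1, 2], 1: [2], 2: [0], 3: [0]}
ppr = approx_ppr(toy_graph, source=0)
```

In this sketch a larger `r_max` ends the push phase earlier and leaves more residue for the random walks, which is the trade-off that a combination of push and Monte-Carlo sampling exploits. Because every push and every walk preserves total probability mass, the returned estimates sum to 1 up to floating-point error.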

Published in

Proceedings of the VLDB Endowment, Volume 14, Issue 9
May 2021
249 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
