Efficient Pairwise Penetrating-rank Similarity Retrieval

Published: 18 December 2019

Abstract

Many web applications, such as collaborative filtering, web document ranking, linkage prediction, and anomaly detection, demand a measure of similarity between two entities. P-Rank (Penetrating-Rank) has been accepted as a promising graph-based similarity measure because it encodes both incoming and outgoing links into the similarity assessment. However, the existing method for computing P-Rank is iterative in nature and computationally expensive. Moreover, the accuracy estimate and stability issues for P-Rank computation have not been addressed. In this article, we consider optimization techniques for P-Rank search that encompass its accuracy, stability, and computational efficiency. (1) An accuracy estimate is provided for P-Rank iterations, with the aim of finding the number of iterations, k, required to guarantee a desired accuracy. (2) A rigorous bound on the condition number of P-Rank is obtained for stability analysis. Based on this bound, it can be shown that P-Rank is stable and well-conditioned when the damping factors are chosen to be suitably small. (3) Two matrix-based algorithms, applicable to digraphs and undirected graphs, respectively, are devised for efficient P-Rank computation. They improve the computational time from O(kn^3) to O(υn^2 + υ^6) for digraphs, and to O(υn^2) for undirected graphs, where n is the number of vertices in the graph and υ (≪ n) is the target rank of the graph. Moreover, the proposed algorithms significantly reduce the memory space of P-Rank computation from O(n^2) to O(υn + υ^4) for digraphs, and to O(υn) for undirected graphs. Finally, extensive experiments on real-world and synthetic datasets demonstrate the usefulness and efficiency of the proposed techniques for P-Rank similarity assessment on various networks.
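To make the iterative baseline mentioned in the abstract concrete, the following is a minimal sketch of a naive P-Rank iteration. It is not the article's matrix-based algorithm. It assumes the commonly cited P-Rank recurrence, in which the similarity of two distinct vertices is a weighted combination of the average similarity of their in-neighbors and of their out-neighbors, with every vertex maximally similar to itself. The function name `p_rank` and the parameters `c_in`, `c_out` (damping factors), `lam` (weight between in-links and out-links), and `k` (number of iterations) are illustrative choices and do not come from this page.

```python
def p_rank(edges, n, k=10, c_in=0.8, c_out=0.8, lam=0.5):
    """Naive P-Rank iteration on a digraph with vertices 0..n-1 (illustrative sketch)."""
    in_nbrs = [[] for _ in range(n)]
    out_nbrs = [[] for _ in range(n)]
    for u, v in edges:  # each edge is a directed pair u -> v
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)

    # Initial scores: 1 on the diagonal, 0 elsewhere.
    s = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]

    def avg(prev, xs, ys):
        # Average previous-round similarity over all neighbor pairs;
        # defined as 0 when either neighbor set is empty.
        if not xs or not ys:
            return 0.0
        return sum(prev[x][y] for x in xs for y in ys) / (len(xs) * len(ys))

    for _ in range(k):
        nxt = [[0.0] * n for _ in range(n)]
        for a in range(n):
            nxt[a][a] = 1.0  # a vertex is maximally similar to itself
            for b in range(a + 1, n):
                val = (lam * c_in * avg(s, in_nbrs[a], in_nbrs[b])
                       + (1 - lam) * c_out * avg(s, out_nbrs[a], out_nbrs[b]))
                nxt[a][b] = nxt[b][a] = val  # scores are symmetric
        s = nxt
    return s


# Usage: vertex 0 points to vertices 1 and 2, so 1 and 2 share an in-neighbor.
scores = p_rank([(0, 1), (0, 2)], n=3, k=5)
```

Each round of this sketch recomputes every vertex pair from the neighbor pairs of the previous round, which is the kind of repeated all-pairs work that motivates the low-rank approach described in the abstract.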
