Abstract
Many web applications, such as collaborative filtering, web document ranking, linkage prediction, and anomaly detection, demand a measure of similarity between two entities. P-Rank (Penetrating-Rank) has been accepted as a promising graph-based similarity measure, as it provides a comprehensive way of encoding both incoming and outgoing links into the assessment. However, the existing method to compute P-Rank is iterative in nature and rather cost-inhibitive. Moreover, the accuracy estimate and stability issues for P-Rank computation have not been addressed. In this article, we consider optimization techniques for P-Rank search that encompass its accuracy, stability, and computational efficiency. (1) An accuracy estimate is provided for P-Rank iterations, with the aim of finding the number of iterations, k, required to guarantee a desired accuracy. (2) A rigorous bound on the condition number of P-Rank is obtained for stability analysis. Based on this bound, it can be shown that P-Rank is stable and well-conditioned when the damping factors are chosen to be suitably small. (3) Two matrix-based algorithms, applicable to digraphs and undirected graphs, respectively, are devised for efficient P-Rank computation. They reduce the computational time from O(kn^3) to O(υn^2 + υ^6) for digraphs, and to O(υn^2) for undirected graphs, where n is the number of vertices in the graph, and υ (≪ n) is the target rank of the graph. Moreover, our proposed algorithms can significantly reduce the memory space of P-Rank computations from O(n^2) to O(υn + υ^4) for digraphs, and to O(υn) for undirected graphs. Finally, extensive experiments on real-world and synthetic datasets demonstrate the usefulness and efficiency of the proposed techniques for P-Rank similarity assessment on various networks.
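To make the O(kn^3) baseline concrete, the following is a minimal sketch of the conventional iterative P-Rank computation in dense matrix form. It is not one of the algorithms proposed in this article. It assumes the commonly stated recurrence S = λ·C_in·Q·S·Qᵀ + (1−λ)·C_out·P·S·Pᵀ with the diagonal reset to 1, where Q and P are the row-normalized in-link and out-link matrices. The function names, the parameter names `lam`, `c_in`, and `c_out`, and their default values are illustrative assumptions.

```python
import numpy as np


def row_normalize(m):
    """Divide each row by its sum; rows that sum to zero stay all-zero."""
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m, dtype=float), where=sums != 0)


def naive_p_rank(adj, k, lam=0.5, c_in=0.8, c_out=0.8):
    """Run k P-Rank iterations on a dense adjacency matrix (adj[i][j] = 1 if i -> j).

    lam weighs in-links against out-links; c_in and c_out are the damping
    factors. All three defaults are illustrative assumptions.
    """
    a = np.asarray(adj, dtype=float)
    n = a.shape[0]
    q = row_normalize(a.T)  # q[i, j] = 1/|I(i)| if j is an in-neighbor of i
    p = row_normalize(a)    # p[i, j] = 1/|O(i)| if j is an out-neighbor of i
    s = np.eye(n)           # s_0: every vertex is similar only to itself
    for _ in range(k):
        # Two dense triple products per iteration: O(n^3) time, O(n^2) memory.
        s = lam * c_in * (q @ s @ q.T) + (1 - lam) * c_out * (p @ s @ p.T)
        np.fill_diagonal(s, 1.0)  # self-similarity is fixed at 1
    return s


# Tiny digraph: vertices 0 and 1 both link to vertex 2.
example = [[0, 0, 1],
           [0, 0, 1],
           [0, 0, 0]]
scores = naive_p_rank(example, k=5)
```

In this example, vertices 0 and 1 have no in-links and share their only out-neighbor, so their score comes entirely from the out-link term. Each iteration costs two dense triple products, which is where the cubic per-iteration cost of the baseline comes from.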
Index Terms
Efficient Pairwise Penetrating-rank Similarity Retrieval