Abstract
Given a graph G and a node u ∈ G, a single-source SimRank query evaluates the similarity between u and every node v ∈ G. Existing approaches to single-source SimRank computation incur either long query response time or expensive pre-computation, which must be repeated whenever the graph G changes. Consequently, to our knowledge, none of them is ideal for scenarios in which (i) query processing must be done in real time, and (ii) the underlying graph G is massive and frequently updated.
Motivated by this, we propose SimPush, a novel algorithm that answers single-source SimRank queries without any pre-computation, and achieves significantly higher query speed than even the fastest known index-based solutions. Further, SimPush provides rigorous result quality guarantees, and its high performance does not rely on any strong assumption about the graph. Specifically, compared to existing methods, SimPush employs a radically different algorithmic design that focuses on (i) identifying a small number of nodes relevant to the query, and subsequently (ii) computing statistics and performing residue push from these nodes only.
We prove the correctness of SimPush, analyze its time complexity, and compare its asymptotic performance with that of existing methods. We also evaluate the practical performance of SimPush through extensive experiments on 9 real datasets. The results demonstrate that SimPush consistently outperforms all existing solutions, often by over an order of magnitude. In particular, on a commodity machine, SimPush answers a single-source SimRank query on a web graph containing over 133 million nodes and 5.4 billion edges in under 62 milliseconds, with an empirical error of 0.00035, while the fastest index-based competitor needs 1.18 seconds.
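To make the query definition above concrete, the following is a minimal sketch of what a single-source SimRank query returns, computed by naively iterating the standard SimRank recurrence of Jeh and Widom over all node pairs. It is not SimPush and does not reflect its design; it is only a small reference baseline. The function names, the adjacency-dictionary input format, the decay factor `c = 0.6`, and the iteration count are all assumptions made for illustration.

```python
from itertools import product


def simrank_all_pairs(in_neighbors, c=0.6, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration (illustrative only).

    in_neighbors maps every node to the list of its in-neighbors; every node
    that appears as an in-neighbor must also be a key. c is the decay factor
    (0.6 is an assumed value, not one taken from the abstract).
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # By definition, similarity is 0 if either node has no in-neighbors.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by c.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


def single_source_simrank(in_neighbors, u, c=0.6, iterations=10):
    """Return the similarity between u and every node v in the graph."""
    sim = simrank_all_pairs(in_neighbors, c, iterations)
    return {v: sim[(u, v)] for v in in_neighbors}


# Toy graph (hypothetical): node "a" points to both "b" and "c".
graph = {"a": [], "b": ["a"], "c": ["a"]}
scores = single_source_simrank(graph, "b")
```

On this toy graph, `b` and `c` share their only in-neighbor, so their similarity equals the decay factor, while `a` has no in-neighbors and scores 0 against `b`. The all-pairs iteration costs time quadratic in the number of nodes per round, which is exactly the kind of expense that makes such a baseline impractical on massive graphs.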