research-article

Realtime index-free single source SimRank processing on web-scale graphs

Published: 01 March 2020

Abstract

Given a graph G and a node u ∈ G, a single source SimRank query evaluates the similarity between u and every node v ∈ G. Existing approaches to single source SimRank computation incur either long query response times or expensive pre-computation, which must be repeated whenever the graph G changes. Consequently, to our knowledge, none of them is ideal for scenarios in which (i) query processing must be done in realtime, and (ii) the underlying graph G is massive, with frequent updates.

Motivated by this, we propose SimPush, a novel algorithm that answers single source SimRank queries without any pre-computation, and achieves significantly higher query speed than even the fastest known index-based solutions. Further, SimPush provides rigorous result quality guarantees, and its high performance does not rely on any strong assumption of the graph. Specifically, compared to existing methods, SimPush employs a radically different algorithmic design that focuses on (i) identifying a small number of nodes relevant to the query, and subsequently (ii) computing statistics and performing residue push from these nodes only.

We prove the correctness of SimPush, analyze its time complexity, and compare its asymptotic performance with that of existing methods. We also evaluate the practical performance of SimPush through extensive experiments on nine real datasets. The results demonstrate that SimPush consistently outperforms all existing solutions, often by over an order of magnitude. In particular, on a commodity machine, SimPush answers a single source SimRank query on a web graph containing over 133 million nodes and 5.4 billion edges in under 62 milliseconds, with 0.00035 empirical error, while the fastest index-based competitor needs 1.18 seconds.
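
To make the query defined in the abstract concrete, the following is a minimal Python sketch of a single source SimRank query computed with the textbook fixed-point iteration of Jeh and Widom. It is not SimPush and offers none of its speed or guarantees: it materializes every node pair, so it is only usable on toy graphs. The function name `single_source_simrank`, the decay factor c = 0.6, and the iteration count are illustrative assumptions, not values taken from the paper.

```python
from collections import defaultdict


def single_source_simrank(edges, u, c=0.6, iterations=10):
    """Naive SimRank baseline (NOT SimPush): iterate all pairs, return u's row.

    edges: iterable of directed (source, target) pairs.
    c: decay factor (assumed value, chosen for illustration).
    """
    nodes = sorted({n for edge in edges for n in edge})

    # SimRank is defined over in-neighbors.
    in_nbrs = defaultdict(list)
    for src, dst in edges:
        in_nbrs[dst].append(src)

    # Start from the identity: s(a, a) = 1, s(a, b) = 0 otherwise.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    for _ in range(iterations):
        nxt = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    nxt[(a, b)] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    # A node without in-neighbors is similar only to itself.
                    nxt[(a, b)] = 0.0
                else:
                    # Average similarity over all pairs of in-neighbors, decayed by c.
                    total = sum(sim[(i, j)]
                                for i in in_nbrs[a] for j in in_nbrs[b])
                    nxt[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = nxt

    # The single source answer is the similarity of u to every node v.
    return {v: sim[(u, v)] for v in nodes}


if __name__ == "__main__":
    toy_edges = [("a", "b"), ("a", "c")]
    print(single_source_simrank(toy_edges, "b"))
```

In the toy graph, b and c share the single in-neighbor a, so their score equals the decay factor, while a has no in-neighbors and scores 0 against every other node. Each iteration touches all node pairs and all pairs of their in-neighbors, which illustrates why neither this kind of brute-force evaluation nor a precomputed index suits the massive, frequently updated graphs the abstract targets.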

Published in

Proceedings of the VLDB Endowment, Volume 13, Issue 7
March 2020
194 pages
ISSN: 2150-8097

Publisher

VLDB Endowment