skip to main content
research-article

Homogeneous network embedding for massive graphs via reweighted personalized PageRank

Published:01 January 2020Publication History
Skip Abstract Section

Abstract

Given an input graph G and a node vG, homogeneous network embedding (HNE) maps the graph structure in the vicinity of v to a compact, fixed-dimensional feature vector. This paper focuses on HNE for massive graphs, e.g., with billions of edges. On this scale, most existing approaches fail, as they incur either prohibitively high costs, or severely compromised result utility.

Our proposed solution, called Node-Reweighted PageRank (NRP), is based on a classic idea of deriving embedding vectors from pairwise personalized PageRank (PPR) values. Our contributions are twofold: first, we design a simple and efficient baseline HNE method based on PPR that is capable of handling billion-edge graphs on commodity hardware; second and more importantly, we identify an inherent drawback of vanilla PPR, and address it in our main proposal NRP. Specifically, PPR was designed for a very different purpose, i.e., ranking nodes in G based on their relative importance from a source node's perspective. In contrast, HNE aims to build node embeddings considering the whole graph. Consequently, node embeddings derived directly from PPR are of suboptimal utility.

The proposed NRP approach overcomes the above deficiency through an effective and efficient node reweighting algorithm, which augments PPR values with node degree information, and iteratively adjusts embedding vectors accordingly. Overall, NRP takes O(mlogn) time and O(m) space to compute all node embeddings for a graph with m edges and n nodes. Our extensive experiments that compare NRP against 18 existing solutions over 7 real graphs demonstrate that NRP achieves higher result utility than all the solutions for link prediction, graph reconstruction and node classification, while being up to orders of magnitude faster. In particular, on a billion-edge Twitter graph, NRP terminates within 4 hours, using a single CPU core.

References

  1. S. Abu-El-Haija, B. Perozzi, R. Al-Rfou, and A. A. Alemi. Watch your step: Learning node embeddings via graph attention. In NeurIPS, pages 9180--9190, 2018.Google ScholarGoogle Scholar
  2. A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed large-scale natural graph factorization. In WWW, pages 37--48, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links in social networks. In WSDM, pages 635--644, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. J. Brzozowski and D. M. Romero. Who should i follow? recommending people in directed social networks. In Fifth International AAAI Conference on Weblogs and Social Media, 2011.Google ScholarGoogle Scholar
  5. H. Cai, V. W. Zheng, and K. C. Chang. A comprehensive survey of graph embedding: Problems, techniques, and applications. TKDE, 30(9):1616--1637, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  6. S. Cao, W. Lu, and Q. Xu. Grarep: Learning graph representations with global structural information. In CIKM, pages 891--900, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. S. Cao, W. Lu, and Q. Xu. Deep neural networks for learning graph representations. In AAAI, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  8. H. Chen, B. Perozzi, Y. Hu, and S. Skiena. HARP: hierarchical representation learning for networks. In AAAI, 2018.Google ScholarGoogle Scholar
  9. H. Chen, H. Yin, T. Chen, Q. V. H. Nguyen, W.-C. Peng, and X. Li. Exploiting centrality information with graph convolutions for network representation learning. In ICDE, pages 590--601, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  10. K. L. Clarkson and D. P. Woodruff. Low-rank approximation and regression in input sparsity time. STOC, pages 81--90, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. Cui, X. Wang, J. Pei, and W. Zhu. A survey on network embedding. TKDE, 31(5):833--852, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  12. Q. Dai, Q. Li, J. Tang, and D. Wang. Adversarial network embedding. In AAAI, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  13. Q. Dai, X. Shen, L. Zhang, Q. Li, and D. Wang. Adversarial training methods for network embedding. In WWW, pages 329--339, 2019.Google ScholarGoogle Scholar
  14. C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec. Learning structural node embeddings via diffusion wavelets. In KDD, pages 1320--1329, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. H. Gao and H. Huang. Self-paced network embedding. In KDD, pages 1406--1415, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In KDD, pages 855--864, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Y. Gu, Y. Sun, Y. Li, and Y. Yang. Rare: Social rank regulated large-scale network embedding. In WWW, pages 359--368, 2018.Google ScholarGoogle Scholar
  18. Y. Gu, Y. Sun, Y. Li, and Y. Yang. Rare: Social rank regulated large-scale network embedding. In WWW, pages 359--368, 2018.Google ScholarGoogle Scholar
  19. N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM review, 53(2):217--288, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. G. Jeh and J. Widom. Scaling personalized web search. In WWW, pages 271--279, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Kaggle, 2012. https://www.kaggle.com/c7kddcup2012-track1.Google ScholarGoogle Scholar
  22. T. N. Kipf and M. Welling. Variational graph auto-encoders. NeurIPS Workshop, 2016.Google ScholarGoogle Scholar
  23. T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.Google ScholarGoogle Scholar
  24. H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In WWW, pages 591--600, 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Y.-A. Lai, C.-C. Hsu, W. Chen, M.-Y. Yeh, and S.-D. Lin. Prune: Preserving proximity and global ranking for network embedding. In NeurIPS, pages 5257--5266, 2017.Google ScholarGoogle Scholar
  26. A. Lerer, L. Wu, J. Shen, T. Lacroix, L. Wehrstedt, A. Bose, and A. Peysakhovich. Pytorch-biggraph: A large-scale graph embedding system. In SysML, 2019.Google ScholarGoogle Scholar
  27. X. Liu, T. Murata, K.-S. Kim, C. Kotarasu, and C. Zhuang. A general view for network embedding as matrix factorization. In WSDM, pages 375--383, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Ma, P. Cui, X. Wang, and W. Zhu. Hierarchical taxonomy aware network embedding. In KDD, pages 1920--1929, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. C. Musco and C. Musco. Randomized block krylov methods for stronger and faster approximate singular value decomposition. In NeurIPS, pages 1396--1404, 2015.Google ScholarGoogle Scholar
  30. M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu. Asymmetric transitivity preserving graph embedding. In KDD, pages 1105--1114, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: online learning of social representations. In KDD, pages 701--710, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. B. Perozzi, V. Kulkarni, H. Chen, and S. Skiena. Don't walk, skip!: Online learning of multi-scale network embeddings. In ASONAM, pages 258--265, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. Qiu, Y. Dong, H. Ma, J. Li, C. Wang, K. Wang, and J. Tang. Netsmf: Large-scale network embedding as sparse matrix factorization. In WWW, pages 1509--1520, 2019.Google ScholarGoogle Scholar
  34. J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang. Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec. In WSDM, pages 459--467, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. P. Radivojac, W. T. Cark, T. R. Oron, A. M. Schnoes, T. Wittkop, A. Sokolov, K. Graim, C. Funk, K. Verspoor, and et. al. A large-scale evaluation of computational protein function prediction. Nature methods, 10(3):221, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  36. L. F. R. Ribeiro, P. H. P. Saverese, and D. R. Figueiredo. struc2vec: Learning node representations from structural identity. In KDD, pages 385--394, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. T. Sarlos. Improved approximation algorithms for large matrices via random projections. In FOCS, pages 143--152, 2006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. J. Shi, R. Yang, T. Jin, X. Xiao, and Y. Yang. Realtime top-k personalized pagerank over large graphs on gpus. PVLDB, 13(1):15--28, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: large-scale information network embedding. In WWW, pages 1067--1077, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. L. Tang and H. Liu. Leveraging social media networks for classification. DMKD, 23(3):447--478, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. R. Trivedi, B. Sisman, X. L. Dong, C. Faloutsos, J. Ma, and H. Zha. Linknbed: Multi-graph representation learning with entity linkage. In ACL, pages 252--262, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  42. A. Tsitsulin, D. Mottin, P. Karras, and E. Müller. Verse: Versatile graph embeddings from similarity measures. In WWW, pages 539--548, 2018.Google ScholarGoogle Scholar
  43. K. Tu, P. Cui, X. Wang, P. S. Yu, and W. Zhu. Deep recursive network embedding with regular equivalence. In KDD, pages 2357--2366, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. D. Wang, P. Cui, and W. Zhu. Structural deep network embedding. In KDD, pages 1225--1234, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. H. Wang, J. Wang, J. Wang, M. Zhao, W. Zhang, F. Zhang, X. Xing, and M. Guo. Graphgan: Graph representation learning with generative adversarial nets. In AAAI, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  46. J. Wang, P. Huang, H. Zhao, Z. Zhang, B. Zhao, and D. L. Lee. Billion-scale commodity embedding for e-commerce recommendation in alibaba. In KDD, pages 839--848, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Q. Wang, S. Wang, M. Gong, and Y. Wu. Feature hashing for network representation learning. In IJCAI, pages 2812--2818, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  48. R. Wang, S. Wang, and X. Zhou. Parallelizing approximate single-source personalized pagerank queries on shared memory. VLDBJ, 28(6):923--940, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  49. S. Wang, Y. Tang, X. Xiao, Y. Yang, and Z. Li. Hubppr: Effective indexing for approximate personalized pagerank. PVLDB, 10(3):205--216, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. S. Wang, R. Yang, R. Wang, X. Xiao, Z. Wei, W. Lin, Y. Yang, and N. Tang. Efficient algorithms for approximate single-source personalized pagerank queries. TODS, 44(4):18, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. S. Wang, R. Yang, X. Xiao, Z. Wei, and Y. Yang. FORA: simple and effective approximate single-source personalized pagerank. In KDD, pages 505--514, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang. Community preserving network embedding. In AAAI, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Z. Wei, X. He, X. Xiao, S. Wang, S. Shang, and J. Wen. Topppr: Top-k personalized pagerank queries with precision guarantees on large graphs. In SIGMOD, pages 441--456, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. S. J. Wright. Coordinate descent algorithms. Mathematical Programming, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. L. Y. Wu, A. Fisch, S. Chopra, K. Adams, A. Bordes, and J. Weston. Starspace: Embed all the things! In AAAI, 2018.Google ScholarGoogle Scholar
  56. C. Yang, M. Sun, Z. Liu, and C. Tu. Fast network embedding enhancement via high order proximity approximation. In IJCAI, pages 3894--3900, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  57. J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. KAIS, 42(1):181--213, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. R. Yang, J. Shi, X. Xiao, S. S. Bhowmick, and Y. Yang. Homogeneous network embedding for massive graphs via personalized pagerank. arXiv preprint, 2019.Google ScholarGoogle Scholar
  59. Y. Yin and Z. Wei. Scalable graph embeddings via sparse transpose proximities. In KDD, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. W. Yu, C. Zheng, W. Cheng, C. C. Aggarwal, D. Song, B. Zong, H. Chen, and W. Wang. Learning deep network representations with adversarially regularized autoencoders. In KDD, pages 2663--2671, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. D. Zhang, J. Yin, X. Zhu, and C. Zhang. Network representation learning: A survey. IEEE Trans. Big Data, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  62. J. Zhang, Y. Dong, Y. Wang, J. Tang, and M. Ding. Prone: Fast and scalable network representation learning. In IJCAI, pages 4278--4284, 2019.Google ScholarGoogle ScholarCross RefCross Ref
  63. Z. Zhang, P. Cui, H. Li, X. Wang, and W. Zhu. Billion-scale network embedding with iterative random projection. In ICDM, pages 787--796, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  64. Z. Zhang, P. Cui, X. Wang, J. Pei, X. Yao, and W. Zhu. Arbitrary-order proximity preserved network embedding. In KDD, pages 2778--2786, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. C. Zhou, Y. Liu, X. Liu, Z. Liu, and J. Gao. Scalable graph embedding for asymmetric proximity. In AAAI, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. D. Zhu, P. Cui, D. Wang, and W. Zhu. Deep variational network embedding in wasserstein space. In KDD, pages 2827--2836, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Z. Zhu, S. Xu, M. Qu, and J. Tang. Graphvite: A high-performance cpu-gpu hybrid system for node embedding. In WWW, pages 2494--2504, 2019.Google ScholarGoogle Scholar

Index Terms

(auto-classified)
  1. Homogeneous network embedding for massive graphs via reweighted personalized PageRank

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Proceedings of the VLDB Endowment
      Proceedings of the VLDB Endowment  Volume 13, Issue 5
      January 2020
      195 pages
      ISSN:2150-8097
      Issue’s Table of Contents

      Publisher

      VLDB Endowment

      Publication History

      • Published: 1 January 2020
      Published in pvldb Volume 13, Issue 5

      Qualifiers

      • research-article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!