Abstract
Given an input graph G and a node v ∈ G, homogeneous network embedding (HNE) maps the graph structure in the vicinity of v to a compact, fixed-dimensional feature vector. This paper focuses on HNE for massive graphs, e.g., with billions of edges. At this scale, most existing approaches fail, as they incur either prohibitively high costs or severely compromised result utility.
Our proposed solution, called Node-Reweighted PageRank (NRP), is based on a classic idea of deriving embedding vectors from pairwise personalized PageRank (PPR) values. Our contributions are twofold: first, we design a simple and efficient baseline HNE method based on PPR that is capable of handling billion-edge graphs on commodity hardware; second and more importantly, we identify an inherent drawback of vanilla PPR, and address it in our main proposal NRP. Specifically, PPR was designed for a very different purpose, i.e., ranking nodes in G based on their relative importance from a source node's perspective. In contrast, HNE aims to build node embeddings considering the whole graph. Consequently, node embeddings derived directly from PPR are of suboptimal utility.
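To illustrate the classic idea of deriving embeddings from pairwise PPR values, the minimal sketch below computes the exact PPR matrix of a toy graph and factorizes it into two low-dimensional matrices with a truncated SVD. The function names `ppr_matrix` and `ppr_embeddings`, the restart probability `alpha = 0.15`, the split into "forward" vectors `X` and "backward" vectors `Y`, and the toy graph are all assumptions made for illustration. This is plain PPR factorization, i.e., the kind of vanilla baseline the abstract describes, not the NRP algorithm, and its dense O(n³) solve is only viable for tiny graphs.

```python
import numpy as np

def ppr_matrix(adj: np.ndarray, alpha: float = 0.15) -> np.ndarray:
    """Exact pairwise PPR matrix (hypothetical helper).

    Pi[u, v] is the probability that a random walk with restart probability
    alpha, started at u, stops at v. Closed form:
        Pi = alpha * (I - (1 - alpha) * P)^-1,  with P = D^-1 A.
    Dense O(n^3) inversion: toy graphs only.
    """
    out_deg = adj.sum(axis=1)
    if np.any(out_deg == 0):
        raise ValueError("every node needs at least one out-edge")
    P = adj / out_deg[:, None]  # row-stochastic transition matrix
    n = adj.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * P)

def ppr_embeddings(adj: np.ndarray, k: int, alpha: float = 0.15):
    """Factorize Pi ~= X @ Y.T with a rank-k truncated SVD (assumed scheme).

    X holds source-side ("forward") vectors, Y target-side ("backward") ones.
    """
    pi = ppr_matrix(adj, alpha)
    U, s, Vt = np.linalg.svd(pi)
    root = np.sqrt(s[:k])       # split each singular value evenly
    X = U[:, :k] * root
    Y = Vt[:k].T * root
    return X, Y

# Toy undirected graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

X, Y = ppr_embeddings(A, k=3)
Pi = ppr_matrix(A)
# Largest entrywise error of the rank-3 approximation of Pi.
print(np.abs(X @ Y.T - Pi).max())
```

Each row of `Pi` sums to 1 because every walk stops somewhere, and with `k` equal to the number of nodes the factorization reproduces `Pi` exactly. Handling billions of edges within the abstract's O(m log n) bound would require approximate PPR computation and a factorization that never materializes the n × n matrix; the sketch does not attempt this.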
The proposed NRP approach overcomes the above deficiency through an effective and efficient node reweighting algorithm, which augments PPR values with node degree information and iteratively adjusts the embedding vectors accordingly. Overall, NRP takes O(m log n) time and O(m) space to compute all node embeddings for a graph with m edges and n nodes. Our extensive experiments, which compare NRP against 18 existing solutions over 7 real graphs, demonstrate that NRP achieves higher result utility than all of these solutions for link prediction, graph reconstruction, and node classification, while being up to orders of magnitude faster. In particular, on a billion-edge Twitter graph, NRP terminates within 4 hours using a single CPU core.
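The abstract does not spell out the reweighting procedure, so the sketch below is only one plausible reading of "augments PPR values with node degree information and iteratively adjusts embedding vectors". It assumes the goal is to find a positive weight per source node (`w_f`) and per target node (`w_b`) so that each node's total reweighted proximity matches its degree, and it reaches that goal with Sinkhorn-Knopp matrix scaling, a swapped-in standard technique and not NRP's own update rule. The names `degree_reweight`, `w_f`, `w_b`, `X_rw`, `Y_rw`, the restart probability, and the toy graph are hypothetical.

```python
import numpy as np

def ppr_matrix(adj, alpha=0.15):
    # Exact PPR: Pi = alpha * (I - (1 - alpha) * D^-1 A)^-1 (toy graphs only).
    P = adj / adj.sum(axis=1)[:, None]
    return alpha * np.linalg.inv(np.eye(adj.shape[0]) - (1 - alpha) * P)

def degree_reweight(pi, d_out, d_in, iters=500, tol=1e-10):
    """Hypothetical node reweighting via Sinkhorn-Knopp scaling.

    Finds positive w_f (per source) and w_b (per target) such that
    M = diag(w_f) @ pi @ diag(w_b) has row sums d_out and column sums d_in.
    Assumes pi > 0 entrywise and d_out.sum() == d_in.sum().
    """
    w_f = np.ones(pi.shape[0])
    w_b = np.ones(pi.shape[1])
    for _ in range(iters):
        w_f = d_out / (pi @ w_b)    # hold w_b fixed, match row sums exactly
        w_b = d_in / (pi.T @ w_f)   # hold w_f fixed, match column sums exactly
        M = w_f[:, None] * pi * w_b[None, :]
        if np.abs(M.sum(axis=1) - d_out).max() < tol:
            break
    return w_f, w_b

# Toy undirected graph: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)  # undirected: out-degree == in-degree

Pi = ppr_matrix(A)
w_f, w_b = degree_reweight(Pi, deg, deg)

# Apply the weights to rank-3 PPR factors: X_rw @ Y_rw.T equals
# diag(w_f) @ (X @ Y.T) @ diag(w_b), so rescaling the embedding rows
# reweights the approximated proximities without refactorizing.
U, s, Vt = np.linalg.svd(Pi)
X = U[:, :3] * np.sqrt(s[:3])
Y = Vt[:3].T * np.sqrt(s[:3])
X_rw = w_f[:, None] * X
Y_rw = w_b[:, None] * Y
```

Sinkhorn-Knopp is used here only because it is a standard, provably convergent way to rescale a strictly positive matrix to prescribed row and column sums; the paper's actual reweighting objective and update rules may differ.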