Understanding and Resolving Performance Degradation in Deep Graph Convolutional Networks

ABSTRACT
A Graph Convolutional Network (GCN) stacks several layers, and in each layer performs a PROPagation operation (PROP) and a TRANsformation operation (TRAN) to learn node representations over graph-structured data. Though powerful, GCNs tend to suffer a performance drop as the model gets deeper. Previous works have focused on PROPs to study and mitigate this issue, but the role of TRANs is barely investigated. In this work, we study the performance degradation of GCNs by experimentally examining how stacking only TRANs or only PROPs behaves. We find that TRANs contribute significantly, or even more than PROPs, to the declining performance, and moreover that they tend to amplify node-wise feature variance in GCNs, causing a variance inflammation that we identify as a key factor behind the performance drop. Motivated by these observations, we propose a variance-controlling technique termed Node Normalization (NodeNorm), which scales each node's features using its own standard deviation. Experimental results validate the effectiveness of NodeNorm in addressing the performance degradation of GCNs. Specifically, it enables deep GCNs to outperform shallow ones in cases where deep models are needed, and to achieve results comparable to shallow ones on six benchmark datasets. NodeNorm is a generic plug-in and generalizes well to other GNN architectures. Code is publicly available at https://github.com/miafei/NodeNorm.
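To make the two operations and the proposed normalization concrete, the following is a minimal PyTorch sketch based only on the abstract's description. The function names, the dense-adjacency formulation, and the `eps` stabilizer are assumptions of this sketch; the authors' released code at the repository above is the authoritative implementation.

```python
import torch
import torch.nn.functional as F

def prop(x, adj_norm):
    # PROP: aggregate neighborhood features via the (symmetrically
    # normalized) adjacency matrix; dense matmul here for clarity.
    # x: [num_nodes, dim], adj_norm: [num_nodes, num_nodes]
    return adj_norm @ x

def tran(x, weight):
    # TRAN: per-node linear transformation followed by a nonlinearity.
    return F.relu(x @ weight)

def node_norm(x, eps=1e-6):
    # NodeNorm: scale each node's feature vector by its own standard
    # deviation, computed over the feature dimension. The eps term is
    # an assumed guard against division by zero.
    std = x.std(dim=1, keepdim=True)
    return x / (std + eps)

# One GCN layer with NodeNorm applied to the transformed features:
# x = node_norm(tran(prop(x, adj_norm), weight))
```

Because the scaling depends only on each node's own features, this sketch suggests why NodeNorm can act as a generic plug-in: it requires no batch statistics or graph structure and can be inserted after the TRAN of any layer in other GNN architectures.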