ABSTRACT
Traditional approaches to pre-training, fine-tuning, and ensembling often overlook relational information among data and the interconnections between tasks. To address this gap, we present a relational graph-based model that exploits this information. We introduce the Relational grAph Model ensemBLE, abbreviated as RAMBLE. RAMBLE infers class labels simultaneously across all data nodes and task nodes, using the relational graph in a transductive manner. This fine-grained formulation lets the model capture the interplay between data and tasks. We further incorporate a variational information bottleneck-guided scheme for embedding fusion and aggregation, which produces an informative fused embedding by homing in on the embeddings that benefit the target task while filtering out noisy ones. Our theoretical analysis, grounded in information theory, shows that using relational information for embedding fusion yields higher upper and lower bounds on the target task's accuracy. We evaluate the proposed model on eight diverse datasets, and the experimental results demonstrate that it effectively uses the relational knowledge derived from all pre-trained models, improving performance on the target tasks.
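To make the idea of bottleneck-guided embedding fusion more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each pre-trained model contributes one embedding of the same dimension, that fusion is a softmax-gated weighted sum, and that the compression term is the usual KL divergence between a diagonal Gaussian and a standard normal. All function names, the gating scheme, and the `beta` weight are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_embeddings(embeddings: Sequence[Sequence[float]],
                    gate_scores: Sequence[float]) -> List[float]:
    """Weighted sum of per-model embeddings (hypothetical gating scheme).

    A low gate score shrinks a model's weight, which is one simple way to
    down-weight embeddings that look noisy for the target task.
    """
    weights = softmax(gate_scores)
    dim = len(embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, embeddings))
            for d in range(dim)]


def kl_to_standard_normal(mu: Sequence[float],
                          log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), the compression term."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vib_loss(task_loss: float,
             mu: Sequence[float],
             log_var: Sequence[float],
             beta: float = 1e-3) -> float:
    """Task loss plus beta times the compression term (beta is an assumption)."""
    return task_loss + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    # Two toy 2-d embeddings from two hypothetical pre-trained models.
    fused = fuse_embeddings([[1.0, 0.0], [0.0, 1.0]], gate_scores=[0.0, 0.0])
    print(fused)
    print(vib_loss(task_loss=1.0, mu=fused, log_var=[0.0, 0.0], beta=0.1))
```

In this sketch, raising `beta` penalizes the fused representation more strongly for carrying information beyond what the task loss rewards, while the gate scores decide how much each source embedding contributes.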
Index Terms
- Leveraging Relational Graph Neural Network for Transductive Model Ensemble