Research article · DOI: 10.1145/3534678.3539086 · KDD '22 Conference Proceedings

No One Left Behind: Inclusive Federated Learning over Heterogeneous Devices

Published: 14 August 2022

ABSTRACT

Federated learning (FL) is an important paradigm for training global models from decentralized data in a privacy-preserving way. Existing FL methods usually assume that the global model can be trained on any participating client. In real applications, however, client devices are heterogeneous and have different computing power. Although big models like BERT have achieved huge success in AI, it is difficult to apply them to heterogeneous FL with weak clients. Straightforward solutions, such as removing the weak clients or using a small model to fit all clients, lead to problems like under-representation of the dropped clients and inferior accuracy due to data loss or limited model representation ability. In this work, we propose InclusiveFL, a client-inclusive federated learning method that handles this problem. The core idea of InclusiveFL is to assign models of different sizes to clients with different computing capabilities: bigger models for powerful clients and smaller ones for weak clients. We also propose an effective method to share knowledge among local models of different sizes. In this way, all clients can participate in FL training, and the final model can be big and powerful enough. In addition, we propose a momentum knowledge distillation method to better transfer the knowledge in big models on powerful clients to the small models on weak clients. Extensive experiments on many real-world benchmark datasets demonstrate the effectiveness of InclusiveFL in learning accurate models from clients with heterogeneous devices under the FL framework.
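The abstract describes the core mechanism only informally. The following is a minimal sketch of size-heterogeneous federated averaging, under the assumption (not stated in this abstract) that each smaller model is a prefix of the full model's layer stack and that the server averages each layer over only the clients that hold it:

```python
# Hedged sketch: layer-wise aggregation when clients train models of
# different depths. Each client model is represented here as a list of
# per-layer weights (one scalar per layer, for illustration only).
from typing import List

def aggregate(client_models: List[List[float]], full_depth: int) -> List[float]:
    """Average layer i over every client whose model contains layer i."""
    global_model = []
    for i in range(full_depth):
        holders = [m[i] for m in client_models if len(m) > i]
        global_model.append(sum(holders) / len(holders))
    return global_model

# Three clients with 2-, 4-, and 6-layer models: all clients contribute to
# the bottom layers, while only the strongest client covers the top layers.
clients = [[1.0, 1.0],
           [2.0, 2.0, 2.0, 2.0],
           [3.0, 3.0, 3.0, 3.0, 3.0, 3.0]]
print(aggregate(clients, full_depth=6))  # → [2.0, 2.0, 2.5, 2.5, 3.0, 3.0]
```

In this toy example the bottom two layers are averaged over all three clients, the middle two over two clients, and the top two come from the strongest client alone; the paper's actual knowledge-sharing scheme is richer than this prefix-averaging assumption.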


Supplemental Material

KDD22-apfp1078.mp4

Federated learning (FL) has emerged as an important paradigm for training a machine learning model without exchanging decentralized data. The FL paradigm typically assumes that all clients share the same model structure. However, clients in the real world have different local capabilities, such as communication bandwidth, computation, or memory resources. The straightforward solution of dropping weak clients causes fairness issues, because the dropped clients become under-represented. Conversely, using a small global model that fits all clients' capabilities results in poor performance. InclusiveFL addresses the heterogeneous-resource challenge in FL by assigning models of different sizes to clients with different resources. In addition, we propose a momentum knowledge distillation method to better transfer knowledge from large models on powerful clients to small models on weak clients. Extensive experiments on many real-world benchmark datasets demonstrate the effectiveness of InclusiveFL.
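The momentum knowledge distillation mentioned above is not specified in detail on this page. One plausible reading is that the distillation target for small models is an exponential moving average of the large model's signal rather than its latest value; the sketch below illustrates only that EMA-target idea, with the momentum constant and the choice of what the signal carries (parameters vs. logits) as assumptions for illustration:

```python
# Hedged sketch of a momentum-style distillation target: smooth the large
# model's signal with an exponential moving average before small clients
# distill from it, so transfer is less sensitive to any single round.
from typing import List

def momentum_update(ema_target: List[float],
                    big_signal: List[float],
                    momentum: float = 0.9) -> List[float]:
    """EMA of the big model's signal, used as the distillation target."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(ema_target, big_signal)]

# Three rounds with a constant big-model signal: the target approaches it
# gradually instead of jumping to it immediately.
target = [0.0, 0.0]
for big in ([1.0, 2.0], [1.0, 2.0], [1.0, 2.0]):
    target = momentum_update(target, big)
print(target)  # → roughly [0.271, 0.542]
```

The small models would then be trained with an ordinary distillation loss against `target`; that training step is omitted here.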

