ABSTRACT
Vertical federated learning (VFL) is an effective paradigm for privacy-preserving collaborative learning across organizations (e.g., different corporations and institutions). Stochastic gradient descent (SGD) methods are popular choices for training VFL models because of their low per-iteration computation cost. However, existing SGD-based VFL algorithms are communication-expensive because they require a large number of communication rounds. Meanwhile, most existing VFL algorithms use synchronous computation, which seriously hampers computation-resource utilization in real-world applications. To address these challenges of communication cost and computation-resource utilization, we propose an asynchronous stochastic quasi-Newton (AsySQN) framework for VFL, under which three algorithms, i.e., AsySQN-SGD, AsySQN-SVRG, and AsySQN-SAGA, are proposed. By scaling descent steps with approximate Hessian information (without explicitly computing the inverse Hessian matrix), the proposed AsySQN-type algorithms converge much faster than SGD-based methods in practice and thus dramatically reduce the number of communication rounds. Moreover, the adopted asynchronous computation makes better use of computation resources. We theoretically prove the convergence rates of our proposed algorithms for strongly convex problems. Extensive numerical experiments on real-world datasets demonstrate the lower communication costs and better computation-resource utilization of our algorithms compared with state-of-the-art VFL algorithms.
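For intuition, the sketch below illustrates the general stochastic quasi-Newton idea the abstract refers to: a stochastic gradient is rescaled by an implicit inverse-Hessian approximation via the L-BFGS two-loop recursion, so the inverse Hessian is never formed explicitly. This is a minimal single-machine illustration of the technique, not the paper's AsySQN algorithm (which is asynchronous and operates over vertically partitioned features); the toy least-squares objective, memory size, and step size are assumptions chosen for the example.

    # Minimal sketch of a stochastic quasi-Newton step (illustrative only;
    # not the paper's AsySQN algorithm). The descent direction is the
    # stochastic gradient scaled by an approximate inverse Hessian built
    # from recent curvature pairs (s, y) via the L-BFGS two-loop recursion,
    # so the inverse Hessian is never formed explicitly.
    import numpy as np

    def two_loop_direction(grad, s_list, y_list):
        """Return (approx. inverse Hessian) @ grad without forming the matrix."""
        q = grad.copy()
        rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
        alphas = []
        # First loop: newest curvature pair to oldest.
        for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
            alpha = rho * (s @ q)
            alphas.append(alpha)
            q -= alpha * y
        # Initial scaling from the most recent curvature pair.
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
        # Second loop: oldest pair to newest (reversed(alphas) matches that order).
        for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
            beta = rho * (y @ q)
            q += (alpha - beta) * s
        return q

    # Toy strongly convex objective on mini-batches: f(w) = 0.5 * ||A w - b||^2 / n.
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(200, 10)), rng.normal(size=200)
    w = np.zeros(10)
    s_list, y_list, lr, memory = [], [], 0.5, 5

    for t in range(50):
        idx = rng.choice(200, size=32, replace=False)        # sample a mini-batch
        grad = A[idx].T @ (A[idx] @ w - b[idx]) / len(idx)   # stochastic gradient
        direction = two_loop_direction(grad, s_list, y_list) if s_list else grad
        w_new = w - lr * direction
        # Curvature pair (s, y) computed on the SAME mini-batch, bounded memory.
        grad_new = A[idx].T @ (A[idx] @ w_new - b[idx]) / len(idx)
        s_list.append(w_new - w)
        y_list.append(grad_new - grad)
        if len(s_list) > memory:
            s_list.pop(0)
            y_list.pop(0)
        w = w_new

    print("final objective:", 0.5 * np.mean((A @ w - b) ** 2))

Because the curvature pairs encode second-order information, each step makes more progress than a plain SGD step, which is the mechanism by which SQN-type methods can reduce the total number of iterations, and hence communication rounds, in the federated setting.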