Abstract
We consider unreliable distributed learning systems wherein the training data is kept confidential by external workers, and the learner has to interact closely with those workers to train a model. In particular, we assume that there exists a system adversary that can adaptively compromise some workers; the compromised workers deviate from their designed specifications by sending out arbitrarily malicious messages. We assume that in each communication round, up to q out of the m workers suffer Byzantine faults. Each worker keeps a local sample of size n, and the total sample size is N = nm. We propose a secured variant of the gradient descent method that can tolerate up to a constant fraction of Byzantine workers, i.e., q/m = O(1). Moreover, we show that the statistical estimation error of the iterates converges in O(log N) rounds to O(√(q/N) + √(d/N)), where d is the model dimension. As long as q = O(d), our proposed algorithm achieves the optimal error rate O(√(d/N)). Our results are obtained under some technical assumptions. Specifically, we assume a strongly convex population risk. Nevertheless, the empirical risk (sample version) is allowed to be non-convex. The core of our method is to robustly aggregate the gradients computed by the workers based on the filtering procedure proposed by Steinhardt et al. On the technical front, deviating from the existing literature on robustly estimating a finite-dimensional mean vector, we establish a uniform concentration of the sample covariance matrix of gradients, and show that the aggregated gradient, as a function of the model parameter, converges uniformly to the true gradient function. To get a near-optimal uniform concentration bound, we develop a new matrix concentration inequality, which might be of independent interest.
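The filtering-based aggregation at the core of the method can be illustrated with a small sketch. This is not the paper's exact procedure — the soft weighting scheme, iteration cap, and stopping rule below are simplifying assumptions — but it conveys the spirit of spectral filtering: repeatedly downweight gradients whose projection onto the top principal direction of the weighted sample covariance is abnormally large, then return the weighted mean.

```python
import numpy as np

def filtered_mean(grads, n_iter=10):
    """Aggregate worker gradients (rows of `grads`, shape (m, d)) when a
    minority of rows may be arbitrary (Byzantine).  A simplified spectral
    filter in the spirit of Steinhardt et al.: repeatedly downweight points
    with a large squared projection onto the top eigenvector of the weighted
    sample covariance, then return the weighted mean."""
    m, _ = grads.shape
    w = np.ones(m)
    for _ in range(n_iter):
        wn = w / w.sum()
        mu = wn @ grads                              # current weighted mean
        centered = grads - mu
        cov = (centered * wn[:, None]).T @ centered  # weighted sample covariance
        _, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
        v = eigvecs[:, -1]                           # top principal direction
        scores = (centered @ v) ** 2                 # outlier scores along v
        if scores.max() <= 1e-12:                    # nothing left to filter
            break
        w = w * (1.0 - scores / scores.max())        # soft removal of worst points
    wn = w / w.sum()
    return wn @ grads
```

A learner would call `filtered_mean` on the m gradients received in each round and take a descent step with the result; because honest gradients concentrate, the filtered aggregate tracks the true gradient even when a constant fraction of the rows is adversarial.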
References
- Radosław Adamczak, Alexander Litvak, Alain Pajor, and Nicole Tomczak-Jaegermann. 2010. Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles. Journal of the American Mathematical Society, Vol. 23, 2 (2010), 535--561.
- Dan Alistarh, Zeyuan Allen-Zhu, and Jerry Li. 2018. Byzantine Stochastic Gradient Descent. arXiv preprint arXiv:1803.08917 (2018).
- Dimitri P. Bertsekas. 2015. Convex Optimization Algorithms. Athena Scientific, Belmont, MA.
- Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. 2017. Byzantine-Tolerant Machine Learning. arXiv preprint arXiv:1703.02757 (2017).
- Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, Vol. 3, 1 (2011), 1--122.
- Stephen Boyd and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press.
- Moses Charikar, Jacob Steinhardt, and Gregory Valiant. 2017. Learning from Untrusted Data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC 2017). ACM, New York, NY, USA, 47--60.
- Yudong Chen, Lili Su, and Jiaming Xu. 2017. Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent. Proc. ACM Meas. Anal. Comput. Syst., Vol. 1, 2, Article 44 (Dec. 2017), 25 pages.
- Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. 2016. Robust Estimators in High Dimensions without the Computational Intractability. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS). 655--664.
- Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. 2017. Being Robust (in High Dimensions) Can Be Practical. CoRR, Vol. abs/1703.00893 (2017). arXiv:1703.00893 http://arxiv.org/abs/1703.00893
- Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, and Alistair Stewart. 2018. Sever: A Robust Meta-Algorithm for Stochastic Optimization. arXiv preprint arXiv:1803.02815 (2018).
- John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2014. Privacy Aware Learning. J. ACM, Vol. 61, 6, Article 38 (Dec. 2014), 57 pages.
- Jiashi Feng, Huan Xu, and Shie Mannor. 2014. Distributed Robust Learning. arXiv preprint arXiv:1409.5937 (2014).
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics.
- Peter J. Huber. 2011. Robust statistics. In International Encyclopedia of Statistical Science. Springer, 1248--1251.
- Adam Klivans, Pravesh K. Kothari, and Raghu Meka. 2018. Efficient Algorithms for Outlier-Robust Regression. arXiv preprint arXiv:1803.03241 (2018).
- Jakub Konečný, Brendan McMahan, and Daniel Ramage. 2015. Federated optimization: Distributed optimization beyond the datacenter. arXiv preprint arXiv:1511.03575 (2015).
- Jakub Konečný, H. Brendan McMahan, Felix X. Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated Learning: Strategies for Improving Communication Efficiency. In NIPS Workshop on Private Multi-Party Machine Learning. https://arxiv.org/abs/1610.05492
- Kevin A. Lai, Anup B. Rao, and Santosh Vempala. 2016. Agnostic estimation of mean and covariance. In Foundations of Computer Science (FOCS), 2016 IEEE 57th Annual Symposium on. IEEE, 665--674.
- Nancy A. Lynch. 1996. Distributed Algorithms. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
- Cong Ma, Kaizheng Wang, Yuejie Chi, and Yuxin Chen. 2017. Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution. arXiv preprint arXiv:1711.10467 (2017).
- Brendan McMahan and Daniel Ramage. 2017. Federated Learning: Collaborative Machine Learning without Centralized Training Data. https://research.googleblog.com/2017/04/federated-learning-collaborative.html (April 2017). Accessed: 2017-04-06.
- Song Mei, Yu Bai, and Andrea Montanari. 2016. The landscape of empirical risk for non-convex losses. arXiv preprint arXiv:1607.06534 (2016).
- Sahand Negahban and Martin J. Wainwright. 2011. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, Vol. 13, 1 (2011), 1665--1697.
- Adarsh Prasad, Arun Sai Suggala, Sivaraman Balakrishnan, and Pradeep Ravikumar. 2018. Robust estimation via robust gradient estimation. arXiv preprint arXiv:1802.06485 (2018).
- Maxim Raginsky and Bruce Hajek. (n.d.). ECE 543: Statistical Learning Theory. Lecture notes.
- Shai Shalev-Shwartz and Shai Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
- Alex Smola and S. V. N. Vishwanathan. 2008. Introduction to Machine Learning. Cambridge University, UK, Vol. 32 (2008), 34.
- Jacob Steinhardt, Moses Charikar, and Gregory Valiant. 2018. Resilience: A Criterion for Learning in the Presence of Arbitrary Outliers. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018) (Leibniz International Proceedings in Informatics (LIPIcs)), Anna R. Karlin (Ed.), Vol. 94. Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 45:1--45:21.
- Lili Su. 2017. Defending distributed systems against adversarial attacks: Consensus, consensus-based learning, and statistical learning. Ph.D. Dissertation. University of Illinois at Urbana-Champaign.
- Lili Su and Nitin H. Vaidya. 2016. Fault-Tolerant Multi-Agent Optimization: Optimal Iterative Distributed Algorithms. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing (PODC '16). ACM, New York, NY, USA, 425--434.
- Terence Tao. 2012. Topics in Random Matrix Theory. American Mathematical Society, Providence, RI, USA.
- Roman Vershynin. 2010. Introduction to the non-asymptotic analysis of random matrices. arXiv preprint arXiv:1011.3027 (2010).
- Roman Vershynin. 2012. How close is the sample covariance matrix to the actual covariance matrix? Journal of Theoretical Probability, Vol. 25, 3 (2012), 655--686.
- Roman Vershynin. 2018. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press.
- Martin Wainwright. 2015. Basic tail and concentration bounds. Lecture notes. https://www.stat.berkeley.edu/.../Chap2_TailBounds_Jan22_2015.pdf (visited on 12/31/2017).
- Yihong Wu. 2017. Lecture Notes on Information-theoretic Methods For High-dimensional Statistics. (April 2017). http://www.stat.yale.edu/~yw562/teaching/it-stats.pdf
- Dong Yin, Yudong Chen, Kannan Ramchandran, and Peter Bartlett. 2018a. Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates. arXiv preprint arXiv:1803.01498 (2018).
- Dong Yin, Yudong Chen, Kannan Ramchandran, and Peter Bartlett. 2018b. Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning. arXiv preprint arXiv:1806.05358 (2018).
- Yuchen Zhang, John C. Duchi, and Martin J. Wainwright. 2013. Communication-Efficient Algorithms for Statistical Optimization. Journal of Machine Learning Research, Vol. 14 (2013), 3321--3363. http://jmlr.org/papers/v14/zhang13b.html
Securing Distributed Gradient Descent in High Dimensional Statistical Learning
SIGMETRICS '19: Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems