On the Convergence Rate of Distributed Gradient Methods for Finite-Sum Optimization under Communication Delays

Published: 19 December 2017

Abstract

Motivated by applications in machine learning and statistics, we study distributed optimization problems over a network of processors, where the goal is to optimize a global objective composed of a sum of local functions. In these problems, the data sets are so large that the data and computation must be distributed over the processors, which creates the need for distributed algorithms. In this paper, we consider a popular distributed gradient-based consensus algorithm, which requires only local computation and communication. An important problem in this area is to analyze the convergence rate of such algorithms in the presence of communication delays, which are inevitable in distributed systems. We prove the convergence of the gradient-based consensus algorithm in the presence of uniform, but possibly arbitrarily large, communication delays between the processors. Moreover, we obtain an upper bound on the rate of convergence of the algorithm as a function of the network size, topology, and the inter-processor communication delays.
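
To make the setting concrete, the following is a minimal sketch of a gradient-based consensus iteration with a uniform communication delay. It is an illustration under stated assumptions, not the algorithm or analysis of the paper: the ring topology, the mixing weights (1/2 on a node's own value, 1/4 on each neighbor), the scalar quadratic local functions f_i(x) = (x - b_i)^2 / 2, the step size step0 / sqrt(k + 1), and the function name `delayed_consensus_gradient` are all hypothetical choices made for the example. Each node mixes its own current estimate with neighbor estimates that are `tau` iterations old, then takes a local gradient step.

```python
import math


def delayed_consensus_gradient(b, tau, num_iters, step0=0.5):
    """Sketch: consensus + local gradient steps on a ring with uniform delay.

    Assumed setup (not taken from the paper): node i holds
    f_i(x) = 0.5 * (x - b[i])**2, so the sum of the f_i is minimized
    at the mean of b. Neighbor values arrive `tau` iterations late.
    """
    n = len(b)
    history = [[0.0] * n]  # history[k] stores the estimates x(k) of all nodes

    for k in range(num_iters):
        x_now = history[-1]                # each node's own current value
        x_old = history[max(k - tau, 0)]   # delayed neighbor values
        alpha = step0 / math.sqrt(k + 1)   # diminishing step size

        x_next = []
        for i in range(n):
            left, right = (i - 1) % n, (i + 1) % n
            # Consensus step: own value is current, neighbor values are stale.
            mixed = 0.5 * x_now[i] + 0.25 * (x_old[left] + x_old[right])
            # Local gradient of f_i evaluated at the node's current estimate.
            grad = x_now[i] - b[i]
            x_next.append(mixed - alpha * grad)
        history.append(x_next)

    return history[-1]


if __name__ == "__main__":
    targets = [1.0, 2.0, 3.0, 4.0, 5.0]
    estimates = delayed_consensus_gradient(targets, tau=3, num_iters=20000)
    print(estimates)
```

In this toy instance every node's estimate should approach the mean of `targets`, the minimizer of the summed objective, for both `tau=0` and a positive delay. Varying `tau` or the number of nodes gives an informal feel for the dependence on delay and network size that the abstract describes.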
