
Multi-objective Optimisation of Online Distributed Software Update for DevOps in Clouds

Published: 27 August 2019

Abstract

This article studies synchronous online distributed software update, known in DevOps as rolling upgrade, which upgrades the software version on virtual machine instances in clouds even when various failures may occur. The goal is to minimise the completion time, availability degradation, and monetary cost of the entire rolling upgrade by selecting proper parameters. To this end, we propose a stochastic model and a novel optimisation method. We validate our approach to minimising these objectives through both experiments on Amazon Web Services (AWS) and simulations.
