Abstract
This article studies synchronous online distributed software update, known in DevOps as rolling upgrade, which upgrades the software version running on virtual machine instances in a cloud even when various failures may occur. The goal is to minimise the completion time, availability degradation, and monetary cost of the entire rolling upgrade by selecting proper parameters. To this end, we propose a stochastic model and a novel optimisation method. We validate that our approach minimises these objectives through both experiments on Amazon Web Services (AWS) and simulations.
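To make the three-way trade-off concrete, here is a minimal sketch. It is not the stochastic model or the optimisation method proposed in the article. It assumes a hypothetical toy model in which instances are upgraded in waves of `batch_size`, each instance fails independently with probability `fail_prob`, and a wave with any failure is retried as a whole. The function names, parameters, and default values are illustrative assumptions. The sketch evaluates every batch size and keeps the Pareto-optimal ones, that is, those that no other setting beats on all three objectives at once.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Outcome:
    batch_size: int
    completion_time: float    # expected minutes until all instances are upgraded
    availability_loss: float  # fraction of instances offline during a wave
    cost: float               # expected billed instance-minutes times price


def evaluate(batch_size, n_instances=20, step_minutes=5.0,
             fail_prob=0.1, price_per_instance_minute=0.01):
    """Toy model (an assumption, not the article's): a wave succeeds only if
    every instance in it succeeds; a failed wave is retried as a whole."""
    waves = ceil(n_instances / batch_size)
    p_wave_ok = (1.0 - fail_prob) ** batch_size
    attempts_per_wave = 1.0 / p_wave_ok  # mean of a geometric distribution
    completion_time = waves * attempts_per_wave * step_minutes
    availability_loss = batch_size / n_instances
    cost = (waves * attempts_per_wave * batch_size
            * step_minutes * price_per_instance_minute)
    return Outcome(batch_size, completion_time, availability_loss, cost)


def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    av = (a.completion_time, a.availability_loss, a.cost)
    bv = (b.completion_time, b.availability_loss, b.cost)
    return (all(x <= y for x, y in zip(av, bv))
            and any(x < y for x, y in zip(av, bv)))


def pareto_front(outcomes):
    """Keep the outcomes that no other outcome dominates."""
    return [o for o in outcomes
            if not any(dominates(p, o) for p in outcomes)]


if __name__ == "__main__":
    candidates = [evaluate(k) for k in range(1, 21)]
    for o in pareto_front(candidates):
        print(o)
```

Under these assumptions a batch size of 1 always stays on the front, because it has the lowest availability loss and cost, while upgrading all 20 instances in one wave is dominated: the retries make it slower and more expensive than a batch of 10. Choosing one point from the front still requires weighing the objectives against each other.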
Index Terms
Multi-objective Optimisation of Online Distributed Software Update for DevOps in Clouds