ABSTRACT
Unavailability in distributed enterprise systems is usually the result of planned events, such as upgrades, rather than failures. Major system upgrades entail complex data conversions that are difficult to perform on the fly, in the face of live workloads. Minimizing the downtime imposed by such conversions is a time-intensive and error-prone manual process. We propose upgrades-as-a-service, a novel approach that can eliminate all the causes of planned downtime recorded during the upgrade history of one of the ten most popular websites. Building on the lessons learned from past research on live upgrades in middleware systems, upgrades-as-a-service trades the need for additional hardware resources during the upgrade for the ability to perform end-to-end upgrades online, with minimal application-specific knowledge.
Index Terms
Toward upgrades-as-a-service in distributed systems



