Abstract
The complexity of systems is considered an obstacle to the progress of the IT industry. Autonomic computing is presented as an alternative for coping with this growing complexity. It is a holistic approach in which systems are able to configure, heal, optimize, and protect themselves. Web-based applications are an example of highly complex systems: the number of components, their interoperability, and workload variations are factors that may lead to performance failures or unavailability. Such scenarios affect the revenue and reputation of businesses that rely on these applications.
In this article, we present a self-healing framework for Web-based applications (SHõWA). SHõWA is composed of several modules that monitor the application, analyze the data to detect and pinpoint anomalies, and execute recovery actions autonomously. Monitoring is performed by a small aspect-oriented programming agent. This agent does not require changes to the application source code and includes adaptive and selective algorithms to regulate the level of monitoring. Anomalies are detected and pinpointed by means of statistical correlation: the data analysis detects changes in the server response time and determines whether those changes are correlated with the workload or are due to a performance anomaly. When a performance anomaly is present, the data analysis pinpoints it, and SHõWA then executes a recovery procedure. We also present a study of the detection and localization of anomalies, the accuracy of the data analysis, and the performance impact induced by SHõWA. The study considered two benchmarking applications, exercised through dynamic workloads, and different types of anomalies. The results reveal that (1) SHõWA is able to detect and pinpoint anomalies while the number of affected end users is still low; (2) SHõWA detected anomalies without raising any false alarms; and (3) SHõWA does not induce a significant performance overhead (throughput decreased by less than 1%, and the response time delay was no more than 2 milliseconds).
SHõWA: A Self-Healing Framework for Web-Based Applications