
SHõWA: A Self-Healing Framework for Web-Based Applications

Published: 09 March 2015

Abstract

The complexity of systems is considered an obstacle to the progress of the IT industry. Autonomic computing is presented as an alternative for coping with this growing complexity: a holistic approach in which systems are able to configure, heal, optimize, and protect themselves. Web-based applications are an example of systems with high complexity. The number of components, their interoperability, and workload variations are factors that may lead to performance failures or unavailability. These scenarios affect the revenue and reputation of the businesses that rely on such applications.

In this article, we present SHõWA, a self-healing framework for Web-based applications. SHõWA is composed of several modules, which monitor the application, analyze the data to detect and pinpoint anomalies, and execute recovery actions autonomously. Monitoring is done by a small aspect-oriented programming agent. This agent requires no changes to the application source code and includes adaptive and selective algorithms to regulate the level of monitoring. Anomalies are detected and pinpointed by means of statistical correlation: the data analysis detects changes in the server response time and determines whether those changes are correlated with the workload or are due to a performance anomaly. When a performance anomaly is present, the data analysis pinpoints it, and SHõWA then executes a recovery procedure. We also present a study of the detection and localization of anomalies, the accuracy of the data analysis, and the performance impact induced by SHõWA. Two benchmark applications, exercised through dynamic workloads, and different types of anomaly were considered in the study. The results reveal that (1) SHõWA is able to detect and pinpoint anomalies while the number of affected end users is still low; (2) SHõWA detected the anomalies without raising any false alarms; and (3) SHõWA does not induce a significant performance overhead (throughput was affected by less than 1%, and the response time delay was no more than 2 milliseconds).
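The core idea of the data analysis — deciding whether a response-time change tracks the workload or signals a performance anomaly — can be sketched with a rank correlation test. The sketch below is illustrative, not the paper's implementation: the function names, the 0.8 decision threshold, and the use of Spearman's coefficient as the correlation measure are assumptions on my part.

```python
# Illustrative sketch of workload-vs-response-time correlation analysis.
# All names and the 0.8 threshold are hypothetical, not from SHõWA itself.

def rank(values):
    """Assign 1-based average ranks to values (ties share the mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation coefficient between two samples."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def classify(workload, response_time, threshold=0.8):
    """If response time rises and falls with the workload, the slowdown is
    load-driven; a weak correlation suggests a performance anomaly."""
    rho = spearman(workload, response_time)
    return "workload change" if rho >= threshold else "performance anomaly"

# Response time grows with concurrent requests: load-driven slowdown.
print(classify([10, 20, 30, 40, 50], [100, 120, 150, 190, 240]))
# Response time degrades while the workload stays flat: anomaly.
print(classify([30, 31, 29, 30, 30], [100, 150, 220, 310, 420]))
```

In practice such a test would run over a sliding window of monitored intervals, so that an anomaly is flagged while the number of affected requests is still small.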

