
Isolation points: Creating performance-robust enterprise systems

Published: 21 May 2009

Abstract

This article explores a performance isolation-based approach to creating robust distributed applications. For each application, the approach is to understand the performance dependencies that pervade it and then impose constraints on the possible ‘spread’ of such dependencies through the application. The mechanisms used for this purpose, termed isolation points, are software abstractions inserted at key program locations: (1) in application interfaces, (2) in middleware implementations for making remote requests, and (3) in the system interfaces used by middleware and applications. This article demonstrates the utility of isolation points by using them to implement higher level abstractions that improve the performance-robustness of representative enterprise applications. The I-Queue abstraction uses isolation points to implement performance-robust messaging, targeting the message queues used in distributed enterprise codes. By appropriately orchestrating message dispatching, I-Queue can achieve an improvement of 16–32% in dispatched message locality based on traces obtained from the large-scale e-Pricing® search engine operated by Worldspan L.P.



Published in

ACM Transactions on Autonomous and Adaptive Systems, Volume 4, Issue 2
May 2009
155 pages
ISSN: 1556-4665
EISSN: 1556-4703
DOI: 10.1145/1516533

Copyright © 2009 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 21 May 2009
• Accepted: 1 February 2009
• Revised: 1 June 2008
• Received: 1 September 2007


Qualifiers

• research-article
• Research
• Refereed