Abstract
This article explores an approach to building robust distributed applications based on performance isolation. For each application, the approach first identifies the performance dependencies that pervade it and then constrains how far those dependencies can ‘spread’ through the application. The mechanisms used for this purpose, termed isolation points, are software abstractions inserted at key program locations: (1) in application interfaces, (2) in middleware implementations for making remote requests, and (3) in the system interfaces used by middleware and applications. This article demonstrates the utility of isolation points by using them to implement higher-level abstractions that improve the performance robustness of representative enterprise applications. One such abstraction, I-Queue, uses isolation points to implement performance-robust messaging, targeting the message queues used in distributed enterprise codes. By appropriately orchestrating message dispatching, I-Queue can achieve a 16–32% improvement in dispatched-message locality, based on traces obtained from the large-scale e-Pricing® search engine operated by Worldspan L.P.
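The abstract credits I-Queue's gains to how it orchestrates message dispatching, but it does not define the locality metric or the dispatch policy. The following is a minimal sketch under stated assumptions, not the I-Queue implementation. It assumes each message carries a "class" that determines which handler code and data it touches, and it defines locality as the fraction of consecutive dispatches that repeat the previous message's class. It then compares plain FIFO dispatch with a hypothetical window-bounded policy that groups same-class messages. The names `locality`, `fifo_dispatch`, and `grouped_dispatch`, the message shape, and the synthetic trace are all illustrative assumptions.

```python
from collections import OrderedDict
from typing import Hashable, List, Sequence, Tuple

# Hypothetical message shape: (message class, arrival id). The class stands in
# for whatever attribute determines which code and data a handler touches.
Message = Tuple[Hashable, int]


def locality(order: Sequence[Message]) -> float:
    """Assumed metric: fraction of consecutive dispatches that repeat the previous class."""
    if len(order) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(order, order[1:]) if prev[0] == cur[0])
    return repeats / (len(order) - 1)


def fifo_dispatch(queue: Sequence[Message]) -> List[Message]:
    """Baseline: dispatch strictly in arrival order."""
    return list(queue)


def grouped_dispatch(queue: Sequence[Message], window: int) -> List[Message]:
    """Reorder only within consecutive windows of `window` arrivals, grouping
    same-class messages. No message leaves its window, which bounds its extra delay."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[Message] = []
    for start in range(0, len(queue), window):
        groups = OrderedDict()  # class -> messages, in first-seen class order
        for msg in queue[start:start + window]:
            groups.setdefault(msg[0], []).append(msg)
        # Continue with the class dispatched last, if it reappears in this window.
        if out and out[-1][0] in groups:
            groups.move_to_end(out[-1][0], last=False)
        for msgs in groups.values():
            out.extend(msgs)  # arrival order is preserved within each class
    return out


if __name__ == "__main__":
    # Synthetic trace with two strictly interleaved classes (not the Worldspan trace).
    trace = [("A" if i % 2 == 0 else "B", i) for i in range(8)]
    print(f"{locality(fifo_dispatch(trace)):.3f}")        # 0.000
    print(f"{locality(grouped_dispatch(trace, 4)):.3f}")  # 0.714
```

The window size is the design knob in this sketch: a larger window allows more grouping, and therefore higher locality, at the cost of delaying individual messages by up to `window - 1` positions. A window of 1 reduces to FIFO dispatch.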
Isolation points: Creating performance-robust enterprise systems