Abstract
In this article, we introduce TAS (Transactional Auto Scaler), a system for automating the elastic scaling of replicated in-memory transactional data grids, such as NoSQL data stores or Distributed Transactional Memories. Applications of TAS range from online self-optimization of in-production applications, to the automatic generation of QoS/cost-driven elastic scaling policies, to what-if analysis of the scalability of transactional applications.
We present the key innovation at the core of TAS: a novel performance forecasting methodology that combines analytical modeling and machine learning. By exploiting these two classically competing approaches synergistically, TAS achieves the best of both worlds, namely high extrapolation power and good accuracy, even when faced with complex workloads deployed over public cloud infrastructures.
We demonstrate the accuracy and feasibility of TAS’s performance forecasting methodology via an extensive experimental study based on a fully fledged prototype implementation integrated with a popular open-source in-memory transactional data grid (Red Hat’s Infinispan) and industry-standard benchmarks generating a breadth of heterogeneous workloads.
Index Terms
Transactional Auto Scaler: Elastic Scaling of Replicated In-Memory Transactional Data Grids