Abstract
This paper presents ESTIMA, an easy-to-use tool for extrapolating the scalability of in-memory applications. ESTIMA is designed to perform a simple yet important task: given the performance of an application on a small machine with a handful of cores, ESTIMA extrapolates its scalability to a larger machine with more cores, while requiring minimal input from the user. The key idea underlying ESTIMA is the use of stalled cycles, i.e., cycles that the processor spends waiting for various events, such as cache misses or lock acquisition. ESTIMA measures stalled cycles on a few cores and extrapolates them to more cores, estimating the amount of waiting in the system. ESTIMA can be used effectively to predict the scalability of in-memory applications: for instance, using measurements of memcached and SQLite on a desktop machine, we obtain accurate predictions of their scalability on a server. Our extensive evaluation on a large number of in-memory benchmarks shows that ESTIMA has generally low prediction errors.
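The measure-then-extrapolate idea can be illustrated with a minimal sketch. It is not ESTIMA's actual method: the abstract does not say which fitting functions the tool uses, so the power-law model, the function names `fit_power_law` and `extrapolate`, and the synthetic measurements below are all assumptions made for illustration.

```python
import math


def fit_power_law(cores, stalled):
    """Least-squares fit of stalled = a * cores**b in log-log space.

    The power-law form is an assumption of this sketch, not a model
    taken from the paper.
    """
    xs = [math.log(c) for c in cores]
    ys = [math.log(s) for s in stalled]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space
    a = math.exp(mean_y - b * mean_x)    # intercept, mapped back
    return a, b


def extrapolate(a, b, target_cores):
    """Evaluate the fitted model at a larger core count."""
    return a * target_cores ** b


# Synthetic, made-up "measurements" of total stalled cycles on 1 to 4
# cores (hypothetical data generated from a known power law).
measured_cores = [1, 2, 3, 4]
measured_stalls = [2.0e9 * c ** 1.3 for c in measured_cores]

a, b = fit_power_law(measured_cores, measured_stalls)
predicted = extrapolate(a, b, 32)  # estimate for a 32-core machine
print(f"fitted exponent b = {b:.3f}")
print(f"predicted stalled cycles on 32 cores = {predicted:.3e}")
```

An exponent above 1 in such a fit would mean that waiting grows faster than the core count, which is the kind of signal that points to poor scalability on the larger machine.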
ESTIMA: Extrapolating Scalability of In-Memory Applications. In PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.