skip to main content
research-article

ESTIMA: extrapolating scalability of in-memory applications

Published:27 February 2016Publication History
Skip Abstract Section

Abstract

This paper presents ESTIMA, an easy-to-use tool for extrapolating the scalability of in-memory applications. ESTIMA is designed to perform a simple, yet important task: given the performance of an application on a small machine with a handful of cores, ESTIMA extrapolates its scalability to a larger machine with more cores, while requiring minimum input from the user. The key idea underlying ESTIMA is the use of stalled cycles (e.g. cycles that the processor spends waiting for various events, such as cache misses or waiting on a lock). ESTIMA measures stalled cycles on a few cores and extrapolates them to more cores, estimating the amount of waiting in the system. ESTIMA can be effectively used to predict the scalability of in-memory applications. For instance, using measurements of memcached and SQLite on a desktop machine, we obtain accurate predictions of their scalability on a server. Our extensive evaluation on a large number of in-memory benchmarks shows that ESTIMA has generally low prediction errors.

References

  1. N. I. Akhiezer. Theory of Approximation. Dover Publications, 1992.Google ScholarGoogle Scholar
  2. A. R. Alameldeen and D. A. Wood. Ipc considered harmful for multiprocessor workloads. IEEE Micro, 26(4), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. AMD. BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors, 2010.Google ScholarGoogle Scholar
  4. B. J. Barnes, B. Rountree, D. K. Lowenthal, J. Reeves, B. de Supinski, and M. Schulz. A regression-based approach to scalability prediction. ICS '08. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC benchmark suite: characterization and architectural implications. In PACT, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. L. Carrington, A. Snavely, and N. Wolter. A performance prediction framework for scientific applications. Future Generation Computer Systems, 22(3), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. C. Coarfa, J. Mellor-Crummey, N. Froyd, and Y. Dotsenko. Scalability analysis of spmd codes using expectations. ICS '07. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. E. Crovella and T. J. LeBlanc. Parallel performance prediction using lost cycles analysis. IEEE Computer Society Press, 1994.Google ScholarGoogle Scholar
  9. C. Diaconu, C. Freedman, E. Ismert, P.-A. Larson, P. Mittal, R. Stonecipher, N. Verma, and M. Zwilling. Hekaton: Sql server's memory-optimized oltp engine. SIGMOD '13. ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Dragojević, P. Felber, V. Gramoli, and R. Guerraoui. Why stm can be more than a research toy. Commun. ACM, 54(4), Apr. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Dragojević, R. Guerraoui, and M. Kapalka. Stretching transactional memory. In PLDI '09, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. B. Fan, D. G. Andersen, and M. Kaminsky. Memc3: Compact and concurrent memcache with dumber caching and smarter hashing. NSDI'13. USENIX Association, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi. Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In ASPLOS '12. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. B. Haagdorens, T. Vermeiren, and M. Goossens. Improving the performance of signature-based network intrusion detection sensors by multi-threading. In WISA'04, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. Heindl and G. Pokam. An analytic framework for performance modeling of software transactional memory. Computer Networks, 53(8), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Heindl and G. Pokam. An analytic model for optimistic stm with lazy locking. In Analytical and Stochastic Modeling Techniques and Applications. Springer, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Heindl, G. Pokam, and A.-R. Adl-Tabatabai. An analytic model of optimistic software transactional memory. In ISPASS 2009. IEEE, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  18. M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock-free data structures. In ISCA '93. ACM, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. A. Hoisie, O. Lubeck, and H. Wasserman. Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications. International Journal of High Performance Computing Applications, 14(4), 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. I. Intel. and ia-32 architectures software developers manual volume 3b: System programming guide. Part, 1, 2007.Google ScholarGoogle Scholar
  21. E. Ipek, B. R. De Supinski, M. Schulz, and S. A. McKee. An approach to performance prediction for parallel applications. In Euro-Par 2005 Parallel Processing. Springer, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. R. Jain. The Art of Computer Systems Performance Analysis: techniques for experimental design, measurement, simulation, and modeling. Wiley, 1991.Google ScholarGoogle Scholar
  23. V. Jiménez, F. J. Cazorla, R. Gioiosa, M. Valero, C. Boneti, E. Kursun, C.-Y. Cher, C. Isci, A. Buyuktosunoglu, and P. Bose. Power and thermal characterization of power6 system. In PACT '10. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. J. Kerbyson, H. J. Alme, A. Hoisie, F. Petrini, H. J. Wasserman, and M. Gittings. Predictive performance and scalability modeling of a large-scale application. In SC2001. ACM, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. B. C. Lee, D. M. Brooks, B. R. De Supinski, M. Schulz, K. Singh, and S. A. McKee. Methods of inference and learning for performance modeling of parallel applications. In PPoPP '07. ACM, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. H. Lim, D. Han, D. G. Andersen, and M. Kaminsky. Mica: A holistic approach to fast in-memory key-value storage. NSDI'14. USENIX Association, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. G. Marin and J. Mellor-Crummey. Cross-architecture performance predictions for scientific applications using parameterized models. In ACM SIGMETRICS Performance Evaluation Review, volume 32. ACM, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. C. C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In IISWC, 2008.Google ScholarGoogle Scholar
  29. R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, D. Stafford, T. Tung, and V. Venkataramani. Scaling memcache at facebook. NSDI'13. USENIX Association, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. G. R. Nudd, D. J. Kerbyson, E. Papaefstathiou, S. C. Perry, J. S. Harper, and D. V. Wilcox. PACEa toolset for the performance prediction of parallel and distributed systems. International Journal of High Performance Computing Applications, 14(3), 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. C. R. M. Olschanowsky. Hpc Application Address Stream Compression, Replay and Scaling. PhD thesis, La Jolla, CA, USA, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. C. A. Petri. Communication with automata, new york: Griffiss air force base. Technical report, Tech. Rep. RADC-TR-65-377, 1966.Google ScholarGoogle Scholar
  33. J. R. Phillips. Zunzun.com. http://www.zunzun.com.Google ScholarGoogle Scholar
  34. D. E. Porter and E. Witchel. Understanding transactional memory performance. In ISPASS 2010. IEEE, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  35. J. Ruppert. A delaunay refinement algorithm for quality 2-dimensional mesh generation. Journal of algorithms, 18(3), 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. N. Shavit and D. Touitou. Software transactional memory. In PODC '95. ACM, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. K. Singh, M. Bhadauria, and S. A. McKee. Real time power estimation and thread scheduling via performance counters. SIGARCH Comput. Archit. News, 37(2), July 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. J. Torrellas, Y. Solihin, and V. Lam. Scal-tool: Pinpointing and quantifying scalability bottlenecks in dsm multiprocessors. In Supercomputing, ACM/IEEE 1999 Conference, Nov 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. T. Usui, R. Behrends, J. Evans, and Y. Smaragdakis. Adaptive locks: Combining transactions and locks for efficient concurrency. In PACT '09. IEEE Computer Society, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. A. Vega, A. Buyuktosunoglu, and P. Bose. Smt-centric power-aware thread placement in chip multiprocessors. In PACT '13. IEEE Press, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. R. West, P. Zaroo, C. A. Waldspurger, and X. Zhang. Online cache modeling for commodity multicore processors. In PACT '10. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. L. T. Yang, X. Ma, and F. Mueller. Cross-platform performance prediction of parallel applications using partial execution. In SC2005. IEEE, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. A. Yasin. A top-down method for performance analysis and counters architecture. In Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on, March 2014.Google ScholarGoogle ScholarCross RefCross Ref
  44. J. Zhai, W. Chen, and W. Zheng. PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node. In ACM Sigplan Notices, volume 45. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 51, Issue 8
    PPoPP '16
    August 2016
    405 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/3016078
    Issue’s Table of Contents
    • cover image ACM Conferences
      PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
      February 2016
      420 pages
      ISBN:9781450340922
      DOI:10.1145/2851141

    Copyright © 2016 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 27 February 2016

    Check for updates

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader
About Cookies On This Site

We use cookies to ensure that we give you the best experience on our website.

Learn more

Got it!