research-article

Performance implications of dynamic memory allocators on transactional memory systems

Published: 24 January 2015

Abstract

Although dynamic memory management accounts for a significant part of the execution time on many modern software systems, its impact on the performance of transactional memory systems has been mostly overlooked. To shed light on this subject, this paper conducts a thorough investigation of the interplay between memory allocators and software transactional memory (STM) systems. We show that allocators can interfere with the way memory addresses are mapped to versioned locks in state-of-the-art STM implementations. Moreover, we observe that key aspects of allocators such as false sharing avoidance, scalability, and locality have a drastic impact on the final performance. For instance, we detected performance differences of up to 171% in the STAMP applications when using distinct allocators. We also show that optimizations at the STM level (such as caching transactional objects) are not effective when a modern allocator is already in use. All in all, our study highlights the importance of reporting the allocator used in the performance evaluation of transactional memory systems.
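To make the address-to-lock interaction concrete, the following is a minimal sketch and not code from the paper. It assumes a shift-and-mask mapping of the kind commonly used by word-based STMs; the function names, the constants `SHIFT` and `BITS`, and the three allocation layouts are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch: how allocator placement changes the number of
# versioned locks that a set of objects maps to.
# SHIFT and BITS are hypothetical values, not taken from the paper.
SHIFT = 5   # 2**5 = 32 bytes of memory covered by each lock
BITS = 10   # 2**10 = 1024 entries in the lock table


def lock_index(addr: int, shift: int = SHIFT, bits: int = BITS) -> int:
    """Map a memory address to a slot in the versioned-lock table."""
    return (addr >> shift) & ((1 << bits) - 1)


def distinct_locks(addrs, shift: int = SHIFT, bits: int = BITS) -> int:
    """Count how many different lock slots the given addresses use."""
    return len({lock_index(a, shift, bits) for a in addrs})


BASE = 0x10000
TABLE_SPAN = 1 << (SHIFT + BITS)  # bytes covered before the table wraps around

# Layout 1: four 16-byte objects packed back to back.
packed = [BASE + 16 * i for i in range(4)]

# Layout 2: one object per thread arena, arenas spaced exactly one
# table span apart, so every object aliases to the same slot.
arenas = [BASE + TABLE_SPAN * t for t in range(4)]

# Layout 3: four objects padded to 64-byte boundaries.
padded = [BASE + 64 * i for i in range(4)]

print(distinct_locks(packed), distinct_locks(arenas), distinct_locks(padded))
```

Under these assumptions, the packed layout puts neighbouring objects under a shared lock, the arena layout collapses unrelated objects onto a single slot, and only the padded layout gives each object its own lock. Transactions touching different objects can therefore conflict purely because of where the allocator placed them.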



Published in

ACM SIGPLAN Notices, Volume 50, Issue 8 (PPoPP '15)
August 2015
290 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2858788
Editor: Andy Gill

PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
January 2015
290 pages
ISBN: 9781450332057
DOI: 10.1145/2688500

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
