Abstract
Although dynamic memory management accounts for a significant part of the execution time in many modern software systems, its impact on the performance of transactional memory systems has been mostly overlooked. To shed some light on this subject, this paper conducts a thorough investigation of the interplay between memory allocators and software transactional memory (STM) systems. We show that allocators can interfere with the way memory addresses are mapped to versioned locks in state-of-the-art STM implementations. We also observe that key aspects of allocators, such as false sharing avoidance, scalability, and locality, have a drastic impact on the final performance. For instance, we detected performance differences of up to 171% in the STAMP applications when using distinct allocators. Furthermore, we show that optimizations at the STM level (such as caching transactional objects) are not effective when a modern allocator is already in use. All in all, our study highlights the importance of reporting the allocator used in the performance evaluation of transactional memory systems.
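The address-to-lock mapping mentioned above can be sketched as follows. This is a minimal illustration, assuming a word-based STM that selects a versioned lock by shifting the address and masking it into a fixed-size lock table. The constants `LOCK_SHIFT` and `LOCK_TABLE_BITS` and the functions `lock_index` and `share_lock` are hypothetical names chosen for this example, not taken from any particular STM implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters, chosen only for illustration. */
#define LOCK_SHIFT      5u   /* each 32-byte stripe of memory shares one lock */
#define LOCK_TABLE_BITS 20u  /* the lock table holds 2^20 versioned locks */
#define LOCK_TABLE_SIZE ((uintptr_t)1 << LOCK_TABLE_BITS)
#define LOCK_MASK       (LOCK_TABLE_SIZE - 1u)

/* Masking only works as a modulo when the table size is a power of two. */
static_assert((LOCK_TABLE_SIZE & (LOCK_TABLE_SIZE - 1u)) == 0,
              "lock table size must be a power of two");

/* Map a memory address to the index of its versioned lock:
 * drop the low bits (stripe offset), then wrap into the table. */
size_t lock_index(uintptr_t addr)
{
    return (size_t)((addr >> LOCK_SHIFT) & LOCK_MASK);
}

/* Return nonzero when two addresses are guarded by the same lock,
 * i.e. when transactions touching them may conflict with each other
 * even though the addresses themselves differ. */
int share_lock(uintptr_t a, uintptr_t b)
{
    return lock_index(a) == lock_index(b);
}
```

Under these assumed constants, two 16-byte objects placed back to back at `0x1000` and `0x1010` fall into the same 32-byte stripe and share a lock, while the same objects placed 32 bytes apart (`0x1000` and `0x1020`) map to distinct locks. Addresses exactly 32 MiB apart (`LOCK_TABLE_SIZE << LOCK_SHIFT`) also alias to one lock. Where the allocator places objects therefore determines which unrelated objects can produce false conflicts.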
Performance implications of dynamic memory allocators on transactional memory systems
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming