Abstract
Software transactional memory (STM) can lead to scalable implementations of concurrent programs, as the relative performance of an application increases with the number of threads that support it. However, the absolute performance is typically impaired by the overheads of transaction management and instrumented accesses to shared memory. This often leads STM-based programs with low thread counts to perform worse than a sequential, non-instrumented version of the same application.
In this paper, we propose FastLane, a new STM algorithm that bridges the performance gap between sequential execution and classical STM algorithms when running on few cores. FastLane seeks to reduce instrumentation costs, and thus performance degradation, in its target operation range. We introduce a novel algorithm that differentiates between two types of threads: one thread (the master) executes transactions pessimistically without ever aborting, and therefore with minimal instrumentation and management costs, while the other threads (the helpers) can commit speculative transactions only when they do not conflict with the master. Helpers thus contribute to the application's progress without impairing the performance of the master.
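The master/helper split described above can be illustrated with a deliberately simplified sketch. This is not the FastLane algorithm itself, whose details the abstract does not give. It is a toy model in C++17 under stated assumptions: a fixed array of cells, one global commit lock that the master holds for a whole transaction, and per-cell version stamps that helpers validate at commit time. All names (`ToyStm`, `MasterTx`, `HelperTx`, `publish`) are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

constexpr std::size_t kCells = 8;  // assumed size of the toy shared memory

struct Cell {
    std::atomic<long> value{0};
    std::atomic<std::uint64_t> version{0};  // stamp of the last write
};

struct ToyStm {
    Cell cells[kCells];
    std::atomic<std::uint64_t> clock{0};
    std::mutex commit_lock;  // held by whoever is writing shared state

    // Caller must hold commit_lock. The value is written before the version,
    // so a reader that raced with this write sees a changed stamp later.
    void publish(std::size_t i, long v) {
        cells[i].value.store(v, std::memory_order_relaxed);
        cells[i].version.store(clock.fetch_add(1) + 1, std::memory_order_release);
    }
};

// Master: pessimistic. It holds the lock for its whole lifetime, writes in
// place, keeps no read or write set, and never aborts.
class MasterTx {
public:
    explicit MasterTx(ToyStm& s) : stm_(s), guard_(s.commit_lock) {}
    long read(std::size_t i) const {
        return stm_.cells[i].value.load(std::memory_order_relaxed);
    }
    void write(std::size_t i, long v) { stm_.publish(i, v); }
private:
    ToyStm& stm_;
    std::lock_guard<std::mutex> guard_;
};

// Helper: speculative. It records the version of each cell it reads,
// buffers its writes, and commits only if no recorded version has changed.
class HelperTx {
public:
    explicit HelperTx(ToyStm& s) : stm_(s) {}
    long read(std::size_t i) {
        auto w = writes_.find(i);
        if (w != writes_.end()) return w->second;  // read own buffered write
        std::uint64_t ver = stm_.cells[i].version.load(std::memory_order_acquire);
        long v = stm_.cells[i].value.load(std::memory_order_relaxed);
        reads_.emplace(i, ver);  // keeps the first version observed
        return v;
    }
    void write(std::size_t i, long v) { writes_[i] = v; }
    // Returns false on conflict; the caller would then retry from scratch.
    bool commit() {
        std::lock_guard<std::mutex> guard(stm_.commit_lock);
        for (const auto& [i, ver] : reads_)
            if (stm_.cells[i].version.load(std::memory_order_relaxed) != ver)
                return false;
        for (const auto& [i, v] : writes_) stm_.publish(i, v);
        return true;
    }
private:
    ToyStm& stm_;
    std::unordered_map<std::size_t, std::uint64_t> reads_;
    std::unordered_map<std::size_t, long> writes_;
};
```

In this toy model all bookkeeping (read set, write buffer, validation) sits on the helper side, which mirrors the abstract's point that the master runs with minimal instrumentation. A helper that read a cell the master later overwrote gets `false` from `commit()` and must retry. Unlike a production STM, the sketch lets a helper observe intermediate master writes before it is aborted.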
We implement FastLane as an extension of a state-of-the-art STM runtime system and compiler. Multiple code paths are produced for execution on a single core, a few cores, and many cores. The runtime system selects the code path providing the best throughput, depending on the number of cores available on the target machine. Evaluation results indicate that our approach provides promising performance at low thread counts: FastLane almost systematically wins over a classical STM in the range of 1 to 6 threads, and often performs better than sequential execution of the non-instrumented version of the same application starting with 2 threads.
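The runtime selection among code paths might look roughly like the following C++ sketch. The enum, the function name `select_code_path`, and the threshold of 6 cores are assumptions made for illustration. The threshold is loosely suggested by the 1 to 6 thread range mentioned above, and the abstract does not state how the actual runtime decides.

```cpp
#include <thread>

// Hypothetical labels for the three generated code paths.
enum class CodePath { Sequential, FastLane, ClassicStm };

// Picks a code path from the number of available cores.
// few_core_limit is an assumed tuning knob, not a value from the paper.
CodePath select_code_path(unsigned cores, unsigned few_core_limit = 6) {
    if (cores <= 1) return CodePath::Sequential;  // also covers "unknown" (0)
    if (cores <= few_core_limit) return CodePath::FastLane;
    return CodePath::ClassicStm;
}

// Convenience wrapper: hardware_concurrency() may return 0 when the core
// count cannot be determined, which falls back to the sequential path.
CodePath select_code_path_for_this_machine() {
    return select_code_path(std::thread::hardware_concurrency());
}
```

A real runtime would more plausibly base the choice on measured throughput than on a fixed cutoff. The fixed threshold only keeps the sketch short.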
FastLane: improving performance of software transactional memory for low thread counts
PPoPP '13: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming