Abstract
The ubiquity of multicore processors has led programmers to write parallel and concurrent applications that take advantage of the underlying hardware and speed up execution. In this context, Transactional Memory (TM) has emerged as a simple and effective synchronization paradigm, built on the familiar abstraction of atomic transactions.
After many years of intense research, major processor manufacturers (including Intel) have recently released mainstream processors with hardware support for TM (HTM).
In this work, we study an issue with a major impact on HTM performance. Because HTM is optimistic and inherently limited, transactions may be aborted and restarted numerous times without any progress guarantee. It therefore falls to the software library that regulates HTM usage to ensure progress and optimize performance. Transaction scheduling is probably one of the most well-studied and effective techniques to achieve these goals.
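The retry-then-fallback pattern that such a software library typically applies can be sketched as follows. This is a minimal simulation in Python, not real HTM code: the `Abort` exception stands in for a hardware abort, and `run_transaction`, `try_speculative`, and `MAX_RETRIES` are hypothetical names introduced only for illustration. A real implementation would also have to make speculative attempts observe the fallback lock, which this sketch omits.

```python
import threading

MAX_RETRIES = 5                    # hypothetical retry budget, not from the paper
fallback_lock = threading.Lock()   # single global lock that guarantees progress


class Abort(Exception):
    """Stands in for a hardware transaction abort in this simulation."""


def run_transaction(body, try_speculative, max_retries=MAX_RETRIES):
    """Try `body` speculatively up to `max_retries` times, then fall back.

    Returns (result, attempts, path), where path is "speculative" or
    "fallback". `try_speculative(body)` simulates one hardware attempt
    and raises Abort when the attempt fails.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return try_speculative(body), attempt, "speculative"
        except Abort:
            continue  # no progress guarantee: simply retry
    # Retry budget exhausted: run non-speculatively under the global lock.
    with fallback_lock:
        return body(), max_retries, "fallback"


def always_abort(body):
    raise Abort()


print(run_transaction(lambda: 42, always_abort))
```

The global lock is what provides the progress guarantee, at the cost of serializing every transaction that reaches it. This is the coarse behavior that a scheduler tries to avoid.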
However, these recent mainstream HTMs have some technical limitations that prevent the adoption of known scheduling techniques: unlike software implementations of TM used in the past, existing HTMs provide limited or no information on which memory regions or contending transactions caused the abort.
We propose Seer, a software scheduler that addresses precisely this restriction of HTM. Seer leverages an online probabilistic inference technique that identifies the most likely conflict relations and establishes a dynamic locking scheme to serialize transactions in a fine-grained manner. The key idea of our solution is to constrain only the portions of parallelism that negatively affect the whole system. As a result, Seer not only prevents performance degradation but also unveils further scalability and performance for HTM. Via an extensive evaluation study, we show that Seer improves the performance of Intel’s HTM by up to 3.6×, and by 65% on average across all concurrency degrees and benchmarks on a large processor with 28 cores.
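To make the idea of inferring conflict relations without abort information more concrete, the sketch below estimates, for each pair of transaction types, how often one aborts while the other is running, and flags pairs whose estimate crosses a threshold as candidates for serialization. It is an illustrative assumption rather than the algorithm of the paper: the class `ConflictEstimator`, its methods, and the `threshold` and `min_samples` values are all hypothetical.

```python
from collections import defaultdict


class ConflictEstimator:
    """Estimates P(tx aborts | other is active) from observed outcomes only."""

    def __init__(self, threshold=0.5, min_samples=10):
        self.threshold = threshold      # hypothetical cut-off for serializing
        self.min_samples = min_samples  # ignore pairs with too little evidence
        self.aborts = defaultdict(int)  # (tx, other) -> aborts seen together
        self.commits = defaultdict(int)  # (tx, other) -> commits seen together

    def record(self, tx, active, aborted):
        """Log one outcome of `tx` and the transaction types active meanwhile."""
        table = self.aborts if aborted else self.commits
        for other in active:
            table[(tx, other)] += 1

    def abort_probability(self, tx, other):
        a = self.aborts[(tx, other)]
        total = a + self.commits[(tx, other)]
        return a / total if total else 0.0

    def should_serialize(self, tx, other):
        """True if `tx` should take a lock shared with `other` before running."""
        total = self.aborts[(tx, other)] + self.commits[(tx, other)]
        return (total >= self.min_samples
                and self.abort_probability(tx, other) >= self.threshold)


est = ConflictEstimator()
for _ in range(9):
    est.record("A", ["B"], aborted=True)    # A usually aborts next to B
est.record("A", ["B"], aborted=False)
for _ in range(2):
    est.record("A", ["C"], aborted=True)    # A rarely aborts next to C
for _ in range(10):
    est.record("A", ["C"], aborted=False)

print(est.should_serialize("A", "B"), est.should_serialize("A", "C"))
```

In this toy run only the pair (A, B) would be serialized, so A and C keep running in parallel. This mirrors the stated goal of constraining only the parallelism that hurts the system.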