Abstract
The Transactional Memory (TM) paradigm promises to greatly simplify the development of concurrent applications. Over the years, this promise has led to a plethora of TM implementations whose performance varies widely across workloads. Yet no single implementation fits every workload; in fact, the best TM for one workload can prove disastrous for another. Developers are therefore forced to take on the complex task of tuning TM implementations, which significantly hampers the wide adoption of TM. In this paper, we address the challenge of automatically identifying the best TM implementation for a given workload. Our proposed system, ProteusTM, hides a large library of implementations behind the TM interface. Underneath, it leverages a novel multi-dimensional online optimization scheme that combines two popular learning techniques: Collaborative Filtering and Bayesian Optimization.
We integrated ProteusTM into GCC and demonstrate its ability to switch between TMs and adapt several configuration parameters (e.g., the number of threads). We extensively evaluated ProteusTM, obtaining average performance within 3% of optimal and gains of up to 100x over static alternatives.
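To make the Collaborative Filtering idea concrete, the following is a minimal sketch and not ProteusTM's actual algorithm. It assumes a small matrix of made-up, normalized throughputs for previously profiled workloads, and it predicts how a new workload would perform on configurations it has not yet been run on, using a similarity-weighted average over the most similar known workloads. The configuration labels, the numbers, and the helper names (`cosine`, `predict`, `recommend`) are all hypothetical. The Bayesian Optimization step, which would decide which configuration to sample next, is deliberately left out.

```python
import math

# Hypothetical configuration labels (TM variant / thread count); not real TM names.
CONFIGS = ["stm-a/4t", "stm-a/8t", "stm-b/4t", "htm/4t", "htm/8t"]

# Made-up normalized throughputs (higher is better): one row per profiled workload.
KNOWN = [
    [0.9, 1.0, 0.5, 0.3, 0.2],
    [0.8, 1.0, 0.6, 0.4, 0.3],
    [0.3, 0.2, 0.4, 0.8, 1.0],
    [0.2, 0.3, 0.5, 0.9, 1.0],
]


def cosine(row, sampled):
    """Cosine similarity between a known row and the new workload,
    computed only over the configurations the new workload was sampled on."""
    idx = list(sampled)
    dot = sum(row[i] * sampled[i] for i in idx)
    norm_row = math.sqrt(sum(row[i] ** 2 for i in idx))
    norm_new = math.sqrt(sum(sampled[i] ** 2 for i in idx))
    return dot / (norm_row * norm_new) if norm_row and norm_new else 0.0


def predict(known, sampled, k=2):
    """Estimate throughput for every configuration from the k most similar rows."""
    neighbours = sorted(known, key=lambda row: cosine(row, sampled), reverse=True)[:k]
    weights = [cosine(row, sampled) for row in neighbours]
    if sum(weights) == 0:
        weights = [1.0] * len(neighbours)  # fall back to a plain average
    total = sum(weights)
    predicted = []
    for c in range(len(known[0])):
        if c in sampled:
            predicted.append(sampled[c])  # keep measured values as they are
        else:
            predicted.append(sum(w * row[c] for w, row in zip(weights, neighbours)) / total)
    return predicted


def recommend(known, sampled, k=2):
    """Index of the configuration with the highest predicted throughput."""
    predicted = predict(known, sampled, k)
    return max(range(len(predicted)), key=predicted.__getitem__)


if __name__ == "__main__":
    # The new workload has been measured on only two configurations (indices 0 and 2).
    new_workload = {0: 0.25, 2: 0.45}
    print(CONFIGS[recommend(KNOWN, new_workload)])
```

In this toy data the new workload most resembles the last two rows, so the sketch recommends a configuration that was never run for it. A real system would also need to normalize measurements across workloads and to decide when to stop sampling; the sketch does neither.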
ProteusTM: Abstraction Meets Performance in Transactional Memory
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems






