Abstract
A plethora of optimized mutex lock algorithms have been designed over the past 25 years to mitigate performance bottlenecks related to critical sections and locks. Unfortunately, there is currently no broad study of the behavior of these optimized lock algorithms on realistic applications that considers different performance metrics, such as energy efficiency and tail latency. In this article, we perform a thorough and practical analysis of synchronization, with the goal of providing software developers with enough information to design fast, scalable, and energy-efficient synchronization in their systems. First, we perform a performance study of 28 state-of-the-art mutex lock algorithms, on 40 applications, on four different multicore machines. We consider not only throughput (traditionally the main performance metric) but also energy efficiency and tail latency, which are becoming increasingly important. Second, we present an in-depth analysis in which we summarize our findings for all the studied applications. In particular, we describe nine different lock-related performance bottlenecks, and we propose six guidelines to help software developers choose a lock algorithm according to the different lock properties and the application characteristics.
From our detailed analysis, we make several observations regarding locking algorithms and application behaviors, several of which have not been previously discovered: (i) applications stress not only the lock–unlock interface but also the full locking API (e.g., trylocks, condition variables); (ii) the memory footprint of a lock can directly affect the application performance; (iii) for many applications, the interaction between locks and scheduling is an important factor in application performance; (iv) lock tail latencies may or may not affect application tail latency; (v) no single lock is systematically the best; (vi) choosing the best lock is difficult; and (vii) energy efficiency and throughput go hand in hand in the context of lock algorithms. These findings highlight that locking involves more considerations than the simple lock/unlock interface and call for further research on designing adaptive locks with a low memory footprint that fully and efficiently support the full lock interface and account for all performance metrics.
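To make observation (i) concrete, the following is a minimal sketch of how a single data structure can exercise the full locking API rather than only lock/unlock. It is an illustration written for this summary, not code from the study: `BoundedQueue`, `try_put`, and `demo` are hypothetical names, and Python's `threading` primitives stand in for the mutex, trylock, and condition-variable operations discussed in the article.

```python
import threading
from collections import deque


class BoundedQueue:
    """Hypothetical bounded FIFO queue (illustration only).

    It uses three parts of a lock API: plain lock/unlock,
    a non-blocking trylock, and condition variables.
    """

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        # Both condition variables share the same underlying mutex.
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)

    def put(self, item):
        with self._lock:  # plain lock/unlock
            while len(self._items) >= self._capacity:
                # wait() releases the mutex while blocked and
                # reacquires it before returning.
                self._not_full.wait()
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._lock:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item

    def try_put(self, item):
        # trylock: return immediately instead of blocking if the
        # mutex is currently held by another thread.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            if len(self._items) >= self._capacity:
                return False
            self._items.append(item)
            self._not_empty.notify()
            return True
        finally:
            self._lock.release()


def demo():
    """One producer and one consumer moving five items through the queue."""
    q = BoundedQueue(capacity=2)
    results = []
    consumer = threading.Thread(
        target=lambda: results.extend(q.get() for _ in range(5))
    )
    consumer.start()
    for i in range(5):
        q.put(i)
    consumer.join()
    return results
```

A lock algorithm substituted for the mutex in a structure like this would need to support all three paths (blocking acquire, trylock, and waiting on a condition), which is the sense in which the lock/unlock interface alone does not capture how applications use locks.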
Index Terms
Lock–Unlock: Is That All? A Pragmatic Analysis of Locking in Software Systems