Lock–Unlock: Is That All? A Pragmatic Analysis of Locking in Software Systems

Published: 14 March 2019

Abstract

A plethora of optimized mutex lock algorithms have been designed over the past 25 years to mitigate performance bottlenecks related to critical sections and locks. Unfortunately, there is currently no broad study of the behavior of these optimized lock algorithms on realistic applications that considers different performance metrics, such as energy efficiency and tail latency. In this article, we perform a thorough and practical analysis of synchronization, with the goal of providing software developers with enough information to design fast, scalable, and energy-efficient synchronization in their systems. First, we perform a performance study of 28 state-of-the-art mutex lock algorithms, on 40 applications, on four different multicore machines. We consider not only throughput (traditionally the main performance metric) but also energy efficiency and tail latency, which are becoming increasingly important. Second, we present an in-depth analysis in which we summarize our findings for all the studied applications. In particular, we describe nine different lock-related performance bottlenecks, and we propose six guidelines to help software developers choose a lock algorithm according to the different lock properties and the application characteristics.

From our detailed analysis, we make several observations regarding locking algorithms and application behaviors, several of which have not been previously discovered: (i) applications stress not only the lock–unlock interface but also the full locking API (e.g., trylocks, condition variables); (ii) the memory footprint of a lock can directly affect the application performance; (iii) for many applications, the interaction between locks and scheduling is an important application performance factor; (iv) lock tail latencies may or may not affect application tail latency; (v) no single lock is systematically the best; (vi) choosing the best lock is difficult; and (vii) energy efficiency and throughput go hand in hand in the context of lock algorithms. These findings highlight that locking involves more considerations than the simple lock/unlock interface and call for further research on designing low-memory-footprint adaptive locks that fully and efficiently support the full lock interface, and consider all performance metrics.
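
As an illustration of observation (i), the sketch below shows a small data structure that relies on a trylock and on condition variables in addition to plain lock/unlock. It is not taken from the article: it uses Python's standard `threading` module as a stand-in for whatever lock library an application uses, and the names `BoundedQueue`, `try_size`, and `run_demo` are hypothetical.

```python
import threading
from collections import deque


class BoundedQueue:
    """Toy bounded queue (hypothetical) that uses more than lock/unlock."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        # Both condition variables are tied to the same mutex.
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)

    def put(self, item):
        with self._not_full:  # acquires the shared mutex
            while len(self._items) >= self._capacity:
                self._not_full.wait()  # releases the mutex while blocked
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item

    def try_size(self):
        # Trylock: give up immediately (return None) if the mutex is held.
        if not self._lock.acquire(blocking=False):
            return None
        try:
            return len(self._items)
        finally:
            self._lock.release()


def run_demo(n_items=100, capacity=4):
    """Move n_items through the queue with one producer and one consumer."""
    q = BoundedQueue(capacity)
    received = []

    def producer():
        for i in range(n_items):
            q.put(i)

    def consumer():
        for _ in range(n_items):
            received.append(q.get())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, q.try_size()
```

In this sketch, a replacement lock that only implemented lock and unlock could not back `BoundedQueue`: the waits in `put` and `get` and the non-blocking acquire in `try_size` all need support from the same underlying mutex.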

  • Published in

    ACM Transactions on Computer Systems, Volume 36, Issue 1
    February 2018
    222 pages
    ISSN: 0734-2071
    EISSN: 1557-7333
    DOI: 10.1145/3319851

    Copyright © 2019 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 14 March 2019
    • Accepted: 1 October 2018
    • Revised: 1 September 2018
    • Received: 1 July 2017

    Qualifiers

    • research-article
    • Research
    • Refereed