Abstract
In this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure algorithms from the recent literature on 3 multicore platforms from Intel, Sun Microsystems and AMD. To this end, we developed in C/C++ and Java a new micro-benchmark suite, called Synchrobench, hence helping the community evaluate new data structures and synchronization techniques. The main conclusion of this evaluation is threefold: (i) although compare-and-swap helps achieving the best performance on multicores, doing so correctly is hard; (ii) optimistic locking offers varying performance results while transactional memory offers more consistent results; and (iii) copy-on-write and read-copy-update suffer more from contention than any other technique but could be combined with others to derive efficient algorithms.
- JSE-7. http://docs.oracle.com/javase/7/docs/api/.Google Scholar
- D. Alistarh, J. Kopisky, J. Li, and N. Shavit. The SprayList: A scalable relaxed priority queue. Technical Report TR-2014-16, MSR, 2014.Google Scholar
- J. Antony, P. P. Janes, and A. P. Rendell. Exploring thread and memory placement on NUMA architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/Hypertransport. In HiPC, pages 338–352, 2006. Google Scholar
Digital Library
- M. Arbel and H. Attiya. Concurrent updates with RCU: Search tree as an example. In PODC, pages 196–205, 2014. Google Scholar
Digital Library
- D. F. Bacon, R. E. Strom, and A. Tarafdar. Guava: a dialect of Java without data races. In OOPSLA, pages 382–400, 2000. Google Scholar
Digital Library
- S. Boyd-Wickizer, M. F. Kaashoek, R. Morris, and N. Zeldovich. Nonscalable locks are dangerous. In Proceedings of the Linux Symposium, Ottawa, Canada, 2012.Google Scholar
- N. G. Bronson, J. Casper, H. Chafi, and K. Olukotun. A practical concurrent binary search tree. In PPoPP, pages 257–268, 2010. Google Scholar
Digital Library
- C. Cao Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In IISWC, pages 35–46, 2008.Google Scholar
- B. D. Carlstrom, A. McDonald, M. Carbin, C. Kozyrakis, and K. Olukotun. Transactional collection classes. In PPoPP, pages 56–67, 2007. Google Scholar
Digital Library
- D. Cederman, B. Chatterjee, N. Nguyen, Y. Nikolakopoulos, M. Papatriantafilou, and P. Tsigas. A study of the behavior of synchronization methods in commonly used languages and systems. In IPDPS, pages 1309–1320, 2013. Google Scholar
Digital Library
- A. T. Clements, M. F. Kaashoek, and N. Zeldovich. Scalable address spaces using RCU balanced trees. In ASPLOS, pages 199–210, 2012. Google Scholar
Digital Library
- C. Click. A lock-free hash table, 2007. http://www. azulsystems.com/events/javaone_2007/2007_ LockFreeHash.pdf.Google Scholar
- T. Crain, V. Gramoli, and M. Raynal. A contention-friendly methodology for search structures. Technical Report RR-1989, INRIA, 2012.Google Scholar
- T. Crain, V. Gramoli, and M. Raynal. A speculation-friendly binary search tree. In PPoPP, pages 161–170, 2012. Google Scholar
Digital Library
- T. Crain, V. Gramoli, and M. Raynal. A contention-friendly binary search tree. In Euro-Par, pages 229–240, 2013. Google Scholar
Digital Library
- T. Crain, V. Gramoli, and M. Raynal. No hot spot non-blocking skip list. In ICDCS, pages 196–205, 2013. Google Scholar
Digital Library
- L. Dalessandro, M. F. Spear, and M. L. Scott. NOrec: streamlining STM by abolishing ownership records. In PPoPP, pages 67–78, 2010. Google Scholar
Digital Library
- T. David, R. Guerraoui, and V. Trigonakis. Everything you always wanted to know about synchronization but were afraid to ask. In SOSP, pages 33–48, 2013. Google Scholar
Digital Library
- T. David, R. Guerraoui, and V. Trigonakis. Asynchronized concurrency: the secret of scaling concurrent search structures. In ASPLOS, 2015.Google Scholar
Digital Library
- To appear.Google Scholar
- D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In DISC, pages 194–208, 2006. Google Scholar
Digital Library
- I. Dick, A. Fekete, and V. Gramoli. Logarithmic data structures for multicores. Technical Report 697, University of Sydney, 2014.Google Scholar
- D. Drachsler, M. Vechev, and E. Yahav. Practical concurrent binary search trees via logical ordering. In PPoPP, pages 343–356, 2014. Google Scholar
Digital Library
- A. Dragojevi´c, P. Felber, V. Gramoli, and R. Guerraoui. Why STM can be more than a research toy. Commun. ACM, 54(4):70–77, 2011. Google Scholar
Digital Library
- F. Ellen, P. Fatourou, E. Ruppert, and F. van Breugel. Non-blocking binary search trees. In PODC, pages 131–140, 2010. Google Scholar
Digital Library
- P. Felber, C. Fetzer, and T. Riegel. Dynamic performance tuning of word-based software transactional memory. In PPoPP, pages 237–246, 2008. Google Scholar
Digital Library
- P. Felber, V. Gramoli, and R. Guerraoui. Elastic transactions. Technical Report LPD-REPORT-2009-002, EPFL, 2009.Google Scholar
- P. Felber, V. Gramoli, and R. Guerraoui. Elastic transactions. In DISC, pages 93–108, 2009. Google Scholar
Digital Library
- M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi. Quantifying the mismatch between emerging scale-out applications and modern processors. TOCS, 30(4):15:1–15:24, 2012. Google Scholar
Digital Library
- M. Fomitchev and E. Ruppert. Lock-free linked lists and skip lists. In PODC, pages 50–59, 2004. Google Scholar
Digital Library
- K. Fraser. Practical lock freedom. PhD thesis, Cambridge University, September 2003.Google Scholar
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cacheoblivious algorithms. In FOCS, page 285, 1999. Google Scholar
Digital Library
- B. Goetz, T. Peierls, J. Bloch, J. Bowbeer, D. Lea, and D. Holmes. Java Concurrency in Practice. Addison-Wesley, 2005. Google Scholar
Digital Library
- V. Gramoli and R. Guerraoui. Democratizing transactional programming. Commun. ACM, 57(1):86–93, 2014. Google Scholar
Digital Library
- V. Gramoli and R. Guerraoui. Reusable concurrent data types. In ECOOP, pages 182–206, 2014.Google Scholar
Digital Library
- V. Gramoli, R. Guerraoui, and V. Trigonakis. TM2C: A software transactional memory for many-cores. In EuroSys, pages 351–364, 2012. Google Scholar
Digital Library
- R. Guerraoui, M. Kapalka, and J. Vitek. STMBench7: a benchmark for software transactional memory. In EuroSys, pages 315–324, 2007. Google Scholar
Digital Library
- D. Harmanci, P. Felber, V. Gramoli, and C. Fetzer. TMunit: Testing software transactional memories. In 4th ACM SIGPLAN Workshop on Transactional Computing, 2009.Google Scholar
- D. Harmanci, V. Gramoli, P. Felber, and C. Fetzer. Extensible transactional memory testbed. J. of Parallel and Distributed Computing, 70(10):1053–1067, March 2010. Google Scholar
Digital Library
- T. Harris. A pragmatic implementation of non-blocking linked-lists. In DISC, pages 300–314, 2001. Google Scholar
Digital Library
- T. Harris, S. Marlow, S. Peyton-Jones, and M. Herlihy. Composable memory transactions. In PPoPP, pages 48–60, 2005. Google Scholar
Digital Library
- S. Heller, M. Herlihy, V. Luchangco, M. Moir, W. N. Scherer III, and N. Shavit. A lazy concurrent list-based set algorithm. Parallel Processing Letters, 17(4):411–424, 2007.Google Scholar
Cross Ref
- M. Herlihy and E. Koskinen. Transactional boosting: A methodology for highly-concurrent transactional objects. In PPoPP, pages 207–216, 2008. Google Scholar
Digital Library
- M. Herlihy, Y. Lev, V. Luchangco, and N. Shavit. A simple optimistic skiplist algorithm. In SIROCCO, pages 124–138, 2007. Google Scholar
Digital Library
- M. Herlihy, V. Luchangco, and M. Moir. Obstruction-free synchronization: Double-ended queues as an example. In ICDCS, 2003. Google Scholar
Digital Library
- M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer, III. Software transactional memory for dynamic-sized data structures. In PODC, pages 92–101, 2003. Google Scholar
Digital Library
- M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock-free data structures. In ISCA, pages 289–300, 1993. Google Scholar
Digital Library
- M. Herlihy and N. Shavit. The Art of Multiprocessor Programming. Morgan Kauffman, 2008. Google Scholar
Digital Library
- G. Korland, N. Shavit, and P. Felber. Deuce: Noninvasive software transactional memory. Transactions on HiPEAC, 5(2), 2010.Google Scholar
- D. Lea. JSR-166 specification request group. http://g.oswego. edu/dl/concurrency-interest.Google Scholar
- J. J. Levandoski and S. Sengupta. The BW-Tree: A latch-free B-tree for log-structured flash storage. IEEE Data Eng. Bull., 36(2):56–62, 2013.Google Scholar
- Y. Liu, K. Zhang, and M. Spear. Dynamic-sized nonblocking hash tables. In PODC, pages 242–251, 2014. Google Scholar
Digital Library
- A. Marowka. TBBench: A micro-benchmark suite for Intel threading building blocks. JIPS, 8(2):331–346, 2012.Google Scholar
Cross Ref
- P. E. McKenney, J. Appavoo, A. Kleen, O. Krieger, R. Russell, D. Sarma, and M. Soni. Read-copy update. In AUUG, 2001.Google Scholar
- M. M. Michael. High performance dynamic lock-free hash tables and list-based sets. In SPAA, pages 73–82, 2002. Google Scholar
Digital Library
- M. M. Michael. The balancing act of choosing nonblocking features. Commun. ACM, 56(9):46–53, 2013. Google Scholar
Digital Library
- M. M. Michael and M. L. Scott. Simple, fast, and practical nonblocking and blocking concurrent queue algorithms. In PODC, pages 267–275, 1996. Google Scholar
Digital Library
- M. Mohamedin, B. Ravindran, and R. Palmieri. ByteSTM: Virtual machine-level Java software transactional memory. In COORDINATION, pages 166–180, 2013.Google Scholar
- A. Natarajan and N. Mittal. Fast concurrent lock-free binary search trees. In PPoPP, pages 317–328, 2014. Google Scholar
Digital Library
- R. Odaira, J. G. Castaños, and T. Nakaike. Do C and Java programs scale differently on hardware transactional memory? In IISWC, pages 34–43, 2013.Google Scholar
Cross Ref
- W. Pugh. Skip lists: a probabilistic alternative to balanced trees. Commun. ACM, 33, June 1990. Google Scholar
Digital Library
- T. Riegel, P. Felber, and C. Fetzer. A lazy snapshot algorithm with eager validation. In DISC, pages 284–298, 2006. Google Scholar
Digital Library
- H. Sundell and P. Tsigas. NOBLE: A non-blocking inter-process communication library. Technical report, Chalmers University of Technology, March 2002.Google Scholar
- H. Sundell and P. Tsigas. Scalable and lock-free concurrent dictionaries. In SAC, pages 1438–1445. ACM, 2004. Google Scholar
Digital Library
- H. Sundell and P. Tsigas. Lock-free deques and doubly linked lists. J. Parallel Distrib. Comput., 68(7):1008–1020, 2008. Google Scholar
Digital Library
- I. Umar, O. J. Anshus, and P. H. Ha. DeltaTree: A practical localityaware concurrent search tree. Technical Report 2013-74, University of Tromso, Norway, Oct. 2013.Google Scholar
- B. Wicht. Binary trees implementations comparison for multicore programming. Technical report, Switzerland HES-SO University of applied science, 2012.Google Scholar
- R. M. Yoo, Y. Ni, A. Welc, B. Saha, A.-R. Adl-Tabatabai, and H.-H. S. Lee. Kicking the tires of software transactional memory: why the going gets tough. In SPAA, pages 265–274, 2008. Google Scholar
Digital Library
- F. Zyulkyarov, A. Cristal, S. Cvijic, E. Ayguade, M. Valero, O. Unsal, and T. Harris. Wormbench: A configurable workload for evaluating transactional memory systems. In MEDEA, pages 61–68, 2008. Google Scholar
Digital Library
- F. Zyulkyarov, S. Stipic, T. Harris, O. S. Unsal, A. Cristal, I. Hur, and M. Valero. Profiling and optimizing transactional memory applications. International Journal of Parallel Programming, 40(1):25–56, 2012.Google Scholar
Cross Ref
Index Terms
More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms
Recommendations
More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingIn this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure algorithms from the recent literature on 3 multicore platforms from Intel, ...
Looking for efficient implementations of concurrent objects
PaCT'11: Proceedings of the 11th international conference on Parallel computing technologiesAs introduced by Taubenfeld, a contention-sensitive implementation of a concurrent object is an implementation such that the overhead introduced by locking is eliminated in the common cases, i.e., when there is no contention or when the operations ...
Transactional Acceleration of Concurrent Data Structures
SPAA '15: Proceedings of the 27th ACM symposium on Parallelism in Algorithms and ArchitecturesConcurrent data structures are a fundamental building block for scalable multi-threaded programs. While Transactional Memory (TM) was originally conceived as a mechanism for simplifying the creation of concurrent data structures, modern hardware TM ...






Comments