research-article

A wait-free queue as fast as fetch-and-add

Published: 27 February 2016

Abstract

Concurrent data structures that have fast and predictable performance are of critical importance for harnessing the power of multicore processors, which are now ubiquitous. Although wait-free objects, whose operations complete in a bounded number of steps, were devised more than two decades ago, wait-free objects that can deliver scalable high performance are still rare.
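
A shared counter shows what "a bounded number of steps" rules out. The C11 sketch below uses hypothetical function names. It contrasts a compare-and-swap retry loop with a single fetch-and-add. The CAS loop is lock-free: some thread always makes progress, but an unlucky thread can keep losing races and retry without bound. The fetch-and-add always succeeds. Whether it is one hardware step depends on the target. On x86 it typically compiles to a single `lock xadd`. Machines that only provide load-linked/store-conditional may implement it with an internal retry loop.

```c
#include <stdatomic.h>

/* Lock-free increment: a failed CAS means another thread changed the counter
   (or the weak CAS failed spuriously), so this thread may retry without bound. */
long cas_increment(atomic_long *ctr, unsigned *retries) {
  long old = atomic_load(ctr);
  while (!atomic_compare_exchange_weak(ctr, &old, old + 1))
    ++*retries; /* 'old' now holds the current value; try again */
  return old;   /* value before our increment */
}

/* Fetch-and-add increment: one atomic operation that cannot fail,
   so each call finishes in a bounded number of steps. */
long faa_increment(atomic_long *ctr) {
  return atomic_fetch_add(ctr, 1); /* value before our increment */
}
```

Both functions return the counter's previous value. The `retries` output exists only to make the work wasted by failed CAS attempts visible.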

In this paper, we present the first wait-free FIFO queue based on fetch-and-add (FAA). While compare-and-swap (CAS) based non-blocking algorithms may perform poorly due to work wasted by CAS failures, algorithms that coordinate using FAA, which is guaranteed to succeed, can in principle perform better under high contention. Along with FAA, our queue uses a custom epoch-based scheme to reclaim memory; on x86 architectures, it requires no extra memory fences on our algorithm's typical execution path. An empirical study of our new FAA-based wait-free FIFO queue under high contention on four different architectures with many hardware threads shows that it outperforms prior queue designs that lack a wait-free progress guarantee. Surprisingly, at the highest level of contention, the throughput of our queue is often as high as that of a microbenchmark that only performs FAA. As a result, our fast wait-free queue implementation is useful in practice on most multicore systems today. We believe that our design can serve as an example of how to construct other fast wait-free objects.
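
The abstract does not spell out the algorithm. As a rough illustration of coordinating a queue through FAA, the following is a minimal C11 sketch under stated assumptions. It uses a fixed-size array (hypothetical `CAPACITY`), strictly positive `intptr_t` values, and no memory reclamation. All names are hypothetical. Each enqueuer claims a cell index with FAA on a tail counter and deposits its value with one CAS. Each dequeuer claims an index with FAA on a head counter and swaps in a "taken" marker. The sketch is **not** wait-free: an enqueuer and a dequeuer can keep colliding on the same cells indefinitely. A wait-free design needs an additional mechanism, such as a slow path in which threads help one another, and the epoch-based reclamation mentioned above. Neither is shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters for this sketch; not taken from the paper. */
#define CAPACITY    1024
#define EMPTY_CELL  ((intptr_t)0)   /* cell not yet written by any enqueuer */
#define TAKEN_CELL  ((intptr_t)-1)  /* cell claimed by a dequeuer */
#define QUEUE_EMPTY ((intptr_t)-2)  /* returned by faa_dequeue when empty */

typedef struct {
  atomic_intptr_t cells[CAPACITY];
  atomic_size_t head; /* next cell index a dequeuer will claim */
  atomic_size_t tail; /* next cell index an enqueuer will claim */
} faa_queue;

void faa_queue_init(faa_queue *q) {
  for (size_t i = 0; i < CAPACITY; i++)
    atomic_init(&q->cells[i], EMPTY_CELL);
  atomic_init(&q->head, 0);
  atomic_init(&q->tail, 0);
}

/* Returns 1 on success, 0 once all CAPACITY cells have been claimed. */
int faa_enqueue(faa_queue *q, intptr_t v) {
  assert(v > 0); /* keep values distinct from the sentinels above */
  for (;;) {
    /* FAA never fails: each enqueuer gets a distinct cell index. */
    size_t i = atomic_fetch_add(&q->tail, 1);
    if (i >= CAPACITY)
      return 0;
    intptr_t expected = EMPTY_CELL;
    if (atomic_compare_exchange_strong(&q->cells[i], &expected, v))
      return 1;
    /* A dequeuer marked cell i as taken first; claim another cell. */
  }
}

/* Returns the value found in the claimed cell, or QUEUE_EMPTY. */
intptr_t faa_dequeue(faa_queue *q) {
  for (;;) {
    /* FAA gives each dequeuer a distinct cell index. */
    size_t i = atomic_fetch_add(&q->head, 1);
    if (i >= CAPACITY)
      return QUEUE_EMPTY;
    /* Swap in TAKEN_CELL: either receive a value or block a late enqueuer. */
    intptr_t v = atomic_exchange(&q->cells[i], TAKEN_CELL);
    if (v != EMPTY_CELL)
      return v;
    /* Cell i was empty; if no enqueuer has claimed an index past i, report empty. */
    if (atomic_load(&q->tail) <= i + 1)
      return QUEUE_EMPTY;
  }
}
```

For example, after enqueuing 1, 2, and 3 into a freshly initialized queue, three dequeues return 1, 2, and 3. A fourth dequeue returns `QUEUE_EMPTY`. That fourth dequeue also marks its cell as taken, so the next enqueue fails its CAS on that cell and moves on to the following index.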

Published in

ACM SIGPLAN Notices, Volume 51, Issue 8 (PPoPP '16), August 2016, 405 pages
ISSN: 0362-1340, EISSN: 1558-1160
DOI: 10.1145/3016078

PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2016, 420 pages
ISBN: 9781450340922
DOI: 10.1145/2851141

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States