research-article

Fast RMWs for TSO: semantics and implementation

Published: 16 June 2013

Abstract

Read-Modify-Write (RMW) instructions are widely used as building blocks for a variety of higher-level synchronization constructs, including locks, barriers, and lock-free data structures. Unfortunately, they are expensive in architectures such as x86 and SPARC, which enforce (variants of) Total-Store-Order (TSO). A key reason is that RMWs in these architectures are ordered like a memory barrier, incurring the cost of a write-buffer drain in the critical path. Such strong ordering semantics are dictated by the requirements of the strict atomicity definition (type-1) that existing TSO RMWs use. Programmers often do not need such strong semantics. Moreover, weakening the atomicity definition of TSO RMWs would also weaken their ordering, thereby leading to more efficient hardware implementations.
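To make the role of RMWs in locks concrete, here is a minimal sketch of a test-and-set spin lock built on a single atomic exchange. It is an illustration only, not code from the paper: the names `TasLock` and `count_with_lock` are hypothetical. On x86, compilers typically lower `exchange` to an `xchg` instruction, which is implicitly locked and so behaves like the conventional RMW the abstract describes.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical test-and-set lock; the name and layout are illustrative only.
class TasLock {
public:
    void lock() {
        // exchange is an RMW: it atomically reads the old value and writes true.
        // If the old value was false, this thread now owns the lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Spin on a plain load so waiting threads do not issue RMWs in a loop.
            while (locked_.load(std::memory_order_relaxed)) { }
        }
    }
    void unlock() {
        // Releasing the lock needs only an ordinary store, not an RMW.
        locked_.store(false, std::memory_order_release);
    }
private:
    std::atomic<bool> locked_{false};
};

// Starts `threads` workers that each increment a shared counter `iters` times
// while holding the lock, then returns the final counter value.
long count_with_lock(int threads, int iters) {
    TasLock lock;
    long counter = 0;  // protected by `lock`
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The lock acquisition here needs only the atomicity of the exchange, not a full barrier, which is the kind of use where the abstract argues that the strong ordering of existing RMWs is more than the programmer requires.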

In this paper we argue for TSO RMWs to use weaker atomicity definitions. We consider two weaker definitions, type-2 and type-3, which relax ordering to different degrees. We formally specify how such weaker RMWs would be ordered, and show that type-2 RMWs, in particular, can seamlessly replace existing type-1 RMWs in common synchronization idioms, except in situations where a type-1 RMW is used as a memory barrier. Recent work has shown that the new C/C++11 concurrency model can be realized by generating conventional (type-1) RMWs for C/C++11 SC-atomic-writes and/or SC-atomic-reads. We formally prove that this is equally valid using the proposed type-2 RMWs; type-3 RMWs, on the other hand, could be used for SC-atomic-reads (and optionally SC-atomic-writes). We further propose efficient microarchitectural implementations for type-2 (type-3) RMWs. Simulation results show that our implementation reduces the cost of an RMW by up to 58.9% (64.3%), which translates into an overall performance improvement of up to 9.0% (9.2%) on a set of parallel programs, including those from the SPLASH-2, PARSEC, and STAMP benchmarks.
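The mapping of SC-atomic-writes to RMWs can be illustrated with the classic store-buffering pattern. The sketch below is an assumption-laden illustration rather than the paper's own code: the function name `count_both_zero` is hypothetical, and each SC-atomic write is expressed explicitly as an `exchange` whose result is discarded. Under the C++11 model, with every access sequentially consistent, the outcome in which both threads read 0 is forbidden.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering pattern `rounds` times and returns how many rounds
// ended with both threads reading 0 (forbidden under sequential consistency).
int count_both_zero(int rounds) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        std::atomic<int> x{0}, y{0};
        int r0 = -1, r1 = -1;

        std::thread t0([&] {
            // SC-atomic write expressed as an RMW; the old value is ignored.
            x.exchange(1, std::memory_order_seq_cst);
            // SC-atomic read expressed as a plain atomic load.
            r0 = y.load(std::memory_order_seq_cst);
        });
        std::thread t1([&] {
            y.exchange(1, std::memory_order_seq_cst);
            r1 = x.load(std::memory_order_seq_cst);
        });
        t0.join();
        t1.join();

        if (r0 == 0 && r1 == 0) ++both_zero;
    }
    return both_zero;
}
```

If the two `exchange` calls were replaced by relaxed stores, a TSO machine could buffer each write past the following load and let both threads read 0. The RMW on the write side is what rules that outcome out in this pattern.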


    • Published in

      ACM SIGPLAN Notices  Volume 48, Issue 6
      PLDI '13
      June 2013
      515 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2499370
      • PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation
        June 2013
        546 pages
        ISBN:9781450320146
        DOI:10.1145/2491956

      Copyright © 2013 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
