research-article

Temporally Bounding TSO for Fence-Free Asymmetric Synchronization

Published: 14 March 2015

Abstract

This paper introduces a temporally bounded total store ordering (TBTSO) memory model, and shows that it enables nonblocking fence-free solutions to asymmetric synchronization problems, such as those arising in memory reclamation and biased locking.

TBTSO strengthens the TSO memory model by bounding the time it takes a store to drain from the store buffer into memory. This bound enables fence-free algorithms for asymmetric problems, in which a performance-critical fast path must synchronize with an infrequently executed slow path. We demonstrate this by constructing (1) a fence-free version of the hazard pointers memory reclamation scheme, and (2) a fence-free biased lock algorithm that is compatible with unmanaged environments because it does not rely on safe points or similar mechanisms.

We further argue that TBTSO can be implemented in hardware with modest modifications to existing TSO architectures. However, our design makes assumptions about proprietary implementation details of commercial hardware; it thus best serves as a starting point for a discussion on the feasibility of hardware TBTSO implementation. We also show how minimal OS support enables the adaptation of TBTSO algorithms to x86 systems.
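To make the role of the drain bound concrete, the following is a minimal single-threaded model, not the paper's algorithm. It assumes a hypothetical bound `DELTA` in abstract ticks and a worst-case store buffer in which every store becomes visible exactly `DELTA` ticks after it is issued. The class and method names (`Model`, `publish`, `retire`, `reclaim`) are invented for illustration. The fast path publishes a hazard pointer with a plain buffered store; the slow path frees a retired node only once `DELTA` ticks have passed since retirement, by which point any hazard store issued before the retirement must be visible.

```python
DELTA = 5  # hypothetical bound on store-buffer drain time (abstract ticks)


class Model:
    """Toy model of hazard pointers under an assumed bounded drain time."""

    def __init__(self):
        self.pending = []  # buffered hazard stores, FIFO: (issue_time, thread, node)
        self.memory = {}   # hazard pointers currently visible to other threads
        self.retired = []  # nodes awaiting reclamation: (node, retire_time)

    def publish(self, thread, node, now):
        # Fast path: a plain store with no fence. It sits in the buffer
        # until it drains. Passing node=None clears the hazard pointer.
        self.pending.append((now, thread, node))

    def retire(self, node, now):
        # Slow path: the node has been unlinked at time `now`.
        self.retired.append((node, now))

    def _drain(self, now):
        # Worst case permitted by the bound: each store becomes visible
        # exactly DELTA ticks after issue, in FIFO (TSO) order.
        still_buffered = []
        for issued, thread, node in self.pending:
            if issued + DELTA <= now:
                self.memory[thread] = node
            else:
                still_buffered.append((issued, thread, node))
        self.pending = still_buffered

    def reclaim(self, now):
        # Free only nodes that were retired at least DELTA ticks ago and
        # that no visible hazard pointer protects.
        self._drain(now)
        protected = {n for n in self.memory.values() if n is not None}
        freed, kept = [], []
        for node, retired_at in self.retired:
            if now - retired_at >= DELTA and node not in protected:
                freed.append(node)
            else:
                kept.append((node, retired_at))
        self.retired = kept
        return freed


if __name__ == "__main__":
    m = Model()
    m.publish("reader", "A", now=0)  # hazard store is buffered
    m.retire("A", now=1)
    print(m.reclaim(now=3))          # too early: the hazard store may still be hidden
    print(m.reclaim(now=6))          # hazard store is visible, so "A" stays protected
```

In this model, dropping the `now - retired_at >= DELTA` check would let `reclaim(3)` free `"A"` while the reader's hazard store is still buffered, which is the race a fence normally prevents. The design choice illustrated is that the waiting cost is moved onto the infrequent slow path.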



Published in

ACM SIGPLAN Notices, Volume 50, Issue 4 (ASPLOS '15), April 2015, 676 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2775054
Editor: Andy Gill

ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, March 2015, 720 pages
ISBN: 9781450328357
DOI: 10.1145/2694344

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
