Abstract
This paper introduces a temporally bounded total store ordering (TBTSO) memory model, and shows that it enables nonblocking fence-free solutions to asymmetric synchronization problems, such as those arising in memory reclamation and biased locking.
TBTSO strengthens the TSO memory model by bounding the time it takes a store to drain from the store buffer into memory. This bound enables devising fence-free algorithms for asymmetric problems, which require a performance-critical fast path to synchronize with an infrequently executed slow path. We demonstrate this by constructing (1) a fence-free version of the hazard pointers memory reclamation scheme, and (2) a fence-free biased lock algorithm which is compatible with unmanaged environments as it does not rely on safe points or similar mechanisms.
We further argue that TBTSO can be implemented in hardware with modest modifications to existing TSO architectures. However, our design makes assumptions about proprietary implementation details of commercial hardware; it thus best serves as a starting point for a discussion on the feasibility of a hardware TBTSO implementation. We also show how minimal OS support enables the adaptation of TBTSO algorithms to x86 systems.
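To make the role of the drain bound concrete, the following is a minimal simulation sketch, not the paper's algorithm. It models a TSO store buffer whose entries reach memory within a bound, and a hazard-pointer-style reclaimer that waits out that bound before scanning. The names `DELTA`, `TBTSOThread`, `safe_to_free`, and `run_scenario` are hypothetical, time is counted in abstract units, and the model assumes the worst case in which every store drains exactly at the bound.

```python
from dataclasses import dataclass, field

DELTA = 5  # hypothetical drain bound, in abstract time units


@dataclass
class TBTSOThread:
    """One core with a FIFO store buffer, as in TSO."""
    buffer: list = field(default_factory=list)  # entries: (issue_time, addr, value)

    def store(self, now, addr, value):
        # The store enters the local buffer; other cores cannot see it yet.
        self.buffer.append((now, addr, value))

    def load(self, memory, addr):
        # Store-to-load forwarding: the newest buffered store to addr wins.
        for _, a, v in reversed(self.buffer):
            if a == addr:
                return v
        return memory.get(addr)

    def drain_due(self, memory, now):
        # Modeled TBTSO bound: a store issued at time t is in memory by t + DELTA.
        # Real hardware may drain earlier; this models the worst case.
        while self.buffer and self.buffer[0][0] + DELTA <= now:
            _, addr, value = self.buffer.pop(0)
            memory[addr] = value


def safe_to_free(memory, node, retired_at, now, hazard_slots):
    """Slow path: free node only after DELTA has passed and no slot names it."""
    if now < retired_at + DELTA:
        return False  # a reader's hazard store may still sit in its buffer
    return all(memory.get(slot) != node for slot in hazard_slots)


def run_scenario():
    memory = {"head": "A", "hp0": None}
    reader = TBTSOThread()

    # t=0: fast path publishes a hazard pointer with no fence, then re-validates.
    seen = reader.load(memory, "head")
    reader.store(0, "hp0", seen)
    validated = reader.load(memory, "head") == seen

    # t=1: reclaimer unlinks A (its own store is assumed to be visible at once).
    memory["head"] = "B"
    retired_at = 1

    # t=2: the hazard store is still buffered, so a scan of memory misses it.
    reader.drain_due(memory, 2)
    naive_scan_misses = memory.get("hp0") != "A"
    early = safe_to_free(memory, "A", retired_at, 2, ["hp0"])

    # t=6: the bound has passed; the hazard store is now in memory.
    reader.drain_due(memory, 6)
    after_wait = safe_to_free(memory, "A", retired_at, 6, ["hp0"])

    # t=7: reader clears its slot; by t=12 that store has drained too.
    reader.store(7, "hp0", None)
    reader.drain_due(memory, 12)
    final = safe_to_free(memory, "A", retired_at, 12, ["hp0"])

    return validated, naive_scan_misses, early, after_wait, final
```

In this sketch, `run_scenario()` returns `(True, True, False, False, True)`: an immediate scan would miss the buffered hazard pointer, whereas waiting for `DELTA` lets the reclaimer observe it without the reader ever issuing a fence. The cost of the wait falls entirely on the infrequent slow path.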
Temporally Bounding TSO for Fence-Free Asymmetric Synchronization
ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems