Asymmetric Memory Fences: Optimizing Both Performance and Implementability

Published: 14 March 2015

Abstract

There have been several recent efforts to improve the performance of fences. The most aggressive designs allow post-fence accesses to retire and complete before the fence completes. Unfortunately, such designs present implementation difficulties due to their reliance on global state and structures.

This paper's goal is to optimize both the performance and the implementability of fences. We start off with a design like the most aggressive ones but without the global state. We call it Weak Fence or wF. Since the concurrent execution of multiple wFs can deadlock, we combine wFs with a conventional fence (i.e., Strong Fence or sF) for the less performance-critical thread(s). We call the result an Asymmetric fence group. We also propose a taxonomy of Asymmetric fence groups under TSO. Compared to past aggressive fences, Asymmetric fence groups are substantially easier to implement and have higher average performance. The two main designs presented (WS+ and W+) speed up workloads under TSO by an average of 13% and 21%, respectively, over conventional fences.
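
As background for the ordering problem that fences address under TSO, the following is a minimal sketch of the classic store-buffering (Dekker-style) pattern in portable C++. It is not the paper's design: wF and sF are hardware fence implementations that cannot be expressed in source code, so `std::atomic_thread_fence` stands in here for a conventional fence. The function names `store_buffering_round` and `forbidden_outcome_seen` are hypothetical and chosen only for this illustration.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical helper: runs one round of the store-buffering pattern.
// Each thread stores to its own flag, executes a full fence, and then
// loads the other thread's flag.
std::pair<int, int> store_buffering_round() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r0 = -1;
    int r1 = -1;

    std::thread t0([&] {
        x.store(1, std::memory_order_relaxed);
        // Conventional fence: orders the store above before the load below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r0 = y.load(std::memory_order_relaxed);
    });
    std::thread t1([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = x.load(std::memory_order_relaxed);
    });

    t0.join();
    t1.join();
    return {r0, r1};
}

// Hypothetical helper: repeats the round and reports whether both loads
// ever returned 0, the outcome that the two fences rule out.
bool forbidden_outcome_seen(int rounds) {
    for (int i = 0; i < rounds; ++i) {
        const std::pair<int, int> r = store_buffering_round();
        if (r.first == 0 && r.second == 0) {
            return true;
        }
    }
    return false;
}
```

Without the two fences, a TSO machine may let each load bypass the earlier store still sitting in the store buffer, so both threads can read 0. Both threads here pay for a full fence; the asymmetric groups described in the abstract instead aim to give the performance-critical thread a cheaper fence.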
