research-article

TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations

Published: 16 March 2013
Abstract

Program optimizations based on data dependences may not preserve a program's memory consistency. Previous work leverages a hardware ATOMICITY primitive that restricts thread interleaving in order to preserve sequential consistency in region optimizations. However, the ATOMICITY primitive is overly restrictive on thread interleaving when optimizing real-world applications developed for the popular Total-Store-Ordering (TSO) memory consistency model, which is weaker than sequential consistency. In this paper, we present a novel hardware TSO_ATOMICITY primitive. It places fewer restrictions on thread interleaving than the ATOMICITY primitive, which permits more efficient program execution, yet it still preserves TSO memory consistency in all region optimizations. Furthermore, the TSO_ATOMICITY primitive requires architectural support similar to that of the ATOMICITY primitive and can be implemented with only slight changes to an existing ATOMICITY implementation. Our experimental results show that, in a state-of-the-art dynamic binary optimization system on a large set of workloads, the ATOMICITY primitive improves performance by only 4% on average. The TSO_ATOMICITY primitive reduces the overhead associated with the ATOMICITY primitive and improves performance by 12% on average.
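The abstract's point that TSO is weaker than sequential consistency can be illustrated with the classic store-buffering litmus test. The sketch below is not taken from the paper and does not model the ATOMICITY or TSO_ATOMICITY primitives. It is a toy enumerator, under the assumption that TSO can be approximated by giving each thread a FIFO store buffer with store-to-load forwarding. The names `outcomes` and `SB` are hypothetical.

```python
def outcomes(programs, tso):
    """Enumerate final register values under SC (tso=False) or a toy TSO model (tso=True)."""
    results, seen = set(), set()
    start = (
        tuple(0 for _ in programs),         # per-thread program counters
        tuple(() for _ in programs),        # per-thread FIFO store buffers (TSO only)
        frozenset({("x", 0), ("y", 0)}),    # shared memory, initially zero
        frozenset(),                        # register values loaded so far
    )
    stack = [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        memd = dict(mem)
        done = True
        for t, prog in enumerate(programs):
            if bufs[t]:
                # Step: drain the oldest buffered store of thread t to memory.
                done = False
                var, val = bufs[t][0]
                new_mem = dict(memd)
                new_mem[var] = val
                new_bufs = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                stack.append((pcs, new_bufs, frozenset(new_mem.items()), regs))
            if pcs[t] < len(prog):
                # Step: execute the next instruction of thread t.
                done = False
                op = prog[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op[0] == "st":
                    _, var, val = op
                    if tso:
                        # TSO: the store is buffered and becomes visible later.
                        new_bufs = bufs[:t] + (bufs[t] + ((var, val),),) + bufs[t + 1:]
                        stack.append((new_pcs, new_bufs, mem, regs))
                    else:
                        # SC: the store is visible to all threads immediately.
                        new_mem = dict(memd)
                        new_mem[var] = val
                        stack.append((new_pcs, bufs, frozenset(new_mem.items()), regs))
                else:
                    _, var, reg = op
                    val = memd[var]
                    for bvar, bval in bufs[t]:
                        # Forward from the thread's own buffer; the newest match wins.
                        if bvar == var:
                            val = bval
                    stack.append((new_pcs, bufs, mem, regs | {(reg, val)}))
        if done:
            # All instructions executed and all buffers drained: record (r0, r1).
            results.add(tuple(v for _, v in sorted(regs)))
    return results


# Store-buffering litmus test: each thread stores to one variable, then loads the other.
SB = [
    [("st", "x", 1), ("ld", "y", "r0")],
    [("st", "y", 1), ("ld", "x", "r1")],
]

sc_out = outcomes(SB, tso=False)
tso_out = outcomes(SB, tso=True)
print(sorted(sc_out))            # [(0, 1), (1, 0), (1, 1)]
print(sorted(tso_out - sc_out))  # [(0, 0)]
```

In this toy model, the outcome `r0 == 0 and r1 == 0` is reachable only when stores may sit in a buffer past a later load. That extra interleaving is what TSO permits and sequential consistency forbids, and it is the kind of slack that a primitive preserving only TSO could, in principle, leave unrestricted.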

Published in

ACM SIGPLAN Notices, Volume 48, Issue 4 (ASPLOS '13)
April 2013
540 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2499368

ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
March 2013
574 pages
ISBN: 9781450318709
DOI: 10.1145/2451116

Copyright © 2013 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
