Abstract
Program optimizations based on data dependences may not preserve the memory consistency of a program. Previous work leverages a hardware ATOMICITY primitive that restricts thread interleaving in order to preserve sequential consistency in region optimizations. However, the ATOMICITY primitive is overly restrictive on thread interleaving when optimizing real-world applications developed for the popular Total-Store-Ordering (TSO) memory consistency model, which is weaker than sequential consistency. In this paper, we present a novel hardware TSO_ATOMICITY primitive. It places fewer restrictions on thread interleaving than the ATOMICITY primitive, and so permits more efficient program execution, while still preserving TSO memory consistency in all region optimizations. Furthermore, the TSO_ATOMICITY primitive requires architecture support similar to that of the ATOMICITY primitive and can be implemented with only slight changes to an existing ATOMICITY implementation. Our experimental results show that, in a state-of-the-art dynamic binary optimization system running a large set of workloads, the ATOMICITY primitive improves performance by only 4% on average. The TSO_ATOMICITY primitive reduces the overhead associated with the ATOMICITY primitive and improves performance by 12% on average.
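To make concrete the statement that TSO is weaker than sequential consistency, the sketch below enumerates every outcome of the classic store-buffering litmus test under both models. It is an illustrative assumption, not the paper's method: the `PROGRAM` encoding, the `outcomes` function, and the per-thread FIFO store-buffer abstraction of TSO are all hypothetical and simplified, and the sketch does not model the ATOMICITY or TSO_ATOMICITY primitives.

```python
# Store-buffering litmus test. Each instruction is ("st", addr, value)
# or ("ld", addr, register). All names here are illustrative.
PROGRAM = [
    [("st", "x", 1), ("ld", "y", "r0")],  # thread 0
    [("st", "y", 1), ("ld", "x", "r1")],  # thread 1
]


def outcomes(program, store_buffers):
    """Return all reachable final register values, ordered by register name.

    store_buffers=False models sequential consistency: stores reach memory
    immediately. store_buffers=True models a simplified TSO machine: each
    thread has a FIFO store buffer that drains to memory at arbitrary times,
    and loads read the newest matching entry in their own buffer first.
    """
    n = len(program)
    addrs = sorted({ins[1] for thread in program for ins in thread})
    # State: (program counters, store buffers, memory, registers).
    start = ((0,) * n, ((),) * n, tuple((a, 0) for a in addrs), ())
    seen, results, stack = set(), set(), [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        if all(pcs[t] == len(program[t]) for t in range(n)):
            # Every load has executed, so register values are final.
            results.add(tuple(v for _, v in sorted(regs)))
            continue
        for t in range(n):
            if bufs[t]:
                # Drain the oldest buffered store of thread t to memory.
                addr, val = bufs[t][0]
                new_mem = dict(mem)
                new_mem[addr] = val
                new_bufs = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                stack.append(
                    (pcs, new_bufs, tuple(sorted(new_mem.items())), regs))
            if pcs[t] == len(program[t]):
                continue
            op, addr, arg = program[t][pcs[t]]
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if op == "st" and store_buffers:
                # TSO: the store waits in the thread's own buffer.
                new_bufs = bufs[:t] + (bufs[t] + ((addr, arg),),) + bufs[t + 1:]
                stack.append((new_pcs, new_bufs, mem, regs))
            elif op == "st":
                # SC: the store is globally visible at once.
                new_mem = dict(mem)
                new_mem[addr] = arg
                stack.append(
                    (new_pcs, bufs, tuple(sorted(new_mem.items())), regs))
            else:
                # Load: forward from the own buffer if possible, else memory.
                val = dict(mem)[addr]
                for a, v in bufs[t]:
                    if a == addr:
                        val = v
                stack.append((new_pcs, bufs, mem, regs + ((arg, val),)))
    return results


if __name__ == "__main__":
    print("SC :", sorted(outcomes(PROGRAM, store_buffers=False)))
    print("TSO:", sorted(outcomes(PROGRAM, store_buffers=True)))
```

In this model the sequentially consistent machine can produce three `(r0, r1)` outcomes, while the buffered machine also produces `(0, 0)`, because each load may execute before the other thread's store has drained to memory. That extra outcome is the sense in which TSO admits more thread interleavings than sequential consistency.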
TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations
ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems