research-article

A real system evaluation of hardware atomicity for software speculation

Published: 13 March 2010

Abstract

In this paper we evaluate the atomic region compiler abstraction by incorporating it into a commercial system. We find that atomic regions are simple and intuitive to integrate into an x86 binary-translation system. Furthermore, doing so trivially enables additional optimization opportunities beyond those achievable by a high-performance dynamic optimizer, which already implements superblocks.

We show that atomic regions can suffer from severe performance penalties if misspeculations are left uncontrolled, but that a simple software control mechanism is sufficient to rein in all detrimental side effects. We evaluate using full reference runs of the SPEC CPU2000 integer benchmarks and find that atomic regions enable an improvement of up to 9% (3% on average) beyond the performance of a tuned product.

These performance improvements are achieved without any negative side effects. Performance side effects such as code bloat are absent with atomic regions; in fact, static code size is reduced. The hardware necessary is synergistic with other needs and was already available on the commercial product used in our evaluation. Finally, the software complexity is minimal: a single developer was able to incorporate atomic regions into a sophisticated 300,000-line code base in three months, despite never having seen the translator source code beforehand.



      Reviews

      Wolfgang Schreiner

      New features in processor design strive to provide performance improvements, without unduly increasing hardware complexity, in a way that is effectively exploitable by compilers. One such feature is hardware atomicity, where a region of code can be marked as "atomic," such that its effect can eventually be either committed or rolled back. Consequently, the compiler may speculate when generating code for multiple possible execution paths: it may guess the most likely path, translate conditional jumps out of this path into assertions stating that the jumps are not taken, optimize the code along this path assuming that the assertions hold, and tag the result as atomic. If the guess is right, the optimization pays off; if the guess is wrong (that is, some assertion fails), a costly rollback has to be performed. The authors incorporate the idea into the Transmeta Efficeon, a processor that uses code morphing to translate x86 instructions into its internal instruction set, based on the very long instruction word (VLIW) principle. The processor's code-morphing software is modified to generate atomic regions that can be supported by the processor's capabilities for memory checkpointing; thus, the performance of the SPEC CPU2000 benchmarks could be improved by an average of three percent (and as much as nine percent). The improvements depend heavily on carefully monitoring misspeculations at runtime, in order to readjust misbehaving assertions, and on the compile-time optimization of redundant assertions. The results are very encouraging and may well find their way, in the future, into mainstream processor designs.

      Online Computing Reviews Service
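The speculate, assert, and roll-back cycle described in this review can be illustrated with a small software model. The sketch below is not the authors' implementation and does not reflect the Efficeon's actual interfaces: the names `run_atomic`, `Region`, `AssertFailed`, and the `window` and `max_abort_rate` parameters are all hypothetical, the hardware checkpoint is modeled by copying a dictionary, and "readjusting" a misbehaving assertion is simplified to switching the region back to its non-speculative code.

```python
class AssertFailed(Exception):
    """Raised when a speculated 'branch not taken' assumption is wrong."""


def run_atomic(state, speculative, fallback):
    """Run `speculative` atomically; on a failed assertion, roll back and
    run `fallback` instead. Returns True if the region committed."""
    checkpoint = dict(state)          # stands in for the hardware checkpoint
    try:
        speculative(state)
        return True                   # commit: speculative updates are kept
    except AssertFailed:
        state.clear()
        state.update(checkpoint)      # rollback: discard speculative updates
        fallback(state)               # re-execute without speculation
        return False


class Region:
    """Hypothetical software monitor that stops speculating on a region
    whose assertions fail too often."""

    def __init__(self, speculative, fallback, window=4, max_abort_rate=0.25):
        self.speculative = speculative
        self.fallback = fallback
        self.window = window                  # runs to observe before judging
        self.max_abort_rate = max_abort_rate  # tolerated fraction of aborts
        self.runs = 0
        self.aborts = 0
        self.speculate = True

    def execute(self, state):
        if not self.speculate:
            self.fallback(state)              # speculation has been disabled
            return
        self.runs += 1
        if not run_atomic(state, self.speculative, self.fallback):
            self.aborts += 1
        if (self.runs >= self.window
                and self.aborts / self.runs > self.max_abort_rate):
            self.speculate = False            # misbehaving assertion: give up


# Original code: total += abs(x). The speculative version assumes x >= 0.
def add_abs_speculative(state):
    state["total"] += state["x"]      # optimized path, valid only if x >= 0
    if state["x"] < 0:                # the branch, rewritten as an assertion
        raise AssertFailed


def add_abs_fallback(state):
    state["total"] += abs(state["x"])
```

Calling `Region(add_abs_speculative, add_abs_fallback).execute(state)` on a dictionary with keys `"x"` and `"total"` commits when `x` is non-negative and rolls back otherwise. Once the abort rate over the observation window exceeds the threshold, the region stops speculating, which is one simple way to keep repeated rollbacks from dominating execution time.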


      • Published in

        ACM SIGARCH Computer Architecture News, Volume 38, Issue 1
        ASPLOS '10
        March 2010
        399 pages
        ISSN:0163-5964
        DOI:10.1145/1735970
          ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems
          March 2010
          422 pages
          ISBN:9781605588391
          DOI:10.1145/1736020
          • General Chair: James C. Hoe
          • Program Chair: Vikram S. Adve

        Copyright © 2010 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States
