Abstract
In this paper we evaluate the atomic region compiler abstraction by incorporating it into a commercial system. We find that atomic regions are simple and intuitive to integrate into an x86 binary-translation system. Furthermore, doing so trivially enables additional optimization opportunities beyond those achievable by a high-performance dynamic optimizer, which already implements superblocks.
We show that atomic regions can suffer severe performance penalties if misspeculations are left uncontrolled, but that a simple software control mechanism is sufficient to rein in all detrimental side effects. Our evaluation uses full reference runs of the SPEC CPU2000 integer benchmarks and finds that atomic regions enable an improvement of up to 9% (3% on average) over the performance of a tuned product.
These performance improvements are achieved without any negative side effects. Performance side effects such as code bloat are absent with atomic regions; in fact, static code size is reduced. The necessary hardware is synergistic with other needs and was already available on the commercial product used in our evaluation. Finally, the software complexity is minimal: a single developer was able to incorporate atomic regions into a sophisticated 300,000-line code base in three months, despite never having seen the translator source code beforehand.
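The abstract does not describe how the software control mechanism works. As a rough illustration only, one plausible form is a per-region monitor that counts commits and aborts and stops speculating in a region once its abort rate crosses a threshold. Everything in the sketch below (the `AbortMonitor` class, the `RegionStats` counters, the threshold and sample-count defaults) is a hypothetical assumption for illustration, not the mechanism or the values used in the paper.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    """Per-region execution counters (hypothetical bookkeeping)."""
    commits: int = 0
    aborts: int = 0
    speculative: bool = True  # region is currently compiled with speculation


class AbortMonitor:
    """Turn speculation off for regions that abort too often.

    The default threshold and minimum sample count are illustrative
    assumptions, not values taken from the paper.
    """

    def __init__(self, max_abort_rate=0.01, min_samples=100):
        self.max_abort_rate = max_abort_rate
        self.min_samples = min_samples
        self.regions = {}

    def record(self, region_id, aborted):
        """Record one execution of a region; return whether it stays speculative."""
        stats = self.regions.setdefault(region_id, RegionStats())
        if aborted:
            stats.aborts += 1
        else:
            stats.commits += 1
        total = stats.commits + stats.aborts
        if (stats.speculative
                and total >= self.min_samples
                and stats.aborts / total > self.max_abort_rate):
            # Stand-in for retranslating the region without the
            # speculation that keeps failing.
            stats.speculative = False
        return stats.speculative


monitor = AbortMonitor(max_abort_rate=0.1, min_samples=10)
for _ in range(8):
    monitor.record("hot_loop", aborted=False)
monitor.record("hot_loop", aborted=True)
still_speculative = monitor.record("hot_loop", aborted=True)
```

In this usage, the tenth execution brings the abort rate of the hypothetical `hot_loop` region to 2 in 10, which exceeds the 0.1 threshold, so the monitor marks the region as non-speculative. A real translator would presumably react by regenerating code for the region rather than flipping a flag.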
A real system evaluation of hardware atomicity for software speculation
ASPLOS XV: Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '10)