Abstract
Memory operations significantly impact both performance and energy usage, even when an access hits in the level-one data cache (L1 DC). Load instructions in particular hurt performance, as they frequently cause pipeline stalls: the register being loaded is often referenced before the data is available. L1 DC accesses also affect energy usage, since each access typically requires significantly more energy than a register file access. Despite this impact, most processors perform L1 DC accesses in a uniform fashion, without regard to the context in which the load or store is executed. We describe a set of compiler techniques that enhance load and store instructions so they can execute with fewer stalls and/or access the L1 DC in a more energy-efficient manner. We show that these techniques can simultaneously achieve a 6% gain in performance and a 43% reduction in L1 DC energy usage.
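The load-use stalls the abstract refers to can be illustrated with a toy model (not the paper's implementation): a load's result becomes usable a fixed number of cycles after issue, so a consumer that immediately follows the load stalls, while an independent instruction scheduled in between hides the latency. The instruction encoding and the 2-cycle load-use latency below are illustrative assumptions.

```python
LOAD_LATENCY = 2  # assumed cycles from load issue until the loaded value is usable

def load_use_stalls(seq):
    """Count stall cycles in a straight-line sequence of (op, dest, srcs)
    tuples, assuming one instruction issues per cycle and a consumer of a
    loaded register must wait until that register's value is ready."""
    ready_at = {}   # register name -> earliest cycle a consumer may issue
    cycle = 0
    stalls = 0
    for op, dest, srcs in seq:
        for r in srcs:
            if ready_at.get(r, 0) > cycle:
                stalls += ready_at[r] - cycle  # pipeline stalls until ready
                cycle = ready_at[r]
        if op == "load":
            ready_at[dest] = cycle + LOAD_LATENCY
        cycle += 1
    return stalls

# Consumer immediately follows the load: one stall cycle.
tight = [("load", "r1", ["r2"]),
         ("add",  "r3", ["r1", "r4"]),
         ("mul",  "r5", ["r6", "r7"])]

# The independent mul scheduled between load and use hides the latency.
spread = [("load", "r1", ["r2"]),
          ("mul",  "r5", ["r6", "r7"]),
          ("add",  "r3", ["r1", "r4"])]

print(load_use_stalls(tight), load_use_stalls(spread))  # → 1 0
```

A compiler that knows the context of each load (how soon its result is needed) can schedule independent work into the delay slot, which is one way the stall reductions described above can be obtained.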
Improving Data Access Efficiency by Using Context-Aware Loads and Stores
LCTES'15: Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems, 2015