Abstract
Level-one data cache (L1 DC) accesses significantly impact energy usage because they occur frequently and use considerably more energy than register file accesses. A memory access instruction consists of two operations: an address generation operation that calculates the location where the data item resides in memory, and a data access operation that loads or stores a value from or to that location. We propose to decouple these two operations into separate machine instructions to reduce energy usage. By associating the data translation lookaside buffer (DTLB) access and the L1 DC tag check with the address generation instruction, only a single data array of a set-associative L1 DC needs to be accessed during a load instruction, since the result of the tag check is already known at that point. In addition, many DTLB accesses and L1 DC tag checks are avoided by memoizing the DTLB way and the L1 DC way with the register that holds the memory address to be dereferenced. Finally, our technique often allows an ALU operation to be coalesced with a load or store data access, reducing the number of instructions executed.
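The way-memoization idea can be illustrated with a toy model. The sketch below is not the paper's implementation; it is a minimal simulation under assumed parameters (line size, set count, associativity are all hypothetical) showing why decoupling helps: the address generation step performs the tag check once and memoizes the matching way, so subsequent loads through the same address register read only a single data array and skip the tag check entirely.

```python
# Toy model of L1 DC way memoization (illustrative only; parameters assumed).
LINE_SIZE = 32   # bytes per cache line
NUM_SETS = 4     # sets in the toy L1 DC
NUM_WAYS = 4     # associativity

class ToyL1DC:
    def __init__(self):
        # tags[set][way] holds the tag cached in that way, or None if empty
        self.tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.tag_checks = 0   # energy proxy: number of full tag checks
        self.data_reads = 0   # energy proxy: number of data arrays read

    def split(self, addr):
        index = (addr // LINE_SIZE) % NUM_SETS
        tag = addr // (LINE_SIZE * NUM_SETS)
        return index, tag

    def addr_gen(self, addr):
        """Address-generation instruction: perform the tag check up front
        and memoize the matching way with the address register."""
        index, tag = self.split(addr)
        self.tag_checks += 1
        for way in range(NUM_WAYS):
            if self.tags[index][way] == tag:
                return way            # way memoized with the register
        # simplistic fill on miss: place the line in way 0
        self.tags[index][0] = tag
        return 0

    def load(self, addr, memoized_way):
        """Data-access instruction: the way is already known, so only a
        single data array is read and no tag check is performed."""
        self.data_reads += 1          # one array instead of NUM_WAYS
        return memoized_way

cache = ToyL1DC()
way = cache.addr_gen(0x1000)          # one tag check; way is memoized
for offset in (0, 4, 8, 12):          # four loads through the same register
    cache.load(0x1000 + offset, way)  # same line, no further tag checks

print(cache.tag_checks, cache.data_reads)   # 1 4
```

With a conventional coupled load, each of the four accesses would have paid a tag check and read all four data arrays; here one tag check covers all four loads, each of which touches a single array.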
Decoupling address generation from loads and stores to improve data access energy efficiency
LCTES 2018: Proceedings of the 19th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems