Abstract
Instruction fetch behavior has been shown to be very regular and predictable, even for diverse application areas. In this work, we propose the Lookahead Instruction Fetch Engine (LIFE), which is designed to exploit the regularity present in instruction fetch. The nucleus of LIFE is the Tagless Hit Instruction Cache (TH-IC), a small cache that assists the instruction fetch pipeline stage by efficiently capturing information about both sequential and non-sequential transitions between instructions. TH-IC provides considerable savings in fetch energy without incurring the performance penalty normally associated with small filter instruction caches. LIFE extends TH-IC by using advanced control flow metadata to further improve the utilization of fetch-associated structures such as the branch predictor, branch target buffer, and return address stack. LIFE selectively disables these structures when it can determine that they are unnecessary for the next instruction to be fetched. Our results show that LIFE enables further reductions in total processor energy consumption with no impact on application execution times, even for the most aggressive power-saving configuration. We also explore the use of LIFE metadata to guide decisions further down the pipeline. Next-sequential-line prefetching for the data cache can be enhanced by prefetching only when the triggering instruction has been previously accessed in the TH-IC. This strategy reduces the number of useless prefetches and thus contributes to improving overall processor efficiency. LIFE enables designers to boost instruction fetch efficiency by reducing energy cost without negatively affecting performance.
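The data-cache prefetch filter described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `GatedNextLinePrefetcher`, the 32-byte line size, and the use of an unbounded set of program counters as a stand-in for TH-IC metadata are all assumptions made for illustration; a real TH-IC is a small cache with evictions, and the abstract does not specify which event triggers a prefetch.

```python
from dataclasses import dataclass, field

LINE_SIZE = 32  # bytes per data-cache line (assumed value, not from the paper)


@dataclass
class GatedNextLinePrefetcher:
    """Hypothetical model: prefetch the next sequential line only when the
    triggering instruction has been seen before."""

    # Stand-in for "previously accessed in the TH-IC". Unbounded here,
    # whereas a real TH-IC is small and evicts entries.
    seen_pcs: set = field(default_factory=set)
    # Line-aligned addresses for which a prefetch was issued.
    issued: list = field(default_factory=list)

    def on_data_access(self, pc: int, addr: int) -> bool:
        """Record the access; return True if a prefetch was issued."""
        previously_seen = pc in self.seen_pcs
        self.seen_pcs.add(pc)
        if not previously_seen:
            # First encounter of this instruction: suppress the prefetch.
            return False
        # Address of the line immediately following the accessed line.
        next_line = (addr // LINE_SIZE + 1) * LINE_SIZE
        self.issued.append(next_line)
        return True


prefetcher = GatedNextLinePrefetcher()
first = prefetcher.on_data_access(pc=0x400, addr=0x1000)   # instruction not yet seen
second = prefetcher.on_data_access(pc=0x400, addr=0x1004)  # same instruction again
```

In this sketch the first access by an instruction never prefetches, and a repeated access prefetches the line that follows the accessed address. The intent is only to show the gating idea, namely that instructions without fetch history are less likely to produce useful prefetches.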
Guaranteeing instruction fetch behavior with a lookahead instruction fetch engine (LIFE)
LCTES '09: Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems