
DLIC: Decoded loop instructions caching for energy-aware embedded processors

Published: 05 September 2013

Abstract

With the explosive proliferation of embedded systems, especially in the countless portable devices and wireless equipment in use today, embedded systems have become indispensable to modern society and everyday life. Many of these devices are battery-powered, so low energy consumption in embedded processors is important, and it becomes more critical as system complexity grows. The on-chip instruction cache (I-cache) is usually the most energy-consuming component on the processor chip because of its large size and frequent accesses. To reduce this energy consumption, existing loop cache approaches use a tiny decoded cache to filter out I-cache accesses and instruction decoding during repeated loop iterations. However, such designs are effective only for small, simple loops and are suitable mainly for DSP kernel-like applications. They are ineffective for many embedded applications in which complex loops are common. In this article, we propose a decoded loop instruction cache (DLIC) that is small, and hence energy efficient, yet can capture most loops, including large nested loops that contain branches, so that a significant fraction of I-cache accesses and instruction decoding can be eliminated. Experiments on a set of embedded benchmarks show that the proposed DLIC scheme reduces energy consumption by up to 87% compared with a conventional cache-only design. On average, it saves 66% of the energy spent on instruction fetching and decoding, at a performance overhead of only 1.4%.
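To make the limitation of the baseline concrete, the following Python sketch replays a dynamic instruction-address trace through a conventional loop buffer of the kind the abstract describes: a tiny decoded buffer that is filled after a short taken backward branch and then serves later iterations, bypassing both the I-cache and the decoder. The trigger rule, the three modes, and the name `simulate_loop_buffer` are illustrative assumptions, not the DLIC design; addresses are in instruction units and each trace entry is one fetch.

```python
from enum import Enum, auto


class Mode(Enum):
    IDLE = auto()    # fetch and decode from the I-cache
    FILL = auto()    # fetch from the I-cache and store decoded copies
    ACTIVE = auto()  # serve decoded instructions from the loop buffer


def simulate_loop_buffer(trace, capacity):
    """Replay a dynamic PC trace through a hypothetical conventional loop
    buffer holding up to `capacity` decoded instructions.

    Returns (total_fetches, buffer_hits); a buffer hit skips both the
    I-cache access and the decode stage."""
    mode = Mode.IDLE
    start = end = None
    hits = 0
    for i, pc in enumerate(trace):
        if mode is Mode.ACTIVE:
            hits += 1
        if i + 1 == len(trace):
            break
        nxt = trace[i + 1]
        if mode is not Mode.IDLE:
            if pc == end and nxt == start:
                # Loop-closing branch taken: the whole body is now stored.
                mode = Mode.ACTIVE
                continue
            if pc < end and nxt == pc + 1:
                # Sequential flow inside the body: keep filling or serving.
                continue
        # Any other change of flow (loop exit or a branch inside the body)
        # abandons the buffered loop; look for a new short backward branch.
        mode = Mode.IDLE
        if nxt <= pc and pc - nxt + 1 <= capacity:
            start, end = nxt, pc
            mode = Mode.FILL
    return len(trace), hits


if __name__ == "__main__":
    simple = [8, 9] + [10, 11, 12, 13] * 5 + [14]
    print(simulate_loop_buffer(simple, capacity=8))  # (23, 12)
    print(simulate_loop_buffer(simple, capacity=2))  # (23, 0)
    a, b = list(range(10, 16)), [10, 11, 14, 15]     # if-then inside the body
    print(simulate_loop_buffer((a + b) * 3, capacity=8))  # (30, 4)
```

On a straight-line 4-instruction loop, 12 of the 23 fetches bypass the I-cache and decoder. Shrinking the buffer below the loop body, or adding an if-then inside the body, collapses the hit count. Large loops and loops with internal branches are exactly the cases DLIC targets.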
