research-article

Combining code reordering and cache configuration

Published: 01 January 2013

Abstract

The instruction cache is a popular optimization target because of its high impact on system performance and power and because of its predictable temporal and spatial locality. This article is an in-depth study of the interaction between code reordering (a long-known technique) and cache configuration (a relatively new technique). Experimental results show that coupling code reordering with cache configuration yields additional energy savings as high as 10 to 15% for several benchmarks, along with cache area reductions as high as 48%. To exploit these additional benefits, we design and evaluate several design exploration heuristics for combining the two methods.
