An analytical approach for fast and accurate design space exploration of instruction caches

Published: 24 December 2013

Abstract

Application-specific system-on-chip platforms create the opportunity to customize the cache configuration for optimal performance with minimal chip area. Simulation, in particular trace-driven simulation, is widely used to estimate cache hit rates. However, simulation is too slow to be deployed in design space exploration, especially when there are hundreds of design points and the traces are huge. In this article, we propose a novel analytical approach for design space exploration of instruction caches. Given the program control flow graph (CFG) annotated only with basic block and control flow edge execution counts, we first model the cache states at each point of the CFG in a probabilistic manner. Then, we exploit the structural similarities among related cache configurations to estimate the cache hit rates for multiple cache configurations in one pass. Experimental results indicate that our analysis is 28–2,500 times faster than the fastest known cache simulator while maintaining high accuracy (0.2% average error) in estimating cache hit rates for a large set of popular benchmarks. Moreover, compared to a state-of-the-art cache design space exploration technique, our approach achieves a 304–8,086 times speedup and saves up to 62% (7% on average) energy for the evaluated benchmarks.
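To make the idea of evaluating several related cache configurations in one pass concrete, the following is a minimal sketch of the trace-driven baseline, not the analytical CFG-based method proposed in the article. It assumes LRU replacement and relies on the well-known LRU stack (inclusion) property: with the number of sets and the block size held fixed, an access hits in an A-way cache exactly when its block sits at depth less than A in the per-set LRU stack. The function name, parameters, and example trace are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def lru_hits_by_associativity(trace, num_sets, block_size, max_assoc):
    """Count hits for every associativity 1..max_assoc in a single pass.

    Illustrative assumption: LRU replacement, fixed num_sets and block_size.
    Returns a dict mapping associativity -> number of hits.
    """
    stacks = defaultdict(list)      # per-set LRU stack, most recently used first
    depth_hist = [0] * max_assoc    # depth_hist[d] = accesses found at stack depth d

    for addr in trace:
        block = addr // block_size
        stack = stacks[block % num_sets]
        if block in stack:
            depth = stack.index(block)
            depth_hist[depth] += 1  # hit for every associativity > depth
            stack.pop(depth)
        stack.insert(0, block)      # the accessed block becomes most recent
        del stack[max_assoc:]       # deeper blocks miss in all modeled caches

    hits, running = {}, 0
    for assoc in range(1, max_assoc + 1):
        running += depth_hist[assoc - 1]
        hits[assoc] = running       # cumulative: hits at depth < assoc
    return hits


if __name__ == "__main__":
    # Hypothetical instruction-address trace: blocks 0, 1, 0, 2, 1, 0.
    example_trace = [0, 16, 0, 32, 16, 0]
    print(lru_hits_by_associativity(example_trace, num_sets=1,
                                    block_size=16, max_assoc=4))
```

Even with this sharing, the cost still grows with the trace length, which is the bottleneck the abstract attributes to simulation; the analytical approach instead works from the execution counts on the CFG.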
