research-article

Superoptimization of memory subsystems

Published: 12 June 2014

Abstract

The disparity in performance between processors and main memories has led computer architects to incorporate large cache hierarchies in modern computers. Because these cache hierarchies are designed to be general-purpose, they may not provide the best possible performance for a given application. In this paper, we determine a memory subsystem well suited to a given application and main memory by discovering a memory subsystem composed of caches, scratchpads, and other components that are combined to provide better performance. We draw motivation from the superoptimization of instruction sequences, which successfully finds unusually clever instruction sequences for programs. Targeting both ASIC and FPGA devices, we show that it is possible to discover unusual memory subsystems that provide performance improvements over a typical memory subsystem.

Published in

ACM SIGPLAN Notices, Volume 49, Issue 5 (LCTES '14)
May 2014, 162 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2666357

LCTES '14: Proceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
June 2014, 174 pages
ISBN: 9781450328777
DOI: 10.1145/2597809

Copyright © 2014 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
