SEAMS: Self-Optimizing Runtime Manager for Approximate Memory Hierarchies

Published: 29 July 2021

Abstract

Memory approximation techniques are commonly limited in scope, each targeting an individual level of the memory hierarchy. Existing approximation techniques that cover the full memory hierarchy determine optimal configurations at design time, given a goal and an application. Such policies are rigid: they cannot adapt to unknown workloads and must be redesigned for different memory configurations and technologies. We propose SEAMS, the first self-optimizing runtime manager for coordinating configurable approximation knobs across all levels of the memory hierarchy. SEAMS continuously updates and optimizes its approximation management policy at runtime for diverse workloads, tuning the approximate memory configuration to minimize energy consumption without violating the quality threshold specified by application developers. SEAMS can (1) learn a policy at runtime to manage variable application quality of service (QoS) constraints, (2) automatically optimize for a target metric within those constraints, and (3) coordinate runtime decisions for interdependent knobs and subsystems. We demonstrate that SEAMS efficiently provides functions (1)–(3) on a RISC-V Linux platform with approximate memory segments in the on-chip cache and main memory. SEAMS saves up to 37% energy in the memory subsystem without any design-time overhead, and reduces QoS violations by 75% with less than 5% additional energy.
