Abstract
Memory approximation techniques are commonly limited in scope, targeting individual levels of the memory hierarchy. Existing approximation techniques for a full memory hierarchy determine optimal configurations at design time, given a goal and an application. Such policies are rigid: they cannot adapt to unknown workloads and must be redesigned for different memory configurations and technologies. We propose SEAMS, the first self-optimizing runtime manager for coordinating configurable approximation knobs across all levels of the memory hierarchy. SEAMS continuously updates and optimizes its approximation management policy at runtime for diverse workloads, tuning the approximate memory configuration to minimize energy consumption without violating the quality threshold specified by application developers. SEAMS can (1) learn a policy at runtime to manage variable application quality of service (QoS) constraints, (2) automatically optimize for a target metric within those constraints, and (3) coordinate runtime decisions for interdependent knobs and subsystems. We demonstrate that SEAMS efficiently provides functions (1)–(3) on a RISC-V Linux platform with approximate memory segments in the on-chip cache and main memory. SEAMS saves up to 37% energy in the memory subsystem without any design-time overhead, and reduces QoS violations by 75% with less than 5% additional energy.
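To make functions (1)–(3) concrete, the following is a minimal sketch of a runtime manager that learns which joint cache and main-memory knob setting minimizes energy under a QoS threshold. It is not the SEAMS algorithm: the abstract does not specify the learner, so the sketch assumes a stateless epsilon-greedy value learner. The knob levels, the `toy_platform` energy and quality model, the penalty value, and all function names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical knob levels: 0 = exact, higher = more aggressive approximation.
CACHE_LEVELS = (0, 1, 2)
DRAM_LEVELS = (0, 1, 2)
# Joint actions, so the cache and main-memory knobs are decided together.
ACTIONS = [(c, d) for c in CACHE_LEVELS for d in DRAM_LEVELS]


def toy_platform(action):
    """Made-up stand-in for the real system: returns (energy, quality)."""
    cache, dram = action
    energy = 1.0 - 0.15 * cache - 0.10 * dram
    # The cache * dram term models the knobs being interdependent.
    quality = 1.0 - 0.04 * cache - 0.03 * dram - 0.02 * cache * dram
    return energy, quality


def reward(energy, quality, qos_threshold, penalty=10.0):
    """Prefer low energy; penalize heavily when quality drops below the threshold."""
    if quality < qos_threshold:
        return -penalty
    return -energy


def learn_policy(qos_threshold, steps=2000, alpha=0.2, epsilon=0.1, seed=0):
    """Learn at runtime which knob setting to use (assumed epsilon-greedy learner)."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # running value estimate per joint action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)           # explore
        else:
            action = max(ACTIONS, key=value.get)   # exploit current estimate
        energy, quality = toy_platform(action)
        r = reward(energy, quality, qos_threshold)
        value[action] += alpha * (r - value[action])  # incremental update
    return max(ACTIONS, key=value.get)


if __name__ == "__main__":
    for threshold in (0.99, 0.9, 0.8):
        print(threshold, learn_policy(threshold))
```

Treating each (cache, main memory) pair as one action is the simplest way to express coordination between interdependent knobs. It scales poorly as knobs are added, which is one reason a real manager would need a more structured policy than this table.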
SEAMS: Self-Optimizing Runtime Manager for Approximate Memory Hierarchies