
More Is Less: Model Augmentation for Intermittent Deep Inference

Published: 08 October 2022

Abstract

Energy harvesting creates an emerging intermittent computing paradigm but poses new challenges for sophisticated applications such as intermittent deep neural network (DNN) inference. Although model compression has adapted DNNs to resource-constrained devices, under intermittent power, compressed models will still experience multiple power failures during a single inference. Footprint-based approaches enable hardware-accelerated intermittent DNN inference by tracking footprints, independent of model computations, to indicate accelerator progress across power cycles. However, we observe that the extra overhead required to preserve progress indicators can severely offset the computation progress accumulated by intermittent DNN inference.

This work proposes the concept of model augmentation to adapt DNNs to intermittent devices. Our middleware stack, JAPARI, appends extra neural network components into a given DNN, to enable the accelerator to intrinsically integrate progress indicators into the inference process, without affecting model accuracy. Their specific positions allow progress indicator preservation to be piggybacked onto output feature preservation to amortize the extra overhead, and their assigned values ensure uniquely distinguishable progress indicators for correct inference recovery upon power resumption. Evaluations on a Texas Instruments device under various DNN models, capacitor sizes, and progress preservation granularities show that JAPARI can speed up intermittent DNN inference by 3× over the state of the art, for common convolutional neural architectures that require heavy acceleration.
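The recovery scheme the abstract alludes to can be pictured with a toy sketch. This is our own illustrative construction, not JAPARI's implementation: it models intermittent layer-by-layer inference in which a progress indicator is committed to simulated non-volatile memory together with each layer's output, so inference resumes from the last completed layer after a power failure instead of restarting.

```python
# Toy model (hypothetical, for illustration only): "NVM" stands in for
# non-volatile memory that survives power failures; everything else is
# volatile and lost on each failure.
NVM = {"progress": 0, "activations": [3.0]}

# A stand-in three-"layer" network; real accelerated layers would be
# convolutions or matrix multiplications.
LAYERS = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: x * x]

def run_until_failure(fail_after):
    """Execute layers in order, simulating a power failure after
    `fail_after` layer completions. Returns True if inference finished."""
    done = 0
    while NVM["progress"] < len(LAYERS):
        i = NVM["progress"]
        out = LAYERS[i](NVM["activations"][-1])
        # Piggyback the progress indicator onto output preservation:
        # a single commit stores both the output feature and the footprint,
        # amortizing the preservation overhead.
        NVM["activations"].append(out)
        NVM["progress"] = i + 1
        done += 1
        if done == fail_after:
            return False  # power failure; volatile state is lost
    return True

# Two power cycles complete the 3-layer inference.
run_until_failure(fail_after=2)            # finishes layers 0-1, then dies
finished = run_until_failure(fail_after=99)  # resumes at layer 2
print(finished, NVM["activations"][-1])      # True 49.0, i.e. (3*2 + 1)^2
```

The sketch captures only the recovery logic; JAPARI's contribution is making the accelerator itself emit such indicators as ordinary network outputs, with positions and values chosen so they are uniquely distinguishable during recovery.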



• Published in

  ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5 (September 2022), 526 pages
  ISSN: 1539-9087
  EISSN: 1558-3465
  DOI: 10.1145/3561947
  Editor: Tulika Mitra

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 8 October 2022
• Online AM: 26 January 2022
• Accepted: 15 December 2021
• Revised: 11 November 2021
• Received: 7 July 2021

Published in ACM TECS Volume 21, Issue 5


        Qualifiers

        • research-article
        • Refereed
