Abstract
Energy harvesting creates an emerging intermittent computing paradigm but poses new challenges for sophisticated applications such as intermittent deep neural network (DNN) inference. Although model compression has adapted DNNs to resource-constrained devices, under intermittent power, compressed models will still experience multiple power failures during a single inference. Footprint-based approaches enable hardware-accelerated intermittent DNN inference by tracking footprints, independent of model computations, to indicate accelerator progress across power cycles. However, we observe that the extra overhead required to preserve progress indicators can severely offset the computation progress accumulated by intermittent DNN inference.
This work proposes the concept of model augmentation to adapt DNNs to intermittent devices. Our middleware stack, JAPARI, appends extra neural network components to a given DNN, enabling the accelerator to intrinsically integrate progress indicators into the inference process without affecting model accuracy. The positions of the appended components allow progress indicator preservation to be piggybacked onto output feature preservation, amortizing the extra overhead, and their assigned values ensure uniquely distinguishable progress indicators for correct inference recovery upon power resumption. Evaluations on a Texas Instruments device under various DNN models, capacitor sizes, and progress preservation granularities show that JAPARI can speed up intermittent DNN inference by 3× over the state of the art, for common convolutional neural architectures that require heavy acceleration.
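The piggybacking idea described above can be illustrated with a minimal sketch. This is a hypothetical simulation, not JAPARI's actual implementation: the tile size, the generation counter used as the indicator value, and all function names are our assumptions. The key point it demonstrates is that the progress indicator is appended to each output tile and preserved in the same non-volatile memory (NVM) write as the features, so no separate footprint write is needed, and recovery after a power failure only scans the indicators.

```python
# Hypothetical sketch of piggybacked progress indicators (illustrative
# names and sizes; not the actual JAPARI design).
NUM_TILES = 8   # output feature tiles produced by one layer
TILE_SIZE = 4   # feature values per tile

# Simulated NVM: each tile slot holds TILE_SIZE features + 1 indicator.
nvm = [None] * (NUM_TILES * (TILE_SIZE + 1))

def compute_tile(t):
    """Stand-in for accelerator output: TILE_SIZE feature values."""
    return [t * 10 + i for i in range(TILE_SIZE)]

def preserve_tile(t, features, generation):
    """One NVM write covering features AND the appended indicator.
    Using the run's generation counter as the indicator value makes a
    tile completed in this run uniquely distinguishable from stale data
    left over by an earlier, interrupted run."""
    base = t * (TILE_SIZE + 1)
    nvm[base:base + TILE_SIZE] = features
    nvm[base + TILE_SIZE] = generation  # piggybacked progress indicator

def recover_progress(generation):
    """On power resumption, scan indicators to find the first tile not
    yet completed in the current generation; inference resumes there."""
    for t in range(NUM_TILES):
        if nvm[t * (TILE_SIZE + 1) + TILE_SIZE] != generation:
            return t
    return NUM_TILES

# Simulate a power failure after 5 tiles, then resume.
gen = 1
for t in range(5):
    preserve_tile(t, compute_tile(t), gen)
resume_at = recover_progress(gen)        # inference restarts at tile 5
for t in range(resume_at, NUM_TILES):
    preserve_tile(t, compute_tile(t), gen)
```

Because the indicator rides along in the same transfer as the output features, preserving it costs no additional NVM write, which is the source of the overhead amortization claimed above.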
More Is Less: Model Augmentation for Intermittent Deep Inference