Abstract
Traditionally, IoT devices send collected sensor data to an intelligent cloud where machine learning (ML) inference happens. However, this practice is rapidly changing, and there is a recent trend to run ML on the edge IoT devices themselves. An intelligent edge is attractive because it saves a network round trip (efficiency) and keeps user data at the source (privacy). However, IoT devices are far more resource-constrained than the cloud, which makes running ML on them challenging. Consider, for example, the Arduino Uno, a commonly used board that has 2KB of RAM and 32KB of read-only Flash memory. Although recent breakthroughs in ML have produced novel recurrent neural network (RNN) models that achieve good accuracy at KB sizes, deploying them on tiny devices under such hard memory constraints has remained elusive.
We present Shiftry, an automatic compiler from high-level floating-point ML models to fixed-point C programs using 8-bit and 16-bit integers, which have significantly lower memory requirements. For this conversion, Shiftry uses a data-driven float-to-fixed quantization procedure and a RAM management mechanism. These techniques enable us to provide the first empirical evaluation of RNNs running on tiny edge devices. On simpler ML models that prior work could handle, Shiftry-generated code has lower latency and higher accuracy.
Shiftry: RNN inference in 2KB of RAM