research-article
Open Access

Shiftry: RNN inference in 2KB of RAM

Published: 13 November 2020

Abstract

Traditionally, IoT devices send collected sensor data to an intelligent cloud where machine learning (ML) inference happens. However, this is rapidly changing, and there is a recent trend to run ML on the edge IoT devices themselves. An intelligent edge is attractive because it saves a network round trip (efficiency) and keeps user data at the source (privacy). However, IoT devices are far more resource constrained than the cloud, which makes running ML on them challenging. Consider the Arduino Uno, a commonly used board with 2KB of RAM and 32KB of read-only Flash memory. Although recent breakthroughs in ML have produced novel recurrent neural network (RNN) models that provide good accuracy with KB-sized models, deploying them on tiny devices with such hard memory constraints has remained elusive.

We present Shiftry, an automatic compiler from high-level floating-point ML models to fixed-point C programs with 8-bit and 16-bit integers, which have significantly lower memory requirements. For this conversion, Shiftry uses a data-driven float-to-fixed procedure and a RAM management mechanism. These techniques enable us to provide the first empirical evaluation of RNNs running on tiny edge devices. On simpler ML models that prior work could handle, Shiftry-generated code has lower latency and higher accuracy.


Supplemental Material

Auxiliary Presentation Video

This is the presentation of our OOPSLA 2020 talk on our work 'Shiftry: RNN Inference in 2KB of RAM'. Shiftry is a compiler that translates ML models into code that runs directly on tiny IoT devices, taking into account their severe resource constraints. Shiftry is the first work to run recurrent neural networks on an Arduino Uno.

