DOI: 10.1145/3384419.3430898
Research article · Public Access · Best Paper

Deep compressive offloading: speeding up neural network inference by trading edge computation for network latency

Published: 16 November 2020

ABSTRACT

With recent advances, neural networks have become a crucial building block in intelligent IoT systems and sensing applications. However, their excessive computational demand remains a serious impediment to deployment on low-end IoT devices. With the emergence of edge computing, offloading has become a promising technique for circumventing end-device limitations. However, transferring data between local and edge devices takes up a large proportion of time in existing offloading frameworks, creating a bottleneck for low-latency intelligent services. In this work, we propose a general framework, called deep compressive offloading. By integrating compressive sensing theory and deep learning, our framework can encode data for offloading into tiny sizes with negligible overhead on local devices and decode the data on the edge server, while offering theoretical guarantees on perfect reconstruction and lossless inference. By trading edge computing resources for data transmission time, our design can significantly reduce offloading latency with almost no accuracy loss. We build a deep compressive offloading system to serve state-of-the-art computer vision and speech recognition services. Comprehensive evaluations show that our system consistently reduces end-to-end latency by 2X to 4X with 1% accuracy loss, compared to state-of-the-art neural network offloading systems. Under limited network bandwidth or intensive background traffic, our system can further speed up neural network inference by up to 35X.


Published in

SenSys '20: Proceedings of the 18th Conference on Embedded Networked Sensor Systems
November 2020, 852 pages
ISBN: 9781450375900
DOI: 10.1145/3384419
          Copyright © 2020 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 16 November 2020


Acceptance Rates

Overall Acceptance Rate: 174 of 867 submissions, 20%
