
Efficient-Grad: Efficient Training Deep Convolutional Neural Networks on Edge Devices with Gradient Optimizations


Abstract

With the proliferation of mobile devices, distributed learning, which enables model training on decentralized data, has attracted great interest from researchers. However, the limited training capability of edge devices significantly constrains the energy efficiency of distributed learning in practice. This article describes Efficient-Grad, an algorithm-hardware co-design approach for training deep convolutional neural networks that improves both throughput and energy savings during model training, with negligible loss of validation accuracy.

The key to Efficient-Grad is its exploitation of two observations. First, sparsity is present not only in activations and weights but also in gradients, alongside the asymmetry residing in the gradients of conventional back propagation (BP). Second, a dedicated hardware architecture for sparsity utilization and efficient data movement can be optimized to support the Efficient-Grad algorithm in a scalable manner. To the best of our knowledge, Efficient-Grad is the first approach to successfully adopt a feedback-alignment (FA)-based gradient optimization scheme for deep convolutional neural network training, which leads to its superiority in terms of energy efficiency. We present case studies demonstrating that the Efficient-Grad design outperforms the prior art by 3.72x in terms of energy efficiency.
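To make the FA idea concrete, below is a minimal sketch, not the authors' implementation, of one training step for a single fully connected ReLU layer. Conventional BP would propagate the output error through the transpose of the forward weights; feedback alignment instead routes it through a fixed random matrix, sidestepping the weight-symmetry ("weight transport") requirement that the asymmetry remark above refers to. The threshold-based gradient pruning is an illustrative stand-in for the gradient-sparsity exploitation described in the abstract; the names and values (`fa_step`, `prune_thresh`, the learning rate) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 64, 10

# Forward weights (learned) and a fixed random feedback matrix (never updated).
W = rng.normal(0.0, 0.1, size=(n_out, n_in))
B = rng.normal(0.0, 0.1, size=(n_out, n_in))

def fa_step(W, B, x, target, lr=0.01, prune_thresh=1e-3):
    """One feedback-alignment training step for a single ReLU layer (sketch)."""
    z = W @ x
    y = np.maximum(z, 0.0)           # ReLU forward pass
    err = y - target                 # output error (gradient of a squared loss)
    delta = err * (z > 0)            # gate the error through the ReLU derivative

    grad_W = np.outer(delta, x)      # local weight gradient
    # Illustrative gradient pruning: zero out near-zero entries so that a
    # sparsity-aware accelerator could skip the corresponding updates.
    grad_W[np.abs(grad_W) < prune_thresh] = 0.0

    # Feedback alignment: the upstream error signal travels through the fixed
    # random matrix B rather than W.T, avoiding BP's weight-transport symmetry.
    err_upstream = B.T @ delta

    return W - lr * grad_W, err_upstream

# Toy usage: one update step toward a one-hot target.
x = rng.normal(size=n_in)
target = np.eye(n_out)[3]
W, _ = fa_step(W, B, x, target)
```

Because B is fixed, the backward pass needs no transposed copy of the forward weights, which is what makes FA-style schemes attractive for memory-constrained edge training hardware.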




• Published in

  ACM Transactions on Embedded Computing Systems, Volume 21, Issue 2
  March 2022, 187 pages
  ISSN: 1539-9087
  EISSN: 1558-3465
  DOI: 10.1145/3514174
  • Editor: Tulika Mitra


Publisher

        Association for Computing Machinery, New York, NY, United States

Publication History

        • Published: 8 February 2022
        • Accepted: 1 December 2021
        • Revised: 1 October 2021
        • Received: 1 June 2021

        Published in TECS Volume 21, Issue 2
