
A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

Published: 08 October 2022

Abstract

Implementing embedded neural network processing at the edge requires efficient hardware acceleration that combines high computational throughput with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly being adapted to support the new functionality. Hardware designers can refer to a myriad of accelerator implementations in the literature to evaluate and compare hardware design choices. However, the sheer number of publications and their diverse optimization directions hinder an effective assessment. Existing surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effects of each optimization technique. This complicates the evaluation of optimizations for new accelerator designs, slowing research progress.

In contrast to previous surveys, this work provides a quantitative overview of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. The list of optimizations and their quantitative effects is presented as a construction kit, allowing the design choices for each building block to be assessed individually. Reported optimizations range from up to 10,000× memory savings to 33× energy reductions, providing chip designers with an overview of design choices for implementing efficient low power neural network accelerators.

  146. [146] Mathieu M., Henaff, M. and LeCun Y.. 2014. Fast training of convolutional networks through FFTs. In Int. Conference on Learning Representations, Banff, AB, Canada, 2014.Google ScholarGoogle Scholar
  147. [147] Zhang Y. and Li X.. 2020. Fast convolutional neural networks with fine-grained FFTs. In International Conference on Parallel Architectures and Compilation Techniques, New York, NY, USA, 2020.Google ScholarGoogle ScholarDigital LibraryDigital Library
  148. [148] Cong J. and Xiao B.. 2014. Minimizing computation in convolutional neural networks. In Artificial Neural Networks and Machine Learning – ICANN 2014, Cham, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  149. [149] LeCun Y., Denker, J. S. and Solla S. A.. 1989. Optimal brain damage. In NIPS, 1989.Google ScholarGoogle Scholar
  150. [150] Hoefler T., Alistarh D., Ben-Nun T., Dryden N., and Peste A.. 2021. Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks. CoRR, vol. abs/2102.00554, 2021.Google ScholarGoogle Scholar
  151. [151] Molchanov D., Ashukha, A. and Vetrov D.. 2017. Variational dropout sparsifies deep neural networks. In International Conference on Machine Learning, 2017.Google ScholarGoogle Scholar
  152. [152] Elsken T., Metzen J. H., and Hutter F.. 2019. Neural architecture search: A survey. J. Mach. Learn. Res. 20, 1 (2019), 19972017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  153. [153] Sainath T. N., Kingsbury B., Sindhwani V., Arisoy, E. and Ramabhadran B.. 2013. Low-rank matrix factorization for deep neural network training with high-dimensional output targets. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  154. [154] Plummer B. A., Dryden N., Frost J., Hoefler, T. and Saenko K.. 2020. Shapeshifter networks: Decoupling layers from parameters for scalable and effective deep learning. CoRR, vol. abs/2006.10598, 2020.Google ScholarGoogle Scholar
  155. [155] Deng L., Li G., Han S., Shi, L. and Xie Y.. 2020. Model compression and hardware acceleration for neural networks: A comprehensive survey. Proceedings of the IEEE 108 (2020), 485532.Google ScholarGoogle ScholarCross RefCross Ref
  156. [156] Han S., Pool J., Tran J., and Dally W.. 2015. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, 2015.Google ScholarGoogle Scholar
  157. [157] Yang T., Chen, Y. and Sze V.. 2017. Designing energy-efficient convolutional neural networks using energy-aware pruning. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.Google ScholarGoogle Scholar
  158. [158] Aimar A. et al. 2019. NullHop: A flexible convolutional neural network accelerator based on sparse representations of feature maps. IEEE Trans. on Neural Netw. and Learning Sys. 30 (2019), 644656.Google ScholarGoogle ScholarCross RefCross Ref
  159. [159] Albericio J. et al. 2016. Cnvlutin: Ineffectual-neuron-free deep neural network computing. In Int. Symp. on Computer Architecture, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  160. [160] Cavigelli L., Degen P., and Benini L.. 2017. CBinfer: Change-based inference for convolutional neural networks on video data. CoRR, vol. abs/1704.04313, 2017.Google ScholarGoogle Scholar
  161. [161] Gokhale V., Jin J., Dundar A., Martini B., and Culurciello E.. 2014. A 240 G-ops/s mobile coprocessor for deep neural networks. In IEEE Conf. on Computer Vision and Pattern Recognition Workshops, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  162. [162] Chen T. et al. 2014. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. SIGARCH Comput. Archit. News. 42, 2 (2014), 269284.Google ScholarGoogle ScholarDigital LibraryDigital Library
  163. [163] Yuan Z., Liu Y., Yue J., Li, J. and Yang H.. 2017. CORAL: Coarse-grained reconfigurable architecture for convolutional neural networks. In IEEE ISLPED, 2017.Google ScholarGoogle Scholar
  164. [164] Andri R., Cavigelli L., Rossi D., and Benini L.. 2019. Hyperdrive: A multi-chip systolically scalable binary-weight CNN inference engine. IEEE J. on Emerg. and Sel. Topics in Circ. and Sys. 9 (2019), 309322.Google ScholarGoogle ScholarCross RefCross Ref
  165. [165] Sinz F. H., Pitkow X., Reimer J., Bethge, M. and Tolias A. S.. 2019. Engineering a \less artificial intelligence. Neuron 103 (2019), 967979.Google ScholarGoogle ScholarCross RefCross Ref
  166. [166] Iandola F. N. et al. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. CoRR, vol. abs/1602.07360, 2016.Google ScholarGoogle Scholar
  167. [167] Liu S. et al. 2018. On-demand deep model compression for mobile devices: A usage-driven model selection framework. In Int. Conf. on Mobile Systems, Applications, and Services, New York, NY, USA, 2018.Google ScholarGoogle Scholar
  168. [168] Cai H., Gan C., Wang T., Zhang, Z. and Han S.. 2020. Once for all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations, 2020.Google ScholarGoogle Scholar
  169. [169] Alioto M., De V., and Marongiu A.. 2018. Energy-quality scalable integrated circuits and systems: continuing energy scaling in the twilight of Moore's law. IEEE J. on Emerging and Selected Topics in Circuits and Systems 8 (2018), 653678.Google ScholarGoogle ScholarCross RefCross Ref
  170. [170] Teo J. H., Cheng S., and Alioto M.. 2020. Low-energy voice activity detection via energy-quality scaling from data conversion to machine learning. IEEE Trans. on Circ. and Sys. 67 (2020), 13781388.Google ScholarGoogle Scholar
  171. [171] Alvarez A., Ponnusamy G., and Alioto M.. 2020. Energy-quality scalable memory-frugal feature extraction for always-on deep sub-mW distributed vision. IEEE Access 8 (2020), 1895118961.Google ScholarGoogle ScholarCross RefCross Ref
  172. [172] Yin P. et al. 2019. Understanding straight-through estimator in training activation quantized neural nets. CoRR, vol. abs/1903.05662, 2019.Google ScholarGoogle Scholar
  173. [173] Sung W., Shin, S. and Hwang K.. 2015. Resiliency of deep neural networks under quantization. CoRR, vol. abs/1511.06488, 2015.Google ScholarGoogle Scholar
  174. [174] Courbariaux M. and Bengio Y.. 2016. BinaryNet: Training deep neural networks with weights and activations constrained to +1 or −1. CoRR, vol. abs/1602.02830, 2016.Google ScholarGoogle Scholar
  175. [175] Gao M., Wang Q., Nagendra, A. S. K. and Qu G.. 2017. A novel data format for approximate arithmetic computing. In Asia and South Pacific Design Automation Conference, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  176. [176] Shin D., Lee J., Lee J., Lee J., and Yoo H.. 2018. DNPU: An energy-efficient deep-learning processor with heterogeneous multi-core architecture. IEEE Micro 38 (2018), 8593.Google ScholarGoogle ScholarDigital LibraryDigital Library
  177. [177] Camus V., Mei L., Enz C., and Verhelst M.. 2019. Review and benchmarking of precision-scalable multiply-accumulate unit architectures for embedded neural-network processing. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9 (2019), 697711.Google ScholarGoogle ScholarCross RefCross Ref
  178. [178] Lin D. D., Talathi S. S. and Annapureddy V. S.. 2016. Fixed point quantization of deep convolutional networks. In Int. Conf. on Machine Learning - Volume 48, New York, NY, USA, 2016.Google ScholarGoogle Scholar
  179. [179] Mishra A. K. and Marr D.. 2017. Apprentice: Using knowledge distillation techniques to improve low-precision network accuracy. CoRR, vol. abs/1711.05852, 2017.Google ScholarGoogle Scholar
  180. [180] Venkataramani S., Chakradhar S. T., Roy K., and Raghunathan A.. 2015. Approximate computing and the quest for computing efficiency. In Design Automation Conference, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  181. [181] Gao M., Wang Q., Arafin M. T., Lyu, Y. and Qu G.. 2017. Approximate computing for low power and security in the Internet of Things. Computer 50 (2017), 2734.Google ScholarGoogle ScholarDigital LibraryDigital Library
  182. [182] Jiang H., Santiago F. J. H., Mo H., Liu, L. and Han J.. 2020. Approximate arithmetic circuits: A survey, characterization, and recent applications. Proceedings of the IEEE 108 (2020), 21082135.Google ScholarGoogle ScholarCross RefCross Ref
  183. [183] Tajasob S., Rezaalipour M., Dehyadegari, M. and Bojnordi M. N.. 2018. Designing efficient imprecise adders using multi-bit approximate building blocks. In Int. Symp. on Low Power Electronics and Design, New York, NY, USA, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  184. [184] Yang L., Bankman D., Moons B., Verhelst, M. and Murmann B.. 2018. Bit error tolerance of a CIFAR-10 binarized convolutional neural network processor. In IEEE Int. Symp. on Circuits and Systems, 2018.Google ScholarGoogle Scholar
  185. [185] Sousa L.. 2021. Nonconventional computer arithmetic circuits, systems and applications. IEEE Circ. and Sys. Magazine, 21 (2021), 640.Google ScholarGoogle ScholarCross RefCross Ref
  186. [186] Popoff Y. et al. 2016. High-efficiency logarithmic number unit design based on an improved cotransformation scheme. In Design, Automation Test in Europe Conference Exhibition, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  187. [187] Roy K., Jaiswal, A. and Panda P.. 2019. Towards spike-based machine intelligence with neuromorphic computing. Nature 575 (2019), 607—617.Google ScholarGoogle ScholarCross RefCross Ref
  188. [188] Akopyan F. et al. 2015. TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip. Trans. on Comp.-Aided Design of Integr. Circ. and Sys. 34 (2015), 15371557.Google ScholarGoogle ScholarDigital LibraryDigital Library
  189. [189] Davies M. et al. 2018. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro 38 (2018), 8299.Google ScholarGoogle ScholarCross RefCross Ref
  190. [190] Paugam-Moisy H. and Bohte S. M.. 2012. Computing with spiking neuron networks. In Handbook of Natural Computing, G. Rozenberg, T. Back and J. Kok, Hrsg., (Eds.). Springer-Verlag, 335376.Google ScholarGoogle ScholarCross RefCross Ref
  191. [191] Jang H., Simeone O., Gardner, B. and Gruning A.. 2019. An introduction to probabilistic spiking neural networks: Probabilistic models, learning rules, and applications. Sig. Proc. Mag. 36 (2019), 6477.Google ScholarGoogle ScholarCross RefCross Ref
  192. [192] Kanerva and Pentti. 2009. Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation 1, 6 (2009).Google ScholarGoogle Scholar
  193. [193] Murmann B.. 2021. Mixed-signal computing for deep neural network inference. IEEE Trans. on Very Large Scale Integr. Sys. 29 (2021), 313.Google ScholarGoogle ScholarCross RefCross Ref
  194. [194] Haensch W., Gokmen T., and Puri R.. 2019. The next generation of deep learning hardware: Analog computing. Proceedings of the IEEE 107 (2019), 108122.Google ScholarGoogle ScholarCross RefCross Ref
  195. [195] Bankman D., Yang L., Moons B., Verhelst M., and Murmann B.. 2018. An always-on 3.8 uJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28nm CMOS. In IEEE International Solid - State Circuits Conference, 2018.Google ScholarGoogle Scholar
  196. [196] Moons B., Bankman D., Yang L., Murmann B., and Verhelst M.. 2018. BinarEye: An always-on energy-accuracy-scalable binary CNN processor with all memory on chip in 28nm CMOS. CoRR, vol. abs/1804.05554, 2018.Google ScholarGoogle Scholar
  197. [197] Bankman D. and Murmann B.. 2016. An 8-bit, 16 input, 3.2 pJ/op switched-capacitor dot product circuit in 28-nm FDSOI CMOS. In IEEE Asian Solid-State Circuits Conference, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  198. [198] Bong K. et al. 2017. 14.6 A 0.62mW ultra-low-power convolutional-neural-network face-recognition processor and a CIS integrated with always-on Haar-like face detector. In IEEE Int. Solid-State Circuits Conf., 2017.Google ScholarGoogle ScholarCross RefCross Ref
  199. [199] Oh J., Park J., Kim G., Lee S., and Yoo H.. 2011. A 57mW embedded mixed-mode neuro-fuzzy accelerator for intelligent multi-core processor. In IEEE International Solid-State Circuits Conference, 2011.Google ScholarGoogle Scholar
  200. [200] Marinella M. J. et al. 2018. Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 8 (2018), 86101.Google ScholarGoogle ScholarCross RefCross Ref
  201. [201] Anwar S., Hwang, K. and Sung W.. 2015. Fixed point optimization of deep convolutional neural networks for object recognition. In IEEE Int. Conference on Acoustics, Speech and Signal Processing, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  202. [202] Judd P., Albericio J., and Moshovos A.. 2017. Stripes: Bit-serial deep neural network computing. IEEE Comp. Arch. Letters 16 (2017), 8083.Google ScholarGoogle ScholarDigital LibraryDigital Library

Published in: ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5 (September 2022), 526 pages. ISSN: 1539-9087. EISSN: 1558-3465. Issue DOI: 10.1145/3561947. Editor: Tulika Mitra.

Publisher: Association for Computing Machinery, New York, NY, United States.

Publication History: Received 14 July 2021; Revised 31 January 2022; Accepted 19 February 2022; Online AM 7 March 2022; Published 8 October 2022.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
