research-article

Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework

Published: 09 December 2022
Abstract

Efficient deployment of Deep Neural Networks (DNNs) on edge devices (i.e., FPGAs and mobile platforms) is very challenging, especially given the recent growth in DNN model size and complexity. Model compression strategies, including weight quantization and pruning, are widely recognized as effective ways to significantly reduce computation and memory intensity, and have been applied to many DNNs deployed on edge devices. However, most state-of-the-art work focuses on ad hoc optimizations, and a thorough study that comprehensively reveals the potentials and constraints of different edge devices under different compression strategies is still lacking. In this article, we qualitatively and quantitatively compare the energy efficiency of FPGA-based and mobile-GPU-based DNN execution and provide a detailed analysis. Based on the observations from this analysis, we propose a unified optimization framework using block-based pruning to reduce weight storage and accelerate inference on both mobile devices and FPGAs, achieving high hardware performance and energy-efficiency gains while maintaining accuracy.
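The block-based pruning mentioned above can be illustrated with a minimal sketch: partition a weight matrix into fixed-size blocks, rank the blocks by L2 norm, and zero out the lowest-ranked fraction, so entire blocks (rather than scattered individual weights) become zero and can be skipped by hardware. The function name, block size, and norm-based ranking below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def block_prune(weight, block_size=(4, 4), sparsity=0.5):
    """Zero out the fraction `sparsity` of blocks with the smallest L2 norm."""
    rows, cols = weight.shape
    bh, bw = block_size
    assert rows % bh == 0 and cols % bw == 0, "shape must tile evenly into blocks"
    # View the matrix as a grid of (bh x bw) blocks and score each by L2 norm.
    blocks = weight.reshape(rows // bh, bh, cols // bw, bw)
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))   # shape: (rows//bh, cols//bw)
    k = int(sparsity * norms.size)                    # number of blocks to prune
    # Threshold at the k-th smallest block norm; blocks at or below it are pruned.
    thresh = np.partition(norms.ravel(), k - 1)[k - 1] if k > 0 else -np.inf
    mask = (norms > thresh).astype(weight.dtype)      # 0 marks a pruned block
    pruned = blocks * mask[:, None, :, None]          # broadcast mask over blocks
    return pruned.reshape(rows, cols), mask

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)).astype(np.float32)
pruned, mask = block_prune(w, block_size=(4, 4), sparsity=0.5)
```

Because the surviving nonzeros are grouped into regular blocks, an accelerator can store only the kept blocks plus a small block index, which is what makes this style of sparsity hardware-friendly on both FPGAs and mobile GPUs.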


Published in

ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5
September 2022, 526 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3561947
Editor: Tulika Mitra


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 9 December 2022
• Online AM: 20 April 2022
• Accepted: 23 March 2022
• Revised: 10 February 2022
• Received: 15 July 2021

Qualifiers

• research-article
• Refereed
