Abstract
Efficient deployment of deep neural networks (DNNs) on edge devices (e.g., FPGAs and mobile platforms) is very challenging, especially as DNN models continue to grow in size and complexity. Model compression strategies, including weight quantization and pruning, are widely recognized as effective means of significantly reducing computation and memory demands, and have been applied to many DNNs on edge devices. However, most state-of-the-art works focus on ad hoc optimizations, and a thorough study that comprehensively reveals the potentials and constraints of different edge devices under different compression strategies is still lacking. In this article, we qualitatively and quantitatively compare the energy efficiency of FPGA-based and mobile-based (mobile GPU) DNN executions and provide a detailed analysis. Based on the observations obtained from this analysis, we propose a unified optimization framework using block-based pruning that reduces weight storage and accelerates inference on both mobile devices and FPGAs, achieving high hardware performance and energy-efficiency gains while maintaining accuracy.
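The abstract's core idea, block-based pruning, can be illustrated with a minimal sketch: partition a weight matrix into fixed-size blocks and zero out the blocks with the smallest magnitudes, so the surviving nonzeros stay in hardware-friendly contiguous regions. This is only an illustrative assumption of how such pruning might work, not the paper's actual algorithm; the function name `block_prune`, the L2-norm block-selection criterion, and the block shape and sparsity ratio below are all hypothetical.

```python
import numpy as np

def block_prune(weights, block_shape=(4, 4), sparsity=0.5):
    """Illustrative block-based pruning (assumed, not the paper's method):
    zero out the fraction `sparsity` of blocks with the smallest L2 norms.

    weights: 2-D weight matrix whose dimensions divide evenly by block_shape.
    Returns the pruned matrix and the per-block keep mask.
    """
    rows, cols = weights.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0, "block shape must tile the matrix"
    # View the matrix as a (rows//br) x (cols//bc) grid of br x bc blocks.
    grid = weights.reshape(rows // br, br, cols // bc, bc).swapaxes(1, 2)
    norms = np.linalg.norm(grid, axis=(2, 3))        # one L2 norm per block
    k = int(norms.size * sparsity)                   # number of blocks to drop
    threshold = np.sort(norms, axis=None)[k - 1] if k > 0 else -np.inf
    mask = (norms > threshold).astype(weights.dtype) # keep blocks above threshold
    pruned = grid * mask[:, :, None, None]           # zero the pruned blocks
    return pruned.swapaxes(1, 2).reshape(rows, cols), mask
```

Because entire blocks become zero, both a mobile compiler and an FPGA accelerator can skip whole tiles of computation rather than tracking element-wise irregular sparsity, which is the usual motivation for block granularity.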
Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework