Abstract
In recent years, several works have proposed custom hardware and software-based techniques for accelerating Convolutional Neural Networks (CNNs). Most of these works focus on saving computations by changing the numerical precision or modifying frame processing. To achieve a more aggressive energy reduction, in this paper we propose software-only modifications to the CNN inference process.
Our approach exploits the inherent locality in videos by replacing entire frame computations with a movement prediction algorithm. Furthermore, when a frame must be processed, we avoid energy-demanding floating-point operations and, at the same time, reduce memory accesses by employing look-up tables in place of the original convolutions.
Using the proposed approach, one can reach significant energy gains of more than 25× for security-camera applications and 12× for moving-vehicle applications, with only small software modifications.
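The two ideas named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual algorithms, so everything below is an assumption made for illustration: movement prediction is shown as exhaustive block matching with a sum-of-absolute-differences cost, and the look-up table stores precomputed products of each kernel weight with every possible 8-bit pixel value. The function names (`predict_box`, `lut_conv2d`, and the helpers) are hypothetical and do not come from the paper.

```python
# Illustrative sketch only; NOT the paper's implementation.
# Assumptions: 8-bit grayscale frames stored as lists of lists,
# boxes given as (x, y, width, height), all names hypothetical.

def sad(prev, curr, box, dx, dy):
    """Sum of absolute differences between a box in prev and the same box shifted in curr."""
    x0, y0, w, h = box
    total = 0
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            total += abs(prev[y][x] - curr[y + dy][x + dx])
    return total


def predict_box(prev, curr, box, search=2):
    """Move a previously detected box to its best match in the new frame.

    Exhaustive block matching over a (2*search+1)^2 window stands in for
    a full CNN pass on the new frame.
    """
    x0, y0, w, h = box
    height, width = len(curr), len(curr[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= x0 + dx and x0 + dx + w <= width
                      and 0 <= y0 + dy and y0 + dy + h <= height)
            if not inside:
                continue  # skip shifts that leave the frame
            cost = sad(prev, curr, box, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    return (x0 + dx, y0 + dy, w, h)


def build_product_lut(weights, levels=256):
    """Precompute weight * pixel for every quantized pixel value."""
    return [[w * p for p in range(levels)] for w in weights]


def lut_conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation form) using table look-ups.

    Every multiplication is replaced by an index into a per-weight table,
    so the inner loop performs only look-ups and additions.
    """
    kh, kw = len(kernel), len(kernel[0])
    lut = build_product_lut([w for row in kernel for w in row])
    out = []
    for y in range(len(image) - kh + 1):
        row = []
        for x in range(len(image[0]) - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += lut[i * kw + j][image[y + i][x + j]]
            row.append(acc)
        out.append(row)
    return out
```

In this sketch, `predict_box` would be called on frames where the full network is skipped, and `lut_conv2d` on frames that must still be processed. How often frames are skipped and how the tables are sized are design choices the abstract leaves open.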
Index Terms
Aggressive Energy Reduction for Video Inference with Software-only Strategies