CompAct: On-chip <underline>Com</underline>pression of <underline>Act</underline>ivations for Low Power Systolic Array Based CNN Acceleration

Published: 07 October 2019

Abstract

This paper addresses the design of systolic array (SA) based convolutional neural network (CNN) accelerators for mobile and embedded domains. On- and off-chip memory accesses to the large activation inputs (sometimes called feature maps) of CNN layers contribute significantly to the total energy consumption of such accelerators. While prior work has proposed off-chip compression, activations are still stored on-chip in uncompressed form, requiring either large on-chip activation buffers or slow and energy-hungry off-chip accesses. In this paper, we propose CompAct, a new architecture that enables on-chip compression of activations for SA based CNN accelerators. CompAct is built around several key ideas. First, CompAct identifies an SA schedule that has nearly regular access patterns, enabling the use of a modified run-length coding (RLC) scheme. Second, CompAct improves the compression ratio of the RLC scheme by using Sparse-RLC in later CNN layers and Lossy-RLC in earlier layers. Finally, CompAct proposes look-ahead snoozing, which operates synergistically with RLC to reduce the leakage energy of activation buffers. Based on detailed synthesis results, we show that CompAct enables up to a 62% reduction in activation buffer energy and a 34% reduction in total chip energy.
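To illustrate why run-length coding suits CNN activations, the sketch below shows a generic zero-run-length encoder of the kind the abstract alludes to. This is a minimal illustrative example, not the paper's exact Sparse-RLC format: the pair encoding, the 255-element run cap, and the function names are assumptions for the sketch. ReLU layers produce many zero activations, so collapsing zero runs into (0, run_length) pairs shrinks the stored stream.

```python
def zero_rlc_encode(values, max_run=255):
    """Encode a flat activation stream as (value, count) pairs.

    Runs of zeros are collapsed into a single (0, run_length) pair,
    capped at max_run so the count fits a fixed-width field;
    nonzero values are emitted as (value, 1) literals.
    """
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
            if run == max_run:          # flush a full-width zero run
                out.append((0, run))
                run = 0
        else:
            if run:                     # flush any pending zero run
                out.append((0, run))
                run = 0
            out.append((v, 1))
    if run:                             # trailing zeros
        out.append((0, run))
    return out


def zero_rlc_decode(pairs):
    """Expand (value, count) pairs back into the original stream."""
    out = []
    for v, c in pairs:
        out.extend([v] * c)
    return out
```

For a sparse post-ReLU stream such as `[3, 0, 0, 0, 5, 0]`, the encoder emits `[(3, 1), (0, 3), (5, 1), (0, 1)]`; the more zeros the layer produces, the fewer pairs are stored, which is why later (sparser) CNN layers benefit most from a sparse variant.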

