Abstract
Resistive crossbars have shown strong potential as building blocks of future neural fabrics because they natively execute vector-matrix multiplication, the dominant computational kernel in DNNs. However, non-idealities in the synaptic devices, interconnects, and peripheral circuits of resistive crossbars introduce errors into the computations they perform. When large-scale DNNs are executed on resistive crossbar systems, these errors compound and cause unacceptable degradation in application-level accuracy. We propose CxDNN, a hardware-software methodology that enables the realization of large-scale DNNs on crossbar systems by compensating for errors due to non-idealities, greatly mitigating the degradation in accuracy. CxDNN comprises (i) an optimized mapping technique to convert floating-point weights and activations into crossbar conductances and input voltages, (ii) a fast one-time re-training method to recover the accuracy lost in this conversion, and (iii) low-overhead compensation hardware to mitigate dynamic and hardware-instance-specific errors. Unlike previous efforts, which are limited to small networks and require training and deploying hardware-instance-specific models, CxDNN presents a scalable compensation methodology that can address large DNNs (e.g., ResNet-50 on ImageNet) while maintaining the train-once-deploy-anywhere tenet of current DNN applications. We evaluated CxDNN on six top DNNs on the ImageNet dataset, with 0.5--13.8 million neurons and 0.5--15.5 billion connections. CxDNN achieves a 16.9%--49% improvement in top-1 classification accuracy, effectively mitigating a key challenge to the use of resistive crossbar--based neural fabrics.
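The first component of the methodology, converting floating-point weights to crossbar conductances, can be illustrated with a minimal sketch. This is not the paper's actual mapping algorithm; it is a generic linear mapping under common crossbar conventions: weights are normalized, signed values are split across a differential device pair (`g_pos`, `g_neg`), and conductances are quantized to a finite number of programmable states. The conductance range and level count are illustrative assumptions.

```python
import numpy as np

def weights_to_conductances(w, g_min=1e-6, g_max=1e-4, levels=256):
    """Illustrative linear mapping of floating-point weights onto a
    bounded crossbar conductance range (values in siemens are assumed,
    not taken from the paper).  Signed weights are represented by a
    differential pair of devices, a common crossbar convention."""
    scale = np.max(np.abs(w)) or 1.0       # normalize to [-1, 1]
    w_norm = w / scale
    # Positive and negative weight parts map to separate devices.
    g_pos = g_min + np.maximum(w_norm, 0) * (g_max - g_min)
    g_neg = g_min - np.minimum(w_norm, 0) * (g_max - g_min)
    # Quantize to the finite set of programmable conductance states.
    step = (g_max - g_min) / (levels - 1)
    g_pos = g_min + np.round((g_pos - g_min) / step) * step
    g_neg = g_min + np.round((g_neg - g_min) / step) * step
    return g_pos, g_neg, scale
```

An effective weight is then recovered as `scale * (g_pos - g_neg) / (g_max - g_min)`; the residual error of this reconstruction is precisely the conversion loss that the one-time re-training step is designed to absorb.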
CxDNN: Hardware-software Compensation Methods for Deep Neural Networks on Resistive Crossbar Systems