Abstract
Neural-network training on conventional computing platforms is heavily constrained by data volume and performance. While non-volatile memory (NVM) offers a potential solution to the data-volume problem, its performance issues must first be addressed, especially the asymmetry between read and write performance. Beyond that, critical concerns about endurance must also be resolved before NVM can be deployed in practice for neural networks. This work addresses the performance and endurance concerns together by proposing a data-aware programming scheme. We propose to consider neural-network training jointly from the data-flow and data-content points of view. In particular, we present methodologies that exploit approximate results through Dual-SET operations. Encouraging results were observed in a series of experiments, which show substantial efficiency and lifetime improvements without sacrificing accuracy.
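To make the idea concrete, the sketch below illustrates the general flavor of data-aware approximate programming for NVM, not the paper's exact Dual-SET scheme: a data-comparison write programs only the cells that actually change, and an approximate mode additionally leaves low-order weight bits untouched, trading a small, bounded value error for fewer cell programs (and hence less wear and write latency). The function names, the 16-bit word width, and the `ignore_low_bits` parameter are illustrative assumptions.

```python
# Hedged illustration of data-aware approximate NVM programming.
# Not the paper's Dual-SET algorithm; a simplified sketch combining
# a data-comparison write with low-order-bit masking.

def bit_flips(old: int, new: int) -> int:
    """Number of cell programs a data-comparison write needs
    to turn `old` into `new` (only differing bits are written)."""
    return bin(old ^ new).count("1")

def approx_write(old: int, new: int, ignore_low_bits: int = 4):
    """Approximately write `new` over `old` in a 16-bit word,
    leaving the lowest `ignore_low_bits` bits unchanged -- tolerable
    for error-resilient data such as neural-network weights.
    Returns (stored_value, flips_performed)."""
    word = 0xFFFF                                   # 16-bit word (assumption)
    mask = ~((1 << ignore_low_bits) - 1) & word     # bits we actually update
    stored = (new & mask) | (old & ~mask & word)    # keep stale low bits
    return stored, bit_flips(old, stored)

old = 0b1010_1100_0101_0110
new = 0b1010_0011_0101_1001
exact_cost = bit_flips(old, new)        # full-precision update cost: 8 flips
stored, approx_cost = approx_write(old, new)
assert approx_cost <= exact_cost        # approximate write never costs more
```

In this toy example the exact update flips 8 cells while the approximate one flips only 4; the stale low-order bits perturb the stored value by at most `2**ignore_low_bits - 1`, which training can typically absorb.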
Achieving Lossless Accuracy with Lossy Programming for Efficient Neural-Network Training on NVM-Based Systems