Research Article

Achieving Lossless Accuracy with Lossy Programming for Efficient Neural-Network Training on NVM-Based Systems

Published: 08 October 2019

Abstract

Neural-network training on conventional computing platforms is heavily constrained by data volume and performance. While non-volatile memory offers a potential solution to the data-volume problem, its performance challenges must still be addressed, especially the asymmetry between read and write performance. Beyond that, critical endurance concerns must also be resolved before non-volatile memory can be practically deployed for neural networks. This work addresses the performance and endurance concerns together by proposing a data-aware programming scheme. We propose to consider neural-network training jointly from the data-flow and data-content points of view. In particular, we present methodologies that exploit the approximate results of Dual-SET operations. A series of experiments shows encouraging results: substantial efficiency and lifetime improvements are achieved without sacrificing accuracy.
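
The abstract does not spell out the Dual-SET mechanics, but the underlying intuition, that training tolerates bounded write error and so weights can be programmed with fast, imprecise pulses, can be illustrated with a minimal Python sketch. Everything below (the lossy_write model, the rel_err bound, the synthetic task) is an assumption for illustration, not the paper's actual scheme.

    import numpy as np

    rng = np.random.default_rng(0)

    def lossy_write(target, rel_err=0.02):
        # Hypothetical model of a fast, imprecise SET-style NVM write:
        # the stored value lands within a bounded relative-error band
        # around the target. rel_err is a placeholder bound, not a
        # figure from the paper.
        noise = rng.uniform(-rel_err, rel_err, size=np.shape(target))
        return target * (1.0 + noise)

    # Tiny logistic-regression training on synthetic data; every
    # weight update is routed through the lossy write model above.
    X = rng.normal(size=(512, 16))
    true_w = rng.normal(size=16)
    y = (X @ true_w > 0).astype(float)

    w = np.zeros(16)
    lr = 0.1
    for _ in range(200):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid activations
        grad = X.T @ (p - y) / len(y)        # logistic-loss gradient
        w = lossy_write(w - lr * grad)       # imprecise NVM weight write

    acc = np.mean(((X @ w) > 0) == (y > 0.5))
    print(f"training accuracy with lossy weight writes: {acc:.3f}")

On this toy task the final accuracy typically matches an exact-write baseline, mirroring the abstract's claim that lossy programming need not cost result accuracy.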

