Research Article

Coarse grain parallelization of deep neural networks

Published: 27 February 2016

Abstract

Deep neural networks (DNNs) have recently achieved extraordinary results in domains such as computer vision and speech recognition. An essential element of this success has been the introduction of high performance computing (HPC) techniques in the critical step of training the neural network. This paper describes the implementation and analysis of a network-agnostic and convergence-invariant coarse-grain parallelization of the DNN training algorithm. The coarse-grain parallelization is achieved by exploiting batch-level parallelism. This strategy does not depend on specialized, optimized libraries, so the optimization is immediately available for accelerating DNN training. The proposal is compatible with multi-GPU execution and does not alter the algorithm's convergence rate. The parallelization has been implemented in Caffe, a state-of-the-art DNN framework. The paper describes the code transformations required for the parallelization and identifies the limiting performance factors of the approach. We show competitive performance results for two state-of-the-art computer vision datasets, MNIST and CIFAR-10. In particular, on a 16-core Xeon E5-2667v2 at 3.30 GHz we observe speedups of 8× over the sequential execution, at performance levels similar to those obtained by the GPU-optimized Caffe version on an NVIDIA K40 GPU.
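The batch-level parallelism the abstract describes can be sketched in a few lines: the mini-batch is split into shards, each worker computes a partial gradient on its shard, and the partial gradients are reduced by summation. Because the reduced gradient is numerically identical to the sequential full-batch gradient, convergence is unaffected. The sketch below is a minimal illustration of this idea on a linear model with NumPy and Python threads; the function names (`batch_gradient`, `parallel_gradient`) are hypothetical and not taken from the Caffe implementation discussed in the paper.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def batch_gradient(w, X, y):
    # Gradient of the squared-error loss of a linear model over one shard.
    return X.T @ (X @ w - y)

def parallel_gradient(w, X, y, n_workers=4):
    # Split the mini-batch into shards, compute partial gradients
    # concurrently, then reduce by summing. The result equals the
    # sequential full-batch gradient, so convergence is unchanged.
    X_shards = np.array_split(X, n_workers)
    y_shards = np.array_split(y, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(batch_gradient,
                            [w] * n_workers, X_shards, y_shards)
    return sum(partials) / len(X)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))
y = rng.standard_normal(64)
w = np.zeros(8)

g_seq = batch_gradient(w, X, y) / len(X)   # sequential reference
g_par = parallel_gradient(w, X, y)         # coarse-grain parallel
assert np.allclose(g_seq, g_par)
```

In a real framework the reduction happens once per layer per iteration, and the shards run on separate cores (or GPUs) rather than Python threads; the key property, that the reduced gradient matches the sequential one, is the same.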

