Abstract
Modern DNN frameworks exploit GPU acceleration by default to achieve high performance. The limitation of GPU memory capacity becomes a serious problem because DNNs are becoming deeper and larger. This paper proposes a purely software-based transparent solution, called tvDNN, to the GPU memory capacity problem. It is based on GPU memory swapping and memory object sectioning techniques. It also provides an efficient memory-object swapping schedule based on ILP (optimal) and heuristics (suboptimal). The experimental results show that tvDNN enables Caffe to build VGG-16 with a large batch size, such as 256 or 512, using a few GB of GPU memory without significant performance degradation.
- M. Abadi, A. Agarwal, P. Barham, and et al. 2016. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. (2016). arXiv:1603.04467Google Scholar
- Yoshua Bengio. 2012. Practical recommendations for gradient-based training of deep architectures. In Neural networks: Tricks of the trade. Springer, 437--478.Google Scholar
Digital Library
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09.Google Scholar
- Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv preprint arXiv:1408.5093 (2014).Google Scholar
- M. Rhu, N. Gimelshein, J. Clemons, A. Zulfiqar, and S. W. Keckler. 2016. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 1--13.Google Scholar
Digital Library
- K. Simonyan and A. Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. (2014). arXiv:1409.1556Google Scholar
Index Terms
Transparent GPU memory management for DNNs
Recommendations
Transparent GPU memory management for DNNs
PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel ProgrammingModern DNN frameworks exploit GPU acceleration by default to achieve high performance. The limitation of GPU memory capacity becomes a serious problem because DNNs are becoming deeper and larger. This paper proposes a purely software-based transparent ...
Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities
PACT '16: Proceedings of the 2016 International Conference on Parallel Architectures and CompilationProcessing data in or near memory (PIM), as opposed to in conventional computational units in a processor, can greatly alleviate the performance and energy penalties of data transfers from/to main memory. Graphics Processing Unit (GPU) architectures and ...
On the Efficacy of a Fused CPU+GPU Processor (or APU) for Parallel Computing
SAAHPC '11: Proceedings of the 2011 Symposium on Application Accelerators in High-Performance ComputingThe graphics processing unit (GPU) has made significant strides as an accelerator in parallel computing. However, because the GPU has resided out on PCIe as a discrete device, the performance of GPU applications can be bottlenecked by data transfers ...







Comments