Scalable Parallel Programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for?

Published: 01 March 2008

Abstract

The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now parallel systems. Furthermore, their parallelism continues to scale with Moore’s law. The challenge is to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism to manycore GPUs with widely varying numbers of cores.

Published In

Queue, Volume 6, Issue 2
GPU Computing
March/April 2008
63 pages
ISSN:1542-7730
EISSN:1542-7749
DOI:10.1145/1365490

Publisher

Association for Computing Machinery

New York, NY, United States
