research-article

Auto-tuning of fast Fourier transform on graphics processors

Published: 12 February 2011

Abstract

We present an auto-tuning framework for FFTs on graphics processors (GPUs). Because of the complex design of the memory and compute subsystems on GPUs, the performance of FFT kernels can vary widely over the range of possible input parameters. For each component of the FFT kernel, we generate several variants that are likely to perform well in different cases. Our auto-tuner composes these variants to generate kernels and selects the best ones. We present heuristics that prune the search space so that only a small fraction of all possible kernels is profiled. We then compose the optimized kernels to improve the performance of larger FFT computations. We implement the system using the NVIDIA CUDA API and compare its performance to state-of-the-art FFT libraries. On a range of NVIDIA GPUs and input sizes, our auto-tuned FFTs outperform the NVIDIA CUFFT 3.0 library by up to 38x and deliver up to 3x higher performance than a manually tuned FFT.
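The generate, compose, prune, profile, and select loop described in the abstract can be illustrated with a small CPU-only sketch. This is not the paper's system: it does not use CUDA, and every name in it (`LEAF_SIZES`, `TWIDDLE_MODES`, `make_fft`, `prune`, `autotune`, and the `max_leaf` cutoff) is a hypothetical choice made for illustration. The two "components" here are the size at which a radix-2 FFT falls back to a direct DFT and the way twiddle factors are obtained. Input lengths are assumed to be powers of two.

```python
import cmath
import itertools
import time

# Hypothetical variant lists, one per kernel component (not from the paper).
LEAF_SIZES = [1, 2, 4, 8, 16, 32, 64]   # switch to the direct DFT at this size
TWIDDLE_MODES = ["compute", "table"]    # recompute twiddles or cache them


def naive_dft(x):
    """Direct O(n^2) DFT, used as the leaf of the recursion."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def make_fft(leaf_size, twiddle_mode):
    """Compose one variant per component into a single FFT kernel."""
    table = {}

    def twiddles(n):
        if twiddle_mode == "table" and n in table:
            return table[n]
        w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
        if twiddle_mode == "table":
            table[n] = w
        return w

    def fft(x):
        n = len(x)  # assumed to be a power of two
        if n <= leaf_size:
            return naive_dft(x)
        even, odd = fft(x[0::2]), fft(x[1::2])
        t = [w * o for w, o in zip(twiddles(n), odd)]
        return ([e + v for e, v in zip(even, t)] +
                [e - v for e, v in zip(even, t)])

    return fft


def candidates():
    """Full search space: every combination of component variants."""
    return list(itertools.product(LEAF_SIZES, TWIDDLE_MODES))


def prune(configs, n, max_leaf=16):
    """Illustrative heuristic: skip leaves larger than the input or than max_leaf."""
    return [(leaf, mode) for leaf, mode in configs if leaf <= min(n, max_leaf)]


def autotune(n, repeats=3):
    """Profile only the pruned candidates and return the fastest configuration."""
    x = [complex(i % 7, 0) for i in range(n)]
    timings = {}
    for config in prune(candidates(), n):
        kernel = make_fft(*config)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(x)
            best = min(best, time.perf_counter() - start)
        timings[config] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best, timings = autotune(256)
    print("profiled", len(timings), "of", len(candidates()), "variants; best:", best)
```

The selected configuration depends on the machine and the input size, which is why the sketch measures each surviving candidate instead of predicting a winner. A GPU auto-tuner would presumably replace the wall-clock timing with device-side profiling and use hardware-aware pruning rules.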


    • Published in

      ACM SIGPLAN Notices, Volume 46, Issue 8
      PPoPP '11
      August 2011
      300 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2038037

      • PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming
        February 2011
        326 pages
        ISBN: 9781450301190
        DOI: 10.1145/1941553
        • General Chair: Calin Cascaval
        • Program Chair: Pen-Chung Yew

      Copyright © 2011 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

