poster

Data layout optimization for GPGPU architectures

Published: 23 February 2013

Abstract

GPUs are widely used to accelerate general-purpose applications, giving rise to GPGPU architectures. New programming models such as the Compute Unified Device Architecture (CUDA) have been proposed to facilitate programming general-purpose computations on GPGPUs. However, writing high-performance CUDA code by hand remains tedious and difficult. In particular, the organization of data in the memory space can greatly affect performance, owing to the unique features of the GPGPU memory hierarchy. In this work, we propose an automatic data layout transformation framework that addresses the key issues associated with the GPGPU memory hierarchy (i.e., channel skewing, data coalescing, and bank conflicts). Our approach employs a widely applicable strategy based on a novel concept called data localization. Specifically, we optimize the layout of arrays accessed in affine loop nests, for both device memory and shared memory, at both coarse-grain and fine-grain parallelization levels. We evaluated our data layout optimization strategy using 15 benchmarks on an NVIDIA CUDA GPU device. The results show that the proposed data transformation approach yields an average speedup of around 4.3X.



Published in:

ACM SIGPLAN Notices, Volume 48, Issue 8 (PPoPP '13), August 2013, 309 pages.
ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/2517327

PPoPP '13: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2013, 332 pages.
ISBN: 9781450319225; DOI: 10.1145/2442516

      Copyright © 2013 Authors

Publisher: Association for Computing Machinery, New York, NY, United States

