
A Case Study in Reverse Engineering GPGPUs: Outstanding Memory Handling Resources

Published: 22 April 2016

Abstract

In recent years, GPU micro-architectures have changed dramatically, evolving into powerful many-core, deeply multithreaded platforms for parallel workloads. While important micro-architectural modifications continue to appear in every new generation of these processors, little is known about the details of these innovative designs. One of the key questions in understanding GPUs is how they deal with outstanding memory misses, and our goal in this study is to answer it. To this end, we develop a set of micro-benchmarks in CUDA to characterize the resources that handle outstanding memory requests. In particular, we study two NVIDIA GPGPUs (Fermi and Kepler) and estimate their capacity for handling outstanding memory requests. We show that Kepler can issue nearly 32X more outstanding memory requests than Fermi, and we attribute this enhancement to Kepler's architectural modifications in its outstanding-memory-request handling resources.
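The micro-benchmarking approach the abstract describes can be sketched as follows. This is not the authors' code; it is a minimal, hedged illustration of the general technique: each thread issues a batch of independent loads to a buffer larger than the caches, and the kernel is swept across increasing thread counts. Per-load latency stays roughly flat while the GPU can keep all misses in flight, then climbs once the outstanding-request handling resources (e.g., MSHR-like structures) saturate. All names, sizes, and strides here are illustrative assumptions.

```cuda
// Hedged sketch of an outstanding-memory-request microbenchmark.
// Not the paper's code; sizes, strides, and names are assumptions.
#include <cstdio>
#include <cuda_runtime.h>

#define ITER   64   // independent loads issued per thread
#define STRIDE 32   // 32 ints = 128 bytes apart, so each load misses on its own line

__global__ void probe(const int *in, int *out, unsigned int *cycles, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int sum = 0;
    unsigned int start = clock();
    for (int i = 0; i < ITER; i++)
        sum += in[(tid + i * STRIDE * blockDim.x) % n];  // independent misses
    unsigned int stop = clock();
    if (threadIdx.x == 0)
        cycles[blockIdx.x] = stop - start;
    out[tid] = sum;  // keep the loads observable so they are not optimized away
}

int main()
{
    const int n = 1 << 24;  // ~64 MB of ints: larger than the L2 cache
    int *in, *out;
    unsigned int *cycles;
    cudaMalloc(&in, n * sizeof(int));
    cudaMalloc(&out, 1024 * sizeof(int));
    cudaMalloc(&cycles, sizeof(unsigned int));
    cudaMemset(in, 0, n * sizeof(int));

    // Sweep concurrency; the thread count at which cycles-per-load jumps
    // hints at the capacity of the outstanding-request handling resources.
    for (int threads = 32; threads <= 1024; threads *= 2) {
        probe<<<1, threads>>>(in, out, cycles, n);
        cudaDeviceSynchronize();
        unsigned int c;
        cudaMemcpy(&c, cycles, sizeof(c), cudaMemcpyDeviceToHost);
        printf("%4d threads: %u cycles (%u per load)\n", threads, c, c / ITER);
    }
    return 0;
}
```

Under these assumptions, comparing where the latency knee appears on Fermi versus Kepler would expose the roughly 32X gap in outstanding-request capacity that the paper reports.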



  • Published in

    ACM SIGARCH Computer Architecture News, Volume 43, Issue 4 (HEART '15)
    September 2015, 98 pages
    ISSN: 0163-5964
    DOI: 10.1145/2927964
    Copyright © 2016 Authors

    Publisher: Association for Computing Machinery, New York, NY, United States
