poster

Accelerating GPGPU architecture simulation

Published: 17 June 2013

Abstract

Recently, graphics processing units (GPUs) have opened up new opportunities for speeding up general-purpose parallel applications due to their massive computational power and up to hundreds of thousands of threads enabled by programming models such as CUDA. However, due to the serial nature of existing micro-architecture simulators, these massively parallel architectures and workloads need to be simulated sequentially. As a result, simulating GPGPU architectures with typical benchmarks and input data sets is extremely time-consuming. This paper addresses the GPGPU architecture simulation challenge by generating miniature, yet representative GPGPU kernels. We first summarize the static characteristics of an existing GPGPU kernel in a profile, and analyze its dynamic behavior using the novel concept of the divergence flow statistics graph (DFSG). We subsequently use a GPGPU kernel synthesizing framework to generate a miniature proxy of the original kernel, which can reduce simulation time significantly. The key idea is to reduce the number of simulated instructions by decreasing per-thread iteration counts of loops. Our experimental results show that our approach can accelerate GPGPU architecture simulation by a factor of 88X on average and up to 589X with an average IPC relative error of 5.6%.
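The key idea in the abstract, cutting simulated instructions by lowering per-thread loop iteration counts, can be illustrated with a small back-of-the-envelope sketch. This is not the authors' synthesis framework: the `LoopProfile` type, the `shrink` helper, the uniform shrink factor, and all numbers below are hypothetical, and the IPC relative error is assumed to be the usual |IPC_proxy - IPC_original| / IPC_original.

```python
from dataclasses import dataclass


@dataclass
class LoopProfile:
    """Hypothetical per-loop summary of a kernel (not the paper's profile format)."""
    body_instructions: int  # instructions executed per iteration
    iterations: int         # per-thread iteration count


def simulated_instructions(loops, straight_line, threads):
    """Total instructions a simulator would execute for this toy kernel model."""
    per_thread = straight_line + sum(
        loop.body_instructions * loop.iterations for loop in loops
    )
    return per_thread * threads


def shrink(loops, factor):
    """Divide every per-thread iteration count by `factor`, keeping at least one."""
    return [
        LoopProfile(loop.body_instructions, max(1, loop.iterations // factor))
        for loop in loops
    ]


def ipc_relative_error(ipc_original, ipc_proxy):
    """Assumed definition: absolute IPC difference relative to the original IPC."""
    return abs(ipc_proxy - ipc_original) / ipc_original


# Toy kernel: two loops plus 50 straight-line instructions, run by 4096 threads.
original = [LoopProfile(20, 1000), LoopProfile(8, 500)]
proxy = shrink(original, factor=100)

full = simulated_instructions(original, straight_line=50, threads=4096)
mini = simulated_instructions(proxy, straight_line=50, threads=4096)
reduction = full / mini  # reduction in simulated instructions, not measured time

print(f"instruction reduction: {reduction:.1f}x")
print(f"IPC relative error: {ipc_relative_error(1.0, 0.944):.1%}")
```

The instruction-count ratio is only a rough stand-in for simulation speedup. The straight-line code outside the loops is not shrunk, so the achievable reduction is bounded even for large shrink factors.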



    • Published in

      ACM SIGMETRICS Performance Evaluation Review, Volume 41, Issue 1
      June 2013
      385 pages
      ISSN: 0163-5999
      DOI: 10.1145/2494232

      • SIGMETRICS '13: Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems
        June 2013
        406 pages
        ISBN: 9781450319003
        DOI: 10.1145/2465529

      Copyright © 2013 Copyright is held by the owner/author(s)

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 17 June 2013

      Qualifiers

      • poster
