Abstract
Recently, graphics processing units (GPUs) have opened up new opportunities for speeding up general-purpose parallel applications due to their massive computational power and up to hundreds of thousands of threads enabled by programming models such as CUDA. However, due to the serial nature of existing micro-architecture simulators, these massively parallel architectures and workloads need to be simulated sequentially. As a result, simulating GPGPU architectures with typical benchmarks and input data sets is extremely time-consuming. This paper addresses the GPGPU architecture simulation challenge by generating miniature, yet representative GPGPU kernels. We first summarize the static characteristics of an existing GPGPU kernel in a profile, and analyze its dynamic behavior using the novel concept of the divergence flow statistics graph (DFSG). We subsequently use a GPGPU kernel synthesizing framework to generate a miniature proxy of the original kernel, which can reduce simulation time significantly. The key idea is to reduce the number of simulated instructions by decreasing per-thread iteration counts of loops. Our experimental results show that our approach can accelerate GPGPU architecture simulation by a factor of 88X on average and up to 589X with an average IPC relative error of 5.6%.
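The key idea stated above can be illustrated with a toy cost model (not from the paper; all names and numbers below are hypothetical): the dynamic instruction count a simulator must process is roughly the sum over threads of loop trip count times instructions per iteration, so scaling every thread's trip count down by a common factor shrinks simulation work proportionally while preserving each thread's relative share of work.

```python
def simulated_instructions(trip_counts, instrs_per_iteration, overhead_per_thread=0):
    """Total dynamic instructions across all threads of a kernel."""
    return sum(overhead_per_thread + n * instrs_per_iteration for n in trip_counts)

# Hypothetical original kernel: 10,000 threads, each iterating a loop
# 1,000 times, with 20 instructions per loop iteration.
original = [1000] * 10_000

# Miniature proxy: same thread count, loop trip counts scaled down 100x,
# keeping each thread's relative workload (and hence its control flow) intact.
proxy = [max(1, n // 100) for n in original]

full = simulated_instructions(original, instrs_per_iteration=20)
mini = simulated_instructions(proxy, instrs_per_iteration=20)

print(f"reduction in simulated instructions: {full / mini:.0f}x")  # prints 100x
```

In the real framework the reduction factor cannot be applied blindly: the proxy must also reproduce the original kernel's divergence behavior (captured by the DFSG), which is why the speedup observed in the paper (88X on average) is workload-dependent rather than a fixed scaling constant.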
Published in SIGMETRICS '13: Proceedings of the ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems.