Abstract
Basic microarchitectural features of NVIDIA GPUs have been stable for a decade, and many analytical models have been proposed to predict their performance. We present a way to review, systematize, and evaluate these approaches using a microbenchmark. In doing so, we produce a brief algebraic summary of the key elements of selected performance models, identify patterns in their design, and highlight their previously unknown limitations. We also identify a potentially superior method for estimating performance that is based on classical work.
References
- Denning, P. J., and Buzen, J. P. 1978. The Operational Analysis of Queuing Network Models. ACM Computing Surveys 10, 3, 225–261.
- Hong, S., and Kim, H. 2009. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness. In International Symposium on Computer Architecture (ISCA '09), 152–163.
- Chen, X. E., and Aamodt, T. M. 2009. A first-order fine-grained multithreaded throughput model. In International Symposium on High Performance Computer Architecture (HPCA '09), 329–340.
- Huang, J.-C., Lee, J. H., Kim, H., and Lee, H.-H. S. 2014. GPUMech: GPU performance modeling technique based on interval analysis. In International Symposium on Microarchitecture (MICRO-47), 268–279.
- Sim, J., Dasgupta, A., Kim, H., and Vuduc, R. 2012. A performance analysis framework for identifying potential benefits in GPGPU applications. In Symposium on Principles and Practice of Parallel Programming (PPoPP '12), 11–22.
- Zhang, Y., and Owens, J. D. 2011. A quantitative performance analysis model for GPU architectures. In International Symposium on High Performance Computer Architecture (HPCA '11), 382–393.
- Baghsorkhi, S. S., Delahaye, M., Patel, S. J., Gropp, W. D., and Hwu, W. W. 2010. An adaptive performance modeling tool for GPU architectures. In Symposium on Principles and Practice of Parallel Programming (PPoPP '10), 105–114.
- NVIDIA. 2017. CUDA C Programming Guide v9.1. November 2017.
- Saavedra-Barrera, R., Culler, D., and von Eicken, T. 1990. Analysis of multithreaded architectures for parallel computing. In Symposium on Parallel Algorithms and Architectures (SPAA '90), 169–178.
Publication
A microbenchmark to study GPU performance models. In PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.