Abstract
This paper introduces a predictive modeling framework for estimating GPU performance during pre-silicon design. Early-stage performance prediction is valuable when long simulation times impede development by rendering driver performance validation, API conformance testing, and design-space exploration infeasible. Our approach builds a Random Forest regression model that analyzes the behavior of DirectX 3D workloads executed by a software rasterizer, which we have extended with a workload characterizer that collects additional performance information via program counters. Beyond the regression models themselves, this work produces detailed feature rankings that offer architectural insight, along with accurate performance estimates for an Intel integrated Skylake-generation GPU. Our models achieve a reasonable out-of-sample error rate of 14%, with an average simulation speedup of 327x.
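The modeling approach described above can be sketched in a few lines. The following is an illustrative example, not the paper's actual pipeline: it fits a Random Forest regressor on per-workload feature vectors (standing in for counters gathered by an instrumented rasterizer) to predict a performance target, then ranks features by importance. The feature names and the synthetic data are hypothetical.

```python
# Sketch: Random Forest regression for performance prediction with
# feature ranking. Feature names and data are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
feature_names = ["pixels_shaded", "vertices", "texture_fetches", "draw_calls"]

# Synthetic stand-in for characterizer output: one row per workload.
X = rng.uniform(0.0, 1.0, size=(200, len(feature_names)))
# Synthetic target: "frame time" dominated by shading and texturing.
y = 5.0 * X[:, 0] + 2.0 * X[:, 2] + rng.normal(0.0, 0.1, size=200)

# oob_score=True gives an out-of-bag R^2, an internal estimate of
# out-of-sample accuracy that needs no separate held-out set.
model = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
model.fit(X, y)
print(f"OOB R^2: {model.oob_score_:.3f}")

# Rank features by importance, analogous to the paper's feature rankings.
ranking = sorted(zip(feature_names, model.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

On this synthetic data the dominant term (`pixels_shaded`) surfaces at the top of the ranking; in the paper's setting the analogous output is what yields architectural insight about which workload characteristics drive GPU performance.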
GPU Performance Estimation using Software Rasterization and Machine Learning