Abstract
Embedded processor performance depends on both the underlying architecture and the compiler optimizations applied. However, designing both simultaneously is extremely difficult under the time constraints designers face. Current methodology therefore involves designing the compiler and the architecture in isolation, leading to suboptimal performance of the final product.
This article develops a novel approach to this codesign space problem. For our specific design space, we demonstrate that we can automatically predict the performance that an optimizing compiler would achieve without actually tuning it for any of the microarchitecture configurations considered. Once the model is trained, a single run of the program compiled with the standard optimization setting is enough to make a prediction on a new microarchitecture with an average error of just 3.2%. This allows the designer to accurately choose an architectural configuration with knowledge of how an optimizing compiler will perform on it. We use this to find the best optimizing compiler/architectural configuration in our codesign space and demonstrate that it achieves an average 19% performance improvement and energy savings of 16% compared to the baseline, nearly doubling energy efficiency as measured by the energy-delay-squared product (EDD).
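As a rough illustration of the EDD metric named above, the sketch below computes the energy-delay-squared product (energy multiplied by delay squared, lower is better) and the factor by which it improves relative to a baseline normalized to 1.0. The function names `edd` and `edd_gain` are hypothetical, and the abstract does not say whether "19% performance improvement" means a 1.19x speedup or a 19% shorter execution time, so both readings are shown as assumptions. The abstract's figures are averages, so plugging them into the formula gives only an indication and need not reproduce the reported EDD result.

```python
def edd(energy: float, delay: float) -> float:
    """Energy-delay-squared product: energy * delay**2 (lower is better)."""
    return energy * delay ** 2


def edd_gain(energy_ratio: float, delay_ratio: float) -> float:
    """Factor by which EDD improves over a baseline normalized to 1.0.

    Both arguments are relative to the baseline (1.0 means unchanged).
    """
    return edd(1.0, 1.0) / edd(energy_ratio, delay_ratio)


# 16% energy savings relative to the baseline (figure from the abstract).
energy_ratio = 1.0 - 0.16

# Assumption A: "19% performance improvement" read as a 1.19x speedup,
# so execution time shrinks to 1/1.19 of the baseline.
gain_if_speedup = edd_gain(energy_ratio, 1.0 / 1.19)

# Assumption B: "19% performance improvement" read as 19% less execution time.
gain_if_time_cut = edd_gain(energy_ratio, 1.0 - 0.19)

print(f"EDD gain, speedup reading:   {gain_if_speedup:.2f}x")
print(f"EDD gain, time-cut reading:  {gain_if_time_cut:.2f}x")
```

Under these assumptions the two readings give gains of roughly 1.7x and 1.8x. Because delay enters squared, a performance gain moves EDD more than an equal-sized energy saving does.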
Index Terms
Exploring and Predicting the Effects of Microarchitectural Parameters and Compiler Optimizations on Performance and Energy