Abstract
Compile-time optimizations generally improve program performance. Nevertheless, degradations caused by individual compiler optimization techniques are to be expected. Feedback-directed optimization orchestration systems generate optimized code versions under a series of optimization combinations, evaluate their performance, and search for the best version. One challenge to such systems is to tune program performance quickly in an exponential search space. Another challenge is to achieve high program performance, considering that optimizations interact. Aiming at these two goals, this article presents an automated performance tuning system, PEAK, which searches for the best compiler optimization combinations for the important code sections in a program. The major contributions made in this work are as follows: (1) An algorithm called Combined Elimination (CE) is developed to explore the optimization space quickly and effectively; (2) Three fast and accurate rating methods are designed to evaluate the performance of an optimized code section based on a partial execution of the program; (3) An algorithm is developed to identify important code sections as candidates for performance tuning by trading off tuning speed and tuned program performance; and (4) A set of compiler tools are implemented to automate optimization orchestration. Orchestrating optimization options in SUN Forte compilers at the whole-program level, our CE algorithm improves performance by 10.8% over the SPEC CPU2000 FP baseline setting, compared to 5.6% improved by manual tuning. Orchestrating GCC O3 optimizations, CE improves performance by 12% over O3, the highest optimization level. Applying the rating methods, PEAK reduces tuning time from 2.19 hours to 5.85 minutes on average, while achieving equal or better program performance.
- Adl-Tabatabai, A.-R., Cierniak, M., Lueh, G.-Y., Parikh, V. M., and Stichnoth, J. M. 1998. Fast, effective code generation in a just-in-time java compiler. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation. ACM, New York, 280--290. Google Scholar
Digital Library
- Agakov, F., Bonilla, E., Cavazos, J., Franke, B., Fursin, G., O'Boyle, M. F. P., Thomson, J., Toussaint, M., and Williams, C. K. I. 2006. Using machine learning to focus iterative optimization. In CGO '06: Proceedings of the International Symposium on Code Generation and Optimization (Washington, DC). IEEE Computer Society Press, Los Alamitos, CA, 295--305. Google Scholar
Digital Library
- Arnold, M., Fink, S., Grove, D., Hind, M., and Sweeney, P. F. 2000. Adaptive optimization in the Jalapeño JVM. In Proceedings of the 15th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM, New York, 47--65. Google Scholar
Digital Library
- Arnold, M., Hind, M., and Ryder, B. G. 2002. Online feedback-directed optimization of java. In Proceedings of the 17th ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM, New York, 111--129. Google Scholar
Digital Library
- Auslander, J., Philipose, M., Chambers, C., Eggers, S. J., and Bershad, B. N. 1996. Fast, effective dynamic compilation. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 149--159. Google Scholar
Digital Library
- Bala, V., Duesterwald, E., and Banerjia, S. 2000. Dynamo: A transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation. ACM, New York, 1--12. Google Scholar
Digital Library
- Bruening, D., Garnett, T., and Amarasinghe, S. 2003. An infrastructure for adaptive dynamic optimization. In Proceedings of the International Symposium on Computer Generation and Optimization (CGO '03). IEEE Computer Society Press, Los Alamitos, CA. Google Scholar
Digital Library
- Cavazos, J. and O'Boyle, M. F. P. 2006. Method-specific dynamic compilation using logistic regression. In OOPSLA '06: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM, New York, 229--240. Google Scholar
Digital Library
- Chow, K. and Wu, Y. 1999. Feedback-directed selection and characterization of compiler optimizations. In Proceedings of the 2nd Workshop on Feedback Directed Optimizations. Israel.Google Scholar
- Cierniak, M., Lueh, G.-Y., and Stichnoth, J. M. 2000. Practicing JUDO: Java under dynamic optimizations. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation. ACM, New York, 13--26. Google Scholar
Digital Library
- Click, C. and Cooper, K. D. 1995. Combining analyses, combining optimizations. ACM Trans. Prog. Lang. Syst. (TOPLAS) 17, 2, 181--196. Google Scholar
Digital Library
- Consel, C. and Noël, F. 1996. A general approach for run-time specialization and its application to c. In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, New York, 145--156. Google Scholar
Digital Library
- Cooper, K. D., Grosul, A., Harvey, T. J., Reeves, S., Subramanian, D., Torczon, L., and Waterman, T. 2005. Acme: adaptive compilation made efficient. In LCTES '05: Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems. ACM, New York, 69--77. Google Scholar
Digital Library
- Cooper, K. D., Subramanian, D., and Torczon, L. 2002. Adaptive optimizing compilers for the 21st century. J. Supercomput. 23, 1, 7--22. Google Scholar
Digital Library
- Dang, F., Yu, H., and Rauchwerger, L. 2002. The R-LRPD test: Speculative parallelization of partially parallel loops. In Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS '02). IEEE Computer Society Press, Los Alamitos, CA. Google Scholar
Digital Library
- Diniz, P. C. and Rinard, M. C. 1997. Dynamic feedback: An effective technique for adaptive computing. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 71--84. Google Scholar
Digital Library
- Duesterwald, E. and Bala, V. 2000. Software profiling for hot path prediction: Less is more. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York, 202--211. Google Scholar
Digital Library
- Ebcioglu, K. and Altman, E. R. 1997. DAISY: Dynamic compilation for 100. In Proceedings of the Annual International Symposium on Computer Architecture. 26--37. Google Scholar
Digital Library
- Engler, D. R. 1996. VCODE: a retargetable, extensible, very fast dynamic code generation system. SIGPLAN Notes 31, 5, 160--170. Google Scholar
Digital Library
- Engler, D. R. and Proebsting, T. A. 1994. DCG: An efficient, retargetable dynamic code generation system. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York, 263--272. Google Scholar
Digital Library
- GNU. 2005. GCC online documentation. http://gcc.gnu.org/onlinedocs/.Google Scholar
- Graham, S. L., Kessler, P. B., and McKusick, M. K. 1982. GPROF: A call graph execution profiler. In Proceedings of the SIGPLAN Symposium on Compiler Construction. ACM, New York, 120--126. Google Scholar
Digital Library
- Granston, E. D. and Holler, A. 2001. Automatic recommendation of compiler options. In Proceedings of the 4th Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4).Google Scholar
- Grant, B., Philipose, M., Mock, M., Chambers, C., and Eggers, S. J. 1999. An evaluation of staged run-time optimizations in dyc. In Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation. ACM, New York, 293--304. Google Scholar
Digital Library
- Haneda, M., Knijnenburg, P. M. W., and Wijshoff, H. A. G. 2005. Generating new general compiler optimization settings. In ICS '05: Proceedings of the 19th Annual International Conference on Supercomputing. ACM, New York, 161--168. Google Scholar
Digital Library
- Hedayat, A., Sloane, N., and Stufken, J. 1999. Orthogonal Arrays: Theory and Applications. Springer-Verlag, New York.Google Scholar
Cross Ref
- Joshi, R., Nelson, G., and Randall, K. 2002. Denali: A goal-directed superoptimizer. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation. ACM, New York, 304--314. Google Scholar
Digital Library
- Karp, R. 1972. Reducibility among combinatorial problems. In Proceedings of the Symposium on the Complexity of Computer Computations. Plenum Press, New York, 85--103.Google Scholar
Cross Ref
- Kistler, T. and Franz, M. 2003. Continuous program optimization: A case study. ACM Trans. Program. Lang. Syst. 25, 4, 500--548. Google Scholar
Digital Library
- Kisuki, T., Knijnenburg, P. M. W., and O'Boyle, M. F. P. 2000. Combined selection of tile sizes and unroll factors using iterative compilation. In Proceedings of the IEEE PACT. IEEE Computer Society Press, Los Alamitos, CA, 237--248. Google Scholar
Digital Library
- Kisuki, T., Knijnenburg, P. M. W., O'Boyle, M. F. P., Bodin, F., and Wijshoff, H. A. G. 1999. A feasibility study in iterative compilation. In Proceedings of the International Symposium on High Performance Computing (ISHPC'99). 121--132. Google Scholar
Digital Library
- Kulkarni, P., Hines, S., Hiser, J., Whalley, D., Davidson, J., and Jones, D. 2004. Fast searches for effective optimization phase sequences. In PLDI '04: Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation. ACM, New York, 171--182. Google Scholar
Digital Library
- Kulkarni, P. A., Whalley, D. B., Tyson, G. S., and Davidson, J. W. 2006. Exhaustive optimization phase order space exploration. In CGO '06: Proceedings of the International Symposium on Code Generation and Optimization (Washington, DC). IEEE Computer Society Press, Los Alamitos, CA, 306--318. Google Scholar
Digital Library
- Lau, J., Arnold, M., Hind, M., and Calder, B. 2006. Online performance auditing: Using hot optimizations without getting burned. In Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, New York. Google Scholar
Digital Library
- Lee, P. and Leone, M. 1996. Optimizing ML with run-time code generation. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 137--148. Google Scholar
Digital Library
- Leone, M. and Lee, P. 1998. Dynamic specialization in the fabius system. ACM Comput. Surv. 30, 3es, 23. Google Scholar
Digital Library
- Merten, M. C., Trick, A. R., Barnes, R. D., Nystrom, E. M., George, C. N., Gyllenhaal, J. C., and Mei W. Hwu, W. 2001. An architectural framework for runtime optimization. IEEE Trans. Comput. 50, 6, 567--589. Google Scholar
Digital Library
- Merten, M. C., Trick, A. R., George, C. N., Gyllenhaal, J. C., and Hwu, W. W. 1999. A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization. In Proceedings of the 26th Annual International Symposium on Computer Architecture. IEEE Computer Society Press, Los Alamitos, CA, 136--147. Google Scholar
Digital Library
- Mock, M., Chambers, C., and Eggers, S. J. 2000. CALPA: A tool for automating selective dynamic compilation. In Proceedings of the International Symposium on Microarchitecture. 291--302. Google Scholar
Digital Library
- Nandy, S., Gao, X., and Ferrante, J. 2003. TFP: Time-sensitive, flow-specific profiling at runtime. In Proceedings of the Workshop on Languages and Compiling for Parallel Computing (LCPC).Google Scholar
- Nystrom, E. M., Barnes, R. D., Merten, M. C., and Mei W. Hwu, W. 2001. Code reordering and speculation support for dynamic optimization systems. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. Google Scholar
Digital Library
- Pan, Z. and Eigenmann, R. 2004a. Compiler optimization orchestration for peak performance. Tech. Rep. TR-ECE-04-01, School of Electrical and Computer Engineering, Purdue University.Google Scholar
- Pan, Z. and Eigenmann, R. 2004b. Rating compiler optimizations for automatic performance tuning. In SC2004: Proceedings of the High Performance Computing, Networking and Storage Conference. (10 pages). Google Scholar
Digital Library
- Pan, Z. and Eigenmann, R. 2006. Fast and effective orchestration of compiler optimizations for automatic performance tuning. In Proceedings of the 4th Annual International Symposium on Code Generation and Optimization (CGO). (12 pages). Google Scholar
Digital Library
- Pinkers, R. P. J., Knijnenburg, P. M. W., Haneda, M., and Wijshoff, H. A. G. 2004. Statistical selection of compiler options. In Proceedings of the IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'04) (Volendam, The Netherlands). IEEE Computer Society Press, Los Alamitos, CA, 494--501. Google Scholar
Digital Library
- Rauchwerger, L. and Padua, D. A. 1999. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Para. Distrib. Syst. 10, 2 (Feb.), 160--180. Google Scholar
Digital Library
- Rus, S., Rauchwerger, L., and Hoeflinger, J. 2002. Hybrid analysis: Static & dynamic memory reference analysis. In Proceedings of the 16th International Conference on Supercomputing. ACM, New York, 274--284. Google Scholar
Digital Library
- SPEC. 2000. SPEC CPU2000 Results. http://www.spec.org/cpu2000/results.Google Scholar
- Stephenson, M., Amarasinghe, S., Martin, M., and O'Reilly, U.-M. 2003. Meta optimization: improving compiler heuristics with machine learning. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation. ACM, New York, 77--90. Google Scholar
Digital Library
- Strout, M. M., Carter, L., and Ferrante, J. 2003. Compile-time composition of run-time data and iteration reorderings. In Proceedings of the 2003 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, New York. Google Scholar
Digital Library
- Sun. 2000. Forte C 6 /Sun WorkShop 6 Compilers C User's Guide. http://docs.sun.com/app/docs/doc/806-3567.Google Scholar
- Triantafyllis, S., Vachharajani, M., Vachharajani, N., and August, D. I. 2003. Compiler optimization-space exploration. In Proceedings of the International Symposium on Code Generation and Optimization. 204--215. Google Scholar
Digital Library
- Voss, M. and Eigenmann, R. 2000. ADAPT: Automated de-coupled adaptive program transformation. In Proceedings of the International Conference on Parallel Processing. IEEE Computer Society Press, Los Alamitos, CA, 163--170. Google Scholar
Digital Library
- Voss, M. J. and Eigemann, R. 2001. High-level adaptive program optimization with ADAPT. In Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming. ACM, New York, 93--102. Google Scholar
Digital Library
- Whaley, R. C. and Dongarra, J. 1998. Automatically tuned linear algebra software. In Proceedings of the SuperComputing 1998: High Performance Networking and Computing. Google Scholar
Digital Library
- Wolf, M. E., Maydan, D. E., and Chen, D.-K. 1996. Combining loop transformations considering caches and scheduling. In Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture. ACM, New York, 274--286. Google Scholar
Digital Library
Index Terms
PEAK—a fast and effective performance tuning system via compiler optimization orchestration
Recommendations
Fast, automatic, procedure-level performance tuning
PACT '06: Proceedings of the 15th international conference on Parallel architectures and compilation techniquesThis paper presents an automated performance tuning solution, which partitions a program into a number of tuning sections and finds the best combination of compiler options for each section. Our solution builds on prior work on feedback-driven ...
Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance Tuning
CGO '06: Proceedings of the International Symposium on Code Generation and OptimizationAlthough compile-time optimizations generally improve program performance, degradations caused by individual techniques are to be expected. One promising research direction to overcome this problem is the development of dynamic, feedback-directed ...
An Experimental Comparison Between Genetic Algorithm and Particle Swarm Optimization in Spark Performance Tuning
ETCD'17: Proceedings of the first Workshop on Emerging Technologies for software-defined and reconfigurable hardware-accelerated Cloud DatacentersThe most popular in-memory computing framework --- Spark --- has a number of performance-critical configuration parameters. Manually tuning these parameters for optimized performance is not practical because the parameter tuning space is huge. Searching ...






Comments