skip to main content
research-article
Free Access

PEAK—a fast and effective performance tuning system via compiler optimization orchestration

Published:21 May 2008Publication History
Skip Abstract Section

Abstract

Compile-time optimizations generally improve program performance. Nevertheless, degradations caused by individual compiler optimization techniques are to be expected. Feedback-directed optimization orchestration systems generate optimized code versions under a series of optimization combinations, evaluate their performance, and search for the best version. One challenge to such systems is to tune program performance quickly in an exponential search space. Another challenge is to achieve high program performance, considering that optimizations interact. Aiming at these two goals, this article presents an automated performance tuning system, PEAK, which searches for the best compiler optimization combinations for the important code sections in a program. The major contributions made in this work are as follows: (1) An algorithm called Combined Elimination (CE) is developed to explore the optimization space quickly and effectively; (2) Three fast and accurate rating methods are designed to evaluate the performance of an optimized code section based on a partial execution of the program; (3) An algorithm is developed to identify important code sections as candidates for performance tuning by trading off tuning speed and tuned program performance; and (4) A set of compiler tools are implemented to automate optimization orchestration. Orchestrating optimization options in SUN Forte compilers at the whole-program level, our CE algorithm improves performance by 10.8% over the SPEC CPU2000 FP baseline setting, compared to 5.6% improved by manual tuning. Orchestrating GCC O3 optimizations, CE improves performance by 12% over O3, the highest optimization level. Applying the rating methods, PEAK reduces tuning time from 2.19 hours to 5.85 minutes on average, while achieving equal or better program performance.

References

  1. Adl-Tabatabai, A.-R., Cierniak, M., Lueh, G.-Y., Parikh, V. M., and Stichnoth, J. M. 1998. Fast, effective code generation in a just-in-time java compiler. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation. ACM, New York, 280--290. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Agakov, F., Bonilla, E., Cavazos, J., Franke, B., Fursin, G., O'Boyle, M. F. P., Thomson, J., Toussaint, M., and Williams, C. K. I. 2006. Using machine learning to focus iterative optimization. In CGO '06: Proceedings of the International Symposium on Code Generation and Optimization (Washington, DC). IEEE Computer Society Press, Los Alamitos, CA, 295--305. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Arnold, M., Fink, S., Grove, D., Hind, M., and Sweeney, P. F. 2000. Adaptive optimization in the Jalapeño JVM. In Proceedings of the 15th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM, New York, 47--65. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Arnold, M., Hind, M., and Ryder, B. G. 2002. Online feedback-directed optimization of java. In Proceedings of the 17th ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM, New York, 111--129. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Auslander, J., Philipose, M., Chambers, C., Eggers, S. J., and Bershad, B. N. 1996. Fast, effective dynamic compilation. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 149--159. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Bala, V., Duesterwald, E., and Banerjia, S. 2000. Dynamo: A transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation. ACM, New York, 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Bruening, D., Garnett, T., and Amarasinghe, S. 2003. An infrastructure for adaptive dynamic optimization. In Proceedings of the International Symposium on Computer Generation and Optimization (CGO '03). IEEE Computer Society Press, Los Alamitos, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Cavazos, J. and O'Boyle, M. F. P. 2006. Method-specific dynamic compilation using logistic regression. In OOPSLA '06: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications. ACM, New York, 229--240. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Chow, K. and Wu, Y. 1999. Feedback-directed selection and characterization of compiler optimizations. In Proceedings of the 2nd Workshop on Feedback Directed Optimizations. Israel.Google ScholarGoogle Scholar
  10. Cierniak, M., Lueh, G.-Y., and Stichnoth, J. M. 2000. Practicing JUDO: Java under dynamic optimizations. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation. ACM, New York, 13--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Click, C. and Cooper, K. D. 1995. Combining analyses, combining optimizations. ACM Trans. Prog. Lang. Syst. (TOPLAS) 17, 2, 181--196. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Consel, C. and Noël, F. 1996. A general approach for run-time specialization and its application to c. In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, New York, 145--156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Cooper, K. D., Grosul, A., Harvey, T. J., Reeves, S., Subramanian, D., Torczon, L., and Waterman, T. 2005. Acme: adaptive compilation made efficient. In LCTES '05: Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems. ACM, New York, 69--77. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Cooper, K. D., Subramanian, D., and Torczon, L. 2002. Adaptive optimizing compilers for the 21st century. J. Supercomput. 23, 1, 7--22. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Dang, F., Yu, H., and Rauchwerger, L. 2002. The R-LRPD test: Speculative parallelization of partially parallel loops. In Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS '02). IEEE Computer Society Press, Los Alamitos, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Diniz, P. C. and Rinard, M. C. 1997. Dynamic feedback: An effective technique for adaptive computing. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 71--84. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Duesterwald, E. and Bala, V. 2000. Software profiling for hot path prediction: Less is more. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York, 202--211. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Ebcioglu, K. and Altman, E. R. 1997. DAISY: Dynamic compilation for 100. In Proceedings of the Annual International Symposium on Computer Architecture. 26--37. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Engler, D. R. 1996. VCODE: a retargetable, extensible, very fast dynamic code generation system. SIGPLAN Notes 31, 5, 160--170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Engler, D. R. and Proebsting, T. A. 1994. DCG: An efficient, retargetable dynamic code generation system. In Proceedings of the 6th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, New York, 263--272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. GNU. 2005. GCC online documentation. http://gcc.gnu.org/onlinedocs/.Google ScholarGoogle Scholar
  22. Graham, S. L., Kessler, P. B., and McKusick, M. K. 1982. GPROF: A call graph execution profiler. In Proceedings of the SIGPLAN Symposium on Compiler Construction. ACM, New York, 120--126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Granston, E. D. and Holler, A. 2001. Automatic recommendation of compiler options. In Proceedings of the 4th Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4).Google ScholarGoogle Scholar
  24. Grant, B., Philipose, M., Mock, M., Chambers, C., and Eggers, S. J. 1999. An evaluation of staged run-time optimizations in dyc. In Proceedings of the ACM SIGPLAN 1999 Conference on Programming Language Design and Implementation. ACM, New York, 293--304. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Haneda, M., Knijnenburg, P. M. W., and Wijshoff, H. A. G. 2005. Generating new general compiler optimization settings. In ICS '05: Proceedings of the 19th Annual International Conference on Supercomputing. ACM, New York, 161--168. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Hedayat, A., Sloane, N., and Stufken, J. 1999. Orthogonal Arrays: Theory and Applications. Springer-Verlag, New York.Google ScholarGoogle ScholarCross RefCross Ref
  27. Joshi, R., Nelson, G., and Randall, K. 2002. Denali: A goal-directed superoptimizer. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation. ACM, New York, 304--314. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Karp, R. 1972. Reducibility among combinatorial problems. In Proceedings of the Symposium on the Complexity of Computer Computations. Plenum Press, New York, 85--103.Google ScholarGoogle ScholarCross RefCross Ref
  29. Kistler, T. and Franz, M. 2003. Continuous program optimization: A case study. ACM Trans. Program. Lang. Syst. 25, 4, 500--548. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Kisuki, T., Knijnenburg, P. M. W., and O'Boyle, M. F. P. 2000. Combined selection of tile sizes and unroll factors using iterative compilation. In Proceedings of the IEEE PACT. IEEE Computer Society Press, Los Alamitos, CA, 237--248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Kisuki, T., Knijnenburg, P. M. W., O'Boyle, M. F. P., Bodin, F., and Wijshoff, H. A. G. 1999. A feasibility study in iterative compilation. In Proceedings of the International Symposium on High Performance Computing (ISHPC'99). 121--132. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Kulkarni, P., Hines, S., Hiser, J., Whalley, D., Davidson, J., and Jones, D. 2004. Fast searches for effective optimization phase sequences. In PLDI '04: Proceedings of the ACM SIGPLAN 2004 Conference on Programming Language Design and Implementation. ACM, New York, 171--182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Kulkarni, P. A., Whalley, D. B., Tyson, G. S., and Davidson, J. W. 2006. Exhaustive optimization phase order space exploration. In CGO '06: Proceedings of the International Symposium on Code Generation and Optimization (Washington, DC). IEEE Computer Society Press, Los Alamitos, CA, 306--318. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Lau, J., Arnold, M., Hind, M., and Calder, B. 2006. Online performance auditing: Using hot optimizations without getting burned. In Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Lee, P. and Leone, M. 1996. Optimizing ML with run-time code generation. In Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation. ACM, New York, 137--148. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Leone, M. and Lee, P. 1998. Dynamic specialization in the fabius system. ACM Comput. Surv. 30, 3es, 23. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Merten, M. C., Trick, A. R., Barnes, R. D., Nystrom, E. M., George, C. N., Gyllenhaal, J. C., and Mei W. Hwu, W. 2001. An architectural framework for runtime optimization. IEEE Trans. Comput. 50, 6, 567--589. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Merten, M. C., Trick, A. R., George, C. N., Gyllenhaal, J. C., and Hwu, W. W. 1999. A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization. In Proceedings of the 26th Annual International Symposium on Computer Architecture. IEEE Computer Society Press, Los Alamitos, CA, 136--147. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Mock, M., Chambers, C., and Eggers, S. J. 2000. CALPA: A tool for automating selective dynamic compilation. In Proceedings of the International Symposium on Microarchitecture. 291--302. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Nandy, S., Gao, X., and Ferrante, J. 2003. TFP: Time-sensitive, flow-specific profiling at runtime. In Proceedings of the Workshop on Languages and Compiling for Parallel Computing (LCPC).Google ScholarGoogle Scholar
  41. Nystrom, E. M., Barnes, R. D., Merten, M. C., and Mei W. Hwu, W. 2001. Code reordering and speculation support for dynamic optimization systems. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Pan, Z. and Eigenmann, R. 2004a. Compiler optimization orchestration for peak performance. Tech. Rep. TR-ECE-04-01, School of Electrical and Computer Engineering, Purdue University.Google ScholarGoogle Scholar
  43. Pan, Z. and Eigenmann, R. 2004b. Rating compiler optimizations for automatic performance tuning. In SC2004: Proceedings of the High Performance Computing, Networking and Storage Conference. (10 pages). Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Pan, Z. and Eigenmann, R. 2006. Fast and effective orchestration of compiler optimizations for automatic performance tuning. In Proceedings of the 4th Annual International Symposium on Code Generation and Optimization (CGO). (12 pages). Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Pinkers, R. P. J., Knijnenburg, P. M. W., Haneda, M., and Wijshoff, H. A. G. 2004. Statistical selection of compiler options. In Proceedings of the IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems (MASCOTS'04) (Volendam, The Netherlands). IEEE Computer Society Press, Los Alamitos, CA, 494--501. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Rauchwerger, L. and Padua, D. A. 1999. The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization. IEEE Trans. Para. Distrib. Syst. 10, 2 (Feb.), 160--180. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Rus, S., Rauchwerger, L., and Hoeflinger, J. 2002. Hybrid analysis: Static & dynamic memory reference analysis. In Proceedings of the 16th International Conference on Supercomputing. ACM, New York, 274--284. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. SPEC. 2000. SPEC CPU2000 Results. http://www.spec.org/cpu2000/results.Google ScholarGoogle Scholar
  49. Stephenson, M., Amarasinghe, S., Martin, M., and O'Reilly, U.-M. 2003. Meta optimization: improving compiler heuristics with machine learning. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation. ACM, New York, 77--90. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Strout, M. M., Carter, L., and Ferrante, J. 2003. Compile-time composition of run-time data and iteration reorderings. In Proceedings of the 2003 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Sun. 2000. Forte C 6 /Sun WorkShop 6 Compilers C User's Guide. http://docs.sun.com/app/docs/doc/806-3567.Google ScholarGoogle Scholar
  52. Triantafyllis, S., Vachharajani, M., Vachharajani, N., and August, D. I. 2003. Compiler optimization-space exploration. In Proceedings of the International Symposium on Code Generation and Optimization. 204--215. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Voss, M. and Eigenmann, R. 2000. ADAPT: Automated de-coupled adaptive program transformation. In Proceedings of the International Conference on Parallel Processing. IEEE Computer Society Press, Los Alamitos, CA, 163--170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Voss, M. J. and Eigemann, R. 2001. High-level adaptive program optimization with ADAPT. In Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming. ACM, New York, 93--102. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Whaley, R. C. and Dongarra, J. 1998. Automatically tuned linear algebra software. In Proceedings of the SuperComputing 1998: High Performance Networking and Computing. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Wolf, M. E., Maydan, D. E., and Chen, D.-K. 1996. Combining loop transformations considering caches and scheduling. In Proceedings of the 29th Annual ACM/IEEE International Symposium on Microarchitecture. ACM, New York, 274--286. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. PEAK—a fast and effective performance tuning system via compiler optimization orchestration

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Programming Languages and Systems
        ACM Transactions on Programming Languages and Systems  Volume 30, Issue 3
        May 2008
        245 pages
        ISSN:0164-0925
        EISSN:1558-4593
        DOI:10.1145/1353445
        Issue’s Table of Contents

        Copyright © 2008 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 21 May 2008
        • Accepted: 1 May 2007
        • Revised: 1 November 2006
        • Received: 1 May 2006
        Published in toplas Volume 30, Issue 3

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader
      About Cookies On This Site

      We use cookies to ensure that we give you the best experience on our website.

      Learn more

      Got it!