research-article
Open Access

Profile inference revisited

Published: 12 January 2022

Abstract

Profile-guided optimization (PGO) is an important component of modern compilers. By letting the compiler exploit a program’s dynamic behavior, PGO can often produce substantially faster binaries. Sampling-based profiling is the state-of-the-art technique for collecting execution profiles in data-center environments. However, the reduced accuracy of profiles obtained by sampling fully optimized binaries often diminishes the benefits of PGO; thus, an important problem is to correct the inaccuracies in a profile after it has been collected. In this paper we tackle this problem, which is known as profile inference or profile rectification.

We investigate the classical approach to profile inference, which is based on computing minimum-cost maximum flows in a control-flow graph, and develop an extended model capturing the desired properties of real-world profiles. Next, we provide a solid theoretical foundation for the corresponding optimization problem by studying its algorithmic aspects. We then describe a new efficient algorithm for the problem, along with its implementation in an open-source compiler. An extensive evaluation of this algorithm and existing profile inference techniques on a variety of applications, including Facebook production workloads and SPEC CPU benchmarks, indicates that the new method outperforms its competitors, significantly improving both the accuracy of profile data and the performance of the generated binaries.
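
To illustrate the classical minimum-cost flow formulation, here is a minimal sketch under several stated assumptions: a single-entry, single-exit control-flow graph, sampled counts for basic blocks only (no edge samples), and hypothetical per-unit costs `cost_inc` and `cost_dec` for raising or lowering a block's count. Each block is split into an in-node and an out-node, the sampled count is pre-routed between them, and a min-cost max-flow solver (successive shortest paths with Bellman-Ford searches) picks the cheapest adjustment that satisfies flow conservation. The names `MinCostFlow` and `infer_profile` are hypothetical. This is a simplified illustration of the classical idea, not the extended model or the new algorithm described in the paper.

```python
from collections import deque

INF = float("inf")


class MinCostFlow:
    """Min-cost max-flow by successive shortest paths (queue-based Bellman-Ford)."""

    def __init__(self, n):
        # graph[u] holds arcs as [to, residual_capacity, cost, index_of_reverse_arc]
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        """Add arc u->v plus its residual reverse arc; return a handle to read its flow."""
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])
        return (u, len(self.graph[u]) - 1)

    def flow(self, handle):
        """The flow on an arc equals the residual capacity of its reverse arc."""
        u, i = handle
        v, _, _, rev = self.graph[u][i]
        return self.graph[v][rev][1]

    def run(self, s, t):
        total_flow, total_cost, n = 0, 0, len(self.graph)
        while True:
            # Shortest path in the residual graph; reverse arcs may have negative cost.
            dist, prev, queued = [INF] * n, [None] * n, [False] * n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v] = dist[u] + cost, (u, i)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if dist[t] == INF:
                return total_flow, total_cost
            # Find the bottleneck capacity along the path, then augment.
            push, v = INF, t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:
                u, i = prev[v]
                arc = self.graph[u][i]
                arc[1] -= push
                self.graph[arc[0]][arc[3]][1] += push
                v = u
            total_flow += push
            total_cost += push * dist[t]


def infer_profile(blocks, cfg_edges, weights, entry, exit_block, cost_inc=1, cost_dec=1):
    """Return (block_counts, edge_counts) for the cheapest flow-consistent profile.

    Block i is split into b_in (node 2*i) and b_out (node 2*i + 1); its sampled
    count w is pre-routed by giving b_out a supply of w and b_in a demand of w.
    """
    index = {b: i for i, b in enumerate(blocks)}
    source, sink = 2 * len(blocks), 2 * len(blocks) + 1
    mcf = MinCostFlow(2 * len(blocks) + 2)
    inc, dec = {}, {}
    for b, i in index.items():
        w = weights.get(b, 0)
        mcf.add_edge(source, 2 * i + 1, w, 0)  # supply at b_out
        mcf.add_edge(2 * i, sink, w, 0)  # demand at b_in
        inc[b] = mcf.add_edge(2 * i, 2 * i + 1, INF, cost_inc)  # raise the count
        dec[b] = mcf.add_edge(2 * i + 1, 2 * i, w, cost_dec)  # lower it, never below 0
    handles = {(u, v): mcf.add_edge(2 * index[u] + 1, 2 * index[v], INF, 0) for u, v in cfg_edges}
    # A closing arc exit -> entry turns the entry-to-exit flow into a circulation.
    mcf.add_edge(2 * index[exit_block] + 1, 2 * index[entry], INF, 0)
    mcf.run(source, sink)
    counts = {b: weights.get(b, 0) + mcf.flow(inc[b]) - mcf.flow(dec[b]) for b in blocks}
    return counts, {e: mcf.flow(h) for e, h in handles.items()}


# Hypothetical diamond CFG A -> {B, C} -> D with inconsistent samples (B + C != A).
blocks = ["A", "B", "C", "D"]
cfg_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
sampled = {"A": 100, "B": 70, "C": 20, "D": 100}
counts, edges = infer_profile(blocks, cfg_edges, sampled, entry="A", exit_block="D")
print(counts, edges)  # A and D stay at 100; B + C is raised to 100
```

In this example the cheapest repair changes the counts by a total of 10 units, raising the branch blocks instead of lowering both A and D. With equal costs, how those 10 units are split between B and C is a tie that this sketch leaves to the solver's search order; distinguishing such cases is the kind of property a richer model of real-world profiles has to capture.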


Supplemental Material

Auxiliary Presentation Video

Profile-guided optimization (PGO) is an important component of modern compilers. By letting the compiler exploit a program’s dynamic behavior, PGO can often produce substantially faster binaries. Sampling-based profiling is the state-of-the-art technique for collecting execution profiles in data-center environments. However, the reduced accuracy of profiles obtained by sampling fully optimized binaries diminishes the benefits of PGO. Therefore, an important problem, called profile inference, is to correct the inaccuracies in a profile after it has been collected. In this talk, we describe our work accepted in the research track of POPL 2022.


Published in

Proceedings of the ACM on Programming Languages, Volume 6, Issue POPL
January 2022
1886 pages
EISSN: 2475-1421
DOI: 10.1145/3511309

Copyright © 2022 Owner/Author

Publisher

Association for Computing Machinery
New York, NY, United States
