Abstract
Profile-guided optimization (PGO) is an important component of modern compilers. By letting the compiler leverage a program’s dynamic behavior, PGO can often produce substantially faster binaries. Sampling-based profiling is the state-of-the-art technique for collecting execution profiles in data-center environments. However, sampling a fully optimized binary lowers profile accuracy, which often reduces the benefits of PGO; thus, an important problem is to overcome the inaccuracy in a profile after it is collected. In this paper, we tackle this problem, which is also known as profile inference or profile rectification.
We investigate the classical approach to profile inference, based on computing minimum-cost maximum flows in a control-flow graph, and develop an extended model that captures the desired properties of real-world profiles. Next, we provide a solid theoretical foundation for the corresponding optimization problem by studying its algorithmic aspects. We then describe a new efficient algorithm for the problem, along with its implementation in an open-source compiler. An extensive evaluation of the algorithm and existing profile inference techniques on a variety of applications, including Facebook production workloads and SPEC CPU benchmarks, indicates that the new method outperforms its competitors by significantly improving the accuracy of profile data and the performance of generated binaries.
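To make the classical flow-based formulation concrete, the following is a minimal sketch, not the extended model or the new algorithm described above. It assumes a profile given as sampled basic-block counts, a unit cost for each sample added to or removed from a block, and a textbook successive-shortest-path solver; the names `MinCostFlow` and `infer_profile` and the diamond-shaped example graph are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

INF = float("inf")
BIG = 10**9  # stands in for "unbounded" capacity (assumption for this sketch)

class MinCostFlow:
    """Textbook successive-shortest-path solver using Bellman-Ford."""
    def __init__(self):
        self.graph = defaultdict(list)  # node -> indices into self.arcs
        self.arcs = []                  # each arc: [head, residual capacity, cost]

    def add_arc(self, u, v, cap, cost):
        self.graph[u].append(len(self.arcs)); self.arcs.append([v, cap, cost])
        self.graph[v].append(len(self.arcs)); self.arcs.append([u, 0, -cost])
        return len(self.arcs) - 2       # index of the forward arc

    def solve(self, s, t):
        total_cost = 0
        while True:
            dist, prev = defaultdict(lambda: INF), {}
            dist[s] = 0
            changed = True
            while changed:              # Bellman-Ford on the residual graph
                changed = False
                for u in list(self.graph):
                    if dist[u] == INF:
                        continue
                    for i in self.graph[u]:
                        v, cap, cost = self.arcs[i]
                        if cap > 0 and dist[u] + cost < dist[v]:
                            dist[v], prev[v], changed = dist[u] + cost, i, True
            if dist[t] == INF:          # no augmenting path left: flow is maximum
                return total_cost
            push, v = BIG, t
            while v != s:               # bottleneck capacity along the path
                push = min(push, self.arcs[prev[v]][1])
                v = self.arcs[prev[v] ^ 1][0]
            v = t
            while v != s:               # augment along the path
                i = prev[v]
                self.arcs[i][1] -= push
                self.arcs[i ^ 1][1] += push
                v = self.arcs[i ^ 1][0]
            total_cost += push * dist[t]

def infer_profile(blocks, edges, entry, exit_):
    """Return (block counts, edge counts, total adjustment) satisfying flow conservation."""
    mcf, inc, dec, edge_arc = MinCostFlow(), {}, {}, {}
    for v, w in blocks.items():
        # Starting from the sampled count w, the "out" side has w surplus units
        # and the "in" side is missing w units; the correction flow rebalances them.
        mcf.add_arc("S", (v, "out"), w, 0)
        mcf.add_arc((v, "in"), "T", w, 0)
        inc[v] = mcf.add_arc((v, "in"), (v, "out"), BIG, 1)  # raise the count
        dec[v] = mcf.add_arc((v, "out"), (v, "in"), w, 1)    # lower the count
    for u, v in edges:
        edge_arc[(u, v)] = mcf.add_arc((u, "out"), (v, "in"), BIG, 0)
    mcf.add_arc((exit_, "out"), (entry, "in"), BIG, 0)       # close the circulation
    cost = mcf.solve("S", "T")
    flow = lambda i: mcf.arcs[i ^ 1][1]  # flow sent = residual of the reverse arc
    counts = {v: w + flow(inc[v]) - flow(dec[v]) for v, w in blocks.items()}
    return counts, {e: flow(i) for e, i in edge_arc.items()}, cost

# Diamond-shaped CFG whose sampled counts are inconsistent: A + B != entry.
blocks = {"entry": 100, "A": 70, "B": 20, "exit": 100}
edges = [("entry", "A"), ("entry", "B"), ("A", "exit"), ("B", "exit")]
counts, edge_counts, cost = infer_profile(blocks, edges, "entry", "exit")
print(counts, edge_counts, cost)
```

In this example the sampled counts for `A` and `B` sum to 90 while `entry` and `exit` report 100, so the cheapest repair adjusts the profile by 10 samples in total. Which of the two branches absorbs the missing samples is a tie under unit costs, so the split depends on the solver's traversal order.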
Index Terms
Profile inference revisited