Abstract
Over the past few years, binary optimizers such as BOLT, Propeller, Janus and HALO have surged in popularity. These tools use dynamic profiling information to make optimization decisions. Although effective, gathering runtime data burdens developers with unrepresentative inputs, the need to re-profile as software changes, and longer build times. In this paper, we revisit the static profiling technique proposed by Calder et al. in the late 1990s and investigate its use as a replacement for dynamic profiling to drive binary optimizations in the BOLT binary optimizer. A few core modifications to Calder et al.'s original proposal, consisting of new program features and a new regression model, are sufficient to recover some of the gains obtained through runtime profiling. An evaluation of BOLT powered by our static profiler on four large benchmarks (clang, GCC, MySQL and PostgreSQL) yields binaries that are 5.47% faster than the executables produced by clang -O3.
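The core idea of evidence-based static profiling, learning branch behavior from program features on code that was profiled and then predicting it for code that never ran, can be illustrated with a small sketch. This is not VESPA's model or feature set: the three features, the training data, and the use of plain logistic regression fitted by batch gradient descent are all hypothetical stand-ins chosen for illustration. The final step turns predicted probabilities into made-up edge counts only to mimic the shape of a dynamic profile that a binary optimizer could consume.

```python
import math
from dataclasses import dataclass


@dataclass
class Branch:
    """Hypothetical static features of one conditional branch (not VESPA's)."""
    is_backward: bool            # target precedes the branch: likely a loop back edge
    compares_to_zero: bool       # condition tests a value against zero/null
    taken_leads_to_return: bool  # taken successor ends in a return

    def vector(self) -> list[float]:
        # Bias term followed by 0/1 encodings of each feature.
        return [1.0, float(self.is_backward), float(self.compares_to_zero),
                float(self.taken_leads_to_return)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: list[tuple[Branch, float]], epochs: int = 2000,
          lr: float = 0.5) -> list[float]:
    """Fit logistic regression to (features, observed taken ratio) pairs taken
    from profiled training programs, via batch gradient descent on cross-entropy."""
    w = [0.0] * len(samples[0][0].vector())
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for branch, taken_ratio in samples:
            x = branch.vector()
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - taken_ratio
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad)]
    return w


def predict(w: list[float], branch: Branch) -> float:
    """Predicted probability that the branch is taken."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, branch.vector())))


def edge_counts(w: list[float], branch: Branch, block_count: int) -> tuple[int, int]:
    """Split an assumed block execution count into (taken, fallthrough) counts."""
    taken = round(block_count * predict(w, branch))
    return taken, block_count - taken


# Hypothetical "evidence" gathered once from profiled training programs.
training = [
    (Branch(True, False, False), 0.9),   # loop back edges: mostly taken
    (Branch(True, True, False), 0.85),
    (Branch(False, True, True), 0.1),    # early-exit null checks: rarely taken
    (Branch(False, False, True), 0.3),
    (Branch(False, False, False), 0.5),
]
w = train(training)

# Predict for branches of a program that was never profiled.
loop = Branch(True, False, False)
check = Branch(False, True, True)
print(predict(w, loop) > predict(w, check))  # True
print(edge_counts(w, loop, 1000))
```

The point of the design is that profiling cost is paid once, on the training corpus; afterwards any new binary can be given an estimated profile from its code alone, which is what lets static profiling sidestep unrepresentative inputs and re-profiling after code changes.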
References
- Andrei Rimsa Alvares, Jose Nelson Amaral, and Fernando Magno Quintao Pereira. 2021. Instruction Visibility in SPEC CPU2017. Journal of Computer Languages, 66 (2021), 1–10. https://doi.org/10.1016/j.cola.2021.101062
- Andrea Apicella, Francesco Donnarumma, Francesco Isgrò, and Roberto Prevete. 2021. A survey on modern trainable activation functions. Neural Networks, 138 (2021), Jun, 14–32. issn:0893-6080 https://doi.org/10.1016/j.neunet.2021.01.026
- Thomas Ball and James R. Larus. 1993. Branch Prediction for Free. SIGPLAN Not., 28, 6 (1993), 300–313. issn:0362-1340 https://doi.org/10.1145/173262.155119
- Sumit Bandyopadhyay, Vimal S. Begwani, and Robert B. Murray. 1987. Compiling for the CRISP Microprocessor. In COMPCON. IEEE Computer Society, San Francisco, California, USA. 96–101.
- Brad Calder, Dirk Grunwald, Michael Jones, Donald Lindsay, James Martin, Michael Mozer, and Benjamin Zorn. 1997. Evidence-Based Static Branch Prediction Using Machine Learning. ACM Trans. Program. Lang. Syst., 19, 1 (1997), 188–222. issn:0164-0925 https://doi.org/10.1145/239912.239923
- Dehao Chen, David Xinliang Li, and Tipp Moseley. 2016. AutoFDO: Automatic Feedback-Directed Optimization for Warehouse-Scale Applications. In CGO. Association for Computing Machinery, New York, NY, USA. 12–23. isbn:9781450337786 https://doi.org/10.1145/2854038.2854044
- A. P. Dempster. 1967. Upper and Lower Probabilities Induced by a Multivalued Mapping. Ann. Math. Statist., 38, 2 (1967), 04, 325–339. https://doi.org/10.1214/aoms/1177698950
- Veerle Desmet, Lieven Eeckhout, and Koen De Bosschere. 2005. Using Decision Trees to Improve Program-Based and Profile-Based Static Branch Prediction. In ACSAC. Springer-Verlag, Berlin, Heidelberg. 336–352. isbn:3540296433 https://doi.org/10.1007/11572961_27
- Joseph A. Fisher and Stefan M. Freudenberger. 1992. Predicting Conditional Branch Directions from Previous Runs of a Program. In ASPLOS. ACM, New York, NY, USA. 85–95. isbn:0-89791-534-8 https://doi.org/10.1145/143365.143493
- John L. Hennessy and David A. Patterson. 2011. Computer Architecture, Fifth Edition: A Quantitative Approach (5th ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. isbn:012383872X
- Urs Hölzle and David Ungar. 1994. Optimizing Dynamically-Dispatched Calls with Run-Time Type Feedback. In PLDI. ACM, New York, NY, USA. 326–336. isbn:089791662X https://doi.org/10.1145/178243.178478
- Bhargava Kalla, Nandakishore Santhi, Abdel-Hameed A. Badawy, Gopinath Chennupati, and Stephan J. Eidenbenz. 2017. A Probabilistic Monte Carlo Framework for Branch Prediction. In CLUSTER. IEEE Computer Society, New York, NY, USA. 651–652. https://doi.org/10.1109/CLUSTER.2017.29
- C. Laurent, G. Pereyra, P. Brakel, Y. Zhang, and Y. Bengio. 2016. Batch normalized recurrent neural networks. In ICASSP. IEEE, Shanghai, China. 2657–2661. https://doi.org/10.1109/ICASSP.2016.7472159
- David Xinliang Li, Raksit Ashok, and Robert Hundt. 2010. Lightweight Feedback-Directed Cross-Module Optimization. In CGO. ACM, New York, NY, USA. 53–61. isbn:9781605586359 https://doi.org/10.1145/1772954.1772964
- Yonghua Mao, Junjie Shen, and Xiaolin Gui. 2018. A Study on Deep Belief Net for Branch Prediction. IEEE Access, 6 (2018), 10779–10786. https://doi.org/10.1109/ACCESS.2017.2772334
- Mircea Namolaru, Albert Cohen, Grigori Fursin, Ayal Zaks, and Ari Freund. 2010. Practical Aggregation of Semantical Program Properties for Machine Learning Based Optimization. In CASES. Association for Computing Machinery, New York, NY, USA. 197–206. isbn:9781605589039 https://doi.org/10.1145/1878921.1878951
- Guilherme Ottoni. 2018. HHVM JIT: A Profile-Guided, Region-Based Compiler for PHP and Hack. In PLDI. ACM, New York, NY, USA. 151–165. isbn:9781450356985 https://doi.org/10.1145/3192366.3192374
- Guilherme Ottoni and Bertrand Maher. 2017. Optimizing Function Placement for Large-Scale Data-Center Applications. In CGO. IEEE Press, United States. 233–244. isbn:9781509049318 https://doi.org/10.1109/CGO.2017.7863743
- Maksim Panchenko. 2018. Optimizing Clang: A Practical Example of Applying BOLT. https://github.com/facebookincubator/BOLT/blob/master/docs/OptimizingClang.md
- Maksim Panchenko, Rafael Auler, Bill Nell, and Guilherme Ottoni. 2019. BOLT: A Practical Binary Optimizer for Data Centers and Beyond. In CGO. IEEE Press, Washington, DC, USA. 2–14. isbn:9781728114361 https://doi.org/10.5555/3314872.3314876
- Maksim Panchenko, Rafael Auler, Laith Sakka, and Guilherme Ottoni. 2021. Lightning BOLT: Powerful, Fast, and Scalable Binary Optimization. In CC. Association for Computing Machinery, New York, NY, USA. 119–130. isbn:9781450383257 https://doi.org/10.1145/3446804.3446843
- Fernando Magno Quintão Pereira, Guilherme Vieira Leobas, and Abdoulaye Gamatié. 2018. Static Prediction of Silent Stores. ACM Trans. Archit. Code Optim., 15, 4 (2018), Article 44, Nov., 26 pages. issn:1544-3566 https://doi.org/10.1145/3280848
- Adam Preuss. 2010. Implementation of Path Profiling in the Low-Level Virtual-Machine (LLVM) Compiler Infrastructure. University of Alberta. https://doi.org/10.7939/R3GF0MX64
- Henry Gordon Rice. 1953. Classes of recursively enumerable sets and their decision problems. Trans. Amer. Math. Soc., 74, 2 (1953), 358–366. https://doi.org/10.1090/s0002-9947-1953-0053041-6
- Andrei Rimsa, Jose Nelson Amaral, and Fernando Magno Quintao Pereira. 2021. Practical dynamic reconstruction of control flow graphs. Softw. Pract. Exp., 51, 2 (2021), 353–384. https://doi.org/10.1002/spe.2907
- Andrei Rimsa, Jose Nelson Amaral, and Fernando Magno Quintao Pereira. 2019. Efficient and Precise Dynamic Construction of Control Flow Graphs. In SBLP. Association for Computing Machinery, New York, NY, USA. 19–26. isbn:9781450376389 https://doi.org/10.1145/3355378.3355383
- James E. Smith. 1981. A Study of Branch Prediction Strategies. In ISCA. IEEE Computer Society Press, Washington, DC, USA. 135–148. https://doi.org/10.5555/800052.801871
- Sriraman Tallam. 2019. Profile Guided Optimizing Large Scale LLVM-based Relinker. Google. https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf
- Stephen J. Tarsa, Chit-Kwan Lin, Gokce Keskin, Gautham N. Chinya, and Hong Wang. 2019. Improving Branch Prediction By Modeling Global History with Convolutional Neural Networks. CoRR, abs/1906.09889 (2019), 1–6. arXiv:1906.09889
- April W. Wade, Prasad A. Kulkarni, and Michael R. Jantz. 2017. AOT vs. JIT: Impact of Profile Data on Code Quality. In LCTES. Association for Computing Machinery, New York, NY, USA. 1–10. isbn:9781450350303 https://doi.org/10.1145/3078633.3081037
- Zheng Wang and Michael O'Boyle. 2018. Machine Learning in Compiler Optimization. Proc. IEEE, PP (2018), 05, 1–23. https://doi.org/10.1109/JPROC.2018.2817118
- Youfeng Wu and James R. Larus. 1994. Static Branch Frequency and Program Profile Analysis. In MICRO. Association for Computing Machinery, New York, NY, USA. 1–11. isbn:0897917073 https://doi.org/10.1145/192724.192725
Index Terms
VESPA: static profiling for binary optimization