research-article
Open Access
Artifacts Available / v1.1

VESPA: static profiling for binary optimization

Published: 15 October 2021

Abstract

Over the past few years, there has been a surge in the popularity of binary optimizers such as BOLT, Propeller, Janus and HALO. These tools use dynamic profiling information to make optimization decisions. Although effective, gathering runtime data presents developers with inconveniences such as unrepresentative inputs, the need to accommodate software modifications, and longer build times. In this paper, we revisit the static profiling technique proposed by Calder et al. in the late 1990s and investigate its use as a replacement for dynamic profiling to drive binary optimizations in the BOLT binary optimizer. A few core modifications to Calder et al.’s original proposal, consisting of new program features and a new regression model, are sufficient to enable some of the gains obtained through runtime profiling. An evaluation of BOLT powered by our static profiler on four large benchmarks (clang, GCC, MySQL and PostgreSQL) yields binaries that are 5.47% faster than the executables produced by clang -O3.

Supplemental Material

Auxiliary Presentation Video

This is a video of my OOPSLA 2021 talk on our accepted paper, VESPA: Static Profiling for Binary Optimization. In this paper, we explore the use of static profiles inferred by a machine learning model in the context of binary optimization. An evaluation of the binary optimizer BOLT powered by our static profiler on four large benchmarks (clang, GCC, MySQL and PostgreSQL) yields binaries that are 5.47% faster than the executables produced by clang -O3.

References

  1. Andrei Rimsa Alvares, Jose Nelson Amaral, and Fernando Magno Quintao Pereira. 2021. Instruction Visibility in SPEC CPU2017. Journal of Computer Languages, 66 (2021), 1–10. https://doi.org/10.1016/j.cola.2021.101062
  2. Andrea Apicella, Francesco Donnarumma, Francesco Isgrò, and Roberto Prevete. 2021. A survey on modern trainable activation functions. Neural Networks, 138 (2021), Jun, 14–32. issn:0893-6080 https://doi.org/10.1016/j.neunet.2021.01.026
  3. Thomas Ball and James R. Larus. 1993. Branch Prediction for Free. SIGPLAN Not., 28, 6 (1993), 300–313. issn:0362-1340 https://doi.org/10.1145/173262.155119
  4. Sumit Bandyopadhyay, Vimal S. Begwani, and Robert B. Murray. 1987. Compiling for the CRISP Microprocessor. In COMPCON. IEEE Computer Society, San Francisco, California, USA. 96–101.
  5. Brad Calder, Dirk Grunwald, Michael Jones, Donald Lindsay, James Martin, Michael Mozer, and Benjamin Zorn. 1997. Evidence-Based Static Branch Prediction Using Machine Learning. ACM Trans. Program. Lang. Syst., 19, 1 (1997), 188–222. issn:0164-0925 https://doi.org/10.1145/239912.239923
  6. Dehao Chen, David Xinliang Li, and Tipp Moseley. 2016. AutoFDO: Automatic Feedback-Directed Optimization for Warehouse-Scale Applications. In CGO. Association for Computing Machinery, New York, NY, USA. 12–23. isbn:9781450337786 https://doi.org/10.1145/2854038.2854044
  7. A. P. Dempster. 1967. Upper and Lower Probabilities Induced by a Multivalued Mapping. Ann. Math. Statist., 38, 2 (1967), 04, 325–339. https://doi.org/10.1214/aoms/1177698950
  8. Veerle Desmet, Lieven Eeckhout, and Koen De Bosschere. 2005. Using Decision Trees to Improve Program-Based and Profile-Based Static Branch Prediction. In ACSAC. Springer-Verlag, Berlin, Heidelberg. 336–352. isbn:3540296433 https://doi.org/10.1007/11572961_27
  9. Joseph A. Fisher and Stefan M. Freudenberger. 1992. Predicting Conditional Branch Directions from Previous Runs of a Program. In ASPLOS. ACM, New York, NY, USA. 85–95. isbn:0-89791-534-8 https://doi.org/10.1145/143365.143493
  10. John L. Hennessy and David A. Patterson. 2011. Computer Architecture, Fifth Edition: A Quantitative Approach (5th ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. isbn:012383872X
  11. Urs Hölzle and David Ungar. 1994. Optimizing Dynamically-Dispatched Calls with Run-Time Type Feedback. In PLDI. ACM, New York, NY, USA. 326–336. isbn:089791662X https://doi.org/10.1145/178243.178478
  12. Bhargava Kalla, Nandakishore Santhi, Abdel-Hameed A. Badawy, Gopinath Chennupati, and Stephan J. Eidenbenz. 2017. A Probabilistic Monte Carlo Framework for Branch Prediction. In CLUSTER. IEEE Computer Society, New York, NY, USA. 651–652. https://doi.org/10.1109/CLUSTER.2017.29
  13. C. Laurent, G. Pereyra, P. Brakel, Y. Zhang, and Y. Bengio. 2016. Batch normalized recurrent neural networks. In ICASSP. IEEE, Shanghai, China. 2657–2661. https://doi.org/10.1109/ICASSP.2016.7472159
  14. David Xinliang Li, Raksit Ashok, and Robert Hundt. 2010. Lightweight Feedback-Directed Cross-Module Optimization. In CGO. ACM, New York, NY, USA. 53–61. isbn:9781605586359 https://doi.org/10.1145/1772954.1772964
  15. Yonghua Mao, Junjie Shen, and Xiaolin Gui. 2018. A Study on Deep Belief Net for Branch Prediction. Access, 6 (2018), 10779–10786. https://doi.org/10.1109/ACCESS.2017.2772334
  16. Mircea Namolaru, Albert Cohen, Grigori Fursin, Ayal Zaks, and Ari Freund. 2010. Practical Aggregation of Semantical Program Properties for Machine Learning Based Optimization. In CASES. Association for Computing Machinery, New York, NY, USA. 197–206. isbn:9781605589039 https://doi.org/10.1145/1878921.1878951
  17. Guilherme Ottoni. 2018. HHVM JIT: A Profile-Guided, Region-Based Compiler for PHP and Hack. In PLDI. ACM, New York, NY, USA. 151–165. isbn:9781450356985 https://doi.org/10.1145/3192366.3192374
  18. Guilherme Ottoni and Bertrand Maher. 2017. Optimizing Function Placement for Large-Scale Data-Center Applications. In CGO. IEEE Press, United States. 233–244. isbn:9781509049318 https://doi.org/10.1109/CGO.2017.7863743
  19. Maksim Panchenko. 2018. Optimizing Clang: A Practical Example of Applying BOLT. https://github.com/facebookincubator/BOLT/blob/master/docs/OptimizingClang.md
  20. Maksim Panchenko, Rafael Auler, Bill Nell, and Guilherme Ottoni. 2019. BOLT: A Practical Binary Optimizer for Data Centers and Beyond. In CGO. IEEE Press, Washington, DC, USA. 2–14. isbn:9781728114361 https://doi.org/10.5555/3314872.3314876
  21. Maksim Panchenko, Rafael Auler, Laith Sakka, and Guilherme Ottoni. 2021. Lightning BOLT: Powerful, Fast, and Scalable Binary Optimization. In CC. Association for Computing Machinery, New York, NY, USA. 119–130. isbn:9781450383257 https://doi.org/10.1145/3446804.3446843
  22. Fernando Magno Quintão Pereira, Guilherme Vieira Leobas, and Abdoulaye Gamatié. 2018. Static Prediction of Silent Stores. ACM Trans. Archit. Code Optim., 15, 4 (2018), Article 44, Nov., 26 pages. issn:1544-3566 https://doi.org/10.1145/3280848
  23. Adam Preuss. 2010. Implementation of Path Profiling in the Low-Level Virtual-Machine (LLVM) Compiler Infrastructure. University of Alberta. https://doi.org/10.7939/R3GF0MX64
  24. Henry Gordon Rice. 1953. Classes of recursively enumerable sets and their decision problems. Trans. Amer. Math. Soc., 74, 2 (1953), 358–366. https://doi.org/10.1090/s0002-9947-1953-0053041-6
  25. Andrei Rimsa, Jose Nelson Amaral, and Fernando Magno Quintao Pereira. 2021. Practical dynamic reconstruction of control flow graphs. Softw. Pract. Exp., 51, 2 (2021), 353–384. https://doi.org/10.1002/spe.2907
  26. Andrei Rimsa, Jose Nelson Amaral, and Fernando Magno Quintao Pereira. 2019. Efficient and Precise Dynamic Construction of Control Flow Graphs. In SBLP. Association for Computing Machinery, New York, NY, USA. 19–26. isbn:9781450376389 https://doi.org/10.1145/3355378.3355383
  27. James E. Smith. 1981. A Study of Branch Prediction Strategies. In ISCA. IEEE Computer Society Press, Washington, DC, USA. 135–148. https://doi.org/10.5555/800052.801871
  28. Sriraman Tallam. 2019. Profile Guided Optimizing Large Scale LLVM-based Relinker. Google. https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf
  29. Stephen J. Tarsa, Chit-Kwan Lin, Gokce Keskin, Gautham N. Chinya, and Hong Wang. 2019. Improving Branch Prediction By Modeling Global History with Convolutional Neural Networks. CoRR, abs/1906.09889 (2019), 1–6. arxiv:1906.09889
  30. April W. Wade, Prasad A. Kulkarni, and Michael R. Jantz. 2017. AOT vs. JIT: Impact of Profile Data on Code Quality. In LCTES. Association for Computing Machinery, New York, NY, USA. 1–10. isbn:9781450350303 https://doi.org/10.1145/3078633.3081037
  31. Zheng Wang and Michael O’Boyle. 2018. Machine Learning in Compiler Optimization. Proc. IEEE, PP (2018), 05, 1–23. https://doi.org/10.1109/JPROC.2018.2817118
  32. Youfeng Wu and James R. Larus. 1994. Static Branch Frequency and Program Profile Analysis. In MICRO. Association for Computing Machinery, New York, NY, USA. 1–11. isbn:0897917073 https://doi.org/10.1145/192724.192725
