DOI: 10.1145/3503823.3503879
research-article

Branchless Code Generation for Modern Processor Architectures

Published: 22 February 2022

ABSTRACT

Compilers transform the code they compile so that it runs faster without changing its behavior. This process is called code optimization. Modern compilers apply many optimization passes to maximize runtime performance and efficiency, at the comparatively small cost of longer compilation times. This study focuses on one particular optimization, called branchless optimization, which eliminates code branches by replacing them with data transformations that have the same effect. These techniques are examined in terms of their implementation in LLVM IR, in MIPS assembly and, in part, in ARM assembly, and are ranked by runtime efficiency. The study also explores the stages of implementing the transformation, as well as the instruction set features that some CPU architectures provide and that can make the optimization more efficient.
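As a rough illustration of the idea described in the abstract, the C sketch below replaces a conditional with a mask-based selection that computes the same result. It is an assumed, minimal example and not one of the paper's own transformations: the function names `max_branching`, `max_branchless` and `self_check` are hypothetical, and unsigned 32-bit operands are used so that all arithmetic is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Branching version: a compiler may lower the `if` to a conditional jump. */
uint32_t max_branching(uint32_t a, uint32_t b) {
    if (a > b) {
        return a;
    }
    return b;
}

/* Branchless version: turn the comparison result into a bit mask and
   use it to select one of the two operands without an `if`. */
uint32_t max_branchless(uint32_t a, uint32_t b) {
    /* (a > b) evaluates to 1 or 0; 0 - 1 wraps to 0xFFFFFFFF, 0 - 0 is 0. */
    uint32_t mask = (uint32_t)0 - (uint32_t)(a > b);
    /* mask all ones keeps a, mask all zeros keeps b. */
    return (a & mask) | (b & ~mask);
}

/* Hypothetical helper: checks that both versions agree on a few inputs. */
void self_check(void) {
    assert(max_branching(3u, 7u) == max_branchless(3u, 7u));
    assert(max_branching(7u, 3u) == max_branchless(7u, 3u));
    assert(max_branching(5u, 5u) == max_branchless(5u, 5u));
}
```

Whether the generated machine code actually contains a branch depends on the compiler, the optimization level and the target. An optimizer may already turn the first version into a conditional-move or select instruction where the instruction set offers one.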


Published in: PCI '21: Proceedings of the 25th Pan-Hellenic Conference on Informatics, November 2021, 499 pages

Copyright © 2021 ACM


Publisher: Association for Computing Machinery, New York, NY, United States