ABSTRACT
Compilers transform the code they compile so that it runs faster without changing its behavior. This process is called code optimization. Modern compilers apply many optimization passes to maximize runtime performance and efficiency, at the usually small cost of longer compilation times. This study focuses on one particular optimization, called branchless optimization, which eliminates code branches by replacing them with data transformations that have the same effect. These techniques are examined through their implementation in LLVM IR, in MIPS assembly, and partly in ARM assembly, and are ranked by runtime efficiency. The study also covers the stages of implementing the optimization transformation, as well as the instruction set features of some CPU architectures that can make the optimization more efficient.
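As a rough illustration of what replacing a branch with an equivalent data transformation can look like, the sketch below writes a maximum function twice in C: once with a conditional and once with a bit mask derived from the comparison. The function names are hypothetical and the masking idiom is a widely known one; it is not necessarily among the techniques the study implements or ranks.

```c
#include <assert.h>
#include <stdint.h>

/* Branching version: a straightforward compiler lowering of this
   uses a conditional jump. */
int32_t max_branching(int32_t a, int32_t b) {
    if (a > b) {
        return a;
    }
    return b;
}

/* Branchless version (illustrative assumption, not taken from the study):
   -(a > b) is -1 (all bits set) when a > b and 0 otherwise, so the mask
   either keeps or discards the bits in which a and b differ. */
int32_t max_branchless(int32_t a, int32_t b) {
    int32_t mask = -(int32_t)(a > b);
    return b ^ ((a ^ b) & mask);
}

/* Small self-check: both versions must agree on a few sample pairs. */
void check_max_equivalence(void) {
    static const int32_t samples[] = { INT32_MIN, -7, -1, 0, 1, 42, INT32_MAX };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            assert(max_branching(samples[i], samples[j]) ==
                   max_branchless(samples[i], samples[j]));
        }
    }
}
```

Whether the second form is actually faster depends on the target and the compiler: an optimizing compiler may already turn the first form into a conditional move or select instruction, so the source-level rewrite is best read as a picture of the idea rather than a guaranteed speedup.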
Index Terms
(auto-classified) Branchless Code Generation for Modern Processor Architectures