Abstract
This paper describes a compiler optimization that eliminates dynamic occurrences of expressions of the form a ← a ⊕ b ⊗ c. The operation ⊕ must admit an identity element z, such that a ⊕ z = a. In addition, z must be the absorbing element of ⊗, such that b ⊗ z = z ⊗ c = z. Semirings where ⊕ is the additive operator and ⊗ is the multiplicative operator meet this contract. This pattern is common in high-performance benchmarks; its canonical representative is the multiply-add operation a ← a + b × c. However, several other expressions involving arithmetic and logic operations satisfy the required algebra. We show that the runtime elimination of such assignments can be implemented in a performance-safe way via online profiling. The elimination of dynamic redundancies involving identity and absorbing elements in 35 programs of the LLVM test suite that present semiring patterns brings an average speedup of 1.19x (total optimized time over total unoptimized time) on top of clang -O3. When projected onto the entire test suite (259 programs), the optimization leads to a speedup of 1.025x. Once added to clang, semiring optimizations bring it closer to TACO, a specialized tensor compiler.
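The algebraic contract above can be illustrated with a small sketch. It is not the paper's implementation: the `Semiring` container and both function names are hypothetical, the guard is written by hand rather than inserted by a compiler, and the online profiling step is not modeled. Integers are used to sidestep floating-point special cases such as NaN and infinity. The sketch only shows that, when b = z, the update a ← a ⊕ b ⊗ c leaves a unchanged, so both the update and the read of c can be skipped.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for (⊕, ⊗, z); an illustration only."""
    add: Callable[[int, int], int]  # ⊕, where a ⊕ z == a
    mul: Callable[[int, int], int]  # ⊗, where b ⊗ z == z ⊗ c == z
    zero: int                       # z


def accumulate_plain(s: Semiring, a: int,
                     bs: Sequence[int], cs: Sequence[int]) -> int:
    """Reference loop: always evaluates a <- a ⊕ (b ⊗ c)."""
    for b, c in zip(bs, cs):
        a = s.add(a, s.mul(b, c))
    return a


def accumulate_elided(s: Semiring, a: int,
                      bs: Sequence[int], cs: Sequence[int]) -> Tuple[int, int]:
    """Guarded loop: when b == z the update is a no-op, so it is skipped
    and cs[i] is never read. Assumes bs and cs have the same length.
    Returns the result and the number of skipped updates."""
    skipped = 0
    for i, b in enumerate(bs):
        if b == s.zero:
            skipped += 1
            continue
        a = s.add(a, s.mul(b, cs[i]))
    return a, skipped


# Two instances of the contract: (+, ×, 0) and (bitwise OR, bitwise AND, 0).
ARITHMETIC = Semiring(operator.add, operator.mul, 0)
BITWISE = Semiring(operator.or_, operator.and_, 0)

bs = [3, 0, 0, 5, 0]
cs = [2, 7, 9, 4, 1]

for ring in (ARITHMETIC, BITWISE):
    plain = accumulate_plain(ring, 0, bs, cs)
    elided, skipped = accumulate_elided(ring, 0, bs, cs)
    print(plain, elided, skipped)
```

In both instances the guarded loop returns the same value as the reference loop while skipping three of the five updates. Whether the extra comparison pays off depends on how often b equals z at run time, which is the question the abstract assigns to online profiling.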
Index Terms
Semiring optimizations: dynamic elision of expressions with identity and absorbing elements