ABSTRACT
If-conversion is a simple yet powerful optimization that converts control dependences into data dependences. It eliminates branches and increases the available instruction-level parallelism, and thus overall performance. If-conversion can be applied either alone or in combination with other techniques that enlarge scheduling regions. Hardware support for predicated execution allows if-conversion to be applied broadly within a program. Because if-converted code issues the instructions of both paths, however, the transformation is not always profitable, so it must be guided by heuristic estimates of its potential benefit. Like other transformations in an optimizing compiler, if-conversion also inherently suffers from phase ordering issues. Driven by these facts, we developed two if-conversion algorithms for the TI TMS320C64x+ architecture within the LLVM framework, each targeting a different level of code abstraction: one operates on the intermediate representation (IR), the other on machine-level code. Both use an adapted set of estimation heuristics and are generally successful, but each exhibits different strengths and weaknesses. IR-level if-conversion, applied before other control flow transformations, has more freedom to operate, but its runtime estimates are less accurate than those of its more restricted machine-level counterpart. Our experimental evaluation shows a mean speedup close to 14% for both algorithms on a set of programs from the MiBench and DSPstone benchmark suites. We compare the implemented optimizations and discuss the insights gained on if-conversion, phase ordering issues, and profitability analysis.
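To make "converting a control dependence into a data dependence" and "guiding it with heuristic estimates" concrete, here is a minimal Python sketch. It is not the authors' LLVM implementation. `abs_diff_branch` selects a result with an if-then-else; `abs_diff_ifconv` evaluates both arms unconditionally and lets the condition, turned into a bit mask, pick the result, much as a predicated target such as the TMS320C64x+ would guard each arm with a predicate. The cost model (`IfConvEstimate`, `cost_branchy`, `cost_predicated`, `ifconv_profitable`) and all of its parameters are hypothetical, chosen only to show the kind of estimate that decides whether conversion pays off.

```python
from dataclasses import dataclass


def abs_diff_branch(a: int, b: int) -> int:
    """Control dependence: which assignment executes depends on the branch."""
    if a > b:
        r = a - b
    else:
        r = b - a
    return r


def abs_diff_ifconv(a: int, b: int) -> int:
    """If-converted form: both arms are always evaluated and the condition
    becomes a data value (a mask) that selects the result."""
    p = -int(a > b)      # -1 (all bits set) if a > b, else 0
    then_val = a - b     # 'then' arm, computed unconditionally
    else_val = b - a     # 'else' arm, computed unconditionally
    return (then_val & p) | (else_val & ~p)


@dataclass
class IfConvEstimate:
    """Hypothetical per-region inputs to a profitability heuristic (in cycles)."""
    cond: float            # cycles to compute the condition
    then_len: float        # schedule length of the then-path
    else_len: float        # schedule length of the else-path
    branch_penalty: float  # cycles lost per branch, e.g. unfilled delay slots
    p_then: float          # estimated probability of executing the then-path


def cost_branchy(e: IfConvEstimate) -> float:
    # Expected cycles: condition + branch overhead + probability-weighted paths.
    return e.cond + e.branch_penalty + e.p_then * e.then_len + (1 - e.p_then) * e.else_len


def cost_predicated(e: IfConvEstimate) -> float:
    # Optimistic assumption: a wide VLIW has enough free issue slots to
    # schedule both predicated paths side by side, so the longer one dominates.
    return e.cond + max(e.then_len, e.else_len)


def ifconv_profitable(e: IfConvEstimate) -> bool:
    return cost_predicated(e) < cost_branchy(e)


print(abs_diff_branch(3, 10), abs_diff_ifconv(3, 10))  # prints: 7 7

short_diamond = IfConvEstimate(cond=1, then_len=2, else_len=2, branch_penalty=5, p_then=0.5)
skewed = IfConvEstimate(cond=1, then_len=1, else_len=20, branch_penalty=5, p_then=0.95)
print(ifconv_profitable(short_diamond))  # True: removing the branch saves cycles
print(ifconv_profitable(skewed))         # False: the rarely taken long path now always issues
```

The `max()` in `cost_predicated` is the weakest part of such a model: how well both paths actually overlap depends on resources and on later scheduling, which is why estimates made early, on the IR, tend to be less accurate than those made on machine code.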
IR-level versus machine-level if-conversion for predicated architectures