DOI: 10.1145/2443608.2443611

IR-level versus machine-level if-conversion for predicated architectures

ABSTRACT

If-conversion is a simple yet powerful optimization that converts control dependences into data dependences. It allows branches to be eliminated and increases the available instruction-level parallelism, and thus overall performance. If-conversion can be applied alone or in combination with other techniques that increase the size of scheduling regions. Hardware support for predicated execution allows if-conversion to be applied broadly within a program, which makes it necessary to guide the optimization using heuristic estimates of its potential benefit. Like other transformations in an optimizing compiler, if-conversion inherently suffers from phase-ordering issues. Driven by these facts, we developed two algorithms for if-conversion targeting the TI TMS320C64x+ architecture within the LLVM framework. Each implementation targets a different level of code abstraction: one operates on the intermediate representation, the other on machine-level code. Both use an adapted set of estimation heuristics and prove successful in general, but each exhibits different strengths and weaknesses. High-level if-conversion, applied before other control-flow transformations, has more freedom to operate, but its runtime estimates are less accurate than those of its more restricted machine-level counterpart. Our experimental evaluation shows a mean speedup close to 14% for both algorithms on a set of programs from the MiBench and DSPstone benchmark suites. We compare the implemented optimizations and discuss the insights gained on if-conversion, phase-ordering issues, and profitability analysis.
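
To make the phrase "converts control dependences into data dependences" concrete, the following is a minimal source-level sketch in C. It is not taken from the paper: the function names `clamp_branch`, `clamp_ifconv`, and `check_equivalence` are hypothetical, and the ternary expression only stands in for what a predicated move or select would do on hardware with predication. A real compiler performs this transformation on its IR or on machine code, and may well compile both functions below to the same instructions.

```c
/* Branching form: the assignment to r is control-dependent on the
   comparison, so it sits in its own basic block behind a branch. */
int clamp_branch(int x, int limit) {
    int r = x;
    if (x > limit) {
        r = limit;
    }
    return r;
}

/* If-converted form (illustrative): the comparison produces a predicate p,
   and r is chosen by a select on p. The branch is gone; r now depends on p
   only through a data dependence, leaving one straight-line block. */
int clamp_ifconv(int x, int limit) {
    int p = (x > limit);      /* predicate definition */
    int r = p ? limit : x;    /* select guarded by the predicate */
    return r;
}

/* Hypothetical helper: returns 1 if both forms agree for every x in
   [lo, hi], and 0 otherwise. */
int check_equivalence(int lo, int hi, int limit) {
    for (int x = lo; x <= hi; ++x) {
        if (clamp_branch(x, limit) != clamp_ifconv(x, limit)) {
            return 0;
        }
    }
    return 1;
}
```

The trade-off the abstract alludes to is visible even here: the if-converted form always evaluates both candidate values, so it pays off only when the work done on the removed path is cheap relative to the cost of the branch. That is why a profitability estimate is needed.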
