Free Rider: A Tool for Retargeting Platform-Specific Intrinsic Functions

Published: 04 June 2015
Abstract

Short-vector SIMD and DSP instructions are popular extensions to common ISAs. These extensions deliver excellent performance and compact code for some compute-intensive applications, but they require specialised compiler support. To enable the programmer to explicitly request the use of such an instruction, many C compilers provide platform-specific intrinsic functions, whose implementation is handled specially by the compiler. The use of such intrinsics, however, inevitably results in non-portable code. In this paper, we develop a novel methodology for retargeting such non-portable code, which maps intrinsics from one platform to another, taking advantage of similar intrinsics on the target platform. We employ a description language to specify the signature and semantics of intrinsics, and we perform graph-based pattern matching and high-level code transformations to derive optimised implementations that exploit the target's intrinsics wherever possible. We demonstrate the effectiveness of our new methodology, implemented in the FREE RIDER tool, by automatically retargeting benchmarks derived from OpenCV samples and a complex embedded application optimised to run on an ARM Cortex-M4 to an Intel Edison module with SSE4.2 instructions. We achieve a speedup of up to 3.73 over a plain C baseline, and on average 96.0% of the speedup of manually ported and optimised versions of the benchmarks.
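To make the idea of mapping an intrinsic from one platform to another concrete, the following hand-written sketch shows what a retargeted intrinsic could look like. It is not output of the FREE RIDER tool and does not use its description language. It takes the CMSIS intrinsic `__QADD16` for the Cortex-M4 (two signed 16-bit saturating additions packed in one 32-bit word) and expresses it with the SSE2 intrinsic `_mm_adds_epi16`, which is also available on SSE4.2 targets, with a plain C fallback for other platforms. The names `sat16`, `qadd16_plain`, `qadd16_sse`, `retargeted_qadd16` and `qadd16_agrees` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics, a subset of SSE4.2 */
#endif

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Plain C version: add the two 16-bit halves separately with saturation.
   The narrowing casts assume the usual two's-complement behaviour. */
static uint32_t qadd16_plain(uint32_t a, uint32_t b) {
    int16_t lo = sat16((int32_t)(int16_t)(a & 0xFFFFu) +
                       (int32_t)(int16_t)(b & 0xFFFFu));
    int16_t hi = sat16((int32_t)(int16_t)(a >> 16) +
                       (int32_t)(int16_t)(b >> 16));
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

#if defined(__SSE2__)
/* SSE version: place each operand in the low 32 bits of a vector register,
   do a saturating signed 16-bit add on all lanes, read back the low 32 bits.
   The unused upper lanes hold zeros and do not affect the result. */
static uint32_t qadd16_sse(uint32_t a, uint32_t b) {
    __m128i va = _mm_cvtsi32_si128((int)a);
    __m128i vb = _mm_cvtsi32_si128((int)b);
    return (uint32_t)_mm_cvtsi128_si32(_mm_adds_epi16(va, vb));
}
#endif

/* Hypothetical drop-in replacement for __QADD16 on the target platform. */
static uint32_t retargeted_qadd16(uint32_t a, uint32_t b) {
#if defined(__SSE2__)
    return qadd16_sse(a, b);
#else
    return qadd16_plain(a, b);
#endif
}

/* Small self-check: the retargeted version must agree with plain C. */
static int qadd16_agrees(uint32_t a, uint32_t b) {
    uint32_t expected = qadd16_plain(a, b);
    assert(retargeted_qadd16(a, b) == expected);
    return retargeted_qadd16(a, b) == expected;
}
```

Using a single scalar-sized operand wastes most of the 128-bit register. A real port would presumably try to combine several such calls into one vector operation, which is the kind of optimisation the abstract attributes to pattern matching and high-level code transformations.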

    • Published in

      ACM SIGPLAN Notices, Volume 50, Issue 5
      LCTES '15
      May 2015
      141 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/2808704
      • Editor: Andy Gill
      • LCTES'15: Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2015 CD-ROM
        June 2015
        149 pages
        ISBN: 9781450332576
        DOI: 10.1145/2670529

      Copyright © 2015 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 4 June 2015

      Qualifiers

      • tutorial
      • Research
      • Refereed limited
