Abstract
Short-vector SIMD and DSP instructions are popular extensions to common ISAs. These extensions deliver excellent performance and compact code for some compute-intensive applications, but they require specialised compiler support. To let the programmer explicitly request such an instruction, many C compilers provide platform-specific intrinsic functions, whose implementation is handled specially by the compiler. The use of such intrinsics, however, inevitably results in non-portable code. In this paper we develop a novel methodology for retargeting such non-portable code, which maps intrinsics from one platform to another, taking advantage of similar intrinsics on the target platform. We employ a description language to specify the signature and semantics of intrinsics, and we perform graph-based pattern matching and high-level code transformations to derive optimised implementations that exploit the target's intrinsics wherever possible. We demonstrate the effectiveness of our new methodology, implemented in the FREE RIDER tool, by automatically retargeting benchmarks derived from OpenCV samples and a complex embedded application optimised to run on an ARM Cortex-M4 to an Intel Edison module with SSE4.2 instructions. We achieve a speedup of up to 3.73× over a plain C baseline, and on average 96.0% of the speedup of manually ported and optimised versions of the benchmarks.
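As a rough illustration of what retargeting an intrinsic involves, the sketch below takes the Cortex-M4 `__QADD16` intrinsic (two independent saturating 16-bit additions packed into one 32-bit word) and expresses it both as a plain C reference and in terms of the SSE2 intrinsic `_mm_adds_epi16`. This is a hand-written example under stated assumptions, not output of the FREE RIDER tool; the function names `qadd16_ref`, `qadd16_retargeted` and `qadd16_self_check` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Clamp a 32-bit intermediate result to the int16_t range. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Plain C reference for the semantics of __QADD16 (hypothetical helper):
 * the low and high halfwords are added independently with signed saturation. */
static uint32_t qadd16_ref(uint32_t a, uint32_t b) {
    int16_t lo = sat16((int32_t)(int16_t)(a & 0xFFFFu) + (int16_t)(b & 0xFFFFu));
    int16_t hi = sat16((int32_t)(int16_t)(a >> 16) + (int16_t)(b >> 16));
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Hypothetical retargeted version: uses the SSE2 saturating add on the two
 * low 16-bit lanes when SSE2 is available, otherwise falls back to plain C. */
static uint32_t qadd16_retargeted(uint32_t a, uint32_t b) {
#if defined(__SSE2__)
    __m128i va = _mm_cvtsi32_si128((int)a);   /* a in the low 32 bits */
    __m128i vb = _mm_cvtsi32_si128((int)b);   /* b in the low 32 bits */
    return (uint32_t)_mm_cvtsi128_si32(_mm_adds_epi16(va, vb));
#else
    return qadd16_ref(a, b);
#endif
}

/* Check that both versions agree on a few saturating and non-saturating inputs. */
static void qadd16_self_check(void) {
    const uint32_t in[][2] = {
        {0x7FFF0001u, 0x00010001u},  /* high lane saturates upwards */
        {0x8000FFFFu, 0xFFFF0001u},  /* high lane saturates downwards */
        {0x00030004u, 0x00050006u},  /* no saturation */
    };
    for (unsigned i = 0; i < sizeof in / sizeof in[0]; ++i) {
        assert(qadd16_retargeted(in[i][0], in[i][1]) == qadd16_ref(in[i][0], in[i][1]));
    }
}
```

Calling `qadd16_self_check()` compares the two versions on a handful of inputs. The wider SSE register means only two of the eight 16-bit lanes do useful work here, which hints at why a one-to-one mapping of intrinsics may not always be the best-performing translation.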
Free Rider: A Tool for Retargeting Platform-Specific Intrinsic Functions
LCTES'15: Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems 2015 CD-ROM