Abstract
Short-vector SIMD and DSP instructions are popular extensions to common ISAs. These extensions deliver excellent performance and compact code for some compute-intensive applications, but they require specialized compiler support. To let the programmer explicitly request such an instruction, many C compilers provide platform-specific intrinsic functions, whose implementation is handled specially by the compiler. The use of such intrinsics, however, inevitably results in nonportable code. In this article, we develop a novel methodology for retargeting such nonportable code, which maps intrinsics from one platform to another, taking advantage of similar intrinsics on the target platform. We employ a description language to specify the signature and semantics of intrinsics, and we perform graph-based pattern matching and high-level code transformations to derive optimized implementations that exploit the target’s intrinsics wherever possible. We demonstrate the effectiveness of our new methodology, implemented in the Free Rider tool, by automatically retargeting benchmarks derived from OpenCV samples and a complex embedded application optimized to run on an ARM Cortex-M4 to an Intel Edison module with SSE4.2 instructions (and vice versa). We achieve a speedup of up to 3.73× over a plain C baseline and, on average, 96.0% of the speedup of manually ported and optimized versions of the benchmarks.
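To make the idea of mapping an intrinsic from one platform to another concrete, the following is a minimal sketch and not output of the Free Rider tool. It takes the Cortex-M4 `__QADD16` intrinsic (a dual 16-bit signed saturating add on a packed 32-bit word) as the source intrinsic, gives a plain C reference for its semantics, and shows one possible retargeting onto the SSE2 intrinsic `_mm_adds_epi16`. The function names `sat16`, `qadd16_ref`, and `qadd16_retargeted` are hypothetical and chosen only for illustration; the choice of this intrinsic pair is an assumption, not an example taken from the article.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_adds_epi16, _mm_cvtsi32_si128, _mm_cvtsi128_si32 */
#endif

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Plain C reference for the semantics of __QADD16: two independent
 * signed saturating 16-bit additions packed into one 32-bit word.
 * Assumes two's complement narrowing conversions, as on common compilers. */
static uint32_t qadd16_ref(uint32_t a, uint32_t b)
{
    int16_t a_lo = (int16_t)(uint16_t)(a & 0xFFFFu);
    int16_t b_lo = (int16_t)(uint16_t)(b & 0xFFFFu);
    int16_t a_hi = (int16_t)(uint16_t)(a >> 16);
    int16_t b_hi = (int16_t)(uint16_t)(b >> 16);

    int16_t lo = sat16((int32_t)a_lo + (int32_t)b_lo);
    int16_t hi = sat16((int32_t)a_hi + (int32_t)b_hi);

    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Hypothetical retargeted version: where SSE2 is available, reuse the
 * target's own saturating add; otherwise fall back to the plain C reference. */
static uint32_t qadd16_retargeted(uint32_t a, uint32_t b)
{
#if defined(__SSE2__)
    __m128i va = _mm_cvtsi32_si128((int)a);  /* a in the low 32 bits, rest zero */
    __m128i vb = _mm_cvtsi32_si128((int)b);
    __m128i sum = _mm_adds_epi16(va, vb);    /* lane-wise signed saturating add */
    return (uint32_t)_mm_cvtsi128_si32(sum); /* low two 16-bit lanes */
#else
    return qadd16_ref(a, b);
#endif
}
```

This sketch uses only two of the eight 16-bit lanes of the SSE register per call. A retargeting that also restructures the surrounding code could in principle fill more lanes, which is the kind of opportunity the high-level code transformations mentioned above are meant to address.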
Free Rider: A Source-Level Transformation Tool for Retargeting Platform-Specific Intrinsic Functions