research-article

Free Rider: A Source-Level Transformation Tool for Retargeting Platform-Specific Intrinsic Functions

Published: 12 December 2016

Abstract

Short-vector SIMD and DSP instructions are popular extensions to common ISAs. These extensions deliver excellent performance and compact code for some compute-intensive applications, but they require specialized compiler support. To let the programmer explicitly request such an instruction, many C compilers provide platform-specific intrinsic functions, whose implementation the compiler handles specially. The use of such intrinsics, however, inevitably results in nonportable code. In this article, we develop a novel methodology for retargeting such nonportable code: it maps intrinsics from one platform to another, taking advantage of similar intrinsics on the target platform. We employ a description language to specify the signature and semantics of intrinsics, and we perform graph-based pattern matching and high-level code transformations to derive optimized implementations that exploit the target's intrinsics wherever possible. We demonstrate the effectiveness of our new methodology, implemented in the Free Rider tool, by automatically retargeting benchmarks derived from OpenCV samples, and a complex embedded application optimized to run on an ARM Cortex-M4, to an Intel Edison module with SSE4.2 instructions (and vice versa). We achieve a speedup of up to 3.73× over a plain C baseline and, on average, 96.0% of the speedup of manually ported and optimized versions of the benchmarks.
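To make the retargeting problem concrete, here is a minimal hand-written sketch in C. It is not Free Rider's description language or its generated output. It assumes the source intrinsic is a dual 16-bit signed saturating add, as exposed on the Cortex-M4 through CMSIS as `__QADD16`; the function names `qadd16_ref` and `qadd16_retargeted` are hypothetical. `qadd16_ref` spells out the intrinsic's semantics in plain C, which is the kind of information a semantic description of an intrinsic has to capture. `qadd16_retargeted` maps the same operation onto the SSE2 intrinsic `_mm_adds_epi16` by placing the two halfwords in the low lanes of a 128-bit register, and falls back to the plain-C model when SSE2 is unavailable.

```c
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h> /* SSE2: _mm_cvtsi32_si128, _mm_adds_epi16, _mm_cvtsi128_si32 */
#endif

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int32_t sat_s16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return v;
}

/* Sign-extend the low 16 bits of x without relying on an
   implementation-defined narrowing conversion. */
static int32_t sext16(uint32_t x) {
    int32_t v = (int32_t)(x & 0xFFFFu);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Hypothetical plain-C model of a QADD16-style intrinsic: two independent
   signed saturating 16-bit additions on the halfwords packed into each
   32-bit operand. */
static uint32_t qadd16_ref(uint32_t a, uint32_t b) {
    int32_t lo = sat_s16(sext16(a) + sext16(b));
    int32_t hi = sat_s16(sext16(a >> 16) + sext16(b >> 16));
    return (((uint32_t)hi & 0xFFFFu) << 16) | ((uint32_t)lo & 0xFFFFu);
}

/* Hypothetical retargeted version for an x86 target with SSE2. */
static uint32_t qadd16_retargeted(uint32_t a, uint32_t b) {
#ifdef __SSE2__
    /* Load each 32-bit operand into the low 32 bits of a vector (the other
       lanes are zero), so 16-bit lanes 0 and 1 hold the low and high halfwords. */
    __m128i va = _mm_cvtsi32_si128((int)a);
    __m128i vb = _mm_cvtsi32_si128((int)b);
    /* Lane-wise signed saturating 16-bit add; the zeroed upper lanes stay zero. */
    return (uint32_t)_mm_cvtsi128_si32(_mm_adds_epi16(va, vb));
#else
    /* No matching target intrinsic: fall back to the plain-C semantics. */
    return qadd16_ref(a, b);
#endif
}
```

For example, `qadd16_retargeted(0x7FFF0001u, 0x00010001u)` should agree with `qadd16_ref` and return `0x7FFF0002u`, because the upper halfword saturates at `0x7FFF` while the lower one simply adds. This one-call substitution leaves six of the eight SSE lanes unused; exploiting the full vector width would require combining several source-level calls, which a per-call substitution like this cannot do on its own.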
