Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation

Published: 08 April 2017

Abstract

Computer systems increasingly feature powerful parallel devices with the advent of many-core CPUs and GPUs. This offers the opportunity to solve computationally intensive problems in a fraction of the time traditional CPUs need. However, exploiting heterogeneous hardware requires the use of low-level programming approaches such as OpenCL, which is incredibly challenging, even for advanced programmers.

On the application side, interpreted dynamic languages are becoming increasingly popular in many domains due to their simplicity, expressiveness and flexibility. However, this creates a wide gap between the high-level abstractions offered to programmers and the low-level hardware-specific interface. Currently, programmers must rely on high-performance libraries or are forced to write parts of their application in a low-level language like OpenCL. Ideally, non-expert programmers should be able to exploit heterogeneous hardware directly from their interpreted dynamic languages.

In this paper, we present a technique to transparently and automatically offload computations from interpreted dynamic languages to heterogeneous devices. Using just-in-time compilation, we automatically generate OpenCL code at runtime that is specialized to the actual observed data types using profiling information. We demonstrate our technique using R, a popular interpreted dynamic language predominantly used in big data analytics. Our experimental results show that execution on a GPU yields speedups of over 150x compared to the sequential FastR implementation and that the obtained performance is competitive with manually written GPU code. We also show that, when taking start-up time into account, large speedups are achievable even when the applications run for as little as a few seconds.
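
The idea of generating OpenCL code specialized to observed data types can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`profile_element_type`, `generate_map_kernel`, `compile_for`) and the type mapping are hypothetical, and the sketch builds kernel source from a string template rather than from a compiler intermediate representation. It only shows the general pattern: observe the element type at runtime, emit a kernel for exactly that type, and fall back to the interpreter when the observation is not usable.

```python
# Illustrative sketch only; all names here are hypothetical, not from the paper.

# Assumed mapping from observed Python element types to OpenCL C scalar types.
OPENCL_TYPES = {int: "int", float: "double"}

# OpenCL 1.x requires this extension to be enabled before using double.
PRELUDE = {"double": "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"}


def profile_element_type(values):
    """Return the single element type observed in `values`, or None if mixed/empty."""
    observed = {type(v) for v in values}
    return observed.pop() if len(observed) == 1 else None


def generate_map_kernel(name, body_expr, elem_type):
    """Emit OpenCL C source for an element-wise map specialized to `elem_type`."""
    cl_type = OPENCL_TYPES[elem_type]
    return (
        PRELUDE.get(cl_type, "")
        + f"__kernel void {name}(__global const {cl_type}* in,\n"
        f"                     __global {cl_type}* out) {{\n"
        f"    size_t i = get_global_id(0);\n"
        f"    {cl_type} x = in[i];\n"
        f"    out[i] = {body_expr};\n"
        f"}}\n"
    )


def compile_for(values, name, body_expr):
    """Specialize when profiling saw one supported type; otherwise return None."""
    elem_type = profile_element_type(values)
    if elem_type not in OPENCL_TYPES:
        return None  # the caller would keep running in the interpreter
    return generate_map_kernel(name, body_expr, elem_type)


if __name__ == "__main__":
    print(compile_for([1.0, 2.5, 4.0], "square", "x * x"))
```

In this sketch a vector of doubles yields a `double` kernel, a vector of integers yields an `int` kernel, and a mixed vector yields no kernel at all, which stands in for staying on the interpreted path.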


  • Published in

    ACM SIGPLAN Notices, Volume 52, Issue 7
    VEE '17
    July 2017
    256 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/3140607

    VEE '17: Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
    April 2017
    261 pages
    ISBN: 9781450349482
    DOI: 10.1145/3050748

    Copyright © 2017 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • tutorial
    • Research
    • Refereed limited
