research-article
Open Access
Artifacts Evaluated & Functional

Design, implementation, and application of GPU-based Java bytecode interpreters

Published: 10 October 2019

Abstract

We present the design and implementation of GVM, the first system for executing Java bytecode entirely on GPUs. GVM is best suited to applications that execute a large number of short-lived tasks that share a significant fraction of their codebase and have similar execution times. GVM uses novel algorithms, scheduling, and data layout techniques to adapt to the massively parallel programming and execution model of GPUs. We apply GVM to generate and execute tests for Java projects. First, we implement sequence-based test generation on top of GVM and design novel algorithms to avoid redundant test sequences. Second, we use GVM to execute randomly generated test cases. We evaluate GVM by comparing it with two existing Java bytecode interpreters (Oracle JVM and Java Pathfinder), as well as with the Oracle JVM with its just-in-time (JIT) compiler, which has been engineered and optimized for over twenty years. Our evaluation shows that sequence-based test generation on GVM outperforms both Java Pathfinder and the Oracle JVM interpreter. Additionally, our results show that GVM performs as well as our parallel sequence-based test generation algorithm running on the JVM with JIT and many CPU threads. Furthermore, our evaluation on several classes from open-source projects shows that executing randomly generated tests on GVM outperforms sequential execution on both the JVM interpreter and the JVM with JIT.
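To make the idea of sequence-based test generation with redundancy avoidance concrete, the following is a minimal CPU-only sketch in plain Java. It is not GVM's algorithm, which the abstract does not detail. It assumes one common pruning strategy, state matching: a method-call sequence is kept and extended only if it drives the object under test into a state not seen before. The class `BoundedStack`, its `state()` method, the operation names in `OPS`, and the length bound are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SequenceGenSketch {
    // Hypothetical class under test: a stack of ints with capacity 2.
    static final class BoundedStack {
        private final int[] items = new int[2];
        private int size = 0;

        void push(int v) { if (size < items.length) items[size++] = v; }
        void pop() { if (size > 0) size--; }

        // Abstract state used for redundancy checks: live elements only.
        String state() {
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < size; i++) sb.append(items[i]).append(',');
            return sb.append(']').toString();
        }
    }

    // Hypothetical pool of method calls from which sequences are built.
    static final String[] OPS = {"push(0)", "push(1)", "pop()"};

    // Executes a sequence on a fresh object and returns the resulting object.
    static BoundedStack replay(List<String> seq) {
        BoundedStack s = new BoundedStack();
        for (String op : seq) {
            switch (op) {
                case "push(0)": s.push(0); break;
                case "push(1)": s.push(1); break;
                default: s.pop();
            }
        }
        return s;
    }

    // Breadth-first generation of sequences up to maxLen calls.
    // A sequence is kept (and later extended) only if it reaches a new state.
    static List<List<String>> generate(int maxLen) {
        List<List<String>> kept = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<List<String>> frontier = new ArrayDeque<>();
        frontier.add(new ArrayList<>());
        seen.add(replay(new ArrayList<>()).state()); // state of the empty sequence
        while (!frontier.isEmpty()) {
            List<String> prefix = frontier.poll();
            if (prefix.size() == maxLen) continue;
            for (String op : OPS) {
                List<String> next = new ArrayList<>(prefix);
                next.add(op);
                if (seen.add(replay(next).state())) { // true only for an unseen state
                    kept.add(next);
                    frontier.add(next);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> tests = generate(3);
        System.out.println(tests.size() + " non-redundant sequences");
        for (List<String> t : tests) System.out.println(t);
    }
}
```

With three operations and a bound of three calls there are 3 + 9 + 27 = 39 candidate sequences, but this sketch keeps only those that reach a new stack state. Each kept sequence is an independent, short task over the same code, which is the kind of workload the abstract describes as a good fit for GVM.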

