Hera-JVM: a runtime system for heterogeneous multi-core architectures

Published: 17 October 2010

Abstract

Heterogeneous multi-core processors, such as the IBM Cell processor, can deliver high performance. However, these processors are notoriously difficult to program: different cores support different instruction set architectures, and the processor as a whole does not provide coherence between the different cores' local memories.

We present Hera-JVM, an implementation of the Java Virtual Machine which operates over the Cell processor, thereby making this platform more readily accessible to mainstream developers. Hera-JVM supports the full Java language; threads from an unmodified Java application can be simultaneously executed on both the main PowerPC-based core and on the additional SPE accelerator cores. Migration of threads between these cores is transparent from the point of view of the application, requiring no modification to Java source code or bytecode. Hera-JVM supports the existing Java Memory Model, even though the underlying hardware does not provide cache coherence between the different core types.
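
As an illustration of what "unmodified Java application" means here, the following is a minimal sketch of an ordinary multi-threaded Java program. Nothing in it is specific to Hera-JVM: the class name `ParallelSum` and the workload are hypothetical, and only standard `java.lang` and `java.util.concurrent` classes are used. Under the model the abstract describes, a runtime could place threads like these on either core type without any change to this source; where each thread actually runs is a decision of the runtime and is not visible in the code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example workload: plain Java threads, no platform-specific code.
public class ParallelSum {

    // Sums the integers 1..n using the given number of worker threads.
    static long parallelSum(final int n, final int threads) {
        final AtomicLong total = new AtomicLong();
        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    long local = 0;
                    // Worker `id` handles id+1, id+1+threads, id+1+2*threads, ...
                    for (int i = id + 1; i <= n; i += threads) {
                        local += i;
                    }
                    // One atomic update per worker publishes its partial sum.
                    total.addAndGet(local);
                }
            });
            workers[t].start();
        }

        for (Thread w : workers) {
            try {
                // join() guarantees the worker's writes are visible afterwards.
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        // Six workers, chosen only to echo the six SPE cores the abstract mentions.
        System.out.println(parallelSum(1000000, 6));
    }
}
```

The program relies only on the guarantees of the Java Memory Model (`Thread.join` and atomic updates), which is the contract the abstract says Hera-JVM preserves despite the lack of hardware cache coherence.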

We examine Hera-JVM's performance under a series of real-world Java benchmarks from the SPECjvm, Java Grande and DaCapo benchmark suites. These benchmarks show a wide variation in relative performance on the different core types of the Cell processor, depending upon the nature of their workload. Execution of these benchmarks on Hera-JVM can achieve speedups of up to 2.25x by using one of the Cell processor's SPE accelerator cores, compared to execution on the main PowerPC-based core. When all six SPE cores are exploited, parallel workloads can achieve speedups of up to 13x compared to execution on the single PowerPC core.
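
To put the two headline figures side by side, the short sketch below works through the arithmetic. It is only an indicative calculation: the class and method names are hypothetical, and it assumes the 2.25x and 13x figures can be compared directly, which the abstract does not state (both are "up to" maxima and may come from different benchmarks).

```java
// Hypothetical helper for relating the abstract's two speedup figures.
public class SpeedupCheck {

    // Average speedup per core when `cores` cores together give `totalSpeedup`.
    static double perCoreSpeedup(double totalSpeedup, int cores) {
        return totalSpeedup / cores;
    }

    // Fraction of ideal linear scaling achieved, taking `singleCoreSpeedup`
    // as the speedup of one accelerator core over the baseline core.
    static double scalingEfficiency(double totalSpeedup, int cores,
                                    double singleCoreSpeedup) {
        return totalSpeedup / (cores * singleCoreSpeedup);
    }

    public static void main(String[] args) {
        double single = 2.25; // best single-SPE speedup quoted in the abstract
        double all = 13.0;    // best six-SPE speedup quoted in the abstract
        int spes = 6;

        System.out.printf("per-SPE average: %.2fx%n", perCoreSpeedup(all, spes));
        System.out.printf("vs. linear scaling of %.2fx: %.1f%%%n",
                single, 100.0 * scalingEfficiency(all, spes, single));
    }
}
```

Under that assumption, 13x across six SPE cores averages roughly 2.17x per core, about 96% of what perfectly linear scaling from the best single-SPE figure (6 × 2.25 = 13.5x) would give.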

