Abstract
Heterogeneous multi-core processors, such as the IBM Cell processor, can deliver high performance. However, these processors are notoriously difficult to program: different cores support different instruction set architectures, and the processor as a whole does not provide coherence between the different cores' local memories.
We present Hera-JVM, an implementation of the Java Virtual Machine that runs on the Cell processor, thereby making this platform more readily accessible to mainstream developers. Hera-JVM supports the full Java language: threads from an unmodified Java application can execute simultaneously on both the main PowerPC-based core and the additional SPE accelerator cores. Migration of threads between these cores is transparent to the application, requiring no modification to Java source code or bytecode. Hera-JVM supports the existing Java Memory Model, even though the underlying hardware does not provide cache coherence between the different core types.
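Supporting the Java Memory Model without hardware coherence means the runtime must decide when a core's private copy of data is written back or discarded. The following is a minimal sketch, assuming a write-back software cache per core that is invalidated when a thread acquires a monitor and flushed to main memory when it releases one. The names `SoftwareCacheDemo`, `MainMemory`, `CoreCache`, `acquire` and `release` are hypothetical; this models the idea in plain Java and is not Hera-JVM's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of per-core software caches over non-coherent memory.
 * Assumption: invalidate on monitor enter (acquire), write back on monitor
 * exit (release). Not Hera-JVM's code; single-threaded simulation only.
 */
public class SoftwareCacheDemo {

    /** Shared main memory, modelled as an array of words indexed by address. */
    static final class MainMemory {
        final int[] words;
        MainMemory(int size) { words = new int[size]; }
    }

    /** A core-private cache; nothing keeps it coherent with other cores. */
    static final class CoreCache {
        private final MainMemory mem;
        private final Map<Integer, Integer> cached = new HashMap<>();
        private final Map<Integer, Integer> dirty = new HashMap<>();

        CoreCache(MainMemory mem) { this.mem = mem; }

        int read(int addr) {
            Integer own = dirty.get(addr);
            if (own != null) return own;          // this core's unpublished write
            // Miss: copy the word in from main memory (stands in for a DMA transfer).
            return cached.computeIfAbsent(addr, a -> mem.words[a]);
        }

        void write(int addr, int value) {
            dirty.put(addr, value);               // buffered locally until release
        }

        /** Monitor enter: discard copies that may be stale. */
        void acquire() {
            cached.clear();
        }

        /** Monitor exit: publish buffered writes to main memory. */
        void release() {
            for (Map.Entry<Integer, Integer> e : dirty.entrySet()) {
                mem.words[e.getKey()] = e.getValue();
            }
            cached.putAll(dirty);
            dirty.clear();
        }
    }

    public static void main(String[] args) {
        MainMemory mem = new MainMemory(4);
        CoreCache ppe = new CoreCache(mem);
        CoreCache spe = new CoreCache(mem);

        System.out.println(spe.read(0));   // SPE caches the initial value
        ppe.write(0, 42);
        ppe.release();                     // PPE thread leaves a synchronized block
        System.out.println(spe.read(0));   // SPE still sees its stale copy
        spe.acquire();                     // SPE thread enters the same monitor
        System.out.println(spe.read(0));   // refetched after invalidation
    }
}
```

Running `main` prints 0, 0 and 42: the SPE keeps reading its stale copy until it performs an acquire, which is exactly the visibility point a properly synchronized Java program relies on. A practical implementation would more likely move whole blocks with DMA transfers rather than single words.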
We examine Hera-JVM's performance under a series of real-world Java benchmarks from the SPECjvm, Java Grande and DaCapo benchmark suites. These benchmarks show a wide variation in relative performance on the different core types of the Cell processor, depending upon the nature of their workload. Running these benchmarks on Hera-JVM achieves speedups of up to 2.25x on a single SPE accelerator core, compared with execution on the main PowerPC-based core. When all six SPE cores are exploited, parallel workloads achieve speedups of up to 13x compared with execution on the single PowerPC core.
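The two headline figures can be cross-checked with a little arithmetic. Write S_n for a workload's speedup on n SPE cores relative to the single PowerPC core. Assuming no superlinear scaling (six SPEs together do at most six times the work of one) and that the PowerPC core contributes no work in the six-SPE run, a 13x result requires that workload's single-SPE speedup to be at least 13/6, just below the reported 2.25x maximum:

```latex
\begin{aligned}
S_n &= \frac{T_{\mathrm{PPE}}}{T_{n\,\mathrm{SPE}}}, \\
S_6 \le 6\,S_1 \;&\Rightarrow\; S_1 \ge \frac{S_6}{6} = \frac{13}{6} \approx 2.17 \le 2.25 .
\end{aligned}
```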
Hera-JVM: a runtime system for heterogeneous multi-core architectures
OOPSLA '10: Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications