Parallelization of IBM Mambo system simulator in functional modes

Published: 01 January 2008

Abstract

Mambo [4] is IBM's full-system simulator for PowerPC systems. It provides a complete set of simulation tools that help IBM and its partners with pre-hardware development and performance evaluation of future systems. Mambo currently simulates a target system on a single host thread, so its simulation performance per core drops as the number of cores in the target system grows. As the so-called "multi-core era" approaches, both target and host systems will have more and more cores, and it becomes very important for Mambo to simulate a multi-core target system efficiently on a multi-core host system. Parallelization is a natural way to speed up Mambo in this situation.

Parallel Mambo (P-Mambo) is a multi-threaded implementation of Mambo. Mambo's simulation engine is implemented as a user-level thread scheduler. We propose a multi-scheduler method to adapt this engine to multi-threaded execution. Based on this method, we propose a core-based module partition that achieves both high inter-scheduler parallelism and low inter-scheduler dependency. Protecting shared resources is crucial to both the correctness and the performance of P-Mambo. Because P-Mambo has two tiers of threads, OS-level and user-level, protecting shared resources with OS-level locks alone can introduce deadlocks when user-level context switches occur. We propose a new lock mechanism to handle this problem. Since Mambo is an ongoing project with many modules still under development, coexistence with new modules is also important to P-Mambo. We propose a global-lock-based method to guarantee the compatibility of P-Mambo with future Mambo modules.
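The deadlock hazard with two tiers of threads can be illustrated with a small sketch. This is not P-Mambo's lock mechanism, which the abstract does not describe; it is one common way to avoid the problem, under the assumption that user-level threads can be modeled as Python generators. The names `YieldingLock`, `run_scheduler`, and `simulated_core` are hypothetical. If a user-level thread holding an OS-level lock is switched out and another user-level thread on the same host thread then blocks on that lock, the host thread can never resume the holder. The sketch avoids this by trying the lock without blocking and yielding to the user-level scheduler on contention.

```python
import threading
from collections import deque


class YieldingLock:
    """Hypothetical lock that never blocks the host (OS) thread.

    On contention the calling user-level thread yields to its scheduler
    instead of blocking, so the lock holder can still be resumed.
    """

    def __init__(self):
        self._os_lock = threading.Lock()

    def acquire(self):
        # Generator: call as `yield from lock.acquire()` in a user-level thread.
        while not self._os_lock.acquire(blocking=False):
            yield  # hand control back to the user-level scheduler

    def release(self):
        self._os_lock.release()


def run_scheduler(tasks):
    """Round-robin user-level scheduler running on a single host thread."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task until its next yield
            ready.append(task)  # still alive: schedule it again
        except StopIteration:
            pass                # task finished


def simulated_core(name, lock, log):
    """Stand-in for a simulated core that touches a shared resource."""
    yield from lock.acquire()
    log.append(f"{name} enter")
    yield  # user-level context switch while the lock is held
    log.append(f"{name} leave")
    lock.release()


def demo():
    lock, log = YieldingLock(), []
    run_scheduler([
        simulated_core("core0", lock, log),
        simulated_core("core1", lock, log),
    ])
    return log


if __name__ == "__main__":
    print(demo())
```

With a blocking `acquire()` in place of the try-and-yield loop, `core1` would block the only host thread while `core0` still holds the lock, and the run would hang. In a multi-scheduler setting, each host thread would run its own `run_scheduler` loop over the cores assigned to it, sharing the same lock object.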

We have implemented the first version of P-Mambo in functional modes and evaluated its performance on the OpenMP implementation of the NAS Parallel Benchmarks (NPB) 3.2 [12]. Preliminary experimental results show that P-Mambo achieves an average speedup of 3.4 on a 4-core host machine.
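To put the reported figure in context, the speedup can be converted to a parallel efficiency. This is a derived number, not one stated in the abstract, and it assumes that all four host cores were used and that the speedup is measured against single-threaded Mambo:

```latex
E = \frac{S}{p} = \frac{3.4}{4} = 0.85
```

Here $S$ is the average speedup and $p$ is the number of host cores, so under these assumptions the host cores are used at roughly 85% efficiency on average.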

References

  1. L. R. Bachega, J. R. Brunheroto, L. DeRose, P. Mindlin, and J. E. Moreira. The BlueGene/L Pseudo Cycle-accurate Simulator. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2004.
  2. F. Bellard. QEMU, a Fast and Portable Dynamic Translator. USENIX 2005 Annual Technical Conference, FREENIX Track, 2005.
  3. N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset. Computer Architecture News (CAN), September 2005.
  4. P. Bohrer, M. Elnozahy, A. Gheith, C. Lefurgy, T. Nakra, J. Peterson, R. Rajamony, R. Rockhold, H. Shafi, R. Simpson, E. Speight, K. Sudeep, E. V. Hensbergen, and L. Zhang. Mambo -- A Full System Simulator for the PowerPC Architecture. ACM SIGMETRICS Performance Evaluation Review, 31(4):8--12, March 2004.
  5. D. Burger, T. M. Austin, and S. Bennett. Evaluating Future Microprocessors: The SimpleScalar Tool Set. Technical Report CS-TR-1996-1308, 1996.
  6. L. Ceze, K. Strauss, G. Almasi, P. J. Bohrer, J. R. Brunheroto, C. Cascaval, J. G. Castanos, D. Lieber, X. Martorell, J. E. Moreira, A. Sanomiya, and E. Schenfeld. Full Circle: Simulating Linux Clusters on Linux Clusters. In Proceedings of the Fourth LCI International Conference on Linux Clusters: The HPC Revolution 2003, June 2003.
  7. D. Chiou, D. Sunwoo, J. Kim, N. Patil, W. Reinhart, E. Johnson, J. Keefe, and H. Angepat. FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, 2007.
  8. IBM Corporation. The PowerPC Architecture: A Specification for a New Family of Processors. Morgan Kaufmann Publishers, Inc., 1994.
  9. K. Ebcioglu and E. R. Altman. DAISY: Dynamic Compilation for 100% Architectural Compatibility. In Proceedings of 24th Annual International Symposium on Computer Architecture, pages 26--37, 1997.
  10. P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hogberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A Full System Simulation Platform. IEEE Computer, 35(2):50--58, February 2002.
  11. N. Njoroge, J. Casper, S. Wee, Y. Teslyar, D. Ge, C. Kozyrakis, and K. Olukotun. ATLAS: A Chip-Multiprocessor with Transactional Memory Support. In Proceedings of the Conference on Design Automation and Test in Europe (DATE), 2007.
  12. NPB. NAS Parallel Benchmarks. http://www.nas.nasa.gov/Resources/Software/npb.html.
  13. M. Rosenblum, S. A. Herrod, E. Witchel, and A. Gupta. Complete Computer System Simulation: The SimOS Approach. IEEE Parallel and Distributed Technology: Systems and Applications, 3(4):34--43, Winter 1995.
  14. H. Shafi, P. J. Bohrer, J. Phelan, C. A. Rusu, and J. L. Peterson. Design and validation of a performance and power simulator for PowerPC systems. IBM Journal of Research and Development, 47(5--6):641--651, September 2003.
  15. The BlueGene/L Team. An Overview of the BlueGene/L Supercomputer. In Proceedings of ACM/IEEE Conference on Supercomputing, November 2002.
  16. S. Wee, J. Casper, N. Njoroge, Y. Teslyar, D. Ge, C. Kozyrakis, and K. Olukotun. A Practical FPGA-based Framework for Novel CMP Research.
  17. E. Witchel and M. Rosenblum. Embra: Fast and Flexible Machine Simulation. In Proceedings of ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, 1996.
