research-article

COREMU: a scalable and portable parallel full-system emulator

Published: 12 February 2011
Abstract

This paper presents the open-source COREMU, a scalable and portable parallel emulation framework that decouples the complexity of parallelizing full-system emulators from that of building a mature sequential one. The key observation is that CPU cores and devices in current (and likely future) multiprocessors are loosely coupled and communicate through well-defined interfaces. Based on this observation, COREMU emulates multiple cores by creating multiple instances of existing sequential emulators, and uses a thin library layer to handle inter-core and device communication and synchronization, thereby maintaining a consistent view of system resources. COREMU also incorporates lightweight memory transactions, feedback-directed scheduling, lazy code invalidation, and adaptive signal control to provide scalable performance. To make COREMU useful in practice, we also provide some preliminary tools and APIs that help programmers diagnose performance problems and (concurrency) bugs. A working prototype, which reuses the widely used QEMU as the sequential emulator, requires changes to only 2500 lines of code (LOC) in QEMU. It currently supports x64 and ARM platforms, and can emulate up to 255 cores running commodity OSes with practical performance, while QEMU cannot scale above 32 cores. A set of performance evaluations against QEMU indicates that COREMU has negligible uniprocessor emulation overhead, and that it performs and scales significantly better than QEMU. We also show how COREMU can be used to diagnose performance problems and concurrency bugs in both OS kernels and parallel applications.
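The structure described in the abstract, one sequential emulator instance per emulated core plus a thin layer for inter-core communication, can be pictured with a minimal sketch. This is an illustration under stated assumptions, not COREMU's implementation: the names `CoreInstance`, `ThinLayer`, `send_interrupt`, `drain`, and `emulate` are hypothetical, each "emulator" is reduced to a counter, and the thin layer is modeled as one mailbox per core. Python threads are used only to show the shape of the design; they do not provide the parallel speedup a native implementation would.

```python
import queue
import threading


class CoreInstance:
    """Hypothetical stand-in for one sequential emulator instance.

    All state here is private to the thread that runs this core.
    """

    def __init__(self, core_id):
        self.core_id = core_id
        self.executed = 0    # stand-in for private emulator state
        self.received = []   # interrupt vectors delivered to this core

    def step(self):
        # Stand-in for executing one block of guest code.
        self.executed += 1


class ThinLayer:
    """Hypothetical communication layer: one thread-safe mailbox per core."""

    def __init__(self, n_cores):
        self.mailboxes = [queue.Queue() for _ in range(n_cores)]

    def send_interrupt(self, target, vector):
        self.mailboxes[target].put(vector)

    def drain(self, core_id):
        # Collect every interrupt currently pending for this core.
        pending = []
        while True:
            try:
                pending.append(self.mailboxes[core_id].get_nowait())
            except queue.Empty:
                return pending


def run_core(core, layer, barrier, n_cores, steps):
    # Each core signals its neighbour once, then runs independently.
    layer.send_interrupt((core.core_id + 1) % n_cores, vector=core.core_id)
    for _ in range(steps):
        core.step()
        core.received.extend(layer.drain(core.core_id))
    barrier.wait()  # after this point every core has sent its interrupt
    core.received.extend(layer.drain(core.core_id))


def emulate(n_cores=4, steps=100):
    layer = ThinLayer(n_cores)
    barrier = threading.Barrier(n_cores)
    cores = [CoreInstance(i) for i in range(n_cores)]
    threads = [
        threading.Thread(target=run_core,
                         args=(core, layer, barrier, n_cores, steps))
        for core in cores
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cores
```

In this sketch the only shared state is the set of mailboxes, which mirrors the observation that cores are loosely coupled and interact through well-defined interfaces. Mechanisms the abstract names but does not detail, such as lightweight memory transactions and lazy code invalidation, are deliberately left out.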

      • Published in

        ACM SIGPLAN Notices  Volume 46, Issue 8
        PPoPP '11
        August 2011
        300 pages
        ISSN:0362-1340
        EISSN:1558-1160
        DOI:10.1145/2038037
          PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
          February 2011
          326 pages
          ISBN:9781450301190
          DOI:10.1145/1941553
          • General Chair: Calin Cascaval
          • Program Chair: Pen-Chung Yew

        Copyright © 2011 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States
