Abstract
This paper presents the open-source COREMU, a scalable and portable parallel emulation framework that decouples the complexity of parallelizing full-system emulators from that of building a mature sequential one. The key observation is that CPU cores and devices in current (and likely future) multiprocessors are loosely coupled and communicate through well-defined interfaces. Based on this observation, COREMU emulates multiple cores by creating multiple instances of existing sequential emulators, and uses a thin library layer to handle inter-core and device communication and synchronization so as to maintain a consistent view of system resources. COREMU also incorporates lightweight memory transactions, feedback-directed scheduling, lazy code invalidation, and adaptive signal control to provide scalable performance. To make COREMU useful in practice, we also provide preliminary tools and APIs that help programmers diagnose performance problems and (concurrency) bugs. A working prototype, which reuses the widely used QEMU as the sequential emulator, requires only 2500 lines of code (LOC) changes to QEMU. It currently supports x64 and ARM platforms and can emulate up to 255 cores running commodity OSes with practical performance, while QEMU cannot scale above 32 cores. A set of performance evaluations against QEMU indicates that COREMU has negligible uniprocessor emulation overhead and performs and scales significantly better than QEMU. We also show how COREMU can be used to diagnose performance problems and concurrency bugs in both OS kernels and parallel applications.
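The structure described in the abstract, one sequential emulator instance per emulated core plus a thin layer for inter-core communication, can be pictured with a minimal sketch. This is an illustration under stated assumptions, not COREMU's actual code: the names `EmulatedCore`, `mailbox`, and `run_cores` are hypothetical, each "emulator instance" is reduced to a host thread, and the communication layer is reduced to one thread-safe queue per core.

```python
import queue
import threading


class EmulatedCore:
    """Hypothetical stand-in for one sequential emulator instance."""

    def __init__(self, core_id):
        self.core_id = core_id
        # Assumed "thin layer": a thread-safe mailbox for inter-core messages.
        self.mailbox = queue.Queue()
        self.received = []

    def run(self, cores, steps):
        # Stand-in for executing guest code: send one message per step
        # to the next core (a toy inter-processor interrupt).
        target = cores[(self.core_id + 1) % len(cores)]
        for step in range(steps):
            target.mailbox.put((self.core_id, step))
        # Drain exactly the messages the previous core sends to this one.
        for _ in range(steps):
            self.received.append(self.mailbox.get())


def run_cores(num_cores, steps):
    """Run every emulated core on its own host thread and wait for all."""
    cores = [EmulatedCore(i) for i in range(num_cores)]
    threads = [
        threading.Thread(target=core.run, args=(cores, steps))
        for core in cores
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cores


if __name__ == "__main__":
    for core in run_cores(num_cores=4, steps=3):
        print(core.core_id, core.received)
```

Because each core only touches its own state and the shared mailboxes, the per-core loop stays sequential; all cross-core interaction is confined to the queue operations, which is the kind of separation the abstract attributes to the library layer.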
COREMU: a scalable and portable parallel full-system emulator
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming