Abstract
Full-system emulation has been an extremely useful tool for developing and debugging systems software such as operating systems and hypervisors. However, current full-system emulators lack support for deterministic replay, which limits the reproducibility of concurrency bugs. Such reproducibility is indispensable for analyzing and debugging systems software, which is inherently multi-threaded.
This paper analyzes the challenges of supporting deterministic replay in parallel full-system emulators and presents a comprehensive study of the sources of non-determinism. Unlike application-level replay systems, our system, called ReEmu, must log sources of non-determinism in both the guest software stack and the dynamic binary translator to replay faithfully. To provide scalable and efficient record and replay on multicore machines, ReEmu makes several notable refinements to the CREW (concurrent-read, exclusive-write) protocol used to replay shared memory systems. First, because frequent lock operations are a performance bottleneck in the CREW protocol, ReEmu refines the protocol with a seqlock-like design. This avoids serious contention and possible starvation in the instrumentation code that tracks dependences between racy accesses to a shared memory object. Second, to minimize log size, ReEmu logs only minimal local information about accesses to a shared memory location and relies on an offline log processing tool to derive the precise shared memory dependences needed for faithful replay. Third, ReEmu adopts an automatic lock clustering mechanism that groups a set of uncontended memory objects into a bulk to reduce the frequency of lock operations, which noticeably boosts performance.
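The second refinement (local logging plus offline dependence derivation) can be illustrated with a minimal sketch. It is not ReEmu's actual log format or tool: the function names `record` and `derive_dependences`, the tuple layout, and the single per-object version counter are all assumptions made for illustration, and the recorded interleaving is simulated sequentially rather than captured from real threads.

```python
from collections import defaultdict


def record(schedule):
    """Simulate recording: each thread appends only local entries to its own log.

    schedule: list of (thread_id, op, obj_id), op in {"r", "w"}, in the order
    the accesses happened to interleave during the (simulated) recorded run.
    Returns {thread_id: [(op, obj_id, version), ...]}.
    Hypothetical format, not ReEmu's real log layout.
    """
    versions = defaultdict(int)   # per-object version counter (seqlock-like)
    logs = defaultdict(list)      # per-thread local log
    for tid, op, obj in schedule:
        if op == "w":
            versions[obj] += 1    # a write publishes a new version
        # A read logs the version it observed; a write logs the one it created.
        logs[tid].append((op, obj, versions[obj]))
    return dict(logs)


def derive_dependences(logs):
    """Offline pass: turn per-thread logs into cross-thread ordering edges.

    An edge (a, b) means event a must be replayed before event b, where an
    event is (thread_id, index_in_that_thread's_log).
    """
    writers = {}                  # (obj, version) -> event that wrote it
    readers = defaultdict(list)   # (obj, version) -> events that read it
    for tid, log in logs.items():
        for idx, (op, obj, ver) in enumerate(log):
            if op == "w":
                writers[(obj, ver)] = (tid, idx)
            else:
                readers[(obj, ver)].append((tid, idx))

    edges = set()

    def add(src, dst):
        # Program order within one thread is implicit, so skip those edges.
        if src is not None and src[0] != dst[0]:
            edges.add((src, dst))

    for (obj, ver), w in writers.items():
        add(writers.get((obj, ver - 1)), w)         # previous write -> this write
        for r in readers.get((obj, ver - 1), []):   # reads of old version -> this write
            add(r, w)
        for r in readers.get((obj, ver), []):       # this write -> reads of its version
            add(w, r)
    return edges
```

For example, with the schedule `[(0, "w", "x"), (1, "r", "x"), (1, "w", "x"), (0, "r", "x")]`, thread 0 logs versions 1 and 2 and thread 1 logs versions 1 and 2, and the offline pass recovers three cross-thread edges without either thread having recorded anything about the other during the run.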
Our prototype, ReEmu, is based on our open-source COREMU system and supports scalable and efficient record and replay of full-system environments (both x64 and ARM). Our evaluation shows that ReEmu scales well on an Intel multicore machine. Recording five PARSEC benchmarks on a 16-core emulated system incurs only 68.9% performance overhead on average (ranging from 51.8% to 94.7%) over vanilla COREMU.
Scalable deterministic replay in a parallel full-system emulator
PPoPP '13: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming