ABSTRACT
Deterministic replay of parallel programs is a powerful technique, but current proposals have shortcomings. Software-based replay systems incur high overheads on multiprocessors, while hardware-based proposals focus only on basic hardware-level mechanisms and ignore the overall replay system. To be practical, hardware-based replay systems must support an environment in which multiple parallel jobs run concurrently: some being recorded, others being replayed, and still others running without recording or replay. They must also manage limited-size log buffers.
This paper addresses these shortcomings by introducing, for the first time, a set of abstractions and a software-hardware interface for practical hardware-assisted replay of multiprocessor systems. The approach, called Capo, introduces the novel abstraction of the Replay Sphere to separate the responsibilities of the hardware and software components of the replay system. In this paper, we also design and build CapoOne, a prototype of a deterministic multiprocessor replay system that implements Capo using Linux and simulated DeLorean hardware. Our evaluation of 4-processor executions shows that CapoOne largely records with the efficiency of hardware-based schemes and the flexibility of software-based schemes.
Capo: a software-hardware interface for practical deterministic multiprocessor replay (ASPLOS 2009)