DOI: 10.1145/1508244.1508254 (ASPLOS conference proceedings)
research-article

Capo: a software-hardware interface for practical deterministic multiprocessor replay

Published: 07 March 2009

ABSTRACT

While deterministic replay of parallel programs is a powerful technique, current proposals have shortcomings. Specifically, software-based replay systems have high overheads on multiprocessors, while hardware-based proposals focus only on basic hardware-level mechanisms, ignoring the overall replay system. To be practical, hardware-based replay systems need to support an environment with multiple parallel jobs running concurrently -- some being recorded, others being replayed and even others running without recording or replay. Moreover, they need to manage limited-size log buffers.

This paper addresses these shortcomings by introducing, for the first time, a set of abstractions and a software-hardware interface for practical hardware-assisted replay of multiprocessor systems. The approach, called Capo, introduces the novel abstraction of the Replay Sphere to separate the responsibilities of the hardware and software components of the replay system. In this paper, we also design and build CapoOne, a prototype of a deterministic multiprocessor replay system that implements Capo using Linux and simulated DeLorean hardware. Our evaluation of 4-processor executions shows that CapoOne largely records with the efficiency of hardware-based schemes and the flexibility of software-based schemes.
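As a rough illustration of the two practical requirements named in the abstract (concurrent jobs in different modes, and limited-size log buffers), the following toy sketch may help. It is not Capo's interface or the CapoOne implementation: `Mode`, `BoundedLog`, `Job`, and `handle_event` are hypothetical names invented for this example, and a Python list stands in for the log's backing store.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mode(Enum):
    RECORD = "record"   # job's events are logged
    REPLAY = "replay"   # job's events are taken from an earlier log
    NONE = "none"       # job runs without recording or replay


@dataclass
class BoundedLog:
    """Fixed-capacity buffer that is drained when it fills up (hypothetical)."""
    capacity: int
    buffer: list = field(default_factory=list)
    spilled: list = field(default_factory=list)  # stands in for a log file

    def append(self, entry):
        # Drain the buffer before it would overflow.
        if len(self.buffer) == self.capacity:
            self.flush()
        self.buffer.append(entry)

    def flush(self):
        self.spilled.extend(self.buffer)
        self.buffer.clear()


@dataclass
class Job:
    name: str
    mode: Mode
    log: Optional[BoundedLog] = None                    # used when recording
    replay_source: list = field(default_factory=list)   # used when replaying


def handle_event(job, event):
    """Route one nondeterministic event according to the job's mode."""
    if job.mode is Mode.RECORD:
        job.log.append(event)
        return event
    if job.mode is Mode.REPLAY:
        # Ignore the live value and return the next logged one instead.
        return job.replay_source.pop(0)
    return event  # Mode.NONE: pass through untouched
```

In this sketch each job carries its own mode and log, so a recorded job, a replayed job, and an untracked job can be handled side by side without interfering with one another.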

