Abstract
Developers who set a breakpoint a few statements too late, or who are trying to diagnose a subtle bug from a single core dump, often wish for a time-traveling debugger. The ability to rewind time to see the exact sequence of statements and program values leading to an error has great intuitive appeal but, due to large time and space overheads, time-traveling debuggers have seen limited adoption. A managed runtime, such as the Java Virtual Machine (JVM) or a JavaScript engine, has already paid much of the cost of providing core features (type safety, memory management, and virtual IO) that can be reused to implement a low-overhead time-traveling debugger. We leverage this insight to design and build affordable time-traveling debuggers for managed languages. Tardis realizes our design: it provides affordable time travel with an average overhead of only 7% during normal execution, a history-logging rate of 0.6 MB/s, and a worst-case time-travel latency of 0.68 s on our benchmark applications. Tardis can also debug optimized code, using time travel to reconstruct state. This capability, coupled with its low overhead, makes Tardis suitable for use as the default debugger for managed languages, promising to bring time-traveling debugging into the mainstream and transform the practice of debugging.
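The abstract does not spell out the mechanism, so the following is only an illustrative sketch of one common way to build time travel: take periodic snapshots of program state, log external inputs, and reach a past point by restoring the nearest earlier snapshot and replaying the logged inputs. It should not be read as Tardis's implementation. All names here (`TimeTravelSession`, `step_fn`, `snapshot_every`, `accumulate`) are hypothetical, and the sketch assumes each program step is a deterministic function of the current state and one recorded input.

```python
import copy
import random


class TimeTravelSession:
    """Toy record/replay: periodic snapshots plus a log of external inputs."""

    def __init__(self, step_fn, initial_state, snapshot_every=4):
        self.step_fn = step_fn              # assumed deterministic: (state, input) -> new state
        self.snapshot_every = snapshot_every
        self.snapshots = {0: copy.deepcopy(initial_state)}  # step index -> saved state
        self.input_log = []                 # one recorded external input per step
        self.state = copy.deepcopy(initial_state)
        self.now = 0                        # number of steps executed so far

    def run_step(self, external_input):
        """Execute one step forward, recording what is needed to replay it."""
        self.input_log.append(external_input)
        self.state = self.step_fn(self.state, external_input)
        self.now += 1
        if self.now % self.snapshot_every == 0:
            self.snapshots[self.now] = copy.deepcopy(self.state)

    def state_at(self, target):
        """Reconstruct the state after `target` steps by restore + replay."""
        if not 0 <= target <= self.now:
            raise ValueError("target step is outside the recorded history")
        # Restore the latest snapshot taken at or before the target step.
        base = max(i for i in self.snapshots if i <= target)
        state = copy.deepcopy(self.snapshots[base])
        # Replay the logged inputs from the snapshot up to the target step.
        for i in range(base, target):
            state = self.step_fn(state, self.input_log[i])
        return state


def accumulate(state, value):
    """Example program step: fold one input into a running total."""
    return {"total": state["total"] + value, "seen": state["seen"] + [value]}


rng = random.Random(0)  # stands in for nondeterministic input such as IO
session = TimeTravelSession(accumulate, {"total": 0, "seen": []})
inputs = [rng.randint(1, 9) for _ in range(10)]
for value in inputs:
    session.run_step(value)

past = session.state_at(6)  # "rewind" to the state after the sixth step
```

In this sketch the snapshot interval trades space for latency: fewer snapshots mean less stored state but more steps to replay when traveling back, which is the same kind of trade-off the overhead, logging-rate, and latency figures in the abstract quantify.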
Tardis: affordable time-travel debugging in managed runtimes
OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications