Abstract
Deterministic replay systems record and reproduce the execution of a hardware or software system. In contrast to replaying execution on uniprocessors, deterministic replay on multiprocessors is very challenging to implement efficiently because it must reproduce the order of, or the values read by, shared-memory operations performed by multiple threads. In this paper, we present DoublePlay, a new approach that efficiently guarantees replay on commodity multiprocessors. Our key insight is that one can use the simpler and faster mechanisms of single-processor record and replay, yet still achieve the scalability offered by multiple cores, by using an additional execution to parallelize the record and replay of an application. DoublePlay timeslices multiple threads on a single processor, then runs multiple time intervals (epochs) of the program concurrently on separate processors. This strategy, which we call uniparallelism, makes logging much easier for two reasons: each epoch runs on a single processor, so threads in an epoch never access the same memory simultaneously, and different epochs operate on different copies of memory. Thus, rather than logging the order of shared-memory accesses, we need only log the order in which threads in an epoch are timesliced on the processor. To let epochs run in parallel, DoublePlay runs an additional execution of the program on multiple processors that generates the checkpoints from which the epochs start. We evaluate DoublePlay on a variety of client, server, and scientific parallel benchmarks; with spare cores, DoublePlay reduces logging overhead to an average of 15% with two worker threads and 28% with four threads.
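The logging idea behind uniparallelism can be illustrated with a toy model. The Python sketch that follows is an illustration under stated assumptions, not DoublePlay's implementation: each hypothetical thread is a list of operations on a shared memory dictionary, a seeded `random.Random` stands in for the nondeterministic scheduler, and the helper names (`run_timesliced`, `make_checkpoints`, `record_epochs`, `PROGRAMS`) are invented for this example. Because only one thread runs at a time, the list of `(thread_id, ops_run)` timeslices is enough to replay an epoch exactly. `make_checkpoints` merely stands in for the additional multiprocessor execution that supplies checkpoints, and each epoch runs from its own copy of a checkpoint. The sketch omits other sources of nondeterminism, such as input, and does not check that an epoch's final state matches the next checkpoint.

```python
import copy
import random
from concurrent.futures import ThreadPoolExecutor

# Toy model (an assumption, not DoublePlay's design): a thread is a list of
# operations on a shared memory dict, and its only private state is a
# program counter. A state is the pair (memory, program_counters).

def incr(mem):
    mem["x"] += 1

def double(mem):
    mem["x"] *= 2

PROGRAMS = {0: [incr] * 6, 1: [double] * 3}  # hypothetical two-thread program


def run_timesliced(programs, state, max_ops, rng=None, schedule=None):
    """Run threads one at a time on one simulated processor.

    Recording (schedule is None): rng picks the next thread and slice length.
    Replay: the given schedule is followed instead of rng.
    Returns (final_state, log), where log lists (thread_id, ops_run) slices.
    """
    mem, pcs = copy.deepcopy(state)  # work on a private copy of memory
    log, done = [], 0
    replay = iter(schedule) if schedule is not None else None
    while done < max_ops:
        runnable = [t for t, ops in programs.items() if pcs[t] < len(ops)]
        if not runnable:
            break
        if replay is not None:
            entry = next(replay, None)
            if entry is None:
                break
            tid, n = entry
        else:
            tid, n = rng.choice(runnable), rng.randint(1, 3)
        n = min(n, len(programs[tid]) - pcs[tid], max_ops - done)
        for _ in range(n):  # only this thread touches memory in its slice
            programs[tid][pcs[tid]](mem)
            pcs[tid] += 1
        done += n
        log.append((tid, n))  # the timeslice order is the entire log
    return (mem, pcs), log


def make_checkpoints(programs, state, epoch_len, seed):
    """Hypothetical stand-in for the extra multiprocessor execution:
    snapshot the state every epoch_len operations until all threads finish."""
    rng = random.Random(seed)
    checkpoints = [copy.deepcopy(state)]
    while True:
        state, log = run_timesliced(programs, state, epoch_len, rng=rng)
        if not log:
            return checkpoints
        checkpoints.append(state)


def record_epochs(programs, checkpoints, epoch_len):
    """Record every epoch concurrently, each from its own checkpoint copy."""
    def record(i):
        rng = random.Random(1000 + i)  # fixed seed stands in for the scheduler
        return run_timesliced(programs, checkpoints[i], epoch_len, rng=rng)

    with ThreadPoolExecutor() as pool:
        return list(pool.map(record, range(len(checkpoints) - 1)))


if __name__ == "__main__":
    initial = ({"x": 1}, {0: 0, 1: 0})
    cps = make_checkpoints(PROGRAMS, initial, epoch_len=4, seed=7)
    for i, (end_state, log) in enumerate(record_epochs(PROGRAMS, cps, 4)):
        replayed, _ = run_timesliced(PROGRAMS, cps[i], 4, schedule=log)
        assert replayed == end_state  # the schedule alone reproduces the epoch
        print(f"epoch {i}: log={log} x={end_state[0]['x']}")
```

Submitting epochs to a `ThreadPoolExecutor` only illustrates that epochs are independent; under CPython's global interpreter lock, truly parallel epochs would need separate processes.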
DoublePlay: Parallelizing Sequential Logging and Replay

Related version: "DoublePlay: parallelizing sequential logging and replay," ASPLOS XVI: Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '11).