Abstract
Multithreaded deterministic replay has important applications in cyclic debugging, fault tolerance and intrusion analysis. Memory race recording is a key technology for multithreaded deterministic replay. In this paper, we considerably improve our previous always-on Flight Data Recorder (FDR) in four ways:
• Longer recording by reducing the log size growth rate to approximately one byte per thousand dynamic instructions.
• Lower hardware cost by reducing the cost to 24 KB per processor core.
• Simpler design by modifying only the cache coherence protocol, but not the cache.
• Broader applicability by supporting both Sequential Consistency (SC) and Total Store Order (TSO) memory consistency models (existing recorders support only SC).
These improvements stem from several ideas: (1) a Regulated Transitive Reduction (RTR) recording algorithm that creates stricter and vectorizable dependencies to reduce the log growth rate; (2) a Set/LRU timestamp approximation method that better approximates timestamps of uncached memory locations to reduce the hardware cost; (3) an order-value-hybrid recording method that explicitly logs the value of potential SC-violating load instructions to support multiprocessor systems with TSO.
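To make the term "transitive reduction" concrete, the following is a minimal sketch of the plain (unregulated) idea that a race recorder can use to shrink its log: a cross-thread dependence is logged only when it is not already implied by earlier logged dependences plus program order. This is not the paper's RTR algorithm, which the abstract describes as creating stricter, vectorizable dependencies. The class name `TransitiveReducer`, the method `observe`, and the use of per-thread instruction counts with vector snapshots are all assumptions made for illustration.

```python
from bisect import bisect_right


class TransitiveReducer:
    """Illustrative sketch only (hypothetical names, not the paper's RTR).

    Threads are numbered 0..n-1 and instructions are identified by a
    per-thread instruction count (IC) starting at 1. Dependences targeting
    a given thread are assumed to arrive in non-decreasing destination IC.
    """

    def __init__(self, num_threads):
        # For each thread: the ICs at which its knowledge changed, and the
        # vector snapshot valid from that IC onward. snapshot[t] is the
        # largest IC of thread t already known to precede this point.
        self.ics = [[0] for _ in range(num_threads)]
        self.snaps = [[[0] * num_threads] for _ in range(num_threads)]
        self.log = []

    def _vector_at(self, thread, ic):
        # Latest snapshot taken at or before instruction `ic`.
        idx = bisect_right(self.ics[thread], ic) - 1
        return self.snaps[thread][idx]

    def observe(self, src, src_ic, dst, dst_ic):
        """Record that (src, src_ic) must precede (dst, dst_ic).

        Returns True if the dependence was logged, False if it was
        already implied and could be dropped.
        """
        known = self._vector_at(dst, dst_ic)
        if known[src] >= src_ic:
            return False  # implied transitively, no log entry needed
        # Inherit everything the source already knew at src_ic.
        inherited = self._vector_at(src, src_ic)
        merged = [max(a, b) for a, b in zip(known, inherited)]
        merged[src] = max(merged[src], src_ic)
        self.ics[dst].append(dst_ic)
        self.snaps[dst].append(merged)
        self.log.append((src, src_ic, dst, dst_ic))
        return True


reducer = TransitiveReducer(3)
reducer.observe(0, 10, 1, 5)  # logged
reducer.observe(1, 7, 2, 3)   # logged
reducer.observe(0, 8, 2, 4)   # dropped: implied via thread 1
```

In this sketch the third dependence is dropped because thread 0's instruction 8 precedes its instruction 10 in program order, and the first two logged entries already order instruction 10 before thread 2's instruction 3. Every dropped entry is log space saved, which is the effect the first bullet above quantifies.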
A regulated transitive reduction (RTR) for longer memory race recording
Proceedings of the 2006 ASPLOS Conference
ASPLOS XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems





