A regulated transitive reduction (RTR) for longer memory race recording

Published: 20 October 2006

Abstract

Multithreaded deterministic replay has important applications in cyclic debugging, fault tolerance and intrusion analysis. Memory race recording is a key technology for multithreaded deterministic replay. In this paper, we considerably improve our previous always-on Flight Data Recorder (FDR) in four ways:

  • Longer recording by reducing the log size growth rate to approximately one byte per thousand dynamic instructions.
  • Lower hardware cost by reducing the cost to 24 KB per processor core.
  • Simpler design by modifying only the cache coherence protocol, but not the cache.
  • Broader applicability by supporting both Sequential Consistency (SC) and Total Store Order (TSO) memory consistency models (existing recorders support only SC).

These improvements stem from several ideas: (1) a Regulated Transitive Reduction (RTR) recording algorithm that creates stricter and vectorizable dependencies to reduce the log growth rate; (2) a Set/LRU timestamp approximation method that better approximates timestamps of uncached memory locations to reduce the hardware cost; (3) an order-value-hybrid recording method that explicitly logs the value of potential SC-violating load instructions to support multiprocessor systems with TSO.
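
To make the idea of transitive reduction in race recording concrete, the following is a minimal software sketch of the classic technique: a cross-thread memory dependence is logged only if it is not already implied by earlier logged dependences combined with program order. It is not the paper's RTR algorithm. It does not model the regulation step, the Set/LRU timestamp approximation, or the TSO value logging, and it assumes a sequentially consistent global trace. The function name `record_races`, the trace format, and the use of per-thread vector clocks are all assumptions made for illustration.

```python
from collections import defaultdict


def record_races(trace, num_threads):
    """Log cross-thread dependences, skipping transitively implied ones.

    trace: list of (thread, addr, op) with op in {"R", "W"}, given in a
           global, sequentially consistent order (an assumption of this sketch).
    Returns a list of ((src_thread, src_ic), (dst_thread, dst_ic)) pairs,
    where ic is a per-thread instruction count starting at 1.
    """
    ic = [0] * num_threads                                 # instruction counts
    vc = [[0] * num_threads for _ in range(num_threads)]   # vector clocks
    last_write = {}                 # addr -> (thread, ic, clock snapshot)
    last_reads = defaultdict(dict)  # addr -> {thread: (thread, ic, snapshot)}
    log = []

    for t, addr, op in trace:
        ic[t] += 1
        vc[t][t] = ic[t]

        # Conflicting earlier accesses: the last write (RAW or WAW) and,
        # for a write, every read since that write (WAR).
        sources = []
        if addr in last_write:
            sources.append(last_write[addr])
        if op == "W":
            sources.extend(last_reads[addr].values())

        for s, s_ic, s_vc in sources:
            if s == t:
                continue            # same thread: covered by program order
            if vc[t][s] >= s_ic:
                continue            # already implied by logged dependences
            log.append(((s, s_ic), (t, ic[t])))
            # Thread t now knows everything the source knew at s_ic.
            vc[t] = [max(a, b) for a, b in zip(vc[t], s_vc)]

        snapshot = (t, ic[t], list(vc[t]))
        if op == "W":
            last_write[addr] = snapshot
            last_reads[addr].clear()
        else:
            last_reads[addr][t] = snapshot
    return log


# Hypothetical three-thread trace with four cross-thread dependences.
trace = [
    (0, "x", "W"), (0, "y", "W"),
    (1, "y", "R"), (1, "x", "R"), (1, "z", "W"),
    (2, "z", "R"), (2, "x", "R"),
]
log = record_races(trace, 3)
print(log)
```

In this trace, thread 1's read of `x` and thread 2's read of `x` both depend on thread 0's first write, but each is already ordered after it by a dependence logged earlier, so only two of the four cross-thread dependences are written to the log. Skipping implied dependences in this way is what keeps the log small in the first place; the stricter, vectorizable dependencies described in the abstract are a further step that this sketch does not attempt.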

Published in

ACM SIGARCH Computer Architecture News, Volume 34, Issue 5
Proceedings of the 2006 ASPLOS Conference
December 2006, 425 pages
ISSN: 0163-5964
DOI: 10.1145/1168919

ASPLOS XII: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems
October 2006, 440 pages
ISBN: 1595934510
DOI: 10.1145/1168857

Copyright © 2006 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
