Abstract
Providing deterministic execution significantly simplifies the debugging, testing, replication, and deployment of multithreaded programs. Recent work has developed deterministic multiprocessor architectures as well as compiler and runtime systems that enforce determinism in current hardware. Such work has incidentally imposed strong memory-ordering properties. Historically, memory ordering has been relaxed in favor of higher performance in shared memory multiprocessors and, interestingly, determinism exacerbates the cost of strong memory ordering. Consequently, we argue that relaxed memory ordering is vital to achieving faster deterministic execution.
This paper introduces RCDC, a deterministic multiprocessor architecture that takes advantage of relaxed memory orderings to provide high-performance deterministic execution with low hardware complexity. RCDC has two key innovations: a hybrid HW/SW approach to enforcing determinism; and a new deterministic execution strategy that leverages data-race-free-based memory models (e.g., the models for Java and C++) to improve performance and scalability without sacrificing determinism, even in the presence of races. In our hybrid HW/SW approach, the only hardware mechanisms required are software-controlled store buffering and support for precise instruction counting; we do not require speculation. A runtime system uses these mechanisms to enforce determinism for arbitrary programs.
We evaluate RCDC using PARSEC benchmarks and show that relaxing memory ordering leads to performance and scalability close to nondeterministic execution without requiring any form of speculation. We also compare our new execution strategy to one based on TSO (total-store-ordering) and show that some applications benefit significantly from the extra relaxation. We also evaluate a software-only implementation of our new deterministic execution strategy.
- S. Adve and M. Hill. Weak Ordering -- A New Definition. In ISCA, 1990. Google Scholar
Digital Library
- G. Altekar and I. Stoica. ODR: Output-Deterministic Replay for Multicore Debugging. In SOSP, 2009. Google Scholar
Digital Library
- A. Aviram, S.-C. Weng, S. Hu, and B. Ford. Efficient System-Enforced Deterministic Parallelism. In OSDI, 2010. Google Scholar
Digital Library
- T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman. CoreDet: a Compiler and Runtime System for Deterministic Multithreaded Execution. In ASPLOS, 2010. Google Scholar
Digital Library
- T. Bergan, N. Hunt, L. Ceze, and S. Gribble. Deterministic Process Groups in dOS. In OSDI, 2010. Google Scholar
Digital Library
- E. Berger, T. Yang, T. Liu, and G. Novark. Grace: Safe and Efficient Concurrent Programming. In OOPSLA, 2009.Google Scholar
- C. Bienia, S. Kumar, J. P. Singh, and K. Li. The PARSEC Benchmark Suite: Characterization and Architectural Implications. In PACT, 2008. Google Scholar
Digital Library
- G. Blelloch. NESL: A Nested Data-Parallel Language (Version 3.1). Technical report, Carnegie Mellon University, Pittsburgh, PA, 2007.Google Scholar
Digital Library
- R. Bocchino, V. Adve, D. Dig, S. Adve, S. Heumann, R. Komuravelli, J. Overbey, P. Simmons, H. Sung, and M. Vakilian. A Type and Effect System for Deterministic Parallel Java. In OOPSLA, 2009. Google Scholar
Digital Library
- H.-J. Boehm and S. Adve. Foundations of the C++ Concurrency Memory Model. In PLDI, 2008. Google Scholar
Digital Library
- L. Ceze, J. Tuck, P. Montesinos, and J. Torrellas. BulkSC: Bulk Enforcement of Sequential Consistency. In ISCA, 2007. Google Scholar
Digital Library
- M. M. T. Chakravarty, R. Leshchinskiy, S. P. Jones, G. Keller, and S. Marlow. Data Parallel Haskell: A Status Report. In Workshop on Declarative Aspects of Multicore Programming (DAMP), 2007. Google Scholar
Digital Library
- H. Cui, J. Wu, C. che Tsai, and J. Yang. Stable Deterministic Multithreading through Schedule Memoization. In OSDI, 2010. Google Scholar
Digital Library
- J. Devietti, B. Lucia, L. Ceze, and M. Oskin. DMP: Deterministic Shared Memory Multiprocessing. In ASPLOS, 2009. Google Scholar
Digital Library
- M. Dubois, J. Wang, L. Barroso, K. Lee, and Y. Chen. Delayed Consistency and Its Effects on the Miss Rate of Parallel Programs. In Supercomputing, 1991. Google Scholar
Digital Library
- S. A. Edwards and O. Tardieu. SHIM: A Deterministic Model for Heterogeneous Embedded Systems. In EMSOFT, 2005. Google Scholar
Digital Library
- K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy. Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors. In ISCA, 1990. Google Scholar
Digital Library
- C. Gniady, B. Falsafi, and T. N. Vijaykumar. Is SC ILP = RC? In ISCA, 1999. Google Scholar
Digital Library
- D. Hower, P. Dudnik, D. Wood, and M. Hill. Calvin: Deterministic or Not? Free Will to Choose. In HPCA, 2011. Google Scholar
Digital Library
- D. Hower and M. Hill. ReRun: Exploiting Episodes for Lightweight Memory Race Recording. In ISCA, 2008. Google Scholar
Digital Library
- C. K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. PIN: Building Customized Program Analysis Tools with Dynamic Instrumentation. In PLDI, 2005. Google Scholar
Digital Library
- J. Manson, W. Pugh, and S. Adve. The Java Memory Model. In POPL, 2005. Google Scholar
Digital Library
- P. Montesinos, L. Ceze, and J. Torrellas. DeLorean: Recording and Deterministically Replaying Shared-Memory Multiprocessor Execution Efficiently. In ISCA, 2008. Google Scholar
Digital Library
- S. Narayanasamy, C. Pereira, and B. Calder. Recording Shared Memory Dependencies Using Strata. In ASPLOS, 2006. Google Scholar
Digital Library
- R. H. B. Netzer. Optimal Tracing and Replay for Debugging Shared-Memory Parallel Programs. In PADD, 1993. Google Scholar
Digital Library
- M. Olszewski, J. Ansel, and S. Amarasinghe. Kendo: Efficient Deterministic Multithreading in Software. In ASPLOS, 2009. Google Scholar
Digital Library
- J. Ousterhout. Scheduling Techniques for Concurrent Systems. In ICDCS, 1982.Google Scholar
- S. Parka, W. Xiong, Z. Yin, R. Kaushik, K. Lee, S. Lu, and Y. Zhou. Do You Have to Reproduce the Bug at the First Replay Attempt? -- PRES: Probabilistic Replay with Execution Sketching on Multiprocessors. In SOSP, 2009. Google Scholar
Digital Library
- M. Rinard and M. Lam. The Design, Implementation, and Evaluation of Jade. ACM TOPLAS, 20(3), 1988. Google Scholar
Digital Library
- K. Russell and D. Detlefs. Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing. In OOPSLA, 2006. Google Scholar
Digital Library
- W. Thies, M. Karczmarek, and S. Amarasinghe. StreamIt: A Language for Streaming Applications. In CC, 2002. Google Scholar
Digital Library
- E. Vallejo, M. Galluzzi, A. Cristal, F. Vallejo, R. Beivide, P. Stenstrom, J. E. Smith, and M. Valero. Implementing Kilo-Instruction Multiprocessors. In ICPS, July 2005.Google Scholar
Cross Ref
- C. von Praun, H. Cain, J. Choi, and K. Ryu. Conditional Memory Ordering. In ISCA, 2006. Google Scholar
Digital Library
- T. F. Wenisch, A. Ailamaki, B. Falsafi, and A. Moshovos. Mechanisms for Store-wait-free Multiprocessors. In ISCA, 2007. Google Scholar
Digital Library
- M. Xu, R. Bodik, and M. Hill. A "Flight Data Recorder" for Enabling Full-System Multiprocessor Deterministic Replay. In ISCA, 2003. Google Scholar
Digital Library
- M. Xu, M. Hill, and R. Bodik. A Regulated Transitive Reduction for Longer Memory Race Recording. In ASPLOS, 2006. Google Scholar
Digital Library
Index Terms
RCDC: a relaxed consistency deterministic computer
Recommendations
RCDC: a relaxed consistency deterministic computer
ASPLOS XVI: Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systemsProviding deterministic execution significantly simplifies the debugging, testing, replication, and deployment of multithreaded programs. Recent work has developed deterministic multiprocessor architectures as well as compiler and runtime systems that ...
RCDC: a relaxed consistency deterministic computer
ASPLOS '11Providing deterministic execution significantly simplifies the debugging, testing, replication, and deployment of multithreaded programs. Recent work has developed deterministic multiprocessor architectures as well as compiler and runtime systems that ...
Design and evaluation of a conit-based continuous consistency model for replicated services
The tradeoffs between consistency, performance, and availability are well understood. Traditionally, however, designers of replicated systems have been forced to choose from either strong consistency guarantees or none at all. This paper explores the ...







Comments