ABSTRACT
Current shared memory multicore and multiprocessor systems are nondeterministic. Each time these systems execute a multithreaded application, even if supplied with the same input, they can produce a different output. This frustrates debugging and limits the ability to properly test multithreaded code, becoming a major stumbling block to the much-needed widespread adoption of parallel programming.
In this paper we make the case for fully deterministic shared memory multiprocessing (DMP). The behavior of an arbitrary multithreaded program on a DMP system is only a function of its inputs. The core idea is to make inter-thread communication fully deterministic. Previous approaches to coping with nondeterminism in multithreaded programs have focused on replay, a technique useful only for debugging. In contrast, while DMP systems are directly useful for debugging by offering repeatability by default, we argue that parallel programs should execute deterministically in the field as well. This has the potential to make testing provide stronger assurance and to increase the reliability of deployed multithreaded software. We propose a range of approaches to enforcing determinism and discuss their implementation trade-offs. We show that determinism can be provided with little performance cost using our architecture proposals on future hardware, and that software-only approaches can be used on existing systems.
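To illustrate the core idea of making inter-thread communication deterministic, here is a minimal software-only sketch. The abstract does not describe a specific mechanism, so everything in it is an assumption made for illustration: the fixed round-robin turn order, the `DeterministicTurns` helper, and the `run` function are hypothetical names and not the paper's design. Each thread may touch shared state only during its turn, so the order of communication depends on the inputs alone and not on how the scheduler happens to interleave threads.

```python
import threading


class DeterministicTurns:
    """Hypothetical helper: grants turns to threads in fixed round-robin order."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.turn = 0  # id of the thread currently allowed to communicate
        self.cond = threading.Condition()

    def run_turn(self, thread_id, action):
        """Block until it is thread_id's turn, run action, then pass the turn on."""
        with self.cond:
            while self.turn != thread_id:
                self.cond.wait()
            action()  # the only place shared state is touched
            self.turn = (self.turn + 1) % self.num_threads
            self.cond.notify_all()


def run(inputs, rounds=3):
    """Run one thread per input; return the order of shared-memory updates."""
    n = len(inputs)
    turns = DeterministicTurns(n)
    shared_log = []  # stands in for shared memory

    def worker(tid):
        for r in range(rounds):
            # Assumption: every thread takes the same number of turns,
            # so the round-robin order never waits on a finished thread.
            turns.run_turn(tid, lambda: shared_log.append((tid, r, inputs[tid])))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_log


if __name__ == "__main__":
    first = run(["a", "b", "c"])
    second = run(["a", "b", "c"])
    print(first == second)
```

Serializing every shared access this way gives up parallelism, so it shows only the repeatability property, not the low performance cost the abstract claims for the proposed approaches.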
DMP: deterministic shared memory multiprocessing
ASPLOS 2009