Abstract
Multithreaded programs execute nondeterministically on conventional architectures and operating systems. This complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the output of a multithreaded program depend only on its inputs, which addresses this problem. However, current DMT implementations suffer from a common inefficiency: they use frequent global barriers to enforce a deterministic ordering on memory accesses. In this paper, we eliminate that inefficiency using an execution model we call deterministic lazy release consistency (DLRC). Our execution model uses the Kendo algorithm to enforce a deterministic ordering on synchronization, and it uses a deterministic version of the lazy release consistency memory model to propagate memory updates across threads. Our approach guarantees that programs execute deterministically even when they contain data races. We implemented a DMT system based on these ideas (RFDet) and evaluated it using 16 parallel applications. Our implementation targets C/C++ programs that use POSIX threads. Results show that RFDet achieves nearly a 2x speedup over DThreads, a state-of-the-art DMT system.
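The two ingredients named above can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions, not RFDet's implementation: the abstract does not describe how updates are recorded, so the sketch borrows the twin-and-diff idea from software distributed shared memory, and it models Kendo-style ordering as "smallest logical clock first, ties broken by thread id". The function names, the write sets, and the clock values are all hypothetical.

```python
from typing import Dict, List

Diff = Dict[int, int]  # byte offset -> new byte value


def make_diff(twin: bytes, working: bytes) -> Diff:
    """Record every byte that differs from the snapshot (twin) taken earlier."""
    return {i: new for i, (old, new) in enumerate(zip(twin, working)) if old != new}


def apply_diff(memory: bytearray, diff: Diff) -> None:
    """Merge another thread's published writes into a private copy."""
    for offset, value in diff.items():
        memory[offset] = value


def sync_order(clocks: Dict[int, int]) -> List[int]:
    """Kendo-style turn order: smallest logical clock first, ties broken by thread id."""
    return sorted(clocks, key=lambda tid: (clocks[tid], tid))


def run(physical_order: List[int]) -> bytes:
    """Simulate two racing threads; physical_order stands in for wall-clock timing."""
    initial = bytes(4)
    # Hypothetical per-thread writes (offset -> value); both write offset 0, a data race.
    writes = {0: {0: 0xAA, 1: 0x01}, 1: {0: 0xBB, 2: 0x02}}
    # Hypothetical logical clocks at release; they depend on work done, not on timing.
    clocks = {0: 120, 1: 75}

    published: Dict[int, Diff] = {}
    for tid in physical_order:
        working = bytearray(initial)  # private working copy
        for offset, value in writes[tid].items():
            working[offset] = value  # isolated: not yet visible to other threads
        published[tid] = make_diff(initial, bytes(working))  # diff built at release

    # A later acquire merges diffs in the deterministic order, not in arrival order.
    view = bytearray(initial)
    for tid in sync_order(clocks):
        apply_diff(view, published[tid])
    return bytes(view)


if __name__ == "__main__":
    print(run([0, 1]).hex(), run([1, 0]).hex())  # prints "aa010200 aa010200"
```

Both calls yield the same bytes even though the threads "ran" in opposite physical orders: the racing write to offset 0 is resolved by the logical-clock order rather than by timing, and no global barrier is involved.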
Efficient deterministic multithreading without global barriers
PPoPP '14: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming