skip to main content
research-article

Efficient deterministic multithreading without global barriers

Authors Info & Claims
Published:06 February 2014Publication History
Skip Abstract Section

Abstract

Multithreaded programs execute nondeterministically on conventional architectures and operating systems. This complicates many tasks, including debugging and testing. Deterministic multithreading (DMT) makes the output of a multithreaded program depend on its inputs only, which can totally solve the above problem. However, current DMT implementations suffer from a common inefficiency: they use frequent global barriers to enforce a deterministic ordering on memory accesses. In this paper, we eliminate that inefficiency using an execution model we call deterministic lazy release consistency (DLRC). Our execution model uses the Kendo algorithm to enforce a deterministic ordering on synchronization, and it uses a deterministic version of the lazy release consistency memory model to propagate memory updates across threads. Our approach guarantees that programs execute deterministically even when they contain data races. We implemented a DMT system based on these ideas (RFDet) and evaluated it using 16 parallel applications. Our implementation targets C/C++ programs that use POSIX threads. Results show that RFDet gains nearly 2x speedup compared with DThreads-a start-of-the-art DMT system.

References

  1. S. V. Adve and J. K. Aggarwal, "A Unified Formalization of Four Shared-Memory Models," IEEE Trans. Parallel Distrib. Syst., vol. 4, pp. 613--624, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. V. Adve and M. D. Hill, "Weak ordering--a new definition," presented at the Proceedings of the 17th annual international symposium on Computer Architecture, Seattle, Washington, USA, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Amittai, W. Shu-Chun, H. Sen, and F. Bryan, "Efficient system-enforced deterministic parallelism," presented at the Proceedings of the 9th USENIX conference on Operating systems design and implementation, Vancouver, BC, Canada, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Bergan, O. Anderson, J. Devietti, L. Ceze, and D. Grossman, "CoreDet: a compiler and runtime system for deterministic multithreaded execution," presented at the Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems, Pittsburgh, Pennsylvania, USA, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. Bergan, L. Ceze, and D. Grossman, "Input-Covering Schedules for Multithreaded Programs," in Proceedings of the Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA), Indianapolis, Indiana, USA, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. T. Bergan, J. Devietti, N. Hunt, and L. Ceze, "The Deterministic Execution Hammer: How Well Does it Actually Pound Nails?," in WoDET, 2011.Google ScholarGoogle Scholar
  7. T. Bergan, N. Hunt, L. Ceze, and S. D. Gribble, "Deterministic process groups in dOS," in Proceedings of the 9th USENIX conference on Operating systems design and implementation, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. E. D. Berger, K. S. McKinley, R. D. Blumofe, and P. R. Wilson. "Hoard: A scalable memory allocator for multithreaded applications," in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), pages 117--128, Cambridge, MA, Nov. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. E. D. Berger, T. Yang, T. Liu, and G. Novark, "Grace: Safe multithreaded programming for C/C+," in OOPSLA, 2009, pp. 81--96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The PARSEC Benchmark Suite: Characterization and Architectural Implications," in Proceedings of the 17th international conference on Parallel architectures and compilation techniques, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. L. Bocchino Jr, V. S. Adve, S. V. Adve, and M. Snir, "Parallel programming must be deterministic by default," in Proceedings of the First USENIX conference on Hot topics in parallelism, 2009, pp. 4--4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. H.-J. Boehm, "Position Paper: Nondeterminism is Unavoidable, but Data Races are Pure Evil," in Proceedsing of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability (RACES), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. H.-J. Boehm and S. V. Adve, "Foundations of the C+ concurrency memory model," presented at the Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, Tucson, AZ, USA, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. Cui, J. Simsa, H. Li, B. Blum, X. Xu, J. Yang, G. A. Gibson, and R. E. Bryant, "Parrot: A Practical Runtime for Deterministic, Stable, and Reliable Threads," in Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, Farmington, PA, USA, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. H. Cui, J. Wu, and J. Yang, "Stable deterministic multithreading through schedule memoization," in Proceedings of the 9th USENIX conference on Operating systems design and implementation, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. H. Cui, J. Wu, J. Gallagher, H. Guo, and J. Yang, "Efficient Deterministic Multithreading through Schedule Relaxation," in Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, Cascais, Portugal, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Devietti, B. Lucia, L. Ceze, M. Oskin, "DMP: deterministic shared memory multiprocessing,"presented at the Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, Washington, DC, USA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Devietti, J. Nelson, T. Bergan, L. Ceze, and D. Grossman, "RCDC: a relaxed consistency deterministic computer," in Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems, Newport Beach, California, USA, 2011, pp. 67--78. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. J. Fidge., "Partial orders for parallel debugging," in ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, January 1989, pp. 24(1): 183--194. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. M. Hill and M. Xu. Racey: A Stress Test for Deterministic Execution. Available: http://www.cs.wisc.edu/~markhill/racey.htmlGoogle ScholarGoogle Scholar
  21. D. R. Hower, P. Dudnik, M. D. Hill, and D. A. Wood, "Calvin: Deterministic or not? Free will to choose," in High Performance Computer Architecture (HPCA), 2011, pp. 333--334. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. B. Kasikci, C. Zamfir, and G. Candea. "Data Races vs. Data Race Bugs: Telling the Difference with Portend," in Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. P. Keleher, A. L. Cox, and W. Zwaenepoel, "Lazy release consistency for software distributed shared memory," SIGARCH Comput. Archit. News, vol. 20, pp. 13--21, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel, "TreadMarks: distributed shared memory on standard workstations and operating systems," presented at the Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference, San Francisco, California, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. T. J. LeBlanc and J. M. Mellor-Crummey, "Debugging parallel programs with instant replay," Computers, IEEE Transactions on, vol. 100, pp. 471--482, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. E. A. Lee, "The problem with threads," Computer, vol. 39, pp. 33--42, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Lee, P. M. Chen, J. Flinn, and S. Narayanasamy, "Chimera: Hybrid Program Analysis for Determinism," presented at the Proceedings of the 2012 ACM SIGPLAN conference on Programming language design and implementation, Beijing, China, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. T. Liu, C. Curtsinger, and E. D. Berger, "DTHREADS: Efficient Deterministic Multithreading," in Proceedings of the 22nd ACM Symposium on Operating Systems Principles, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. T. Merrifield, and J. Eriksson, "Conversion: Multi-Version Concurrency Control for Main Memory Segments," in EuroSys, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. Olszewski, J. Ansel, and S. Amarasinghe, "Kendo: efficient deterministic multithreading in software," in Proceeding of the 14th international conference on Architectural support for programming languages and operating systems, 2009, pp. 97--108. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis, "Evaluating MapReduce for Multi-core and Multiprocessor Systems," in Proceedings of the 13th International Symposium on High Performance Computer Architecture, Washington, DC, USA, 2007, pp. 13--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. D. Subhraveti and J. Nieh, "Record and transplay: partial checkpointing for replay debugging across heterogeneous systems," in SIGMETRICS 2011, pp. 109--120. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. K. Veeraraghavan, D. Lee, B. Wester, J. Ouyang, P. M. Chen, J. Flinn, et al., "DoublePlay: Parallelizing Sequential Logging and Replay," in Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems, Newport Beach, California, USA, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. V. M. Weaver and S. A. McKee, "Can hardware performance counters be trusted?," in IISWC, 2008, pp. 141--150.Google ScholarGoogle Scholar
  35. S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, "The SPLASH-2 programs: Characterization and methodological considerations," in Proceedings of the 22nd annual international symposium on Computer architecture, 1995, pp. 24--36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. W. Xiong, S. Park, J. Zhang, Y. Zhou, and Z. Ma, "Ad hoc synchronization considered harmful," in Proceedings of the 9th USENIX conference on Operating Systems Design and Implementation, 2010, pp. 163--176. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. X. Zhou, K. Lu, X. Wang, and X. Li, "Exploiting parallelism in deterministic shared memory multiprocessing," J. Parallel Distrib. Comput., pp. 72(2012)716--727, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient deterministic multithreading without global barriers

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!