ABSTRACT
Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for parallel programmers by hindering their ability to create parallel applications with repeatable results. As a consequence, parallel applications are significantly harder to debug, test, and maintain than sequential programs.
This paper introduces Kendo: a new software-only system that provides deterministic multithreading of parallel applications. Kendo enforces a deterministic interleaving of lock acquisitions and specially declared non-protected reads through a novel dynamically load-balanced deterministic scheduling algorithm. The algorithm tracks the progress of each thread using performance counters to construct a deterministic logical time that is used to compute an interleaving of shared data accesses that is both deterministic and provides good load balancing. Kendo can run on today's commodity hardware while incurring only a modest performance cost. Experimental results on the SPLASH-2 applications yield a geometric mean overhead of only 16% when running on 4 processors. This low overhead makes it possible to benefit from Kendo even after an application is deployed. Programmers can start using Kendo today to program parallel applications that are easier to develop, debug, and test.
- Claudio Basile, Keith Whisnant, Zbigniew Kalbarczyk, and Ravi Iyer. Loose synchronization of multithreaded replicas. pages 250--255, 2002. Google Scholar
- R. Bocchino, V. Adve, S. Adve, and M. Snir. Parallel programming must be deterministic by default. Technical Report UIUCDCS-R-2008-3012, University of Illinois at Urbana-Champaign, 2008. URL http://dpj.cs.uiuc.edu/DPJ/Publications_files/paper.pdf.Google Scholar
- Guang-Ien Cheng, Mingdong Feng, Charles E. Leiserson, Keith H. Randall, and Andrew F. Stark. Detecting data races in cilk programs that use locks. In proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA'98), pages 298--309, Puerto Vallarta, Mexico, June 28-July 2, 1998. Google Scholar
- Joe Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin. Explicitly parallel programming with shared-memory is insane: At least make it deterministic! In proceedings of SHCMP 2008: Workshop on Software and Hardware Challenges of Manycore Platforms, Beijing, China, June 22, 2008.Google Scholar
- Rachna Dhamija and Adrian Perrig. Deja vu: a user study using images for authentication. In SSYM'00: Proceedings of the 9th conference on USENIX Security Symposium, pages 4--4, Berkeley, CA, USA, 2000. USENIX Association. Google Scholar
- Jorg Domaschka, Franz J. Hauck, Hans P. Reiser, and Rutiger Kapitza. Deterministic multithreading for java-based replicated objects. In proceedings of the International Conference on Parallel and Distributed Computing and Systems, 2006.Google Scholar
- Jorg Domaschka, Andreas I. Schmied, Hans P. Reiser, and Franz J. Hauck. Revisiting deterministic multithreading strategies. International Parallel and Distributed Processing Symposium, pages 1--8, 2007.Google Scholar
- George W. Dunlap, Dominic G. Lucchetti, Michael A. Fetterman, and Peter M. Chen. Execution replay of multiprocessor virtual machines. In VEE'08: Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments, pages 121--130, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-796-4. Google Scholar
- Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In proceedings of the ACM SIGPLAN-98 Conference on Programming Language Design and Implementation, pages 212--223, Montreal, Quebec, Canada, June 1998. Proceedings published ACMSIGPLAN Notices, Vol. 33, No. 5, May, 1998. Google Scholar
- Derek R. Hower and Mark D. Hill. Rerun: Exploiting episodes for lightweight memory race recording. In proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008. Google Scholar
- Wolfgang Karl, Markus Leberecht, and Michael Oberhuber. Forcing deterministic execution of parallel programs -- debugging support through the smile monitoring approach. In proceedings of the SCI-Europe, September 1998.Google Scholar
- Milind Kulkarni, Keshav Pingali, Ganesh Ramanarayanan, Bruce Walter, Kavita Bala, and L. Paul Chew. Optimistic parallelism benefits from data partitioning. SIGARCH Comput. Archit. News, 36(1):233--243, 2008. ISSN 0163-5964. Google Scholar
- Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558--565, 1978. ISSN 0001-0782. Google Scholar
- T. J. LeBlanc and J. M. Mellor-Crummey. Debugging parallel programs with instant replay. IEEE Trans. Comput., 36(4):471--482, 1987. ISSN 0018-9340. Google Scholar
- Edward A. Lee. The problem with threads. Computer, 39(5):33--42, May 2006. ISSN 0018-9162. Google Scholar
Digital Library
- Shan Lu, Joe Tucek, Feng Qin, and Yuanyuan Zhou. Avio: Detecting atomicity violations via access-interleaving invariants. In proceedings of the International Conference on Architecture Support for Programming Languages and Operating Systems, October 2006. Google Scholar
- Shan Lu, Weihang Jiang, and Yuanyuan Zhou. A study of interleaving coverage criteria. In proceedings of ESEC-FSE, pages 533--536, New York, NY, USA, 2007a. ACM. ISBN 978-1-59593-811-4. Google Scholar
- Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weihang Jiang, Zhenmin Li, Raluca A. Popa, and Yuanyuan Zhou. Muvi: Automatically inferring multi-variable access correlations and detecting related semantic and concurrency bugs. In proceedings of the 21st ACM Symposium on Operating Systems Principles, October 2007b. Google Scholar
- Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from mistakes: a comprehensive study on real world concurrency bug characteristics. In ASPLOS XIII: proceedings of the 13th international conference on Architectural support for programming languages and operating systems, pages 329--339, New York, NY, USA, 2008. ACM. ISBN 978-1-59593-958-6. Google Scholar
- Pablo Montesinos, Luis Ceze, and Josep Torrellas. Delorean: Recording and deterministically replaying shared-memory multiprocessor execution efficiently. In proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008. Google Scholar
- Hans P. Reiser, Jorg Domaschka, Franz J. Hauck, Rudiger Kapitza, and Wolfgang Schroder-Preikschat. Consistent replication of multithreaded distributed objects. 25th IEEE Symposium on Reliable Distributed Systems, pages 257--266, 2006. ISSN 1060-9857. Google Scholar
- Jonathan Rose. Locusroute: a parallel global router for standard cells. In DAC-88: Proceedings of the 25th ACM/IEEE conference on Design automation, pages 189--195, Los Alamitos, CA, USA, 1988. IEEE Computer Society Press. ISBN 0-8186-8864-5. Google Scholar
Digital Library
- Mark Russinovich and Bryce Cogswell. Replay for concurrent non-deterministic shared-memory applications. In proceedings of the ACM SIGPLAN 1996 Conference on Programming Language Design and Implementation, pages 258--266, New York, NY, USA, 1996. ACM. ISBN 0-89791-795-2. Google Scholar
- Debashis Saha and Sourav K. Dutta. Specification of deterministic execution timing schema for parallel programs on a multiprocessor. Computer, Communication, Control and Power Engineering, pages 114--116 vol.1, 1993.Google Scholar
- Jaswinder Pal Singh, Anoop Gupta, and Marc Levoy. Parallel visualization algorithms: Performance and architectural implications. Computer, 27(7):45--55, 1994. ISSN 0018-9162. Google Scholar
- William Thies, Michal Karczmarek, and Saman Amarasinghe. Streamit: A language for streaming applications. In International Conference on Compiler Construction, Grenoble, France, April 2002. Google Scholar
- Joseph Tucek, Shan Lu, Chengdu Huang, Spiros Xanthos, and Yuanyuan Zhou. Triage: diagnosing production run failures at the user's site. In proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, pages 131--144, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-591-5. Google Scholar
- Larry D. Wittie. Debugging distributed C programs by real time reply. SIGPLAN Not., 24(1):57--67, 1989. ISSN 0362-1340. Google Scholar
- S.C. Woo, M. Ohara, E. Torrie, J.P. Singh, and A. Gupta. The SPLASH-2 programs: characterization and methodological considerations. In proceedings of 22nd Annual International Symposium on Computer Architecture News, pages 24--36, June 1995. Google Scholar
- Min Xu, Rastislav Bodik, and Mark D. Hill. A "flight data recorder" for enabling full-system multiprocessor deterministic replay. SIGARCH Comput. Archit. News, 31(2):122--135, 2003. ISSN 0163-5964. Google Scholar
Index Terms
Kendo: efficient deterministic multithreading in software
Recommendations
CoreDet: a compiler and runtime system for deterministic multithreaded execution
ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systemsThe behavior of a multithreaded program does not depend only on its inputs. Scheduling, memory reordering, timing, and low-level hardware effects all introduce nondeterminism in the execution of multithreaded programs. This severely complicates many ...
Kendo: efficient deterministic multithreading in software
ASPLOS 2009Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for parallel programmers by ...
Kendo: efficient deterministic multithreading in software
ASPLOS 2009Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded applications, causes significant challenges for parallel programmers by ...








Comments