research-article

DDOS: taming nondeterminism in distributed systems

Published: 16 March 2013

Abstract

Nondeterminism complicates the development and management of distributed systems. It arises from two main sources: the local behavior of each individual node and the behavior of the network connecting the nodes. Taming nondeterminism effectively requires dealing with both sources.

This paper proposes DDOS, a system that leverages prior work on deterministic multithreading to offer: 1) space-efficient record/replay of distributed systems; and 2) fully deterministic distributed behavior. Making each node behave deterministically makes its outgoing messages strictly a function of its explicit inputs. This allows us to record the system by logging only each message's arrival time, not its contents. Going further, we propose and implement an algorithm that makes all communication between nodes deterministic by scheduling communication onto a global logical timeline.
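The two ideas above can be illustrated with a minimal, single-process Python sketch built on assumptions of our own: the `Node` class, its toy state update, and the helpers `run`, `network`, `replay`, and `timeline` are hypothetical and are not DDOS's implementation or API. Because each node is deterministic by construction, the record log stores only which message (identified by sender and per-sender sequence number) arrived next, never its payload; replay regenerates the payloads by re-executing the senders. External inputs (src = -1) are not produced by any node, so a real recorder would still have to log their contents. The `timeline` picker is one simple way to place deliveries on a global logical timeline (smallest Lamport-style send time first, ties broken by sender and sequence number); it is an illustration, not necessarily the scheduling algorithm the paper proposes.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical deterministic node: its state and outgoing messages depend
    only on the sequence of payloads it consumes (no clocks, randomness, or races)."""
    node_id: int
    n_nodes: int
    state: int = 0
    sent: int = 0   # per-sender sequence number for outgoing messages
    clock: int = 0  # logical time of this node's most recent step

    def handle(self, payload, at):
        """Consume one payload at logical time `at`; return new messages as
        ((src, seq), (dst, payload, send_time)) pairs."""
        self.clock = at
        self.state = (self.state * 31 + payload) % 1_000_003
        if payload <= 0:
            return []
        msg = ((self.node_id, self.sent),
               (self.state % self.n_nodes, payload - 1, self.clock + 1))
        self.sent += 1
        return [msg]


def run(n_nodes, inputs, pick):
    """Deliver pending messages one at a time until none remain.

    `pick(pending, order)` returns the key (src, seq) of the next delivery, and
    `order` is the arrival log: message identities only, never payloads.
    External inputs use src = -1; their payloads come from outside the system,
    so a real recorder would log them (here they are simply passed in again).
    """
    nodes = [Node(i, n_nodes) for i in range(n_nodes)]
    pending = {(-1, i): (dst, p, 0) for i, (dst, p) in enumerate(inputs)}
    order = []
    while pending:
        key = pick(pending, order)
        order.append(key)
        dst, payload, sent_at = pending.pop(key)
        node = nodes[dst]
        # Lamport-style receive time: after both the send and the node's last step.
        for k, v in node.handle(payload, max(node.clock + 1, sent_at)):
            pending[k] = v
    return [n.state for n in nodes], order


def network(rng):
    """Record mode: a nondeterministic network may deliver any pending message."""
    return lambda pending, order: rng.choice(sorted(pending))


def replay(log):
    """Replay mode: follow the recorded arrival order; payloads are regenerated
    by re-executing the deterministic senders, not read from the log."""
    return lambda pending, order: log[len(order)]


def timeline(pending, order):
    """Deterministic mode (one possible policy): deliver the pending message with
    the smallest (logical send time, src, seq) on a global logical timeline."""
    return min(pending, key=lambda k: (pending[k][2], k))


if __name__ == "__main__":
    inputs = [(0, 5), (1, 4), (2, 6)]  # (destination node, payload) from outside
    states, log = run(3, inputs, network(random.Random(7)))
    assert run(3, inputs, replay(log))[0] == states  # replay reproduces the run
    print(len(log), "log entries, each a (src, seq) pair with no payload")
    print("timeline run:", run(3, inputs, timeline)[0])
```

In this toy simulation the only source of network nondeterminism is the seeded `rng`; the `timeline` mode ignores it entirely, so repeated runs with the same inputs always deliver messages in the same order and reach the same final states.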

We implement both algorithms in DDOS and evaluate the system with parallel scientific applications, an HTTP/memcached system, and a distributed microbenchmark with a high volume of peer-to-peer communication. Our results show up to two orders of magnitude reduction in record/replay log size, and that distributed systems can be made deterministic with an order of magnitude of overhead.



      Published in

        ACM SIGPLAN Notices, Volume 48, Issue 4 (ASPLOS '13)
        April 2013
        540 pages
        ISSN: 0362-1340
        EISSN: 1558-1160
        DOI: 10.1145/2499368

        ASPLOS '13: Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
        March 2013
        574 pages
        ISBN: 9781450318709
        DOI: 10.1145/2451116

        Copyright © 2013 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

