Abstract
Achieving determinism in real software systems remains difficult. Even a batch-processing job, whose task is to map input bits to output bits, risks nondeterminism from thread scheduling, system calls, CPU instructions, and leakage of environmental information such as the date or CPU model. In this work, we present a system for achieving low-overhead deterministic execution of batch-processing programs that read and write the file system, turning them into pure functions on files.
We allow multi-process executions in which a permissions system prevents races on the file system. Process separation lets different processes enforce permissions and determinism using distinct mechanisms. Our prototype, DetFlow, allows a statically-typed coordinator process to use shared-memory parallelism as well as to invoke process trees of sandboxed legacy binaries. DetFlow currently implements the coordinator as a Haskell program with a restricted I/O type for its main function: a new monad we call DetIO. Legacy binaries launched by the coordinator run concurrently, but internally each process schedules threads sequentially, allowing dynamic determinism enforcement with predictably low overhead.
We evaluate DetFlow by applying it to bioinformatics data pipelines and software build systems. DetFlow makes it possible to determinize these data-processing workflows by porting only a small amount of code into a statically-typed coordinator. This hybrid of static and dynamic determinism enforcement permits freedom where possible and imposes restrictions where necessary.
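The idea of a restricted I/O type with file-system permissions can be illustrated with a minimal sketch. Only the name DetIO comes from the abstract; `Perms`, `allowed`, `readFileDet`, `writeFileDet`, and `runDetIO` are hypothetical names chosen for illustration and are not claimed to match DetFlow's actual API. The sketch covers only permission-checked file access and does not model sandboxing of legacy binaries or thread scheduling.

```haskell
import Data.List (isPrefixOf)

-- | Hypothetical permission set: path prefixes a computation may read or write.
data Perms = Perms { canRead :: [FilePath], canWrite :: [FilePath] }

-- | Restricted I/O type (assumed design). A real library would hide the
-- constructor so that clients can only combine the operations exported below
-- and cannot embed arbitrary IO actions.
newtype DetIO a = DetIO { unDetIO :: Perms -> IO a }

instance Functor DetIO where
  fmap f (DetIO g) = DetIO (fmap f . g)

instance Applicative DetIO where
  pure x = DetIO (\_ -> pure x)
  DetIO f <*> DetIO g = DetIO (\p -> f p <*> g p)

instance Monad DetIO where
  DetIO g >>= k = DetIO (\p -> g p >>= \a -> unDetIO (k a) p)

-- | Naive check: the path must start with one of the granted prefixes.
-- It does not normalise paths, so ".." components are not handled.
allowed :: [FilePath] -> FilePath -> Bool
allowed prefixes path = any (`isPrefixOf` path) prefixes

-- | Read a file only if the current permissions cover it.
readFileDet :: FilePath -> DetIO String
readFileDet path = DetIO $ \p ->
  if allowed (canRead p) path
    then readFile path
    else ioError (userError ("read not permitted: " ++ path))

-- | Write a file only if the current permissions cover it.
writeFileDet :: FilePath -> String -> DetIO ()
writeFileDet path contents = DetIO $ \p ->
  if allowed (canWrite p) path
    then writeFile path contents
    else ioError (userError ("write not permitted: " ++ path))

-- | Run a restricted computation under an initial permission set.
runDetIO :: Perms -> DetIO a -> IO a
runDetIO perms (DetIO g) = g perms
```

Under these assumptions, a program's entry point would be written as a `DetIO` action and handed to `runDetIO` exactly once, so that clock reads, random numbers, and unchecked file access are simply not expressible in the coordinator's type.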
Index Terms
Monadic composition for deterministic, parallel batch processing