research-article
Open Access
Artifacts Available
Artifacts Evaluated & Functional

Monadic composition for deterministic, parallel batch processing

Published: 12 October 2017

Abstract

Achieving determinism on real software systems remains difficult. Even a batch-processing job, whose task is to map input bits to output bits, risks nondeterminism from thread scheduling, system calls, CPU instructions, and leakage of environmental information such as date or CPU model. In this work, we present a system for achieving low-overhead deterministic execution of batch-processing programs that read and write the file system—turning them into pure functions on files.

We allow multi-process executions where a permissions system prevents races on the file system. Process separation enables different processes to enforce permissions and enforce determinism using distinct mechanisms. Our prototype, DetFlow, allows a statically-typed coordinator process to use shared-memory parallelism, as well as invoking process-trees of sandboxed legacy binaries. DetFlow currently implements the coordinator as a Haskell program with a restricted I/O type for its main function: a new monad we call DetIO. Legacy binaries launched by the coordinator run concurrently, but internally each process schedules threads sequentially, allowing dynamic determinism-enforcement with predictably low overhead.
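The idea of a permissions system that rules out file-system races between parallel tasks can be illustrated with a minimal sketch. It is not DetFlow's API (the coordinator described above is a Haskell program using the DetIO monad); the names `Perm`, `Task`, `fork`, `check`, and `PermissionDenied` are hypothetical, and the rule assumed here is simply that sibling tasks may share a subtree only if none of them can write to it.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


class PermissionDenied(Exception):
    """Raised when an access or a fork would violate the permission rules."""


@dataclass(frozen=True)
class Perm:
    """Hypothetical permission: rights on one directory subtree."""
    root: PurePosixPath
    writable: bool

    def covers(self, path: PurePosixPath) -> bool:
        # A permission covers its root and everything beneath it.
        return path == self.root or self.root in path.parents


def overlaps(a: Perm, b: Perm) -> bool:
    """Two permissions conflict if their subtrees nest and either can write."""
    nested = a.covers(b.root) or b.covers(a.root)
    return nested and (a.writable or b.writable)


class Task:
    """Hypothetical task holding a set of path permissions."""

    def __init__(self, perms):
        self.perms = list(perms)

    def check(self, path, write: bool) -> None:
        # Allow the access only if some held permission covers the path
        # with sufficient rights.
        p = PurePosixPath(path)
        for perm in self.perms:
            if perm.covers(p) and (perm.writable or not write):
                return
        raise PermissionDenied(f"no permission for {p} (write={write})")

    def fork(self, *child_perm_sets):
        """Create parallel children, rejecting grants that could race."""
        children = [Task(ps) for ps in child_perm_sets]
        # A child may not receive more than its parent holds.
        for child in children:
            for perm in child.perms:
                self.check(perm.root, perm.writable)
        # Siblings may not hold conflicting permissions.
        for i, a in enumerate(children):
            for b in children[i + 1:]:
                if any(overlaps(x, y) for x in a.perms for y in b.perms):
                    raise PermissionDenied("sibling permissions could race")
        return children
```

In this sketch, forking one child with write access to `/data/a` and another with write access to `/data/b` succeeds, while forking one child with write access to `/data` alongside any sibling that touches `/data/b` is rejected before either child runs. Because conflicts are refused at fork time, the order in which the children are scheduled cannot affect what any of them reads.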

We evaluate DetFlow by applying it to bioinformatics data pipelines and software build systems. DetFlow enables determinizing these data-processing workflows by porting a small amount of code to become a statically-typed coordinator. This hybrid approach of static and dynamic determinism enforcement permits freedom where possible but imposes restrictions where necessary.
