DOI: 10.1145/3373376.3378519 · ASPLOS Conference Proceedings

Reproducible Containers

Published: 13 March 2020

ABSTRACT

We describe the design and implementation of DetTrace, a reproducible container abstraction for Linux implemented in user space. All computation that occurs inside a DetTrace container is a pure function of the initial filesystem state of the container. Reproducible containers can be used for a variety of purposes, including replication for fault tolerance, reproducible software builds and reproducible data analytics. We use DetTrace to automatically achieve reproducibility for 12,130 Debian package builds, comprising over 800 million lines of code, as well as for bioinformatics and machine learning workflows. We show that, while software in each of these domains is initially irreproducible, DetTrace brings reproducibility without requiring any hardware, OS or application changes. DetTrace's performance is dictated by the frequency of system calls: IO-intensive software builds incur an average overhead of 3.49x, while the overhead for a compute-bound bioinformatics workflow is under 2%.
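The idea that a computation is "a pure function of the initial filesystem state" can be illustrated with a toy model. The sketch below is not DetTrace's implementation, which (per the abstract) works in user space at the level of system calls. It is a hypothetical Python model that assumes a build's only sources of nondeterminism are the wall clock and random numbers. `HostEnv`, `DeterministicEnv`, `build`, and `artifact_hash` are illustrative names invented here. The same unchanged `build` function runs against two environments: one backed by the real host and one whose answers depend only on the initial state.

```python
from __future__ import annotations

import hashlib
import random
import time


class HostEnv:
    """Nondeterministic environment: real wall clock and OS-provided randomness."""

    def now(self) -> float:
        return time.time()

    def rand(self) -> int:
        return random.SystemRandom().getrandbits(32)


class DeterministicEnv:
    """Hypothetical deterministic environment: a logical clock plus a PRNG
    seeded from a hash of the initial state, so every answer it gives
    depends only on that state."""

    def __init__(self, initial_state: dict[str, bytes]) -> None:
        digest = hashlib.sha256()
        for path in sorted(initial_state):  # sorted: independent of dict order
            digest.update(path.encode())
            digest.update(initial_state[path])
        self._rng = random.Random(digest.digest())
        self._tick = 0

    def now(self) -> float:
        # Logical clock: advances by one unit per query instead of reading real time.
        self._tick += 1
        return float(self._tick)

    def rand(self) -> int:
        return self._rng.getrandbits(32)


def build(initial_state: dict[str, bytes], env) -> bytes:
    """Toy 'build' that, like many real builds, embeds a timestamp and a
    random identifier in its output artifact."""
    header = f"built-at={env.now()} build-id={env.rand()}\n".encode()
    body = b"".join(initial_state[p] for p in sorted(initial_state))
    return header + body


def artifact_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    sources = {"src/main.c": b"int main(void) { return 0; }\n"}
    det = {artifact_hash(build(sources, DeterministicEnv(sources))) for _ in range(3)}
    host = {artifact_hash(build(sources, HostEnv())) for _ in range(3)}
    print("deterministic distinct hashes:", len(det))  # 1
    print("host distinct hashes:", len(host))
```

The build code is identical in both runs; only the environment answering its queries changes, which loosely mirrors the abstract's point that reproducibility comes without application changes. Real builds have many more sources of nondeterminism (file timestamps, directory listing order, process IDs, thread scheduling) that this toy model does not attempt to cover.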

