ABSTRACT
We describe the design and implementation of DetTrace, a reproducible container abstraction for Linux implemented in user space. All computation that occurs inside a DetTrace container is a pure function of the initial filesystem state of the container. Reproducible containers can be used for a variety of purposes, including replication for fault tolerance, reproducible software builds, and reproducible data analytics. We use DetTrace to automatically achieve reproducibility for 12,130 Debian package builds, comprising over 800 million lines of code, as well as for bioinformatics and machine learning workflows. We show that, while software in each of these domains is initially irreproducible, DetTrace makes it reproducible without requiring any hardware, OS, or application changes. DetTrace's performance is dictated by the frequency of system calls: IO-intensive software builds have an average overhead of 3.49x, while the overhead for a compute-bound bioinformatics workflow is under 2%.
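The Python sketch below illustrates what it means for computation to be "a pure function of the initial filesystem state". It also shows a common way to check reproducibility: run the same build twice and compare digests of the outputs. This is a toy under stated assumptions, not DetTrace's implementation. The `build` step, `wall_clock`, and `fixed_clock` are hypothetical names introduced only for illustration.

A build that embeds the real time fails the check. Substituting a clock whose readings depend only on the run's own history makes the output depend only on its input. DetTrace aims for this effect without any application changes, so the explicit clock parameter here is purely a teaching device.

```python
import hashlib
import time
from typing import Callable

Clock = Callable[[], int]


def build(source: bytes, clock: Clock) -> bytes:
    """Hypothetical toy build step. Like many real tools, it embeds a
    timestamp in its output, so the output is not a function of `source` alone."""
    return source + b"\n/* built at " + str(clock()).encode() + b" */\n"


def digest(artifact: bytes) -> str:
    """SHA-256 of an artifact, used to compare outputs bit for bit."""
    return hashlib.sha256(artifact).hexdigest()


def wall_clock() -> Clock:
    """Real time in nanoseconds: differs from run to run."""
    return time.time_ns


def fixed_clock() -> Clock:
    """Hypothetical determinized clock: each fresh run starts from the same
    value and advances by one tick per query, independent of real time."""
    tick = 0

    def now() -> int:
        nonlocal tick
        tick += 1
        return tick

    return now


def is_reproducible(source: bytes, make_clock: Callable[[], Clock]) -> bool:
    """Build twice, each run with a freshly created clock, and compare digests."""
    first = digest(build(source, make_clock()))
    time.sleep(0.05)  # let real time advance between the two runs
    second = digest(build(source, make_clock()))
    return first == second


if __name__ == "__main__":
    src = b"int main(void) { return 0; }"
    print(is_reproducible(src, wall_clock))   # False
    print(is_reproducible(src, fixed_clock))  # True
```

The digest comparison mirrors how build reproducibility is usually judged: two runs from the same inputs must yield bitwise-identical artifacts.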