Abstract
We present noWorkflow, an open-source tool that systematically and transparently collects provenance from Python scripts, including data about the script execution and how the script evolves over time. During the demo, we will show how noWorkflow collects and manages provenance, as well as how it supports the analysis of computational experiments. We will also encourage attendees to use noWorkflow for their own scripts.