research-article

noWorkflow: a tool for collecting, analyzing, and managing provenance from Python scripts

Published: 01 August 2017

Abstract

We present noWorkflow, an open-source tool that systematically and transparently collects provenance from Python scripts, including data about the script execution and how the script evolves over time. During the demo, we will show how noWorkflow collects and manages provenance, as well as how it supports the analysis of computational experiments. We will also encourage attendees to use noWorkflow for their own scripts.
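The abstract names two kinds of provenance: data about a script's execution and data about how the script evolves over time. The following is a minimal sketch of both ideas. It is not noWorkflow's implementation or API. It assumes the standard library's `sys.setprofile` hook to record Python function activations, and uses a SHA-256 hash of the script text as a stand-in for version tracking. The names `Activation`, `Trial`, `capture`, and `script_changed` are hypothetical.

```python
import hashlib
import sys
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Activation:
    """One recorded Python function call (hypothetical record layout)."""
    name: str
    line: int
    start: float
    end: Optional[float] = None


@dataclass
class Trial:
    """Provenance for one execution: a hash of the script plus its activations."""
    script_hash: str
    activations: List[Activation] = field(default_factory=list)


def capture(script_source: str, entry_point: Callable, *args):
    """Run entry_point(*args), recording every Python-level call it makes."""
    trial = Trial(hashlib.sha256(script_source.encode("utf-8")).hexdigest())
    stack: List[Activation] = []

    def profiler(frame, event, arg):
        # Only Python functions produce "call"/"return"; C calls arrive as
        # "c_call"/"c_return" and are ignored in this sketch.
        if event == "call":
            act = Activation(frame.f_code.co_name, frame.f_lineno, time.perf_counter())
            trial.activations.append(act)
            stack.append(act)
        elif event == "return" and stack:
            stack.pop().end = time.perf_counter()

    sys.setprofile(profiler)
    try:
        result = entry_point(*args)
    finally:
        sys.setprofile(None)  # always stop profiling, even on error
    return result, trial


def script_changed(old: Trial, new: Trial) -> bool:
    """Evolution check: did two trials run different script versions?"""
    return old.script_hash != new.script_hash


# --- usage: a toy "experiment" run twice under two script versions ---
def square(x):
    return x * x


def main(n):
    total = 0
    for i in range(n):
        total += square(i)
    return total


result, t1 = capture("v1: sum of squares", main, 3)
_, t2 = capture("v2: sum of squares, edited", main, 3)
print(result, [a.name for a in t1.activations], script_changed(t1, t2))
```

Comparing hashes only detects that the script changed between runs; showing what changed would additionally require storing the source text of each version.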



  • Published in

    Proceedings of the VLDB Endowment, Volume 10, Issue 12
    August 2017
    427 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment

