A model of process documentation to determine provenance in mash-ups

Published: 23 February 2009

Abstract

Through technologies such as RSS (Really Simple Syndication), Web Services, and AJAX (Asynchronous JavaScript and XML), the Internet has facilitated the emergence of applications that are composed from a variety of services and data sources. Through tools such as Yahoo Pipes, these “mash-ups” can be composed in a dynamic, just-in-time manner from components provided by multiple institutions (e.g., Google, Amazon, your neighbor). However, when using these applications, it is not apparent where data comes from or how it is processed. Thus, to inspire trust and confidence in mash-ups, it is critical to be able to analyze their processes after the fact. These trailing analyses, in particular the determination of the provenance of a result (i.e., the process that led to it), are enabled by process documentation, which is documentation of an application's past process created by the components of that application at execution time. In this article, we define a generic conceptual data model that supports the autonomous creation of attributable, factual process documentation for dynamic multi-institutional applications. The data model is instantiated using two Internet formats, OWL and XML, and is evaluated with respect to questions about the provenance of results generated by a complex bioinformatics mash-up.

Published in

ACM Transactions on Internet Technology, Volume 9, Issue 1
February 2009
155 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/1462159

Copyright © 2009 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Received: 1 February 2007
• Accepted: 1 February 2007
• Published: 23 February 2009

Qualifiers

• research-article
• Research
• Refereed
