Abstract
Through technologies such as RSS (Really Simple Syndication), Web Services, and AJAX (Asynchronous JavaScript and XML), the Internet has facilitated the emergence of applications that are composed from a variety of services and data sources. Through tools such as Yahoo Pipes, these “mash-ups” can be composed in a dynamic, just-in-time manner from components provided by multiple institutions (e.g., Google, Amazon, your neighbor). However, when using these applications, it is not apparent where data comes from or how it is processed. Thus, to inspire trust and confidence in mash-ups, it is critical to be able to analyze their processes after the fact. These trailing analyses, in particular the determination of the provenance of a result (i.e., the process that led to it), are enabled by process documentation, which is documentation of an application's past process created by the components of that application at execution time. In this article, we define a generic conceptual data model that supports the autonomous creation of attributable, factual process documentation for dynamic multi-institutional applications. The data model is instantiated using two Internet formats, OWL and XML, and is evaluated with respect to questions about the provenance of results generated by a complex bioinformatics mash-up.
A model of process documentation to determine provenance in mash-ups