ABSTRACT

Scientific workflow systems increasingly store provenance information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions. However, authors/owners of workflows may wish to keep some of this information confidential. In particular, a module may be proprietary, and users should not be able to infer its behavior by seeing mappings between all data inputs and outputs.
The problem we address in this paper is the following: given a workflow, abstractly modeled by a relation R, a privacy requirement Γ, and costs associated with data, the owner of the workflow decides which data (attributes) to hide and provides the user with a view R', the projection of R over the attributes that have not been hidden. The goal is to minimize the cost of hidden data while guaranteeing that individual modules are Γ-private. We call this the Secure-View problem. We formally define the problem, study its complexity, and offer algorithmic solutions.
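To make the problem statement concrete, the following Python sketch searches exhaustively for a cheapest set of attributes to hide for a single module. It is an illustration under stated assumptions, not the algorithm of the paper. The abstract does not define the privacy condition, so the sketch assumes one plausible reading: a module is private for a threshold (written Γ in this sketch, `gamma` in the code) if, for every input, at least that many outputs remain consistent with the visible projection. The function names, the toy module, the attribute names, and the costs are all hypothetical.

```python
from itertools import combinations
from math import prod


def is_gamma_private(rows, inputs, outputs, hidden, domains, gamma):
    """Assumed privacy test for one module (this definition is NOT from the abstract).

    For each input, count the outputs that stay consistent with the view:
    distinct visible-output values among rows sharing the same visible-input
    values, times every possible combination of hidden output values.
    """
    visible_in = [a for a in inputs if a not in hidden]
    visible_out = [a for a in outputs if a not in hidden]
    # Hidden output attributes could take any value from their domains.
    hidden_out_combos = prod(len(domains[a]) for a in outputs if a in hidden)

    # Group rows by what the user can still see of the input.
    seen_outputs = {}
    for row in rows:
        key = tuple(row[a] for a in visible_in)
        seen_outputs.setdefault(key, set()).add(tuple(row[a] for a in visible_out))

    return all(len(outs) * hidden_out_combos >= gamma
               for outs in seen_outputs.values())


def secure_view(rows, inputs, outputs, domains, costs, gamma):
    """Brute-force search for a minimum-cost hidden set; None if none qualifies."""
    attributes = inputs + outputs
    best = None
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            hidden = set(subset)
            cost = sum(costs[a] for a in hidden)
            if best is not None and cost >= best[1]:
                continue  # cannot improve on the best set found so far
            if is_gamma_private(rows, inputs, outputs, hidden, domains, gamma):
                best = (hidden, cost)
    return best


# Hypothetical module: inputs a, b; outputs x = a AND b, y = a XOR b.
rows = [{"a": a, "b": b, "x": a & b, "y": a ^ b} for a in (0, 1) for b in (0, 1)]
domains = {name: [0, 1] for name in "abxy"}
costs = {"a": 1, "b": 2, "x": 2, "y": 2}

hidden, cost = secure_view(rows, ["a", "b"], ["x", "y"], domains, costs, gamma=2)
print(sorted(hidden), cost)
```

In this toy instance, hiding the input `a` alone (cost 1) already leaves two candidate outputs for every input, so it is the cheapest choice for a threshold of 2. The search enumerates every subset of attributes and is therefore exponential in their number; it is meant only to illustrate the objective and the constraint.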
Index Terms
Provenance views for module privacy