research-article

Causality-based versioning

Published: 14 December 2009

Abstract

Versioning file systems provide the ability to recover from a variety of failures, including file corruption, virus and worm infestations, and user mistakes. However, using versions to recover from data-corrupting events requires a human to determine precisely which files and versions to restore. We can create more meaningful versions and enhance the value of those versions by capturing the causal connections among files, facilitating selection and recovery of precisely the right versions after data-corrupting events.
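To make the recovery argument concrete, here is a minimal sketch of the selection step, assuming a hypothetical in-memory map from each file version to the versions it was directly derived from. The names `derived_from`, `affected_versions`, and `restore_targets` are illustrative only, not the paper's interface. A breadth-first walk forward from a corrupted version finds every version it tainted; for each tainted file, the sketch proposes the newest version older than that file's first tainted version as the restore point (an assumed policy).

```python
from __future__ import annotations

from collections import deque

# Hypothetical version identifier: (file path, version number).
Version = tuple[str, int]


def affected_versions(derived_from: dict[Version, set[Version]],
                      corrupted: Version) -> set[Version]:
    """Return every version causally downstream of `corrupted`.

    `derived_from` maps each version to the versions it was directly
    derived from (its one-hop causal ancestors)."""
    # Invert the ancestry edges so the walk can move forward in time.
    children: dict[Version, set[Version]] = {}
    for child, parents in derived_from.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)

    # Breadth-first search from the corrupted version.
    tainted: set[Version] = set()
    queue = deque([corrupted])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in tainted:
                tainted.add(child)
                queue.append(child)
    return tainted


def restore_targets(derived_from: dict[Version, set[Version]],
                    corrupted: Version) -> dict[str, int | None]:
    """For each tainted file, pick the newest version older than its first
    tainted version (None means no clean version is known)."""
    tainted = affected_versions(derived_from, corrupted) | {corrupted}
    # Collect every version number known for each path.
    known: dict[str, set[int]] = {}
    for child, parents in derived_from.items():
        for path, num in {child} | parents:
            known.setdefault(path, set()).add(num)
    targets: dict[str, int | None] = {}
    for path in {p for p, _ in tainted}:
        first_bad = min(n for p, n in tainted if p == path)
        clean = [n for n in known.get(path, set()) if n < first_bad]
        targets[path] = max(clean) if clean else None
    return targets
```

For instance, if version 2 of a source file `b.c` is corrupted, only the objects built from that version (version 2 of `b.o` and of a linked `prog`) are flagged, and an unrelated `a.o` is left alone; the human no longer has to guess which files to roll back.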

We determine when to create new versions of files automatically using the causal relationships among files. The literature on versioning file systems usually examines two extremes of possible version-creation algorithms: open-to-close versioning and versioning on every write. We evaluate causal versions of these two algorithms and introduce two additional causality-based algorithms: Cycle-Avoidance and Graph-Finesse.
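The abstract does not spell out how Cycle-Avoidance or Graph-Finesse decide when to version, so the following is only a sketch of the general idea behind causality-driven version creation, not either of the paper's algorithms. It assumes a hypothetical graph whose nodes are (object, version) pairs for both processes and files, where a read makes the process depend on the file and a write makes the file depend on the process. Before adding an edge, it runs a full depth-first search; if the new edge would close a cycle, it freezes the current version and starts a new one. `CausalGraph`, `add_dependency`, `read`, and `write` are illustrative names, and processes and files share one namespace here for simplicity.

```python
from __future__ import annotations

Node = tuple[str, int]  # (object name, version number); hypothetical encoding


class CausalGraph:
    """Toy causal graph over processes and files (illustrative only)."""

    def __init__(self) -> None:
        self.version: dict[str, int] = {}          # object -> current version
        self.parents: dict[Node, set[Node]] = {}   # node -> direct ancestors

    def _current(self, obj: str) -> Node:
        node = (obj, self.version.setdefault(obj, 1))
        self.parents.setdefault(node, set())
        return node

    def _is_ancestor(self, candidate: Node, start: Node) -> bool:
        """Depth-first search: does `start` transitively depend on `candidate`?"""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == candidate:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return False

    def add_dependency(self, child_obj: str, parent_obj: str) -> bool:
        """Record child_obj <- parent_obj. Returns True if a new version of
        child_obj had to be created to keep the graph acyclic."""
        child, parent = self._current(child_obj), self._current(parent_obj)
        if self._is_ancestor(child, parent):
            # The edge would close a cycle: freeze the current version and
            # start a new one that depends on both the old version and parent.
            new = (child_obj, child[1] + 1)
            self.version[child_obj] = new[1]
            self.parents[new] = {child, parent}
            return True
        self.parents[child].add(parent)
        return False

    def read(self, proc: str, path: str) -> bool:
        return self.add_dependency(proc, path)   # process depends on file

    def write(self, proc: str, path: str) -> bool:
        return self.add_dependency(path, proc)   # file depends on process
```

Under this sketch, a process that reads file `A` and then writes `A` forces a new version of `A`, while writing an unrelated file `B` does not. If another process reads `B` and writes `C`, the first process reading `C` forces a new version of that first process, which is the kind of cross-process dependency that concurrent execution produces. Searching the whole graph on every edge gives the most precise decisions, but its cost grows with the graph, which may help explain why a global check can sometimes be expensive.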

We show that capturing and maintaining causal relationships imposes less than 7% overhead on a versioning system, providing benefit at low cost. We then show that Cycle-Avoidance provides more meaningful versions of files created during concurrent program execution, with overhead comparable to open-to-close versioning. Graph-Finesse provides even greater control, frequently at comparable overhead but sometimes at unacceptable overhead. Versioning on every write is an interesting extreme case, but it is far too costly to be useful in practice.
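One way to see why the two classic policies sit at opposite extremes is to count how many versions each would create from the same sequence of file operations. The sketch below does that for a hypothetical trace of `(operation, path)` events; the number of versions is only a rough proxy for cost, not the paper's overhead measurement. It assumes at most one open session per path at a time, and the function names are illustrative.

```python
from __future__ import annotations


def versions_open_to_close(trace: list[tuple[str, str]]) -> int:
    """One new version per open..close session that contained a write.
    Assumes at most one open session per path at a time (simplification)."""
    dirty: dict[str, bool] = {}
    count = 0
    for op, path in trace:
        if op == "open":
            dirty[path] = False
        elif op == "write" and path in dirty:
            dirty[path] = True
        elif op == "close" and dirty.pop(path, False):
            count += 1
    return count


def versions_every_write(trace: list[tuple[str, str]]) -> int:
    """One new version per write call."""
    return sum(1 for op, _ in trace if op == "write")
```

For a trace that opens `a`, writes it three times, closes it, opens and closes `b` without writing, then opens `a` again for a single write, open-to-close versioning yields 2 versions while versioning on every write yields 4; a program that issues many small writes widens that gap quickly.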

        Reviews

        David Gary Hill

        Logical data protection problems, such as accidental file deletion and data corruption due to a virus or a worm, can render files useless. A versioning file system can enable recovery from such failures; the problem is how to find the right files and versions, so that restoration is easy and correct. This paper advances the notion that causality-based versioning can facilitate selecting and recovering the right versions of a file after a logical data protection problem occurs. Causality information is derived by examining the processes that read and write files, as well as any changes to the files, in order to determine how two files differ and from which file a given file is derived. The authors compare two causality-based algorithms, Cycle-Avoidance and Graph-Finesse, to two traditional algorithms in versioning file systems: open-to-close versioning and versioning on every write. Compared to the two traditional algorithms, the two new ones do not introduce any significant new overheads (for example, space overhead on the Compile, PostMark, or Mercurial activity workloads); in fact, they perform better than versioning on every write. As we depend more and more on files that are exposed to many risks, such as data corruption, the ability to quickly recover the right files is critical. This paper should be mandatory reading for anyone involved with file system design and development.

        Online Computing Reviews Service
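The review's description of how causality is derived (watching which processes read and write which files) can be illustrated with a deliberately coarse sketch. It assumes a hypothetical trace of `(pid, op, path)` events and treats each written file as derived from every file the same process had read earlier; it ignores versions, pipes, and process inheritance, all of which a real system would need to track. `derive_sources` and `ancestors` are illustrative names.

```python
from __future__ import annotations


def derive_sources(trace: list[tuple[int, str, str]]) -> dict[str, set[str]]:
    """trace: (pid, op, path) events with op in {"read", "write"}.
    A written file is treated as derived from every file that the same
    process read earlier (a deliberately coarse approximation)."""
    inputs: dict[int, set[str]] = {}
    sources: dict[str, set[str]] = {}
    for pid, op, path in trace:
        seen = inputs.setdefault(pid, set())
        if op == "read":
            seen.add(path)
        elif op == "write":
            # Exclude the file itself when a process reads then rewrites it.
            sources.setdefault(path, set()).update(seen - {path})
    return sources


def ancestors(sources: dict[str, set[str]], path: str) -> set[str]:
    """All files that `path` transitively derives from."""
    result: set[str] = set()
    stack = list(sources.get(path, ()))
    while stack:
        src = stack.pop()
        if src not in result:
            result.add(src)
            stack.extend(sources.get(src, ()))
    return result
```

If one process reads `main.c` and `util.h` and writes `main.o`, and a second process reads `main.o` and `util.o` and writes `app`, then `app` is found to derive, directly or indirectly, from all four of those files.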

        • Published in

          ACM Transactions on Storage, Volume 5, Issue 4
          December 2009
          155 pages
          ISSN:1553-3077
          EISSN:1553-3093
          DOI:10.1145/1629080

          Copyright © 2009 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 14 December 2009
          • Accepted: 1 August 2009
          • Received: 1 February 2009

          Qualifiers

          • research-article
          • Research
          • Refereed
