
PRESIDIO: A Framework for Efficient Archival Data Storage

Published: 01 July 2011

Abstract

The ever-increasing volume of archival data that needs to be reliably retained for long periods of time and the decreasing costs of disk storage, memory, and processing have motivated the design of low-cost, high-efficiency disk-based storage systems. Managed disk storage nevertheless remains expensive. To further lower the cost, redundancy can be eliminated with the use of interfile and intrafile data compression. Given the diverse collections of data, however, it is not clear what the optimal strategy for compressing data is.

To create a scalable archival storage system that efficiently stores diverse data, we present PRESIDIO, a framework that selects from different space-reduction efficient storage methods (ESMs) to detect similarity and reduce or eliminate redundancy when storing objects. In addition, the framework uses a virtualized content-addressable store (VCAS) that hides from the user the complexity of knowing which space-efficient techniques are used, including chunk-based deduplication or delta compression. Storing and retrieving objects are polymorphic operations independent of their content-based address. A new technique, harmonic super-fingerprinting, is also used for obtaining successively more accurate (but also more costly) measures of similarity to identify the existing objects in a very large data set that are most similar to an incoming new object.
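
To make the idea of a content-addressable store with chunk-based deduplication concrete, the following is a minimal sketch and not the PRESIDIO implementation. It assumes fixed-size chunking, SHA-256 digests as addresses, and an in-memory dictionary as the backing store; the class name `ChunkStore`, its methods, and the chunk size are hypothetical choices made for illustration. It does not cover delta compression or harmonic super-fingerprinting.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen for illustration


class ChunkStore:
    """Toy content-addressable store that deduplicates identical chunks."""

    def __init__(self, chunk_size=CHUNK_SIZE):
        self.chunk_size = chunk_size
        self.chunks = {}   # chunk digest -> chunk bytes (each unique chunk stored once)
        self.objects = {}  # object address -> ordered list of chunk digests

    def put(self, data: bytes) -> str:
        """Store an object and return its content-based address."""
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk that is already present is not stored a second time.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        address = hashlib.sha256(data).hexdigest()
        self.objects[address] = digests
        return address

    def get(self, address: str) -> bytes:
        """Rebuild an object from its chunks; the caller never sees the chunking."""
        return b"".join(self.chunks[d] for d in self.objects[address])

    def stored_bytes(self) -> int:
        """Total bytes held in unique chunks."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

In this sketch the caller only handles addresses returned by `put` and passed to `get`, which mirrors the abstract's point that the storage method is hidden behind the content-based address. Two objects that share leading chunks consume space for the shared chunks only once.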

The PRESIDIO design, when reported earlier, comprehensively introduced for the first time the notion of deduplication, which major vendors now offer as a service in storage systems. As an aid to the design of such systems, we use empirical data to evaluate and present various parameters that affect the efficiency of a storage system.

