Abstract
The ever-increasing volume of archival data that must be reliably retained for long periods of time, together with the decreasing costs of disk storage, memory, and processing, has motivated the design of low-cost, high-efficiency disk-based storage systems. However, managed disk storage is still expensive. To lower the cost further, redundancy can be eliminated with the use of interfile and intrafile data compression. Yet it is not clear what the optimal strategy for compressing data is, given the diversity of data collections.
To create a scalable archival storage system that efficiently stores diverse data, we present PRESIDIO, a framework that selects from different space-reducing efficient storage methods (ESMs) to detect similarity and reduce or eliminate redundancy when storing objects. In addition, the framework uses a virtualized content addressable store (VCAS) that hides from the user the complexity of knowing which space-efficient techniques are used, including chunk-based deduplication or delta compression. Storing and retrieving objects are polymorphic operations independent of their content-based address. A new technique, harmonic super-fingerprinting, is also used for obtaining successively more accurate (but also more costly) measures of similarity to identify the existing objects in a very large data set that are most similar to an incoming new object.
When first reported, the PRESIDIO design comprehensively introduced the notion of deduplication, which major vendors now offer as a service in storage systems. As an aid to the design of such systems, we use empirical data to evaluate and present various parameters that affect the efficiency of a storage system.
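The idea of a content-addressable store whose store and retrieve operations do not expose the storage method can be illustrated with a minimal sketch. This is not the PRESIDIO implementation: the class name `TinyCAS`, the fixed `CHUNK_SIZE`, the use of SHA-256 for addresses, and the restriction to two methods (whole-object and fixed-size chunking) are all assumptions made for illustration. Delta compression and similarity detection are omitted.

```python
import hashlib

# Hypothetical fixed chunk size, chosen only for this illustration.
CHUNK_SIZE = 4096


class TinyCAS:
    """Toy content-addressable store with two interchangeable storage methods."""

    def __init__(self):
        self.blobs = {}    # address -> raw bytes (whole objects and chunks)
        self.recipes = {}  # object address -> ordered list of chunk addresses

    @staticmethod
    def address(data: bytes) -> str:
        # The address depends only on content, never on the storage method.
        return hashlib.sha256(data).hexdigest()

    def store(self, data: bytes, chunked: bool = False) -> str:
        addr = self.address(data)
        if addr in self.recipes or addr in self.blobs:
            return addr  # identical content is already retrievable
        if chunked:
            chunk_addrs = []
            for i in range(0, len(data), CHUNK_SIZE):
                chunk = data[i:i + CHUNK_SIZE]
                c = self.address(chunk)
                self.blobs.setdefault(c, chunk)  # duplicate chunks stored once
                chunk_addrs.append(c)
            self.recipes[addr] = chunk_addrs
        else:
            self.blobs[addr] = data
        return addr

    def retrieve(self, addr: str) -> bytes:
        # The caller does not need to know how the object was stored.
        if addr in self.recipes:
            return b"".join(self.blobs[c] for c in self.recipes[addr])
        return self.blobs[addr]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    cas = TinyCAS()
    first = b"A" * 8192 + b"B" * 4096
    second = b"A" * 8192 + b"C" * 4096
    a1 = cas.store(first, chunked=True)
    a2 = cas.store(second, chunked=True)
    assert cas.retrieve(a1) == first and cas.retrieve(a2) == second
    print(len(first) + len(second), "logical bytes,", cas.stored_bytes(), "stored")
```

In this sketch the two objects share their leading chunks, so the store keeps one copy of the repeated chunk. Real systems typically use content-defined rather than fixed-size chunk boundaries; fixed-size chunking is used here only to keep the example short.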
PRESIDIO: A Framework for Efficient Archival Data Storage