Abstract
Cumulus is a system for efficiently implementing filesystem backups over the Internet, specifically designed under a thin cloud assumption—that the remote datacenter storing the backups does not provide any special backup services, but only a least-common-denominator storage interface. Cumulus aggregates data from small files for storage and uses LFS-inspired segment cleaning to maintain storage efficiency. While Cumulus can use virtually any storage service, we show its efficiency is comparable to integrated approaches.
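As a rough illustration of the two mechanisms named above, the sketch below packs small files into larger segments and then flags poorly utilized segments for cleaning. It is not the Cumulus implementation: `ThinStore`, `pack_segments`, and `segments_to_clean` are hypothetical names, the four-operation put/get/list/delete interface is an assumed reading of "least-common-denominator storage interface", and the fixed utilization threshold is a simplified stand-in for an LFS-style cleaning policy.

```python
class ThinStore:
    """Hypothetical thin-cloud store: whole-object put/get/list/delete only."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        self._objects[name] = bytes(data)

    def get(self, name):
        return self._objects[name]

    def list(self):
        return sorted(self._objects)

    def delete(self, name):
        del self._objects[name]


def pack_segments(files, target_size):
    """Group (path, data) pairs into segments of at most about target_size bytes.

    A single file larger than target_size gets a segment of its own.
    """
    segments, current, size = [], [], 0
    for path, data in files:
        # Start a new segment when adding this file would exceed the target.
        if current and size + len(data) > target_size:
            segments.append(current)
            current, size = [], 0
        current.append((path, data))
        size += len(data)
    if current:
        segments.append(current)
    return segments


def segments_to_clean(segments, live_paths, threshold):
    """Return indices of segments whose fraction of live bytes is below threshold."""
    candidates = []
    for index, segment in enumerate(segments):
        total = sum(len(data) for _, data in segment)
        live = sum(len(data) for path, data in segment if path in live_paths)
        if total > 0 and live / total < threshold:
            candidates.append(index)
    return candidates


def upload(store, segments):
    """Store each segment as one object, using an assumed 'segment-N' naming scheme."""
    for index, segment in enumerate(segments):
        blob = b"".join(data for _, data in segment)
        store.put(f"segment-{index}", blob)
```

Under these assumptions, a later backup would rewrite the still-live data from the flagged segments into fresh segments and delete the old objects. That trades some upload bandwidth for lower storage cost, which is the trade-off segment cleaning exists to manage.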