
Pastiche: making backup cheap and easy

Published: 31 December 2002

Abstract

Backup is cumbersome and expensive. Individual users almost never back up their data, and backup is a significant cost in large organizations. This paper presents Pastiche, a simple and inexpensive backup system. Pastiche exploits excess disk capacity to perform peer-to-peer backup with no administrative costs. Each node minimizes storage overhead by selecting peers that share a significant amount of data. It is easy for common installations to find suitable peers, and peers with high overlap can be identified with only hundreds of bytes. Pastiche provides mechanisms for confidentiality, integrity, and detection of failed or malicious peers. A Pastiche prototype suffers only 7.4% overhead for a modified Andrew Benchmark, and restore performance is comparable to cross-machine copy.
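The claim that high-overlap peers can be identified with only hundreds of bytes can be illustrated with a small sketch. It is not taken from the paper: the fixed 4 KB chunking, the use of SHA-1 digests as fingerprints, the "keep the k smallest fingerprints" sampling rule, and the function names `chunk_fingerprints`, `summary`, and `estimated_coverage` are all assumptions made for illustration. The idea shown is only that a node can send a short sample of its content fingerprints and let a candidate peer report how many of them it already stores.

```python
import hashlib
from typing import List, Set


def chunk_fingerprints(data: bytes, chunk_size: int = 4096) -> Set[bytes]:
    """Return the set of SHA-1 digests of fixed-size chunks of `data`.

    Assumption: fixed-size chunking is used here only to keep the sketch
    short; a real system may divide data differently.
    """
    return {
        hashlib.sha1(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    }


def summary(fingerprints: Set[bytes], k: int = 16) -> List[bytes]:
    """Keep the k smallest fingerprints as a compact, deterministic sample.

    With 20-byte SHA-1 digests and k = 16, the summary is 320 bytes.
    """
    return sorted(fingerprints)[:k]


def estimated_coverage(my_summary: List[bytes], peer_fingerprints: Set[bytes]) -> float:
    """Fraction of the sampled fingerprints that the peer already stores."""
    if not my_summary:
        return 0.0
    hits = sum(1 for fp in my_summary if fp in peer_fingerprints)
    return hits / len(my_summary)
```

Under these assumptions, a node would compute `summary(chunk_fingerprints(local_data))` once, send it to candidate peers, and prefer peers that report the highest `estimated_coverage`, since data a peer already holds does not need to be stored a second time.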

Published In

ACM SIGOPS Operating Systems Review, Volume 36, Issue SI
OSDI '02: Proceedings of the 5th Symposium on Operating Systems Design and Implementation
Winter 2002
398 pages
ISSN:0163-5980
DOI:10.1145/844128
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article
