Abstract
Kinesis is a novel data placement model for distributed storage systems. It exemplifies three design principles: structure (division of servers into a few failure-isolated segments), freedom of choice (freedom to allocate the best servers to store and retrieve data based on current resource availability), and scattered distribution (independent, pseudo-random spread of replicas in the system). These design principles enable storage systems to achieve balanced utilization of storage and network resources in the presence of incremental system expansions, failures of single and shared components, and skewed distributions of data size and popularity. In turn, this ability leads to significantly reduced resource provisioning costs, good user-perceived response times, and fast, parallelized recovery from independent and correlated failures.
This article validates Kinesis through theoretical analysis, simulations, and experiments on a prototype implementation. Evaluations driven by real-world traces show that Kinesis can significantly outperform the widely used Chain replica-placement strategy in terms of resource requirements, end-to-end delay, and failure recovery.
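The abstract names the three principles but not the exact mechanism. One way they can fit together is shown in the minimal Python sketch below, under assumptions that are not stated in the abstract:

- the servers are split into k failure-isolated segments;
- each segment contributes one pseudo-random candidate server per item, obtained by salting a single hash with the segment index;
- the r ≤ k replicas go to the r candidates that currently store the least data;
- a read is served by the replica whose server has the shortest queue.

The function names (`candidate_server`, `place`, `pick_replica`) and the load model (one integer per server) are hypothetical and for illustration only.

```python
import hashlib
from typing import List, Tuple

# (segment index, server index within that segment); hypothetical representation
Replica = Tuple[int, int]


def candidate_server(segment: int, key: str, servers_in_segment: int) -> int:
    """Pseudo-random server for `key` inside `segment`.

    Salting the hash with the segment index stands in (as an assumption) for
    one independent hash function per segment: scattered distribution.
    """
    digest = hashlib.sha256(f"{segment}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % servers_in_segment


def place(key: str, size: int, storage: List[List[int]], r: int) -> List[Replica]:
    """Store r replicas of `key` on the r least-loaded of its k candidates.

    storage[seg][srv] holds the amount of data on that server. With one
    candidate per segment, every replica lands in a different segment
    (structure). Picking the least-loaded candidates is the freedom of choice.
    """
    k = len(storage)
    if not 1 <= r <= k:
        raise ValueError("need 1 <= r <= number of segments")
    candidates = [(seg, candidate_server(seg, key, len(storage[seg]))) for seg in range(k)]
    # Rank candidates by current storage load; break ties by segment index.
    candidates.sort(key=lambda c: (storage[c[0]][c[1]], c[0]))
    chosen = candidates[:r]
    for seg, srv in chosen:
        storage[seg][srv] += size
    return chosen


def pick_replica(replicas: List[Replica], queue_len: List[List[int]]) -> Replica:
    """Serve a read from the replica whose server has the shortest queue."""
    return min(replicas, key=lambda c: (queue_len[c[0]][c[1]], c[0]))


if __name__ == "__main__":
    # Hypothetical cluster: 7 segments of 4 servers each, 3 replicas per item.
    storage = [[0] * 4 for _ in range(7)]
    for i in range(1000):
        place(f"item-{i}", 1, storage, r=3)
    print("max per-server load:", max(map(max, storage)))
    print("min per-server load:", min(map(min, storage)))
```

Every candidate comes from a different segment, so losing an entire segment removes at most one replica of any item. Because the replicas are spread pseudo-randomly, re-replication after a failure can draw on many servers in parallel. A production system would also need to handle heterogeneous server capacities and segment growth, which this sketch ignores.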
Kinesis: A new approach to replica placement in distributed storage systems