Abstract
The ever-growing amount of data requires highly scalable storage solutions. The most flexible approach is to use storage pools that can be expanded and scaled down by adding or removing storage devices. To make this approach usable, it is necessary to provide a solution to locate data items in such a dynamic environment. This article presents and evaluates the Random Slicing strategy, which incorporates lessons learned from table-based, rule-based, and pseudo-randomized hashing strategies and is able to provide a simple and efficient strategy that scales up to handle exascale data. Random Slicing keeps a small table with information about previous storage system insert and remove operations, drastically reducing the required amount of randomness while delivering a perfect load distribution.
- S. Amarasinghe, D. Campbell, W. Carlson, A. Chien, W. Dally, E. Elnohazy, M. Hall, et al. 2010. ExaScale software study: Software challenges in extreme scale systems. Tech. Rep., sponsored by DARPA IPTO in the context of the ExaScale Computing Study.Google Scholar
- A. Azagury, V. Dreizin, M. Factor, E. Henis, D. Naor, Y. Rinetzky, O. Rodeh, J. Satran, A. Tavory, and L. Yerushalmi. 2003. Towards an object store. In Proceedings of the 20th IEEE Conference on Mass Storage Systems and Technologies (MSST). 165--176. Google Scholar
Digital Library
- Y. Azar, A. Broder, A. Karlin, and E. Upfal. 1999. Balanced allocations. SIAM J. Comput. 29, 1, 180--200. Google Scholar
Digital Library
- J. L. Bentley. 1977. Solutions to Klees rectangle problems. Tech. Rep. Carnegie-Mellon University, Pittsburgh, PA.Google Scholar
- M. Blaum, J. Brady, J. Bruck, and J. Menon. 1994. EVENODD: An optimal scheme for tolerating double disk failures in RAID architectures. In Proceedings of the 21st International Symposium on Computer Architecture (ISCA). 245--254. Google Scholar
Digital Library
- R. P. Brent. 1992. Uniform random number generators for supercomputers. In Proceedings of the 5th Australian Supercomputer Conference. 95--104.Google Scholar
- A. Brinkmann and S. Effert. 2008. Redundant data placement strategies for cluster storage environments. In Proceedings of the 12th International Conference on Principles of DIstributed Systems (OPODIS). Google Scholar
Digital Library
- A. Brinkmann, S. Effert, F. Meyer Auf Der Heide, and C. Scheideler. 2007. Dynamic and redundant data placement. In Proceedings of the 27th IEEE International Conference on Distributed Computing Systems (ICDCS). Google Scholar
Digital Library
- A. Brinkmann, M. Heidebuer, F. Meyer Auf Der Heide, U. Rückert, K. Salzwedel, and M. Vodisek. 2004. V: Drive - Costs and Benefits of an Out-of-Band Storage Virtualization System. In Proceedings of the 21st IEEE Conference on Mass Storage Systems and Technologies (MSST). 153--157.Google Scholar
- A. Brinkmann, K. Salzwedel, and C. Scheideler. 2000. Efficient, distributed data placement strategies for storage area networks. In Proceedings of the 12th ACM Symposium on Parallel Algorithms and Architectures (SPAA). 119--128. Google Scholar
Digital Library
- A. Brinkmann, Kay Salzwedel, and C. Scheideler. 2002. Compact, adaptive placement schemes for non-uniform distribution requirements. In Proceedings of the 14th ACM Symposium on Parallel Algorithms and Architectures (SPAA). 53--62. Google Scholar
Digital Library
- W. S. Chou. 1995. On inversive maximal period polynomials over finite fields. Appl. Algeb. Eng. Commun. Comput. 6, 4--5, 245--250.Google Scholar
- P. Corbett, B. English, A. Goel, T. Grcanac, S. Kleiman, J. Leong, and S. Sankar. 2004. Row-diagonal parity for double disk failure correction. In Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST). 1--14. Google Scholar
Digital Library
- T. Cortes and J. Labarta. 2001. Extending heterogeneity to RAID level 5. In Proceedings of the USENIX Annual Technical Conference. 119--132. Google Scholar
Digital Library
- M. De Berg, O. Cheong, and M. Van Kreveld. 2008. Computational Geometry: Algorithms and Applications. Springer. Google Scholar
Cross Ref
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. 2007. Dynamo: Amazon's highly available key-value store. ACM SIGOPS Oper. Syst. Rev. 41, 6, 205--220. Google Scholar
Digital Library
- A. Devulapalli, D. Dalessandro, and P. Wyckoff. 2008. Data structure consistency using atomic operations in storage devices. In Proceedings of the 5th International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI). 65--73. Google Scholar
Digital Library
- D. Eastlake and P. Jones. 2001. US secure hash algorithm 1 (SHA1).Google Scholar
- J. Gonzalez and T. Cortes. 2008. Distributing orthogonal redundancy on adaptive disk arrays. In Proceedings of the International Conference on Grid Computing, High-Performance and Distributed Applications (GADA). Google Scholar
Digital Library
- R. J. Honicky and E. L. Miller. 2003. A fast algorithm for online placement and reorganization of replicated data. In Proceedings of the 17th IEEE International Parallel and Distributed Processing Symposium (IPDPS). Google Scholar
Digital Library
- R. J. Honicky and E. L. Miller. 2004. Replication under scalable hashing: A family of algorithms for scalable decentralized data distribution. In Proceedings of the 18th IEEE International Parallel and Distributed Processing Symposium (IPDPS).Google Scholar
- N. L. Johnson and S. Kotz. 1977. Urn Models and Their Applications. Wiley, New York.Google Scholar
- D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy. 1997. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the 29th ACM Symposium on Theory of Computing (STOC). 654--663. Google Scholar
Digital Library
- D. E. Knuth. 1997. Volume 2: Seminumerical Algorithms. In the Art of Computer Programming, 192.Google Scholar
- D. H. Lehmer. 1951. Mathematical methods in large-scale computing units. Ann. Comput. Lab. Harvard Univ. 26, 141--146.Google Scholar
- M. Luby. 1996. Pseudorandomness and Cryptographic Applications. Princeton University Press. Google Scholar
Digital Library
- G. Marsaglia and A. Zaman. 1991. A new class of random number generators. Ann. Appl. Probab. 462--480.Google Scholar
- M. Mense and C. Scheideler. 2008. SPREAD: An adaptive scheme for redundant and fair storage in dynamic heterogeneous storage systems. In Proceedings of the 19th ACM-SIAM Symposium on Discrete Algorithms (SODA). Google Scholar
Digital Library
- M. Mitzenmacher. 1996. The power of two choices in randomized load balancing. Ph.D. thesis. Computer Science Department, University of California, Berkeley. Google Scholar
Digital Library
- D. A. Patterson, G. Gibson, and R. H. Katz. 1988. A case for redundant arrays of inexpensive disks (RAID). In Proceedings of the ACM Conference on Management of Data (SIGMOD). 109--116. Google Scholar
Digital Library
- I. Popov, A. Brinkmann, and T. Friedetzky. 2012. On the influence of PRNGs on data distribution. In Proceedings of the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP). IEEE, 536--543. Google Scholar
Digital Library
- M. Raab and A. Steger. 1998. Balls into BINSA simple and tight analysis. Random. Approx. Tech. Comput. Sci., 159--170. Google Scholar
Digital Library
- P. Sanders. 2001. Reconciling simplicity and realism in parallel disk models. In Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms (SODA). SIAM, 67--76. Google Scholar
Digital Library
- C. Schindelhauer and G. Schomaker. 2005. Weighted distributed hash tables. In Proceedings of the 17th ACM Symposium on Parallel Algorithms and Architectures (SPAA). 218--227. Google Scholar
Digital Library
- F. L. Severence. 2009. System Modeling and Simulation: An Introduction. Wiley.Google Scholar
- M. Stevens, A. Lenstra, and B. de Weger. 2007. Chosen-prefix collisions for md5 and colliding x.509 certificates for different identities. In Proceedings of the 26th Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT). 1--22. Google Scholar
Digital Library
- M. Stevens, A. Sotirov, J. Appelbaum, A. Lenstra, D. Molnar, D. Osvik, and B. De Weger. 2009. Short Chosen-Prefix Collisions for MD5 and the Creation of a rogue CA certificate. In Proceedings of the 29th Annual International Cryptology Conference (CRYPTO). 55--69. Google Scholar
Digital Library
- I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek, F. Dabek, and H. Balakrishnan. 2003. Chord: A scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Trans. Netw. 11, 1, 17--32. Google Scholar
Digital Library
- J. Viega. 2003. Practical random number generation in software. In Proceedings of the 19th Annual Computer Security Applications Conference (ACSAC). 129--141. Google Scholar
Digital Library
- J. Walker. 1998. ENT Test suite. http://www.fourmilab.ch/random.Google Scholar
- T. Wang. 2007. Integer hash function. http://www.concentric.net/ttwang/tech/inthash.htm.Google Scholar
- X. Wang, Y. Yin, and H. Yu. 2005. Finding Collisions in the Full SHA-1. In Proceedings of the 25th Annual International Cryptology Conference (CRYPTO). 17--36. Google Scholar
Digital Library
- S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn. 2006a. Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI). 307--320. Google Scholar
Digital Library
- S. A. Weil, S. A. Brandt, E. L. Miller, and C. Maltzahn. 2006b. CRUSH: Controlled, scalable and decentralized placement of replicated data. In Proceedings of the ACM/IEEE Conference on Supercomputing. Google Scholar
Digital Library
- W. Zheng and G. Zhang. 2011. FastScale: Accelerate RAID Scaling by minimizing data migration. In Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST). Google Scholar
Digital Library
Index Terms
Random Slicing: Efficient and Scalable Data Placement for Large-Scale Storage Systems
Recommendations
Opportunities and challenges of storage grid enabled by grid service
The explosive growth of data generated worldwide by information digitization has been identified as the key driver of storage system evolution. In this paper, we discuss the existing storage system architectures and pose the challenges they are facing. ...
The Tiger Shark file system
COMPCON '96: Proceedings of the 41st IEEE International Computer ConferenceTiger Shark is a parallel file system for IBM's AIX operating system. It is designed to support interactive multimedia, particularly large-scale systems such as interactive television (ITV). Tiger Shark scales across the entire RS/6000 product line, ...
Efficient detection of large-scale redundancy in enterprise file systems
In order to catch and reduce waste in the exponentially increasing demand for disk storage, we have developed very efficient technology to detect approximate duplication of large directory hierarchies. Such duplication can be caused, for example, by ...






Comments