Abstract
Parity is a popular form of data protection in redundant arrays of inexpensive/independent disks (RAID). RAID5 devotes the capacity of one out of N disks to parity to mask single disk failures: the contents of a block on a failed disk can be reconstructed by exclusive-ORing (XORing) the corresponding blocks on the surviving disks. RAID5 is, however, vulnerable to data loss if a second disk fails. The RAID5 rebuild process systematically reconstructs the contents of a failed disk on a spare disk, returning the system to its original state, but the rebuild may fail because of unreadable sectors. This has led to two-disk-failure-tolerant arrays (2DFTs), such as RAID6 based on Reed-Solomon (RS) codes. EVENODD, RDP (Row-Diagonal Parity), the X-code, and RM2 (Row-Matrix) are 2DFTs with parity coding. RM2 incurs a level of redundancy higher than two disks, while the X-code is limited to a prime number of disks. RDP is optimal with respect to the number of XOR operations for encoding, but not for short write operations. For small symbol sizes EVENODD and RDP have the same disk access pattern as RAID6, while RM2 and the X-code incur a high recovery cost with two failed disks. We describe variations on the RAID5 and RAID6 organizations, including clustered RAID, different methods of updating parities, rebuild processing, disk scrubbing to eliminate sector errors, and the intra-disk redundancy (IDR) method for dealing with sector errors. We summarize the results of recent studies of failures in hard disk drives. We describe Markov chain reliability models that estimate RAID mean time to data loss (MTTDL), taking into account sector errors and the effect of disk scrubbing. Numerical results show that RAID5 plus IDR attains the same MTTDL level as RAID6 while incurring a lower performance penalty. We conclude with a survey of analytic and simulation studies of RAID performance, and of tools and benchmarks for RAID performance evaluation.
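The XOR reconstruction described in the abstract can be illustrated with a minimal Python sketch. It assumes a single stripe held in memory as equal-length byte strings, one block per disk, with the parity block stored last. The helper names `xor_blocks`, `make_stripe`, and `reconstruct` are hypothetical, chosen for illustration, and do not come from the paper.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("blocks must have equal length")
    # zip(*blocks) yields, for each byte position, the bytes at that position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Build a stripe: the data blocks followed by their parity block.

    Illustrative layout only; real RAID5 rotates parity across the disks.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe, failed_index):
    """Rebuild the block on the failed disk by XORing all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)


# Three data blocks plus one parity block (N = 4 disks in this toy stripe).
stripe = make_stripe([b"disk", b"raid", b"xor!"])
recovered = reconstruct(stripe, failed_index=1)
```

Because XOR is its own inverse, the same routine rebuilds a lost data block or a lost parity block. With two blocks missing, the single parity equation has two unknowns, which is why RAID5 cannot tolerate a second disk failure.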
- Alvarez, G. A., Burkhard, W. A., and Cristian, F. 1997. Tolerating multiple failures in RAID architectures with optimal storage and uniform declustering. In Proceedings of the 24th Annual International Symposium on Computer Architecture (ISCA'97). 62--72. Google Scholar
Digital Library
- Alvarez, G. A., Burkhard, W. A., Stockmeyer, L. J., and Cristian, F. 1998. Declustered disk array architectures with optimal and near optimal parallelism. In Proceedings of the 25th International Symposium on Computer Architecture (ISCA'98). 109--120. Google Scholar
Digital Library
- Alvarez, G. A., Borowsky, E., Go, S., Romer, T. H., Becker-Szendy, R., Golding, R., Merchant, A., Spasojevic, M., Veitch, A., and Wilkes, J. 2001. Minerva: An automated resource provisioning tool for large-scale storage systems. ACM Trans. Comput. Syst. 19, 4, 483--518. Google Scholar
Digital Library
- Anderson, D., Dykes, J., and Riedel, E. 2003. More than an interface—SCSI vs ATA. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST'03). 245--257. Google Scholar
Digital Library
- Anderson, E., Kallhalla, M., Spence, S., Swaminathan, R., and Wang, Q. 2005. Quickly finding near-optimal storage system designs. ACM Trans. Comput. Syst. 23, 4, 337--374. Google Scholar
Digital Library
- Bachmat, E. and Schindler, J. 2002. Analysis of methods for scheduling low priority disk drive tasks. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems. 55--65. Google Scholar
Digital Library
- Baek, S. H., Kim, B. W., Jeung, E., and Park, C. W. 2001. Reliability and performance of hierarchical RAID with multiple controllers. In Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing (PODC'01). 246--254. Google Scholar
Digital Library
- Bairavasundaram, L. N., Goodson, G. R., Pasupathy, S., and Schindler, J. 2007. An analysis of latent sector errors in disk drives. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. 289--300. Google Scholar
Digital Library
- Bairavasundaram, L. N., Goodson, G. R., Schroeder, B., Arpaci-Dusseau, A. C., and Arpaci-Dusseau, R. H. 2008. An analysis of data corruption in the storage stack. In Proceedings of the 6th USENIX Symposium on File and Storage Technologies (FAST'08). 223--238. Google Scholar
Digital Library
- Balafoutis. E., Panagakis, A., Triantafilliou, P. Nerjes, G., Muth, P., and Weikum, G. 2003. Clustered scheduling algorithms for mixed media disk workloads in a multimedia server. Cluster Comput. 6, 1, 75--86. Google Scholar
Digital Library
- Barve, R., Shriver, E. A. M., Gibbons, P., Hillyer, B. K., Matias, Y., and Vitter, J. S. 1998. Modeling and optimizing I/O throughput of multiple disks on a bus. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems. 83--92. Google Scholar
Digital Library
- Baylor, S., Corbett, P. F., and Park, C. 1999. Efficient method for providing fault tolerance against double device failures in multiple device systems. US Patent 5,862,158.Google Scholar
- Blaum, M. 1987. A class of byte-correcting array codes. Res. Rep. RJ 5652 (57151). IBM Almaden Research Center, San Jose, CA.Google Scholar
- Blaum, M. and Roth, R. M. 1993. New array codes for multiple phased burst correction. IEEE Trans. Inform. Theory 39, 1, 66--77.Google Scholar
Digital Library
- Blaum, M. and Ouchi, K. 1994. Method and means for B-adjacent coding and rebuilding data from up to two unavailable DASDs in a DASD array. U.S. Patent 5,333,143.Google Scholar
- Blaum, M., Brady, J., Bruck, J., and Menon, J. 1995. EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans. Comput. 44, 2, 192--202. Google Scholar
Digital Library
- Blaum, M., Bruck, J., and Vardy, A. 1996. MDS array codes with independent parity symbols. IEEE Trans. Inform. Theory 42, 2, 529--542. Google Scholar
Digital Library
- Blaum, M., Farrell, P. G., and van Tilborg, H. C. A. 1998. Array codes. In Handbook of Coding Theory, V. S. Pless and W. C. Huffman, Eds., Elsevier Science, Amsterdam, The Netherlands, Chapter 22, 1855--1909.Google Scholar
- Blaum, M. and Roth, R. M. 1999. On lowest-density MDS codes. IEEE Trans. Inform. Theory 45, 1, 46--59. Google Scholar
Digital Library
- Blaum, M., Brady, J., Bruck, J., Menon, J., and Vardy, A. 2002. The EVENODD Code and Its Generalizations. In High Performance Mass Storage and Parallel I/O: Technologies and Applications, H. Jin. T. Cortes, and R. Buyya, Eds., Wiley, New York, NY, 187--205.Google Scholar
- Blaum, M. 2005. An Introduction to Error-Correcting Codes. In Coding and Signal Processing for Magnetic Recording Systems. B. Vasic and E. M. Kurtas, Eds., Chapter 9, CRC Press, Orlando, FL.Google Scholar
- Blaum, M. 2006. A family of MDS array codes with a minimal number of encoding operations. In Proceedings of the International Symposium on Information Theory (ISIT'06). 2784--2788.Google Scholar
Cross Ref
- Blum, A., Goyal, A., Heidelberger, P., Lavenberg, S. S., Nakayama, M., and Shahabuddin, P. 1994. Modeling and analysis of system dependability using the system availability estimator. In Proceedings of the 24th IEEE Annual International Symposium on Fault-Tolerant Computing Systems (FTCS-24), 137--141.Google Scholar
- Borowsky, E., Golding, R., Merchant, A., Shriver, E., Spasojevic, M., and Wilkes, J. 1997. Using attribute-managed storage to achieve QoS. In Proceedings of the 5th International Conference on Workshop on Quality of Service. 203--206.Google Scholar
- Boxma, O. J. and Cohen, J. W. 1991. The M/G/1 queue with permanent customers. IEEE J. Select. Areas Comm. 9, 2, 179--184.Google Scholar
Digital Library
- Bucy, J. S., Schindler, J., Schlosser, S. W., Ganger, G. R., and Contributors. 2008. The DiskSim simulation environment version 4.0 reference manual. Tech. rep. CMU-PDL-08-101.Google Scholar
- Butterworth, H. E. 1999. The design of segment filling and selection algorithms for efficient free-space collection in a log-structured array. IBM Hursley, UK, unpublished manuscript.Google Scholar
- Carley, L. R., Ganger, G. R., and Nagle, D. F. 2000. MEMS-based integrated-circuit mass-storage systems. Comm. ACM 43, 11, 72--80. Google Scholar
Digital Library
- Chandy, J. and Narasimha Reddy, A. L. 1993. Failure evaluation of disk array organizations. In Proceedings of the 13th International Conference on Distributed Computing Systems (ICDCS'93), 319--326.Google Scholar
- Chen, P. M., Lee, E. K., Gibson, G. A., Katz, R. H., and Patterson, D. A. 1994. RAID: High-performance, reliable secondary storage. ACM Comput. Surv. 26, 2, 145--185. Google Scholar
Digital Library
- Chen, S.-Z. and Towsley, D. F. 1993. The design and evaluation of RAID 5 and parity striping disk array architectures. J. Parall. Distrib. Comput. 10, 1/2, 41--57.Google Scholar
- Chen, S.-Z. and Towsley, D. F. 1996. A performance evaluation of RAID architectures. IEEE Trans. Comput. 45, 10, 1116--1130. Google Scholar
Digital Library
- Coffman, Jr. E. G. and Denning, P. 1972. Operating Systems Principles. Prentice-Hall, 1972.Google Scholar
- Coffman, Jr. E. G. and Hofri, M. 1990. Queueing models of secondary storage devices. In Stochastic Analysis of Computer and Communication Systems. H. Takagi, Ed., Elsevier Science, Amsterdam, The Netherlands, 549--588.Google Scholar
- Corbett, P. F., English, B., Goel, A., Grcanac, T., Kleiman, S., Leong, J., and Sankar, S. 2004. Row-diagonal parity for double disk failure correction. In Proceedings of the 3rd USENIX Conference on File and Storage Technologies (FAST'04). Google Scholar
Digital Library
- Courtright II, W. V., Holland, M. C., Gibson, G. A., Reilly, L. N., and Zelenka, J. 1996. RAIDframe: A rapid prototyping tool for RAID systems. Parallel Data Laboratory, CMU. http://www.pdl.cmu.edu/RAIDframe.Google Scholar
- Denning, P. J. 1967. Effects of scheduling in file memory operations. In Proceedings of the AFIPS Spring Joint Computer Conference on (SJCC). Google Scholar
Digital Library
- Dholakia, A., Eleftheriou, E., Hou, X.-Y., Iliadis, I., Menon, J., and Rao, K. K. 2008. Analysis of a new intra-disk redundancy scheme for high reliability RAID storage systems in the presence of unrecoverable errors. ACM Trans. Storage 4, 1. Google Scholar
Digital Library
- Durstenfeld, R. 1964. Algorithm 235: Random permutation. Comm. ACM 7, 7, 420. Google Scholar
Digital Library
- Elerath, J. H. 2007. Reliability model and assessment of RAID incorporating latent defects and non-homogeneous Poisson process events. Tech. rep. Mechanical Engineering Department, University of Maryland.Google Scholar
- Elerath, J. G. and Pecht, M. 2007. Enhanced reliability modeling of RAID storage systems. Proceedings of the 37th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07), 175--184. Google Scholar
Digital Library
- Feng, G.-L., Deng, R. H., Bao, F., and Shen, J.-C. 2005a. New efficient MDS array codes for RAID. Part I: Reed-Solomon-like codes for tolerating three disk failures. IEEE Trans. Comput. 54, 9, 1071--1080. Google Scholar
Digital Library
- Feng, G.-L., Deng, R. H., Bao, F., and Shen J.-C. 2005b. New efficient MDS array codes for RAID. Part II: Rabin-like codes for tolerating multiple (greater than or equal to 4) disk failures. IEEE Trans. Comput. 54, 12, 1473--1483. Google Scholar
Digital Library
- Ferrari, D. 1984. On the foundations of artificial workload design. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems. Google Scholar
Digital Library
- Fleiner, C., Garner, R. B., Hafner, J. L., Rao, K. K., Hosekote, D. R. K., Wilcke, W., and Golder, J. S. 2006. Reliability of modular mesh-connected intelligent storage brick systems. IBM J. Res. and Devel. 50, 2/3, 199-208. Google Scholar
Digital Library
- Franaszek, P. A., Robinson, J. T., and Thomasian, A. 1996. RAID level 5 with free blocks/parity cache. US Patent 5,522,032.Google Scholar
- Franaszek, P. A. and Robinson, J. T. 1997. On variable scope of parity protection in disk arrays. IEEE Trans. Comput. 46, 2, 234--240. Google Scholar
Digital Library
- Freitas, R. F. and Wilcke, W. W. 2008. The next storage system technology. IBM J. Res. Devel. 52, 4--5, 439--448. Google Scholar
Digital Library
- Friedman, M B. 1995. The performance and tuning of a StorageTek Iceberg RAID6 disk subsystem. Trans. Comput. Measure. Group. 77--88.Google Scholar
- Fu, G., Thomasian, A., Han, C., and Ng, S. W. 2004. Rebuild strategies for redundant disk arrays. In Proceedings of the 12th NASA Goddard, 21st IEEE Conference on Mass Storage and Technologies (MSST'04).Google Scholar
- Fu, G., Thomasian, A., Han, C., and Ng, S. W. 2004b. Rebuild strategies for clustered RAID. In Proceedings of the International Symposium on Performance Evaluation Computer and Telecommunication Systems (SPECTS'04). 598--607.Google Scholar
- Fuja, T., Heegard, C., and Blaum, M. 1989. Cross parity check convolutional codes. IEEE Trans. Inform. Theory 35, 6, 1264--1276.Google Scholar
Digital Library
- Fujita, H. 2006. Modified low-density MDS array codes. In Proceedings of the International Symposium on Information Theory (ISIT'06). 2789--2793.Google Scholar
Cross Ref
- Ganger. G. 1995. Generating synthetic workloads. In Proceedings of the 21st Computer Measurement Group, 1263--1269.Google Scholar
- Ganger, G. R. and Patt, Y. N. 1998. Using system-level models to evaluate I/O subsystem designs. IEEE Trans. Comput. 47, 6, 667--678. Google Scholar
Digital Library
- Gibson, G. A. 1992. Redundant Disk Arrays: Reliable, Parallel Secondary Storage. MIT Press, Cambridge, MA. Google Scholar
Digital Library
- Gibson, G. A. and Patterson, D. A. 1993. Designing disk arrays for high reliability. J. Parall. Distrib. Comput. 17, 1--2, 4--27. Google Scholar
Digital Library
- Golding, R., Shriver, E., Sullivan, T., and Wilkes, J. 1995. Attribute-managed storage. In Proceedings of the Workshop on Modeling and Specification of I/O.Google Scholar
- Gomez, M. E. and Santonja, V. 2000. A new approach in the modeling and generation of synthetic disk workload. In Proceedings of the 8th Annual Meeting of the IEEE Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'00). Google Scholar
Digital Library
- Goodman, R. and Sayano, M. 1990. Size limits on phased burst error correcting array codes. Electron. Lett. 26, 55--56.Google Scholar
Cross Ref
- Goodman, R., McEliece, R. J., and Sayano, M. 1993. Phased burst error correcting codes. IEEE Trans. Inform. Theory 39, 2, 684--693.Google Scholar
Digital Library
- Gray, J, Horst, B., and Walker, M. 1990. Parity striping of disk arrays: Low-cost reliable storage with acceptable throughput. In Proceedings of the 16th International Conference on Very Large Data Bases. 148--159. Google Scholar
Digital Library
- Gray, J. and Shenoy, P. J. 2000. Rules of thumb in data engineering. In Proceedings of the 16th Annual IEEE International Conference on Data Engineering (ICDE'00). 3--12. Google Scholar
Digital Library
- Gray, J. 2002. Storage bricks have arrived (Keynote Speech), First USENIX Conference on File and Storage Technologies (FAST'02), 56--65. Google Scholar
Digital Library
- Gribble, S. D., Manku, G. S., Roselli, D. S., Brewer, E. A., Gibson, T. J., and Miller, E. L. 1998. Self-similarity in file systems. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems. 141--150. Google Scholar
Digital Library
- Griffin, J. L., Schlosser, S. W., Ganger, G. R., and Nagle, D. 2000. Modeling and performance of MEMS-based storage devices. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. Google Scholar
Digital Library
- Hafner, J. L. 2005. WEAVER codes: Highly fault tolerant erasure codes for storage systems. In Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST'05). Google Scholar
Digital Library
- Hafner, J. L., Deenadhayalan, V. W., Rao, K. K., and Tomlin, J. A. 2005. Matrix methods for lost data reconstruction in erasure codes. In Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST'05). Google Scholar
Digital Library
- Hafner, J. L. 2006. HoVer erasure codes for disk arrays. In Proceedings of the International Conference on Dependable Systems and Networks (DSN'06). 217--226. Google Scholar
Digital Library
- Hafner, J. L., Deenadhayalan, V., Belluomini, W. and K. Rao. 2008. Undetected disk errors in RAID arrays. IBM J. Res. Develop. 52, 4/5. Google Scholar
Digital Library
- Hall, M. 1986. Combinatorial Theory, Second Edition, Wiley-Interscience, New York, NY.Google Scholar
- Hellerstein, L., Gibson, G. A., Karp, R. M., and Katz, R. H. 1994. Coding techniques for handling failures in large disk arrays. Algorithmica 12, 2/3, 182--208.Google Scholar
- Hennessy, J. L. and Patterson, D. A. 2006. Computer Architecture: A Quantitative Approach: 4th Ed. Morgan-Kaufman Publishers, San Mateo, CA. Google Scholar
Digital Library
- Hill, E. A. 1994. System for managing data storage based on vector-summed size-frequency vectors for data sets, devices, and residual storage on devices, U.S. Patent 5345584.Google Scholar
- Hitz, D., Lau, J., and Malcolm, M. 1994. File system design for an NFS file server appliance. In Proceedings of the USENIX Conference, 235--246. Google Scholar
Digital Library
- Holland, M. C., Gibson, G. A. and Siewiorek, D. P. 1994. Architectures and algorithms for on-line failure recovery in redundant disk arrays. J. Distrib. Parall. Datab. 11, 3 295--335. Google Scholar
Digital Library
- Holland, M. C. 1994. On-line data reconstruction in redundant disk arrays. Ph.D. Thesis, Department of Electrical and Computer Engineering, Carnegie-Mellon University, Pittsburgh, PA. Google Scholar
Digital Library
- Hou, R. Y., Menon, J., and Patt, Y. N. 1993. Balancing I/O response time and disk rebuild time in a RAID5 disk array. In Proceedings of the 26th Hawaii International Conference on System Sciences (HICSS 26), Vol. I, 70--79.Google Scholar
- Hsu, W. W. and Smith, A. J. 2003. Characteristics of I/O traffic in personal computer and server workloads. IBM Syst. J. 42, 2, 347--372. Google Scholar
Digital Library
- Hsu, W. W. and Smith, A. J. 2004. The performance impact of I/O optimizations and disk improvements. IBM J. Res. Develop. 48, 2, 255--269. Google Scholar
Digital Library
- Hsu, W. W., Smith, A. J., and Young, H. C. 2005. The automatic improvement of locality in storage systems. ACM Trans. Comput. Syst. 23, 4, 424--473. Google Scholar
Digital Library
- Huang, C. and Xu. L. 2008. STAR: An efficient coding scheme for correcting triple storage node failures. IEEE Trans. Comput. 57, 7, 899--901. Google Scholar
Digital Library
- Iliadis, I., Haas, R. Hu, X.-Y., and Eleftheriou, E. 2008. Disk scrubbing versus intra-disk redundancy for high-reliability RAID storage systems. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems. 241--252. Google Scholar
Digital Library
- Jacob, B., Ng, S. W., and Wang, D. T. 2008. Memory Systems: Cache, DRAM, and Disk. Morgan Kaufmann Publishers. Google Scholar
Digital Library
- Ji, M., Veitch, A. C., Wilkes, J. 2003. Seneca: Remote mirroring done write. In Proceedings of the USENIX Annual Technical Conference. 253--268.Google Scholar
- Kari, H. H. 1997. Latent sector faults and reliability of disk arrays. Doctor of Technology Thesis, University of Technology, Espoo, Finland. http://www.tcs.hut.fi/~hhk/.Google Scholar
- Kelton, W. D., Sadowksi, R. P., and Sturrok, D. E. 2006. Simulation with Arena, 4th Ed., McGraw-Hill, New York, NY. Google Scholar
Digital Library
- Kenyon, C. 1996. Best-fit bin-packing with random order. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA). 359--364. Google Scholar
Digital Library
- Kleinrock, L. 1975. Queueing Systems, Vol. I: Theory. Wiley, New York, NY.Google Scholar
Digital Library
- Kotz, D. F., Roh, S. B., and Radhakrishnan, S. 1999. A detailed simulation model of the HP 97560 disk drive. Dartmouth College, Hanover, NH. http://www.cs.dartmouth.edu/~dfk/diskmodel.Google Scholar
- Lavenberg, S. S. 1983. Computer Performance Modeling Handbook. Academic Press, New York, NY. Google Scholar
Digital Library
- Lazowska, E. D., Zahorjan, J., Graham. G. S., and Sevcik, K. C. 1984. Quantitative Systems Performance: Computer System Analysis Using Queueing Network Models, Prentice-Hall, Upper Saddle River, NJ. http://www.cs.washington.edu/homes/lazowska/qsp/. Google Scholar
Digital Library
- Lee, E. K. and Katz, R. H. 1993. The performance of parity placements in disk arrays. IEEE Trans. Comput. 42, 6, 651--664. Google Scholar
Digital Library
- Lu, C., Alvarez, G. A., and Wilkes, J. 2002. Aqueduct: Online data migration with performance guarantees. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST'02). 219--230. Google Scholar
Digital Library
- Lumb, C. R., Schindler, J., and Ganger, G. R. 2002. Freeblock scheduling outside of disk firmware. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST'02). 275--288. Google Scholar
Digital Library
- Lumb, C. R., Merchant, A., and Alvarez, G. A. 2003. Facade: Virtual storage device with performance guarantees. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST'03). Google Scholar
Digital Library
- MacWilliams, F. J. and Sloane, N. J. A. 1977. The Theory of Error-Correcting Codes. North Holland, Amsterdam, The Netherlands.Google Scholar
- Malhotra, M. and Trivedi, K. S. 1993. Reliability analysis of redundant arrays of inexpensive disks. J. Paral. Distrib. Comput. 17, 1/2, 146--151. Google Scholar
Digital Library
- Mathews, J., Trika, S., Hensgen, D., Coulson, R. and Grimsrud, K. 2008. Intel Turbo Memory: Nonvolatile disk caches in the storage hierarchy of mainstream computer systems. ACM Trans. Storage 4, 2. Google Scholar
Digital Library
- McKusick, M. K., Joy, W. N., Leffler, S. J., and Fabry, R. S. 1984. A fast file system for UNIX. ACM Trans. Comput. Syst. 2, 3, 181--197. Google Scholar
Digital Library
- McNutt, B. 1994. Background data movement in a log-structured disk subsystem. IBM J. Res. Develop. 38, 1, 47--58. Google Scholar
Digital Library
- McNutt, B. 2000. The Fractal Structure of Data Reference: Applications to the Memory Hierarchy. Kluwer Academic Publishers, Norwell, MA. Google Scholar
Digital Library
- Menon, J. and Mattson, D. 1992. Distributed sparing in disk arrays. In Proceedings of the 37th Annual IEEE Computer Society Conference (COMPCON'92). 410--421. Google Scholar
Digital Library
- Menon, J., Roche, J., and Kasson, J. 1993. Floating parity and data disk arrays. J. Parall. Distrib. Comput. 17, 1/2, 129--139. Google Scholar
Digital Library
- Menon, J. and Cortney, J. 1993. The architecture of a fault-tolerant cached RAID controller. In Proceedings of the 20th Annual International Symposium on Computer Architecture (ISCA'93). 76--86. Google Scholar
Digital Library
- Menon, J. 1994. Performance of RAID5 disk arrays with read and write caching. J. Distrib. Parall. Datab. 11, 3, 261--293. Google Scholar
Digital Library
- Menon, J. 1995. A performance comparison of RAID5 and log-structured arrays. In Proceedings of the 4th IEEE International Symposium on High Performance Distributed Computing (HPDC'95). 167--178. Google Scholar
Digital Library
- Menon, J. and Stockmeyer, L. 1998. An age threshold algorithm for garbage collection in log-structured arrays and file systems. IBM Research Report RJ 10120, Almaden Research Center. 119--132.Google Scholar
- Merchant, A. and Yu, P. S. 1996. Analytic modeling of clustered RAID with mapping based on nearly random permutation. IEEE Trans. Comput. 45, 3, 367--373. Google Scholar
Digital Library
- Mogi, K. and Kitsuregawa, M. 1996. Hot mirroring: A study to hide parity upgrade penalty and degradations during rebuilds for RAID5. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 183--194. Google Scholar
Digital Library
- Muntz, R. R. and Lui, J. C. S. 1990. Performance analysis of disk arrays under failure. In Proceedings of the 16th International Conference on Very Large Data Bases (VLDB). 162--173. Google Scholar
Digital Library
- Nelson, R. and Tantawi, A. 1988. Approximate analysis of fork-join synchronization in parallel queues. IEEE Trans. Comput. 37, 6, 736--743. Google Scholar
Digital Library
- Newberg, L. and Wolf, D. 1994. String layouts for a redundant array of inexpensive disks. Algorithmica 12, 2/3, 209--224.Google Scholar
- Ng, S. W. 1994a. Crosshatch disk array for improved reliability and performance. In Proceedings of the 21st Annual International Symposium on Computer Architecture (ISCA'94). 255--264. Google Scholar
Digital Library
- Ng, S. W. 1994b. Sparing for redundant disk arrays. Distrib. Paral. Datab. 2, 2, 133--149. Google Scholar
Digital Library
- Ng, S. W. and Mattson, R. L. 1994. Uniform parity distribution in disk arrays with multiple failures. IEEE Trans. Comput. 43, 4, 501--506. Google Scholar
Digital Library
- Ng, S. W. 1998. Advances in disk technology: Performance issues. IEEE Comput. 40, 1, 75--81. Google Scholar
Digital Library
- Ng, Y. W. and Avizienis, A. 1980. A unified reliability model for fault-tolerant computers. IEEE Trans. Comput. 29, 1, 1002--1011. Google Scholar
Digital Library
- Nicola, V. F., Shahabuddin, P., Heidelberger, P., and Glynn, P. W. 1993. Fast simulation of steady-state availability in non-Markovian highly dependable systems. In Proceedings of the 23rd Annual International Symposium on Fault Tolerant Computing (FTCS-23). 38--47.Google Scholar
- Park, C.-I. 1995. Efficient placement of parity and data to tolerate two disk failures in disk array systems. IEEE Trans. Parall. Distrib. Syst. 6, 11, 1177--1184. Google Scholar
Digital Library
- Patel, A. M. 1985. Adaptive cross parity code for a high density magnetic tape subsystem. IBM J. Resear. Develop. 29, 5, 546--562. Google Scholar
Digital Library
- Patterson, D. A., Gibson, G. A., and Katz, R. 1988. A case for Redundant Arrays of Inexpensive Disks (RAID). In Proceedings of the ACM SIGMOD International Conference on Management of Data. 109--116. Google Scholar
Digital Library
- Pinheiro, E., Weber, W. D., and Barroso, L. A. 2007. Failure trend in a large disk drive population. In Proceedings of the USENIX Conference on File and Storage Technologies (FAST'07). Google Scholar
Digital Library
- Plank, J. S. 1997. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems. Softw. Pract. Exper. 27, 9, 995--1012. Google Scholar
Digital Library
- Plank, J. S. and Ding, Y. 2005. Note: Correction to the 1997 tutorial on Reed-Solomon coding. Softw. Pract. Exper. 35, 2, 178--194. Google Scholar
Digital Library
- Plank, J. S. 2005. Erasure Codes for Storage Applications (Tutorial). In Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST'05).Google Scholar
- Plank, J. S. and Xu, L. 2006. Optimizing Cauchy Reed-Solomon codes for fault-tolerant network storage applications. In Proceedings of the 5th IEEE International Symposium on Network Computing and Applications (NCA06). Google Scholar
Digital Library
- Plank, J. S and Thomason, M. G. 2007. An exploration of non-asymptotic low-density, parity check erasure codes for wide-area storage applications. Paral. Process. Lett. 17, 103--123.Google Scholar
Cross Ref
- Plank, J. S. 2008a. The RAID-6 liberation codes. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST'08). Google Scholar
Digital Library
- Plank, J. S. 2008b. A new minimum density RAID-6 code with a word size of eight. In Proceedings of the 7th IEEE International Symposium on Network Computing Applications (NCA-08). Google Scholar
Digital Library
- Plank, J. S., Luo, J., Schuman, C. D., Xu. L., and Wilcox-O'Hearn, C. 2009. A performance evaluation and examination of open-source erasure coding libraries for storage. In Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST'09). Google Scholar
Digital Library
- Prusinkiewicz, P. and Budkowski, S. 1976. A double-track error-correction code for magnetic tape. IEEE Trans. on Comput. 25, 6, 642--645. Google Scholar
Digital Library
- Ramakrishnan, K. K., Biswas, P., and Karedla, R. 1992. Analysis of file I/O traces in commercial computing environments. In Proceedings of the Joint ACM SIGMETRICS/Performance'92 Conference on Measurement and Modeling of Computer Systems, 78--90. Google Scholar
Digital Library
- Ramakrishnan, R. and Gehrke, J. 2003. Database Management Systems 3rd Ed., McGraw-Hill, New York, NY. Google Scholar
Digital Library
- Rao, K. K., Hafner, J. L., and Golding, R. A. 2006. Reliability for networked storage nodes. In Proceedings of the International Conference on Dependable Systems and Networks (DSN'06). 237--248. Google Scholar
Digital Library
- Rosenblum, M. and Ousterhout, J. K. 1992. The design and implementation of a log-structured file system. ACM Trans. Comput. Syst. 10, 1, 26--52. Google Scholar
Digital Library
- Ruemmler, C. and Wilkes, J. 1994. An introduction to disk drive modeling. IEEE Comput. 27, 3, 217--228. Google Scholar
Digital Library
- Sahner, R. A., Trivedi, K. S., and Puliafito, A. 1996. Performance and Reliability Analysis of Computer Systems. Kluwer Academic Publishers, Norwell, MA. Google Scholar
Digital Library
- Scheuermann, P., Weikum, G., and Zabback, P. 1994. “Disk cooling” in parallel disk systems. Data Engin. Bul. 17, 3, 29--40.Google Scholar
- Schindler, J., Griffin, J. L., Lumb, C. R., and Ganger, G. R. 2002. Track-aligned extents: Matching access patterns to disk drive characteristics. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST'02). 259--274. Google Scholar
Digital Library
- Schlosser, S. W., Papadimanolakis, S., Shao, M., Schindler, J. Ailamaki, A., Faloutsos, C., and Ganger, G. R. 2005. On multidimensional data and modern disks. In Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST'05). Google Scholar
Digital Library
- Schroeder, B. and Gibson, G. A. 2007. Understanding disk failure rates: What does an MTTF of 1,000.000 hours mean to you? ACM Trans. Storage Syst. 3, 3, Article No. 8. Google Scholar
Digital Library
- Schwarz, T. J. E. 1994. Reliability and performance of disk arrays. Ph.D. Thesis, University of California, San Diego, CA. Google Scholar
Digital Library
- Schwarz, T. J. E., Steinberg, J., and Burkhard, W. A. 1999. Permutation development data layout (PDDL) disk array declustering. In Proceedings of the 5th IEEE Symposium on High Performance Computer Architecture (HPCA). 214--217. Google Scholar
Digital Library
- Schwarz, T. J. E., Xin, Q., Miller, E. L., Long, D. D. E., Hospodor, A., and Ng, S. W. 2004. Disk scrubbing in large archival storage systems. In Proceedings of the 13th IEEE Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'04). 409--418. Google Scholar
Digital Library
- Seltzer, M. I., Bostic, K., McKusick, M. K., and Staelin, C. 1993. An implementation of a log-structured file system for UNIX. In Proceedings of the USENIX Winter Technical Conference. 307--326. Google Scholar
Digital Library
- Shenoy, P. J. and Vin, H. M. 2002. A disk scheduling framework for next generation operating systems. Real-Time Syst. 22, 1--2, 9--48. Google Scholar
Digital Library
- Smartmontools. 2008. Self-Monitoring Analysis and Reporting Technology (SMART) disk drive monitoring tools. http://sourceforge.net/projects/smartmontools/.Google Scholar
- Smith, A. J. 1985. Disk cache: Miss ratio analysis and design considerations. ACM Trans. Comput. Syst. 3, 3, 161--203. Google Scholar
Digital Library
- Shriver, E., Merchant, A., and Wilkes, J. 1998. An analytic behavior model for disk drives with readahead caches and request reordering. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems. 181--191. Google Scholar
Digital Library
- Stockmeyer, L. 2001. Simulations of the age-threshold and fitness free space collection algorithms on a long trace. IBM Res. Rep. RJ 10222, Almaden Research Center, CA.Google Scholar
- Stodolsky, D., Holland, M., Courtright II, W. C., and Gibson. G. A. 1994. Parity logging disk arrays. ACM Trans. Comput. Syst. (TOCS) 12, 3, 206--235. Google Scholar
Digital Library
- Takagi, H. 1991. Queueing Analysis: Foundations of Performance Evaluation, Vol. 1: Vacation and Priority Systems, Part 1. North-Holland, Amsterdam, The Netherlands.Google Scholar
- Tay, Y. C. and Zou, M. 2006. A page fault equation for modeling the effect of memory size. Perform. Eval. 63, 2, 99--130. Google Scholar
Digital Library
- Thereska, E. and Ganger, G. E. 2008. IRONModel: Robust performance modes in the wild. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems. June, 253--264. Google Scholar
Digital Library
- Thomasian, A. and Menon, J. 1994. Performance analysis of RAID5 disk arrays with a vacationing server model for rebuild mode operation. In Proceedings of the 10th IEEE International Conference on Data Engineering (ICDE'94). 111--119. Google Scholar
Digital Library
- Thomasian, A. 1995. Rebuild options in RAID5 disk arrays. In Proceedings of the 7th IEEE Symposium on Parallel and Distributed Systems. 511--518. Google Scholar
Digital Library
- Thomasian, A. and Menon, J. 1997. RAID5 performance with distributed sparing. IEEE Trans. Parall. Distrib. Syst. 8, 6, 640--657. Google Scholar
Digital Library
- Thomasian. A. and Liu, C. 2002. Some new disk scheduling policies and their performance. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems. 266--267. Google Scholar
Digital Library
- Thomasian, A. and Liu, C. 2004. Performance evaluation for variations of the SATF scheduling policy. In Proceedings of the International Symposium on Performance Evaluation Computer and Telecommunication Systems (SPECTS'04). 431--437.Google Scholar
- Thomasian, A., Han, C., Fu, G. and Liu, C. 2004. A performance tool for RAID disk arrays. In Proceedings of the Conference on Quantitative Evaluation of Systems (QEST'04). 8--17. Google Scholar
Digital Library
- Thomasian, A. 2005a. Read-modify-writes versus reconstruct writes in RAID. Inform. Process. Lett. 93, 4, 163--168. Google Scholar
Digital Library
- Thomasian, A. 2005b. Access costs in clustered RAID disk arrays. Comput. J. 48, 6, 702--713. Google Scholar
Digital Library
- Thomasian, A., Branzoi, B. A., and Han, C. 2005. Performance evaluation of a heterogeneous disk array architecture. In Proceedings of the 13th IEEE/ACM Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'05). 517--520. Google Scholar
Digital Library
- Thomasian, A. and Liu, C. 2005. Comment on “Issues and challenges in the performance analysis of real disk arrays.” IEEE Trans. Parall. Distrib. Syst. 16, 11, 1103--1104. Google Scholar
Digital Library
- Thomasian, A. 2006a. Multi-level RAID for very large disk arrays—VLDAs. ACM Perform. Eval. Rev. 33, 4, 17--22. Google Scholar
Digital Library
- Thomasian, A. 2006b. Comment on “RAID performance with distributed sparing”. IEEE Trans. Parall. Distrib. Syst. 17, 4, 399--400. Google Scholar
Digital Library
- Thomasian, A. 2006c. Shortcut method for reliability comparisons in RAID. J. Syst. Soft. 79, 11, 1599--1605.Google Scholar
Cross Ref
- Thomasian. A. and Blaum, M. 2006. Mirrored disk reliability and performance. IEEE Trans. Comput. 55, 12, 1640--1644. Google Scholar
Digital Library
- Thomasian, A., Fu, G., and Ng, S. W. 2007a. Analysis of rebuild processing in RAID5 disk arrays. Comput. J. 50, 2, 1--15. Google Scholar
Digital Library
- Thomasian, A., Han, C., and Fu, G. 2007b. Performance evaluation of two-disk failure tolerant arrays. IEEE Trans. Comput. 56, 6, 799--814. Google Scholar
Digital Library
- Thomasian, A. and Xu, J. 2008. Reliability and performance of mirrored disk organizations. Comput. J. 51, 6, 615--629. Google Scholar
Digital Library
- Tian, L., Feng, D., Jiang, H., Zhou, K., Zeng, L., Chen, J., Wang, Z., and Song, Z. 2007. PRO: A popularity-based multi-threaded reconstruction optimization for RAID-structured storage systems. In Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST'07). Google Scholar
Digital Library
- Traeger, A., Zadok, E., Joukov, N., and Wright, C. P. 2008. A nine year study of file system and storage benchmarking. ACM Trans. Storage 4, 2, Article No. 5. Google Scholar
Digital Library
- Treiber, K. and Menon, J. 1995. Simulation study of cached RAID5 designs. In Proceedings of the 1st IEEE Symposium on High Performance Computer Architecture (HPCA). 186--197. Google Scholar
Digital Library
- Trivedi, K. S. 2002. Probability and Statistics with Reliability, Queueing, and Computer Science Applications 2nd Ed. Wiley, New York, NY. Google Scholar
Digital Library
- Uysal. M., Alaverez, G., and Merchant, A. 2001. Analytical throughput model for modern disk arrays. In Proceedings of the 9th IEEE Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'01). 183--192. Google Scholar
Digital Library
- Uysal, M., Merchant, A., and Alvarez, G. 2003. Using MEMS-based storage in disk arrays. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST'03). Google Scholar
Digital Library
- Varki, E., Merchant, A., Xu, J., and Qiu, X. 2004. Issues and challenges in the performance analysis of real disk arrays. IEEE Trans. Parall. Distrib. Syst. 15, 4, 559--574. Google Scholar
Digital Library
- Varma, A and Jacobson, Q. 1998. Destage algorithms for disk arrays with non-volatile storage. IEEE Trans. Comput. 47, 2, 228--235. Google Scholar
Digital Library
- Willinger, W., Taqqu, M. S. Sherman, R., and Wilson, D. V. 1997. Self-similarity through high variability: Statistical analysis of Ethernet LAN traffic at the source level. IEEE/ACM Trans. Netw. 5, 1, 71--86. Google Scholar
Digital Library
- Wilkes, J., Golding, R. A., Staelin, C., and Sullivan, T. 1996. The HP AutoRAID hierarchical storage system. ACM Trans. Comput. Syst. 14, 1, 108--136. Google Scholar
Digital Library
- Wilkes, J. 1996. The Pantheon storage-system simulator. Tech rep. HPL-SSP-95-14, HP Labs, Palo Alto, CA.Google Scholar
- Wolf, J. L. 1989. The placement optimization program: A practical solution to the disk assignment problem. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems. 1--10. Google Scholar
Digital Library
- Wong, T. M., Golding, R. A., Lin. C., and Becker-Szendy, R. A. 2006. Zygaria: Storage performance as a managed resource. In Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium on (RTAS'06). 125--134. Google Scholar
Digital Library
- Worthington, B. L., Ganger, G. R., and Patt, Y. L. 1994. Scheduling for modern disk drives and non-random workloads. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems. 241--251. Google Scholar
Digital Library
- Wu, S., Jiang, H., Feng, D., Tian, L. and Mao, B. 2009. Workout: I/O workload outsourcing for boosting RAID reconstruction performance. In Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST'09). Google Scholar
Digital Library
- Xin, Q., Schwarz, T. J. E., and Miller, E. L. 2005. Disk infant mortality in large storage systems. In Proceedings of the 13th Annual Meeting of the IEEE Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS'05). 125--134. Google Scholar
Digital Library
- Xu, L. and Bruck, J. 1999. X-Code: MDS array codes with optimal encoding. IEEE Trans. Inform. Theory 45, 1, 272--276. Google Scholar
Digital Library
- Xu, L. Bohossian, V., Bruck, J. and Wagner, D. G. 1999. Low-density MDS codes and factors of complete graphs. IEEE Trans. Inform. Theory 45, 6, 1817--1826. Google Scholar
Digital Library
- Zabback, P., Riegel, J., and Menon, J. 1996. The RAID configuration tool. In Proceedings of the 3rd International Conference on High Performance Computing (HiPC'96). 55--61. Google Scholar
Digital Library
- Zaitsev, G. V., Zinov'ev, V. A. and Semakov, N. V. 1983. Minimum-check-density codes for correcting bytes of errors, erasures, or defects. Probl. Inform. Trans. 19, 197--204.Google Scholar