Abstract
Recent ultra-large SSDs (e.g., 32-TB SSDs) offer many benefits for building cost-efficient enterprise storage systems. Owing to their large capacity, however, when such an SSD fails in a RAID storage system, RAID reconstruction must copy a huge amount of data among the SSDs, so a long rebuild is inevitable. Motivated by modern SSD failure characteristics, we propose a new recovery scheme, called
Index Terms
Reparo: A Fast RAID Recovery Scheme for Ultra-large SSDs