Abstract
Despite its widespread use in consumer devices and enterprise storage systems, NAND flash faces a growing number of challenges. While technology advances have increased storage density and reduced costs, they have also led to reduced endurance and larger variations across blocks. These effects cannot be compensated for solely by stronger ECC or read-retry schemes; they have to be addressed holistically.
Our goal is to make low-cost NAND flash usable in enterprise storage and thereby improve cost efficiency. We present novel flash-management approaches that reduce write amplification, achieve better wear leveling, and enhance endurance without sacrificing performance. We introduce block calibration, a technique to determine optimal read-threshold voltage levels that minimize error rates, as well as novel garbage-collection and data-placement schemes that alleviate the effects of block health variability. We show how these techniques complement one another and thereby meet enterprise storage requirements.
By combining the proposed schemes, and without using a stronger ECC scheme, we improve endurance by up to 15× over the baseline endurance of the NAND flash. The flash-management algorithms presented herein were designed and implemented in simulators, hardware test platforms, and eventually in the flash controllers of production enterprise all-flash arrays. Their effectiveness has been validated across thousands of customer deployments since 2015.
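As a rough illustration of the idea behind block calibration, the sketch below sweeps a set of candidate read-threshold offsets and keeps the one that yields the fewest errors. It is not the authors' algorithm: the function names, the offset range, and the toy error model are all assumptions made for this example, and a real controller would obtain error counts from ECC decoding of actual page reads.

```python
from typing import Callable, Iterable


def calibrate_read_threshold(
    count_errors: Callable[[int], int],
    candidate_offsets: Iterable[int],
) -> int:
    """Return the candidate offset with the fewest errors.

    count_errors is a hypothetical callback that reads a block with the
    given threshold offset applied and returns the observed error count.
    Ties are broken in favor of the offset closest to the nominal value (0).
    """
    return min(candidate_offsets, key=lambda off: (count_errors(off), abs(off)))


def toy_error_model(offset: int) -> int:
    # Assumed toy model: a block whose threshold distribution has shifted,
    # so the error count is lowest at an offset of -3 steps.
    return 5 + (offset + 3) ** 2


best = calibrate_read_threshold(toy_error_model, range(-8, 9))
print(best)
```

An exhaustive sweep is used here only for clarity; in practice the number of extra reads matters, so a calibration procedure would likely restrict or guide the search.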
Index Terms
Management of Next-Generation NAND Flash to Achieve Enterprise-Level Endurance and Latency Targets