research-article

Automatic Stream Identification to Improve Flash Endurance in Data Centers

Published: 28 April 2022

Abstract

The demand for high-performance I/O in Storage-as-a-Service (SaaS) keeps growing. To meet it, data centers commonly deploy NAND flash-based solid-state drives (SSDs) as the cache or top tier of the storage rack, owing to their superior performance over traditional hard disk drives (HDDs). Meanwhile, as the capital expenditure on SSDs declines and their storage capacity increases, all-flash data centers are emerging that serve cloud services better than hybrid SSD-HDD data centers. The biggest challenge in this transition is reducing the Write Amplification Factor (WAF) and improving SSD endurance, because flash supports only a limited number of program/erase (P/E) cycles. In particular, storing data with different lifetimes (where data with similar lifetimes form I/O streams with similar temporal access patterns, such as re-access frequency) on a single SSD can raise the WAF, reduce endurance, and degrade SSD performance. Motivated by this, multi-stream SSDs have been developed so that data with different lifetimes can be stored in different SSD regions. The rationale is to reduce internal data movement: when garbage collection is triggered, blocks are then more likely to hold pages that are either all invalid or all valid, so few valid pages must be copied before a block is erased. The limitation of this technology, however, is that the system must manually assign the same streamID to data with similar lifetimes. Unfortunately, when data arrives, it is not known how important it is or how long it will stay unmodified. Moreover, we observe that different definitions of lifetime (i.e., different formulas computed from features the data has previously exhibited, such as sequentiality and frequency) lead to streamID assignments with varying impacts on the final WAF of multi-stream SSDs.
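To make the garbage-collection argument above concrete, here is a minimal toy simulation; it is not the article's framework. It assumes a page-mapped FTL with 64 blocks of 64 pages, greedy victim selection, one open block per stream, a hypothetical `GC_STREAM` that receives relocated pages, and a synthetic workload in which 10% of LBAs receive 90% of updates. WAF is computed as NAND page writes divided by host page writes. The names `ToySsd` and `simulate` are illustrative, and the exact WAF values depend entirely on these assumptions.

```python
import random
from typing import Dict, List, Set

PAGES_PER_BLOCK = 64  # assumed geometry, not taken from the article
GC_STREAM = -1        # hypothetical stream reserved for GC relocations


class ToySsd:
    """Page-mapped toy FTL with one open block per stream and greedy GC."""

    def __init__(self, num_blocks: int) -> None:
        self.valid: List[Set[int]] = [set() for _ in range(num_blocks)]
        self.written: List[int] = [0] * num_blocks  # pages programmed per block
        self.free: List[int] = list(range(num_blocks))  # erased blocks
        self.open: Dict[int, int] = {}   # stream ID -> currently open block
        self.where: Dict[int, int] = {}  # LBA -> block holding its valid copy
        self.host_writes = 0
        self.nand_writes = 0

    def _program(self, lba: int, stream: int) -> None:
        blk = self.open.get(stream)
        if blk is None or self.written[blk] == PAGES_PER_BLOCK:
            blk = self.free.pop()  # open a fresh block for this stream
            self.open[stream] = blk
        old = self.where.get(lba)
        if old is not None:
            self.valid[old].discard(lba)  # previous copy becomes invalid
        self.valid[blk].add(lba)
        self.written[blk] += 1
        self.where[lba] = blk
        self.nand_writes += 1

    def _collect(self) -> None:
        # Greedy GC: erase the closed block with the fewest valid pages.
        # Over-provisioning guarantees such a block has some invalid pages.
        busy = set(self.open.values())
        victim = min(
            (b for b in range(len(self.valid))
             if self.written[b] == PAGES_PER_BLOCK and b not in busy),
            key=lambda b: len(self.valid[b]),
        )
        for lba in list(self.valid[victim]):
            self._program(lba, GC_STREAM)  # each copy is an extra NAND write
        self.written[victim] = 0  # erase
        self.free.append(victim)

    def write(self, lba: int, stream: int) -> None:
        while len(self.free) < 2:  # keep a small reserve of erased blocks
            self._collect()
        self.host_writes += 1
        self._program(lba, stream)

    def waf(self) -> float:
        return self.nand_writes / self.host_writes


def simulate(separate: bool, seed: int = 42, updates: int = 30_000) -> float:
    """WAF of a skewed random-update workload (assumed 90/10 hot/cold)."""
    rng = random.Random(seed)
    logical = 3072       # 75% of the 64 x 64 physical pages
    hot = logical // 10  # first 10% of LBAs are hot

    def stream_of(lba: int) -> int:
        # One stream for everything, or hot -> 0 and cold -> 1.
        return 0 if (not separate or lba < hot) else 1

    ssd = ToySsd(num_blocks=64)
    for lba in range(logical):  # initial sequential fill
        ssd.write(lba, stream_of(lba))
    for _ in range(updates):
        if rng.random() < 0.9:
            lba = rng.randrange(hot)
        else:
            lba = rng.randrange(hot, logical)
        ssd.write(lba, stream_of(lba))
    return ssd.waf()


print(f"single stream WAF:    {simulate(separate=False):.2f}")
print(f"hot/cold streams WAF: {simulate(separate=True):.2f}")
```

Because relocation copies are the only source of extra NAND writes in this model, any WAF reduction from separation comes from GC victims holding fewer valid pages when hot and cold data no longer share blocks.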
Thus, in this article, we first develop a portable and adaptable framework to study the impacts of different workload features and their combinations on write amplification. We then propose a feature-based stream identification approach that automatically correlates measurable workload attributes (such as I/O size and I/O rate) with high-level workload features (such as frequency and sequentiality) and determines the right combination of workload features for assigning streamIDs. Finally, we develop an adaptable stream assignment technique that assigns streamIDs dynamically as workloads change. Our evaluation results show that this automated approach to stream detection and separation effectively reduces the WAF by using appropriate features for stream assignment, with minimal implementation overhead.


    • Published in

      ACM Transactions on Storage, Volume 18, Issue 2
      May 2022
      248 pages
      ISSN: 1553-3077
      EISSN: 1553-3093
      DOI: 10.1145/3522733
      Editor: Sam H. Noh

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 April 2022
      • Online AM: 29 March 2022
      • Accepted: 1 June 2021
      • Revised: 1 March 2021
      • Received: 1 July 2020


      Qualifiers

      • research-article
      • Refereed