Abstract
The demand for high-performance I/O in Storage-as-a-Service (SaaS) is growing steadily. To meet this demand, NAND flash-based solid-state drives (SSDs) are commonly used in data centers as cache tiers or top tiers in the storage rack, owing to their superior performance compared to traditional hard disk drives (HDDs). Meanwhile, as the capital expenditure of SSDs declines and their storage capacity increases, all-flash data centers are evolving to serve cloud services better than SSD-HDD hybrid data centers. During this transition, the biggest challenge is to reduce the Write Amplification Factor (WAF) and to improve the endurance of SSDs, since these devices have a limited number of program/erase cycles. A specific case is that storing data with different lifetimes in a single SSD (where a lifetime groups I/O streams with similar temporal fetching patterns, such as reaccess frequency) can cause high WAF, reduce endurance, and degrade the performance of the SSD. Motivated by this, multi-stream SSDs have been developed to enable data with different lifetimes to be stored in different SSD regions. The logic behind this is to reduce the internal movement of data: when garbage collection is triggered, there is a high chance of finding blocks whose pages are either all invalid or all valid. However, the limitation of this technology is that the system needs to manually assign the same streamID to data with a similar lifetime. Unfortunately, when data arrives, it is not known how important the data is or how long it will stay unmodified. Moreover, according to our observation, different definitions of lifetime (i.e., different calculation formulas based on selected features previously exhibited by the data, such as sequentiality and frequency) lead to streamID assignments with varying impacts on the final WAF of multi-stream SSDs.
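The effect of lifetime separation on garbage collection can be illustrated with a toy calculation. The sketch below is not taken from the article: it assumes a simplified greedy garbage collector, and it assumes the common definition of WAF as total pages written to flash divided by pages written by the host. The function names `gc_relocations` and `waf` are hypothetical.

```python
def gc_relocations(blocks, pages_needed):
    """Toy greedy garbage collector (an illustrative assumption, not a real FTL).

    blocks: list of blocks, each a list of bools (True = page still valid).
    Erases the blocks with the fewest valid pages first until at least
    pages_needed pages are reclaimed. Returns the number of valid pages
    that had to be copied elsewhere before erasing.
    """
    relocated = 0
    freed = 0
    for block in sorted(blocks, key=sum):
        if freed >= pages_needed:
            break
        valid = sum(block)
        relocated += valid
        # Net gain: the erased block minus the pages consumed by the copies.
        freed += len(block) - valid
    return relocated


def waf(host_pages, relocated_pages):
    """WAF = (host writes + internal relocation writes) / host writes."""
    return (host_pages + relocated_pages) / host_pages


# Short-lived (now invalid) and long-lived (still valid) pages share blocks.
mixed = [[True, False, True, False], [False, True, False, True]]
# The same eight pages, separated by lifetime into different blocks.
separated = [[False] * 4, [True] * 4]

print(waf(8, gc_relocations(mixed, pages_needed=4)))      # 1.5
print(waf(8, gc_relocations(separated, pages_needed=4)))  # 1.0
```

In the mixed layout, reclaiming four pages requires erasing both blocks and copying four valid pages, whereas the separated layout reclaims the same space from a single fully invalid block with no copying.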
Thus, in this article, we first develop a portable and adaptable framework to study the impacts of different workload features and their combinations on write amplification. We then propose a feature-based stream identification approach, which automatically correlates the measurable workload attributes (such as I/O size, I/O rate, and so on) with high-level workload features (such as frequency, sequentiality, and so on) and determines the right combination of workload features for assigning streamIDs. Finally, we develop an adaptable stream assignment technique to dynamically assign streamIDs for changing workloads. Our evaluation results show that our automated approach to stream detection and separation can effectively reduce the WAF by using appropriate features for stream assignment, with minimal implementation overhead.
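To make the idea of feature-based stream assignment concrete, the following is a minimal sketch under stated assumptions, not the method proposed in the article. It assumes a write trace of (LBA, length) pairs, a fixed chunk size, a per-chunk write count as the frequency feature, and the fraction of writes that start where the previous write ended as the sequentiality feature. The constants `CHUNK_SECTORS` and `NUM_STREAMS`, the weights, and both function names are hypothetical.

```python
from collections import defaultdict

CHUNK_SECTORS = 8  # hypothetical chunk granularity, in sectors
NUM_STREAMS = 4    # hypothetical number of streams offered by the device


def chunk_features(trace):
    """Compute (write count, sequential ratio) per chunk.

    trace: list of (lba, length) writes in sectors, in arrival order.
    A write counts as sequential if it starts where the previous one ended.
    """
    writes = defaultdict(int)
    sequential = defaultdict(int)
    prev_end = None
    for lba, length in trace:
        chunk = lba // CHUNK_SECTORS
        writes[chunk] += 1
        if prev_end is not None and lba == prev_end:
            sequential[chunk] += 1
        prev_end = lba + length
    return {c: (writes[c], sequential[c] / writes[c]) for c in writes}


def assign_streams(features, w_freq=1.0, w_seq=0.0):
    """Map each chunk to a stream ID from a weighted feature score.

    Assumes w_freq + w_seq > 0. The score lies in [0, 1] and is quantized
    into NUM_STREAMS equal-width buckets.
    """
    max_freq = max(freq for freq, _ in features.values())
    streams = {}
    for chunk, (freq, seq_ratio) in features.items():
        score = (w_freq * freq / max_freq + w_seq * seq_ratio) / (w_freq + w_seq)
        streams[chunk] = min(int(score * NUM_STREAMS), NUM_STREAMS - 1)
    return streams


trace = [(0, 8), (8, 8), (16, 8), (0, 8), (0, 8), (0, 8)]
features = chunk_features(trace)
print(assign_streams(features))                          # frequency only
print(assign_streams(features, w_freq=0.5, w_seq=0.5))   # frequency and sequentiality
```

On this small trace, weighting frequency alone places the frequently rewritten chunk in a different stream from the two chunks written once, while weighting both features equally places all three chunks in the same stream. This mirrors the observation that the choice of features changes the resulting stream assignment.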
Automatic Stream Identification to Improve Flash Endurance in Data Centers