Abstract
We introduce ZNSwap, a novel swap subsystem optimized for the recent Zoned Namespace (ZNS) SSDs. ZNSwap leverages ZNS’s explicit control over data management on the drive and introduces a space-efficient host-side Garbage Collector (GC) for swap storage, co-designed with the OS swap logic. ZNSwap enables cross-layer optimizations: the GC directly accesses the in-kernel swap usage statistics to manage swap storage at fine granularity, and GC bandwidth usage is correctly accounted for in the OS resource isolation mechanisms, which improves performance isolation in multi-tenant environments. We evaluate ZNSwap using standard Linux swap benchmarks and two production key-value stores. ZNSwap shows significant performance benefits over the Linux swap on traditional SSDs, such as stable throughput for different memory access patterns, and 10× lower 99th percentile latency and 5× higher throughput for
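The two cross-layer ideas named in the abstract (a host-side GC that sees which swap slots are still in use, and GC traffic charged to the tenant that owns the data) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not ZNSwap's implementation: the abstract does not specify a victim-selection policy, so a simple greedy one (fewest live slots) is assumed here, and every name (`Zone`, `pick_victim`, `collect`, `PAGE_SIZE`, the tenant labels) is hypothetical.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # bytes per swap slot (assumed value for illustration)


@dataclass
class Zone:
    """Toy model of one zone: written sequentially, reclaimed only by a full reset."""
    zone_id: int
    capacity: int                              # number of swap slots in the zone
    live: dict = field(default_factory=dict)   # slot index -> owning tenant
    write_ptr: int = 0                         # next slot to append to

    def is_full(self):
        return self.write_ptr >= self.capacity

    def append(self, owner):
        if self.is_full():
            raise ValueError("zone is full")
        slot = self.write_ptr
        self.live[slot] = owner
        self.write_ptr += 1
        return slot


def pick_victim(zones):
    """Assumed greedy policy: the full zone with the fewest live slots."""
    full = [z for z in zones if z.is_full()]
    return min(full, key=lambda z: len(z.live)) if full else None


def collect(victim, target, gc_bytes):
    """Copy live slots to `target`, charge each copy to its owner, reset `victim`."""
    if target.capacity - target.write_ptr < len(victim.live):
        raise ValueError("target zone lacks space for the live data")
    for slot in sorted(victim.live):
        owner = victim.live[slot]
        target.append(owner)
        gc_bytes[owner] = gc_bytes.get(owner, 0) + PAGE_SIZE
    victim.live.clear()
    victim.write_ptr = 0  # models a zone reset
    return gc_bytes


# Usage: two full zones and one empty zone; tenants "a" and "b" are hypothetical.
zones = [Zone(0, 4), Zone(1, 4), Zone(2, 4)]
for owner in ["a", "a", "b", "b"]:
    zones[0].append(owner)
for owner in ["a", "b", "b", "b"]:
    zones[1].append(owner)

# Pages that were swapped back in no longer occupy their slots.
for slot in (0, 1, 2):
    del zones[0].live[slot]
del zones[1].live[0]

victim = pick_victim(zones)
charges = collect(victim, zones[2], {})
```

In this model the GC copies only slots that the swap bookkeeping still marks as live, and the copy cost is attributed to the owning tenant rather than to whichever workload happened to trigger reclamation.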
ZNSwap: un-Block your Swap