Abstract
After request completion, an I/O device must decide whether to minimize latency by immediately firing an interrupt or to optimize for throughput by delaying the interrupt, anticipating that more requests will complete soon and help amortize the interrupt cost. Devices employ adaptive interrupt coalescing heuristics that try to balance between these opposing goals. Unfortunately, because devices lack the semantic information about which I/O requests are latency-sensitive, these heuristics can sometimes lead to disastrous results.
Instead, we propose addressing the root cause of the heuristics problem by allowing software to explicitly tell the device whether submitted requests are latency-sensitive. The device then “calibrates” its interrupts to completions of latency-sensitive requests. We focus on NVMe storage devices and show that it is natural to express these semantics in the kernel and the application, and that doing so requires only a modest two-bit change to the device interface. Calibrated interrupts increase throughput by up to 35%, reduce CPU consumption by as much as 30%, and achieve up to 37% lower latency when interrupts are coalesced.
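The idea of calibrating interrupts to latency-sensitive completions can be illustrated with a toy model. The sketch below is not the paper's implementation: the `Request` and `CalibratedDevice` classes and the single `latency_sensitive` flag are hypothetical names introduced here, and the model uses one flag rather than the two interface bits the abstract mentions, whose exact semantics the abstract does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    req_id: int
    # Hypothetical per-request hint set by software at submission time.
    latency_sensitive: bool = False


@dataclass
class CalibratedDevice:
    """Toy device: an interrupt fires only when a flagged request completes."""
    pending: list = field(default_factory=list)     # completions not yet signalled
    interrupts: list = field(default_factory=list)  # request ids delivered per interrupt

    def complete(self, req: Request) -> None:
        # Every completion is queued; only a flagged one triggers an interrupt.
        self.pending.append(req.req_id)
        if req.latency_sensitive:
            self.fire()

    def fire(self) -> None:
        # One interrupt delivers all completions accumulated so far.
        if self.pending:
            self.interrupts.append(self.pending)
            self.pending = []


def run_demo():
    dev = CalibratedDevice()
    for i in range(4):
        dev.complete(Request(i))                       # throughput-oriented requests
    dev.complete(Request(4, latency_sensitive=True))   # flagged request
    dev.complete(Request(5))                           # completes after the interrupt
    return dev.interrupts, dev.pending
```

In this model, the four unflagged completions ride along with the interrupt raised for the flagged request, so five completions cost one interrupt. Request 5 stays pending, which suggests that a practical design would also need some fallback (for example a timeout) so that unflagged completions are eventually signalled; that fallback is an assumption of this note and is not modelled here.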
Optimizing Storage Performance with Calibrated Interrupts