
DirectNVM: Hardware-accelerated NVMe SSDs for High-performance Embedded Computing

Published: 10 February 2022

Abstract

With data-intensive artificial intelligence (AI) and machine learning (ML) applications surging rapidly, modern high-performance embedded systems, with their heterogeneous computing resources, critically demand low-latency, high-bandwidth data communication. The newly emerging NVMe (Non-Volatile Memory Express) protocol, with parallel queuing, access prioritization, and optimized I/O arbitration, is therefore being widely adopted as a de facto fast I/O communication interface. However, effectively exploiting the potential of modern NVMe storage proves nontrivial and demands fine-grained control, high processing concurrency, and application-specific optimization. Fortunately, modern FPGA devices, capable of efficient parallel processing and application-specific programmability, readily meet the underlying physical-layer requirements of the NVMe protocol and therefore provide unprecedented opportunities to implement rich-featured NVMe middleware that benefits modern high-performance embedded computing.

In this article, we rethink the existing access mechanisms of NVMe storage and devise innovative hardware-assisted solutions that accelerate NVMe data access for high-performance embedded computing systems. Our key idea is to exploit the massively parallel I/O queuing capability of NVMe storage by leveraging FPGAs’ reconfigurability and native hardware computing power, operating transparently to the main processor. Specifically, our DirectNVM system provides effective hardware constructs for high-performance and scalable userspace storage applications by (1) hardening all essential NVMe driver functionality in logic, thereby avoiding expensive OS syscalls and enabling zero-copy data access from applications, (2) relying on hardware rather than OS-level interrupts for I/O completion control, which significantly reduces both total I/O latency and its variance, and (3) exposing application-specific weighted-round-robin I/O traffic scheduling to userspace.
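As a rough illustration of the weighted-round-robin idea in point (3), and not the paper’s hardware implementation, the following Python sketch arbitrates among I/O queues in proportion to assigned weights; the queue names and weight values are hypothetical.

```python
from collections import deque

def weighted_round_robin(queues, weights):
    """Yield pending commands from queues in proportion to their weights.

    queues:  dict mapping queue name -> deque of pending commands
    weights: dict mapping queue name -> integer credits per arbitration round
    """
    while any(queues.values()):
        for name, q in queues.items():
            # Each arbitration round, a queue may issue up to weights[name] commands.
            for _ in range(weights[name]):
                if not q:
                    break
                yield q.popleft()

# Hypothetical example: a latency-critical frontend queue weighted 3:1
# over a throughput-oriented backend queue.
queues = {
    "frontend": deque(f"f{i}" for i in range(6)),
    "backend": deque(f"b{i}" for i in range(6)),
}
order = list(weighted_round_robin(queues, {"frontend": 3, "backend": 1}))
print(order)
```

With these weights, each round dispatches three frontend commands for every backend command until the frontend queue drains, after which the backend proceeds alone, which is the bandwidth-partitioning behavior the abstract attributes to DirectNVM’s scheduler.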

To validate our design methodology, we developed a complete DirectNVM system on the Xilinx Zynq MPSoC architecture, which incorporates a high-performance application processor (APU) equipped with DDR4 system memory and a hardened, configurable PCIe Gen3 block in its programmable logic. We then measured the storage bandwidth and I/O latency of both our DirectNVM system and a conventional OS-based system running the standard FIO benchmark suite [2]. Compared against the PetaLinux built-in kernel driver running on the Zynq MPSoC, DirectNVM achieves up to 18.4× higher throughput and up to 4.5× lower latency. To ensure a fair comparison, we also measured DirectNVM against Intel SPDK [26], a highly optimized userspace asynchronous NVMe I/O framework, running on an x86 PC. Our experiments show that DirectNVM, even on an embedded ARM processor considerably less powerful than a full-scale AMD processor, achieves up to 2.2× higher throughput and 1.3× lower latency. Furthermore, with a multi-threaded test case, we demonstrate that DirectNVM’s weighted-round-robin scheduling can significantly optimize bandwidth allocation between latency-constrained frontend applications and other backend applications in real-time systems. Finally, we develop a theoretical performance model based on classic queuing theory that quantitatively relates a system’s I/O performance to its I/O implementation.
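The kind of queuing-theoretic model the abstract alludes to can be illustrated with the classic M/D/1 queue (Poisson arrivals, deterministic service, single server), whose mean waiting time follows from the Pollaczek–Khinchine result; the arrival and service rates below are hypothetical figures for illustration, not measurements from the paper.

```python
def md1_latency(arrival_rate, service_rate):
    """Mean sojourn time (service + wait) of an M/D/1 queue.

    Pollaczek-Khinchine for deterministic service:
        Wq = rho / (2 * mu * (1 - rho)),  with  rho = lambda / mu
    """
    rho = arrival_rate / service_rate
    assert rho < 1, "queue must be stable (utilization < 1)"
    wait = rho / (2 * service_rate * (1 - rho))
    return 1.0 / service_rate + wait

# Hypothetical numbers: a drive completing 500K IOPS (mu) under a 400K IOPS load.
mu, lam = 500_000.0, 400_000.0
print(f"mean I/O latency: {md1_latency(lam, mu) * 1e6:.2f} us")  # prints 6.00 us
```

The model makes the abstract’s point concrete: at 80% utilization the queuing delay (4 µs here) already dominates the raw 2 µs service time, so reducing per-I/O software overhead, as DirectNVM does, lowers both terms.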

REFERENCES

  [1] Jens Axboe. 2006. Linux Kernel 2.6.18 - Make CFQ the default IO scheduler. Retrieved from https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b17fd9bceb99610f6dc7998c9a4ed6b71520be2b.
  [2] Jens Axboe. 2020. Flexible I/O. Retrieved from https://github.com/axboe/fio.
  [3] Matias Bjørling, Jens Axboe, David Nellans, and Philippe Bonnet. 2013. Linux block IO: Introducing multi-queue SSD access on multi-core systems. In Proceedings of the 6th International Systems and Storage Conference. 1–10.
  [4] Opsero Electronic Design. 2020. FPGA Drive FMC. Retrieved from https://opsero.com/product/fpga-drive-fmc-dual/.
  [5] NVM Express. 2017. NVMe Specification 1.3. Retrieved from https://nvmexpress.org/wp-content/uploads/NVM_Express_Revision_1.3.pdf.
  [6] OpenPOWER Accelerator Work Group. 2020. CAPI Storage, Network, and Analytics Programming (SNAP) Framework. Retrieved from https://developer.ibm.com/linuxonpower/capi/snap.
  [7] Shashank Gugnani, Xiaoyi Lu, and Dhabaleswar K. Panda. 2019. Analyzing, modeling, and provisioning QoS for NVMe SSDs. In Proceedings of the 11th IEEE/ACM International Conference on Utility and Cloud Computing. 247–256. DOI: https://doi.org/10.1109/UCC.2018.00033
  [8] Jeremy Hsu. 2018. It’s Time to Think Beyond Cloud Computing. Wired. Retrieved from https://www.wired.com/story/its-time-to-think-beyond-cloud-computing/.
  [9] Intel. 2015. Performance Benchmarking for PCIe and NVMe Enterprise Solid-State Drives. Retrieved from https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-pcie-nvme-enterprise-ssds-white-paper.pdf.
  [10] Intel. 2020. Intel Optane Persistent Memory. Retrieved from https://www.intel.com/content/www/us/en/architecture-and-technology/optane-dc-persistent-memory.html.
  [11] Intel. 2020. Open Programmable Accelerator Engine. Retrieved from https://opae.github.io/latest/index.html.
  [12] Yangwook Kang, Yang-suk Kee, Ethan L. Miller, and Chanik Park. 2013. Enabling cost-effective data processing with smart SSD. In Proceedings of the IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST). IEEE, 1–12.
  [13] Hyeong-Jun Kim, Young-Sik Lee, and Jin-Soo Kim. 2016. NVMeDirect: A user-space I/O framework for application-specific optimization on NVMe SSDs. In Proceedings of the 8th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage’16).
  [14] László Lakatos, László Szeidl, and Miklós Telek. 2013. Introduction to Queueing Systems with Telecommunication Applications. Springer. DOI: https://doi.org/10.1007/978-1-4614-5317-8
  [15] Damien Le Moal. 2017. I/O latency optimization with polling. In Proceedings of the Vault Linux Storage and Filesystems Conference.
  [16] Till Miemietz, Hannes Weisbach, Michael Roitzsch, and Hermann Härtig. 2019. K2: Work-constraining scheduling of NVMe-attached storage. In Proceedings of the IEEE Real-Time Systems Symposium (RTSS). IEEE, 56–68.
  [17] Arslan Munir, Sanjay Ranka, and Ann Gordon-Ross. 2012. High-performance energy-efficient multicore embedded computing. IEEE Trans. Parallel Distrib. Syst. 23 (May 2012), 684–700. DOI: https://doi.org/10.1109/TPDS.2011.214
  [18] Zhenyuan Ruan, Tong He, and Jason Cong. 2019. INSIDER: Designing in-storage computing system for emerging high-performance drive. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC’19). 379–394.
  [19] Samsung. 2020. Samsung 970 EVO Plus Specification. Retrieved from https://www.samsung.com/semiconductor/minisite/ssd/product/consumer/970evoplus/.
  [20] Dong Won Seo. 2014. Explicit formulae for characteristics of finite-capacity M/D/1 queues. ETRI J. 36, 4 (2014), 609–616. DOI: https://doi.org/10.4218/etrij.14.0113.0812
  [21] Athanasios Stratikopoulos. 2019. Low Overhead & Energy Efficient Storage Path for Next Generation Computer Systems. Ph.D. Dissertation. University of Manchester.
  [22] Athanasios Stratikopoulos, Christos Kotselidis, John Goodacre, and Mikel Luján. 2018. FastPath: Towards wire-speed NVMe SSDs. In Proceedings of the 28th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 170–177.
  [23] Xilinx. 2019. DMA/Bridge Subsystem for PCI Express v4.1. Retrieved from https://www.xilinx.com/support/documentation/ip_documentation/xdma/v4_1/pg195-pcie-dma.pdf.
  [24] Xilinx. 2019. PetaLinux Tools Documentation. Retrieved from https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug1144-petalinux-tools-reference-guide.pdf.
  [25] Xilinx. 2019. UltraScale+ Devices Integrated Block for PCI Express v1.3. Retrieved from https://www.xilinx.com/support/documentation/ip_documentation/pcie4_uscale_plus/v1_3/pg213-pcie4-ultrascale-plus.pdf.
  [26] Ziye Yang, James R. Harris, Benjamin Walker, Daniel Verkamp, Changpeng Liu, Cunyin Chang, Gang Cao, Jonathan Stern, Vishal Verma, and Paul Luse. 2017. SPDK: A development kit to build high performance storage applications. In Proceedings of the IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, 154–161.
  [27] Yu Zou. 2020. DirectNVM. Retrieved from https://github.com/yu-zou/DirectNVM.


• Published in

  ACM Transactions on Embedded Computing Systems, Volume 21, Issue 1 (January 2022), 288 pages
  ISSN: 1539-9087
  EISSN: 1558-3465
  DOI: 10.1145/3505211

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

                Publication History

                • Published: 10 February 2022
                • Accepted: 1 April 2021
                • Revised: 1 March 2021
                • Received: 1 December 2020
Published in TECS Volume 21, Issue 1

                Qualifiers

                • research-article
                • Refereed
