Abstract
Direct network I/O allows network controllers (NICs) to expose multiple instances of themselves, to be used by untrusted software without a trusted intermediary. Direct I/O thus frees researchers from legacy software, fueling studies that innovate in multitenant setups. Such studies, however, overwhelmingly ignore one serious problem: direct memory accesses (DMAs) of NICs disallow page faults, forcing systems to either pin entire address spaces to physical memory and thereby hinder memory utilization, or resort to APIs that pin/unpin memory buffers before/after they are DMAed, which complicates the programming model and hampers performance.
We solve this problem by designing and implementing page fault support for InfiniBand and Ethernet NICs. A main challenge we tackle, one that is unique to NICs, is handling receive DMAs that trigger page faults, leaving the NIC without memory to store the incoming data. We demonstrate that our solution provides all the benefits associated with "regular" virtual memory, notably (1) a simpler programming model that rids users of the need to pin, and (2) the ability to employ all the canonical memory optimizations, such as memory overcommitment and demand paging based on actual use. We show that, as a result, benchmark performance improves by up to 1.9x.
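To make the pin/unpin programming model concrete, the following is a minimal sketch of the pattern that page fault support aims to make unnecessary. It is an illustration under stated assumptions, not the paper's implementation: it uses the POSIX `mlock`/`munlock` calls as a stand-in for a NIC-specific memory registration API, and `fake_dma_write` and `receive_with_pinning` are hypothetical names introduced only for this example.

```python
import ctypes
import mmap

# Assumption: a POSIX system whose C library exports mlock/munlock.
libc = ctypes.CDLL(None, use_errno=True)
libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.mlock.restype = ctypes.c_int
libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.munlock.restype = ctypes.c_int


def fake_dma_write(buf, payload):
    """Hypothetical stand-in for a device writing into the buffer."""
    buf[:len(payload)] = payload


def receive_with_pinning(payload):
    """Pin a buffer, 'DMA' into it, then unpin it.

    Returns (pinned, data), where pinned reports whether mlock succeeded.
    """
    size = mmap.PAGESIZE
    if len(payload) > size:
        raise ValueError("payload must fit in one page for this sketch")
    buf = mmap.mmap(-1, size)              # anonymous, page-aligned mapping
    view = ctypes.c_char.from_buffer(buf)  # needed to obtain the raw address
    addr = ctypes.addressof(view)
    # Pin before the transfer; this can fail, e.g. under RLIMIT_MEMLOCK.
    pinned = libc.mlock(addr, size) == 0
    try:
        fake_dma_write(buf, payload)
        data = bytes(buf[:len(payload)])
    finally:
        # Unpin after the transfer so the page becomes swappable again.
        if pinned:
            libc.munlock(addr, size)
        del view  # release the exported buffer so the mapping can be closed
        buf.close()
    return pinned, data
```

Every transfer in this sketch is bracketed by an explicit pin and unpin, which is the extra bookkeeping the abstract refers to. With page fault support in the NIC, the buffer could be ordinary pageable memory and both calls would disappear.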
Page Fault Support for Network Controllers
ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems