Abstract
The recent surge in network I/O performance has put enormous pressure on memory and software I/O processing subsystems. We argue that the primary reason for high memory and processing overheads is the inefficient use of these resources by current commodity network interface cards (NICs). We propose FlexNIC, a flexible network DMA interface that operating systems and applications alike can use to reduce packet processing overheads. FlexNIC allows services to install packet processing rules into the NIC, which then executes simple operations on packets while exchanging them with host memory. Our proposal thus moves some of the packet processing traditionally done in software to the NIC, where it can be done flexibly and at high speed.
We quantify the potential benefits of FlexNIC by emulating the proposed FlexNIC functionality with existing hardware or in software. We show that significant gains in application performance are possible, in terms of both latency and throughput, for several widely used applications, including a key-value store, a stream processing system, and an intrusion detection system.
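The abstract does not specify a rule format, so the following is only a minimal software sketch of the general idea of installing packet processing rules that run as packets are handed to host memory. Every name here (`Rule`, `ToyNic`, `steer_by_key`, the header field names, and port 11211) is a hypothetical chosen for illustration and is not the FlexNIC interface. The example rule steers key-value requests to a receive queue chosen by a hash of the key, which is one plausible use for a key-value store, assumed here rather than taken from the text.

```python
import zlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# A packet is modeled as a dict of already-parsed header fields (hypothetical layout).
Packet = Dict[str, Any]


@dataclass
class Rule:
    """A match-action rule: exact-match header fields plus an action."""
    match: Dict[str, Any]
    # The action returns a queue hint, or None to drop the packet.
    action: Callable[[Packet], Optional[int]]

    def matches(self, pkt: Packet) -> bool:
        return all(pkt.get(name) == value for name, value in self.match.items())


@dataclass
class ToyNic:
    """Software stand-in for a NIC with several receive queues."""
    num_queues: int
    rules: List[Rule] = field(default_factory=list)
    queues: List[List[Packet]] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        self.queues = [[] for _ in range(self.num_queues)]

    def install(self, rule: Rule) -> None:
        """Install a rule; earlier rules take priority over later ones."""
        self.rules.append(rule)

    def receive(self, pkt: Packet) -> Optional[int]:
        """Apply the first matching rule and return the chosen queue (None = dropped)."""
        for rule in self.rules:
            if rule.matches(pkt):
                hint = rule.action(pkt)
                if hint is None:
                    return None
                queue = hint % self.num_queues
                self.queues[queue].append(pkt)
                return queue
        # No rule matched: fall back to the default queue.
        self.queues[0].append(pkt)
        return 0


def steer_by_key(pkt: Packet) -> int:
    """Queue hint derived from a hash of the request key, so a key always lands on one queue."""
    return zlib.crc32(pkt["key"])


nic = ToyNic(num_queues=4)
nic.install(Rule({"proto": "udp", "dst_port": 11211}, steer_by_key))
nic.install(Rule({"dst_port": 23}, lambda pkt: None))  # drop rule
nic.receive({"proto": "udp", "dst_port": 11211, "key": b"user:42"})
```

In this sketch, requests for the same key always reach the same queue, so the core that polls that queue can keep the corresponding data in its cache. Real hardware would apply such rules in the NIC's data path rather than in a Python loop.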
High Performance Packet Processing with FlexNIC
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems