research-article
Open Access

High Performance Packet Processing with FlexNIC

Published: 25 March 2016

Abstract

The recent surge of network I/O performance has put enormous pressure on memory and software I/O processing subsystems. We argue that the primary reason for high memory and processing overheads is the inefficient use of these resources by current commodity network interface cards (NICs). We propose FlexNIC, a flexible network DMA interface that can be used by operating systems and applications alike to reduce packet processing overheads. FlexNIC allows services to install packet processing rules into the NIC, which then executes simple operations on packets while exchanging them with host memory. Thus, our proposal moves some of the packet processing traditionally done in software to the NIC, where it can be done flexibly and at high speed.

We quantify the potential benefits of FlexNIC by emulating the proposed FlexNIC functionality with existing hardware or in software. We show that significant gains in application performance are possible, in terms of both latency and throughput, for several widely used applications, including a key-value store, a stream processing system, and an intrusion detection system.
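
The idea of installing packet processing rules that run before packets reach host memory can be illustrated with a small software model. The sketch below is not FlexNIC's interface: `ToyNic`, `Rule`, and `steer_by_key` are hypothetical names, packets are assumed to be already-parsed dictionaries, and steering key-value requests to a receive queue by key hash is only one plausible example of a "simple operation". Port 11211 is the default memcached port and is used purely for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Packet = Dict[str, Any]  # hypothetical parsed packet: header fields plus payload


@dataclass
class Rule:
    """A match-action rule: every field in `match` must equal the packet's value."""
    match: Dict[str, Any]
    action: Callable[[Packet], Optional[int]]  # returns a queue index, or None to drop


class ToyNic:
    """Software stand-in for a NIC that applies installed rules before delivery."""

    def __init__(self, num_queues: int) -> None:
        self.rules = []
        # One receive queue per core, standing in for host memory buffers.
        self.queues = [[] for _ in range(num_queues)]

    def install(self, rule: Rule) -> None:
        self.rules.append(rule)

    def receive(self, pkt: Packet) -> Optional[int]:
        # The first matching rule wins; unmatched packets go to queue 0.
        for rule in self.rules:
            if all(pkt.get(k) == v for k, v in rule.match.items()):
                queue = rule.action(pkt)
                if queue is not None:
                    self.queues[queue].append(pkt)
                return queue
        self.queues[0].append(pkt)
        return 0


def steer_by_key(num_queues: int) -> Callable[[Packet], int]:
    """Build an action that picks a queue from a hash of the request key."""
    def action(pkt: Packet) -> int:
        return zlib.crc32(pkt["key"]) % num_queues
    return action


nic = ToyNic(num_queues=4)
nic.install(Rule({"proto": "udp", "dport": 11211}, steer_by_key(4)))
nic.install(Rule({"proto": "icmp"}, lambda pkt: None))  # drop rule

first = nic.receive({"proto": "udp", "dport": 11211, "key": b"user:1"})
second = nic.receive({"proto": "udp", "dport": 11211, "key": b"user:1"})
print(first == second)  # requests for the same key reach the same queue
```

In this model, requests for a given key always arrive at the same queue, so the core that polls that queue could handle them without cross-core coordination. A hardware design would express the same match and action steps in the NIC's own pipeline rather than in host software.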



        • Published in

          ACM SIGPLAN Notices, Volume 51, Issue 4
          ASPLOS '16
          April 2016
          774 pages
          ISSN: 0362-1340
          EISSN: 1558-1160
          DOI: 10.1145/2954679
          • Editor: Andy Gill

          ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
          March 2016
          824 pages
          ISBN: 9781450340915
          DOI: 10.1145/2872362
          • General Chair: Tom Conte
          • Program Chair: Yuanyuan Zhou

          Copyright © 2016 Owner/Author

          Publisher

          Association for Computing Machinery

          New York, NY, United States
