DOI: 10.1145/3359989.3365412
research-article

RSS++: load and state-aware receive side scaling

Published: 03 December 2019

Abstract

While the current literature typically focuses on load-balancing among multiple servers, in this paper, we demonstrate the importance of load-balancing within a single machine (potentially with hundreds of CPU cores). In this context, we propose a new load-balancing technique (RSS++) that dynamically modifies the receive side scaling (RSS) indirection table to spread the load across the CPU cores in a more optimal way. RSS++ incurs up to 14x lower 95th percentile tail latency and orders of magnitude fewer packet drops compared to RSS under high CPU utilization. RSS++ allows higher CPU utilization and dynamic scaling of the number of allocated CPU cores to accommodate the input load while avoiding the typical 25% over-provisioning.
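As a rough illustration of what rebalancing an indirection table can look like, the sketch below models the table as a list that maps hash buckets to cores and greedily moves buckets from the most loaded core to the least loaded one. This is a simple greedy heuristic chosen for illustration, not necessarily the algorithm RSS++ uses. The names `core_loads` and `rebalance`, the `max_moves` bound, and the assumption that a per-bucket load measurement is available are all hypothetical.

```python
def core_loads(table, bucket_load, num_cores):
    """Sum the measured load of every bucket assigned to each core."""
    loads = [0] * num_cores
    for bucket, core in enumerate(table):
        loads[core] += bucket_load[bucket]
    return loads


def rebalance(table, bucket_load, num_cores, max_moves=64):
    """Return a new table after greedily moving buckets between cores.

    Each step takes the most and least loaded cores and moves the bucket
    whose load is closest to half of their gap. A move is only made when
    the bucket load is strictly smaller than the gap, so every move
    reduces the imbalance between the two cores.
    """
    table = list(table)  # work on a copy, leave the caller's table intact
    loads = core_loads(table, bucket_load, num_cores)
    for _ in range(max_moves):
        src = max(range(num_cores), key=loads.__getitem__)
        dst = min(range(num_cores), key=loads.__getitem__)
        gap = loads[src] - loads[dst]
        candidates = [b for b, core in enumerate(table)
                      if core == src and 0 < bucket_load[b] < gap]
        if not candidates:
            break  # no single move can improve the balance further
        best = min(candidates, key=lambda b: abs(bucket_load[b] - gap / 2))
        table[best] = dst
        loads[src] -= bucket_load[best]
        loads[dst] += bucket_load[best]
    return table


if __name__ == "__main__":
    # Hypothetical measurements: 8 buckets, 2 cores, core 0 overloaded.
    table = [0, 0, 0, 0, 1, 1, 1, 1]
    load = [10, 8, 6, 4, 1, 1, 1, 1]
    new_table = rebalance(table, load, num_cores=2)
    print(core_loads(table, load, 2), "->", core_loads(new_table, load, 2))
```

Only table entries change here, so packets of a given bucket keep arriving at a single core at any point in time; the sketch ignores how the new table is written to the NIC.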
RSS++ has been implemented for both (i) DPDK and (ii) the Linux kernel. Additionally, we implement a new state migration technique which facilitates sharding and reduces contention between CPU cores accessing per-flow data. RSS++ keeps flow state in groups that can be migrated at once, leading to a 20% higher efficiency than a state-of-the-art shared flow table.
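The idea of keeping flow state in groups can be sketched as follows, assuming one group per indirection-table bucket so that reassigning a bucket hands over all of its flows in a single step. Everything here is a hypothetical illustration rather than the RSS++ implementation: `Dispatcher`, `Core`, `bucket_of`, and the packet counter used as per-flow state are made up, and CRC32 merely stands in for the NIC's hash function.

```python
import zlib

TABLE_SIZE = 8  # hypothetical number of indirection-table buckets


def bucket_of(flow_key, table_size=TABLE_SIZE):
    """Map a flow key to a bucket (CRC32 is a stand-in for the NIC hash)."""
    return zlib.crc32(repr(flow_key).encode()) % table_size


class Core:
    """A core owns one private flow table per bucket assigned to it."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.groups = {}  # bucket -> {flow_key: per-flow state}

    def process(self, bucket, flow_key):
        flows = self.groups.setdefault(bucket, {})
        # Toy per-flow state: a packet counter.
        flows[flow_key] = flows.get(flow_key, 0) + 1


class Dispatcher:
    def __init__(self, num_cores, table_size=TABLE_SIZE):
        self.cores = [Core(i) for i in range(num_cores)]
        self.table = [b % num_cores for b in range(table_size)]

    def receive(self, flow_key):
        """Deliver one packet and return the id of the core that handled it."""
        bucket = bucket_of(flow_key, len(self.table))
        core = self.cores[self.table[bucket]]
        core.process(bucket, flow_key)
        return core.core_id

    def migrate(self, bucket, dst_id):
        """Reassign a bucket and hand its whole flow group to the new core."""
        src = self.cores[self.table[bucket]]
        dst = self.cores[dst_id]
        if src is dst:
            return
        group = src.groups.pop(bucket, None)
        if group is not None:
            dst.groups[bucket] = group  # one hand-over, no per-flow copying
        self.table[bucket] = dst_id


if __name__ == "__main__":
    d = Dispatcher(num_cores=2)
    flow = ("10.0.0.1", "10.0.0.2", 1234, 80)
    owner = d.receive(flow)
    d.migrate(bucket_of(flow), 1 - owner)
    print("flow moved from core", owner, "to core", d.receive(flow))
```

Because each group is touched only by the core that currently owns its bucket, no shared table or per-flow locking is needed in this sketch. It deliberately ignores the synchronization a real system needs for packets still queued on the old core during a migration.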

Published In

CoNEXT '19: Proceedings of the 15th International Conference on Emerging Networking Experiments And Technologies
December 2019
395 pages
ISBN: 9781450369985
DOI: 10.1145/3359989
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NIC indirection table
  2. intra server
  3. load-balancing
  4. state-aware

Qualifiers

  • Research-article

Conference

CoNEXT '19
Acceptance Rates

Overall Acceptance Rate 198 of 789 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 205
  • Downloads (last 6 weeks): 29
Reflects downloads up to 12 Feb 2025

Cited By

  • (2025) COREC: Concurrent non-blocking single-queue receive driver for low latency networking. Computer Networks 258 (110982). DOI: 10.1016/j.comnet.2024.110982. Online publication date: Feb-2025
  • (2024) Automatic parallelization of software network functions. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (1531-1550). DOI: 10.5555/3691825.3691910. Online publication date: 16-Apr-2024
  • (2024) SURE: Secure Unikernels Make Serverless Computing Rapid and Efficient. Proceedings of the 2024 ACM Symposium on Cloud Computing (668-688). DOI: 10.1145/3698038.3698558. Online publication date: 20-Nov-2024
  • (2024) FAJITA: Stateful Packet Processing at 100 Million pps. Proceedings of the ACM on Networking 2:CoNEXT3 (1-22). DOI: 10.1145/3676861. Online publication date: 21-Aug-2024
  • (2024) NetBlocks: Staging Layouts for High-Performance Custom Host Network Stacks. Proceedings of the ACM on Programming Languages 8:PLDI (467-491). DOI: 10.1145/3656396. Online publication date: 20-Jun-2024
  • (2024) Deploying Stateful Network Functions Efficiently using Large Language Models. Proceedings of the 4th Workshop on Machine Learning and Systems (28-38). DOI: 10.1145/3642970.3655836. Online publication date: 22-Apr-2024
  • (2024) Transparent Multicore Scaling of Single-Threaded Network Functions. Proceedings of the Nineteenth European Conference on Computer Systems (1142-1159). DOI: 10.1145/3627703.3629591. Online publication date: 22-Apr-2024
  • (2024) Programming Network Stack for Physical Middleboxes and Virtualized Network Functions. IEEE/ACM Transactions on Networking 32:2 (971-986). DOI: 10.1109/TNET.2023.3307641. Online publication date: Apr-2024
  • (2024) State Disaggregation for Dynamic Scaling of Network Functions. IEEE/ACM Transactions on Networking 32:1 (81-95). DOI: 10.1109/TNET.2023.3282562. Online publication date: Feb-2024
  • (2024) A High-Speed Robust Tunnel Using Forward Erasure Correction in Segment Routing. 2024 IEEE 32nd International Conference on Network Protocols (ICNP) (1-12). DOI: 10.1109/ICNP61940.2024.10858552. Online publication date: 28-Oct-2024
