DOI: 10.1145/3341301.3359657
research-article

Snap: a microkernel approach to host networking

Published: 27 October 2019
Abstract

    This paper presents our design and experience with a microkernel-inspired approach to host networking called Snap. Snap is a userspace networking system that supports Google's rapidly evolving needs with flexible modules that implement a range of network functions, including edge packet switching, virtualization for our cloud platform, traffic shaping policy enforcement, and a high-performance reliable messaging and RDMA-like service. Snap has been running in production for over three years, supporting the extensible communication needs of several large and critical systems.
    Snap enables fast development and deployment of new networking features, leveraging the benefits of address space isolation and the productivity of userspace software development together with support for transparently upgrading networking services without migrating applications off of a machine. At the same time, Snap achieves compelling performance through a modular architecture that promotes principled synchronization with minimal state sharing, and supports real-time scheduling with dynamic scaling of CPU resources through a novel kernel/userspace CPU scheduler co-design. Our evaluation demonstrates over 3x Gbps/core improvement compared to a kernel networking stack for RPC workloads, software-based RDMA-like performance of up to 5M IOPS/core, and transparent upgrades that are largely imperceptible to user applications. Snap is deployed to over half of our fleet of machines and supports the needs of numerous teams.

    Published In

    SOSP '19: Proceedings of the 27th ACM Symposium on Operating Systems Principles
    October 2019
    615 pages
    ISBN:9781450368735
    DOI:10.1145/3341301
    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike International 4.0 License.

    In-Cooperation

    • USENIX Association

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. RDMA
    2. datacenter
    3. microkernel
    4. network stack

    Qualifiers

    • Research-article

    Conference

    SOSP '19
    Sponsor:
    SOSP '19: ACM SIGOPS 27th Symposium on Operating Systems Principles
    October 27 - 30, 2019
    Huntsville, Ontario, Canada

    Acceptance Rates

    Overall Acceptance Rate 131 of 716 submissions, 18%

    Cited By

    • (2024) CloudRIC: Open Radio Access Network (O-RAN) Virtualization with Shared Heterogeneous Computing. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, pages 558-572. DOI: 10.1145/3636534.3649381. Online publication date: 29-May-2024.
    • (2024) Enoki: High Velocity Linux Kernel Scheduler Development. Proceedings of the Nineteenth European Conference on Computer Systems, pages 962-980. DOI: 10.1145/3627703.3629569. Online publication date: 22-Apr-2024.
    • (2024) CC-NIC: a Cache-Coherent Interface to the NIC. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pages 52-68. DOI: 10.1145/3617232.3624868. Online publication date: 27-Apr-2024.
    • (2023) Treehouse: A Case For Carbon-Aware Datacenter Software. ACM SIGEnergy Energy Informatics Review, 3:3, pages 64-70. DOI: 10.1145/3630614.3630626. Online publication date: 25-Oct-2023.
    • (2023) Kernel vs. User-Level Networking: Don't Throw Out the Stack with the Interrupts. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 7:3, pages 1-25. DOI: 10.1145/3626780. Online publication date: 7-Dec-2023.
    • (2023) Of Apples and Oranges. Proceedings of the 22nd ACM Workshop on Hot Topics in Networks, pages 1-8. DOI: 10.1145/3626111.3628186. Online publication date: 28-Nov-2023.
    • (2023) Capybara. Proceedings of the 14th ACM SIGOPS Asia-Pacific Workshop on Systems, pages 30-36. DOI: 10.1145/3609510.3609813. Online publication date: 24-Aug-2023.
    • (2023) Aurelia: CXL Fabric with Tentacle. Proceedings of the 4th Workshop on Resource Disaggregation and Serverless, pages 29-36. DOI: 10.1145/3605181.3626287. Online publication date: 23-Oct-2023.
    • (2023) Poster: Understanding Interactions between Overload Control Core Allocation in Low-Latency Network Stacks. Proceedings of the ACM SIGCOMM 2023 Conference, pages 1159-1161. DOI: 10.1145/3603269.3610844. Online publication date: 10-Sep-2023.
    • (2023) Improving Network Availability with Protective ReRoute. Proceedings of the ACM SIGCOMM 2023 Conference, pages 684-695. DOI: 10.1145/3603269.3604867. Online publication date: 10-Sep-2023.
