
Scalable Kernel TCP Design and Implementation for Short-Lived Connections

Published: 25 March 2016

Abstract

With the rapid growth of network bandwidth, the increasing number of CPU cores on a single machine, and application API models that demand more short-lived connections, a scalable TCP stack is performance-critical. Although many clean-slate designs have been proposed, production environments still call for a bottom-up parallel TCP stack design that is backward-compatible with existing applications.

We present Fastsocket, a BSD Socket-compatible and scalable kernel socket design, which achieves table-level connection partition in the TCP stack and guarantees connection locality for both passive and active connections. The Fastsocket architecture is a ground-up partition design, from NIC interrupts all the way up to applications, which naturally eliminates various lock contentions in the entire stack. Moreover, Fastsocket maintains the full functionality of the kernel TCP stack and a BSD-socket-compatible API, so applications need no modifications.
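As a rough illustration of what table-level partitioning with connection locality means, the sketch below gives each core its own connection table and steers every packet of a flow to one core by hashing its 4-tuple. It is a toy model, not Fastsocket's kernel code: the names `Flow`, `NUM_CORES`, `core_for_flow`, and `PartitionedTables`, and the choice of CRC32 as the hash, are all assumptions made for illustration.

```python
import zlib
from collections import namedtuple

# Hypothetical flow key: the usual TCP 4-tuple.
Flow = namedtuple("Flow", "src_ip src_port dst_ip dst_port")

NUM_CORES = 4  # assumed core count for this example


def core_for_flow(flow, num_cores=NUM_CORES):
    """Map a flow to a core with a deterministic hash (CRC32 is an assumption)."""
    key = f"{flow.src_ip}:{flow.src_port}-{flow.dst_ip}:{flow.dst_port}".encode()
    return zlib.crc32(key) % num_cores


class PartitionedTables:
    """One private connection table per core instead of one shared, locked table."""

    def __init__(self, num_cores=NUM_CORES):
        self.tables = [dict() for _ in range(num_cores)]

    def establish(self, flow):
        # The connection is recorded only in the table of the core that owns the flow.
        core = core_for_flow(flow, len(self.tables))
        self.tables[core][flow] = "ESTABLISHED"
        return core

    def lookup(self, flow):
        # Only the owning core's table is consulted; no other table is touched.
        core = core_for_flow(flow, len(self.tables))
        return core, self.tables[core].get(flow)


tables = PartitionedTables()
flows = [Flow("10.0.0.1", 40000 + i, "10.0.0.2", 80) for i in range(100)]
owners = [tables.establish(f) for f in flows]
```

Because a flow always hashes to the same core in this model, each table is only ever touched by its owner, which is the property that would let a real per-core table avoid cross-core locking.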

Our evaluations show that Fastsocket achieves a speedup of 20.4x on a 24-core machine under a workload of short-lived connections, outperforming state-of-the-art Linux kernel TCP implementations. When scaling up to 24 CPU cores, Fastsocket increases the throughput of Nginx and HAProxy by 267% and 621%, respectively, compared with the base Linux kernel. We also demonstrate that Fastsocket can achieve scalability and preserve the BSD socket API at the same time. Fastsocket is already deployed in the production environment of Sina WeiBo, serving 50 million daily active users and billions of requests per day.



Published in

ACM SIGPLAN Notices, Volume 51, Issue 4 (ASPLOS '16)
April 2016, 774 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2954679
Editor: Andy Gill

ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
March 2016, 824 pages
ISBN: 9781450340915
DOI: 10.1145/2872362
General Chair: Tom Conte
Program Chair: Yuanyuan Zhou

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article
