Abstract
With the rapid growth of network bandwidth, the increasing number of CPU cores on a single machine, and application API models demanding more short-lived connections, a scalable TCP stack is performance-critical. Although many clean-slate designs have been proposed, production environments still call for a bottom-up parallel TCP stack design that is backward-compatible with existing applications.
We present Fastsocket, a BSD Socket-compatible and scalable kernel socket design, which achieves table-level connection partitioning in the TCP stack and guarantees connection locality for both passive and active connections. The Fastsocket architecture is a ground-up partitioned design, from NIC interrupts all the way up to applications, which naturally eliminates various lock contentions in the entire stack. Moreover, Fastsocket maintains the full functionality of the kernel TCP stack and a BSD-socket-compatible API, so applications need no modifications.
Our evaluations show that Fastsocket achieves a speedup of 20.4x on a 24-core machine under a workload of short-lived connections, outperforming state-of-the-art Linux kernel TCP implementations. When scaling up to 24 CPU cores, Fastsocket increases the throughput of Nginx and HAProxy by 267% and 621%, respectively, compared with the base Linux kernel. We also demonstrate that Fastsocket can achieve scalability while preserving the BSD socket API. Fastsocket is already deployed in the production environment of Sina Weibo, serving 50 million daily active users and billions of requests per day.
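The idea of table-level connection partitioning with connection locality can be illustrated with a small user-space sketch. This is not Fastsocket's kernel code: the names `Flow`, `flow_to_core`, and `PartitionedConnTable` are hypothetical, the core count is arbitrary, and a CRC32 of the connection 4-tuple merely stands in for whatever steering rule delivers a flow's packets to one core. The point it shows is that when every connection is owned by exactly one core and stored only in that core's private table, a lookup never touches shared state, so no global table lock is required.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional

NUM_CORES = 4  # assumption: illustrative core count, not taken from the paper


@dataclass(frozen=True)
class Flow:
    """Connection 4-tuple used as the table key."""
    saddr: str
    sport: int
    daddr: str
    dport: int


def flow_to_core(flow: Flow, num_cores: int = NUM_CORES) -> int:
    """Map a flow to its owning core (hypothetical steering rule).

    CRC32 is used only because it is deterministic across runs;
    it is a stand-in, not the hash used by any real NIC or kernel.
    """
    key = f"{flow.saddr}:{flow.sport}-{flow.daddr}:{flow.dport}".encode()
    return zlib.crc32(key) % num_cores


class PartitionedConnTable:
    """One private connection table per core instead of one shared table."""

    def __init__(self, num_cores: int = NUM_CORES) -> None:
        self.num_cores = num_cores
        self.tables: List[Dict[Flow, str]] = [{} for _ in range(num_cores)]

    def insert(self, flow: Flow, state: str = "ESTABLISHED") -> int:
        """Store the connection in its owning core's table; return that core."""
        core = flow_to_core(flow, self.num_cores)
        self.tables[core][flow] = state
        return core

    def lookup(self, core: int, flow: Flow) -> Optional[str]:
        """Look up a flow using only the given core's table.

        A packet handler running on `core` would call this; it never
        reads another core's table, which is what makes it lock-free.
        """
        return self.tables[core].get(flow)
```

In this sketch, locality holds as long as packets of a flow are processed on the core returned by `flow_to_core`; a lookup on any other core simply misses.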
Scalable Kernel TCP Design and Implementation for Short-Lived Connections
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems