
Evaluating network processing efficiency with processor partitioning and asynchronous I/O

Published: 18 April 2006

Abstract

Applications requiring high-speed TCP/IP processing can easily saturate a modern server. We and others have previously suggested alleviating this problem in multiprocessor environments by dedicating a subset of the processors to network packet processing. The remaining processors perform only application computation, eliminating contention between these functions for processor resources. Applications interact with packet processing engines (PPEs) through an asynchronous I/O (AIO) programming interface that bypasses the operating system. A key attraction of this overall approach is that it exploits the architectural trend toward greater thread-level parallelism in future systems based on multi-core processors. In this paper, we conduct a detailed experimental performance analysis comparing this approach to a best-practice configured Linux baseline system.

We have built a prototype system implementing this architecture, ETA+AIO (Embedded Transport Acceleration with Asynchronous I/O), and ported a high-performance web server to the AIO interface. Although the prototype uses modern single-core CPUs instead of future multi-core CPUs, an analysis of its performance can reveal important properties of this approach. Our experiments show that the ETA+AIO prototype has a modest advantage over the baseline Linux system in packet processing efficiency, consuming fewer CPU cycles to sustain the same throughput. This efficiency advantage enables the ETA+AIO prototype to achieve higher peak throughput than the baseline system, but only for workloads in which the mix of packet processing and application processing approximately matches the allocation of CPUs in the ETA+AIO system, thereby enabling high utilization of all the CPUs. Detailed analysis shows that the efficiency advantage of the ETA+AIO prototype, which uses one PPE CPU, comes from avoiding multiprocessing overheads in packet processing, the lower overhead of our AIO interface compared to standard sockets, and reduced cache misses due to processor partitioning.
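The kernel-bypass interaction between an application and a PPE can be pictured as a pair of shared descriptor rings: the application posts work to a submission ring and reaps results from a completion ring, with no system call on either path. The sketch below is a minimal user-space simulation of that pattern, not the paper's actual API; all names (`ppe_ring`, `aio_submit`, `ppe_process`, `aio_poll`) are hypothetical, and real implementations would add memory barriers and doorbell signaling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a descriptor-ring AIO interface in the style the
 * abstract describes: the application and the PPE communicate through shared
 * rings rather than through system calls.  Single-threaded simulation only. */

#define RING_SIZE 8

struct aio_desc { const void *buf; size_t len; int status; };

struct ppe_ring {
    struct aio_desc slots[RING_SIZE];
    unsigned head, tail;              /* producer / consumer indices */
};

/* Application side: post a send descriptor; returns 0, or -1 if the ring is full. */
static int aio_submit(struct ppe_ring *sq, const void *buf, size_t len)
{
    if (sq->head - sq->tail == RING_SIZE) return -1;   /* ring full */
    struct aio_desc *d = &sq->slots[sq->head % RING_SIZE];
    d->buf = buf; d->len = len; d->status = 0;
    sq->head++;        /* a real system needs a write barrier before this */
    return 0;
}

/* PPE side: drain the submission ring, "transmit", and post completions. */
static void ppe_process(struct ppe_ring *sq, struct ppe_ring *cq)
{
    while (sq->tail != sq->head) {
        struct aio_desc d = sq->slots[sq->tail % RING_SIZE];
        sq->tail++;
        d.status = (int)d.len;        /* pretend the whole buffer was sent */
        cq->slots[cq->head % RING_SIZE] = d;
        cq->head++;
    }
}

/* Application side: non-blocking poll; returns bytes completed, or -1 if none. */
static int aio_poll(struct ppe_ring *cq)
{
    if (cq->tail == cq->head) return -1;               /* nothing completed */
    int status = cq->slots[cq->tail % RING_SIZE].status;
    cq->tail++;
    return status;
}
```

Because submission and completion are decoupled, the application can pipeline many operations and batch its polling, which is one source of the per-packet CPU savings the paper measures against the blocking sockets path.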


Published in

ACM SIGOPS Operating Systems Review, Volume 40, Issue 4: Proceedings of the 2006 EuroSys conference, October 2006, 383 pages. ISSN: 0163-5980. DOI: 10.1145/1218063.

EuroSys '06: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, April 2006, 420 pages. ISBN: 1595933220. DOI: 10.1145/1217935.

Copyright © 2006 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
