DOI: 10.1145/3484266.3487382

TCP is Harmful to In-Network Computing: Designing a Message Transport Protocol (MTP)

Published: 04 November 2021
    Abstract

    This paper presents the motivation and design of MTP, a new offload-friendly message transport protocol. Existing transport protocols such as TCP, MPTCP, and UDP/QUIC all have key limitations when used in a network that may offload computation from end-servers into NICs, switches, and other network devices. To enable important new in-network computing use cases and correct congestion control in the face of ever-changing network paths and application replicas, MTP introduces a new message transport protocol design and pathlet congestion control, a new approach in which end-hosts explicitly communicate messaging information to network devices, and network devices explicitly communicate network path and congestion information back to end-hosts.

      Published In

      HotNets '21: Proceedings of the 20th ACM Workshop on Hot Topics in Networks
      November 2021
      246 pages
      ISBN: 9781450390873
      DOI: 10.1145/3484266
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      HotNets '21
      HotNets '21: The 20th ACM Workshop on Hot Topics in Networks
      November 10 - 12, 2021
      Virtual Event, United Kingdom

      Acceptance Rates

      Overall Acceptance Rate 110 of 460 submissions, 24%

      Cited By

      • (2023) Application Defined Networks. Proceedings of the 22nd ACM Workshop on Hot Topics in Networks, 87-94. DOI: 10.1145/3626111.3628178. Online publication date: 28-Nov-2023
      • (2023) Offloading Machine Learning to Programmable Data Planes: A Systematic Survey. ACM Computing Surveys 56:1, 1-34. DOI: 10.1145/3605153. Online publication date: 26-Aug-2023
      • (2023) Augmented Queue: A Scalable In-Network Abstraction for Data Center Network Sharing. Proceedings of the ACM SIGCOMM 2023 Conference, 305-318. DOI: 10.1145/3603269.3604858. Online publication date: 10-Sep-2023
      • (2023) Amphis: Rearchitecturing Congestion Control for Capturing Internet Application Variety. Proceedings of the 7th Asia-Pacific Workshop on Networking, 95-101. DOI: 10.1145/3600061.3600076. Online publication date: 29-Jun-2023
      • (2023) In-Network Aggregation with Transport Transparency for Distributed Training. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 376-391. DOI: 10.1145/3582016.3582037. Online publication date: 25-Mar-2023
      • (2023) An Intelligent User Plane to Support In-Network Computing in 6G Networks. ICC 2023 - IEEE International Conference on Communications, 1100-1105. DOI: 10.1109/ICC45041.2023.10279652. Online publication date: 28-May-2023
      • (2023) In-network computing—challenges and opportunities. Internet Technology Letters. DOI: 10.1002/itl2.487. Online publication date: 17-Oct-2023
      • (2022) Evolving the End-to-End Transport Layer in Times of Emerging Computing In The Network (COIN). 2022 IEEE 30th International Conference on Network Protocols (ICNP), 1-6. DOI: 10.1109/ICNP55882.2022.9940379. Online publication date: 30-Oct-2022
