research-article

Prophet: Toward Fast, Error-Tolerant Model-Based Throughput Prediction for Reactive Flows in DC Networks

Published: 15 December 2020
    Abstract

    As modern network applications (e.g., large data analytics) become more distributed and gain the ability to adapt their traffic at the application layer, they demand better network visibility to orchestrate their data flows. As a result, the ability to predict the available bandwidth for a set of flows has become a fundamental requirement of today's networking systems. While previous studies address the case of non-reactive flows, prediction for reactive flows, e.g., flows managed by TCP congestion control algorithms, remains an open problem. In this paper, we take the first step toward solving this problem in a data center network. To address both theoretical and practical challenges, we introduce a novel learning-based prediction system based on the network utility maximization (NUM) model, with two key techniques named fast factor learning (FFL) and efficient flow sampling. We adopt novel techniques to overcome practical concerns such as scalability, convergence, and unknown system parameters. We propose a system, Prophet, that leverages the emerging technology of Software Defined Networking (SDN) to realize the model. Evaluations demonstrate that our solution achieves high accuracy in a wide range of settings.

    Cited By

    • (2022) On the Performance of TCP in Reconfigurable Data Center Networks. Proceedings of the 18th International Conference on Network and Service Management, pp. 1-9. 10.5555/3581644.3581701. Online publication date: 31-Oct-2022

    Published In

    IEEE/ACM Transactions on Networking, Volume 28, Issue 6, Dec. 2020, 457 pages

    Publisher

    IEEE Press
