Lightweight Acquisition and Ranging of Flows in the Data Plane

As networks grow more complex, the ability to track almost all flows is becoming paramount, because it allows us to detect transient events that impact only a subset of the traffic. Solutions for flow monitoring exist, but it is becoming very difficult to produce accurate estimations for every <flowID, counter> tuple given the memory constraints of commodity programmable switches. Indeed, as networks grow in size, more flows have to be tracked, increasing the number of tuples to be recorded. At the same time, end-host virtualization requires more specific flowIDs, enlarging the memory cost of every single entry. Finally, the available memory resources have to be shared with other important functions (e.g., load balancing, forwarding, ACL). To address these issues, we present FlowLiDAR (Flow Lightweight Detection and Ranging), a new solution that is capable of tracking almost all the flows in the network while requiring only a modest amount of data plane memory, which does not depend on the size of flowIDs. We implemented the scheme in P4, tested it using real traffic from ISPs, and compared it against four state-of-the-art solutions: FlowRadar, NZE, PR-sketch, and ElasticSketch.


INTRODUCTION
Solutions for flow-level monitoring exist [3,4,7,9], but three trends are making this practice steadily more challenging: (1) networking devices are increasingly used to support many use cases, which reduces the amount of resources that can be dedicated solely to flow monitoring; (2) the increase in switches' per-port capacity has consistently outpaced the growth of their internal ASIC memory, making the latter a very scarce resource; (3) the rise of end-host virtualization and cloud-native services imposes flowIDs that are larger than the common 5-tuple [1,2] (i.e., IP addresses, protocol, and layer-4 ports), further adding pressure to the memory requirements of flow-level telemetry.
In an effort to reduce the memory requirements at switches, the research community has been proposing various solutions [3,4,8]. Some rely on probabilistic data structures with bounded errors to store counters and keep track only of the flowIDs of heavy flows (e.g., ElasticSketch [8]), so as to enable the reconstruction of <flowID, counter> tuples. The problem is that they can suffer potentially unacceptable inaccuracies when required to track short flows [3]. Others encode flowIDs and associated counters directly in the ASIC (e.g., FlowRadar [4]) or adopt signal-processing techniques to limit the amount of resources to be used (e.g., NZE [3]). Although they can potentially track all flows in the network, they experience a loss in accuracy when fine-grained flow-level telemetry (i.e., flowIDs more specific than the standard 5-tuple) is needed. An alternative approach in this scenario is to send the flowIDs to the control plane and keep only the counters in the dataplane, as done in PR-sketch [6]. This makes the dataplane memory independent of the flowID size. However, the dataplane memory needed for the counters and for the filter used to detect flows is still large, limiting the number of flows that can be monitored.
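To make the memory pressure concrete, the following back-of-the-envelope sketch computes how many bytes a naive table of <flowID, counter> entries would need for different flowID sizes. It is only an illustration, not the encoding used by any of the cited schemes. The 10 MB budget and the 1.2M active flows are taken from the evaluation setting; the 32-bit counter width, the decimal interpretation of "MB", the IPv4 5-tuple size of 104 bits, and the absence of any hashing or indexing overhead are assumptions.

```python
# Naive storage cost of <flowID, counter> entries (illustrative only).
BUDGET_BYTES = 10 * 10**6   # 10 MB budget (assuming decimal megabytes)
FLOWS = 1_200_000           # active flows to be tracked
COUNTER_BITS = 32           # assumed counter width

# Assumed flowID widths in bits: one IPv4 address, the IPv4 5-tuple
# (32 + 32 + 8 + 16 + 16), and a 256-bit fine-grained flowID.
FLOW_ID_BITS = {"IP address": 32, "5-tuple": 104, "256-bit flowID": 256}


def bytes_needed(flow_id_bits: int, flows: int = FLOWS,
                 counter_bits: int = COUNTER_BITS) -> int:
    """Bytes required to store one entry per flow, with no overhead."""
    return flows * (flow_id_bits + counter_bits) // 8


for name, bits in FLOW_ID_BITS.items():
    need = bytes_needed(bits)
    verdict = "fits" if need <= BUDGET_BYTES else "does not fit"
    print(f"{name}: {need / 1e6:.1f} MB ({verdict} in the budget)")

# Keeping only counters in the dataplane removes the flowID term entirely.
print(f"counters only: {bytes_needed(0) / 1e6:.1f} MB")
```

Under these assumptions, the per-entry cost grows linearly with the flowID width, whereas a counters-only layout costs the same for every flowID size.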

FLOWLIDAR
We present FlowLiDAR, a new solution that can track almost all flows present in the network. The concept of FlowLiDAR is illustrated in Figure 1 and makes use of both the switch data plane and control plane. Similarly to [6], the idea is to place all the functions that must be performed per packet in the dataplane, while those that are much less frequent are placed in the control plane. In more detail, flow detection and packet counting are done in the dataplane, while the processing of new flows and the fine-grained computation of the number of packets per flow are done in the control plane. This approach needs to send information between the control and data planes, and thus the bandwidth on this interface may be an issue. However, this should not be the case with switching ASICs that have high-speed links between both planes. We also propose variants of our design that can reduce the amount of information sent on that interface.
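The division of labor described above can be pictured with the following minimal Python sketch. It is not the authors' P4 implementation: the Bloom-style detection filter, the SHA-256 based hashing, all sizes (`K`, `NUM_COUNTERS`, `FILTER_BITS`), and the helper names are hypothetical choices made for illustration. The control plane is modeled as solving, with exact rational arithmetic, the linear system that relates each known flow to the shared counters it increments; the system may not be uniquely solvable when too many flows share too few counters, in which case the sketch returns `None`.

```python
import hashlib
from fractions import Fraction

K = 2                  # hash positions per flow (assumed)
NUM_COUNTERS = 8       # shared packet counters (assumed)
FILTER_BITS = 1 << 16  # size of the flow-detection filter (assumed)


def positions(flow_id: bytes, salt: bytes, modulus: int) -> list:
    """K deterministic positions in [0, modulus) for a flow ID."""
    return [int.from_bytes(hashlib.sha256(salt + bytes([i]) + flow_id)
                           .digest()[:8], "big") % modulus for i in range(K)]


def solve_exact(A, b):
    """Solve A x = b over the rationals by Gauss-Jordan elimination.

    Returns the unique solution, or None if the system is inconsistent
    or does not determine every unknown.
    """
    rows = [[Fraction(v) for v in row] + [Fraction(rhs)]
            for row, rhs in zip(A, b)]
    n = len(A[0]) if A else 0
    r = 0  # index of the next pivot row
    for col in range(n):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            return None  # this flow's size is not uniquely determined
        rows[r], rows[pivot] = rows[pivot], rows[r]
        p = rows[r][col]
        rows[r] = [v / p for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [vi - f * vr for vi, vr in zip(rows[i], rows[r])]
        r += 1
    if any(row[n] != 0 for row in rows[r:]):
        return None  # counters cannot be explained by the known flows
    return [rows[i][n] for i in range(n)]


class DataPlane:
    """Per-packet work: detect new flows and update shared counters."""

    def __init__(self):
        self.filter = [False] * FILTER_BITS
        self.counters = [0] * NUM_COUNTERS

    def on_packet(self, flow_id: bytes) -> bool:
        """Count the packet; return True if the flow looks new."""
        bits = positions(flow_id, b"filter", FILTER_BITS)
        is_new = not all(self.filter[b] for b in bits)
        for b in bits:
            self.filter[b] = True
        for c in positions(flow_id, b"count", NUM_COUNTERS):
            self.counters[c] += 1
        return is_new


class ControlPlane:
    """Infrequent work: store flow IDs and recover per-flow packet counts."""

    def __init__(self):
        self.flow_ids = []

    def on_new_flow(self, flow_id: bytes) -> None:
        self.flow_ids.append(flow_id)

    def flow_sizes(self, counters):
        # A[j][i] = increments of counter j caused by one packet of flow i.
        cols = [positions(f, b"count", NUM_COUNTERS) for f in self.flow_ids]
        A = [[col.count(j) for col in cols] for j in range(NUM_COUNTERS)]
        x = solve_exact(A, counters)
        return None if x is None else dict(zip(self.flow_ids, x))


# One measurement epoch over a toy trace of three flows.
trace = [b"flowA"] * 3 + [b"flowB"] * 5 + [b"flowC"]
dp, cp = DataPlane(), ControlPlane()
for flow_id in trace:
    if dp.on_packet(flow_id):
        cp.on_new_flow(flow_id)  # flow ID sent over the dataplane/control-plane link
print(cp.flow_sizes(dp.counters))
```

In this sketch the dataplane state (filter bits and counters) has the same size regardless of how long the flow IDs are, since the IDs themselves are stored only in the control plane. A false positive in the detection filter would hide a flow from the control plane, which is one reason the sizes above would need care in practice.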
FlowLiDAR introduces key innovations over the PR-sketch design that allow us to significantly reduce the amount of data plane memory needed to achieve close to 100% accuracy in estimating flow size. For example, we use exact equation solving to extract the size of the flows from the dataplane counters at the end of a measurement epoch. We also propose a new mechanism, named lazy updates, that eliminates the need to use counters in the ASIC for most flows that have only one or a few packets. This not only reduces the dataplane memory needed for the counters, but also reduces the complexity of the equation solving and improves its accuracy. We implemented FlowLiDAR in P4 and tested it against real traffic traces taken from a large ISP. We found that, using the same amount of memory, FlowLiDAR improves the accuracy of flow counting in terms of average relative error (ARE) and average absolute error (AAE) when compared against state-of-the-art solutions such as NZE, PR-sketch, and ElasticSketch by up to 10x, 100x, and 100x, respectively. Moreover, while FlowLiDAR is able to successfully track 98.7% of existing flows, other techniques can only reconstruct up to 60% of them. The main novelties and features of our proposed flow monitoring scheme, FlowLiDAR, are: (1) Most flows with only one or a few packets do not use counters in the dataplane. This very significantly reduces the number of counters needed, as most flows tend to have few packets. (2) The extraction of the values from the shared counters is done using a mathematical formulation that increases the accuracy of the flow packet count estimates and reduces the memory needed. (3) It can be efficiently implemented in programmable dataplanes using P4. (4) It outperforms state-of-the-art algorithms in terms of the memory vs. accuracy trade-off.

Figure 1: Block diagram of the proposed FlowLiDAR. The dataplane detects new flows, sends the IDs to the control plane, and counts the packets. The control plane stores the flow IDs and periodically computes the exact values of the flow lengths.

EVALUATION
In Figure 2, we show the fraction of flows that can be accurately tracked when using a fixed amount of memory, i.e., 10 MB, with 1.2M active flows to be tracked [5]. In this situation, NZE can successfully track all flows only if flowIDs are just 32 bits (an IP address). When a more fine-grained flow analysis is needed, larger flow identifiers must be adopted, impacting the ability of NZE to track all flows. Indeed, when using just the standard 5-tuple, the flow coverage can drop to 60%. On the other hand, ElasticSketch is able to track only 90% of flows with high accuracy (<1% relative error) with 32 bits, but its performance degrades more gracefully than that of NZE. FlowRadar is not able to track all flows for any flowID size, and its coverage is lower than that of NZE and ElasticSketch.

When more packet header fields need to be considered, as in the case of tunneled connections requiring VXLAN + 5-tuple or when upper-layer protocol headers are needed, the flow coverage can drop further: in the presence of 256-bit flowIDs, the flow coverage of ElasticSketch, NZE, and FlowRadar drops to approximately 60%, 35%, and 20%, respectively, which is clearly not acceptable. Finally, the PR-sketch flow coverage does not depend on the flowID size, as expected, but is below 40%, which is worse than existing schemes for small flowID sizes and also unacceptable. Instead, our solution is able to track more than 99% of flows regardless of flowID size.

Figure 2: Fraction of flows that can be monitored with a fixed amount of memory for ElasticSketch (ES), NZE, FlowRadar (FR), PR-sketch (PR), and FlowLiDAR, considering 1.2M active flows to be tracked [5].
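The accuracy metrics used in this comparison can be computed as in the following sketch. It uses the usual definitions of average relative error (ARE) and average absolute error (AAE) over the set of true flows, together with a flow-coverage figure that counts the flows whose relative error is below 1%, mirroring the "high accuracy" criterion above. Treating a flow that is missing from the estimates as an estimate of 0 is an assumption of this sketch, and the function name and the toy data are hypothetical.

```python
def evaluate(true_sizes: dict, est_sizes: dict, rel_threshold: float = 0.01):
    """Return (ARE, AAE, coverage) for per-flow packet count estimates.

    true_sizes maps flowID -> true packet count (all counts > 0).
    est_sizes maps flowID -> estimated count; missing flows count as 0.
    """
    abs_errs = {f: abs(est_sizes.get(f, 0) - n) for f, n in true_sizes.items()}
    rel_errs = {f: abs_errs[f] / true_sizes[f] for f in true_sizes}
    n_flows = len(true_sizes)
    are = sum(rel_errs.values()) / n_flows
    aae = sum(abs_errs.values()) / n_flows
    coverage = sum(1 for e in rel_errs.values() if e < rel_threshold) / n_flows
    return are, aae, coverage


# Toy example: one exact estimate, one off by 2 packets, one flow missed.
truth = {"a": 100, "b": 10, "c": 1}
estimates = {"a": 100, "b": 12}
are, aae, coverage = evaluate(truth, estimates)
print(f"ARE={are:.3f} AAE={aae:.3f} coverage={coverage:.1%}")
```

Because short flows have small true counts, a missed or slightly mis-estimated short flow contributes a large relative error, which is why ARE and coverage are sensitive to how well a scheme handles flows with only a few packets.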