Research Article | Open Access

Hardware-accelerated Real-time Drift-awareness for Robust Deep Learning on Wireless RF Data

Published: 11 March 2023


Abstract

Proactive and intelligent management of network resource utilization (RU) using deep learning (DL) can significantly improve the efficiency and performance of the next generation of wireless networks. However, variations in wireless RU are often affected by uncertain events and change points due to deviations of the real data distribution from that of the original training data. Such deviations, known as dataset drifts, can subsequently shift the corresponding decision boundary, degrading the DL model's prediction performance. To address these challenges, we present hardware-accelerated real-time radio frequency (RF) analytics and drift-awareness modules for robust DL predictions. We have prototyped the proposed design on a Zynq-7000 System-on-Chip that contains an FPGA and an embedded ARM processor, using the Xilinx Vivado design suite for synthesis and analysis of the HDL design. To detect dataset drifts, the proposed solution adopts a distance-based technique on the FPGA to quantify in real-time the divergence between the prediction distribution obtained from DL predictions and the data distribution of the input streaming samples. Using various performance metrics, we have extensively evaluated the proposed solution and shown that it can significantly improve the DL model's robustness in the presence of dataset drifts.


1 INTRODUCTION

The use of sophisticated deep learning (DL)-driven techniques to predict resource utilization variations in a sixth-generation (6G) network can allow it to proactively plan reallocation of resources to those services/network elements that will have higher resource requirements [15, 20, 30]. This differs from existing networks, which mainly utilize reactive resource allocation approaches based on responding to current resource requirements. By optimizing resources beforehand, a proactive resource allocation solution can satisfy demands and prevent resource congestion more effectively than a reactive solution that responds only after a change in demand occurs.

Although network resource utilization often exhibits patterns that recur on daily, weekly, and yearly holiday bases and that can be learned by a DL model, in practice network usage can be irregularly affected by uncertain events and change points, such as an abrupt, previously unseen increase in network utilization load. In machine learning (ML) terminology, this phenomenon is called dataset drift, which simply means that the statistical properties of the target variable (resource utilization in this case), which the model is trying to predict, change over time in unforeseen ways [9]. An event pertaining to a sudden data drift is in general not predictable by a DL model, and it can seriously degrade the DL model performance later in time. For example, using drifted data to update the DL prediction system could drag it away from the previously learned direction, which in turn can derail the so-far trained network, thereby incurring unreliable/erroneous predictions. In Figure 1, we provide an illustrative example of how a sudden and unforeseen data drifting event can affect the prediction performance of a DL model trained to predict wireless frequency channel utilization (CU) time series data. Data drift poses a challenge that DL models designed for wireless networks have often ignored.

Fig. 1.

Fig. 1. DL predictions and \(95\%\) prediction intervals for resource utilization in terms of CU are shown. CU data were collected by us at the University of Oulu. Figure (a) shows the DL predictions for usual CU data, while Figure (b) shows the DL predictions with an anomalous resource utilization level that was not seen before. The DL model performance is seriously degraded by the anomalous resource utilization level.

A typical approach adopted by adaptive ML systems to deal with dataset drifts is to employ drift detectors that help adapt the DL model to changing data. Drift detectors examine the relationship between the incoming streaming data and the prediction model data. However, traditional ML approaches to data drift detection can be too slow for monitoring wireless radio frequency (RF) data streams, which typically have rates ranging from tens of millions to tens of billions of samples per second. For example, a server/CPU-based drift-awareness detector can delay the processing of samples due to the slow transfer speed between the in-phase and quadrature (I/Q) data collection module and the host processing them. Moreover, limitations in memory buffer size can lead to gaps in the collected RF data. Gaps in the collected data can cause drift detectors to make less accurate decisions about DL model updates, which can adversely affect the performance of the prediction system. A field-programmable gate array (FPGA) is reconfigurable hardware that is very well suited for prototyping and testing high-speed streaming data processing designs. With an architecture suitable for pipelining, even a low-cost FPGA can acquire and process wireless data streams of several million samples per second without creating any gaps or data losses. Such a hardware-accelerated drift detection and classification system offers a real-time drift detection solution owing to its reconfigurability and its ability to process millions of samples in real-time.

In this article, we present an FPGA-assisted real-time RF data analytics and explainable drift-awareness design for a wireless resource utilization prediction system in shared spectrum bands. The presented design takes advantage of inherent FPGA parallelism to simultaneously perform: (i) analytics on RF I/Q component data streams that can be utilized to update the DL model; and (ii) drift detection that makes the DL model robust to drifts in RF data streams. We adopt a rolling-horizon (i.e., multistage) approach in which the DL model makes predictions for the next time interval \(t_i\), where the length of each time interval is \(\tau\). The basic idea is to construct probability distributions of DL prediction samples for the next interval and to use a hardware-efficient distance-based technique on an FPGA that compares in real-time the distribution of arriving samples with the DL-predicted sample distribution. The proposed FPGA-based technique not only detects drifting samples that deviate from the predicted sample distribution but also provides explanations of the meaning of the drift, for example, what type of change has happened and whether or not the prediction system should update the model with the newly arriving samples. The explanation property helps the DL prediction module to understand/interpret the drift in the data, which can be used to automate DL adaptation decisions. Furthermore, drift explanation can be very important from certain application perspectives, such as proactive wireless frequency resource allocation. For instance, if the DL module in a proactive resource allocator predicts a low frequency-resource-utilization state and a drift leads to a high resource-utilization state, then the proposed drift-awareness method can explain which state the system has moved to after the drift. This information can then be used by the proactive resource allocator to make accurate resource allocation adaptation decisions.

The main contributions of our work can be summarized as:

(1)

We study dataset drift by leveraging both synthetic and real datasets of wireless frequency spectrum resource utilization. For simplicity and tractability, we focus on using a DL-based technique to build a proactive prediction system that can take into account the variability in wireless frequency CU. CU is a measure of channel usage over some time period \(t\), given as a percentage value between \(0\%\) and \(100\%\).

(2)

We present a novel hardware/software co-design of a real-time analytics and explainable drift-awareness technique for RF data streams. Using the Xilinx Vivado design suite, the hardware description language (HDL) design of the RF analytics and drift-awareness modules is implemented as a combination of multiple intellectual property (IP) cores that calculate CU in real-time, estimate the current distribution of CU data, and compare the current distribution with the predicted CU distribution using a statistical distance measure.

(3)

The drift detection module on the FPGA requires data from two asynchronous sources: (i) real-time streaming I/Q samples from an RF transceiver; and (ii) probability distributions of predicted CU values from the DL server. The system must also output data from the CU analytics and drift detection modules, which are likewise asynchronous to one another. We present an event-driven finite state machine (FSM)-based circuit design that handles this asynchronicity and allows the FPGA to perform accurate computations in parallel on multiple data inputs. Moreover, the proposed circuit design allows conflict-free transactions of multiple data outputs.

(4)

The performance of the proposed design and implemented system is evaluated using extensive measurements, testing, and statistical analyses that are performed in both laboratory and over-the-air environments. We evaluate the performance of the FPGA design in terms of its resource utilization and also evaluate the performance of the drift-awareness technique using various performance metrics such as precision, accuracy, and recall. Finally, we also show how the proposed design can enable robustness in a DL prediction system even under abrupt changes in CU data distributions.

The rest of the article is organized as follows: First, we introduce the related work, including dataset drift in the ML literature and FPGA-assisted real-time analytics. Then, we formulate the research problem addressed in this article. In Section 4, we provide insights behind the drift detector design. Section 5 discusses the hardware-accelerated drift detection, its circuit designs, and their implementation on an FPGA. In Section 6, we present the experimental results and their discussion. Finally, we conclude the article in Section 7 and discuss future research directions.


2 RELATED WORK

Most typical ML/DL models used for classification, prediction, and similar real-world tasks implicitly assume that data are randomly generated from some stationary distribution [9]. However, in many real-world contexts, assuming a stationary data distribution is not always valid, since the underlying data-generating processes are typically non-stationary: they can evolve dynamically over time and can even change abruptly [7, 9, 13]. In such environments, typical ML/DL models trained under the stationarity assumption become outdated and perform worse over time. Hence, detecting the point at which the input time-series data distribution changes is of utmost importance. The phenomenon of an evolving data distribution over time, introduced in the literature under different terminologies such as concept drift/shift and dataset drift/shift [18], exists in almost all real-time data streaming systems where data arrive at the input one sample at a time. Such systems must continuously monitor the data to track dataset drift in time to initiate corrective measures in an adaptive manner. For example, dataset drift detection can be used to improve the prediction performance of a DL model in a dynamic environment. The presence of dataset drifts in the input data stream can unfavorably affect subsequent predictions [12]. Constantly monitoring the input data stream for dataset drifts and replacing the drifted samples with predictions can vastly mitigate the effect of dataset drifts on subsequent DL model predictions [12]. The work in Reference [11] has also shown how dataset drift detection can be used to improve the performance of proactive wireless RF resource allocation, using drift detection in conjunction with a proactive resource allocation algorithm to issue an alert for possible reallocation of resources in the presence of dataset drifts in wireless data.

2.1 Dataset Drift in ML Literature

In ML-related research, the problem of data drift has been addressed in various ways [4, 17, 18, 25, 26]. The drift detection method (DDM) [9] is one of the earliest works that dealt with concept drift. It uses a time window and calculates the online error rate of the current model from the data samples inside the window. A similar implementation has been adopted in the early drift detection method (EDDM) [4], which improves the detection of gradual concept drift by using the distances between classification errors within a time window instead of the online error rate. The authors in Reference [25] have also developed a concept drift detection algorithm using a statistical test of equal proportions. The principle behind the algorithm in Reference [25] is to examine two accuracies: a recent accuracy calculated for a window of size \(W\) and the overall accuracy calculated from the start of learning. Raza et al. present an algorithm for detecting the shift-point of non-stationary time series data, using an exponentially weighted moving average control chart for auto-correlated observations to detect the shift-point in real-time. Kifer et al. address concept drift at its root cause, distribution drift, and propose a measure called relativized discrepancy that calculates the distance between distributions for drift detection. Some limitations we have identified in the above algorithms are that (i) they only provide dataset drift detection and give no information about the kind of change that has occurred; and (ii) they are either too complex to implement in hardware or are not real-time solutions.

Furthermore, these traditional ML approaches to data drift detection can be too slow for complex wireless data prediction scenarios where fast-evolving data need to be analyzed in real-time to make good predictions and perform proactive resource management accordingly. Until now, the application of concept drift detection for wireless communication network datasets has not been thoroughly studied. As wireless datasets are highly dynamic, classical ML/DL models can make erroneous control decisions or end up in catastrophic failure. To facilitate better automated resource management decisions in DL-driven 6G networks, different from traditional approaches, we propose a technique that can detect in real-time sudden drifts in wireless resource utilization demand.

2.2 FPGA-assisted Real-time Analytics

Processing RF streaming data and performing real-time data analytics on it require high-performance, low-latency processing systems [14]. A general-purpose CPU cannot meet the requirements of real-time analytics and drift detection at the high data rates of incoming streaming RF data [14, 33]. For example, a CPU-based real-time streaming packet processor can suffer from a network-memory-CPU bottleneck, resulting in dropped network packets [33]. FPGAs are more suitable for real-time streaming data processing applications, as the underlying hardware is reconfigurable and well suited for highly pipelined, high-throughput applications [24]. Using FPGAs, techniques commonly applied to speed up streaming data processing, such as single-instruction multiple-data (SIMD) operations, can easily be implemented [29]. FPGAs also have higher performance-per-watt efficiency than general-purpose CPUs, which makes them attractive and effective in streaming analytics applications [14]. For example, Sukhwani et al. have presented an FPGA-based query-processing engine to offload and accelerate computationally expensive database-analytics queries. Neshatpour et al. propose a heterogeneous CPU-FPGA architecture for accelerating data mining and ML algorithms, which allows computation-intensive tasks to be offloaded to the hardware accelerator for energy efficiency and speedup. Wu et al. present an FPGA accelerator for stream data processing at edge servers using a distributed stream processing system called F-Storm. Algemili has presented an FPGA implementation of Moore’s algorithm for data stream processing. Fraser et al. present a streaming FPGA implementation of the kernel normalized least mean squares algorithm for regression. Recently, real-time implementations of RF-related ML applications on FPGAs have also been proposed by several research works [5, 21, 27, 28]. Soltani et al. have implemented a real-time RF signal classifier on an FPGA, using a deep neural network (NN) to classify RF signals by modulation type. Siddhartha et al. present a real-time implementation of an RF spectral prediction method using a long short-term memory (LSTM) network. Moss et al. have implemented a real-time anomaly detector for RF signals using an encoder-decoder network on an FPGA. Bhatia et al. have proposed an FPGA-based NN architecture for a denoising block with application to jammer suppression in long-term evolution (LTE) networks.

All these prior works substantiate that FPGAs are well suited for prototyping real-time RF data processing/analytics modules that can be utilized in RF data streaming applications. This suitability of FPGAs for real-time wireless data processing and analytics has motivated us to use them in our work for hardware-accelerated dataset drift detection. Our dataset drift detection algorithm addresses the dataset drift problem at its root cause, distribution drift, using histograms. To the best of our knowledge, this is the first time an FPGA-based real-time dataset drift detection method has been applied to a wireless system.


3 PROBLEM FORMULATION

The problem considered in this article is to enable real-time robustness in a DL-based prediction solution for wireless RF data that can experience dataset drifts over time. The system model of the proposed solution is shown in Figure 2. In our model, a time series is a sequence of data points \(\lt x_1, x_2, \ldots , x_t, \ldots \gt\), where each element \(x_t\) is a new measured value at a time instance \(t\). The data stream we consider is a time series of CU data in a wireless network. For a given time interval \([t, t+\tau)\), let the set of input variable points be denoted by \(X\) and the set of target variable points by \(Y\); then the joint distribution of input/output points for the given time duration is given by \(P_{(t, t+\tau)}(X,Y)\). The prediction model will produce accurate predictions when the new inference data are similar to the training data. When these two datasets differ, the model can predict with less accuracy and can produce unexpected results. When these two datasets differ over time, the data surrounding the model are said to be drifting.

Fig. 2.

Fig. 2. System model of the proposed hardware-accelerated robust DL system for wireless RF data.

Let the joint distributions of the model-predicted data and the test data for the time duration \(\tau\) be \(P_{prd(t,t+\tau)}(X,Y)\) and \(P_{tst(t,t+\tau)}(X,Y)\), respectively. Then the dataset drift at time \(t\) can be given as (1) \(\begin{equation} \exists t:\ P_{prd(t,t+\tau)}(X,Y) \ne P_{tst(t,t+\tau)}(X,Y). \end{equation}\)

For online CU time series prediction in the presence of dataset drift, our work aims to incrementally learn the time series and provide robust predictions in real time. Given the time series vector with a history size \(N_H\), the predicted data vector for a future target size \(N_F\) can be given as (2) \(\begin{equation} \mathbf {\hat{x}_t} = \mathbf {f_{\theta }}(x_{t-1}, x_{t-2}, \ldots , x_{t-1-N_H}), \end{equation}\) where \(\mathbf {f_{\theta }}\) denotes a DL model with learned parameters \(\theta\). Due to its ability to efficiently model data with temporal dynamics, we use a recurrent neural network (RNN)-based DL model for the time series forecast in this article.

In a real-time prediction system that makes predictions for multiple time-points, a particular feature of time series modeling is that future target data points become historical data points after a certain time duration elapses. For instance, to make the next \(N_F\) predictions at time instance \(t\), the historical time series vector \(\mathbf {x_t} = \lt x_{t-1}, x_{t-2}, \ldots , x_{t-1-N_H}\gt\) is used. Then, to make the next \(N_F\) predictions at time instance \(t+N_F\), the new historical time series vector \(\mathbf {x_{t+N_F}} = \lt x_{t+N_F-1}, x_{t+N_F-2}, \ldots , x_{t+N_F-1-N_H}\gt\) is used. One complication of such a procedure is that if the time series vector \(\mathbf {x_t}\) contains data belonging to a new concept, then the next prediction at time instance \(t+N_F\), which is based on a history vector overlapping \(\mathbf {x_t}\), can become biased and erroneous [12].
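As a minimal software illustration of this rolling-horizon bookkeeping (the window sizes \(N_H\) and \(N_F\) and the persistence "model" below are illustrative placeholders, not the article's LSTM predictor):

```python
from collections import deque

N_H = 8   # history size (illustrative value)
N_F = 4   # future target size (illustrative value)

def rolling_predictions(stream, predict):
    """Slide an N_H-sample history window over the stream and emit
    N_F-step-ahead predictions every N_F samples, so that each batch of
    future targets becomes part of the next history window."""
    history = deque(maxlen=N_H)
    outputs = []
    for t, x in enumerate(stream):
        history.append(x)
        # once the history is full, predict the next N_F points
        if len(history) == N_H and (t + 1) % N_F == 0:
            outputs.append(predict(list(history)))
    return outputs

# toy "model": persistence forecast repeating the last observed value
preds = rolling_predictions(range(16), lambda h: [h[-1]] * N_F)
```

Note how each prediction after the first reuses samples that were future targets of the previous window; this overlap is exactly why drifted samples in one window can bias subsequent predictions.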

Our goal is to address these challenges by designing an FPGA-assisted module that detects dataset drifts in real-time, which in turn enables the DL model to react to the drifts accordingly. This robustness enables the DL model to mitigate the effects of sudden dataset drifts on its predictions and to learn new concepts promptly, favoring precise and timely predictions. The detection of dataset drifts in real-time, exactly as they happen, is enabled by a hardware-assisted solution based on a low-cost Xilinx Zynq-7000 series FPGA.


4 INSIGHTS BEHIND OUR DRIFT DETECTOR DESIGN

In this section, we present insights behind our dataset drift detection and explanation algorithm for streaming data. Please refer to Algorithms 1 and 2 for the complete steps of the proposed solution. A simplified illustration of the dataset drift detection process is shown in Figure 3. In our design, we use RF I/Q data to calculate CU values [16]. From the CU values, a data stream of histograms is obtained, as histograms can be used to represent the probability distributions of the input CU values. Let \(\mathcal {H}\) represent a data series \(\lt H_1, H_2, \ldots \gt\), where each item \(H_i\) denotes a histogram of the input data with \(B\) bins at time instance \(i\). \(H_i\) is given as \(H_i = \lbrace (I_1, \pi _{i,1}), (I_2, \pi _{i,2}), \ldots , (I_B, \pi _{i,B})\rbrace\), where each bin is given by \(I_j = [\underline{I}_j, \bar{I}_j]\), with \(\underline{I}_j\) and \(\bar{I}_j\) being the leftmost and rightmost boundaries of the bin, respectively, and \(\pi _{i,j}\) giving the number of counts in each bin. Our intention is not only to detect dataset drift in the input data stream but also to explain what kind of change has occurred. To enable this, we incorporate a histogram classification method that labels each histogram \(H_i\) with a reference state \(S_i\) based on a pre-decided percentile. For example, for the CU problem, an appropriate number of classified states is 5, representing very low, low, medium, high, and very high CU states. The upper part of Figure 3 illustrates the classified states obtained for the input data histograms. The proposed algorithm works in two phases. In the first, the training phase, the algorithm finds the reference histograms from the training data using the histogram classification method. In the test phase, a DL model is used to obtain the predictions for the next time duration \(\tau\), and the prediction histogram \(\hat{H}_\tau\) is generated. Then, the corresponding reference histogram \(\hat{R}_\tau\) and the reference state \(\hat{S}_\tau\) are obtained. We say that a dataset drift has occurred if \(S_i \nsim \hat{S}_\tau\) (i.e., \(S_i\) and \(\hat{S}_\tau\) are significantly dissimilar). This process of detecting the drift for an example prediction interval is illustrated in the lower part of Figure 3.
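A rough software analogue of this front end is sketched below. The energy-detection threshold, window length, and bin count are illustrative assumptions for this sketch; the article computes CU per Reference [16] on the FPGA.

```python
import numpy as np

def channel_utilization(iq, threshold):
    """CU over one window: percentage of I/Q samples whose power
    |I + jQ|^2 exceeds an energy-detection threshold."""
    power = np.abs(iq) ** 2
    return 100.0 * np.mean(power > threshold)

def cu_histogram(cu_values, bins=10):
    """Histogram H_i of CU values over [0, 100]: per-bin counts pi_{i,j}
    and the bin edges [I_j_lower, I_j_upper]."""
    counts, edges = np.histogram(cu_values, bins=bins, range=(0.0, 100.0))
    return counts, edges

# toy stream: complex Gaussian noise standing in for captured I/Q data
rng = np.random.default_rng(0)
iq = rng.normal(size=1000) + 1j * rng.normal(size=1000)
cu = channel_utilization(iq, threshold=2.0)       # one CU sample (%)
counts, edges = cu_histogram([cu] * 5, bins=5)    # histogram of CU samples
```

In the actual design these two steps run as pipelined IP cores on the FPGA rather than in software.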

Fig. 3.

Fig. 3. Overview of the implemented dataset drift detection algorithm for an example prediction interval.

We use the following two tools in the dataset drift detection algorithm.

(1)

State classification

Input data distributions that come in the form of histograms are classified using their cumulative distribution functions (CDFs). Let \(\bar{X}\) be a random variable; then the CDF of \(\bar{X}\), \(F_{\bar{X}}\), is defined by \(F_{\bar{X}}(x) = P(\bar{X} \le x),\) which is the probability that \(\bar{X}\) is less than or equal to \(x\). Let the set of reference states be \(\mathcal {S} = \lbrace S_1, S_2, \ldots , S_M\rbrace\) and the set of boundaries for the reference states be \(\mathcal {L} = \lbrace l_1, l_2, \ldots , l_{M-1}\rbrace\), where \(M\) is the number of reference states. Then the reference states are defined as (3) \(\begin{equation} S_i = \lbrace S_j | P(\bar{X} \le l_j) \ge p\rbrace , \end{equation}\) where \(p\) is a pre-decided percentile and \(j\) is the minimum index in \([1,M-1]\) that satisfies the predicate \(P(\bar{X} \le l_j) \ge p\). As an example, for the CU problem, we considered the pre-decided percentile \(p=0.8\), which means that at least \(80\%\) of the values should lie inside the corresponding boundaries of the declared state (refer to Section 6 for details).

(2)

Earth Mover’s Distance

Unlike in other similar approaches that rely on classification error such as References [4, 9] to detect dataset drifts, our algorithm is based on a distance-based technique that calculates the distance between current data distribution and a reference distribution calculated from the predictions from a DL model.

Let \(P_1\) and \(P_2\) be two probability density functions (PDFs), and their CDFs be \(F_1\) and \(F_2\), respectively. Then, the earth mover’s distance (EMD), \(d_e\) between the two distributions is given by (4) \(\begin{equation} d_e(P_1, P_2) = \int _{-\infty }^{\infty } |F_1(x) - F_2(x)| \ dx. \end{equation}\)

In contrast to most other distance metrics, the EMD is bounded. For any two PDFs defined over the same bounded support, with the distance normalized by the support length, \(d_e \in [0, 1]\): \(d_e = 0\) when the two PDFs are identical (complete overlap), and \(d_e = 1\) when they are maximally divergent (all probability mass concentrated at opposite ends of the support).
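For histograms over common bins, Equation (4) reduces to a sum of absolute CDF differences weighted by the bin width. A minimal sketch follows; the bin layout and the choice of normalizing by the largest transport distance are assumptions for illustration, not necessarily the exact normalization used on the FPGA.

```python
import numpy as np

def emd_histogram(h1, h2, bin_width=1.0):
    """Discrete EMD between two histograms defined over the same bins,
    normalized by the largest possible transport distance (first bin
    center to last bin center) so that the result lies in [0, 1]."""
    p1 = np.asarray(h1, dtype=float) / np.sum(h1)   # counts -> PDF
    p2 = np.asarray(h2, dtype=float) / np.sum(h2)
    cdf_diff = np.cumsum(p1) - np.cumsum(p2)        # F1(x) - F2(x), per bin
    max_dist = (len(p1) - 1) * bin_width            # normalization constant
    return float(np.sum(np.abs(cdf_diff))) * bin_width / max_dist

d_same = emd_histogram([1, 2, 3], [1, 2, 3])   # identical -> 0.0
d_far  = emd_histogram([1, 0, 0], [0, 0, 1])   # opposite extremes -> 1.0
```

Because only a cumulative sum, absolute values, and one accumulation are needed, this computation maps naturally onto a pipelined hardware datapath.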

4.1 Reference Histograms and Thresholds Calculation

To calculate the reference histograms for the respective \(M\) reference states, we make use of the state classification and EMD tools introduced earlier. The functions ClassifyHistogram and EarthMoversDistance in lines 16 and 26 of Algorithm 1 show the state classification and EMD tools adapted for histograms, respectively. In the training phase of the proposed drift-awareness method, all the histograms in the training dataset are classified using the predefined state boundaries. Then, the reference histograms are obtained by taking the mean of the histograms corresponding to each state. To calculate the dataset drift detection thresholds, quantiles of the EMDs between the classified histograms and the corresponding reference histograms are obtained. The quantile \(q\) used for the dataset drift detection threshold \(t_{s_p}\) can be selected heuristically based on the resulting detection performance. Algorithm 1 shows the complete steps of this process.
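The training phase can be summarized in software roughly as follows. The helper names `classify`, `emd`, and `train`, the state boundaries, and the 5-bin layout are illustrative assumptions for this sketch; Algorithm 1's ClassifyHistogram and EarthMoversDistance are the article's actual functions, and `emd` here is normalized by the bin count.

```python
import numpy as np

def classify(hist, boundaries, p=0.8):
    """State classification per Eq. (3): return the first state whose
    boundary l_j covers at least fraction p of the histogram's mass."""
    cdf = np.cumsum(hist / np.sum(hist))
    edges = np.linspace(0.0, 100.0, len(hist) + 1)[1:]   # right bin edges
    for state, l in enumerate(boundaries):
        if cdf[np.searchsorted(edges, l)] >= p:
            return state
    return len(boundaries)                               # top state S_M

def emd(h1, h2):
    """EMD between two histograms on the same bins (mean |CDF diff|)."""
    p1, p2 = h1 / np.sum(h1), h2 / np.sum(h2)
    return float(np.mean(np.abs(np.cumsum(p1) - np.cumsum(p2))))

def train(histograms, boundaries, q=0.95):
    """Per-state reference histograms (means) and drift thresholds
    (q-quantile of member-to-reference EMDs), as in Algorithm 1."""
    states = [classify(h, boundaries) for h in histograms]
    refs, thresholds = {}, {}
    for s in range(len(boundaries) + 1):
        members = [h for h, st in zip(histograms, states) if st == s]
        if members:
            refs[s] = np.mean(members, axis=0)
            thresholds[s] = float(
                np.quantile([emd(h, refs[s]) for h in members], q))
    return refs, thresholds
```

For example, with 5 bins over \([0, 100]\) and boundaries \(\{20, 40, 60, 80\}\), `train` yields one reference histogram and one threshold per populated state.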

4.2 DL Model

In the training phase, we also train a DL model that predicts the next few steps of the CU time series based on historical steps. Once trained, the DL model captures the key temporal characteristics of the time series in such a way that it can effectively predict the normal behavior of the time series. The proposed drift-awareness method relies on the DL predictions to differentiate between normal and anomalous behavior of the time series. To construct the DL model, a type of RNN called the LSTM RNN is used. Details of our DL model can be found in Reference [11].

4.3 Drift-awareness and Drift Explanation

Dataset drift detection takes place in the test phase of the proposed drift-awareness method. Algorithm 2 shows the complete steps of the dataset drift detection process. We consider a time window equal to the prediction interval \(\tau\). Let the real-time CU histogram data series be \(\mathcal {H}_r =\ \lt H_{t+1}, H_{t+2}, \ldots , H_{t+N_F} \gt\), where \(H_i\) denotes the histogram at time instance \(i\), and let the past CU time series data vector be \(\mathbf {x_t} =\ \lt x_{t-1}, x_{t-2}, \ldots , x_{t-1-N_H}\gt\). The DL model takes \(\mathbf {x_t}\) as input and makes multiple time-point predictions for the prediction time duration \(\tau\), from which the corresponding prediction histogram \(\hat{H}_\tau\) is generated. To obtain the reference state \(\hat{S}_\tau\) corresponding to \(\hat{H}_\tau\), the EMD between \(\hat{H}_\tau\) and each of the reference histograms \(R_i\) is calculated, and the classified state of the reference histogram \(\hat{R}_\tau\) that gives the lowest EMD is selected as \(\hat{S}_\tau\). Then, we calculate the EMD between each input real-time histogram \(H_i\) and \(\hat{R}_\tau\) and check whether the resulting EMD \(d_{er_i}\) exceeds the corresponding EMD threshold. If \(d_{er_i}\) exceeds the threshold, then we record a dataset drift. We perform this check for each \(H_i\) that occurs in the prediction time duration \(\tau\). When the recorded number of dataset drifts within the prediction time duration exceeds the allowed number of dataset drifts \(N_{th}\), we declare that a dataset drift has occurred. Further, the algorithm records the state of each input histogram \(H_i\) inside the prediction time duration \(\tau\). When a dataset drift is detected, the algorithm provides an explanation of the change by returning the most frequent state it has recorded. The information on drift detection and explanation can be used either for automating DL model adaptation decisions or for mitigating the adverse effects of dataset drift on later predictions.
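The test phase can be sketched as follows, as a simplified software analogue of the FPGA implementation. The `emd` helper and the `refs`, `thresholds`, and `classify` arguments are hypothetical stand-ins for the quantities produced in the training phase.

```python
from collections import Counter
import numpy as np

def emd(h1, h2):
    """EMD between two histograms on the same bins (mean |CDF diff|)."""
    p1 = np.asarray(h1, float) / np.sum(h1)
    p2 = np.asarray(h2, float) / np.sum(h2)
    return float(np.mean(np.abs(np.cumsum(p1) - np.cumsum(p2))))

def detect_drift(pred_hist, window_hists, refs, thresholds, n_th, classify):
    """Compare each real-time histogram in the prediction window against
    the reference histogram matched to the DL prediction; declare a drift
    (with the most frequent observed state as explanation) when more than
    n_th histograms exceed the per-state EMD threshold."""
    # reference state S_hat of the prediction: closest reference by EMD
    s_hat = min(refs, key=lambda s: emd(pred_hist, refs[s]))
    drifts, observed = 0, []
    for h in window_hists:
        observed.append(classify(h))                 # record state of H_i
        if emd(h, refs[s_hat]) > thresholds[s_hat]:  # d_er_i > t_sp ?
            drifts += 1
    if drifts > n_th:
        explanation = Counter(observed).most_common(1)[0][0]
        return True, explanation
    return False, s_hat
```

The returned explanation (the post-drift state) is what an application such as a proactive resource allocator would consume.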

4.4 Drift-awareness-assisted Predictions

For a stationary time series, a trained DL model is expected to predict only the normal behavior of the time series well. When a disturbance in the time series breaks stationarity, the DL model will perform poorly on the prediction task during and after the disturbance.

By using the proposed drift-awareness method, we can promptly detect any disturbance that occurs in the time series and replace the affected samples with predictions from the trained DL model for subsequent predictions. The drift-awareness-assisted prediction process can therefore be given as (5) \(\begin{equation} \mathbf {\hat{x}_t} = \mathbf {f}_\theta (\bar{x}_{t-1}, \bar{x}_{t-2}, \ldots , \bar{x}_{t-1-N_H}), \end{equation}\) \(\begin{equation*} \bar{x}_k = {\left\lbrace \begin{array}{ll} \hat{x}_k, & \text{if } x_k \text{ belongs to a disturbance}, \\ x_k, & \text{otherwise}, \end{array}\right.} \end{equation*}\)

for all \(k \in \lbrace t-1, t-2, \ldots , t-1-N_H\rbrace\). We show that this can significantly improve the prediction performance of the DL model under disturbances due to dataset drifts.
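Equation (5) amounts to sanitizing the history window before each forecast: samples flagged as belonging to a disturbance are swapped for the model's own earlier predictions. A minimal sketch (the disturbance flags would come from the drift detector; the values here are a toy example):

```python
def sanitized_history(x, x_hat, is_disturbed):
    """Build the history vector of Eq. (5): use the DL prediction
    x_hat[k] wherever the observed sample x[k] belongs to a disturbance,
    and the observed sample x[k] otherwise."""
    return [xh if d else xr for xr, xh, d in zip(x, x_hat, is_disturbed)]

# toy example: samples at indices 2 and 3 were flagged as a disturbance
x     = [10, 12, 95, 97, 11]    # observed CU (% of channel time)
x_hat = [10, 12, 13, 12, 11]    # earlier DL predictions for the same points
flags = [False, False, True, True, False]
clean = sanitized_history(x, x_hat, flags)   # -> [10, 12, 13, 12, 11]
```

Feeding `clean` rather than `x` to the model keeps a transient drift from contaminating the next \(N_F\) forecasts.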

In general, it is not desirable for a DL model to take sudden, patternless drifts into account in its predictions, as they can be treated as noise by the system. If a drift occurs gradually over a longer time period or is periodic, then it has to be taken into account in the DL predictions. In such situations, the DL model should usually be retrained with new training data so that the input data distribution change is reflected in the predictions. The study of such longer-term or periodic drifts is a future direction of this work. When the proposed drift-awareness method is used in a certain application (such as proactive wireless frequency resource allocation) and the method detects a drift in the input data, the application should take that into account. For example, in a proactive wireless resource allocator, the resource allocation algorithm should perform resource reallocation when a certain type of drift is detected in the wireless data.

It is important to note that the problem of drift detection poses two significant challenges for DL-driven wireless resource allocation. The first challenge is to detect in real-time when a drift occurs and to adjust the resource allocation decisions for wireless access points accordingly. Our work provides an FPGA-driven real-time solution to this first challenge. The second challenge concerns developing a more robust DL model that accounts for a variety of drift cases from the beginning. Such a model could then utilize the proposed real-time drift detection to switch modes or default to a more appropriate mode during a transient/drift. Addressing this second challenge requires careful and detailed research and is the topic of future work.


5 FPGA DESIGN AND IMPLEMENTATION

Wireless systems that work with megahertz (\(MHz\)) bandwidths can generate several million I/Q samples per second. Processing such data rates in real-time to obtain the proposed drift-awareness in a DL solution is not possible with a conventional personal computer (PC)-based software defined radio (SDR) solution. This is because the proposed drift-awareness solution utilizes multiple parallel algorithms for very high-rate data processing. When the same algorithms are executed in real-time on a PC in software, they run sequentially, which creates gaps in the drift-awareness calculations. These gaps in turn lead to significant degradation in drift-awareness performance. FPGAs are very well suited for this kind of real-time streaming data processing due to their underlying architecture. Thus, we implement our dataset drift detection system on FPGA-based hardware. Implementing the entire design in the FPGA has enabled us to achieve real-time dataset drift detection without any data drops (gaps in the data) or other practical bottlenecks, which is not achievable with a software-based solution on a PC.

Figure 4 presents an overview of the implemented dataset drift detection solution. The solution comprises an RF receiver, an FPGA development board, and a PC. Figure 4 also shows how different functions have been partitioned between the hardware and the software. Various real-time data processing modules that are used for CU calculation, EMD calculation, and dataset drift detection are implemented as IP cores that reside on the FPGA, while the DL prediction model and the performance evaluation module run in the PC. Several key hardware design concepts have been adopted to realize the design of which the details are presented in the next few subsections.

Fig. 4.

Fig. 4. Overview of the implemented explainable drift-awareness system.

5.1 Hardware Overview of the Proposed Solution

The explainable drift-awareness solution presented in this work has been implemented on a low-cost Zedboard [3] development board that comprises a Zynq-7000 series all programmable system on chip (SoC) device [36]. The Zynq-7000 SoC is a powerful FPGA device that integrates a dual-core ARM Cortex-A9-based processing system (PS) and programmable logic (PL) in the same device [36]. The Zedboard also features an FPGA Mezzanine Card (FMC) interface that we used to connect an FMCOMMS2 [6] SDR board based on Analog Devices’ AD9361 agile RF transceiver. We have implemented our data processing modules by integrating new IP cores into the FMCOMMS2 HDL reference design [2]. We used Xilinx Vivado high level synthesis (HLS), Xilinx system generator (XSG), and Xilinx Vivado to develop the data processing modules. The Vivado HLS tool was used to develop the data processing modules primarily due to its ability to synthesize code specifications written in C/C++ for Xilinx devices without the need to manually create register transfer level (RTL) designs. It enables different RTL architectures to be implemented from the same C/C++ specification [22] using different tool directives [35]. Furthermore, design implementation using HLS can significantly reduce the design cycle when compared to RTL [32]. We used the XSG to simulate and package the IP cores, which can be added to the Vivado IP catalogue for later use in an HDL design. Finally, Xilinx Vivado was used to integrate the IP cores into the FMCOMMS2 HDL design, synthesize, and generate the bit stream for FPGA programming. The firmware for the ARM processor in the PS, which is responsible for the configuration of IP cores, dataset drift detection, and the movement of data between the IP cores and the PC, was developed using Xilinx’s Software Development Kit. The entire design and its source code are provided in Reference [10].

Figure 5 shows a high-level overview of the data flow among the various modules of the proposed explainable drift-awareness solution using the Zedboard and FMCOMMS2 SDR. The PL section of the Zynq SoC device contains the IP cores of the original FMCOMMS2 HDL design [2] and the implemented data processing IP cores. In the figure, it can be seen that the FMCOMMS2 SDR interfaces with the advanced extensible interface (AXI) AD9361 IP core via the FMC interface. The AXI AD9361 IP core takes care of the low-level signaling with the SDR and is configurable through its AXI interface. It is also responsible for performing direct current (DC) filtering and I/Q correction of the received I/Q samples from the SDR and forwarding them to the analog to digital converter first in first out (ADC FIFO) IP core, which is simply a FIFO buffer. The I/Q samples from the FIFO are read by the implemented IP cores of the drift-awareness solution, which perform the necessary data processing tasks. The input data width and the input data rate to the drift-awareness solution from the ADC FIFO IP core are 16 bits and \(30.72 MS/s\) (mega samples per second), respectively. The ADC PACK IP core reads the output data from our implemented IP cores and forwards them to the ADC direct memory access (ADC DMA) IP core, which writes them to the double data rate (DDR) memory through the high-performance AXI interface.

Fig. 5.

Fig. 5. Data processing chain showing the data flow among IP cores.

Partitioning our design among the PS and PL sections of the Zynq-7000 SoC allows us to leverage the unique capabilities of each section. The PL is in charge of accelerating time-critical signal-processing tasks, while less critical tasks are performed in the PS. The PS also takes care of task scheduling and the movement of data among the IP cores and the PC. Figure 5 shows the implemented IP cores in the PL and some of the tasks performed by the PS. In the next subsections, an in-depth explanation of the implemented hardware modules is presented.

5.2 Pipelined Architecture for Data Processing Modules

Gaps or drops in the data can adversely affect the performance of the drift detection method. To process data continuously without any gaps or drops, all the data processing modules should accept and process data at the same rate as the sampling frequency. To satisfy this requirement, a pipelined architecture for the data processing IP cores using data valid signals has been realized. The pipelined architecture is shown in Figure 6. A key attribute of a pipelined device is the initiation interval (II), defined as the number of clock cycles before the module can accept new input data. In a pipelined design, the II of the IP cores in the pipeline should be equal to 1 to avoid any data loss. By exploiting various design techniques, we have developed the IP cores to satisfy this requirement. Moreover, by utilizing the data valid signals from the original FMCOMMS2 HDL reference design [2], we have been able to control the data flow between our implemented IP cores. The IP cores read the incoming data when the incoming data valid signals are high. They output the processed data and assert the outgoing data valid signals to notify the next IP core in the pipeline of the validity of the outgoing data.

Fig. 6.

Fig. 6. Pipelining of IP cores using valid signals.
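The valid-signal handshake between two pipelined cores can be modeled behaviorally as below; one call corresponds to one clock cycle, and `process` is a hypothetical stand-in for the actual per-core computation:

```cpp
#include <cassert>
#include <cstdint>

// Behavioral sketch of one pipelined stage of Figure 6 (not the actual HLS
// source): the stage is "called" once per clock cycle, consumes its input only
// when the incoming valid flag is high, and asserts its own outgoing valid
// flag so the next core in the chain knows the output is meaningful. With an
// initiation interval (II) of 1, a new sample can be accepted every cycle.
struct StageOutput {
    int32_t data;
    bool valid;
};

// Hypothetical processing step standing in for a core such as calcCU.
static int32_t process(int32_t sample) { return sample * 2; }

StageOutput pipelinedStage(int32_t data_in, bool valid_in) {
    StageOutput out{0, false};
    if (valid_in) {        // read the input only when it is flagged valid
        out.data = process(data_in);
        out.valid = true;  // notify the downstream IP core
    }
    return out;
}
```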

5.3 FSM-based Data Transfer

The design is composed of the calcCU, hist, cdf, absdiff, and pdist IP cores and involves multiple input data sources for these IP cores to perform computations. It also needs to handle multiple data outputs, such as EMD values and histogram array values. Although the AMBA AXI protocol is widely adopted as the medium for exchanging data in FPGA SoCs, the protocol does not specify how conflicting transactions involving multiple data sources/sinks are arbitrated. Integrating conflict-free transactions for multiple IP cores that work in parallel into the same FPGA-bound design is complex and challenging at the system level. To this end, the design incorporates several key techniques to manage the data flow among the IP cores. For example, conflict-free (i.e., without generating any data collisions) asynchronous access to the ADC/DAC DMA data flows to/from multiple IP cores is achieved using a simple FSM-based arbitration technique. Moreover, the adoption of pipelining has allowed us to achieve the design objective of real-time response.

There are two inputs and one output to the design, as shown in Figure 7. One input comes from the ADC FIFO IP core, which is a continuous stream of I/Q data. The other input is coming from the DAC DMA, which is the reference cumulative histogram data from the PS. The single output is multiplexed to output the histogram data from the hist IP core and EMD data from the pdist IP core. There are two main functions of the FSM: (1) initiation of conflict-free computations in absdiff IP core when data are available from the DAC DMA and cdf IP core; and (2) facilitation of conflict-free data transfer at the end of computation from hist or pdist IP core to the ADC DMA. The FSM that is used to satisfy these requirements is shown in Figure 8.

Fig. 7.

Fig. 7. Implemented IP cores in the proposed hardware design.

Fig. 8.

Fig. 8. FSM model of the conflict-free data transactions design in the hardware.

The type of FSM we have used in our design is a Moore FSM in which the outputs of the FSM depend only on its current state. The FSM has four states, \(S0, S1, S2\), and \(S3,\) and uses three input signals to control the state transitions: (1) data valid signal from the cdf IP core (\(cdf\_vld\)); (2) data valid signal from the pdist IP core (\(pdist\_vld\)); and (3) FIFO read valid signal from the DAC DMA (\(fifo\_rd\_vld\)). These control signals are active high signals that get asserted at the validity of the data. The FSM drives two active high output signals: (1) DAC DMA FIFO read enable signal (\(fifo\_rd\_en\)); and (2) multiplexer line selection signal (\(sel\)). The \(fifo\_rd\_en\) signal is used to request data from the FIFO, and \(sel\) is used to select the output from the hist IP core if it is low or pdist IP core if it is high. In a device power-up or a system reset, the FSM starts in state \(S0\). As shown in Figure 8, state transitions occur depending on the input control signals and output signals corresponding to the four states are driven accordingly.
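The structure of such a Moore machine can be sketched as below. The states and signals follow the description above, but the transition arcs are one plausible assignment chosen purely for illustration; the actual arcs are those of Figure 8:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative Moore FSM with the four states and I/O signals described in
// Section 5.3. The transition arcs below are a hypothetical assignment for
// illustration only; the implemented design follows Figure 8.
enum class State { S0, S1, S2, S3 };

struct FsmOutputs {
    bool fifo_rd_en;  // request reference data from the DAC DMA FIFO
    bool sel;         // 0: route hist output, 1: route pdist (EMD) output
};

// Moore machine: outputs depend only on the current state.
FsmOutputs fsmOutputs(State s) {
    switch (s) {
        case State::S1: return {true, false};   // fetch reference histogram
        case State::S3: return {false, true};   // stream out the EMD result
        default:        return {false, false};  // idle / computing
    }
}

State fsmNextState(State s, bool cdf_vld, bool fifo_rd_vld, bool pdist_vld) {
    switch (s) {
        case State::S0: return cdf_vld     ? State::S1 : State::S0;
        case State::S1: return fifo_rd_vld ? State::S2 : State::S1;
        case State::S2: return pdist_vld   ? State::S3 : State::S2;
        case State::S3: return State::S0;  // back to idle after EMD output
    }
    return State::S0;
}
```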

Next, we present the hardware implementation details of each of the implemented IP cores. We also provide parts of HLS codes that implement key functionalities of the IP cores in the manuscript. Note that the provided parts of HLS codes only discuss the key functionalities and do not contain the data flow control and interfacing techniques that have been abstracted away for the easy comprehension of the reader. Complete HLS codes for the implemented IP cores are provided in Reference [10] for reference.

5.3.1 calcCU IP Core.

This IP core is responsible for estimating the CU of the desired wireless channel by processing the incoming I/Q samples in real-time. A more detailed explanation of the design and implementation of this IP core has been presented in one of our earlier works [16]. The basic functionality of this core can be explained as follows: It estimates the noise floor of the signal in real-time and calculates a detection threshold to distinguish signals from noise. If the signal power exceeds the detection threshold, then the signal is declared to be present; otherwise, it is declared to be absent. The fraction of time the signal is present within a certain time duration is taken as the CU.

5.3.2 hist IP Core.

This IP core is responsible for generating CU histograms in real-time. The number of bins denoted by \(NUM\_BIN\) and the histogram size to be used denoted by \(NUM\_SAMPLES\) are declared at the synthesis time. In our design for the CU problem, we have selected \(NUM\_SAMPLES=2^{13}\), which is found heuristically based on the performance of the drift detector. We have also used \(NUM\_BIN \in \lbrace 20, 50, 100\rbrace\). Listing 1 shows part of the HLS code used to generate the histogram in the IP core. Histogram updates happen whenever the data valid signal goes high.

Listing 1.

Listing 1. Part of the HLS code used to implement the hist IP core.

A straightforward synthesis of this code in Xilinx Vivado HLS would result in an implementation that takes NUM_BIN clock cycles to compare and fill a single bin of the histogram. To optimize the resulting micro-architecture, Vivado HLS provides compiler directives [35]. Fully unrolling HIST_LOOP to create multiple independent operations for each loop iteration, by using the compiler directive “UNROLL”, results in a micro-architecture with a latency of 1. Figure 9 shows the schematic for HIST_LOOP after the optimized code synthesis.

Fig. 9.

Fig. 9. Schematic of the histogram generator block.
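Behaviorally, the binning step of HIST_LOOP can be sketched as follows. The actual HLS source is Listing 1; the uniform bin boundaries over an assumed CU range of \([0, 100]\) are illustrative:

```cpp
#include <cassert>
#include <cstdint>

constexpr int NUM_BIN = 20;       // bin count, fixed at synthesis time
constexpr uint32_t MAX_CU = 100;  // assumed CU range for this sketch

// Sketch of the HIST_LOOP binning step: every incoming CU sample is compared
// against all bin ranges and the matching counter is incremented. In hardware,
// fully unrolling this loop with the UNROLL directive replaces the NUM_BIN
// sequential comparisons with parallel comparators, giving a latency of 1.
void updateHistogram(uint16_t sample, uint32_t hist[NUM_BIN]) {
    // HIST_LOOP: with "#pragma HLS UNROLL" all iterations run in parallel.
    for (int b = 0; b < NUM_BIN; ++b) {
        uint32_t lo = static_cast<uint32_t>(b) * MAX_CU / NUM_BIN;
        uint32_t hi = static_cast<uint32_t>(b + 1) * MAX_CU / NUM_BIN;
        // The last bin is closed on the right so sample == MAX_CU is counted.
        if (sample >= lo && (sample < hi || b == NUM_BIN - 1)) {
            hist[b]++;
        }
    }
}
```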

Once the histogram generation is completed, the IP core outputs each bin of the histogram with an asserted data valid signal. When the data valid signal is high, the ADC PACK IP core reads the histogram data and forwards them to the ADC DMA, which transfers the data to the DDR memory. The PS reads the histogram data and sends them to the PC for use in prediction and for data collection.

5.3.3 cdf IP Core.

This IP core is responsible for generating the cumulative distribution of the CU samples. It integrates a histogram generator block that is implemented by HLS code similar to Listing 1. First, it generates the histogram from the CU samples while the data valid signal is high. Once the histogram generation is complete and the data valid signal goes low, the cumulative sum is calculated and output one value per clock cycle, accompanied by a data valid signal from this IP core.

Listing 2 shows part of the HLS code used to implement this IP core.

Listing 2.

Listing 2. Part of the HLS code used to implement the cdf IP core.

To generate cumulative probabilities from the histogram, we need to divide the value in each bin by the number of samples in the histogram (NUM_SAMPLES in the HLS code in Listing 2) and obtain the cumulative sum. The division operation is costly in hardware in terms of latency and area. To implement the division with a simple bit shift operation, we choose NUM_SAMPLES to be a power of two. The division can then be implemented by shifting the binary point of the register to the left by \(log_2(NUM\_SAMPLES)\) positions, which has zero latency and costs zero resources in hardware. The division also produces a fractional result. We represent the resulting fractional numbers using fixed point representation and preserve the maximum possible precision. The dtype_o used in Listing 2 is the fixed point datatype we used to hold the resulting cumulative probability value after division. Figure 10 shows the schematic of the implemented cdf IP core.

Fig. 10.

Fig. 10. Schematic generated for the HLS code in Listing 2.
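The normalization step can be sketched behaviorally as below. The actual HLS source is Listing 2; here plain `uint32_t` words stand in for the hardware's u32.31 fixed-point type, so dividing the cumulative count by \(2^{13}\) becomes a constant left shift of \(31 - 13 = 18\) bit positions of the stored integer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int LOG2_NUM_SAMPLES = 13;  // NUM_SAMPLES = 2^13 samples/histogram
constexpr int FRAC_BITS = 31;         // u32.31 output format of the cdf core

// Sketch of the cdf core's cumulative sum and shift-based normalization:
// count / 2^13 stored in u32.31 fixed point equals count << (31 - 13), so the
// costly divider is replaced by rewiring (a constant shift) in hardware.
std::vector<uint32_t> cumulativeDistribution(const std::vector<uint32_t>& hist) {
    std::vector<uint32_t> cdf(hist.size());
    uint32_t cum = 0;
    for (std::size_t b = 0; b < hist.size(); ++b) {
        cum += hist[b];                                  // cumulative sum
        cdf[b] = cum << (FRAC_BITS - LOG2_NUM_SAMPLES);  // shift-based divide
    }
    return cdf;
}
```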

5.3.4 absdiff IP Core.

This IP core is responsible for calculating the absolute difference between two cumulative distributions. One input cumulative distribution to the absdiff IP core comes from the cdf IP core, which generates the cumulative distribution for the CU in real-time. The other input cumulative distribution, which is the reference cumulative histogram obtained from predictions, comes from the PC via PS and DAC DMA. Listing 3 shows part of the HLS code used to implement this IP core. It shows how the absolute difference is calculated using simple comparison logic. Figure 11 shows the schematic of the implemented absdiff IP core.

Fig. 11.

Fig. 11. Schematic generated for the HLS code in Listing 3.

Listing 3.

Listing 3. Part of the HLS code used to implement the absdiff IP core.
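Behaviorally, the comparison logic of the absdiff core reduces to the following; plain `uint32_t` words stand in for the hardware's u32.31 fixed-point inputs:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the absdiff core's comparison logic (the actual HLS source is
// Listing 3): both inputs are unsigned cumulative-probability words, so
// |a - b| is obtained by subtracting the smaller from the larger instead of
// using a signed subtraction.
uint32_t absDiff(uint32_t live_cdf, uint32_t ref_cdf) {
    return (live_cdf > ref_cdf) ? (live_cdf - ref_cdf) : (ref_cdf - live_cdf);
}
```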

5.3.5 pdist IP Core.

This IP core calculates the moving sum of the absolute difference values coming from absdiff IP core and gives out the EMD value, which is a measure of distance between two probability distributions. This IP core works in tandem with cdf and absdiff to generate the EMD result.

Listing 4 shows part of the HLS code used to calculate the moving sum in the implemented pdist IP core. It updates the moving sum when the data valid signal is high and, once it has summed over the number of bins, asserts pdist_vld_o to indicate the validity of the calculated result. Figure 12 shows the schematic of its hardware implementation.

Fig. 12.

Fig. 12. Schematic generated for the HLS code in Listing 4.

Listing 4.

Listing 4. Part of the HLS code used to implement the pdist IP core.
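The accumulation performed by the pdist core can be sketched behaviorally as below (the actual HLS source is Listing 4; struct and function names are illustrative). For one-dimensional histograms, the EMD reduces to this sum of per-bin absolute CDF differences:

```cpp
#include <cassert>
#include <cstdint>

constexpr int NUM_BIN = 50;  // bin count, fixed at synthesis time

// Sketch of the pdist core: it accumulates the per-bin absolute CDF
// differences coming from absdiff and, after NUM_BIN valid inputs, flags the
// accumulated sum as one complete EMD value.
struct PdistState {
    uint64_t sum = 0;
    int count = 0;
};

// Called once per valid absdiff output; returns true (pdist_vld_o) when the
// accumulated sum is a complete EMD result, left in `emd_out`.
bool pdistStep(PdistState& st, uint32_t absdiff_in, uint64_t& emd_out) {
    st.sum += absdiff_in;
    if (++st.count == NUM_BIN) {
        emd_out = st.sum;
        st.sum = 0;    // reset for the next histogram pair
        st.count = 0;
        return true;   // assert pdist_vld_o
    }
    return false;
}
```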

Table 1 shows the fixed-point data types used at the input and output data ports of each IP core. Fixed point numbers are denoted by \(sp.n\) and \(up.n\), where the prefixes \(s\) and \(u\) represent signedness and unsignedness, and \(p\) and \(n\) represent the word length and the fractional length of the fixed point number, respectively. Two’s complement representation has been used for signed numbers.

Table 1.
IP Core  | Input data type | Output data type
calcCU   | \(s16.0\)       | \(u16.0\)
hist     | \(u16.0\)       | \(u16.0\)
cdf      | \(u16.0\)       | \(u32.31\)
absdiff  | \(u32.31\)      | \(u32.31\)
pdist    | \(u32.31\)      | \(u32.26\)

Table 1. Data Types and Widths of Input and Output Data Ports of the Implemented IP Cores

5.4 Software Design of the Proposed Solution

The proposed solution consists of various key software modules that leverage the hardware modules implemented in the PL of Zynq-7000 FPGA. The software can be divided based on the modules that run on the ARM processor in PS and PC.

5.4.1 On ARM Processor in PS.

Our implemented IP cores in the PL process streaming I/Q samples in real-time and generate data at a very fast pace. The data from the IP cores should be transferred to the system memory of the PS for further processing and for sending to the PC for prediction purposes. Using the ARM processor in the PS directly for data transfers would incur a significant processor utilization overhead: each data transfer would keep the processor from other computational tasks, greatly limiting system performance. To alleviate this issue, we made use of the interrupts associated with the AXI DMA controller. We have configured the DMA controller to trigger an interrupt at the end of each data transfer. Thus, the processor can schedule the data transfers in the DMA controller and carry on with other tasks until an interrupt event occurs. The DMA controller transfers the data to the system memory and notifies the ARM processor at the end of each transfer. This frees up the ARM processor's computational capacity for other tasks.

In the proposed dataset drift detection and classification system, the ARM processor works in two operating modes: (i) transfer mode and (ii) detection mode. In transfer mode, it schedules histogram data reads from the hist IP core and sends the histogram data to the PC. This is the mode in which the ARM processor operates most of the time. After a certain timeout, it switches to detection mode, where it obtains the reference histogram for the next predictions from the PC, generates the reference cumulative histogram, and sends it to the absdiff IP core for EMD calculation against the real-time cumulative histograms. The processor continuously reads the EMD results and compares them with a threshold to detect any dataset drift, as described in Algorithm 2. The drift detection information is then transferred back to the PC, and the processor changes back to transfer mode.

5.4.2 On PC.

Two modules reside on the PC: (i) the DL model and (ii) the performance evaluation module. To obtain robust time series forecasts for the CU data, the DL model is constructed using an encoder-decoder-based recurrent Bayesian NN. Compared to other NNs, an encoder-decoder-based recurrent NN is more powerful, as it can effectively capture the patterns in the time series data using a latent embedding space. Our work in Reference [11] provides the details of our DL model. Based on the time series forecasts, the corresponding reference state histogram is obtained and sent to the Zynq-7000 FPGA for real-time dataset drift detection. The performance evaluation module is responsible for keeping track of the dataset drift detection results. It calculates metrics such as accuracy, precision, and recall from the collected results to assess the performance of the drift detection algorithm.

5.5 Timing, Resource Utilization, and Power Consumption Estimates

We have designed and implemented the data processing IP cores with \(II=1\), which is required for flawless pipelining. Table 2 shows the reported minimum clock periods for the implemented IP cores with different bin sizes after placement and routing. According to the minimum clock periods, the IP cores can easily work at a clock frequency of \(100 MHz\) without any timing violations. This means that the sampling rate of the design can be increased up to \(100 MHz\), which should give a theoretical bandwidth of \(50 MHz\). However, due to the hardware limitations of the AD9361 transceiver used in the FMCOMMS2, the maximum sampling frequency is limited to \(61.44 MHz\). The maximum data rate of the design with combined \(I\) and \(Q\) words is \(30.72 MS/s\). With the input data width to the drift detector being \(16 bits\), the maximum bit rate of the design is \(491.52 Mbits/s\).
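As a sanity check, the quoted throughput figures follow directly from the sample rate and word width (constant names below are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Throughput arithmetic for the figures quoted in Section 5.5: a 30.72 MS/s
// sample stream of 16-bit words gives 491.52 Mbit/s per channel, and doubling
// the channels (second I/Q pair of the FMCOMMS2) gives 983.04 Mbit/s.
constexpr uint64_t kSampleRateKsps = 30720;  // 30.72 MS/s in kS/s
constexpr uint64_t kBitsPerSample = 16;
constexpr uint64_t kBitRateKbps = kSampleRateKsps * kBitsPerSample;  // 491.52 Mbit/s
constexpr uint64_t kTwoChannelKbps = 2 * kBitRateKbps;               // 983.04 Mbit/s
```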

Table 2.
IP Core      | Num Bins | Minimum clock period (ns) | BRAM 18K  | DSP48E      | FF              | LUT
calcCU       | -        | 7.392                     | 0         | 2           | 212             | 165
hist         | 20       | 6.182                     | 0         | 0           | 384             | 284
hist         | 50       | 7.131                     | 0         | 0           | 864             | 858
hist         | 100      | 7.188                     | 0         | 0           | 1,664           | 1,679
cdf          | 20       | 6.253                     | 0         | 0           | 416             | 330
cdf          | 50       | 7.705                     | 0         | 0           | 916             | 862
cdf          | 100      | 7.814                     | 0         | 0           | 1,754           | 1,692
absdiff      | 20       | 4.038                     | 0         | 0           | 291             | 340
absdiff      | 50       | 5.146                     | 0         | 0           | 713             | 572
absdiff      | 100      | 5.734                     | 0         | 0           | 1,423           | 1,277
pdist        | 20       | 3.863                     | 0         | 0           | 40              | 99
pdist        | 50       | 3.863                     | 0         | 0           | 40              | 99
pdist        | 100      | 3.863                     | 0         | 0           | 40              | 99
Whole system | 20       | -                         | 6 (4.29%) | 39 (17.73%) | 24,565 (23.09%) | 15,731 (29.57%)
Whole system | 50       | -                         | 6 (4.29%) | 39 (17.73%) | 25,342 (23.82%) | 16,213 (30.48%)
Whole system | 100      | -                         | 6 (4.29%) | 39 (17.73%) | 26,846 (25.23%) | 17,561 (33.01%)

Table 2. Resource Utilization and Minimum Clock Estimates of the Implemented IP Cores after Placement and Routing

The whole FPGA hardware design consists of the original IP cores from the FMCOMMS2 HDL design and our implemented IP cores. Table 2 shows the resource utilization estimates of the implemented IP cores and the whole system for 20, 50, and 100 bins after placement and routing. According to Table 2, the design with 100 bins has the highest resource utilization. Even for the 100-bin design, the whole system utilizes less than \(5\%\) of the block RAMs (BRAMs), less than \(20\%\) of the digital signal processing blocks (DSP48Es), less than \(26\%\) of the flip flops (FFs), and less than \(34\%\) of the look up tables (LUTs) in the Zynq-7000 XC7Z020-CLG484-1 FPGA available on the Zedboard. Moreover, another instance of the proposed IP cores can be implemented for the remaining \(I\) and \(Q\) data channels in the FMCOMMS2 SDR device if a multi-channel solution is desired. By using the remaining channels, the bit rate of the system can be further increased up to \(983.04 Mbits/s\).

Figure 13 shows the estimated power consumption of the FPGA SoC with the implemented design. Total power consumption of the system is estimated at \(3.104W\). It is clear from the figure that \(49.7\%\) of the total power is consumed by the PS, while the FPGA PL only consumes about \(44\%\) of the total power.

Fig. 13.

Fig. 13. Estimated power consumption of the FPGA SoC with the implemented design. The power consumption estimates were obtained using Xilinx’s Vivado tool. Percentage power consumption was calculated with reference to the total power usage and only the most significant values are shown.

As the proposed design is original, it is not possible to directly compare the implementation complexity of the whole design with other references. However, the cdf IP core in the design can be compared. Table 3 shows the FPGA slice usage of the cdf IP core in our design and of the implementation in Reference [19] for 16 bins. Note that the table only presents a rough comparison, as the proposed implementation is partially parallel, as explained earlier, whereas the implementation in Reference [19] is fully parallel, incorporating parallel accumulators. The partially parallel implementation of the proposed method is sufficient for our application, since it does not cause any performance degradation. Nevertheless, we can assume that the proposed solution has a comparable implementation complexity.

Table 3.
Design   | FPGA slices
Proposed | 160
[19]     | 688

Table 3. Resource Utilization Comparison of cdf IP Core


6 PERFORMANCE EVALUATION

In this section, we present the experimental results of our proposed explainable drift-awareness system. First, we provide the configuration details of the experiment setup. Then, we present and discuss the results of the proposed method.

6.1 Experiment Setup

To test the proposed method with real wireless data, a Zedboard implementing the proposed modules and equipped with an FMCOMMS2 SDR was used in a wireless environment with multiple APs. The device was configured to capture RF data in a 2.4 GHz unlicensed channel with a sample rate of \(30.72 MSPS\). The Zedboard was connected to a PC, as shown in Figure 4. The PC was used to send the reference histograms to the Zedboard and to read back the dataset drift detection decisions from the Zedboard. The performance of the proposed method was evaluated on the PC using MATLAB software.

For the performance evaluation of the drift detector, we first need to collect enough CU histograms using the hist IP core and obtain the reference histograms from them using Algorithm 1. A subset of the collected histogram data used in this research is provided in Reference [10]. As mentioned earlier, for the histogram classification, five CU states (\(M=5\)) with corresponding state boundaries given by \(\lbrace 5, 15, 35, 65\rbrace\) were used. The reference histograms were obtained by classifying around 50,000 histograms and are shown in Figure 14. We used \(p=0.8\) and the third EMD quantile of each state as the detection threshold \(t_{s_p}\). Also, we declared a dataset drift when more than 15 histograms out of 20 (\(N_{th} = 15\)) reported a distribution change. Note that the values given here were found heuristically and are specific to the CU problem.

Fig. 14.

Fig. 14. Reference histograms obtained for the five CU states. Note that reference histograms with 20 bins are shown for clarity.
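The drift declaration rule with these settings can be sketched as follows; the function and parameter names are illustrative, and the full procedure is given by Algorithm 2 in the paper:

```cpp
#include <cassert>
#include <vector>

// Sketch of the drift declaration rule of Section 6.1: each of the 20 most
// recent histograms reports a distribution change when its EMD to the
// reference state exceeds the per-state threshold t_{s_p}, and a dataset
// drift is declared when more than N_th = 15 of them report a change.
bool detectDrift(const std::vector<double>& emd_window,  // 20 recent EMD values
                 double threshold,                       // per-state t_{s_p}
                 int n_th = 15) {
    int changed = 0;
    for (double emd : emd_window) {
        if (emd > threshold) {  // histogram deviates from the reference state
            ++changed;
        }
    }
    return changed > n_th;      // declare a dataset drift
}
```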

6.2 Dataset Drift Detection Performance

We evaluate the performance of the proposed drift-awareness method by considering the numbers of hits and misses. A hit occurs when there is a drift in the input data and the drift-awareness method detects it, which is counted as a true positive (TP), or when there is no drift and the method does not detect one, which is counted as a true negative (TN). A miss occurs when there is a drift and the method does not detect it, which is counted as a false negative (FN), or when there is no drift and the method detects one, which is counted as a false positive (FP). From these cases, we use the following confusion matrix-related scores to evaluate the performance of the drift-awareness method: (6) \(\begin{equation} Precision = \frac{TP}{TP+FP} , \end{equation}\) (7) \(\begin{equation} Recall = \frac{TP}{TP+FN} , \end{equation}\) (8) \(\begin{equation} Accuracy = \frac{TP+TN}{TP+TN+FP+FN} , \end{equation}\) (9) \(\begin{equation} False\ positive\ rate\ (FPR) = \frac{FP}{FP+TN} , \end{equation}\) (10) \(\begin{equation} False\ negative\ rate\ (FNR) = \frac{FN}{FN+TP} . \end{equation}\)
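Equations (6) through (10) map directly to the following computation; the struct and function names are illustrative and are not from the paper's evaluation code, which runs in MATLAB:

```cpp
#include <cassert>

// Computes the scores of Equations (6)-(10) from raw hit/miss counts.
struct DriftScores {
    double precision, recall, accuracy, fpr, fnr;
};

DriftScores scoreDriftDetector(double tp, double tn, double fp, double fn) {
    DriftScores s;
    s.precision = tp / (tp + fp);                 // Eq. (6)
    s.recall    = tp / (tp + fn);                 // Eq. (7)
    s.accuracy  = (tp + tn) / (tp + tn + fp + fn);// Eq. (8)
    s.fpr       = fp / (fp + tn);                 // Eq. (9)
    s.fnr       = fn / (fn + tp);                 // Eq. (10)
    return s;
}
```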

As the reference state is selected based on predictions from the DL model for the next prediction time duration, the prediction performance of the DL model can be a bottleneck to the performance of the overall system. As we are interested only in the performance of the dataset drift detection method, we remove the DL model performance dependency by choosing the reference state with uniform probability. By using the confusion matrix-related scores, only the drift detection method is evaluated.

To study the dependency of the proposed dataset drift detection method on the number of histogram bins used, we evaluated its performance for several bin counts: 100, 50, and 20 bins. For each bin count, we ran the test experiment for 10,000 iterations, feeding the same number of reference histograms to the drift detector, and collected the drift detection data to generate the confusion matrix-related scores. We also evaluated the performance scores for all five classification states used. Figure 15 shows the precision, recall, accuracy, FPR, and FNR obtained for the proposed drift-awareness method. According to Figure 15, the histogram with 100 bins achieves the highest recall and accuracy scores across all classification states, along with a very good precision score. The histogram with 20 bins has almost equal precision compared to the other bin counts but performs poorly in recall, accuracy, FPR, and FNR. Still, for every bin count, the precision, recall, and accuracy scores are higher than 0.8, which is an excellent result. From the results in Figure 15, we can conclude that the performance of the proposed dataset drift detection method increases with the number of histogram bins.

Fig. 15.

Fig. 15. (a)–(e) show the calculated precision, recall, accuracy, FPR, and FNR for the proposed dataset drift detection method for different histogram sizes.

Although the histogram with 100 bins has the best overall performance, the performance gain relative to the increase in FPGA resource utilization (see Table 2) compared to the other bin counts is marginal. According to the results in Figure 15 and Table 2, the histogram with 50 bins offers a good compromise between performance and FPGA resource utilization. Therefore, we conclude that a histogram with 50 bins is best suited for the dataset drift detection method.

6.3 Significance of Hardware Acceleration in the Proposed Dataset Drift Detection Method

In the proposed dataset drift detection method, we have used hardware acceleration for computations such as CU calculation, histogram generation, and EMD calculation. This has allowed the drift detector to generate accurate drift detection decisions in real-time. A conventional software-based approach to this system would use a buffer-based method in which I/Q data are collected into a buffer in memory and transferred to the PC for further processing when the buffer is full. This kind of method has been commonly used with WARP SDR platforms [31]. The total latency accrued in transferring the data to the PC and in sequentially performing all the parallel computations required by the dataset drift detection algorithm on the PC would be considerably higher. As the ADC data rate is also high (in the proposed design, the combined I/Q rate is 30.72 MS/s), the buffer can fill within a very short period before the next transfer to the PC is initiated. This gives rise to drops or gaps in the collected I/Q data, which results in missing the actual states of the network. For example, the CU of the network can suddenly rise for a short period of time due to a web browser loading a page full of multimedia or the initial buffering of a video. The CU histogram at such a time point will belong to the high or very high representative state, whereas the histograms before and after that point can belong to the low or medium state. By considering the 20 histograms in the vicinity, we have identified these kinds of abnormal points and shown them in Figure 16 with respect to the mean CU calculated for each histogram. There are quite a lot of these abnormal points; therefore, a software-based approach with gaps in the collected data has a high probability of capturing histograms at exactly these points and generating erroneous dataset drift decisions based only on them. A hardware-based approach, in contrast, generates better dataset drift decisions, as it has all the histogram data in the vicinity and considers them in the dataset drift detection as well.

Fig. 16.

Fig. 16. Mean CU data showing abnormal points (outliers). Each data point in the figure also has a CU histogram associated with it.
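The core distance computation accelerated on the FPGA can be illustrated with a short software reference model. For two histograms defined over the same CU bins, the 1-D EMD reduces to the L1 distance between the cumulative distributions. The sketch below is an illustrative software model under that assumption, not the HDL implementation; the bin counts are made up for the example:

```python
import numpy as np

def emd_1d(hist_p, hist_q):
    """EMD between two 1-D histograms sharing the same bins.

    For 1-D distributions on a common support, EMD reduces to the
    L1 distance between the cumulative distribution functions.
    """
    p = np.asarray(hist_p, dtype=float)
    q = np.asarray(hist_q, dtype=float)
    p = p / p.sum()  # normalise counts to probability mass
    q = q / q.sum()
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())

# Two CU histograms over the same bins: a "low" state and a "high" state.
low = [40, 30, 20, 8, 2]
high = [2, 8, 20, 30, 40]

print(emd_1d(low, low))   # identical distributions -> 0.0
print(emd_1d(low, high))  # large distribution shift -> ~1.96
```

A large EMD between the expected and observed histograms signals a potential drift; on the FPGA, the normalization, cumulative sums, and absolute differences are computed in parallel across bins.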

To simulate this effect using real CU data with and without gaps, we evaluate the confusion matrix-related scores of the software- and hardware-based approaches for the time period shown in Figure 16. We assume that the software-based approach makes its dataset drift detection decisions based on histograms of data with gaps, whereas the proposed hardware-based approach uses histograms of gapless data.

Table 4 shows the calculated confusion matrix-related scores for the two approaches. The software-based approach suffers a substantial performance degradation compared to the hardware-based approach. In particular, the FPR of the software-based method increases drastically, meaning that a dataset drift detection algorithm based on the software-based approach will generate a very high rate of false positives, or false alarms. This is highly disadvantageous from an application perspective such as network resource allocation: frequent false drift detections trigger frequent reactive resource reallocations, which are computationally intensive and create significant overhead for the system. From these results, it is evident that the hardware-based approach to dataset drift detection clearly outperforms the software-based approach.

Table 4.
Representative state  Approach  Precision  Recall  Accuracy  FPR     FNR
very low              HW        1.0000     0.9501  0.9744    0.0000  0.0499
very low              SW        0.9492     0.9443  0.9468    0.6118  0.0557
low                   HW        0.9893     0.8957  0.9402    0.2143  0.1043
low                   SW        0.9505     0.8661  0.9064    1.0000  0.1339
medium                HW        0.9947     0.9639  0.9791    0.0659  0.0361
medium                SW        0.9225     0.8922  0.9071    0.9585  0.1078
high                  HW        0.9667     0.8654  0.9132    0.1677  0.1347
high                  SW        0.8477     0.7401  0.7903    0.7466  0.2599
very high             HW        0.9874     0.8948  0.9388    0.0548  0.1052
very high             SW        0.8720     0.7945  0.8314    0.5572  0.2055

Table 4. Confusion Matrix-related Scores Calculated for the Hardware-based Approach and Software-based Approach Denoted by HW and SW, Respectively

These results clearly show the performance degradation when the software-based approach is utilized. Hence, a hardware-based accelerator is crucial for achieving the best performance for the proposed dataset drift detection method.
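The scores in Table 4 follow the standard one-vs-rest confusion matrix definitions, computed per representative state. As a reference, with illustrative counts (not from our experiments):

```python
def cm_scores(tp, fp, tn, fn):
    """Precision, recall, accuracy, FPR, and FNR from one-vs-rest
    confusion matrix counts for a single representative state."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # recall = 1 - FNR
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn)              # false alarms among true negatives
    fnr = fn / (fn + tp)              # misses among true positives
    return precision, recall, accuracy, fpr, fnr

# Hypothetical counts for one state; gaps in the data inflate fp, raising FPR.
p, r, a, fpr, fnr = cm_scores(tp=90, fp=5, tn=95, fn=10)
```

As the definitions make explicit, the FPR depends only on how often non-drift periods are misclassified as drifts, which is exactly where the gapped (software-based) data is most vulnerable.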

6.4 Prediction Performance Improvement Using Drift-awareness

DL models used for prediction show some resistance to the dataset drifts that affect prediction performance. A DL model is said to be more robust if its performance degradation due to a drift is minimal. To quantify model robustness, we propose a robustness measure that relates the DL model's nominal performance to its performance under a drift or disturbance. Let the DL model be denoted by \(\mathcal {M}\), the model input by \(X\), the model output by \(\hat{Y}\), and the observed values by \(Y\). Denoting the model output and the observed values under a certain drift/disturbance \(\delta\) by \(\hat{Y}_{\delta }\) and \(Y_{\delta }\), respectively, the robustness measure \(\mathcal {R}\) for the model is defined by (11) \(\begin{equation} \mathcal {R}(\mathcal {M}|X,Y,\delta) = \frac{\Pi (\hat{Y}, Y)}{\Pi (\hat{Y}_{\delta }, Y_{\delta })}, \end{equation}\) where \(\Pi\) denotes any performance metric, such as the mean absolute percentage error (MAPE). According to this measure, a DL model with \(\mathcal {R} \approx 1\) has more robust performance under drifts/disturbances.
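The robustness measure of Equation (11) can be computed directly from any error metric. A minimal sketch using MAPE as \(\Pi\), with illustrative data:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def robustness(y, y_hat, y_drift, y_hat_drift, metric=mape):
    """R = Pi(Y_hat, Y) / Pi(Y_hat_drift, Y_drift), Equation (11).

    For an error metric, drift inflates the denominator, so R < 1
    under degradation; R ~ 1 indicates a robust model.
    """
    return metric(y, y_hat) / metric(y_drift, y_hat_drift)

# If the relative error is unchanged under the drift, R = 1 (robust).
r = robustness([1, 2, 4], [1.1, 2.2, 4.4], [2, 4, 8], [2.2, 4.4, 8.8])
```

Note that with an error metric such as MAPE, a ratio below one quantifies how much the drift worsens the predictions relative to nominal operation.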

To apply the proposed drift-awareness method to improving model robustness, we use validation data into which a dataset drift has been introduced, as shown in Figure 17(a). The figure shows that the dataset drift adversely affects the predictions that follow it. Figure 17(b) shows the predictions when the DL model is assisted by the proposed drift-awareness method; the improvement in the DL model predictions is immediately apparent. To quantify this improvement, we calculate the robustness measure \(\mathcal {R}\) for the two cases shown in (a) and (b) of Figure 17. To keep the results impartial, we only consider the region of the validation data where the dataset drift is not present, and we use MAPE as the performance metric. Table 5 shows the calculated robustness measure with and without the drift-awareness method incorporated in the DL model predictions. The robustness measure increases from 0.8832 to 0.9998 when the DL model is assisted by the proposed drift-awareness method, a substantial improvement in model prediction performance.

Fig. 17.

Fig. 17. (a) shows the DL model predictions for the validation data introduced with a dataset drift that lasts for about two hours. (b) shows the drift awareness-assisted DL model predictions for the validation data when the data that corresponds to the dataset drift is detected and replaced by the predictions.
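The drift-awareness assistance in Figure 17(b) amounts to detecting the drift-affected samples and substituting the model's own predictions for them, so that the drifted values do not propagate into subsequent inputs. A simplified sketch of this replacement step (the drift flags are assumed to come from the detector; the function and data names are illustrative):

```python
import numpy as np

def drift_aware_filter(samples, predictions, drift_flags):
    """Replace drift-affected samples with the DL model's predictions.

    samples     : observed CU values for the prediction window
    predictions : DL model predictions for the same time points
    drift_flags : boolean flags from the drift detector (True = drift)
    """
    samples = np.asarray(samples, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    mask = np.asarray(drift_flags, dtype=bool)
    cleaned = samples.copy()
    cleaned[mask] = predictions[mask]  # substitute predictions at drifted points
    return cleaned

# The middle sample is flagged as drift-affected and is replaced
# by the corresponding prediction.
cleaned = drift_aware_filter([1.0, 9.0, 2.0], [1.1, 1.9, 2.1], [False, True, False])
print(list(cleaned))
```

This substitution is what keeps the post-drift predictions in Figure 17(b) close to the drift-free behavior.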

Table 5.
Without drift-awareness  With drift-awareness
0.8832                   0.9998

Table 5. Robustness Measure Calculated for the Validation Data with Dataset Drift Injected

7 CONCLUSION

In this article, we have presented an FPGA-assisted real-time RF data analytics and explainable drift-awareness design for a wireless resource utilization prediction system. The presented solution leverages FPGA parallelism to simultaneously perform analytics and dataset drift detection on RF I/Q data streams. The RF I/Q data analytics are used for DL model updates and inference tasks, and the dataset drift detection is used to improve the robustness of the DL model in the presence of dataset drifts. We have addressed dataset drifts at their root cause, the change in the underlying data distribution, by adopting a hardware-friendly distance-based technique on the FPGA to quantify the change between the expected distribution constructed from DL model predictions and the real input data distribution. The preliminary results show that the proposed method has superior performance in detecting dataset drifts. Moreover, by incorporating the proposed method into DL model predictions, we have shown through experiments that the robustness of the DL model can be significantly improved.

For future work, we plan to explore integrating statistical tests with the proposed drift-awareness method and to perform further comparisons with other drift detection methods.

REFERENCES

[1] Algemili U. 2016. Investigation of reconfigurable FPGA design for processing big data streams. In Proceedings of the IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS). IEEE, 226–233.
[2] Analog Devices, Inc. 2021. HDL Reference Designs: Github Repository. Retrieved from https://github.com/analogdevicesinc/hdl.
[3] Avnet. 2021. Complete Development Kit for Designers. Avnet, Inc. Retrieved from https://www.avnet.com/wps/portal/us/products/avnet-boards/avnet-board-families/zedboard/.
[4] Baena-García M., Campo-Ávila J., Fidalgo-Merino R., Bifet A., Gavaldà R., and Morales-Bueno R. 2006. Early drift detection method. In Proceedings of the 4th ECML PKDD International Workshop on Knowledge Discovery from Data Streams. 77–86.
[5] Bhatia A., Robinson J., Carmack J., and Kuzdeba S. 2022. FPGA implementation of radio frequency neural networks. In Proceedings of the IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC'22). 0613–0618.
[6] Analog Devices. 2021. AD-FMCOMMS2-EBZ: AD9361 Software Defined Radio Board. Analog Devices, Inc. Retrieved from https://www.analog.com/en/design-center/evaluation-hardware-and-software/evaluation-boards-kits/eval-ad-fmcomms2.html#eb-overview.
[7] Ditzler G., Roveri M., Alippi C., and Polikar R. 2015. Learning in nonstationary environments: A survey. IEEE Computat. Intell. Mag. 10, 4 (2015), 12–25.
[8] Fraser N. J., Lee J., Moss D. J. M., Faraone J., Tridgell S., Jin C. T., and Leong P. H. W. 2017. FPGA implementations of kernel normalised least mean squares processors. ACM Trans. Reconfig. Technol. Syst. 10, 4 (Dec. 2017).
[9] Gama J., Medas P., Castillo G., and Rodrigues P. 2004. Learning with drift detection. In Advances in Artificial Intelligence – SBIA 2004, Ana L. C. Bazzan and Sofiane Labidi (Eds.). Springer, Berlin, 286–295.
[10] Ganewattha C. 2021. Repository for Drift-awareness HLS Codes. Retrieved from https://github.com/ganewatthe/drift_aware_zed.git.
[11] Ganewattha C., Khan Z., Latva-aho M., and Lehtomäki J. J. 2022. Confidence aware deep learning predictions for cloud managed wireless resource allocation in shared spectrum bands. IEEE Access 10 (2022), 34945–34959.
[12] Guo T., Xu Z., Yao X., Chen H., Aberer K., and Funaya K. 2016. Robust online time series prediction with recurrent neural networks. In Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA'16). IEEE, 816–825.
[13] Hulten G., Spencer L., and Domingos P. 2001. Mining time-changing data streams. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'01). Association for Computing Machinery, New York, NY, 97–106.
[14] Kainth M., Pritsker D., and Neoh H. S. 2021. FPGA Inline Acceleration for Streaming Analytics. Intel Corporation. Retrieved from https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/wp/wp-01278-fpga-inline-acceleration-for-streaming-analytics.pdf.
[15] Kaur J., Khan M. A., Iftikhar M., Imran M., and Haq Q. Emad Ul. 2021. Machine learning techniques for 5G and beyond. IEEE Access 9 (2021), 23472–23488.
[16] Khan Z., Lehtomäki J. J., Hossain E., Latva-Aho M., and Marshall A. 2018. An FPGA-based implementation of a multifunction environment sensing device for shared access with rotating radars. IEEE Trans. Instrum. Measur. 67, 11 (2018), 2561–2578.
[17] Kifer D., Ben-David S., and Gehrke J. 2004. Detecting change in data streams. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB'04). VLDB Endowment, 180–191.
[18] Lu J., Liu A., Dong F., Gu F., Gama J., and Zhang G. 2019. Learning under concept drift: A review. IEEE Trans. Knowl. Data Eng. 31, 12 (2019), 2346–2363.
[19] Marsalek R., Pospisil M., Fryza T., and Simandl M. 2013. Sequential and parallelized FPGA implementation of spectrum sensing detector based on Kolmogorov-Smirnov test. In Algorithms and Architectures for Parallel Processing. Springer International Publishing, 336–345.
[20] Morocho-Cayamcela M. E., Lee H., and Lim W. 2019. Machine learning for 5G/B5G mobile and wireless communications: Potential, limitations, and future directions. IEEE Access 7 (2019), 137184–137206.
[21] Moss D. J. M., Boland D., Pourbeik P., and Leong P. H. W. 2018. Real-time FPGA-based anomaly detection for radio frequency signals. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS'18). 1–5.
[22] Nane R., Sima V., Pilato C., Choi J., Fort B., Canis A., Chen Y. T., Hsiao H., Brown S., Ferrandi F., Anderson J., and Bertels K. 2016. A survey and evaluation of FPGA high-level synthesis tools. IEEE Trans. Comput.-aid. Des. Integ. Circ. Syst. 35, 10 (2016), 1591–1604.
[23] Neshatpour K., Malik M., Ghodrat M. A., Sasan A., and Homayoun H. 2015. Energy-efficient acceleration of big data analytics applications using FPGAs. In Proceedings of the IEEE International Conference on Big Data (Big Data). IEEE, 115–123.
[24] Neuendorffer S. and Vissers K. 2008. Streaming systems in FPGAs. In Embedded Computer Systems: Architectures, Modeling, and Simulation. Springer, Berlin, 147–156.
[25] Nishida K. and Yamauchi K. 2007. Detecting concept drift using statistical testing. In Discovery Science. Springer, Berlin, 264–269.
[26] Raza H., Prasad G., and Li Y. 2013. Dataset shift detection in non-stationary environments using EWMA charts. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics. IEEE, 3151–3156.
[27] Siddhartha, Lee Y. H., Moss D. J. M., Faraone J., Blackmore P., Salmond D., Boland D., and Leong P. H. W. 2018. Long short-term memory for radio frequency spectral prediction and its real-time FPGA implementation. In Proceedings of the IEEE Military Communications Conference (MILCOM'18). 1–9.
[28] Soltani S., Sagduyu Y. E., Hasan R., Davaslioglu K., Deng H., and Erpek T. 2019. Real-time and embedded deep learning on FPGA for RF signal classification. In Proceedings of the IEEE Military Communications Conference (MILCOM'19). 1–6.
[29] Sukhwani B., Min H., Thoennes M., Dube P., Brezzo B., Asaad S., and Dillenberger D. E. 2014. Database analytics: A reconfigurable-computing approach. IEEE Micro 34, 1 (2014), 19–29.
[30] Wang C., Renzo M. D., Stanczak S., Wang S., and Larsson E. G. 2020. Artificial intelligence enabled wireless networking for 5G and beyond: Recent advances and future challenges. IEEE Wirel. Commun. 27, 1 (2020), 16–23.
[31] WARP. 2022. WARP Project. Retrieved from https://www.warpproject.com.
[32] Winterstein F. J., Bayliss S. R., and Constantinides G. A. 2015. Separation logic for high-level synthesis. ACM Trans. Reconfig. Technol. Syst. 9, 2 (Dec. 2015).
[33] Woods L. and Alonso G. 2011. Fast data analytics with FPGAs. In Proceedings of the IEEE 27th International Conference on Data Engineering Workshops. IEEE, 296–299.
[34] Wu S., Hu D., Ibrahim S., Jin H., Xiao J., Chen F., and Liu H. 2019. When FPGA-accelerator meets stream data processing in the edge. In Proceedings of the IEEE 39th International Conference on Distributed Computing Systems (ICDCS'19). IEEE, 1818–1829.
[35] Xilinx. 2021. Vivado Design Suite User Guide: High-level Synthesis. Xilinx, Inc. Retrieved from https://www.xilinx.com/support/documentation/sw_manuals/xilinx2018_3/ug902-vivado-high-level-synthesis.pdf.
[36] Xilinx. 2021. Zynq-7000 SoC Data Sheet: Overview. Xilinx, Inc. Retrieved from https://www.xilinx.com/support/documentation/data_sheets/ds190-Zynq-7000-Overview.pdf.
