DynED: Dynamic Ensemble Diversification in Data Stream Classification

Ensemble methods are commonly used in classification due to their remarkable performance. Achieving high accuracy in a data stream environment is a challenging task considering disruptive changes in the data distribution, also known as concept drift. A greater diversity of ensemble components is known to enhance prediction accuracy in such settings. Despite the diversity of components within an ensemble, not all contribute as expected to its overall performance. This necessitates a method for selecting components that exhibit high performance and diversity. We present a novel ensemble construction and maintenance approach based on MMR (Maximal Marginal Relevance) that dynamically combines the diversity and prediction accuracy of components while structuring an ensemble. The experimental results on four real and 11 synthetic datasets demonstrate that the proposed approach (DynED) provides a higher average mean accuracy than five state-of-the-art baselines.


INTRODUCTION
The ability to extract meaningful insights out of an endless flow of incoming data is crucial nowadays, and data stream classification methods have made this task more feasible. As more organizations move towards technology-driven environments, there has been a sharp increase in the amount of real-time information generated by different sources, such as social media platforms, sensor-based systems, and healthcare. Data stream classification techniques offer rapid processing capabilities, allowing analysts to harness valuable insights quickly. As a result, decision-making processes can be carried out promptly to minimize risks while improving performance.
Dealing with the dynamic nature of data streams is one of the main challenges of data stream classification due to changes in the data distribution. This phenomenon is known as concept drift [2, 18, 31, 34, 38] and necessitates a learning paradigm capable of handling it. Ensemble approaches combine multiple, possibly weak, classifiers to improve model performance, robustness, resilience, distributivity, and redundancy [33]. These approaches are able to adapt to changes in data distributions while maintaining high levels of accuracy [7, 35, 36].
The discrepancies in predictions provided by individual ensemble components are referred to as diversity. In the ensemble learning setting, maintaining diversity among the individual ensemble components is one of the main challenges. The exposure of the data stream environment to various concept drifts and the fast arrival rate of data items make this challenge even harder. High-diversity ensembles demonstrate better performance in the presence of concept drift [27, 28], even with fewer components [6]. The performance of ensemble components can decrease drastically when concept drift occurs. To maintain high accuracy, it is critical to detect the concept drift and update or replace the impacted ensemble components [27].
Several approaches have been proposed to handle the difficulties mentioned above. Leveraging Bagging (LevBag) [5] combines bagging's simplicity with additional randomization of component inputs and outputs. This randomization can help individual components in an ensemble make different predictions. Concerning diversity, the Adaptive Random Forest (ARF) [16] uses a local randomization strategy to retain diversity among ensemble components. This method uses a different random selection of features for each component in the ensemble, encouraging diversity among individual components.
Compared to the prior methods, Streaming Random Patches (SRP) [17] combines random subspaces and online bagging to achieve competitive prediction performance, indirectly increasing the diversity of the ensemble components. Lastly, the Kappa Updated Ensemble (KUE) [8] combines online and block-based methods and uses the Kappa statistic to weigh and select classifiers dynamically. To increase diversity, each base learner is trained with a different subset of features. Additionally, new instances are added to each base learner with a specific probability based on the Poisson distribution.
The challenges of data stream classification and the efforts of previous solutions to increase diversity within their ensembles have motivated further investigation. The aim is to determine how to add more variety and prune the redundant or ineffective components [14, 30, 37] in an ensemble to handle concept drift better and maintain high accuracy.
The following are the main contributions of this research. We
• Propose a novel ensemble construction and maintenance approach, called DynED (Dynamic Ensemble Diversification), based on the principles of the Maximal Marginal Relevance (MMR) concept;
• Adjust the diversity parameter dynamically to cope with the data stream, keeping diversity high in case of severe drifts;
• Experiment with 15 datasets with varying drift types and compare our results with those of the state-of-the-art methods.

PROPOSED APPROACH

Using MMR in Data Stream Classification
Maximal Marginal Relevance (MMR) [9] is a diversity-based ranking method that minimizes redundancy while maintaining the relevance to a query in a document set. It is useful for text-document summarization, response extraction [1, 26], and document re-ranking [9]. Formally, the MMR method ranks the documents according to Eq. 1:

MMR = \arg\max_{d_i \in R \setminus S} \left[ \lambda \cdot Sim_1(d_i, q) - (1 - \lambda) \max_{d_j \in S} Sim_2(d_i, d_j) \right]   (1)

In Eq. 1, d_i represents a document, R is a ranked list of the documents, S is the set of the selected documents, \lambda is a parameter that balances accuracy and redundancy, and Sim_1(d_i, q) measures the relevance between document d_i and query q. When \lambda = 1, MMR calculates the relevance-ranked list; when \lambda = 0, it calculates a ranking that maximizes diversity among the documents in R. MMR optimizes a linear combination of relevance and diversity criteria for values of \lambda between 0 and 1. Adapting the MMR method for selecting and ranking ensemble components requires changes in its definition. In terms of ensemble components, the first part of Eq. 1, which calculates the relevance of d_i to query q, is replaced with the accuracy of each component. It is represented as "\lambda \times acc(c_i, X)", where C = \{c_1, c_2, ..., c_n\} is the set of ensemble classifiers, c_i represents each component, and X are the previously seen instances of the data stream.
The second part of Eq. 1 determines a pairwise similarity between the documents. However, evaluating the diversity of ensemble components is difficult since there is no commonly agreed-upon formal definition of diversity. Several methods are available for determining the pairwise diversity of classifiers in terms of correct/incorrect (oracle) outputs, such as the Correlation Coefficient \rho (CP), the double-fault measure (DF), the Disagreement Measure (DM), and the Q statistic [21, 22, 32]. To adapt the second part of Eq. 1 to the context of component diversity, we replace it with the DF diversity measure (the procedure of diversity measure selection is explained in Section 3.2). Therefore, the second part of the formal equation turns into "(1 - \lambda) \times \max_{c_j \in S} sim(c_i, c_j)". The final version of the MMR method for our task is presented in Eq. 2:

MMR = \arg\max_{c_i \in C \setminus S} \left[ \lambda \cdot acc(c_i, X) - (1 - \lambda) \max_{c_j \in S} sim(c_i, c_j) \right]   (2)

The MMR method utilizes a measure of similarity, which can be derived from the complement of a diversity measure.
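As an illustration, the greedy selection rule of Eq. 2 can be sketched in a few lines of Python. This is not the authors' implementation: the component accuracies and the pairwise similarity matrix (e.g., the complement of a diversity measure such as DF) are assumed to be precomputed, and the function name is our own.

```python
def mmr_select(accuracies, similarity, k, lam):
    """Greedy MMR selection over ensemble components (sketch of Eq. 2).

    accuracies: per-component acc(c_i, X) scores.
    similarity: precomputed pairwise similarity matrix, e.g. the
                complement of a pairwise diversity measure.
    k:   number of components to select.
    lam: lambda in [0, 1]; 1 favours accuracy, 0 favours diversity.
    """
    candidates = list(range(len(accuracies)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            # Penalise components similar to any already-selected one.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * accuracies[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With lam = 1 the rule degenerates to picking the most accurate components; with lower lam, a component nearly identical to an already-selected one loses to a less accurate but more diverse alternative.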

Dynamic Ensemble Diversification: DynED
The working principles of our approach: DynED aims to dynamically ensure high accuracy by increasing diversity in the presence of concept drift, and otherwise by decreasing it to reduce exposure to underperforming ensemble components. The pseudo-code for DynED is provided in Algorithms 1 and 2. Stage 1: The primary process includes predicting new samples and training the selected components, as outlined in Algorithm 1. As this method operates online, line 3 of Algorithm 1 uses the selected components to predict new samples via Majority Voting [11, 23]. Stage 2: The drift detector, ADWIN [3], is updated using the correct/incorrect predictions. If drift is detected, a new classifier is generated, trained on the last seen data available in the sliding window, and then added to the reserved component pool (lines 5-9 of Algorithm 1). If a new component is added or the processed sample count passes the threshold, the algorithm updates the \lambda parameter to reflect the proper level of diversity based on the intensity of accuracy changes (lines 10-16). Then Algorithm 2 is called to update the set of selected components and the reserved pool. The intensity of changes in the accuracy is computed as "(acc(t) - acc(t - d)) / d", where acc(t) denotes the accuracy of the ensemble model at the t-th sample and d is the number of samples between updates. Stage 3: Algorithm 2 selects a diverse set of components using the adapted MMR method presented in Eq. 2.
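The \lambda update of Stage 2 can be sketched as follows. This is an illustrative stand-in rather than DynED's exact rule: the grounded parts are the accuracy-change intensity and the direction of the adjustment (more diversity, i.e. lower \lambda, under sharp accuracy drops); the threshold and step values are hypothetical.

```python
def update_lambda(acc_now, acc_prev, d, lam,
                  drop_threshold=-0.001, step=0.1,
                  lam_min=0.0, lam_max=1.0):
    """Adjust the accuracy/diversity trade-off lambda of Eq. 2.

    The intensity of accuracy change over the last d samples is
    (acc(t) - acc(t - d)) / d.  A sharp drop suggests drift, so
    lambda is lowered to favour diversity; otherwise it is raised
    to favour accuracy.  Threshold and step are illustrative values.
    """
    intensity = (acc_now - acc_prev) / d
    if intensity < drop_threshold:   # accuracy falling: favour diversity
        lam = max(lam_min, lam - step)
    else:                            # stable or improving: favour accuracy
        lam = min(lam_max, lam + step)
    return lam
```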
The algorithm combines the previously selected components with those in the reserve pool and maintains a fixed component count by sorting them based on accuracy and removing poorly performing components. In line 5 of Algorithm 2, the prediction errors of all components on the previous samples held in the sliding window are obtained. In the following line, the components are clustered into two groups based on these prediction errors using the K-means clustering algorithm [20], so that the selection method can be applied effectively. After clustering, high-performance components are selected from each cluster, resulting in a total of 2 × p components out of the n available, where 2p ≤ n. In line 9 of Algorithm 2, the adapted MMR method is applied to perform the final selection step. The algorithm outputs a new set of selected components to actively predict new incoming samples of the stream and updates the reserved component pool as a result. The way an ensemble structure is constructed and maintained by DynED is illustrated in Figure 1.
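The clustering-then-selection step can be sketched with a minimal one-dimensional two-means routine standing in for a full K-means implementation (a Python version would typically call scikit-learn's KMeans instead). The function names and the extremes-based centroid initialization are illustrative choices, not taken from the paper.

```python
def two_means_1d(errors, iters=20):
    """Minimal 1-D k-means (k = 2) over per-component error rates,
    splitting components into a low-error and a high-error group."""
    c = [min(errors), max(errors)]        # centroids start at the extremes
    for _ in range(iters):
        groups = ([], [])
        for i, e in enumerate(errors):
            # Assign each component to its nearest centroid.
            groups[abs(e - c[1]) < abs(e - c[0])].append(i)
        for g in (0, 1):                  # recompute centroids
            if groups[g]:
                c[g] = sum(errors[i] for i in groups[g]) / len(groups[g])
    return groups

def pick_top(errors, group, p):
    """Select the p lowest-error components from one cluster."""
    return sorted(group, key=lambda i: errors[i])[:p]
```

Applying pick_top to both clusters yields the 2 × p candidates that are then handed to the MMR selection step.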

Time Complexity Analysis of Component Selection
The time complexity of Algorithm 2, which employs the reformulated MMR method, is as follows. In line 5 of Algorithm 2, the prediction errors of all classifier components are obtained; the time complexity is O(n), where n represents the classifier component count. Lines 6, 7, and 8 of Algorithm 2 involve clustering the classifier components based on their prediction errors and selecting p components from each cluster, where p < n. The time complexity of these operations is O(n × i × k), where i is the number of iterations in the clustering process and k is the number of clusters. i and k are considered constant, as they are not hyperparameters of DynED. In line 10, which applies the reformulated MMR method, the time complexity can be broken down as follows: calculating the pairwise similarity of the classifier components using any diversity measure has a time complexity of O(m^2), where m represents the number of classifier components extracted in the clustering step (m = 2 × p). Applying the reformulated MMR method itself has a time complexity of O(m × s), where s = m. Therefore, the overall time complexity of Algorithm 2 is O(n + n × i × k + m × s + m^2). The dominant term in this analysis is m^2. Hence, the algorithm's time complexity can be approximated as O(m^2).

EXPERIMENTAL EVALUATION

Datasets
To assess the performance of our model, we conduct experiments using 15 datasets (four real and 11 synthetic) and compare the results to those of the baseline models. The datasets cover a wide range of concept drift scenarios. Our experiments include four types of drift: Gradual (G), Incremental (I), Abrupt (A), and Recurring (R); (U) stands for an unknown drift type. The synthetic datasets, based on the LED, SEA, Agrawal, and Mixed generators, are created using the scikit-multiflow library [29] and the MOA framework [4]. The LED dataset has seven drifting features without noise. The Agrawal dataset uses four classification functions, and the SEA dataset uses three classification functions to synthesize drift. The description of the datasets is shown in Table 2.

Setup
In our study, we evaluate four diversity measures: the Correlation Coefficient \rho (CP), the Double-Fault measure (DF), the Disagreement Measure (DM), and the Q statistic. We apply each measure to Eq. 2 across all datasets to determine the most suitable diversity measure for DynED. Our results show that DF has the highest average mean accuracy compared to those of CP, DM, and the Q statistic, with respective average mean accuracies of 89.57, 88.14, 89.43, and 89.38. Therefore, we choose DF as the diversity measure in DynED. We evaluate the performance of DynED against five state-of-the-art baselines: LevBag [5], SAM-kNN [25], ARF [16], SRP [17], and KUE [8]. The first four baselines are assessed using the Massive Online Analysis (MOA) [4] framework with default hyperparameters. For KUE [8], we use the source code available on its GitHub page for evaluation 1. DynED is implemented in Python 3.8 using the scikit-multiflow [29] library, with a Hoeffding Tree as the base classifier, split-confidence set to 9e-1, and grace-period set to 50. All baseline models were evaluated using the interleaved-test-then-train approach. The code and datasets for the experiments are publicly available, and all experiments and results are reproducible 2.
The selection of appropriate hyperparameters is a critical aspect of all machine learning methods, including DynED. After testing various hyperparameter settings using grid search, we determined the values presented in Table 1. These values serve as the default hyperparameters of DynED and are not tailored to any specific dataset. Note that the first 250 samples of each dataset are used as a warm-up and are not involved in the accuracy calculations and final results presented in Table 2.
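The interleaved-test-then-train (prequential) protocol with the 250-sample warm-up can be sketched as below. The MajorityClass learner is a toy stand-in for illustration only, not DynED's Hoeffding Tree ensemble; any model exposing predict and learn would fit the same loop.

```python
def interleaved_test_then_train(model, stream, warmup=250):
    """Prequential evaluation: each sample is first used for testing,
    then for training.  The first `warmup` samples only train the
    model and are excluded from the accuracy, as in the paper."""
    correct = seen = 0
    for t, (x, y) in enumerate(stream):
        if t >= warmup:
            correct += (model.predict(x) == y)   # test first...
            seen += 1
        model.learn(x, y)                        # ...then train
    return correct / seen if seen else 0.0

class MajorityClass:
    """Trivial baseline learner: always predicts the most frequent label."""
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None
    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
```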

Results and Discussion
The overall accuracy of each method on each dataset is presented in Table 2, with the highest scores emphasized in bold. A comparative analysis reveals that DynED outperforms the baselines on 10 out of 15 datasets, specifically on three out of four real and seven out of 11 synthetic datasets. Furthermore, when the average mean accuracies are ranked in descending order, DynED emerges as the top performer with an average rank of 2.20. A closer examination of Table 2 and Figures 2.a and 2.b indicates that DynED is robust in the case of gradual and recurring drift types, outperforming the baselines on nearly all datasets that exhibit these drift types. However, DynED's performance declines in the presence of incremental drift, struggling to maintain high accuracy levels throughout the stream. Nonetheless, when confronted with abrupt drifts, DynED effectively captures and addresses the drift by employing Eq. 2 to increase diversity among components, resulting in enhanced performance, as evidenced by the plots in Figure 2. Overall, the results suggest that DynED is a promising method for the online classification of data streams. To assess the statistical significance of the results, the Friedman test is utilized in conjunction with the Nemenyi post-hoc analysis [10], with \alpha = 0.05. We calculate the Critical Distance as CD = 1.946. The statistical test analysis, presented in Figure 2.c, reveals that DynED statistically significantly outperforms KUE and achieves a better ranking than the other baseline methods.
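The reported critical distance can be reproduced from the standard Nemenyi formula CD = q_\alpha \sqrt{k(k+1)/(6N)}, with k = 6 compared methods (DynED plus the five baselines), N = 15 datasets, and the tabulated value q_{0.05} ≈ 2.850 for six classifiers:

```python
import math

def nemenyi_cd(k, n, q_alpha):
    """Critical distance for the Nemenyi post-hoc test:
    CD = q_alpha * sqrt(k * (k + 1) / (6 * N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))

# k = 6 methods, N = 15 datasets, q_0.05 ~= 2.850 for six classifiers.
cd = nemenyi_cd(6, 15, 2.850)   # ~1.946, matching the value in the paper
```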

CONCLUSION AND FUTURE WORK
This paper presents DynED, a novel ensemble construction and maintenance method that combines diversity and prediction accuracy for data stream classification tasks. It aims to increase the diversity among components in the presence of concept drift in order to handle drifts better. The results show that DynED has a higher average mean accuracy compared to the baseline models. In real-world scenarios, data stream environments often face the issue of label scarcity. As part of future work, we aim to extend our study to the semi-supervised classification of data streams.

ACKNOWLEDGMENTS
This study is partially supported by TÜBİTAK grant no. 122E271.

Figure 1 :
Figure 1: Ensemble construction and maintenance using DynED. Stage 1: Predicting, majority voting, and training. Stage 2: Detecting drifts, adding new components, and updating the diversity parameter. Stage 3: Selecting new components.

Table 2 :
Characteristics of the datasets, average interleaved-test-then-train accuracy, and the rankings of the methods for each dataset. DT: Drift Type, |X|: No. of features, |y|: No. of classes, |D|: No. of samples.