Team Radiant or Dire? Comparing the SVM and k-NN Classifiers on a DotA2 Matches Dataset

Match outcome prediction efforts for DotA2 have been ongoing almost since the game's release in 2013. A considerable number of studies have proposed different methods for approaching this challenging topic; however, due to the complexity and plethora of parameters affecting the result, there is still room for further research into algorithmic performance in order to choose appropriate models and achieve higher prediction accuracy. Different combinations of classifiers have already been introduced and examined. The scope of this paper is to experiment with the Support Vector Machine (SVM) and k-Nearest Neighbors (k-NN) classifiers and compare their class predictive ability (victory for team Radiant or Dire) on a DotA2 dataset of game features including, among others, team selection, character selection, and game mode. The results of this small-scale study showed that neither k-NN (for various k values) nor SVM managed to achieve a significant level of predictive accuracy. In addition, regardless of minor differences in the results, McNemar's test indicated that their predictive performance is statistically equivalent at the chosen significance level α.


INTRODUCTION
During the past years, plenty of attempts have been made to predict the outcome of one of the most popular MOBA (Multiplayer Online Battle Arena) games, DotA2. The approaches vary in nature, with different models being applied, different evaluation methods, and different expected findings. The topic of DotA2 game outcome prediction, along with machine learning approaches to character selection and combinations and other probabilistic ratings in the same rationale, has been studied almost since the release of the game, back in 2013. The multiparametric factors which affect a match's outcome create layers of complexity worthy of investigation, with algorithmic approaches aiming to measure performance, prediction accuracy, efficiency, and other metrics of relevant significance. DotA2, a game of role-playing and strategy, consists of two rival teams of players in which characters/heroes of varying features and skill sets compete (each player controls one hero) on a game map against the other team, aiming to bring down the opposing team's main building. The nature of the game itself, complemented by its immense popularity and appeal from a machine learning perspective, has made it an attractive topic of research. In this light, the present small-scale study aims to contribute to the body of knowledge on this subject by comparing the predictive ability of two well-known classifiers, the Support Vector Machine (SVM) and k-Nearest Neighbors (k-NN), and by experimenting with various k values (for k-NN) and kernel functions (for SVM) in order to examine possible effects they might have on game outcome prediction.

BACKGROUND
Prior to presenting the methodology, it is worth looking into the existing literature to see what has been studied in a similar context and what the results have shown. In their study, Wang & Shang [1] used one of the most common classifiers, the Naive Bayes classifier, to predict the winning team based on the character lineup and players' choices. The experiments were conducted on the same UCI repository dataset as the one used in the present paper, and the accuracy achieved on the test set stabilized around 59%. In 2018, Beskyd [2] implemented an experimental program resulting in a predictive ANN (Artificial Neural Network) model used to approach the problem. The proposed solution is not directly comparable to existing SVM or decision tree implementations due to differences in dataset selection and other imponderables; however, according to his work, the ANN was the best proposed model. Song, Zhang and Mao [3] also made an effort in 2017 to predict the winning side of DotA2, but concluded that predictions based solely on hero lineups do not seem to yield a considerably satisfactory result. Porokhnenko et al. [4] conducted a study dedicated to the choice of heroes for each team (as one of the most important factors affecting the outcome) and presented classification models, along with parameter optimization and performance analysis. They concluded that the best models in this case were Linear Regression, Linear SVC, and a neural network (with the Softplus and Sigmoid activation functions). On a slightly different note, but still on the same page, Katona et al. [5] in 2019 trained a deep learning network on a large selection of DotA2 features and a professional/semi-professional level match dataset in order to accurately predict the death of a game hero (a significant game-changing event) within a span of 5 seconds. Grutzik et al. [6] also attempted to predict the game outcome by considering the hero characters and historical performance data, concluding that their best model performs on par with human-level performance. Based on two criteria, post-match data and hero selection data, Kinkade & Lim [7] presented and analyzed victory predictors for DotA2, while Semenov et al. [8] conducted a systematic review of machine learning algorithms aiming to predict the game outcome. They applied the Naive Bayes classifier, Logistic Regression, and Gradient Boosted Decision Trees, as well as Factorization Machines, which provided the best result in that case. Makarov et al. in 2018 [9] compared the predictive accuracy for DotA2 and Counter-Strike: Global Offensive using various models based on a probabilistic ranking approach, measuring the impact of individual players on game victory.
Earlier research following a similar approach, on the predictive, binary classification performance of SVM, neural network, and k-NN classifiers [10], [11], [12], [13], informed the authors' selection of the methods presented here. To the best of the authors' knowledge, there has not been sufficient research addressing the predictive performance of SVM and k-NN on DotA2 matches.

METHOD
In this section, the methodology is described, along with the necessary background motivating the selection of the specified methods, algorithms, and proposed metrics. In this study, the k-NN and SVM classifiers were applied with the purpose of comparing their class predictive performance on the DotA2 dataset provided by the publicly available UCI Machine Learning Repository [14]. The purpose was to evaluate which model achieves higher accuracy and could potentially be more suitable for correctly classifying the winning team of a new game match given as input, based on other features of the dataset, mainly character selection, game mode, and game type. The aim is to use appropriate metrics, e.g., accuracy derived from the contingency table, and tests such as McNemar's test, in order to interpret the results and decide whether one model exceeds the other in terms of performance and prediction ability, and if so, under which assumptions.
One of the biggest challenges with this task is that, due to the nature of the game and the complexity of dependencies among the dataset features (mainly the selection of characters), it is non-trivial to find a single model that achieves high predictive performance. This emphasizes the need for sufficient research in order to keep improving prediction quality, aiming to make it significantly better than random chance.
The original UCI repository DotA2 game result dataset consisted of 102,944 instances. For the purpose of this small-scale study, the dataset used for the experiment consisted of a random selection of 7,000 instances, of which 4,501 randomly selected samples were used for training and another 2,499 for testing the predictive performance of the models. By the nature of the game, there are two competing teams, Radiant and Dire (named according to their location on the map), each consisting of 5 players who select from a range of 113 possible hero characters. Each instance of the training dataset consists of 117 attributes, which include the winning team (denoted 1 and -1 accordingly), the cluster ID based on location on the map, the game mode (e.g., All Pick), the game type (e.g., Ranked), and each of the 113 characters (denoted 0 if they are not selected in that match, 1 if they are selected by one team, and -1 if they are selected by the other). There were no missing values in the dataset used and, in the event of missing values arising before a simulation/execution during new sample selection, these were treated according to common data cleansing practices. The selected classification algorithms are the SVM and k-NN classifiers. SVM was deemed a suitable choice in this case, as it has been shown to be effective even in high-dimensional feature spaces, and due to its flexibility to work in even higher-dimensional spaces with the assistance of kernels. k-NN was selected on the grounds that it is simple and intuitive to implement and can be used to experiment with different values of k to check for the possibility of improved results.

SVM
A support vector machine is a supervised, discriminative classifier defined by a separating hyperplane. Given a training dataset, SVM creates an optimal hyperplane in an N-dimensional space in order to classify new data. An important property of this classifier is the kernel trick: the ability of the algorithm to make an appropriate transformation such that data which is not linearly separable can gain this property by being mapped to another space in which a separating hyperplane exists. Mathematically, this mapping can be expressed as K(x_i, x_j) = φ(x_i) · φ(x_j), where φ is the feature map. In the present study, the authors experiment with a few parameter tunings and kernel functions to try to achieve better results in the high-dimensional space.
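As an illustrative aside (not part of the reported experiments), the kernel trick can be verified numerically on a toy example. The sketch below, written for this purpose by the editors, shows that the degree-2 polynomial kernel evaluated in the 2-dimensional input space equals an ordinary dot product in an explicitly mapped feature space:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2D:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# Kernel trick: evaluate K(x, z) = (x . z)^2 in the input space ...
k_implicit = np.dot(x, z) ** 2
# ... versus an explicit dot product in the mapped feature space.
k_explicit = np.dot(phi(x), phi(z))

print(k_implicit, k_explicit)  # both equal 16.0 (up to floating point)
```

The point is that SVM never needs to construct φ(x) explicitly; the kernel function supplies the inner products directly, which is what makes high-dimensional (even infinite-dimensional) feature spaces tractable.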

K-NN
The k-Nearest Neighbors algorithm is a lazy-learning, non-parametric method. In k-NN classification, a point (example) is classified by a majority vote of its neighbors, which are the points with the smallest distance (or highest similarity) to that point. There are various heuristic methods for finding a good k value; however, for binary classification problems like ours, an odd value of k is preferred in order to avoid tied votes. In this case, different values of k were tried out to examine whether this has a significant impact on the accuracy of class prediction. The authors intended to perform PCA (Principal Component Analysis) in order to reduce the effect of the curse of dimensionality before training the models; however, as character selection (which contributes most of the attributes in the dataset) is one of the most important factors in determining the outcome of a match, it was deemed necessary not to drop any attributes, in order to avoid reducing the performance of the classifiers. One point taken into consideration is the normalization of the data, which is necessary to keep the computations within a bounded range that preserves the validity of the results and does not bias them towards one or the other classifier.
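The majority-vote mechanism described above can be sketched from scratch. The toy example below is illustrative only: it uses made-up 2-dimensional points rather than the real 117-attribute instances, with labels -1 and 1 standing in for the two teams as in the dataset encoding:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Hypothetical training points: two loose clusters, one per class.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]])
y_train = np.array([-1, -1, 1, 1, -1])           # -1 = one team, 1 = the other

print(knn_predict(X_train, y_train, np.array([0.15, 0.1]), k=3))  # -1
print(knn_predict(X_train, y_train, np.array([0.95, 0.9]), k=3))  # 1
```

With k odd, the vote over two classes can never tie, which is the rationale for restricting the experiments to odd k values.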
The evaluation metrics chosen for the interpretation and evaluation of the results are mainly accuracy, other metrics tied to the contingency table of results, and finally McNemar's test, which helped determine whether the results of the two classifiers exhibit 'marginal homogeneity', or in other words, whether the null hypothesis of both classifiers having the same predictive performance on the dataset can be rejected. In the following section, the model performance results are presented and interpreted.

RESULTS
As mentioned earlier in the methodology section, the k-NN classifier was applied first. 65% of the normalized dataset was used for training the model and 35% for testing. The model was trained on the dataset, excluding the first column of data, which is the class we later wish to predict (the winning team of the game). Afterwards, the algorithm ran on the remaining test set in order to predict the outcome of the game. 10-fold cross-validation was also performed for cost parameter values 0.1, 1, 10 and 100 to compare the resulting accuracies, although the outcome offered little additional insight. The classification computation was fast: k-NN completed in only a few milliseconds, even for higher k values. In general, the reported results are the outcome of several random seeds, with no significant difference in results or computational performance; indicative examples are included. Table 1 shows the predictive accuracy of k-NN for different k values. As the k value increases, there appears to be a tendency for the accuracy to increase as well, although not dramatically. The last k value used was 499, which was one of the limit values before the algorithm "complained" about the complexity of the neighbor relationships. It is also the point where the accuracy starts to drop after reaching its maximum value of 55% for 411-NN. The results are represented graphically in Figure 1, where this tendency between increasing k and accuracy is visible. The rather poor classification result is not very surprising considering the nature of the dataset. The result improves slightly as k-NN considers more "neighbor" examples; however, the model only fits up to a certain point, as most of the attributes of the dataset carry the same amount of information, and few of them appear significantly more helpful than others in assisting the algorithm's decision.
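A sweep over k of the kind summarized in Table 1 might be run as in the sketch below. Note that make_classification is a synthetic stand-in for the actual UCI data, so the resulting accuracies and best k will not match the paper's figures:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 116-feature DotA2 matrix (illustrative only).
X, y = make_classification(n_samples=1500, n_features=116,
                           n_informative=20, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=2)

# Normalize so that no attribute dominates the distance computation.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

ks = list(range(1, 500, 50))  # odd k values only, to avoid tied votes
accs = [KNeighborsClassifier(n_neighbors=k)
        .fit(X_train, y_train).score(X_test, y_test) for k in ks]
best_k = ks[int(np.argmax(accs))]
print(best_k, round(max(accs), 3))
```

Plotting accs against ks would reproduce the kind of curve shown in Figure 1, with accuracy rising and then flattening or dropping as k grows.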
To continue, the SVM classifier was applied on the exact same dataset in order to see whether it could achieve better results. The authors experimented with different kernels: linear, polynomial (of degree 3), radial basis function, and sigmoid. The computational time for SVM was significantly worse than for k-NN, with 12 seconds for the linear kernel, 29 seconds for the polynomial, 33 seconds for the radial basis function, and 30 seconds for the sigmoid kernel, respectively. The results of their predictive accuracy are shown in Table 2.
As can be observed, the highest accuracy achieved was 57%, for SVM with the linear kernel function. The other kernel functions produced poorer results. Intuitively, the number of support vectors the algorithm found for each model is indicative of this rather poor accuracy: in order to reach a prediction, SVM had to make use of a large number of support vectors. This can again be attributed to the nature of the dataset; there is clearly no distinctive feature to help the classifier find a small set of support vectors that can optimally represent the whole dataset, which would subsequently lead to better prediction performance.
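A kernel comparison of this kind, including fit time and the number of support vectors each model retains, could be run along the following lines. This is again a sketch on a synthetic stand-in dataset, so the accuracies, timings, and support vector counts will differ from those reported above:

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 116-feature DotA2 matrix (illustrative only).
X, y = make_classification(n_samples=1000, n_features=116,
                           n_informative=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.35, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    start = time.perf_counter()
    clf = SVC(kernel=kernel, degree=3).fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    # n_support_ holds the support vector count per class; a count close
    # to the training set size signals a poorly separable problem.
    print(f"{kernel:8s} acc={clf.score(X_test, y_test):.3f} "
          f"fit={elapsed:.2f}s n_sv={clf.n_support_.sum()}")
```

Inspecting n_support_ this way makes the paper's observation concrete: when nearly every training point ends up as a support vector, the decision boundary is not capturing a compact structure in the data.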
As a final remark, the authors applied McNemar's test to the two classifiers' results, as they deemed it interesting to test whether the performance of one classifier could be considered better than the other, or whether the small difference in results depended solely on the instances chosen for that particular experiment. McNemar's test [15] (with continuity correction) is given by the following formula:

χ² = (|c₀,₁ − c₁,₀| − 1)² / (c₀,₁ + c₁,₀),

where the discordant count c₀,₁ is the number of cases where the first classifier (k-NN) was correct but SVM was not, and c₁,₀ is the number of cases where the second classifier (SVM) was correct but k-NN was not. These values were computed with the help of the confusion matrices of the best k-NN (411-NN) and SVM (linear kernel) models, which can be seen in the tables of Figure 2. The null hypothesis H₀ formulated was that the two classifiers have the same accuracy at a significance level α, which in this case was chosen to be α = 0.05. The alternative hypothesis H₁ was that the accuracies of the two classifiers are not equal. If the test statistic exceeds χ²₁,α (which in our case equals 3.84), the null hypothesis can be rejected. After the computations, the statistic was found to be 0.00159, which is smaller than 3.84. This means the null hypothesis H₀ of marginal homogeneity is not rejected in our case, and the performance of the two classifiers on the dataset is statistically indistinguishable at the chosen significance level α.

DISCUSSION
Comparing the results of the present small-scale study with previously conducted research on this topic, we observe about the same level of progress, with a little under 60% prediction accuracy achieved in the best-case scenario. It appears that a lot of work remains to be done in order to achieve class prediction accuracy meaningfully better than near-random guessing. This is a challenging issue, as all the research conducted so far has used different datasets of DotA2 game results, with different attributes, different sizes, and differently collected data. According to the existing literature, it is also deemed very challenging, nearly impossible, to predict the game outcome with satisfactory certainty based solely on the lineup selection of characters, game type, and game mode. Due to the large number and complex relationships of the factors affecting the outcome of the game, future research needs to include more parameters, such as skill level, character level, item selection, human decision-making, and perhaps even environmental conditions, which could potentially affect a player's focus and concentration enough to be considered a factor of significance for algorithmic decision-making. It would also be of great interest to see more research involving ensemble methods, different combinations of classifiers, serially or in parallel, and multi-stage classification including a variety of parameters, which could eventually complement the algorithms' learning intelligence and lead to non-trivial, significant results.

Figure 1: k-NN Predictive Performance for Various k Values

Table 1: k-NN Classifier Accuracy for Different k Values

Table 2: SVM Accuracy for Different Kernel Functions