Predicting Pedestrian Involvement in Fatal Crashes Using a TabNet Deep Learning Model

To make road transportation systems safe for pedestrians, understanding the contributing features in vehicle-pedestrian fatal crashes is critical. With a better prediction model, it is possible to design effective countermeasures and reduce fatal crashes involving pedestrians. This paper aims to develop a deep learning-based model to predict fatal crashes that involve pedestrians in the United States using the Fatality Analysis Reporting System (FARS) database from the National Highway Traffic Safety Administration (NHTSA). The TabNet architecture was used to train a model from historical data, while traditional classifiers such as support vector machines, random forests, and decision trees were used to develop baseline results. An ensemble model of the five best models from the single-model analysis was also developed. Metrics such as precision, recall, F1 score, and the area under the ROC curve (auROC) were calculated for each model. Since the problem requires correct prediction of all possible fatal cases, recall is considered the most crucial evaluation metric. Not surprisingly, the ensemble model was found to have the highest recall value among all models. However, the TabNet model had the highest recall score among the single models, indicating that it is the most suitable single model for the fatal vehicle-pedestrian crash prediction task out of all models analyzed. Another advantage of the TabNet model is that it is interpretable, which helps in understanding the variables that contribute the most to the prediction. Factors such as roadway geometry, light conditions, impaired drivers, and land use had the highest contributions to the predictions made by the model. This fatal crash prediction model was found to have the potential to aid all relevant stakeholders in decision-making processes to make roadways safer.


INTRODUCTION
While walking, a form of active transportation, has been promoted as a sustainable mode of transportation, a quarter of all fatalities on the roadway systems of the world involve pedestrians [30]. The National Highway Traffic Safety Administration (NHTSA) reported that approximately 6,516 pedestrians were killed in traffic-related crashes in the U.S. in 2020, a staggering 17% of all traffic fatalities. On average, a pedestrian was killed every eighty minutes in 2020, and approximately one out of four pedestrians killed were struck in hit-and-run cases. These statistics show that pedestrians remain among the most vulnerable user groups in our roadway systems [41]. Several research studies have been performed previously to identify the factors that affect pedestrian safety. Pedestrian and roadway characteristics, traffic characteristics, land use characteristics, and demographic characteristics were identified as important features that might lead to pedestrian fatalities [7]. Chakraborty, Mukherjee, & Mitra (2019) [10] reviewed several studies and identified that traffic exposure (speed, vehicle volume, pedestrian volume), roadway geometric design (number of lanes, road width, median), operational traffic parameters (traffic signal phase timing and control), the built environment, vehicle fleet mix and characteristics, pedestrian behavior (sudden speed change, sudden behavior change, pedestrian signal design, pedestrian signal behavior), and spatial sociodemographic characteristics are essential factors that affect pedestrian safety.
Several models have been developed to predict pedestrian crashes over the years. In terms of crash prediction modeling, transportation modelers and urban planners have historically explored a variety of statistical approaches to constructing pedestrian safety models. These have included linear regression, logistic regression, multivariate linear regression, Poisson regression, negative binomial regression, Bayesian neural networks, and artificial neural networks. Although statistical models have a very strong and widely accepted mathematical background in crash prediction, the inability of these models to account for complex and highly nonlinear data cannot be ignored. This has led urban planners and transportation modelers to adopt computational intelligence, which combines elements of learning, adaptation, evolution, and fuzzy logic to create intelligent models that can account for highly nonlinear transportation datasets [0]. With the emergence and success of deep learning, researchers are now interested in applying complex deep neural networks to building models that could assist with policy decision-making.
This paper aims to develop and test different machine and deep learning models to create a reliable prediction model for fatal vehicle-pedestrian crashes. The prediction models developed were trained using data from the Fatality Analysis Reporting System (FARS), which is maintained by NHTSA. The goal was to create a model that could assist transportation stakeholders and policymakers in understanding the complex relationships among the variables that lead to a vehicle-pedestrian fatal crash rather than any other crash type.

Major Contributions
In the era of advancing technology, the accurate prediction of crashes, particularly those resulting in fatalities, has emerged as a matter of utmost importance. While traditional transportation planning has primarily focused on understanding the factors associated with crashes, achieving reliable predictions for these crashes has proven challenging due to the multifaceted nature of crash factors. The advent of machine learning offers promising avenues for enhancing crash and fatality prediction. However, a significant concern persists regarding generalizing ML models to future scenarios, because the decision-making process in these models remains opaque.
This research paper endeavors to address this challenge by leveraging TabNet, a machine learning model known for its interpretability features. The primary objective is to develop a TabNet model capable of predicting fatal vehicle-pedestrian crashes with superior performance compared to traditional machine learning models. Beyond achieving high predictive accuracy through the use of TabNet, this study also seeks to contribute to the broader understanding of how the TabNet model arrives at its predictions. By shedding light on the rationale behind these predictions, we aim to establish a foundation for justifying the future deployment of machine learning models in crash and fatality prediction, thereby enhancing decision-making in this critical domain. To summarize, this work aims to increase prediction reliability for vehicle-pedestrian crashes while at the same time explaining how the predictive models arrive at their predictions, which will help justify the use of machine learning models to predict crash risks in the future.

LITERATURE REVIEW
Over the years, transportation safety engineers, urban planners, and state highway agencies have used statistical and econometric modeling techniques to understand reliability in predicting crashes on public roadways in the United States. However, statistical modeling may sometimes not capture complex relationships present in nonlinear data [0]. In general, the literature is vast within the statistical modeling landscape [8,3,34]. ML models have also been gaining traction in the last decade [4,3,6,18,19,48]. [38] provides a review of recent studies applying machine learning algorithms for predicting crash injury severity. Several models have been developed to predict pedestrian crashes. Sze and Wong [42] developed a binary logistic regression model to determine if any association existed between the probability of fatal and severe injury crashes. They considered factors like demography, crash type, environmental characteristics, geometric characteristics of the roadway, and traffic characteristics as variables in their model. The data used consisted of 73,746 pedestrian casualties and injuries that happened between 1991 and 2004. The authors concluded that the risk of pedestrian fatalities and severe injuries was related to demographic and road environment factors, among other risk factors. Pulugurtha and Sambhara [32] used a negative binomial distribution model to develop a linear pedestrian crash estimation model for signalized intersections in the city of Charlotte, North Carolina. They used demographic characteristics, socioeconomic characteristics, land use characteristics, road network characteristics, and the number of transit stops to predict the number of pedestrian crashes within 200 feet of each intersection. Their results indicated that population, number of transit stops, number of approaches, and pedestrian volume tend to increase pedestrian crashes, while some land use predictor variables such as single family residential area, commercial center area, and neighborhood service district suggest a decreased likelihood of pedestrian crashes in the area. Tay et al. [43] used a multinomial logit model to identify factors determining the severity of vehicle-pedestrian crashes in South Korea. Collision data from 2006 were used, and the results showed that the probability of fatal injury increased with male drivers, middle-aged drivers, intoxication, female pedestrians, older pedestrians, geometric characteristics of the roadway, and weather conditions. Kim et al. [21] used a hierarchical binomial logistic model to model crash outcome probabilities at two-lane rural intersections in the state of Georgia. They concluded that crash models may benefit from hierarchical modeling of the variables. For their case, they found that the effect of geometric characteristics and environmental factors can be modeled using multilevel techniques.
Besides statistical models, data mining techniques have also been applied extensively to other traffic engineering aspects of pedestrian movement. Kim and Yamashita [22] used the K-means clustering approach to examine patterns of pedestrian-involved crashes. Their results indicated that the K-means algorithm has the potential to locate compact, localized clusters. Several machine learning models have also been widely studied in crash prediction problems. Siam et al. [40] used random forests to extract significant factors for different road crash clusters and then applied C5.0 to develop predictive models. Li et al. [24] applied a Support Vector Machine (SVM) model for traffic crash prediction in rural Texas. The results from the support vector machine models were compared with those of negative binomial regression models. The results from the study indicated that SVM models were more accurate and effective in predicting motor vehicle crashes. Santos et al. [37] used different machine learning models, such as decision trees (DT), random forests (RF), logistic regression (LR), and naive Bayes (NB), to develop models that could identify influential factors with the potential to predict the level of severity of a crash. Their results showed that Random Forest models are useful in predicting accident hotspots.
With the introduction of deep learning models, traffic safety researchers have now moved towards using deep learning models as a tool for crash prediction [6,14,35]. Ma et al. [27] used a stacked sparse autoencoder (SSAE) to predict the severity of injuries resulting from traffic crashes, with factors such as junction detail, road type, environmental characteristics, and weather conditions used as inputs for the model. Rahim and Hassan [33] proposed a convolutional neural network (CNN) approach to develop crash prediction models. They transformed features related to roadway, vehicle, human factors, and severity into images and used a customized loss function to optimize the model for precision and recall. They further suggested that recall and precision were the more critical evaluation metrics for traffic prediction models, since precision and recall penalize a model for ignoring the minority classes.
Although deep learning has mainly focused on unstructured data, most transportation-related data are structured. The TabNet architecture [5] is a transformer-based deep neural network recently proposed for use with tabular data. Given the success of transformer-based approaches in many application domains [6,9], TabNet makes an ideal candidate as a deep learning model to be trained and used for predicting vehicle-pedestrian fatal crashes from tabular features. In our review of the literature, we found little to no research that used TabNet as an architecture for crash prediction modeling.

METHODOLOGY

Datasets
Fatal traffic crash data for four years, from January 01, 2016, through December 31, 2019, were obtained from the Fatality Analysis Reporting System (FARS) maintained by the National Highway Traffic Safety Administration (NHTSA), although the FARS database contains fatal crash data from 1975 to date. We used only data from January 01, 2016, to December 31, 2019, for a few reasons: 1) while historical data may provide good insights in general, for cases of traffic crash fatalities more recent data matter more, as the conditions that lead to a traffic crash change over time, e.g., the conditions that caused fatal vehicle-pedestrian crashes back in 1975 may not exist anymore; 2) NHTSA changed the reporting format for these data in 2016, and there are discrepancies between the pre-2016 and post-2016 reporting formats; and 3) we wanted to avoid the impact of COVID-19-related biases in our analyses. A few input variables were removed from the dataset assembled for the period considered, specifically those that were highly collinear and were believed to have little impact on the interpretation of the model. Finally, thirty input variables were extracted for model development and testing. The variable "Harmful Event" was used as the output variable. There were 34 possible values for the output variable, including overturn, explosion, collision, railway train, and pedestrian. Since the scope of this research is limited to vehicle-pedestrian crashes, pedestrian-based crashes were coded as 1, while other types of crashes were coded as 0.
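The binarization of the output variable described above can be sketched as follows; the column name and category strings are illustrative stand-ins, not the actual FARS field encodings:

```python
import pandas as pd

# Illustrative records; the real output variable is "Harmful Event",
# which takes 34 possible values (overturn, explosion, pedestrian, ...).
crashes = pd.DataFrame(
    {"harmful_event": ["pedestrian", "overturn", "collision", "pedestrian"]}
)

# Pedestrian-based crashes are coded as 1, all other crash types as 0.
crashes["target"] = (crashes["harmful_event"] == "pedestrian").astype(int)
print(crashes["target"].tolist())  # [1, 0, 0, 1]
```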
All the variables were encoded using one-hot encoding, with a separate column for missing values in each categorical column, while missing values in numerical columns were replaced by the mean of the column. It should be noted that if a model predicted that a crash was not pedestrian related, there was still a fatal crash, but no pedestrians were involved; the crash was fatal to some other group of people. Traffic way identifiers describe the specific location and type of road or highway (Table 1).
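A minimal sketch of this preprocessing with pandas, assuming illustrative column names (the real dataset has thirty input variables); `dummy_na=True` produces the separate missing-value column for the categorical feature:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "light_condition": ["daylight", "dark", None, "daylight"],  # categorical
    "ems_minutes": [12.0, np.nan, 30.0, 18.0],                  # numerical
})

# Numerical columns: replace missing values with the column mean.
df["ems_minutes"] = df["ems_minutes"].fillna(df["ems_minutes"].mean())

# Categorical columns: one-hot encode, with a separate column for missing values.
df = pd.get_dummies(df, columns=["light_condition"], dummy_na=True)

print(df["ems_minutes"].tolist())  # [12.0, 20.0, 30.0, 18.0]
print(sorted(c for c in df.columns if c.startswith("light_condition")))
```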
The dataset used for developing and evaluating the models had 136,71 samples, which were divided into three subsets: train, development (also known as the validation subset), and test. The train subset consists of approximately 80% of the data and was used to train the models; the development subset consists of a further 10% of the data and was used for hyper-parameter tuning; and finally, the test subset consists of the remaining 10% of the data and was used to evaluate the performance of the trained models on cases unseen during training and development. The percentage of the dataset that had pedestrian-vehicle crashes as output is shown in Table 2.
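The 80/10/10 split can be obtained with two successive calls to scikit-learn's `train_test_split`; the toy arrays below stand in for the actual feature matrix and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)          # toy feature matrix
y = np.random.randint(0, 2, size=1000)      # toy binary labels

# First carve out 20% of the data, then split that portion half-and-half
# into development and test, yielding 80/10/10 train/dev/test proportions.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_dev), len(X_test))  # 800 100 100
```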

Deep Learning Model
3.2.1 TabNet. We chose TabNet as the preferred deep-learning algorithm for this research [5]. Although tabular data are one of the most common types of data, they have received minimal attention in the deep learning domain, with tree ensemble models still being preferred in most applications. However, TabNet, a transformer-type architecture, was shown to provide significant performance improvements due to its ability to exploit self-supervised pre-training on unlabeled data using a masked feature prediction task [25,31].
The TabNet architecture uses sequential attention to identify informative features at each step, a design choice that enables efficient training, as it reduces complexity by allowing the learning to focus on the most informative features. Another critical characteristic of the TabNet architecture is its ability to use raw unlabeled inputs in tabular form, which enables representation learning and eliminates the need for extensive data preprocessing. The models are trained using gradient descent-based optimization and can thus be easily integrated into end-to-end learning systems. Another vital aspect of the TabNet architecture is that it allows interpretability at both the local and global levels. Local interpretability enables one to visualize how important each feature is and how features are combined towards an individual prediction. In contrast, global interpretability, obtained by aggregating the weights of the local features, helps understand how each feature contributes to the predictions of the trained model overall. Local interpretability is very important in traffic safety modeling: traffic professionals seek to understand the essential features in the model so they can give the highest importance to these features while designing safety measures or applying countermeasures to a roadway that demonstrates a safety problem. The TabNet architecture is shown in Figure 1 and Figure 2.

Baseline Models
A review of existing literature found very limited previous studies in the transportation domain that could be used to identify community-accepted baseline models for our study. Hence, the research team used the most popular classification algorithms for tabular data as baseline classifiers to understand how the TabNet deep learning model results compare to the results of the baseline models.
3.3.1 KNeighbors Classifiers. The K-NN algorithm is often said to be one of the simplest algorithms in machine learning (Pandya, 2016). Often termed a lazy learner, it uses the Euclidean distance to measure how far apart instances are. For a new query, the algorithm finds the K training instances that are closest to the query instance. One of the biggest disadvantages of K-NN is that it is computationally expensive, because it does not build any model beforehand and requires re-examining the distances from all the training data to each new query [3].
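The nearest-neighbor vote can be illustrated with scikit-learn on toy one-dimensional data (class 0 clusters near 0, class 1 near 10):

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0.0], [0.5], [1.0], [9.0], [9.5], [10.0]]
y = [0, 0, 0, 1, 1, 1]

# "Lazy learner": fit() just stores the data; each query is answered by a
# Euclidean-distance vote among its K nearest training instances.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[0.7], [9.2]]))  # [0 1]
```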

Decision Tree Classifiers.
A decision tree is a tree-based knowledge representation methodology that has two main phases, termed the building phase and the pruning phase [18]. It works by successively partitioning the data into more and more homogeneous subsets by producing decisions, also called nodes. In the pruning phase, the complex tree is pruned to produce an output that is less complex yet has superior predictive capability [12].
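The building and pruning phases can be sketched with scikit-learn, using cost-complexity pruning (`ccp_alpha`) as one concrete pruning mechanism; the dataset here is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Building phase: grow a full tree. Pruning phase: cost-complexity pruning
# (ccp_alpha > 0) trades some training fit for a simpler tree.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X, y)

print(full.get_n_leaves(), pruned.get_n_leaves())  # pruned tree has no more leaves
```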

Random Forest.
The Random Forest model is a supervised ensemble learning model introduced by Ho [17]. The ensemble consists of a set of decision trees and has traditionally been used for both classification and regression problems. Decision trees have been the go-to models for tabular data learning due to their strong ability to pick informative features efficiently. Random Forest is very flexible and fast. The Random Forest algorithm has three main steps. In the first step, the algorithm performs bootstrap sampling. In the second step, a decision tree is constructed from the sample by iteratively selecting a set of best features based on the information gain criterion. The first and second steps are repeated until a pre-defined number of trees is reached. In the final step, the final output is obtained by taking the mean of the output values of all the generated trees [2]. Random Forest learning has been used extensively in traffic modeling and crash predictions [13,37,5].
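The three steps above map directly onto scikit-learn's `RandomForestClassifier` parameters; `criterion="entropy"` corresponds to the information gain criterion, and the data below are synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Step 1: bootstrap=True gives each tree a bootstrap sample.
# Step 2: criterion="entropy" selects splits by information gain, with
#         max_features limiting the candidate features per split.
# Step 3: the n_estimators trees are aggregated for the final output.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True,
                            criterion="entropy", max_features="sqrt",
                            random_state=0).fit(X, y)
print(rf.score(X, y))
```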

AdaBoost.
Adaptive boosting, commonly referred to as AdaBoost, was introduced by Freund & Schapire [15]. AdaBoost uses a small but efficient set of weak classifiers to construct a strong classifier. The data sample used to train each weak classifier is biased towards instances that were wrongly classified by previous classifiers. A weighted ensemble of the weak classifiers is used as the final classifier for making predictions on future data.

XGBoost.
[11] proposed Extreme Gradient Boosting (XGBoost), an implementation of gradient boosting, which in turn is an algorithm similar to the Random Forest algorithm, with two main differences. First, each new data sample that is used to construct a new tree in the ensemble is focused on harder examples that are not well classified by the previously constructed trees. Second, the trees are trained based on arbitrary differentiable loss functions, which are optimized using gradient descent. XGBoost has been a highly successful type of machine learning model for tabular data. It is also highly scalable and can process billions of examples efficiently. Another essential feature of the XGBoost model is its ability to handle sparse data by visiting only data with non-missing values to learn an optimal default direction for a node split, thus lowering the computational cost. Another significant feature of XGBoost is the use of column blocks for parallel learning; XGBoost uses a memory unit to store data in a compressed column format [2]. Like Random Forest, XGBoost has also been used in traffic modeling and crash prediction scenarios [19,44].
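The reweighting idea behind AdaBoost can be sketched with scikit-learn, whose default weak learner is a one-level decision stump; the data are synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each successive weak classifier (a depth-1 decision stump by default) is
# trained with sample weights biased toward instances misclassified by the
# previous classifiers; the final prediction is their weighted vote.
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(ada.score(X, y))
```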

GaussianNB.
GaussianNB is similar to naïve Bayes classification, with the only difference being that GaussianNB assumes the likelihood of each feature to be Gaussian. The naïve Bayes algorithm is based on the Bayes theorem and makes the assumption that all features contribute independently to the probability of the target outcome (i.e., features are independent given the target outcome). The parameters of the algorithm are estimated using the maximum likelihood method.

Support Vector Machine.
The Support Vector Machine (SVM), introduced by Boser et al. (1992) [9], is one of the most well-known traditional machine learning techniques used in regression and classification problems. Due to its more comprehensive theoretical guarantees, the SVM can result in prediction performance that is superior to other conventional techniques [6]. More specifically, the SVM technique aims to find the separating hyperplane that maximizes the margin between the hyperplanes defined by the support vectors of two linearly separable classes. When the classes are not linearly separable, the SVM model works by mapping the input into a high-dimensional feature space where the data become linearly separable [47]. Given its performance in other application domains, SVM has generated tremendous interest among researchers in the transportation engineering domain [24,4].
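The effect of the kernel mapping for non-linearly-separable classes can be demonstrated with scikit-learn on synthetic concentric circles, which no linear hyperplane can separate:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles are not linearly separable in the input space; the RBF
# kernel implicitly maps them to a space where a separating hyperplane exists.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)
print(linear.score(X, y), rbf.score(X, y))  # rbf scores far higher
```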

Classifiers Used.
A variety of classifiers were trained and evaluated on the dataset assembled from the FARS database, including k-Nearest Neighbors, Decision Trees, Random Forests, AdaBoost, XGBoost, Gaussian Naïve Bayes (GaussianNB), Support Vector Machines (SVM), and finally, our proposed model, a deep learning model based on the TabNet architecture. The TabNet model was implemented in Python using the TabNet-PyTorch library. Python and Scikit-learn libraries were used for all the other models.

Model Training.
The TabNet model was trained in Python using cross-entropy as the loss function, with a batch size of 8192 and a maximum of 1500 epochs. Adam was used as the optimizer, with a momentum value of 0.3. A sparsity loss coefficient of 1e-4 was used for feature selection. The learning rate was changed periodically during training using the PyTorch scheduler. The virtual batch size used for ghost batch normalization was set to 1024. The XGBoost model was trained in Python using the XGBoost library with the following hyperparameters: learning rate = 0.1, gamma = 0, max depth = 5, subsample = 0.7, and default values for the rest. All the other models were trained in Python using the sklearn library, and default hyperparameters were used after a small exploration of fine-tuning showed no significant improvements.

Performance Evaluation.
The performance of the models was evaluated using several standard machine learning metrics, such as accuracy, precision, recall, F1 score, and the area under the ROC curve (AUC). Precision was found using Eq 1. True positives are instances that the model correctly predicts to be in the positive class. A false positive is an instance that the model wrongly predicts to be in the positive class.

Precision = True Positives / (True Positives + False Positives)   (1)

Precision in our case measures the number of correct instances of a fatal vehicle-pedestrian crash over the total number of instances predicted as vehicle-pedestrian crashes by the model. Precision measures the positive predictive value. Precision is an essential metric for our safety models because it ensures that we correctly identify crashes that have the potential to occur without including too many false positives. Recall is considered to be the most important metric for safety models out of all the metrics used in our evaluation. This is because prioritizing high recall means we catch most instances of fatal vehicle-pedestrian crashes (even if it includes some errors), essentially minimizing the number of cases the model misses, or false negatives. Recall was calculated using Eq 2.

Recall = True Positives / (True Positives + False Negatives)   (2)
The F1 score is the harmonic mean of precision and recall. It balances recall and precision, the two most important metrics for our model. The F1 score was calculated using Eq 3.

F1 = 2 × (Precision × Recall) / (Precision + Recall)   (3)
Area Under the ROC Curve (AUC) is another important metric that is frequently used to evaluate the performance of a machine learning model, as it captures the tradeoff between the false positive rate (i.e., 1 - specificity) and the true positive rate (i.e., sensitivity) over a whole range of classification thresholds on the predicted class probability (as opposed to the F1 score, which captures just the 0.5 threshold). The ROC curve is obtained by plotting the true positive rate against the false positive rate for different classification thresholds, and AUC measures the area under this curve. The True Positive Rate (TPR) and False Positive Rate (FPR) are found using Eq 4 and Eq 5.

TPR = True Positives / (True Positives + False Negatives)   (4)
FPR = False Positives / (False Positives + True Negatives)   (5)

One major problem with deep learning models in general is that, although they have an excellent capability to capture nonlinear relationships in data, it is hard to understand which factors have a significant impact on the model. TabNet, on the other hand, combines the benefits of deep learning models as global function approximators with the interpretability advantage of traditional machine learning algorithms such as Decision Trees or Random Forests. TabNet can show the contribution of each sample's features to the fatal vehicle-pedestrian crash prediction and also show each feature's global contribution to the output. Gaining insights into the key contributing factors enables the identification of factors that escalate the likelihood of crashes and fatalities. This understanding is pivotal in identifying effective traffic safety measures. Furthermore, the outcomes have the capacity to reveal whether risks are predominantly linked to human behavior, vehicle attributes, or environmental factors. Such insights can guide the selection and implementation of tailored safety interventions. The interpretability features also have the potential to uncover deficiencies in current road conditions while simultaneously highlighting elements that heighten the chances of collisions. This valuable information equips agencies with the knowledge needed to implement precise safety measures.
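All four evaluation metrics can be computed with scikit-learn; the labels and predicted probabilities below are hypothetical illustrations, not model output:

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# y_true: actual pedestrian-crash labels; y_prob: predicted probability
# of the pedestrian class for each instance.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]
y_pred = [int(p >= 0.5) for p in y_prob]   # precision/recall/F1 use the 0.5 threshold

print(precision_score(y_true, y_pred))     # 0.75
print(recall_score(y_true, y_pred))        # 0.75
print(f1_score(y_true, y_pred))            # 0.75
print(roc_auc_score(y_true, y_prob))       # 0.9375 (sweeps all thresholds)
```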

RESULTS
The training and development subsets of the data were used to train and fine-tune the different models discussed above. The test subset, which comprises 10% of the data, was then used to evaluate the performance of the models. The TabNet model was the main model of the study, while the other models were used as baselines. The performance metrics obtained with the different models for the vehicle-pedestrian crash class are listed in Table 3. It can be seen that the TabNet model exhibited the highest recall among all the models evaluated. As mentioned above, recall is the most critical metric when considering a crash prediction model because a higher recall ensures fewer chances that a case of a fatal vehicle-pedestrian crash is missed. In a previous study, Rahim and Hassan [33] also used recall as the main metric to evaluate model performance. Our recall value of 81.3% is slightly higher than the one obtained by Rahim and Hassan [33]; however, the data used in the two studies were different, and a direct comparison is not feasible.
Table 3 indicates that precision is highest for the Random Forest classifier. However, TabNet outperforms all the other models not just in terms of recall but also in terms of AUC score. One of the important features of the TabNet model is its ability to produce interpretable results and thus provide more insight into usability. Figure 4 shows the features that contributed most to the model's predictions. It can be seen that the Traffic Way Identifier (feature 11 in Table 1) had the most impact, i.e., certain highways had a higher probability of fatal vehicle-pedestrian crashes. Traffic planners can use this information to develop a safety protocol for the highways that show this behavior. Besides the Traffic Way Identifier, the EMS minute of arrival at the hospital was also one of the most significant factors. The third most important factor identified was the feature corresponding to Drunk Driver. Law enforcement agencies can use this information to prevent fatal pedestrian crashes. Other important variables included the Number of Vehicles involved, City, Hour, Land Use, and Light Condition. Intuitively, the model does a good job of providing an overall idea of the most critical factors that may need to be taken into consideration when designing a traffic way and a safety protocol.
We repeated the previous experiment using the same setup for the TabNet model with only one difference: instead of merging and randomly dividing the four years of data (2016, 2017, 2018, and 2019) into training, development, and test subsets, we used the first three years for training and development and the 2019 data as the test set. The results of predicting fatal vehicle-pedestrian crashes were slightly higher than the previous results, as shown in Table 4. We can infer from these results that the data distribution is similar across the years and that models trained on data from previous years are good enough to predict pedestrian fatal crashes in the future. By identifying some of the causes of vehicle-pedestrian fatal crashes, the responsible authorities could use this knowledge to take the steps needed to enhance roadway safety.

Ensemble Model
Soft voting and hard voting ensemble models [16] were further trained using three pre-trained models from our previous experiment. The hard ensemble predicts the class that gets the highest number of votes from the member models. In contrast, the models in the soft ensemble each predict a probability for every class, and the ensemble uses the mean of these probabilities, with a threshold of 50%, to decide the class. For example, if class 1 gets a 40% probability and class 0 gets 60%, then the ensemble predicts 0. The threshold can also be adjusted, e.g., to get a higher recall, we can choose a lower threshold. The pre-trained models to be included in the ensemble were chosen according to their performance. First, we chose the three models with the highest F1 score; then, we chose the three models with the highest recall score; and lastly, we chose the three models with the highest precision value. Table 5 and Table 6 list the performance metrics for the soft and hard ensemble models, respectively. Given that recall is more important in our evaluation, we also built ensemble models that include the two best models in terms of recall and the best one in terms of precision. Using the pre-trained models with the highest recall scores (TabNet, XGBoost, and Random Forest) to build the soft ensemble model led to the best result and showed a significant improvement over all the other models, with a 94.04% recall score and close to a 78% F1 score.
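The soft-voting scheme with an adjustable threshold can be sketched as follows; the three scikit-learn classifiers are stand-ins for the paper's pre-trained members (TabNet, XGBoost, Random Forest), since any fitted models with `predict_proba` work the same way:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.8], random_state=0)

# Stand-in ensemble members; in the paper these are the three pre-trained
# models with the highest recall scores.
models = [RandomForestClassifier(random_state=0).fit(X, y),
          GradientBoostingClassifier(random_state=0).fit(X, y),
          LogisticRegression(max_iter=1000).fit(X, y)]

# Soft voting: average the members' class-1 probabilities, then threshold.
# Lowering the threshold below 0.5 trades precision for recall.
mean_prob = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
for thr in (0.5, 0.3):
    print(thr, int((mean_prob >= thr).sum()), "predicted positives")
```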

CONCLUSIONS AND RECOMMENDATIONS
Vehicle-pedestrian fatal crash prediction is an essential component in different aspects of road safety. The main objective of road safety practitioners is to reduce fatalities on roads. Statistical models remain a crucial tool in developing crash prediction models; however, as more computing power becomes available, deep learning models have the potential to improve prediction models for road safety. A more accurate crash prediction model can aid essential decision-making processes. Although deep learning models have been employed for crash prediction in the last few years, one of the most critical components lacking in all the previous research is interpretability, i.e., identifying which factors have the highest effect on the outcome. TabNet can depict the most significant contributing factors to the output.
The developed model can aid the decision-making process for different stakeholders in the transportation sector. Legislators can use this tool to evaluate the effectiveness of any new project they might be planning to initiate by testing the model with the conditions of the infrastructure they plan to build. Traffic safety planners can determine what safety measures might be needed to avoid a vehicle-pedestrian crash. Transportation engineers may use this tool to ensure that a design takes into account all the factors that may be required to prevent pedestrian-vehicle fatalities. Finally, government agencies may use this tool to make decisions on future projects that may be needed. Overall, with improvements, the model has the potential to be a go-to tool for all stakeholders in the transportation sector. There is considerable potential for improving this model. Among other things, more data can be added from the FARS database so that a better model, with more sources of uncertainty accounted for, can be developed.

Figure 1: Single step (top) as well as the overall multi-step layout of the TabNet architecture (bottom). FC represents a fully connected layer, while BN stands for a batch normalization layer.

Figure 2: Layouts of the Attentive Transformer (left) and the Feature Transformer (right) used in TabNet. Black circles denote element-wise multiplication. The dashed lines in the feature transformer come from the prior step. FC represents a fully connected layer, BN stands for a batch normalization layer, and GLU stands for Gated Linear Unit non-linearity.

Figure 3: AUC score of the TabNet model as a function of the number of training iterations, as measured on the training and development subsets.

Model Interpretability
A significant need for transportation engineers and planners is to understand which factors impact the output.
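TabNet's interpretability comes from the sparse attention masks its attentive transformer produces at each decision step; a global importance per feature can be obtained by aggregating these masks. The sketch below illustrates that aggregation only; the mask values and feature names are invented for illustration and are not output of the trained model or the TabNet library API.

```python
# Sketch of aggregating TabNet-style per-step attention masks into global
# feature importances. Mask values and feature names are illustrative only.

def aggregate_importances(step_masks, feature_names):
    """step_masks: one list of per-feature attention weights per decision step.
    Sums the weights across steps and normalizes so importances add up to 1."""
    totals = [0.0] * len(feature_names)
    for mask in step_masks:
        for j, weight in enumerate(mask):
            totals[j] += weight
    total_sum = sum(totals)
    return {name: t / total_sum for name, t in zip(feature_names, totals)}

features = ["light_condition", "roadway_geometry", "driver_impaired", "land_use"]
masks = [
    [0.1, 0.6, 0.2, 0.1],   # step 1 attends mostly to roadway geometry
    [0.5, 0.1, 0.3, 0.1],   # step 2 attends mostly to light condition
]
importances = aggregate_importances(masks, features)
```

A ranking of such aggregated importances is what a plot like Figure 4 visualizes: features whose masks receive weight across many decision steps contribute most to the prediction.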

Figure 4: The contribution of different factors to fatal vehicle-pedestrian crash prediction.

Figure 3 illustrates the AUC score of the TabNet model as a function of the number of training iterations, using the training (blue curve) and development (orange curve) subsets. The result shows that the model has converged: the development AUC does not change much after approximately 30 iterations, while the training AUC continues to increase. A final AUC score of 91% on the test dataset indicates that the model can accurately distinguish between fatal vehicle-pedestrian crashes and other fatal crashes.
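The AUC reported above can be computed with the rank-based (Mann-Whitney) formulation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The labels and scores below are illustrative, not the paper's data.

```python
# Minimal AUC computation via the Mann-Whitney formulation: count, over all
# positive/negative pairs, how often the positive example scores higher
# (ties count as half a win). Labels and scores are illustrative.

def auc_score(labels, scores):
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.5, 0.4, 0.6, 0.8, 0.2]
auc = auc_score(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a value near 0.91 indicates the model separates the two crash classes well.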

Table 1: Input variables for the models

Table 2: The output variable of the models

Table 3: Performance metrics for the vehicle-pedestrian crash class for all models considered

Table 4: Model Evaluation Metrics for TabNet when the model is trained on 2016-2018 data and tested on 2019 data

Furthermore, the model may be extended to a multi-class problem where all types of crashes can be predicted.

Table 5: Performance metrics for the soft voting ensemble models

Table 6: Performance metrics for the hard voting ensemble models