A Fundamental Model with Stable Interpretability for Traffic Forecasting

Deep learning models have been widely applied in traffic prediction and analysis. Notably, attention-based models such as the Graph Attention Network (GAT) have provided traffic managers with significant insights and decision-making capabilities through their interpretability. However, attacks on the sensor networks that traffic prediction relies on can introduce severe disturbances and uncertainties into the interpretability of models, leading to erroneous judgments by managers. To address this issue, we propose a definition of fundamental models with stable interpretability. In this paper, we first review existing attention-based interpretable models in traffic prediction and analysis. Subsequently, we introduce and define the conditions that such a fundamental model should meet in terms of accuracy, interpretability, and stability of interpretability. Finally, we discuss the opportunities and potential development directions in traffic forecasting and analysis. We envision that such a model will establish a solid foundation for the safe deployment and application of interpretable models in real-world transportation management systems.


INTRODUCTION
Transportation plays a pivotal role in daily life, exerting a significant influence on many aspects of society. In this context, accurate and real-time traffic forecasting is of immense significance for road users, private sectors, and governments. Essential transportation services such as flow control, route planning, and navigation rely heavily on precise evaluations of traffic conditions [15]. Furthermore, multiscale traffic forecasting serves as the fundamental basis for effective urban traffic control, guidance, and Intelligent Transportation Systems (ITS) [13].
Recently, a large number of graph neural networks (GNNs) [3,8,15] have been proposed to capture spatial dependencies and representations among nodes. These methods model the transportation network by representing the sensor nodes within the network as vertices in a graph, and they have been shown to improve the accuracy of traffic flow forecasting.
Furthermore, in real-world scenarios, both the forecasting performance of a model and the interpretability of its prediction process are critical [7]. By leveraging interpretable models, traffic management authorities can make informed decisions regarding infrastructure improvements, route optimizations, and traffic control strategies, thereby enhancing overall traffic flow efficiency and mitigating congestion [18,32]. The Graph Attention Network (GAT) and its derived models have been widely employed for interpretability in various domains [11,14]. Building upon previous work [8], the attention mechanism in GNN models has demonstrated the ability to effectively capture the impact of different neighboring nodes on the target prediction node, particularly during different time periods. This is why attention weights are often regarded as providing insight into the "inner workings" of the model [12,30]. By quantifying the dependencies among traffic nodes, they provide valuable insights for traffic management departments, aiding future design and planning. For example, for primary road A, determining which specific branches to expand can have a substantial impact on alleviating congestion during peak hours.
However, in real-world environments, there are perturbations caused by attacks on sensor networks (such as Poisson attacks) [1,21,22] and noisy data caused by device failures. Beyond the heavily deviated data that anomaly detection can identify, small changes in contaminated data can easily have adverse effects on the training of online learning models, leading to decreased model performance and erroneous interpretability. Here, erroneous interpretability means that the weights of the model are affected, resulting in a disparity between the information conveyed by the model and the actual conditions. This discrepancy can potentially lead to misleading suggestions for transportation managers. Therefore, it is of significant value and importance to explore and research models that can maintain stable predictions while preserving stable and correct interpretability. A further illustration of the differences among the three types of models is given in Fig. 1. A vanilla model such as STGCN [31] tends to exhibit degraded performance due to the impact of noisy data. While a stable model [7] ensures the stability of prediction accuracy, it cannot maintain the stability of the learned model parameters, resulting in significant changes in interpretability. In contrast, a model with stable interpretability can ensure both high performance and the stability of model weights. It provides reliable and accurate insights even in the presence of adversarial attacks or anomalous data, enabling traffic management authorities to make well-informed decisions based on trustworthy information.

INTERPRETABLE MODELS IN TRAFFIC FORECASTING
A common definition of interpretability is the degree to which an observer can understand the underlying reasons behind a decision [17]. In traffic forecasting and other traffic analysis domains, the interpretability of a model is demonstrated by its capability to explain the correlations between road segments and the causal relationship between these correlations and changes in traffic flow [29,30]. GNNs have gained widespread application in spatiotemporal time series prediction [3,13], primarily due to their ability to learn spatial correlations using either automatically learned or pre-defined adjacency matrices [2,8,9]. Consequently, by harnessing the interpretability of GNNs, it becomes possible to intuitively visualize the correlations between nodes in the road network. Recently, a large number of approaches have been proposed to improve explanations for GNNs [16,24,27]. Several studies focus on gaining insights into the underlying factors that drive predictions and on improving the understanding of relationships among traffic nodes [7,28]. All these methods employ attention mechanisms [23] to enable the model to perceive the importance weights of neighboring nodes. The intuition behind this mechanism can be explained by analogy to human biological systems, where we tend to utilize the most pertinent portions of the input while disregarding other, irrelevant parts.

Before we clarify the definition of a model with stable interpretability, we first explain the attention mechanism in the Graph Attention Network (GAT), following the notation in [25]. In a traffic forecasting task, we take historical traffic volume data from various traffic nodes over a certain number of time slots as input and predict the future traffic volume of all traffic nodes for one or several upcoming time slots. The historical traffic volumes are set as the input features of the nodes, $\mathbf{h} = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}$ with $\vec{h}_i \in \mathbb{R}^{F}$, where $N$ denotes the total number of nodes. The input features are fed into the GAT layer, which produces a new set of features $\mathbf{h}' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}$ with $\vec{h}'_i \in \mathbb{R}^{F'}$. Then, the attention coefficients are calculated by a self-attention mechanism on the nodes, $e_{ij} = a\!\left(\mathbf{W}\vec{h}_i, \mathbf{W}\vec{h}_j\right)$, where $\mathbf{W} \in \mathbb{R}^{F' \times F}$ is the weight matrix of a shared linear transformation applied to each node, and $e_{ij}$ indicates the importance of node $j$'s features to node $i$. In practice, a softmax function is used to normalize the coefficients across all choices of $j$, that is, $\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \exp(e_{ij}) \big/ \sum_{k \in \mathcal{N}_i} \exp(e_{ik})$. In the following section, we use the attention score $\alpha_{ij}$ to define the stable interpretation of graphs.
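To make the computation above concrete, the following is a minimal, illustrative sketch of single-head GAT attention in Python (NumPy). The function and variable names are ours, and the scoring function follows the standard GAT formulation of [25] rather than any specific traffic model.

```python
import numpy as np

def gat_attention(h, W, a, neighbors):
    """Illustrative single-head GAT attention (a sketch, not the paper's implementation).

    h         : (N, F) array of input node features (e.g., historical traffic volumes)
    W         : (F, F') shared linear transformation
    a         : (2*F',) attention parameter vector
    neighbors : dict mapping node i -> list of neighbor indices of i
    Returns a dict mapping each node i to its softmax-normalized coefficients alpha_ij.
    """
    z = h @ W                                    # transformed features W h_i, shape (N, F')
    alpha = {}
    for i, nbrs in neighbors.items():
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for each neighbor j of i
        e = np.array([np.concatenate([z[i], z[j]]) @ a for j in nbrs])
        e = np.where(e > 0, e, 0.2 * e)          # LeakyReLU, negative slope 0.2
        e -= e.max()                             # for numerical stability
        exp_e = np.exp(e)
        alpha[i] = exp_e / exp_e.sum()           # softmax over the neighbors of i
    return alpha

# Tiny usage example with random data: 4 nodes, 3 input features, 2 hidden features.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
a = rng.normal(size=(4,))
print(gat_attention(h, W, a, {0: [0, 1, 2], 1: [0, 1, 3]}))
```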

THE DEFINITION OF STABLE INTERPRETABILITY MODEL
We first define the attention vector for each node. We denote the attention vector $\alpha_i = \{\alpha_{i1}, \alpha_{i2}, \ldots, \alpha_{in_i}\}$ for node $i$, where $n_i$ is the number of node $i$'s neighbors. In what follows, we omit the subscript $i$ and write the attention vector of a node simply as $\alpha$, and we denote the model output by $y$.
Note that we use "vanilla attention" to refer to the attention mechanism in GAT. Our goal is to find a substitute for vanilla attention that makes the node interpretation more stable. A stable attention should preserve the interpretation given by the vanilla attention. In GAT, the rank of each entry in the attention vector determines the importance of its associated neighbor. We also wish to increase the sparsity of the neighbor weights in case some nodes have too many neighbors. Furthermore, to preserve the order of the leading entries, we can use the overlap of the top-$k$ indices between the stable attention and the vanilla attention to measure their similarity in interpretability, where $k$ is a hyperparameter.

Definition 1 (Top-$k$ overlap). For a vector $w \in \mathbb{R}^{d}$, we define the set of its top-$k$ components as $T_k(w)$. For two vectors $w, w'$, the top-$k$ overlap function $V_k(w, w')$ is defined as the overlapping ratio between the top-$k$ components of the two vectors, i.e., $V_k(w, w') = \frac{1}{k}\,|T_k(w) \cap T_k(w')|$.

Also, $\alpha$ can be seen as a function of $h$ in the attention mechanism; thus, $\mathbf{w}$ can also be seen as a function of $h$. Moreover, since we are only concerned with replacing the attention vector, we keep the rest of the model unchanged except for the procedure that produces the vector $\mathbf{w}(h)$.
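As a small illustration of Definition 1, the following Python sketch computes the top-$k$ overlap; the function names are ours and ties are resolved arbitrarily by the sort.

```python
import numpy as np

def top_k_indices(w, k):
    """T_k(w): the set of indices of the k largest entries of w."""
    return set(np.argsort(w)[-k:].tolist())

def top_k_overlap(w, w_prime, k):
    """V_k(w, w') = |T_k(w) ∩ T_k(w')| / k, the top-k overlap of Definition 1."""
    return len(top_k_indices(w, k) & top_k_indices(w_prime, k)) / k

# Example: a stable attention that only reorders the tail keeps a perfect top-2 overlap.
vanilla = np.array([0.50, 0.30, 0.12, 0.08])
stable  = np.array([0.45, 0.35, 0.05, 0.15])
print(top_k_overlap(vanilla, stable, k=2))  # 1.0
```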
We define a Stable Attention-based Model (SAM) as follows.

Definition 2 (Stable Attention-based Model). We call a vector $\mathbf{w}$ a $(k_1, D_1, D_2, \beta_1, \beta_2, \gamma_1, \gamma_2)$-Stable Interpretable Attention for the vanilla attention $\alpha$ if, for any input $h$, it satisfies
• (Similarity of Interpretability) the top-$k_1$ overlap between $\mathbf{w}$ and $\alpha$ is at least $\beta_1$;
• (Stability of Interpretability) for any perturbation within the robust region $D_1$, the top-$k_1$ overlap between the perturbed and unperturbed $\mathbf{w}$ is at least $\beta_2$;
• (Similarity of Prediction) the distance between the prediction distribution based on $\mathbf{w}$ and that based on $\alpha$ is at most $\gamma_1$;
• (Stability of Prediction) for any perturbation within the robust region $D_2$, the distance between the prediction distributions based on the perturbed and unperturbed $\mathbf{w}$ is at most $\gamma_2$.
Note that the definition involves several parameters and captures two properties, similarity and stability, for interpretability and prediction, respectively.
The first two conditions concern the similarity and stability of the attention. Note that in the interpretability setting we want not only stable explanations but also sensitive ones, which is quite different from prediction: the attention should be sensitive when important features are removed, yet stable under small perturbations. We use the top-$k$ overlap to capture this characteristic and to ensure that $\mathbf{w}$ has interpretability similar to vanilla attention. The similarity condition involves two parameters, $k_1$ and $\beta_1$. $k_1$ can be considered prior knowledge, i.e., we believe that the top-$k_1$ indices of the attention play the most important role in making the prediction, or that their corresponding $k_1$ features almost determine the prediction. $\beta_1$ measures how much interpretability $\mathbf{w}$ inherits from vanilla attention; when $\beta_1 = 1$, the set of top-$k_1$ entries of $\mathbf{w}(h)$ is the same as that of the vanilla attention. Thus, $\beta_1$ should be close to 1. The stability condition involves two parameters, $D_1$ and $\beta_2$, which correspond to the robust region and the level of stability, respectively. Ideally, if $\mathbf{w}$ satisfies this condition with $D_1 = \infty$ and $\beta_2 = 1$, then $\mathbf{w}$ is extremely stable w.r.t. any randomness or perturbation. Thus, in practice, we wish $D_1$ to be as large as possible and $\beta_2$ to be close to 1.
The last two conditions concern the similarity and stability of the prediction based on the attention. In the third condition, $\gamma_1$ measures the closeness between the prediction distribution based on $\mathbf{w}$ and the prediction distribution based on vanilla attention; when $\gamma_1 = 0$, then $\mathbf{w} = \alpha$. Therefore, we hope $\gamma_1$ is as small as possible. Similarly, the stability condition involves two parameters, $D_2$ and $\gamma_2$, which correspond to the robust region and the level of stability, respectively. Ideally, if $\mathbf{w}$ satisfies this condition with $D_2 = \infty$ and $\gamma_2 = 0$, then $\mathbf{w}$ is extremely stable w.r.t. any randomness or perturbation. Thus, in practice, we wish $D_2$ to be as large as possible and $\gamma_2$ to be sufficiently small.
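To make these four conditions concrete, the following Python sketch performs an empirical check. It is illustrative only and rests on our own assumptions: the prediction-distribution distance is taken as an $\ell_1$ (total-variation-style) distance, since the definition above does not fix a particular metric; the robust regions $D_1$ and $D_2$ are approximated by a finite sample of perturbed inputs, whereas the definition is worst-case; and `top_k_overlap` is the helper from the earlier sketch.

```python
import numpy as np

def satisfies_stable_attention(x, alpha, w_fn, predict_fn, perturbed_inputs,
                               k1, beta1, beta2, gamma1, gamma2):
    """Empirically check the four conditions of Definition 2 (illustrative only).

    x                : clean input
    alpha            : vanilla attention vector computed on x
    w_fn             : candidate stable attention as a function of the input, w(h)
    predict_fn       : prediction distribution as a function of (input, attention vector)
    perturbed_inputs : samples drawn from the robust regions around x
    """
    w = w_fn(x)
    p_w, p_alpha = predict_fn(x, w), predict_fn(x, alpha)

    sim_interp  = top_k_overlap(w, alpha, k1) >= beta1                        # similarity of interpretability
    stab_interp = all(top_k_overlap(w_fn(xp), w, k1) >= beta2                 # stability of interpretability
                      for xp in perturbed_inputs)
    sim_pred    = np.abs(p_w - p_alpha).sum() <= gamma1                       # similarity of prediction
    stab_pred   = all(np.abs(predict_fn(xp, w_fn(xp)) - p_w).sum() <= gamma2  # stability of prediction
                      for xp in perturbed_inputs)
    return sim_interp and stab_interp and sim_pred and stab_pred
```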
Based on these discussions, we can see that Definition 2 is consistent with our intuition about stable attention and is therefore reasonable.

OPPORTUNITIES IN MODELS WITH STABLE INTERPRETABILITY
In this section, we envision the primary challenges for models with stable interpretability in traffic forecasting tasks.Additionally, we explore the possibilities of applying this framework to various traffic analysis tasks and discuss the prospects of building a faithful model for real-world scenarios based on the stable model.
Varying constraints on stable-interpretability models in different scenarios. Traffic prediction tasks in different scenarios have distinct performance requirements [29], which place varying degrees of emphasis on interpretability. According to the general definition of a stable interpretability model proposed in the third section, we assign varying degrees of importance to four specific learning objectives. For a specific task, the goal is to ensure that the model meets practical requirements in terms of prediction accuracy, interpretability, and stability of interpretability. To achieve this, an approximate loss function that accounts for these three aspects is designed for training, as sketched below.
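As one possible instantiation, the following PyTorch-style sketch combines the three aspects into a single surrogate loss. The weighting scheme, the $\ell_2$ surrogates for the (non-differentiable) top-$k$ overlap, and all names are our own assumptions rather than the paper's training objective.

```python
import torch
import torch.nn.functional as F

def stable_interpretability_loss(y_pred, y_true, w, alpha, w_perturbed,
                                 lambdas=(1.0, 0.1, 0.1)):
    """A surrogate training loss balancing accuracy, interpretability, and its stability.

    y_pred, y_true : predicted and ground-truth traffic volumes
    w              : candidate stable attention computed on the clean input
    alpha          : vanilla attention on the clean input (treated as a fixed target)
    w_perturbed    : candidate attention computed on a perturbed copy of the input
    lambdas        : scenario-dependent weights for the three objectives
    """
    lam_acc, lam_interp, lam_stab = lambdas
    acc_loss    = F.mse_loss(y_pred, y_true)           # prediction accuracy
    interp_loss = F.mse_loss(w, alpha.detach())        # stay close to vanilla attention
    stab_loss   = F.mse_loss(w_perturbed, w.detach())  # attention stability under perturbation
    return lam_acc * acc_loss + lam_interp * interp_loss + lam_stab * stab_loss
```

Scenario-dependent weighting lets, for example, a congestion-management deployment emphasize stability of interpretability, while a pure forecasting service emphasizes accuracy.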
Generalizability of stable interpretability across different types of noise. Another critical challenge is to endow the model with the ability to maintain stable interpretability in various attack and noise environments [4]. Previous work on building robust GNNs has primarily focused on specific noise environments, such as input data protected under differential privacy or data subject to specific Poisson attacks. However, in real-world environments, the noise contaminating the data is uncertain. In this regard, we can extend approaches for preventing catastrophic forgetting, for example the lottery ticket hypothesis [6] and REMIND [10], to hold on to stable weights trained from normal data, thereby achieving a spatial-temporal graph neural network that maintains stable interpretability in complex noise environments.
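One simple member of this family of techniques, distinct from the lottery-ticket and REMIND mechanisms cited above, is a weight-anchoring regularizer in the spirit of elastic weight consolidation: a penalty that discourages parameters from drifting away from the values learned on benign data. The sketch below is illustrative, and all names are hypothetical.

```python
import torch

def weight_anchoring_penalty(model, anchor_params, importance, strength=1.0):
    """Penalize drift from weights learned on clean (benign) data.

    anchor_params : dict of parameter name -> snapshot tensor saved after benign training
    importance    : dict of parameter name -> per-parameter importance weights
    """
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        if name in anchor_params:
            penalty = penalty + (importance[name] * (p - anchor_params[name]) ** 2).sum()
    return strength * penalty
```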
Model with stable interpretability in other traffic analysis tasks. The interpretability of models is required not only in traffic forecasting but also in other areas of traffic analysis, such as travel time prediction, traffic accident prediction, and travel mode classification [5,19,26,33]. Building stable interpretability is necessary for these domains as well. For example, in traffic accident prediction, after the model issues an accident alert, the interpretability and stability of the model can provide consistent information to traffic managers and analysis teams, informing them about which nodes' traffic flow changes could reduce the likelihood of accidents. Attention-based graph neural networks can also be applied in these domains and provide interpretability. Building on such models, we can further develop stable models that maintain interpretability even in the presence of noise.
From stable interpretability to faithful interpretability. Although the attention mechanism has laid a foundation for interpretability in deep learning, current models primarily explain the correlations between nodes, while the causal relationships between nodes still require further modeling. Similarly, it is worth exploring how these abstract graph nodes, which represent the correlations between roads, can be related to other factors in real-world traffic scenarios [20,30], such as the number of lanes, road width, and the density of Points of Interest (POI). By incorporating relevant factors from real-world environments during the model training process, we can evolve stable interpretability into faithful interpretability. Faithful interpretability not only enhances the robustness of the model to noisy data but also establishes stable connections with elements present in the real environment.

CONCLUSION
In this paper, we address the issue of existing interpretable models being susceptible to real-world noise and propose a model with stable interpretability, along with four explicit objective functions that define its general framework. Furthermore, we explore and elaborate on the refinement of forecasting models in different scenarios, the potential application of this model in other traffic analysis tasks, and strategies to ensure reliable, stable interpretability. This research direction holds great promise for implementation in real-world environments, particularly in the development of intelligent transportation systems that are complex, noisy, and required to comply with artificial intelligence regulations.

Figure 1: An example demonstrating the variation of model weights between training on a benign dataset and training on a perturbed dataset for neighboring nodes. In this example, the red nodes represent the time series prediction targets, while the green nodes represent the neighboring nodes of the prediction targets. The color intensity illustrates the absolute value of the weights, with darker shades indicating larger weights. It can be observed that the weights of the ordinary and stable models undergo significant changes across different training sets, whereas the weights of the model with stable interpretability remain nearly unchanged. A smaller RMSE indicates better model performance.