Uncertainty Quantification via Spatial-Temporal Tweedie Model for Zero-inflated and Long-tail Travel Demand Prediction

Understanding Origin-Destination (O-D) travel demand is crucial for transportation management. However, traditional spatial-temporal deep learning models struggle to handle the sparse and long-tail characteristics of high-resolution O-D matrices and to quantify prediction uncertainty. This dilemma arises from the numerous zeros and over-dispersed demand patterns within these matrices, which challenge the Gaussian assumption inherent to deterministic deep learning models. To address these challenges, we propose a novel approach: the Spatial-Temporal Tweedie Graph Neural Network (STTD). STTD introduces the Tweedie distribution as a compelling alternative to the traditional 'zero-inflated' model and leverages spatial and temporal embeddings to parameterize travel demand distributions. Our evaluations on real-world datasets highlight STTD's superiority in providing accurate predictions and precise confidence intervals, particularly in high-resolution scenarios.


INTRODUCTION
Efficient urban transportation hinges upon a balanced travel supply and demand, a balance greatly aided by accurate O-D travel demand forecasting [6,23]. Precise prediction allows for dynamic resource allocation, reduced wait times, and higher service-provider profitability [4,10,27]. Thus, improving model accuracy is the main focus of the travel demand prediction domain.
However, the task is challenging, owing to the intricate spatial-temporal interdependencies and fluctuating nature of travel demand. While regions with dense demand, such as airports and hospitals, generally produce data that follow a Gaussian distribution (a core assumption of numerous prediction models [5,9]), the opposite is true for areas with sparse and discrete O-D demand, such as educational institutions and government premises [1,12]. Such deviations from the Gaussian assumption further complicate forecasting. Moreover, data sparsity is exacerbated by the disparity in urban demand across regions at high spatial-temporal resolutions, such as 5-minute intervals [29]. An abundance of zero values, signifying the absence of trips, together with a long-tail distribution at higher demand levels, results in skewness, discrepancy, and large variance in the data distribution [18,19,28]. Hence, accurately interpreting zeros, capturing long-tail distributions, and handling non-negative discrete values are paramount for robust demand forecasting.
Classic deep learning methods, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs), have tackled O-D matrix prediction by exploiting spatial and temporal dependencies [8,11,22,24,26]. More recently, Graph Neural Networks (GNNs) have been introduced to leverage the graph-like structure of O-D matrices and uncover non-Euclidean correlations [3,23]. Despite their respective merits, these models mainly treat O-D matrix entries as continuous variables and focus primarily on coarse temporal resolutions. They usually simplify the variance structure by assuming homoskedasticity (constant variance) and predominantly output expected average travel demand. As a result, they can overlook critical data features and may fail to account for potential deviations and real-world uncertainties [13,29]. Recent research [17,29] suggests that integrating zero-inflated statistical models with deterministic deep learning frameworks is promising for modelling sporadic travel demand and quantifying uncertainty, but even these approaches fall short in accounting for the long-tail distribution of sparse O-D demand.
In this paper, we propose the Spatial-Temporal Tweedie Graph Neural Network (STTD), a comprehensive solution designed for joint numeric prediction and uncertainty quantification. Our main contributions are summarized as follows:
• We integrate the Tweedie distribution to model demand, replacing the traditional two-part zero-inflated model and thereby capturing both the zero-inflation and the long-tail non-zero characteristics of O-D travel data.
• The proposed combination of the Tweedie distribution with spatial-temporal graph learning is adept at quantifying the spatial-temporal uncertainty inherent in sparse travel demand data.
• We validate the superiority of STTD through experiments on two real-world travel demand datasets, tested across various spatial-temporal resolutions and performance metrics.
The paper is organized as follows. Section 2 defines the research question and develops the model. Section 3 introduces the datasets used for the case study, the evaluation metrics, and the experimental results. Section 4 concludes the paper and discusses future research.

METHODOLOGY
2.1 Problem Description
The primary objective of our model is to fit parameters that capture the future travel demand distributions for each O-D pair over a span of $k$ future time windows. The model accomplishes this by leveraging data from $N$ origins, $M$ destinations, and the corresponding travel demand within periods of length $\tau$ minutes, making the task essentially a sequence-to-sequence prediction. Unlike previous work that defines the locations of origins or destinations as vertices, we adopt a more effective approach and directly construct the O-D flow graph $G = (V, E, A)$. Within this graph, $V$, with $|V| = N \times M$, is the set of O-D pairs, $E$ is the set of edges, and $A \in \mathbb{R}^{|V| \times |V|}$ is the adjacency matrix describing the relationships among O-D pairs [6,29]. This construction provides a more nuanced view of the intricate spatial-temporal interrelationships present in urban travel demand.
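The rule used to fill $A$ follows [6,29] and is not restated here. Purely as a hypothetical illustration of what "O-D pairs as vertices" means, the sketch below indexes O-D pair (o, d) as node o·M + d, so that |V| = N × M, and links two O-D pairs whenever they share an origin or a destination. This linking rule and the helper name `build_od_adjacency` are assumptions for illustration, not the paper's definition.

```python
from typing import List


def build_od_adjacency(num_origins: int, num_destinations: int) -> List[List[int]]:
    """Binary |V| x |V| adjacency over O-D pairs under a hypothetical linking rule.

    O-D pair (o, d) becomes node o * num_destinations + d, so |V| = N * M.
    Two distinct O-D pairs are linked if they share an origin or a destination.
    """
    n_nodes = num_origins * num_destinations
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        oi, di = divmod(i, num_destinations)  # recover (origin, destination)
        for j in range(n_nodes):
            oj, dj = divmod(j, num_destinations)
            if i != j and (oi == oj or di == dj):
                adj[i][j] = 1
    return adj


# 2 origins x 3 destinations -> 6 O-D pairs, each with (N - 1) + (M - 1) neighbours.
A = build_od_adjacency(2, 3)
print(len(A), sum(A[0]))  # prints: 6 3
```

Other choices (distance-based or flow-correlation weights, for example) would only change the condition inside the double loop.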
We let $y_i^t$ represent the trips that occur at the $i$-th O-D pair in the $t$-th time window, where $i \in V$ and $y_i^t \in \mathbb{N}$. Our approach primarily considers the individual instances of travel demand at different intervals, all of which collectively model the term $y_i^t$. Accordingly, $Y^t \in \mathbb{N}^{|V|}$ denotes the demand for all O-D pairs in the $t$-th time window, with $y_i^t$ as its entry. The objective is to use the historical records $Y^{1:t}$ as training inputs, aided by the graph structure $G$, to predict the probability density function $p(Y^{t+1:t+k})$ of the travel demand distribution over the next $k$ time windows. This prediction allows us to analyze the expected values and confidence intervals of travel demand.
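As a minimal sketch of the sequence-to-sequence framing, the snippet below slices a demand series with one row per time window and one column per O-D pair into (past, future) training pairs. It assumes a fixed-length history window rather than the full record $Y^{1:t}$, and the names `make_seq2seq_samples`, `history`, and `horizon` are hypothetical; `horizon` plays the role of the $k$ future windows.

```python
from typing import List, Tuple

Window = List[List[int]]  # rows: time windows, columns: O-D pairs


def make_seq2seq_samples(demand: Window, history: int, horizon: int) -> List[Tuple[Window, Window]]:
    """Split a T x |V| demand series into (past, future) pairs of consecutive windows."""
    samples = []
    for start in range(len(demand) - history - horizon + 1):
        past = demand[start:start + history]                        # model input
        future = demand[start + history:start + history + horizon]  # prediction target
        samples.append((past, future))
    return samples


# Toy series: 10 time windows, 3 O-D pairs (one column is all zeros, as in sparse O-D data).
demand = [[t, 0, t % 2] for t in range(10)]
samples = make_seq2seq_samples(demand, history=4, horizon=2)
print(len(samples), samples[0][1])  # prints: 5 [[4, 0, 0], [5, 0, 1]]
```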

Tweedie (TD) Distribution
Here, $\theta \in \mathbb{R}$ is the natural parameter, while $\phi \in \mathbb{R}^{+}$ is the dispersion parameter. The normalizing functions $a(\cdot)$ and $\kappa(\cdot)$ correspond to the parameters $\phi$ and $\theta$, respectively [2,16]; their details are given later. For a Tweedie random variable $Y$, the mean and variance are:

$$\mathbb{E}(Y) = \mu = \kappa'(\theta), \qquad \mathrm{Var}(Y) = \phi\,\kappa''(\theta) = \phi\,\mu^{\rho},$$

where $\kappa'(\theta)$ and $\kappa''(\theta)$ denote the first and second derivatives of $\kappa(\theta)$, respectively, and $\mu \ge 0$ is the mean parameter. The Tweedie family includes many important distributions depending on the index parameter $\rho$: the Normal ($\rho = 0$), Poisson ($\rho = 1$), Gamma ($\rho = 2$), Inverse Gaussian ($\rho = 3$), and Compound Poisson-Gamma ($1 < \rho < 2$) distributions [14][15][16]. The Compound Poisson-Gamma distribution is particularly useful because it can parameterize zero-inflated and long-tail data. When $1 < \rho < 2$, the demand $y_i^t$ can be expressed as in Eq. (1):

$$y_i^t = \sum_{n=1}^{N_i^t} G_n, \qquad N_i^t \sim \mathrm{Poisson}(\lambda), \qquad G_n \overset{\text{i.i.d.}}{\sim} \mathrm{Gamma}(\alpha, \gamma), \qquad (1)$$
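where $N_i^t$ denotes the number of time slices within the $\tau$-minute time window, defined this way to align with the Tweedie distribution definition². If no trips occur, then $N_i^t = 0$, and the probability mass at zero for travel demand is $P(y_i^t = 0) = \exp(-\lambda)$ [2]. Otherwise, $y_i^t$ is the sum of $N_i^t$ independent Gamma random variables. Setting $\theta = \mu^{1-\rho}/(1-\rho)$ and $\kappa(\theta) = \mu^{2-\rho}/(2-\rho)$, we re-parameterize the Tweedie distribution as:

$$f_{TD}(y \mid \mu, \phi, \rho) = a(y, \phi, \rho)\exp\left(\frac{1}{\phi}\left(\frac{y\,\mu^{1-\rho}}{1-\rho} - \frac{\mu^{2-\rho}}{2-\rho}\right)\right),$$

with the normalizing function $a(y, \phi, \rho)$ defined as:

$$a(y, \phi, \rho) = \begin{cases} 1, & y = 0, \\ \dfrac{1}{y}\displaystyle\sum_{n=1}^{\infty} \frac{y^{n\alpha}}{\phi^{n(1+\alpha)}(\rho-1)^{n\alpha}(2-\rho)^{n}\, n!\,\Gamma(n\alpha)}, & y > 0, \end{cases} \qquad \alpha = \frac{2-\rho}{\rho-1}.$$

In this definition, $\mu$, $\phi$, and $\rho$ are the key parameters determining the probability and expected value of travel demand. The parameters of the Poisson and Gamma distributions, namely $\lambda$, $\alpha$, and $\gamma$, can be computed from $\mu$, $\phi$, and $\rho$ as:

$$\lambda = \frac{\mu^{2-\rho}}{\phi(2-\rho)}, \qquad \alpha = \frac{2-\rho}{\rho-1}, \qquad \gamma = \phi(\rho-1)\mu^{\rho-1}.$$

The choice of the Tweedie distribution is driven by the characteristics of travel demand data. In practice, a time window may span different durations, such as 60 minutes, 15 minutes, or as little as 5 minutes. However, the distribution of trips within these periods can vary significantly in both the spatial and temporal dimensions. By incorporating finer-grained intervals into the model, it is possible to better capture the variability and heterogeneity of demand within a given time window. The Tweedie distribution effectively models zero-inflated and long-tail data, making it particularly well suited to this context.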
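The compound Poisson-Gamma construction of Eq. (1) can be checked numerically. The sketch below uses only Python's standard library; the helper names (`tweedie_to_cpg`, `sample_poisson`, `sample_tweedie`) and the parameter values are illustrative assumptions, not part of STTD. It maps $(\mu, \phi, \rho)$ to $(\lambda, \alpha, \gamma)$ with $\lambda = \mu^{2-\rho}/(\phi(2-\rho))$, $\alpha = (2-\rho)/(\rho-1)$, $\gamma = \phi(\rho-1)\mu^{\rho-1}$, draws samples, and compares the empirical mean, variance, and zero fraction with $\mu$, $\phi\mu^{\rho}$, and $\exp(-\lambda)$.

```python
import math
import random


def tweedie_to_cpg(mu, phi, rho):
    """Map Tweedie (mu, phi, rho), 1 < rho < 2, to compound Poisson-Gamma (lambda, alpha, gamma)."""
    lam = mu ** (2 - rho) / (phi * (2 - rho))
    alpha = (2 - rho) / (rho - 1)
    gamma = phi * (rho - 1) * mu ** (rho - 1)
    return lam, alpha, gamma


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small lambda used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_tweedie(rng, mu, phi, rho):
    """Draw y = sum_{n=1}^{N} G_n with N ~ Poisson(lambda) and G_n ~ Gamma(alpha, gamma)."""
    lam, alpha, gamma = tweedie_to_cpg(mu, phi, rho)
    n = sample_poisson(rng, lam)
    # random.gammavariate(shape, scale) has mean shape * scale; the sum is 0 when n == 0.
    return sum(rng.gammavariate(alpha, gamma) for _ in range(n))


rng = random.Random(0)
mu, phi, rho = 2.0, 1.0, 1.5
draws = [sample_tweedie(rng, mu, phi, rho) for _ in range(50_000)]
mean = sum(draws) / len(draws)
var = sum((y - mean) ** 2 for y in draws) / (len(draws) - 1)
zero_frac = sum(y == 0 for y in draws) / len(draws)
lam, _, _ = tweedie_to_cpg(mu, phi, rho)
print(mean, var, zero_frac)  # should be near mu = 2, phi * mu**rho ~ 2.83, exp(-lam) ~ 0.059
```

The exact zeros come from the Poisson count being zero, while the positive values are continuous and right-skewed, which is the zero-inflated, long-tail shape the section describes.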

Learning Framework and Loss Function
We use a Diffusion Graph Convolution Network (DGCN) and a Temporal Convolutional Network (TCN) as the spatial-temporal graph encoder $\mathrm{ST}_\Theta$ [21,29]. The spatial-temporal node embedding $Z$ can thus be written as $Z = \mathrm{ST}_\Theta(Y^{1:t}, A)$, where $Z_i \in \mathbb{R}^{d'}$ is the spatial-temporal embedding of the $i$-th O-D pair. Thus, the three ...
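where $\mu \in [0, +\infty)$, $\phi \in (0, +\infty)$, and $\rho \in (1, 2)$. The weight matrices applied to the $d'$-dimensional embedding $Z_i$ are learnable, and $\epsilon$ (with $\epsilon \to 0$) is a small minimum value. To predict travel demand fully, let $y^*$ be a target travel demand value and $\mu, \phi, \rho$ the parameters of its predicted Tweedie distribution in $p(Y^{t+1:t+k})$ (the notations $\mu, \phi, \rho$ are reused for clearer formulas). The learning objective of the whole model is to maximize the log-likelihood $\log f_{TD}(y^* \mid \mu, \phi, \rho)$, and we directly use the negative log-likelihood as our loss function to fit the distribution to the data. The log-likelihood of TD is composed of the terms for $y^* = 0$ and $y^* > 0$:

$$\mathcal{L}_{TD} = \sum_{y^* = 0} \frac{\mu^{2-\rho}}{\phi(2-\rho)} - \sum_{y^* > 0}\left[\frac{1}{\phi}\left(\frac{y^*\mu^{1-\rho}}{1-\rho} - \frac{\mu^{2-\rho}}{2-\rho}\right) + \log a(y^*, \phi, \rho)\right] + \eta\,\lVert\Theta\rVert_2^2,$$

where $\Theta$ denotes the model parameters and $\eta$ is the weight of the L2 regularization term. Moreover, for $y^* > 0$:

$$\log a(y^*, \phi, \rho) = \log\left(\frac{1}{y^*}\sum_{n=1}^{\infty}\frac{(y^*)^{n\alpha}}{\phi^{n(1+\alpha)}(\rho-1)^{n\alpha}(2-\rho)^{n}\, n!\,\Gamma(n\alpha)}\right), \qquad \alpha = \frac{2-\rho}{\rho-1},$$

where the parameters are selected and computed according to whether $y^* = 0$ or $y^* > 0$. We optimize the lower bound of $\mathcal{L}_{TD}$ during training to avoid computing the infinite summation. The whole framework is illustrated in Figure 1.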
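A minimal sketch of how output heads and the Tweedie negative log-likelihood could fit together, in plain Python. Several pieces are assumptions rather than the paper's method: the head activations (softplus for $\mu$ and $\phi$, a rescaled sigmoid for $\rho$), the $\epsilon$ added to $\mu$ for numerical stability, the made-up weights, and the truncation of the series in $a(y, \phi, \rho)$ after a fixed number of terms. The paper instead optimizes a lower bound whose exact form is not given in this section. All function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def tweedie_heads(z, w_mu, w_phi, w_rho, eps=1e-6):
    """Hypothetical heads mapping an embedding z to (mu, phi, rho) in the stated ranges."""
    mu = softplus(dot(z, w_mu)) + eps                           # mu > 0 keeps mu**(1-rho) finite
    phi = softplus(dot(z, w_phi)) + eps                         # phi > 0
    rho = 1.0 + eps + (1.0 - 2 * eps) * sigmoid(dot(z, w_rho))  # 1 < rho < 2
    return mu, phi, rho


def tweedie_log_a(y, phi, rho, n_terms=200):
    """log a(y, phi, rho) for y > 0, truncating the infinite series after n_terms."""
    alpha = (2 - rho) / (rho - 1)
    log_terms = [
        n * alpha * math.log(y)
        - n * (1 + alpha) * math.log(phi)
        - n * alpha * math.log(rho - 1)
        - n * math.log(2 - rho)
        - math.lgamma(n + 1)
        - math.lgamma(n * alpha)
        for n in range(1, n_terms + 1)
    ]
    peak = max(log_terms)  # log-sum-exp trick avoids overflow
    return peak + math.log(sum(math.exp(v - peak) for v in log_terms)) - math.log(y)


def tweedie_log_pdf(y, mu, phi, rho):
    """log f_TD(y | mu, phi, rho): point mass at y = 0, series density for y > 0."""
    kappa = mu ** (2 - rho) / (2 - rho)
    if y == 0:
        return -kappa / phi  # log P(y = 0) = -lambda
    theta = mu ** (1 - rho) / (1 - rho)
    return (y * theta - kappa) / phi + tweedie_log_a(y, phi, rho)


def tweedie_loss(targets, params, weights, eta=1e-4):
    """Negative log-likelihood summed over targets plus eta * ||Theta||_2^2."""
    nll = -sum(tweedie_log_pdf(y, *p) for y, p in zip(targets, params))
    return nll + eta * sum(w * w for w in weights)


# Toy usage with made-up weights and one embedding shared by a zero and a positive target.
z = [0.3, -1.2, 0.8]
w_mu, w_phi, w_rho = [0.5, 0.1, 0.2], [0.1, 0.1, 0.1], [0.0, 0.3, -0.2]
mu, phi, rho = tweedie_heads(z, w_mu, w_phi, w_rho)
loss = tweedie_loss([0, 3.0], [(mu, phi, rho)] * 2, w_mu + w_phi + w_rho)
print(mu, phi, rho, loss)
```

In a real model the same computation would be written with an autodiff framework so that gradients flow back into the encoder; the plain-Python version only makes the two branches of the likelihood explicit.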

Model Comparison
We carry out experiments on five distinct travel demand scenarios, presenting the prediction results in Table 1.Here, the best and second-best scores are highlighted with bold and underlined values,

Parameters Visualization
We visualize the learned parameters $\mu$, $\phi$, $\rho$ and the real values $y$ as 3D surface plots in Figure 2 on the CDPSAMP10 and SLDSAMP10 test sets. The plots show that for long-tailed data, the learned values of $\rho$ are larger (closer to 2). One explanation is that the main term of the loss function for $y^* > 0$, $y^*\mu^{1-\rho}/(1-\rho)$, is heavily penalized when the true $y^*$ lies in the long tail while the predicted $\mu$ approaches zero, causing the loss to change sharply. For zero-valued data, in contrast, the learned parameter $\phi$ is very large, which drives the zero probability $\exp\left(-\mu^{2-\rho}/(\phi(2-\rho))\right)$ toward 1. This indicates that the model captures zero-inflated and long-tail data effectively.
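Both effects in this explanation follow directly from the Tweedie formulas, and a few lines of arithmetic make them concrete. The sketch below uses arbitrary illustrative values ($\mu = 1$, $\rho = 1.5$ for the zero probability; $y^* = 20$, $\phi = 1$, $\rho = 1.5$ for the penalty term), not values learned by STTD, and the helper names are hypothetical.

```python
import math


def zero_probability(mu, phi, rho):
    """Tweedie point mass at zero, exp(-mu**(2-rho) / (phi * (2-rho))), for 1 < rho < 2."""
    return math.exp(-mu ** (2 - rho) / (phi * (2 - rho)))


def positive_term(y, mu, phi, rho):
    """The y > 0 log-likelihood term y * mu**(1-rho) / ((1-rho) * phi), which falls as mu -> 0."""
    return y * mu ** (1 - rho) / ((1 - rho) * phi)


# A larger dispersion phi pushes the mass at zero toward 1.
for phi in (0.5, 1.0, 5.0, 50.0):
    print(phi, round(zero_probability(1.0, phi, 1.5), 3))
# prints one pair per line: 0.5 0.018, 1.0 0.135, 5.0 0.67, 50.0 0.961

# A long-tail target (y = 20) with a near-zero predicted mean gives a rapidly falling
# log-likelihood term, i.e. a rapidly growing loss.
for mu in (1.0, 0.1, 0.01):
    print(mu, round(positive_term(20.0, mu, 1.0, 1.5), 1))
```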

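Denote $f_{TD}(y_i^t \mid \theta, \phi)$ as the probability mass function of the future O-D travel demand $y_i^t$ output by the model. It follows the Tweedie distribution, which takes the form:

$$f_{TD}(y_i^t \mid \theta, \phi) \equiv a(y_i^t, \phi)\exp\left(\frac{y_i^t\,\theta - \kappa(\theta)}{\phi}\right).$$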
3 https://data.cityofchicago.org/Transportation/Transportation-Network-Providers-Trips/m6dm-c72p
4 https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

pairs from CDP. The SLD dataset contains For-Hire Vehicle trip records in 67 Manhattan administrative zones. We vary the temporal resolution (5-, 15-, and 60-minute intervals) and sample 10 × 10 or 67 × 67 O-D pairs to gauge our model's performance. Both datasets were used in previous work [6, 29].

Evaluation Metrics: (1) Point estimates: Mean Absolute Error (MAE), which measures the accuracy of the mean or median of the predicted Tweedie distributions. (2) Distributional uncertainty: Mean Prediction Interval Width (MPIW) and Prediction Interval Coverage Probability (PICP) for the 10%-90% confidence interval. MPIW averages the width of the confidence interval, while PICP quantifies the percentage of actual data points that fall within the confidence intervals. Additionally, KL-Divergence is used to evaluate the similarity between the predicted and real data distributions. (3) Discrete demand prediction: true-zero rate and F1-score. The true-zero rate measures the model's fidelity in reproducing data sparsity, and the F1-score gauges the accuracy of discrete predictions. In general, lower MAE, MPIW, and KL-Divergence values are favorable, while larger true-zero rate, PICP, and F1-score values denote superior model performance.
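The interval-based and point metrics above are simple to compute once each predicted distribution has been reduced to a point estimate and a 10%/90% quantile pair (for example from Monte Carlo samples of the predicted Tweedie distribution). The sketch below assumes those values are already available. MAE, MPIW, and PICP follow their standard definitions; the exact true-zero rate used in the paper is not spelled out here, so `true_zero_rate` uses a hypothetical definition (share of observed zeros whose rounded prediction is also zero). F1-score and KL-Divergence are omitted because their discretization details are not given.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error between observations and point predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mpiw(lower, upper):
    """Mean Prediction Interval Width, e.g. of the 10%-90% interval."""
    return sum(u - l for l, u in zip(lower, upper)) / len(lower)


def picp(y_true, lower, upper):
    """Prediction Interval Coverage Probability: share of observations inside [lower, upper]."""
    return sum(l <= t <= u for t, l, u in zip(y_true, lower, upper)) / len(y_true)


def true_zero_rate(y_true, y_pred):
    """Hypothetical definition: share of observed zeros whose rounded prediction is also zero."""
    zeros = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(round(p) == 0 for p in zeros) / len(zeros) if zeros else float("nan")


# Toy example: point predictions and made-up 10%/90% quantiles for five O-D entries.
y_true = [0, 0, 3, 7, 1]
y_pred = [0.2, 0.6, 2.5, 4.0, 1.0]
lower = [0, 0, 1, 2, 0]
upper = [1, 1, 5, 6, 2]
print(mae(y_true, y_pred), mpiw(lower, upper), picp(y_true, lower, upper), true_zero_rate(y_true, y_pred))
```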
window, follows a Poisson distribution $\mathrm{Poisson}(\lambda)$ with mean $\lambda$. The trip amounts $G_n$ are independent Gamma random variables $\mathrm{Gamma}(\alpha, \gamma)$ with mean $\alpha\gamma$ and variance $\alpha\gamma^{2}$. In this way, $y_i^t = \sum_{n=1}^{N_i^t} G_n$ is formed by aggregating discrete counts.