DualGRETEL+: Exploiting Dual Hypergraphs for Path Inference Applied to Navigation Data

This paper addresses the problem of path inference in GPS navigation data by enhancing a generative path inference model called GRETEL. The enhanced model, DualGRETEL+, utilizes a dual hypergraph for feature extraction to capture more complex interactions among GPS data. Additionally, a second-order optimizer, AdaHessian, is employed to further enhance the performance of DualGRETEL+. To evaluate the proposed framework, three distinct datasets were used. Experiments indicate that the use of hypergraph features and the AdaHessian optimizer contributes to a significant improvement in performance. Consequently, DualGRETEL+ is a promising solution to the path inference problem in GPS navigation data.


INTRODUCTION
In recent years, the widespread availability of Global Positioning System (GPS)-enabled devices has led to an explosion in the available location data. GPS data are collected from GPS devices, providing highly accurate information about the movement of vehicles, people, and other objects. They can be used to obtain insights into traffic patterns, route planning, and other applications [5].
One approach to analyzing GPS data is to represent them as a graph, where nodes represent geographic locations, and edges represent connections between them [18]. This graph-based representation allows the application of powerful machine learning techniques such as Graph Convolutional Networks (GCNs) [4,9,12] to analyze and predict the movements of objects in space. Path inference, which involves predicting an object's future path or trajectory based on its past movements and other relevant features, is a common technique in analyzing GPS data to predict the movements of vehicles, people, or other objects.
In the context of GPS data, the path inference problem has been addressed by a generative model called GRETEL [3]. The model is designed to accurately capture the directionality of an observed path of ordered GPS locations, referred to as a prefix, and generate a suggested path, known as a suffix. Candidate suffixes are generated by performing a non-backtracking walk on the modified graph. The ultimate aim is to predict the upcoming roads the driver will likely take based on their travel history.
The Dual Hypergraph Transformation (DHT) algorithm transforms a conventional graph into its dual hypergraph, focusing on edge representation [6]. Hypergraphs are extensions of traditional graphs that are capable of modeling higher-order interactions [10]. The edges of the original graph are transformed into the nodes of the hypergraph, and the original graph nodes are transformed into the hyperedges of the hypergraph. The resulting hypergraph can be represented using an incidence matrix that captures all the information. The transformation to the dual hypergraph representation allows for more flexible and expressive modeling of complex relationships among GPS data, emphasizing edge characteristics. Here, it is shown that incorporating these new features into the GRETEL model significantly enhances its performance, enabling more accurate predictions of object movements in space. The improved model is called DualGRETEL+ and is tested on three different datasets. The major steps of the proposed framework are summarized next at a high level and are visually presented in Fig. 1.
(1) Construct the graph using navigation data.
(2) Apply the DHT to the original graph to obtain its corresponding dual hypergraph.
(3) Extract novel features from the dual hypergraph.
(4) Utilize both the navigation data and the extracted features within the GRETEL model for path inference.
The optimization process greatly influences the performance of machine learning models [2]. Thus, the choice of optimizer can significantly affect the results. In the context of training DualGRETEL+, two popular optimization algorithms, Adam [8] and AdaHessian [16], are evaluated. Adam uses a combination of the gradient's first- and second-order moments to adapt the learning rate of each weight of the neural network. It is a first-order optimizer that performs well on a diverse set of deep learning tasks. While Adam is popular due to its ability to converge to a good solution quickly, research has shown that in certain cases it may fail to converge to the optimal solution and may even lead to poorer generalization performance [13]. AdaHessian is a second-order optimizer that uses the Hessian matrix, which captures the curvature of the loss function. AdaHessian has been shown to outperform Adam on some tasks, particularly in terms of generalization performance [16]. However, AdaHessian is computationally more expensive than Adam, as it requires the computation of the Hessian matrix; this cost may be prohibitive for large neural networks. By conducting experiments on three different datasets with GPS navigation paths, it is observed that the original configuration of GRETEL, which uses the Adam optimizer, can be further improved. Specifically, with the AdaHessian optimizer, better performance associated with higher accuracy is obtained in the path inference problem. The paper's major contribution is the integration of DHT and the AdaHessian optimizer into GRETEL, resulting in DualGRETEL+ applied to navigation data. This integration brings significant enhancements to the model's performance. By
leveraging DHT, new features can be extracted that capture complex correlations among GPS data, leading to better results. Tests are conducted on three datasets, expanding the scope of [3], where only one dataset was used. Meanwhile, using the AdaHessian optimizer further improves the overall performance of DualGRETEL+, extending the previous work on path inference over Wikipedia links [14]. Together, these two advancements make DualGRETEL+ a highly effective tool for path inference.
The paper is structured as follows: Section 2 provides an in-depth discussion of the methods employed in the GRETEL model, along with the DHT algorithm and the feature extraction methods that lead to DualGRETEL+. Section 3 describes the experimental setup and compares the Adam and AdaHessian optimizers [16] using three different datasets. This section also includes a thorough presentation of the experiments related to feature extraction from the dual hypergraph. Finally, conclusions are drawn in Section 4.

METHODOLOGY

Path Inference with GRETEL
Assume $G = (V, E)$ is a graph with $n$ nodes and $m$ edges. We are interested in finding the shortest path between two nodes. An agent moves from one node to another only if there exists a directed edge connecting them. At any given time $t$, the agent's location is given by the sequence of nodes $p = (v_1, v_2, \ldots, v_t)$, known as the prefix of the path traversed on $G$. Let $h$ be the prediction horizon. The aim is to estimate the conditional likelihood $\Pr(s \mid h, p, G)$, where $s = (v_{t+1}, \ldots, v_{t+h})$ represents the suffix of the path traversed on $G$. This estimation uses the GRETEL algorithm proposed in [3].
At time $t$, the agent is represented as a sparse vector $x_t \in \mathbb{R}^n_{\geq 0}$ that has been normalized to sum to one. The $i$th non-zero element of $x_t$ represents the probability that the agent is located at node $v_i$ at time $t$. A trajectory of the agent is defined as $X \triangleq (x_i : i \in I)$, where $I$ is a sub-sequence of $1, 2, \ldots$ Therefore, estimating the likelihood $\Pr(s \mid h, p, G)$ is equivalent to estimating $\Pr(s \mid h, X, G)$.
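As a toy illustration of this representation (a minimal numerical sketch with made-up node ids and weights, not code from [3]), the agent's location at a given time can be encoded as a normalized sparse vector over the nodes:

```python
import numpy as np

# Toy graph with n = 5 nodes; noisy evidence places the agent at
# node 1 with weight 3 and at node 2 with weight 1.
n = 5
x_t = np.zeros(n)
x_t[1] = 3.0
x_t[2] = 1.0

# Normalize so the entries form a probability distribution over nodes.
x_t = x_t / x_t.sum()

# The non-zero entries are the probabilities that the agent occupies
# the corresponding nodes at time t.
assert abs(x_t.sum() - 1.0) < 1e-12
```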
GRETEL [3] is a generative model for graphs. In other words, the model can generate a suffix path given a prefix path and a horizon. To account for the directionality of edges in the graph, a latent graph is defined as $\Phi \triangleq (V, E, w_\theta)$, where $w_\theta = f_\theta(G, X)$ is a multi-layer perceptron (MLP) network that encodes the edge directionality in the graph $G$. Specifically, the MLP computes the non-normalized weight of each edge as

$$\tilde{w}_{s \to r} = \mathrm{MLP}_\theta(c_s, c_r, f_s, f_r, e_{s \to r}). \qquad (1)$$

Here, $c_s$ and $c_r$ are the pseudo-coordinates of the sender and receiver node, respectively, while $f_s$ and $f_r$ represent the features of the sender and receiver nodes.

In (1), the feature vector $e_{s \to r}$ corresponds to the edge connecting the sender and receiver nodes. The computed MLP outputs are normalized using the softmax function, expressed as

$$w_{s \to r} = \frac{\exp(\tilde{w}_{s \to r})}{\sum_{r' : (s \to r') \in E} \exp(\tilde{w}_{s \to r'})}. \qquad (2)$$

The pseudo-coordinates are computed by the GNN recursion

$$X^{(k+1)} = \sigma\left(A X^{(k)} W\right), \qquad (3)$$

where $\sigma(\cdot)$ denotes the logistic function, $A \in \mathbb{R}^{n \times n}$ is the adjacency matrix of the graph, and $W \in \mathbb{R}^{|I| \times |I|}$ are the GNN weights. The recursion in (3) is initialized with $X^{(0)} = X$. Given a target distribution $x_{t+h}$, the model tries to estimate the destination distribution $\hat{x}_{t+h}$ over a horizon $h$. This is done by a non-backtracking walk [7] on the edge space, computed with the matrix $P \in \mathbb{R}^{m \times m}$, whose elements are $[P]_{i \to j,\, j \to k} = w_{j \to k}$ if $k \neq i$ and $0$ otherwise, and the $m \times n$ matrix $Q$, with $[Q]_{i \to j,\, k} = 0$ if $j \neq k$ and $w_{i \to j}$ otherwise. In this case, the dot loss can be applied to train the model.
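The non-backtracking constraint can be sketched as follows (an illustrative construction of the edge-to-edge transition matrix with uniform placeholder weights; in GRETEL the edge weights are learned):

```python
import numpy as np

# Directed toy graph: edges listed as (source, target).
edges = [(0, 1), (1, 0), (1, 2), (2, 0)]
m = len(edges)

# Uniform placeholder edge weights w(i -> j); GRETEL would learn these.
w = {e: 1.0 for e in edges}

# Non-backtracking edge-to-edge transition matrix P (m x m):
# a walk on edge (i -> j) may continue on (j -> k) only if k != i.
P = np.zeros((m, m))
for a, (i, j) in enumerate(edges):
    for b, (jj, k) in enumerate(edges):
        if jj == j and k != i:
            P[a, b] = w[(jj, k)]

# Row-normalize so each non-empty row is a probability distribution.
row = P.sum(axis=1, keepdims=True)
P = np.divide(P, row, out=np.zeros_like(P), where=row > 0)

# From edge (0 -> 1) the walk cannot immediately return on (1 -> 0).
a = edges.index((0, 1))
b = edges.index((1, 0))
assert P[a, b] == 0.0
```

Raising such a matrix to the power $h-1$ propagates probability mass along paths of length $h$ that never reverse the previous step.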

Dual Hypergraph Transformation (DHT)
A graph $G$ can be fully described by its node and edge features as well as the connections among them. Node features are represented by the matrix $X_V \in \mathbb{R}^{n \times d}$, where $n$ is the number of nodes and $d$ is the dimension of a node feature vector. Respectively, edge features are represented by the matrix $X_E \in \mathbb{R}^{m \times d'}$, where $m$ is the number of edges and $d'$ is the dimension of an edge feature vector. The adjacency matrix $A$ captures node connections. The incidence matrix, given by $B \in \{0, 1\}^{n \times m}$ for an undirected graph or $B \in \{-1, 0, 1\}^{n \times m}$ for a directed graph, provides additional information that captures the node-edge relationships as well as the orientation of the edges. Thus, a graph can be represented as $G = (X_V, X_E, B)$.
A hypergraph is a mathematical structure that generalizes the concept of a graph. In a hypergraph, edges can connect any number of vertices, not just two, as in a traditional graph. In this way, higher-order interactions can be represented. A hypergraph is typically represented by a set of vertices and a collection of hyperedges that connect subsets of these vertices. This information can be extracted from the incidence matrix $B$. A hypergraph can be defined as $G^* = (X_V^*, X_E^*, B^*)$, where $X_V^*$ and $X_E^*$ are the node and hyperedge features, respectively, and $B^*$ is the incidence matrix of the hypergraph.
The Dual Hypergraph Transformation (DHT) interchanges the roles of nodes and edges of the original graph [6]. The features accompanying the nodes and edges are preserved, but they also change roles. More specifically, an edge in the original graph is transformed into a node in the dual hypergraph, and a node in the original graph is transformed into a hyperedge in the dual hypergraph, i.e., $X_V^* = X_E \in \mathbb{R}^{m \times d'}$ and $X_E^* = X_V \in \mathbb{R}^{n \times d}$. The incidence matrix of the dual hypergraph is the transposed incidence matrix of the original graph, i.e., $B^* = B^{\top}$. The formal representation of this transformation is

$$\mathrm{DHT}: G = (X_V, X_E, B) \;\mapsto\; G^* = (X_E, X_V, B^{\top}).$$

The DHT is a bijective algorithm, implying that by applying it to the dual hypergraph $G^*$, the original graph $G$ can be reconstructed. Fig. 2 shows an example of the DHT applied to a simple graph.
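The transformation itself reduces to a transpose of the incidence matrix, as the following sketch shows (a toy directed graph with random placeholder feature matrices):

```python
import numpy as np

# Directed toy graph: n = 3 nodes, m = 2 edges (0 -> 1, 1 -> 2).
edges = [(0, 1), (1, 2)]
n, m = 3, len(edges)

# Incidence matrix B in {-1, 0, 1}^(n x m): -1 marks the source node
# of an edge, +1 marks its target node.
B = np.zeros((n, m), dtype=int)
for e, (s, t) in enumerate(edges):
    B[s, e] = -1
    B[t, e] = 1

# Placeholder node and edge features.
X_V = np.random.randn(n, 4)   # node features (n x d)
X_E = np.random.randn(m, 2)   # edge features (m x d')

# DHT: nodes and edges swap roles; the incidence matrix is transposed.
X_V_star, X_E_star, B_star = X_E, X_V, B.T

# DHT is bijective: transposing again recovers the original graph.
assert np.array_equal(B_star.T, B)
```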

Feature Extraction in Dual Hypergraph
The DHT algorithm [6] transforms a given graph into its dual hypergraph and extracts features using the incidence matrix in two ways.
If the original graph $G$ is undirected, the incidence matrix $B \in \{0, 1\}^{n \times m}$ is a binary matrix of size $n \times m$, where $n$ and $m$ are the numbers of nodes and edges, respectively. Each node $v$ is associated with an incidence row vector $b_v \in \{0, 1\}^m$.

Optimizer Selection
Neural network optimization typically involves updating the model's weights using gradient-based algorithms such as Gradient Descent or Stochastic Gradient Descent [13]. However, these methods can converge slowly and tend to overfit the training data, resulting in poor generalization performance on unseen data [8]. To address these issues, researchers have proposed second-order optimization methods that utilize information from the Hessian matrix of the loss function. In addition to the Hessian information, gradient information can improve convergence and generalization performance. AdaHessian [16] is one such method that modifies the update rule of the popular Adam optimizer [8] to include second-order information, resulting in improved convergence and generalization performance on various tasks. Specifically, AdaHessian uses a Hessian diagonal matrix approximator [1] to estimate the second-order information and updates the model parameters as

$$\theta_{t+1} = \theta_t - \eta_t \, m_t \oslash \sqrt{v_t}, \qquad (10)$$

where $\oslash$ denotes the element-wise division operator between two vectors. Here, $\theta_t$ and $\eta_t$ are the model parameters and the learning rate at time step $t$, respectively, while $m_t$ and $v_t$ are the first and second moments of AdaHessian, respectively, computed using exponential moving averages, i.e., for $0 \leq \beta_1, \beta_2 < 1$:

$$m_t = \frac{(1 - \beta_1) \sum_{i=1}^{t} \beta_1^{\,t-i} g_i}{1 - \beta_1^{\,t}} \qquad (11)$$

and

$$v_t = \frac{(1 - \beta_2) \sum_{i=1}^{t} \beta_2^{\,t-i} D_i D_i}{1 - \beta_2^{\,t}}. \qquad (12)$$

Here, $\beta_1$ and $\beta_2$ stand for the exponential decay rates of the first- and second-moment estimations, respectively. Typical values are $\beta_1 = 0.9$ and $\beta_2 = 0.999$. Moreover, in (11), $g_t$ is the gradient of the loss function, while in (12), $D_t$ is the spatially averaged Hessian diagonal approximation of the loss function at time step $t$ [16]. It should be noted that Adam applies similar formulas for modifying the model parameters; however, instead of using the averaged Hessian diagonal $D_t$ as in (12), it utilizes the gradient $g_t$ in the second-order moment calculation. DualGRETEL+ uses the AdaHessian optimizer, in contrast to the original GRETEL model, which resorts to the Adam optimizer [3]. Experiments that demonstrate the effectiveness of AdaHessian in path inference are reported in Section 3.
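The update rule can be sketched as follows (a simplified illustration that assumes the Hessian-diagonal estimate is given; the actual AdaHessian implementation [16] obtains it with a randomized Hutchinson-style estimator and applies spatial averaging):

```python
import numpy as np

def adahessian_step(theta, g, D, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One simplified AdaHessian-style step.

    g : gradient of the loss at theta
    D : Hessian diagonal approximation at theta (assumed given here)
    m, v : first- and second-moment running averages
    t : 1-based time step, used for bias correction
    """
    m = b1 * m + (1 - b1) * g          # first moment, as in Adam
    v = b2 * v + (1 - b2) * D * D      # second moment uses D, not g
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # element-wise division
    return theta, m, v

# Minimize f(theta) = theta^2: gradient g = 2*theta, exact Hessian diagonal 2.
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 201):
    g = 2 * theta
    D = np.full_like(theta, 2.0)
    theta, m, v = adahessian_step(theta, g, D, m, v, t)
assert abs(theta[0]) < 0.5  # the iterate has moved toward the minimum at 0
```

Replacing `D * D` with `g * g` in the second-moment line recovers the Adam update, which is exactly the difference highlighted above.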

EXPERIMENTS
Experiments have been conducted on three different datasets with two main objectives: firstly, to demonstrate the effectiveness of incorporating the novel hypergraph features into the GRETEL model; secondly, to evaluate the impact of the AdaHessian optimizer on model performance and generalization ability. The experimental findings indicate that the AdaHessian optimizer results in better performance and higher accuracy in the path inference problem than the Adam optimizer.
All three datasets use navigation paths derived from GPS data, which can be prone to errors due to GPS noise, signal loss, or other factors. Therefore, a pre-processing step is required to align the GPS data to a known road network, enabling the determination of the vehicle's route. This process is known as map matching and aims to improve the accuracy and usefulness of the location data.
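As a toy illustration of the idea behind map matching (nearest-segment snapping only; the HMM [11] and Fast Map Matching [15] algorithms used in this work are considerably more sophisticated, accounting for path continuity and transition probabilities):

```python
import numpy as np

def snap_to_segment(p, a, b):
    """Project point p onto the segment a-b and return the snapped point."""
    ab = b - a
    t = np.dot(p - a, ab) / np.dot(ab, ab)
    t = np.clip(t, 0.0, 1.0)  # stay within the segment's endpoints
    return a + t * ab

# Two parallel road segments and one noisy GPS point.
segments = [(np.array([0.0, 0.0]), np.array([10.0, 0.0])),
            (np.array([0.0, 5.0]), np.array([10.0, 5.0]))]
gps = np.array([4.0, 1.2])  # noisy reading, closest to the first road

# Snap the point to the nearest road segment.
snapped = min((snap_to_segment(gps, a, b) for a, b in segments),
              key=lambda q: np.linalg.norm(gps - q))
assert np.allclose(snapped, [4.0, 0.0])
```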
The first dataset is the same as in [3] and includes food deliveries occurring over the OpenStreetMap road network of Lausanne. The map graph includes 18,156 nodes and 32,468 edges. The second dataset, Geolife [17], contains GPS trajectories recorded by Microsoft Research Asia from April 2007 to August 2012 in Beijing, China. The dataset includes additional data, such as timestamps, altitude, user speed, and GPS coordinates. Geolife consists of 32,442 nodes and 53,050 edges. The third dataset, iWet, includes tourist itineraries for buses in the Central Macedonia region of Greece. iWet comprises 18,317 nodes and 43,787 edges. Both Geolife and iWet use the Fast Map Matching algorithm [15], a graph-based approach that leverages a probabilistic model and dynamic programming to match GPS points to road segments. In contrast, the Lausanne dataset uses a Hidden Markov Model (HMM) as the map matching algorithm [11]. The GRETEL model utilizes specific features for nodes and edges, referred to as primal, including the distance between nodes and the average speed limit. These features are combined with all the features extracted from the dual hypergraph to serve as edge features. The metrics provided by [3] are used. The target-probability measures the average probability of the model selecting a node with non-zero likelihood. In contrast, choice accuracy evaluates the accuracy of the algorithm's decisions at each intersection of the ground-truth path between nodes $v_t$ and $v_{t+h}$, where $h$ represents the prediction horizon. This metric is calculated for nodes with degree at least 3 (choice-accuracy-3) and at least 1 (choice-accuracy). The metrics precision-top1 and precision-top5 calculate the accuracy of the model's prediction against the actual target by considering the best predictions, where the number of best predictions ranges from 1 to 5.
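The precision-top-k metric described above can be sketched as follows (an illustrative reimplementation for a single example; the exact evaluation code of [3] may differ):

```python
import numpy as np

def precision_top_k(scores, target, k):
    """Return 1.0 if the true target node is among the k highest-scoring
    candidates, else 0.0; averaging over examples gives the dataset metric."""
    top_k = np.argsort(scores)[::-1][:k]
    return 1.0 if target in top_k else 0.0

# Predicted destination distribution over 6 candidate nodes.
scores = np.array([0.05, 0.40, 0.10, 0.25, 0.15, 0.05])
target = 3  # ground-truth destination node

assert precision_top_k(scores, target, k=1) == 0.0  # best guess is node 1
assert precision_top_k(scores, target, k=5) == 1.0  # node 3 ranks second
```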
The path-nll measures the negative log-likelihood. All results presented here are based on five independent executions, including the mean and standard deviation. The experimental analysis is performed for each dataset, where the model's performance is compared when using the hypergraph features and the AdaHessian optimizer. The baseline case involves using the primal features and the Adam optimizer. Table 1 presents the results of applying GRETEL to the Lausanne dataset. Incorporating dual hypergraph features improves performance across all metrics when the Adam optimizer is used, with the DHnode-in-out-degree feature achieving the highest performance. Similar results are observed when using the AdaHessian optimizer, except for choice accuracy and precision-top1. Both similarity-hyperedge and DHnode-in-out-degree features outperform the primal features. Of particular interest is the substantial increase in the overall performance of GRETEL with the AdaHessian optimizer. The most significant increase is observed in target-probability and precision-top1, with percentage increases of 54.8% and 57.2%, respectively. A similar percentage increase of around 34% is observed for choice-accuracy-3 and precision-top5. For path-nll and path-nll-3, the percentage increases are 14.8% and 19.6%, respectively, while the smallest increase is observed for choice-accuracy, whose value is already high.
The corresponding results for the Geolife dataset are presented in Table 2. When using the Adam optimizer, the dual hypergraph features outperform the baseline case only in metrics such as precision-top5, path-nll, and path-nll-3. In the other metrics, the primal features show slightly better results. However, when using the AdaHessian optimizer, the dual hypergraph features improve the model's performance in all metrics. The similarity-hyperedge-DHnode-in-out-degree and similarity-hyperedge features exhibit the best performance. In this case, the impact of the two optimizers on the results is mixed, with both positive and negative effects. Specifically, there is a significant increase in precision-top1 and target-probability of 157.04% and 102.3%, respectively. The difference in precision-top5 and choice-accuracy-3 is similar, around 17%, while in path-nll and path-nll-3 there is a decrease of 167.6% and 85.57%, respectively.
The experimental results for the iWet dataset can be found in Table 3. When the Adam optimizer was utilized, the model exhibited behavior indicative of being stuck in a local minimum and showed no progress during training. This behavior can sometimes occur with the Adam optimizer when training deep learning models, mainly when dealing with non-convex optimization problems, where the optimizer may fail to converge to the global minimum. In contrast, while the model's performance with AdaHessian was notably lower than on the other datasets, the use of dual hypergraph features, in particular DHnode-in-out-degree, still improved the model's performance.
Figure 3 presents four instances of GPS trajectory prediction in the Geolife dataset. The model used for the prediction uses the similarity-hyperedge and DHnode-in-out-degree features.

CONCLUSION
GPS navigation data can be represented as graphs, where each node corresponds to a location, and each edge represents a route or a path between locations. Such a representation enables the modeling of complex relationships between GPS points. In this work, we presented DualGRETEL+, an enhanced version of the GRETEL model, for accurate predictions of object movement in space. DualGRETEL+ utilizes a dual hypergraph to extract additional features, allowing for a more flexible and expressive representation of relationships within GPS data. The resulting hypergraph can be represented using an incidence matrix, which captures all the information. By incorporating these new features into GRETEL, we significantly improved its performance on three datasets. We also demonstrated the efficacy of the AdaHessian optimizer in further enhancing the model's performance. This study highlights the potential of hypergraphs and second-order optimization methods for analyzing and predicting object movement in GPS data.

Figure 1: Algorithmic steps of the proposed framework.

Figure 2: Transformation of a simple graph to its corresponding dual hypergraph. The incidence matrices of the original graph and the dual hypergraph are presented. A visual representation of how the directed edges are computed in the dual hypergraph is given.
The elements of $b_v$ are indexed by $e = 1, 2, \ldots, m$, where $e$ corresponds to the edge's id. Applying the DHT algorithm, the row vector $b_v \in \{0, 1\}^m$ of matrix $B$ is transformed into the column vector $b_v^* \in \{0, 1\}^m \equiv b_v^{\top}$ of matrix $B^*$. As a result, the role of $e$ changes, and it indexes the ids of the nodes in the dual hypergraph. Each 1 in the column vector $b_v^*$ corresponds to a value of $e$ indicating which nodes of the dual hypergraph are connected by the hyperedge $e_v^*$. For example, if $b_v^*$ has a 1 in positions $e = 1, 2, 5$, the hyperedge $e_v^*$ is associated with the dual hypergraph nodes $v_1^*$, $v_2^*$, and $v_5^*$. The corresponding description for the original graph indicates that node $v$ participates in edges $e_1$, $e_2$, and $e_5$. This is essentially a one-hot encoding scheme for multi-categorical data, where the categories correspond to the edges in the original graph. The extracted feature is the cosine similarity between the incidence row vectors $b_{v_s}$ and $b_{v_t}$, where $v_s$ is the source node and $v_t$ is the target node of an arbitrary edge $e$. The key name similarity-hyperedge refers to the experiments that use this feature.

For a directed original graph, the incidence matrix $B \in \{-1, 0, 1\}^{n \times m}$ has size $n \times m$, where $n$ and $m$ are the numbers of nodes and edges, respectively. Each node $v$ is associated with an incidence row vector $b_v \in \{-1, 0, 1\}^m$. Examining the row vector $b_v$ of node $v$, a value of $-1$ in position $e$ indicates that node $v$ is a source node of edge $e$, whereas a value of $1$ in position $e$ indicates that node $v$ is a target node of edge $e$, since $e$ represents the id of the edges. If $b_v$ consists only of $\{-1, 0\}$ values, there are no incoming edges to node $v$, and if it consists only of $\{1, 0\}$ values, there are no outgoing edges from node $v$. To extract new features related to the input and output edge degrees of the dual hypergraph nodes, the direction of the edges must be determined. In this case, the key name for the experiments is DHnode-in-out-degree. This is accomplished by examining the column vectors $b_v^*$ of
matrix $B^*$ and considering the combinations between the associated nodes of the dual hypergraph for each hyperedge $e_v^*$. Each node $v_e^*$ with $e = 1, 2, \ldots, m$ in the dual hypergraph corresponds to an edge $e$ with $e = 1, 2, \ldots, m$ in the original graph. For every combination $(v_i^*, v_j^*)$, the existence of the path $e_i \to e_j$ in the original graph that passes through the examined node $v$ is verified. For example, consider the hyperedge $e_2^*$ in Fig. 2, which connects nodes $v_1^*$, $v_2^*$, and $v_3^*$. The corresponding values in the incidence matrix $B^*$ form the column vector $b_2^*$ with values $[1, -1, 1, 0, 0]^{\top}$. We check each combination of the participating nodes, namely $(1-2)$, $(1-3)$, $(2-1)$, $(2-3)$, $(3-1)$, and $(3-2)$. The original graph has a path for edges $e_1 \to e_2$ and a path for edges $e_3 \to e_2$ through node $v_2$.
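Both dual hypergraph features can be sketched directly on the incidence matrix (a toy example; treating the unsigned incidence rows as the binary incidence of the undirected case is an assumption of this sketch, and the feature names follow the experiment keys above):

```python
import numpy as np

# Directed toy graph: edges as (source, target).
edges = [(0, 1), (1, 2), (0, 2)]
n, m = 3, len(edges)

# Directed incidence matrix B: -1 marks the source node, +1 the target.
B = np.zeros((n, m))
for e, (s, t) in enumerate(edges):
    B[s, e], B[t, e] = -1, 1

# similarity-hyperedge: cosine similarity between the (unsigned)
# incidence row vectors of an edge's source and target nodes.
def similarity_hyperedge(e):
    s, t = edges[e]
    bs, bt = np.abs(B[s]), np.abs(B[t])
    return bs @ bt / (np.linalg.norm(bs) * np.linalg.norm(bt))

# DHnode-in-out-degree: in/out degree of a dual hypergraph node, i.e.,
# of an edge (s, t) in the original graph, read off the signs of B:
# paths e' -> e enter through s, and paths e -> e'' leave through t.
def dh_node_in_out_degree(e):
    s, t = edges[e]
    in_deg = int((B[s] == 1).sum())    # edges entering the source node
    out_deg = int((B[t] == -1).sum())  # edges leaving the target node
    return in_deg, out_deg
```

For edge $e_0 = (0, 1)$, nodes 0 and 1 share only that edge, so the similarity is 0.5, and the dual node for $e_0$ has in-degree 0 and out-degree 1 via $e_1 = (1, 2)$.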

Figure 3: Prediction examples in the Geolife dataset, using the similarity-hyperedge and DHnode-in-out-degree features. Each row showcases different examples. The left-column images (a) and (c) display 5 historical trajectory prefixes, visually represented with markers that transition from blue to purple, with the green circle signifying the actual target location. Red markers indicate the predicted trajectory suffix reflecting the target distribution. In the right-column images (b) and (d), the direction of the predicted trajectory is presented along with the 5 trajectory prefixes.

Figure 3: (cont.) The left-column images (e) and (g) display 5 historical trajectory prefixes, visually represented with markers that transition from blue to purple, with the green circle signifying the actual target location. Red markers indicate the predicted trajectory suffix reflecting the target distribution. In the right-column images (f) and (h), the direction of the predicted trajectory is presented along with the 5 trajectory prefixes.

Table 1: Performance of the GRETEL model when the Lausanne dataset is used.

Table 2: Performance of the GRETEL model when the Geolife dataset is used.

Table 3: Performance of the GRETEL model when the iWet dataset is used.