MetroGNN: Metro Network Expansion with Reinforcement Learning

Selecting urban regions for metro network expansion to meet maximal transportation demands is crucial for urban development, yet computationally challenging. The expansion process not only relies on complicated features such as urban demographics and origin-destination (OD) flow, but is also constrained by the existing metro network and urban geography. In this paper, we introduce a reinforcement learning framework to address a Markov decision process within an urban heterogeneous multi-graph. Our approach employs an attentive policy network that selects nodes based on information captured by a graph neural network. Experiments on real-world urban data demonstrate that our proposed methodology substantially improves the satisfied transportation demands by over 30% when compared with state-of-the-art methods. Code is published at https://github.com/tsinghua-fib-lab/MetroGNN.


INTRODUCTION
Public transportation, especially the metro network, is a key component in meeting the travel needs of citizens. It not only helps to address the urban equity issue [6,15], but also influences urban dynamics, including population distribution and economic development [2,14]. Therefore, a rational metro network expansion can have a profound impact on the development of the city.
Solving metro network expansion is nontrivial due to two primary challenges. Firstly, consideration must be given to intricate features, including transportation flow matrices between regions and the relationship with existing metro lines [3]. Secondly, the selection of regions for metro network expansion poses an NP-hard problem characterized by an enormous solution space comprising all candidate regions, which makes it impossible to conduct an exhaustive search [12]. For example, in a medium-sized city with 1000 regions, the solution space can exceed $10^{30}$, far beyond the capacity of exact solution methods. Additionally, the problem complexity is heightened by various constraints from urban geography [1], including spacing and angles between stations and line segments.
Existing approaches for metro network expansion fall into three categories. First, heuristics are proposed to select regions, such as greedy rules [13] or genetic algorithms [7]. Simulated annealing [4] and ant colony methods [11] are also adopted to better navigate the search space. However, these heuristics struggle to handle the various constraints, leading to infeasible solutions. Second, mathematical programming approaches reduce the solution space by restricting the selected regions to narrow corridors [9]. Not surprisingly, the corridor approximation is oversimplified: it excludes high-quality solutions and thus results in sub-optimal performance of the expanded metro network. Third, reinforcement learning (RL) has recently been applied to this problem [10], but it neglects diverse correlated features.
In this paper, we propose a systematic RL framework for solving a complex Markov decision process (MDP) on a graph. The metro network is expanded by selecting nodes on a heterogeneous multi-graph representing urban regions. To address the challenge of complicated features, we design a novel graph neural network (GNN) to learn effective representations for the heterogeneous multi-graph. Independent message propagation and neighbor aggregation are developed to capture both spatial contiguity and transportation flow between urban regions. To efficiently explore the solution space of the NP-hard problem, we propose an attentive policy network with an action mask for region selection. The action mask ensures the feasibility of the obtained solutions by addressing various metro network constraints.
To summarize, the contributions of this paper are as follows:
• We propose a graph-based RL framework for solving complex MDPs, which is able to address the challenging geometrical combinatorial optimization (CO) problem of metro network expansion.
• We design a novel GNN and an attentive policy network to learn representations for urban regions and select new metro stations.
• Extensive experiments on real-world urban data demonstrate that the proposed MetroGNN can substantially improve OD flow satisfaction by over 30% against state-of-the-art approaches.

PROBLEM STATEMENT
Given a set of nodes, $N = \{n_1, n_2, \ldots, n_{|N|}\}$, representing the centroids of urban regions (see Figure 1(a)), a metro network $M = (V, E)$ can be described by a subset of nodes $V \subseteq N$ and the edges $E$ (metro line segments) connecting them. Metro network expansion involves the sequential selection of nodes for station construction to maximize the total satisfied OD flow of the network, quantified as in (1), where $\mathrm{EucDis}(n_i, n_j)$ and $\mathrm{PathDis}(n_i, n_j)$ are the Euclidean distance and the path distance between $n_i$ and $n_j$ via $M$, respectively, and $F_{ij}$ denotes the OD flow between $n_i$ and $n_j$.
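The three quantities defined above can be combined in a few lines of code. The following is a minimal sketch under an explicit assumption: each pair of connected stations contributes its OD flow weighted by the ratio of Euclidean distance to path distance, so that a direct connection counts fully and a detour counts less. This weighting is one plausible reading of the definitions and is not claimed to be the exact form of (1). The station coordinates, line segments, and flows are made up, and the function names are hypothetical.

```python
import math
from itertools import combinations

def euc_dis(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def path_distances(coords, edges):
    """All-pairs shortest path lengths over the metro graph (Floyd-Warshall)."""
    nodes = list(coords)
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for u, v in edges:
        w = euc_dis(coords[u], coords[v])
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def satisfied_od(coords, edges, flow):
    """ASSUMED objective: sum of flow * EucDis / PathDis over connected station pairs."""
    dist = path_distances(coords, edges)
    total = 0.0
    for i, j in combinations(coords, 2):
        if math.isfinite(dist[i][j]) and dist[i][j] > 0:
            ratio = euc_dis(coords[i], coords[j]) / dist[i][j]
            total += ratio * flow.get((i, j), 0.0)
    return total

# Toy network: a -> b -> c forms a right angle, so a-c requires a detour.
stations = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (3.0, 4.0)}
segments = [("a", "b"), ("b", "c")]
od_flow = {("a", "b"): 10.0, ("b", "c"): 20.0, ("a", "c"): 30.0}
print(satisfied_od(stations, segments, od_flow))
```

In this toy case the pairs a-b and b-c are served directly and count in full, while a-c is served by a path of length 7 against a straight-line distance of 5, so only 5/7 of its flow is counted.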

METHOD
We propose a graph-based RL framework to solve the complex MDP, whose components are the state, action, reward, and state transition.

Heterogeneous Multi-graph Model
As illustrated in Figure 2(a), we utilize a heterogeneous multi-graph to faithfully describe the urban regions. In this graph model, the node set $N = \{n_1, \cdots, n_{|N|}\}$ represents the regions divided by the road network. We then introduce two types of edges to capture the relationships between regions, as illustrated in Figure 1(b). Specifically, the first type links contiguous nodes, capturing their proximity on small spatial scales. The second type connects pairs of nodes with significant OD trips, capturing flow patterns between urban regions on a larger scale. The two edge types are defined using two threshold values. By expressing the problem with this graph model, our framework can comprehensively express the spatial relationships and OD characteristics within the city.
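The construction of the two edge types can be sketched as follows. This is an illustration under stated assumptions rather than the exact edge definition: contiguity is approximated by a centroid-distance threshold `eps_dist`, and an OD edge is added when the two-way flow between a pair reaches `eps_flow`. Both parameter names, the function name, and the toy data are hypothetical.

```python
import math

def build_multigraph(centroids, flow, eps_dist, eps_flow):
    """Return (spatial_edges, od_edges) as sets of node-index pairs (i, j), i < j.

    ASSUMPTIONS: contiguity is approximated by centroid distance <= eps_dist,
    and an OD edge requires two-way flow >= eps_flow.
    """
    n = len(centroids)
    spatial_edges, od_edges = set(), set()
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centroids[i], centroids[j]) <= eps_dist:
                spatial_edges.add((i, j))
            if flow[i][j] + flow[j][i] >= eps_flow:
                od_edges.add((i, j))
    return spatial_edges, od_edges

# Four regions on a line; flow[i][j] is the number of trips from region i to j.
centroids = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
flow = [
    [0, 70, 5, 80],
    [60, 0, 10, 5],
    [5, 10, 0, 20],
    [40, 5, 20, 0],
]
spatial_edges, od_edges = build_multigraph(centroids, flow, eps_dist=1.5, eps_flow=100)
print(sorted(spatial_edges), sorted(od_edges))
```

The pair (0, 1) receives both a spatial edge and an OD edge, which is why a multi-graph is needed, while the distant pair (0, 3) is linked only by OD flow.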

Encoding Complicated Features with GNN
We design a novel GNN model as the encoder to learn unified representations of complicated features for regions through the heterogeneous edges, as shown in Figure 2(b). Two groups of features are incorporated for each region. The first group directly relates to the OD trips of the region, and includes the total OD access flows (FD1), the OD flows with neighboring regions (FD2), and the OD flows with the regions $V$ where metro stations are located (FD3). The second group contains auxiliary features, including the population size (FA1), the type and number of points of interest (POIs) in each urban region (FA2), as well as topological features in the graph model (including FA4, FA5, and FA6). The node representations within the heterogeneous graph are computed by independent spatial-aware and OD-aware message passing over the two edge types. With the proposed GNN model, we unify the complicated features for metro network expansion and obtain effective node representations with spatial contiguity and OD flow information.
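To make the idea of independent message propagation over two edge types concrete, a single simplified layer can be written as below. This sketch is not the proposed GNN: it assumes mean aggregation over each neighbor set and replaces the learnable weight matrices and nonlinearity with three fixed scalar mixing weights. All names and values are hypothetical.

```python
def mean_neighbors(h, neighbors, i):
    """Mean of the feature vectors of node i's neighbors (zeros if it has none)."""
    dim = len(h[i])
    nbrs = neighbors.get(i, [])
    if not nbrs:
        return [0.0] * dim
    return [sum(h[j][c] for j in nbrs) / len(nbrs) for c in range(dim)]

def message_passing_layer(h, spatial_nbrs, od_nbrs,
                          w_self=0.5, w_spatial=0.25, w_od=0.25):
    """One layer: aggregate spatial and OD neighbors separately, then mix.

    ASSUMPTION: scalar weights stand in for learnable parameters.
    """
    out = []
    for i in range(len(h)):
        m_spatial = mean_neighbors(h, spatial_nbrs, i)  # short-range contiguity
        m_od = mean_neighbors(h, od_nbrs, i)            # long-range OD flow
        out.append([w_self * x + w_spatial * s + w_od * o
                    for x, s, o in zip(h[i], m_spatial, m_od)])
    return out

# Three regions with 2-D input features.
h0 = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
spatial_nbrs = {0: [1], 1: [0, 2], 2: [1]}  # contiguous regions
od_nbrs = {0: [2], 2: [0]}                  # regions linked by significant OD flow
h1 = message_passing_layer(h0, spatial_nbrs, od_nbrs)
print(h1)
```

Keeping the two aggregations separate means that a region's representation can distinguish what its adjacent regions look like from what its main travel partners look like, even when the two neighbor sets overlap.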

Planning with Masked Attentive Policy Network
The solution space of metro network expansion grows exponentially with the number of expansion stations, making it exceedingly challenging to find optimal solutions, especially when considering various constraints such as straightness and spacing. To search the massive solution space under these constraints, we propose an attentive policy network with an action mask for efficient exploration of feasible solutions, as illustrated in Figure 2(c). The agent samples nodes according to normalized scores, which the policy network calculates from the embeddings computed by the GNN. The calculation involves the embedding dimension and learnable projection parameters, and yields for each node $n_i$ a relevance and an attention score with respect to the current metro network. Nodes strongly correlated with the current metro network are emphasized, while those far from the current metro network are masked, promoting high-quality expansion of the metro network.
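The combination of attention scoring and action masking can be illustrated with a small example. This is a sketch under assumptions, not the exact policy network: it uses scaled dot-product attention between each node embedding and the mean embedding of the current stations, sets the scores of infeasible nodes to negative infinity, and normalizes with a softmax. The pooling choice, the projection matrices `w_q` and `w_k`, and all names are hypothetical.

```python
import math
import random

def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]

def masked_attention_probs(emb, network, feasible, w_q, w_k):
    """Selection probabilities over all nodes; infeasible nodes get probability 0."""
    if not feasible:
        raise ValueError("at least one feasible node is required")
    d = len(emb[0])
    # ASSUMPTION: the current network is summarized by the mean station embedding.
    pooled = [sum(emb[i][c] for i in network) / len(network) for c in range(d)]
    query = matvec(w_q, pooled)
    scores = []
    for i, h in enumerate(emb):
        if i in feasible:
            key = matvec(w_k, h)
            scores.append(sum(a * b for a, b in zip(query, key)) / math.sqrt(d))
        else:
            scores.append(-math.inf)  # action mask
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(weights)
    return [w / total for w in weights]

identity = [[1.0, 0.0], [0.0, 1.0]]
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
probs = masked_attention_probs(emb, network=[0], feasible={1, 2},
                               w_q=identity, w_k=identity)
random.seed(0)
chosen = random.choices(range(len(emb)), weights=probs)[0]
print(probs, chosen)
```

Because masking is applied before the softmax, the probabilities of the feasible nodes still sum to one, so the agent can sample directly without rejecting infeasible actions.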

EXPERIMENTS

Experiment Settings
Data. We conduct experiments using real-world data from two of China's largest cities, Beijing and Changsha. Specifically, we adopt thousands of authentic urban region divisions delineated by the road structure. Real OD flow data for the whole year of 2020 is utilized, which is obtained from Tencent Map, a prominent mapping and transportation service application in China.
Baselines and evaluation. We compare our model with mathematical programming approaches [9] utilizing two different solvers, CBC (MPC) and GUROBI (MPG). Heuristic baselines are also compared, including Greedy Strategy (GS) [5], Genetic Algorithm (GA) [8], Simulated Annealing Algorithm (SA) [4], and Ant Colony Optimization (ACO) [11]. We further include the state-of-the-art RL approach, DRL-CNN [10], for comparison. For each method, we vary the seeds and run each experimental configuration 10 times. To evaluate the effect of metro network expansion, we calculate the OD flows satisfied by the expanded metro network according to (1).

Performance Comparison
We assess each method under different scenarios by varying the total budget for expansion and evaluating the corresponding performance. Results of our model and baselines are reported in Table 1, and we have the following observations:
• DRL-based methods have significant advantages over other approaches. DRL-CNN outperforms other baselines in most cases, achieving higher satisfied OD with an average improvement of 4.4%, demonstrating the superior ability of RL to search a large solution space. Nevertheless, DRL-CNN suffers from severe performance deterioration in complicated scenarios (B=60), with the satisfied OD 5.1% worse than MPC and MPG on average.
• Our proposed model achieves the best performance in different scenarios. Our approach surpasses existing baselines under all budgets, improving the satisfied OD flow by over 15.9% against the best baseline, averaged over three different expansion budgets. Notably, in contrast to DRL-CNN, which fails to outperform baselines in complicated scenarios, our approach exhibits more significant advantages in complicated scenarios with a higher budget, with improvements in satisfied OD exceeding 30%.
To provide a deeper understanding of the reliability and practical applicability of our planning solution within real-world contexts, we present the results generated by MetroGNN and other baselines for Beijing, as illustrated in Figure 3. The planning solution generated by our approach covers almost all the areas with high population and POI densities, which naturally correspond to numerous travel demands. Meanwhile, the new lines generated by our approach are interconnected, and each new line introduces at least two additional interchange stations to the metro network, improving the efficiency of the transportation network.

Ablation Study
We conduct ablation experiments to showcase the efficacy of the graph model and the incorporated complicated features.
Graph Modeling. The urban regions exhibit intricate spatial correlations characterized by both short-range proximity and long-range OD flow patterns. By harnessing the graph modeling approach and GNN, our approach effectively captures these complexities among urban regions. As shown in Figure 4(a), when the graph model is omitted, the satisfied OD flow of the expanded metro network drops significantly from 21.80 to only 14.02.
Spatial-aware and OD-aware Message Passing. In the proposed GNN model, we design two independent message propagation mechanisms, spatial-aware and OD-aware message passing. As illustrated in Figure 4(a), removing either spatial or transportation edges leads to a significant deterioration in performance, with a decrease of 28.8% and 30.8%, respectively.
OD Direct and Auxiliary Features. As shown in Figure 4(b), when the three OD direct features are excluded, our method suffers varying degrees of performance degradation. In particular, removing FD3 results in the largest performance drop (-20.4%), which is reasonable since it reflects the direct benefit of adding a region to the metro network. We also evaluate the contribution of auxiliary features, as demonstrated in Figure 4(c). In particular, removing population (FA1) brings about the largest deterioration, as population information is quite important when considering metro network expansion.
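The graph-model ablation is reported in absolute values (21.80 versus 14.02), whereas the edge and feature ablations are reported as percentages. To place them on the same scale, the relative drop implied by the absolute values can be computed directly. The helper name `relative_drop` is ours, and the calculation assumes that all percentages are measured against the full model.

```python
def relative_drop(full, ablated):
    """Percentage decrease of `ablated` relative to `full`."""
    return (full - ablated) / full * 100.0

full_model = 21.80  # satisfied OD flow of the full model
no_graph = 14.02    # satisfied OD flow without the graph model
print(round(relative_drop(full_model, no_graph), 1))  # 35.7
```

Under this assumption, omitting the graph model altogether (about 35.7%) costs more than removing either edge type alone (28.8% and 30.8%).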

CONCLUSION
In this paper, we investigate the problem of metro network expansion, and propose MetroGNN, a systematic graph-based RL framework that can solve complex node selection MDPs on the graph. The proposed model unifies complicated features with GNN and explores the solution space efficiently with an attentive policy network and a carefully designed action mask. Through extensive experiments, our approach demonstrates a significant improvement in travel demand satisfaction, increasing the satisfied OD flow by over 15.9% compared to state-of-the-art baselines. Looking ahead, we plan to investigate the performance of the proposed systematic RL framework in other graph-based decision tasks, such as influence maximization on social media platforms.

Figure 1: (a) Regions determined by road network. (b) Heterogeneous multi-graph, where nodes represent regions. The black solid line and the orange dashed line correspond to spatial contiguity and OD associations between regions, respectively. (c) Schematic of our approach.

Figure 2: (a) The schematic of the metro network expansion process. At each step, the agent selects a node that either extends existing lines or constructs new lines. We use distinct colors for different lines, and purple for interchange. (b) The proposed GNN model, where a spatial-aware and OD-aware message passing mechanism is developed. (c) The proposed masked attentive policy network for node selection.

Figure 3: Visualization of metro network expansion for Beijing. We use colors to distinguish between different metro lines, and use boldface nodes to indicate the expansion solution. Black nodes indicate stations on the initial lines and their extensions, and red nodes represent stations on new lines. Regions colored with red and green indicate areas where population and POIs are clustered, respectively. The darker the color, the higher the density.

Figure 4: Performance of MetroGNN and its variants that remove different elements, including the whole graph model (G), spatial edges (Et), transportation flow edges (Eo), OD direct (FD) and auxiliary (FA) features. Best viewed in color.
• State. ... and the remaining budget.
• Action. The action $a_t$ corresponds to the selection of a single node in $N$.
• Reward. The intermediate reward $r_t$ for action $a_t$ is defined as the increase in satisfied OD flow, i.e., the value of (1) for $M_t$ minus its value for $M_{t-1}$.
• State transition. The selected node is regarded as an expansion of a metro line if there is no sharp bend; otherwise, it is considered as the start of a new line to expand the current metro network.
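The MDP components listed above can be assembled into a small environment. This is a minimal sketch under several assumptions: a sharp bend is a turn of more than `max_turn_deg` degrees relative to the last segment of the most recently extended line, each selected node costs one unit of budget, and the objective is passed in as a function (a toy segment count here, standing in for the satisfied OD flow). The class, function, and parameter names are hypothetical.

```python
import math

def turn_angle_deg(a, b, c):
    """Deviation from straight ahead when travelling a -> b -> c, in degrees."""
    h1 = math.atan2(b[1] - a[1], b[0] - a[0])
    h2 = math.atan2(c[1] - b[1], c[0] - b[0])
    diff = abs(h2 - h1) % (2 * math.pi)
    return math.degrees(min(diff, 2 * math.pi - diff))

class ExpansionEnv:
    def __init__(self, coords, lines, budget, objective, max_turn_deg=60.0):
        self.coords = coords                         # node id -> (x, y)
        self.lines = [list(line) for line in lines]  # each line is a node sequence
        self.budget = budget                         # ASSUMPTION: one unit per station
        self.objective = objective                   # maps lines to a scalar value
        self.max_turn_deg = max_turn_deg             # ASSUMED sharp-bend threshold

    def step(self, node):
        """Apply one action; return ((lines, budget), reward, done)."""
        before = self.objective(self.lines)
        line = self.lines[-1]
        sharp = len(line) >= 2 and turn_angle_deg(
            self.coords[line[-2]], self.coords[line[-1]], self.coords[node]
        ) > self.max_turn_deg
        if sharp:
            self.lines.append([node])  # start a new line
        else:
            line.append(node)          # extend the current line
        self.budget -= 1
        reward = self.objective(self.lines) - before  # gain in the objective
        return (self.lines, self.budget), reward, self.budget <= 0

def count_segments(lines):
    """Toy objective: number of line segments (not the satisfied OD flow)."""
    return sum(len(line) - 1 for line in lines)

coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (1.0, 1.0)}
env = ExpansionEnv(coords, lines=[[0, 1]], budget=2, objective=count_segments)
_, r1, done1 = env.step(2)  # straight ahead: extends the line
_, r2, done2 = env.step(3)  # 135-degree turn: starts a new line
print(r1, r2, env.lines, done2)
```

Defining the reward as the difference in the objective before and after an action makes the rewards of an episode sum to the total improvement of the final network over the initial one.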