RDGCN: Reinforced Dependency Graph Convolutional Network for Aspect-based Sentiment Analysis

Aspect-based sentiment analysis (ABSA) is dedicated to predicting the sentiment polarity of aspect terms within sentences. Employing graph neural networks to capture structural patterns from syntactic dependency parsing has been confirmed as an effective approach for boosting ABSA. In most works, the topology of dependency trees or dependency-based attention coefficients is loosely regarded as edges between aspects and opinions, which can result in insufficient and ambiguous syntactic utilization. To address these problems, we propose a new reinforced dependency graph convolutional network (RDGCN) that improves the importance calculation of dependencies in both the distance and type views. Initially, we propose an importance calculation criterion for the minimum distances over dependency trees. Under the criterion, we design a distance-importance function that leverages reinforcement learning for weight distribution search and dissimilarity control. Since dependency types often do not have explicit syntax like tree distances, we use global attention and mask mechanisms to design type-importance functions. Finally, we merge these weights and implement feature aggregation and classification. Comprehensive experiments on three popular datasets demonstrate the effectiveness of the criterion and importance functions. RDGCN outperforms state-of-the-art GNN-based baselines in all validations.


INTRODUCTION
Aspect-based sentiment analysis (ABSA) is a fine-grained task that focuses on predicting the sentiment polarity of aspect terms within sentences [38]. The sentence "Great food but the service was dreadful" in Figure 1 serves as an example, with the aspects "food" and "service" exhibiting positive and negative sentiments, respectively.
Early ABSA research [21] primarily relied on manually designed syntactic features. A large number of neural network methods have recently emerged [5], which are non-labor-intensive and bring huge performance improvements. Since context words in a sentence may have different importance for a given aspect, attention mechanisms are widely integrated in recurrent neural networks [15,31], memory networks [23], gated networks [33,37], and convolutional networks [33], etc. However, there may be multiple aspect terms and different opinions in a sentence. Attention-based weighting can make aspect representations susceptible to interference from irrelevant opinions. Taking Figure 1 as an example, for the aspect "service", both opinion words "dreadful" and "Great" may be assigned large attention scores, with the latter potentially hindering sentiment assessment.
Thanks to advances in syntactic parsing with neural networks, the dependency trees of sentences are becoming more accurate [4], prompting studies such as [35,38,43] to model explicit connections between aspects and their associated opinion words. As a paradigm good at learning topological data, the graph neural network (GNN) [32] is widely applied in ABSA methods to utilize dependency structures. Based on the syntax within dependency trees, GNN-based methods are typically divided into three streams. The first branch exploits the discrete (or probabilistic) topology of trees [8,9,12,14,22,24,34]. The second branch focuses on the diversity of dependency types in trees [25,26,30,36]. The third branch utilizes tree-based minimum distances, that is, the number of edges on the shortest path between two words [30,39,42]. Typically, type- and distance-based methods also involve the raw topology of dependency trees. This is because types are attached to the topology, which in turn is the special case where all minimum distances equal 1 [2]. Despite the success of these studies, issues of syntactic underutilization still persist.
First, because the syntax of types and distances is fundamentally different, intuitively applying the same processing strategy to both may be insufficient [2,30]. For example, for dependency types, since their weights are implicit without experts, it is workable to calculate the importance based on the attention mechanism [26,30]. However, the attention coefficients may obscure the original explicit syntactic importance of the minimum tree distances, resulting in meaningless calculations from scratch [30]. Second, how to effectively implement calculations for explicit distance weights is underexplored, and this is not limited to GNN-based research. Most studies, such as [19,35,42], decrease weights equidistantly for induced dependencies in ascending order of tree-based distances. Even though this strategy is proven effective in preserving and distinguishing distance syntax, it still suffers from an unreasonable equidistant setting. For example, for sentences with a large distance range, usually only a small number of dependencies with small distances need to be distinguished by importance. In contrast, the importance of dependencies with too-large distances may all be close to 0 rather than equidistantly spaced.
To address these problems, we propose a reinforced dependency graph convolutional network (RDGCN) that uses different importance calculations for dependency types and distances. Specifically, at first, we propose a new importance calculation criterion for the minimum distances over dependency trees, which imposes constraints on both importance discriminability and distribution. Second, according to this criterion, we propose a distance-importance function consisting of two sub-functions. More specifically, to increase discriminability, a power-based sub-function sets the [0, 1]-weights of dependencies with the minimum and maximum distances to 1 and 0, respectively. To avoid an unreasonable arithmetic distribution, an exponential-based sub-function is designed. It disproportionately reduces the weights of induced dependencies, and its gradient gradually flattens as the distance increases. Given the absence of prior knowledge on the range of valuable distances, we use reinforcement learning (RL) [29] to search for the exponential curvature that controls the concave arc distribution of the dependency weights. In this way, the tree-based distance importance weights fall off smoothly rather than abruptly, preserving the possibility of exploiting dependencies with large distances. Besides, the RL-based automatic search for the best curvature avoids tedious manual parameter adjustment, rendering RDGCN highly portable across different ABSA tasks. Third, because dependency types do not provide explicit syntactic importance as minimum distances do, we introduce a global attention mechanism to differentiate type weights. Finally, we combine the distance and type weights, and perform feature aggregation and sentiment prediction based on GCN. The major contributions are summarized as follows:
• We propose a novel ABSA model that efficiently captures both distance and type syntax through different strategies.
• This work is an important attempt at calculating non-equidistant importance for explicit distance syntax.
• We evaluate the proposed RDGCN on three popular datasets; the experimental results and analysis verify the rationality of the criterion and importance functions as well as the superiority of RDGCN's performance.

PRELIMINARIES
In this section, we describe aspect-based sentiment analysis (ABSA) as well as graph neural network (GNN)-based ABSA.

ABSA
Given a sentence-aspect pair s−a, where a = <a_1, a_2, ..., a_m> is an aspect and a sub-sequence of the sentence s = <w_1, w_2, ..., w_n>, w_i and a_j are the i-th and j-th words (tokens) in s and a, and n and m are the lengths of the sentence and aspect, respectively. ABSA needs to predict the sentiment polarity of a by drawing information from s, i.e., s−a → y, where y is a polarity category such as positive or negative. The current dominant approach is to derive the contextual representations of sentences based on encoders such as Transformer [27] and BERT [10]. The sequence s can be transformed into a low-dimensional embedding matrix E ∈ R^{n×d}, where the i-th row of E represents the feature vector e_i with dimension d of the i-th token. Afterwards, aspect-specific features F ∈ R^{m×d} are derived from E.
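As a toy illustration of the notation above, the aspect a is a sub-sequence of the sentence s, so its token positions can be located by matching; the helper below is hypothetical (real pipelines use the annotated character or token offsets instead):

```python
# Locating the aspect sub-sequence a inside the sentence s.
# Illustrative helper only; ABSA datasets annotate aspect spans directly.
sentence = ["Great", "food", "but", "the", "service", "was", "dreadful"]
aspect = ["service"]

def aspect_span(s, a):
    """Return (start, end) token indices of the first occurrence of a in s."""
    m = len(a)
    for i in range(len(s) - m + 1):
        if s[i:i + m] == a:
            return i, i + m
    raise ValueError("aspect not found in sentence")

start, end = aspect_span(sentence, aspect)  # tokens s[start:end] form the aspect
```

With these indices, the rows start..end−1 of the embedding matrix E give the aspect-specific features F.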

GNN-based ABSA
GNN-based ABSA typically introduces additional syntactic structures, which often come from the dependency tree-based topology, type, and distance views. For example, given a sentence-aspect pair s−a, the dependency tree T corresponding to the sentence s can be produced by an existing syntactic dependency parser. Then, the tree T can be abstracted as an adjacency matrix A ∈ R^{n×n}, whose entry in the i-th row and j-th column indicates the dependency weight between the i-th and j-th tokens of s. The dependency weights of A in different GNN-based models are usually obtained from different computational strategies. Combined with the token feature matrix E output by the sentence encoder, GNN learns the syntactic structure patterns of sentences through feature aggregation. The aggregation operation at the l-th layer can be formulated as:

h_i^(l) = σ( h_i^(l−1) ⊕ AGG^(l)({ h_j^(l−1) | A[i, j] > 0 }) ),   (1)

where h_i^(l−1) and h_i^(l) are the input and output feature vectors of the i-th token (node) w_i. A[i, j] > 0 represents that there is a dependency (edge) between tokens w_i and w_j, and h_j^(l−1) is a neighbor feature of w_i to be aggregated. AGG^(l)(·) denotes an aggregation function such as an attention [28] or convolutional [11] operation, whose superscript l usually denotes a specific in-layer feature transformation module. Moreover, ⊕ is an operation to combine the features of w_i with those of its neighbors, such as averaging or concatenation, and σ(·) is an activation function such as Tanh or ReLU. After aggregation is completed at the final layer L of GNN, the sentence feature matrix E_L ∈ R^{n×d} is obtained and used for follow-up aspect-specific classification.
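The layer-wise aggregation can be sketched as follows, assuming a simple mean aggregator for AGG(·), element-wise addition for ⊕, and Tanh for σ(·); this is an illustrative instance of the general form, not RDGCN's actual implementation:

```python
# One generic GNN aggregation layer: each token combines its own features
# with the mean of its neighbors' features (A[i, j] > 0 marks a dependency).
import numpy as np

def gnn_layer(H, A, sigma=np.tanh):
    n = H.shape[0]
    out = np.zeros_like(H)
    for i in range(n):
        neighbors = [j for j in range(n) if A[i, j] > 0 and j != i]
        agg = H[neighbors].mean(axis=0) if neighbors else np.zeros(H.shape[1])
        out[i] = sigma(H[i] + agg)   # "combine" realized as element-wise addition
    return out

# Toy example: 3 tokens, 2-dim features, a chain dependency structure.
H0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
H1 = gnn_layer(H0, A)
```

Stacking L such layers and reading out the aspect rows yields the aspect-specific features used downstream.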

METHODOLOGY
In this section, we formulate the novel reinforced dependency graph convolutional network (RDGCN). Its overview is shown in Figure 2.

Criterion Construction
Previous studies like [19,35,39,42] that introduce minimum distance syntax based on dependency trees typically abide by the calculation criterion that the larger the distance, the less important the induced dependencies (edges). Specifically, they usually decrease weights equidistantly from small to large distances based on reciprocals and differences. Although such strategies are confirmed to preserve and differentiate distance syntax efficiently, the equidistant settings are often unrefined and unreasonable. Considering that no work has explored how to effectively implement importance calculations for explicit distance syntax, we pioneer a universal calculation criterion. More specifically, we argue that an excellent importance calculation criterion for minimum tree distances should impose requirements in terms of both discriminability and distribution.

Discriminability requires that the importance weights acquired from different distance values should be as discriminable as the distances themselves. Because the importance gap between adjacent distance values is determined by the distribution, here we only constrain the extremes of the weights. In other words, the importance weights at the minimum tree distances of 0 (lower bound) and M (upper bound) should be far apart, leaving space for differences in intermediate values. Hence, for weights scaled to [0, 1], the maximum and minimum weights are best fixed at 1 and 0.

Distribution means that the weight distribution should be feasible and reasonable while ensuring discriminability. For example, given a sentence with a very large upper bound M of the minimum distances, since the syntax gradually blurs as the distance increases, the weight discrepancy between two adjacent distances with larger values should be smaller than that with smaller values.
In other words, the degree of discriminability of the importance of different distance intervals should not be the same. Thus, equidistant decreases that give little attention to important distance intervals are sub-optimal.
Even though this criterion specifies the above two requirements, following it to design distance-importance functions still introduces three challenges. Firstly, unlike the arithmetic sequence distribution of weights, which requires only one linear function, there is no natural function that fulfills the criterion. Secondly, due to insufficient prior knowledge, it is challenging to determine the key distance intervals requiring greater discriminability. Thirdly, because the importance calculation covers all input sentences, the computational complexity of the designed function should be acceptable. We explore how to address these challenges in the next section.

Distance-importance Function
As shown in Figure 2, given a sentence-aspect pair s−a, we utilize the Stanza parser developed by the Stanford NLP Group to perform syntactic analysis on s and generate its dependency tree T. The tree T is a special kind of graph whose initial topology encodes different types of directed dependencies across token nodes. In particular, we regard dependencies as undirected edges, so any two tokens in a sentence are reachable and the minimum tree distance is the number of edges on the shortest path. Furthermore, the dependency tree T is converted into a settled induced syntactic graph G_d, which is fully connected and can be represented as a symmetric adjacency matrix A_d ∈ R^{n×n}, whose entry A_d[i, j] represents the distance value between the i-th and j-th tokens. Because edge weights are usually inversely proportional to their corresponding distance values, we need a distance-importance calculation function to transform A_d.
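The conversion from a parser's output to the matrix of minimum tree distances can be sketched with breadth-first search; the `head` array format (parent index per token, −1 for the root) is an assumption about the parser interface, and the example parse is illustrative rather than actual Stanza output:

```python
# Minimum tree distances over an undirected dependency tree via BFS from
# every token; head[i] is the parent index of token i (-1 for the root).
from collections import deque

def min_tree_distances(head):
    n = len(head)
    adj = [[] for _ in range(n)]
    for i, h in enumerate(head):
        if h >= 0:                      # treat dependencies as undirected edges
            adj[i].append(h)
            adj[h].append(i)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen, queue = {src}, deque([(src, 0)])
        while queue:
            node, d = queue.popleft()
            dist[src][node] = d
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return dist

# "Great food but the service was dreadful": a plausible toy parse.
head = [1, -1, 6, 4, 6, 6, 1]
D = min_tree_distances(head)
```

Each BFS costs O(n) on a tree, so the full symmetric matrix costs O(n^2) per sentence.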
Following the criterion set forth in Section 3.1, we here design the function IMP_D(·), addressing the three challenges step by step. Specifically, inspired by previous linear functions, a straightforward solution to meet both requirements is to append a constant function to a linear one, which can be expressed as:

IMP_D-0(d) = 1 − k·d if d < b, and 0 otherwise,   (2)

where d represents the minimum distance value to be calculated, b represents the predefined distance boundary, and k represents the slope of the linear function. The function approximately satisfies the criterion, and its importance weight distribution is shown in Figure 3. However, such a way of placing different functions in different distance intervals and splicing them is too imprecise, leading to overly steep weight changes near the boundary b. It is unreasonable that an edge with a distance value of (b − 1) has importance while the next edge with a value of b suddenly has no effect. In addition, it is hard to determine the optimal b for deciding the key interval.
In order to further enhance IMP_D(·), here we redesign it around the two requirements of the criterion and replace the above strategy of concatenating functions by intervals. In particular, we design two sub-functions spanning the entire distance interval [1, M] to satisfy discriminability and distribution. To increase discriminability, a power-based sub-function is proposed to maximize the gap between the edge weights of the minimum and maximum distances, which can be expressed as follows:

IMP_D-1(d) = ((M − d) / (M − 1))^η,   (3)

where M is the upper bound of the minimum distances and η is the power exponent. To avoid arithmetic and non-smooth distributions, we introduce another exponential-based sub-function to disproportionately reduce the importance of induced dependencies, which can be defined as:

IMP_D-2(d) = e^{−α(d−1)},   (4)

where e denotes the natural constant (a.k.a. Euler's number), and α is the exponential curvature. Then, we combine the two sub-functions by a multiplication operation:

IMP_D(d) = IMP_D-1(d) · IMP_D-2(d),   (5)

The functions corresponding to Equations (2)-(5) are shown in Figure 3. Since the function IMP_D-2(·) (in blue), which controls the distribution, has a maximum weight of 1 while its minimum weight (at d = M) is not necessarily very small, IMP_D-1(·) (in orange), which guides the range of the weights, simply scales the minimum weight to 0.
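A minimal numerical sketch of such a pair of sub-functions is given below; the concrete closed forms (a power term pinning the weights at d = 1 and d = M to exactly 1 and 0, times an exponential term with curvature alpha) are assumptions chosen to match the behaviour described above, not necessarily the paper's exact equations:

```python
# Illustrative distance-importance function: power-based discriminability
# times exponential-based smooth decay. Closed forms here are assumptions.
import math

def imp_d(d, M=10, eta=0.2, alpha=0.3):
    if d >= M:
        return 0.0
    power = ((M - d) / (M - 1)) ** eta   # discriminability: 1 at d = 1, 0 at d = M
    expo = math.exp(-alpha * (d - 1))    # distribution: smooth, non-equidistant decay
    return power * expo                  # combined weight in [0, 1]

weights = [imp_d(d) for d in range(1, 11)]
```

The resulting sequence decreases monotonically but with shrinking gaps, unlike an equidistant (arithmetic) schedule.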
In addition, the weight distribution of IMP_D-1(·) is close to 1 over a broad distance interval, protecting the concave arc distribution of IMP_D-2(·). This is why the non-equidistant distributions of IMP_D(·) (in green) and IMP_D-2(·) nearly coincide. Compared with the strategy (in purple) of concatenating functions by intervals, the new IMP_D(·) promotes a smoother distribution while satisfying the criterion, preserving the possibility of exploiting induced dependencies with large distances. As shown in Figure 3, different curvatures contribute to different weight distributions and key distance intervals of IMP_D(·) (red & green). Therefore, the selection of the optimal curvature α is crucial, as it directly affects the performance of GNN-based ABSA. From an application point of view, due to insufficient prior knowledge, it is laborious and inefficient to find the optimal α by manual tuning, especially when the candidate set is large. From an implementation perspective, since the curvature α does not directly participate in model training, it is infeasible to optimize α using backpropagation. Hence, we use reinforcement learning (RL) [16,17,40,41] to search for optimal curvatures for different tasks. Concretely, we define the problem of finding optimal curvatures as a Two-Armed Bandit [29] {{a+, a−}, REW(·), TER(·)}, where a+ and a− denote two actions, REW(·) is the reward function, and TER(·) is the termination function:
• Action: The action space represents how the RL-based module updates the curvature α according to the reward. Here, we designate a+ and a− as increasing and decreasing the current α by a fixed value δ according to the polarity of the reward.
• Reward: Since we aim to improve ABSA, the gap in validation accuracy between adjacent time intervals is considered the reward indicator. The reward function can be expressed as follows:

REW(t) = +1 if ACC_t(V) > ACC_{t−1}(V), and −1 otherwise,

where t represents the index of the time interval containing the predefined number of batches, which also implies the update frequency of α, V represents the validation set, and ACC(·) is the function applied to acquire the accuracy of sentiment classification. Since the key distance interval gradually shrinks as α increases, we perform action a+ when the reward is +1 (or a− on the contrary), thereby gradually condensing the syntactic information.
• Termination: α will be updated continuously until the most recent rewards satisfy

REW(t−K+1) + REW(t−K+2) + ··· + REW(t) ≤ 0,

where K is the number of historical rewards considered. The inequality suggests that the reward has converged, and α then remains constant. Therefore, the discriminative degree of the critical distance intervals is determined dynamically, making RDGCN highly portable across different tasks. The computational complexity mainly comes from the functions themselves and RL. The functions cost O(1 + log(M)), and the RL module costs O(T + 1), where T is the number of time intervals. Compared with the linear functions, the complexity of the presented IMP_D(·) changes from a constant level to a linear level, which is still relatively acceptable.
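The Action/Reward/Termination scheme above can be sketched as a simple bandit loop; the termination form (the last K rewards no longer summing to a positive value) is an assumption, and the toy unimodal objective stands in for validation accuracy:

```python
# Two-armed bandit search for the curvature alpha. val_acc(alpha) mocks the
# validation accuracy observed after one time interval; in RDGCN it would
# come from ACC(.) on held-out batches.
def search_curvature(val_acc, alpha=0.1, step=0.1, lo=0.1, hi=2.0, K=10, max_t=200):
    rewards = []
    prev_acc = val_acc(alpha)
    reward = 1                                # initial rewards are taken as +1
    for _ in range(max_t):
        # Action: a+ raises alpha by a fixed step on reward +1, a- lowers it.
        alpha = min(hi, alpha + step) if reward == 1 else max(lo, alpha - step)
        acc = val_acc(alpha)
        reward = 1 if acc > prev_acc else -1  # Reward: polarity of the accuracy gap
        prev_acc = acc
        rewards.append(reward)
        # Termination (assumed form): the last K rewards sum to <= 0,
        # i.e. improvements have stopped; alpha then stays fixed.
        if len(rewards) >= K and sum(rewards[-K:]) <= 0:
            break
    return alpha

# Toy stand-in objective peaking at alpha = 0.8; the search settles nearby.
best = search_curvature(lambda a: -(a - 0.8) ** 2)
```

Once the loop terminates, the fixed alpha keeps the induced edge weights stable for the remaining training epochs.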

Type-importance Function
Unlike distances, the importance of dependency types is less explicit in the absence of expert knowledge. Hence, we use a global attention mechanism to calculate the importance weights of type edges. More specifically, we first number the types and transform the tree T into another induced syntactic graph G_t, whose adjacency matrix is A_t ∈ R^{n×n}. Since token nodes usually do not have type-labeled self-loops, we then use custom "root" and "none" type numbers to fill the diagonal and meaningless induced edges of A_t, respectively. Finally, we initialize the type feature matrix H ∈ R^{c×d}, where c is the total number of dependency types. Thus the type-importance function IMP_T can be expressed as follows:

w = softmax( (H q)^⊤ ),

where q ∈ R^{d×1} is the transposed query vector, and w ∈ R^{1×c} is the weight vector normalized by softmax. In particular, to preserve the raw topology of the dependency tree T, we use its initial adjacency matrix A as a mask to remove the induced "none" type edges. In this way, the three syntactic dependency views mentioned in Section 1 are all introduced into RDGCN. Furthermore, we merge the induced graphs G_d and G_t into one graph G* with an addition operation ⊕, which can be abstracted into matrix form as:

A* = A_d ⊕ A_t,

where A* symbolizes the adjacency matrix of the induced syntactic graph G*. Each entry A*[i, j] encodes an induced edge weight and belongs to [0, 2].
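A minimal sketch of the type-importance computation follows, assuming illustrative shapes, random initializations, and a toy type-id matrix: each type gets an embedding, a global query vector scores all types via softmax, the scores are broadcast onto edges, and the tree topology masks the "none" fillers:

```python
# Global attention over dependency types, masked by the raw tree topology.
# Shapes, the rng seed, and the toy type-id matrix are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, c, dt = 4, 3, 8               # tokens, number of types (incl. "root"/"none"), type dim
H = rng.normal(size=(c, dt))     # type feature matrix
q = rng.normal(size=(dt,))       # global query vector

scores = H @ q                   # one attention score per type
w = np.exp(scores - scores.max())
w = w / w.sum()                  # softmax-normalized type weights, shape (c,)

# Type ids per token pair: 0 = "root" (diagonal), 2 = "none" (no tree edge).
T_ids = np.array([[0, 1, 2, 2],
                  [1, 0, 1, 2],
                  [2, 1, 0, 1],
                  [2, 2, 1, 0]])
mask = (T_ids != 2).astype(float)   # raw tree topology removes "none" edges
A_t = w[T_ids] * mask               # type-importance adjacency matrix
```

Adding A_t to a distance-weight matrix with entries in [0, 1] then yields merged edge weights in [0, 2].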

Feature Aggregation
In this part, we perform feature aggregation on the adjacency matrix A* ∈ R^{n×n} and the initial feature matrix E = E_0 ∈ R^{n×d} obtained by the sentence encoder to enhance the token representations. Since A* already carries the weights, we implement the aggregation function AGG(·) by convolutional aggregation, whose aggregation process at the l-th layer can be expressed as:

E_l = σ( A* E_{l−1} W_l ),

where W_l represents the feature transformation matrix of the l-th layer. After iterating L times, we obtain the final feature representation matrix E_L ∈ R^{n×d} of the sentence s.
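Assuming ReLU for σ(·) and random toy inputs, the two-layer convolutional aggregation can be sketched as below (without the degree normalization some GCN variants add):

```python
# Dense convolutional aggregation over the merged adjacency A*.
import numpy as np

def gcn_layer(A_star, E, W):
    # One layer: E_l = ReLU(A* @ E_{l-1} @ W_l)
    return np.maximum(A_star @ E @ W, 0.0)

rng = np.random.default_rng(1)
n, h = 5, 6
A_star = rng.uniform(0, 2, size=(n, n))      # induced edge weights in [0, 2]
E0 = rng.normal(size=(n, h))
W1, W2 = rng.normal(size=(h, h)), rng.normal(size=(h, h))
EL = gcn_layer(A_star, gcn_layer(A_star, E0, W1), W2)   # L = 2 layers
```

Each layer mixes every token's features with those of its weighted neighbors, so two layers propagate information along paths of length up to two in the induced graph.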

Pooling and Classification
Since the aspect a = <a_1, a_2, ..., a_m> is a sub-sequence of s, we first filter out the non-aspect features of E_L to obtain the aspect-specific feature matrix F ∈ R^{m×d}, and then perform mean pooling on F to obtain the aspect-specific vector r:

r = MeanPooling(F).

Then, we feed r into a classifier consisting of a linear function and a softmax to yield a probability distribution over the polarity decision. Finally, we optimize the parameters based on the cross-entropy loss:

L = − Σ_{s−a ∈ {s−a}_T} (y*)^⊤ log softmax( Z r* + b ),

where {s−a}_T is the set containing all training s−a pair samples, r* is the d-dimensional vector of each training aspect, and Z ∈ R^{C×d} and b ∈ R^{1×C} indicate the trainable parameters and bias of the classifier. y* ∈ R^{C×1} is the transposed polarity label vector corresponding to r*, and C denotes the total number of polarity categories.
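The pooling and classification steps can be sketched for a single sample as follows; the dimensions, random parameters, and one-hot label are illustrative:

```python
# Mean pooling over the aspect rows of E_L, then a linear + softmax
# classifier and the cross-entropy loss for one sample.
import numpy as np

rng = np.random.default_rng(2)
n, h, C = 7, 6, 3                     # tokens, hidden dim, polarity classes
E_L = rng.normal(size=(n, h))
aspect_idx = [4]                      # positions of the aspect tokens in s

r = E_L[aspect_idx].mean(axis=0)      # aspect-specific vector (mean pooling)
Z = rng.normal(size=(C, h))
b = rng.normal(size=(C,))
logits = Z @ r + b
p = np.exp(logits - logits.max())
p = p / p.sum()                       # softmax probability distribution

y = np.array([0.0, 1.0, 0.0])         # one-hot true polarity label
loss = -float((y * np.log(p)).sum())  # cross-entropy for this sample
```

Summing this per-sample loss over all training s−a pairs gives the training objective.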

EXPERIMENTS
In this section, we present the experimental settings consisting of datasets and evaluation, baselines, and implementation details. We then perform classification tasks, a case study, an ablation study, etc., to address three research questions (RQs):
• RQ1: How does RDGCN perform on the ABSA datasets compared to state-of-the-art (SOTA) baselines?
• RQ2: How much do the syntactic importance functions included in RDGCN improve performance?
• RQ3: How much does changing important hyperparameters of RDGCN affect ABSA?

Datasets and Evaluation
Following previous ABSA works, we evaluate the proposed RDGCN on three popular fine-grained datasets, namely Restaurant, Laptop, and Twitter. Among them, Restaurant and Laptop are from SemEval-2014 Task 4 [20] and comprise sentiment reviews from the restaurant and laptop domains, respectively. Twitter is collected and processed by [6] from tweets. Following most studies like [3,4,12], we remove the samples with conflicting polarities or with "NULL" aspects from all datasets, where each aspect is annotated with one of three polarities: positive, negative, and neutral. To measure the effectiveness of all methods, we utilize two metrics, accuracy (Acc.) and macro-F1 (F1), to expose their classification performance.
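The two metrics can be computed as below; macro-F1 averages the per-class F1 scores, so minority polarities (often "neutral") weigh equally:

```python
# Accuracy and macro-F1 for three-way polarity classification.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, classes=("positive", "negative", "neutral")):
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
```

Note how the single misclassified "neutral" sample lowers macro-F1 far more than accuracy, which is why both metrics are reported.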

Baselines
To comprehensively evaluate the performance of RDGCN, we compare it with SOTA baselines, which are briefly described as follows:
1) ATAE-LSTM [31] is an attention-based LSTM model that focuses on aspect-specific key parts of sentences.
2) IAN [15] interactively calculates attention scores for aspects and contexts, yielding aspect and context representations, respectively.
3) RAM [3] leverages a recurrent attention mechanism on sentence memory to extract aspect-specific importance information.
4) MGAN [7] applies a fine-grained attention mechanism to capture token-level interactions between aspects and contexts.
5) TNet [13] transforms token representations from a BiLSTM into target-specific representations and then uses a CNN layer instead of attention to generate salient features for sentiment classification.
6) PWCN [35] calculates the proximity weights of context words for the aspect according to the minimum distances over the dependency tree, and applies these weights to enhance the output of BiLSTM to obtain aspect-specific syntax-aware representations.
7) ASGCN [34] applies GCN on the raw topology of the dependency tree to introduce syntactic information.
8) TD-GAT [9] leverages a graph attention network (GAT) [28] to capture syntactic dependency structures.
9) BiGCN [36] performs convolutions over hierarchical lexical and syntactic graphs to integrate token co-occurrence information and dependency type information.
10) kumaGCN [1] associates dependency trees with aspect-specific induced graphs, and applies gating mechanisms to obtain syntactic features with latent semantic information.
11) DGEDT [24] jointly considers the representations learned from a Transformer and graph-based representations learned from the corresponding dependency graph in an iterative interactive manner.
12) R-GAT [30] transforms the dependency tree into a star-induced graph with edges consisting of minimum distances and dependency types, and introduces a relational GAT for aggregation via attention.
13) T-GCN [26] distinguishes relation types via attention, and uses an attentive layer ensemble to learn features from multiple GCN layers.
14) DualGCN [12] simultaneously introduces syntactic information and semantic information through SynGCN and SemGCN modules.
15) SSEGCN [39] acquires semantic information through attention, and equips it with the syntactic information of minimum tree distances.
16) BERT & Model+BERT [10] represent the pre-trained language model (PLM) BERT and each model with BERT as its sentence encoder.

Implementation Details
For all experiments, we employ pre-trained 300-dimensional GloVe vectors [18] to initialize token embeddings. Following the common settings of previous studies such as [12,39], we vectorize the part-of-speech (POS) information of tokens and their relative positions with respect to the boundary tokens of the aspect. Then, we concatenate the 30-dimensional POS and position vectors with the GloVe vectors, and input them into a BiLSTM model to get the initial token feature representations. In addition, we set the hidden dimension of BiLSTM and GCN to d = 50 and the number of model layers to L = 2. To ensure the optimization space, we apply dropout of 0.7 and 0.1 to the input of BiLSTM and the output of the framework (BiLSTM & GCN), respectively. We optimize RDGCN using the Adam optimizer with a learning rate of 0.002, where the total number of training epochs is 20 and the batch size is 32. For the introduction of syntax, we utilize an off-the-shelf Stanza parser to get syntactic dependency trees, and compute the minimum tree distances among tokens (including inner tokens of the aspect term). The upper bound of the distance values is set to M = 10. Since the datasets do not contain validation sets, the test accuracy is used to implement ACC(·) in the reward function. In addition, we set the predefined range of the curvature to α ∈ [0.1, 2], the update value δ of the RL actions to 0.1, the update frequency to 2 (i.e., each time interval contains 2 batches), and the history size to K = 10; α is initialized to 0.1 because the initial rewards are all +1 as performance increases. For Model+BERT, we utilize the bert-base-uncased English version.

Table 2: Case study results. Different aspects within the same sentence are colored differently. P, N, and O denote positive, negative, and neutral, respectively. The Label column, which displays the true labels, is merged with the RDGCN column when they are identical.

Classification Results
To answer RQ1, we compare our RDGCN with all baselines on three ABSA datasets, and the classification results are depicted in Table 1.
The classification results justify that the proposed RDGCN (+BERT) exhibits overall better sentiment classification performance than the SOTA baselines. However, RDGCN lags behind DualGCN and SSEGCN on the Restaurant dataset. A possible explanation is that these two baselines additionally capture the semantic information of sentences based on self-attention, which refines the aspect representations. Because the features encoded by BERT already contain rich semantics, RDGCN+BERT remedies this deficiency on the accuracy metric. Moreover, we observe that the performance of RNN+attention-based baselines is generally weaker than that of RNN+GNN-based baselines. This is because GNN enhances aspect representations by learning syntactic dependency trees or induced trees, which shows that capturing the structural patterns of syntactic parsing can indeed improve analysis performance. Among the GNN-based baselines, models (such as R-GAT and SSEGCN) that consider multiple syntactic views outperform models, such as DGEDT and DualGCN, that only consider the raw topology of dependency trees, particularly on the Laptop and Twitter datasets. This implies that syntax in different views may be complementary, and effectively and comprehensively mining syntactic information can further improve GNN-based ABSA. Furthermore, the existing SOTA baselines that incorporate distance syntax either intuitively reduce the distance weights equidistantly (PWCN) or obscure the original explicit weights with attention and masks (R-GAT and SSEGCN), and both are inferior to RDGCN. This shows that the proposed criterion together with the distance-importance function is necessary and effective. Last but not least, the powerful BERT outperforms most baselines, and RDGCN+BERT achieves the biggest improvement over BERT compared to the other algorithms, justifying that RDGCN acquires more valuable syntactic knowledge for ABSA. In general, RDGCN (+BERT) performs best on ABSA tasks compared to the SOTA baselines.

Case Study
To better showcase the superiority of RDGCN, we conduct case studies on some example sentences, as depicted in Table 2. The first sentence, "Great food but the service was dreadful", contains two aspects ("food" and "service") with opposite sentiment polarities. The aspect "quality" in the second sentence does not have any obvious opinion token. The interfering token "Biggest" in the third sentence, "Biggest complaint is Windows 8", may neutralize the negativity of the opinion token "complaint". The fourth sample exhibits all three difficulties at the same time. On the one hand, we argue that the estimations of the attention-based R-GAT are susceptible to opposite or interfering opinion tokens in the first, third, and fourth sentences. On the other hand, SSEGCN fails to deal with the second sentence, which lacks explicit opinion tokens. A possible explanation is that the dependency type syntax is more suitable for such cases than the tree distance syntax, and the former is not available in SSEGCN. Predictions consistent with the true labels verify that RDGCN captures more complementary syntactic information than the SOTA baselines.

Ablation Study
To answer RQ2, we conduct ablation studies to examine the effect of the syntactic importance functions on model performance. As depicted in Figure 4(a) and Figure 4(b), the performance of RDGCN decreases regardless of whether the distance-importance function or the type-importance function is removed, compared with the full RDGCN in Table 1, again showing that IMP_D(·) takes full advantage of the explicit distance syntax and has good portability across datasets. Overall, the importance calculation functions contained in RDGCN for the minimum distances and dependency types are efficacious in improving performance, particularly the former, which dynamically searches for the exponential curvature based on reinforcement learning (RL) and generates weight distributions that vary with the interval.

RL Process Analysis
In this part, we focus on the RL-based module to further examine the distance-importance function IMP_D(·). To present the RL process, we plot the updating of the curvature α, which controls the concave arc distribution of the distance edge weights, in Figure 5(a). As the time index t increases, we observe that the curvatures corresponding to different datasets are updated towards disparate destinations and remain unchanged after the termination condition is met. Although α does not directly participate in training, it influences the edge weights of the induced graphs, which are important factors affecting performance. Therefore, we keep the history reward number K small to speed up the end of RL. As per Figure 5(b), it is clear that after α is fixed, the performance of RDGCN continues to improve. In this way, the RL module can quickly reach the best α, making performance improve faster while ensuring a long and stable training process.
In other words, because training is less stable during the update process, RL chooses the curvature during the early period of faster performance improvement and then instructs RDGCN with a stable curvature in the subsequent process. To evaluate the search results, we quantify the influence of the curvature on model performance in Figure 6(b). We find that the RDGCN based on the distance-importance function outperforms the control model on all datasets, which implies that our smooth-descent strategy is more feasible and reasonable than steep descent. Besides, we compare the distance-importance function with attention-based and equidistant strategies on the sentence-aspect pair "However I can refute that OSX is FAST"-"OSX". Figure 7 illustrates the importance weights of the context tokens for the aspect term "OSX" under the different strategies. First, even though attention allocates a greater importance weight to the opinion token "FAST", it also attends to noise tokens such as "However" and "refute" that may interfere with classification. Second, both weight-descent strategies allocate greater importance to the key opinion token "FAST" while eliminating the interference of "However". Third, compared with the equidistant descent method, the non-equidistant descent of the distance-importance function further excludes all contexts except "FAST", which is beneficial for improving ABSA. Thus, it makes sense to design a function for the importance calculation of explicit distance syntax via the proposed criterion.
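The contrast between equidistant and non-equidistant descent can be sketched with two hypothetical weighting functions over minimum tree distances. The exact form of the paper's importance function is not reproduced in this section, so the exponential shape, the curvature value, and the boundary cutoff below are assumptions chosen only to illustrate why a concave descent concentrates weight on syntactically close tokens.

```python
import math

def curvature_importance(d, boundary=10, curvature=0.5):
    """Hypothetical non-equidistant (concave) descent: the weight decays
    exponentially with the minimum dependency-tree distance d, so tokens
    syntactically close to the aspect dominate; distances beyond the
    boundary receive zero weight."""
    if d > boundary:
        return 0.0
    return math.exp(-curvature * d)

def equidistant_importance(d, boundary=10):
    """Contrast strategy: the weight drops by an equal step per unit of
    tree distance until it reaches zero at the boundary."""
    return max(0.0, (boundary - d) / boundary)
```

Under the exponential curve, the weight ratio between distance 1 and distance 5 is much larger than under the linear curve, which matches the observation that non-equidistant descent suppresses distant context tokens more aggressively.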

Hyperparameter Analysis
To answer RQ3, we examine the influence of four hyperparameters of RDGCN (the input dropout rate, the maximum tree-based distance, the hidden dimension, and the number of GCN layers) on the ABSA performance on the Laptop dataset, as depicted in Figure 8.
For the dropout, performance first increases and then decreases, because a too-small or too-large dropout value leads to overfitting or insufficient input features, respectively. For the maximum distance, a lower upper bound on the distances may miss part of the syntax useful for ABSA. In addition, because the proposed criterion constrains the weights of induced edges with larger distances, performance does not slide significantly as the maximum distance increases. For the hidden dimension, feature vectors and matrices with too few dimensions struggle to encode sufficient feature information, resulting in sub-optimal performance. For the number of GCN layers, too many aggregation layers may cause feature over-smoothing, a common issue in GNN-based ABSA models. Based on the above observations, we argue that the influence of these hyperparameters within a regular range on the aspect sentiment prediction performance of RDGCN is acceptable, which once again verifies that the proposed RDGCN has excellent stability.

CONCLUSION
This paper develops RDGCN to improve the importance calculation of dependency types and tree-based distances for ABSA. Extensive experiments justify the effectiveness of RDGCN.

Figure 1 :
Figure 1: An example sentence and its dependency tree, with aspect terms shown in bold red. "ROOT" is a virtual word, and the symbols below each real word represent parts of speech.

Figure 3 :
Figure 3: Weight distributions for distance-importance functions with different slopes or curvatures. The boundary value of the minimum tree distances is 10.

Figure 4 :
Figure 4: Visualization of the ablation study results, where "w/o" means without and "+ Dis." means the introduction of the distance syntax. The left two figures depict the removal of the importance functions from RDGCN, while the right two show the introduction of the distance-importance function to the baselines, where the green bars indicate the improved performance.

Figure 5 :
Figure 5: The process of the RL module, where hollow points indicate that RL stops searching at the current time index.

Figure 6 :
Figure 6: The effect of curvature values on performance. The left figure depicts the performance of RDGCN with different curvatures, while the right shows the performance difference between RDGCN and an Eq. 2-based control.

Figure 8 :
Figure 8: Visualization of the performance impact of different key hyperparameters on the Laptop dataset.

As shown in Figure 6(b), on the one hand, different curvature values yield different performance on every dataset. This is expected, since different curvatures determine different key distance intervals. On the other hand, the curvature values searched for the different tasks all facilitate RDGCN in yielding excellent performance, indicating the effectiveness of the RL module. Further, we swap the distance-importance function with the importance function via interval concatenation adopted in Equation 2 to construct the control model. The box plots in Figure 6(b) depict the average results yielded by RDGCN and its control baseline over all optional curvatures (in [0.1, 2]) or slopes (in [1, 10]).

Table 1 :
Classification results (%). The best results for all models are in bold, while the second-best results are in italics.