Context-aware Event Forecasting via Graph Disentanglement

Event forecasting has long been a demanding and challenging task. It plays a pivotal role in crisis warning and disaster prevention across many aspects of society. The task aims to model the relational and temporal patterns in historical events and to forecast what will happen in the future. Most existing studies on event forecasting formulate it as a problem of link prediction on temporal event graphs. However, such a purely structured formulation suffers from two main limitations: 1) most events fall into general, high-level types in the event ontology, so they tend to be coarse-grained and offer little utility, which inevitably harms forecasting accuracy; and 2) events defined by a fixed ontology cannot retain out-of-ontology contextual information. To address these limitations, we propose a novel task of context-aware event forecasting, which incorporates auxiliary contextual information. First, the categorical context provides supplementary fine-grained information for the coarse-grained events. Second and more importantly, the context provides additional information about the specific situation and conditions, which is crucial or even decisive for what will happen next. However, it is challenging to properly integrate context into the event forecasting framework, considering the complex patterns in the multi-context scenario. To this end, we design a novel framework named Separation and Collaboration Graph Disentanglement (SeCoGD for short) for context-aware event forecasting. Since there is no available dataset for this novel task, we construct three large-scale datasets based on GDELT. Experimental results demonstrate that our model outperforms a list of SOTA methods.


INTRODUCTION
Event forecasting [51] is a long-standing and challenging task that includes forecasting pandemic outbreaks [11], civil unrest [34], international conflicts [28], etc. Accurately predicting such vital events enables people to prepare in advance, to prevent catastrophic results or minimize potential impact. Automatic event forecasting aims to model the rich relational and temporal patterns in historically observed events, and thereby to accurately forecast future events. The development of data science and artificial intelligence gives humans a stronger capability for automatic event forecasting, which has garnered increasing attention in recent years.
One of the prominent formulations for event forecasting defines an event as a quadruple $(s, r, o, t)$, where $s$, $r$, $o$, and $t$ refer to the subject, relation (event type), object, and timestamp, respectively. At each timestamp, all the quadruples form an event graph. Given a future query $(s, r, t+1)$ and the list of historical event graphs up to $t$, we aim to predict the missing object. Based on this structured formulation, a plethora of works have emerged in recent years. They have applied structured relational and temporal information (e.g., RE-NET [22], RE-GCN [30]), time intervals (e.g., EvoKG [36]), and texts from ontologies and news articles (e.g., Glean [9] and CMF [10]) to event forecasting.
Despite the remarkable achievements of current works [5,9,10,22,30], they still suffer from the following limitations. First, existing structured events tend to be classified as high-level general events, while more specific and informative events are few. As shown in Figure 1(a), for the well-known GDELT [28] dataset, although the hierarchical event type ontology [4] defines a large number of fine-grained event types, most actual events are classified into event types in the higher levels of the event ontology, and fine-grained events are fewer. Consequently, the expressiveness of events is severely restricted, resulting in less utility in practical scenarios. Second, events follow the types in the predefined ontology, which is usually fixed due to the difficulty of construction. For example, in political event forecasting, the well-known CAMEO [4] ontology took ten years to finalize. It is difficult to update the ontology in a timely manner, so newly emerging out-of-ontology information cannot be covered by the outdated ontology (footnote 2). Worse still, events are greatly influenced by out-of-ontology contextual information, such as the situation and circumstances. As the example in Figure 1(b) shows, given diverse contexts, the entity President Y performs distinct roles and actions w.r.t. various countries. Such diverse contexts, which provide clues about specific situations, are crucial or even decisive for event forecasting, and they cannot be adequately modeled solely based on the event ontology.

Figure 1: (a) Most current events fall in the coarse-grained, higher-level types of the ontology, while more informative fine-grained events are fewer. (b) Out-of-ontology and diverse contexts affect events. Context can provide more fine-grained information to enhance the event forecasting performance.
To address these limitations, we introduce context into the existing event representation as supplementary information and define a novel task named context-aware event forecasting. We associate each event with a categorical context that describes the situation or condition in which the event occurs. Each event is thus extended from a quadruple to a quintuple $(s, r, o, t, c)$, where $c$ denotes the context. The incorporation of context brings multiple benefits to event forecasting. First, it adds more fine-grained information to each event, making the coarse-grained events more specific and expressive. Second, the flexibly defined context offers crucial information about the circumstances or background of the events, narrowing down the potential forecasting space. As the example in Figure 1(b) shows, given the different contexts of Olympics 2016 or G20 2022 Summit, the target countries that President Y would visit will be different. Despite these merits, integrating context into the problem of event forecasting poses new challenges to existing methods. First, one entity under different contexts may trigger distinctive events, and capturing the relational and temporal patterns given a certain context is not trivial. Second, events from different contexts are also correlated with each other, and carefully modeling the collaborative associations among contexts is vital to accurate event forecasting.

Footnote 2: Both out-of-ontology and outdated ontology refer to the same problem in this work.
To tackle the above challenges, we borrow the idea of disentangled graph representation learning and propose a general framework, SeCoGD (Separation and Collaboration Graph Disentanglement), for context-aware event forecasting. It consists of two stages: separation and collaboration. First, in the separation stage, we utilize the context as a prior guidance to separate the event graphs into multiple sub-graphs. Then, we resort to established relational-temporal models, such as RE-GCN [30], to capture the context-specific patterns within each sub-graph. Second, in the collaboration stage, we construct hypergraphs among the disentangled embeddings and leverage a GNN to learn the collaborative associations among contexts. Unlike current graph disentanglement methods, which focus only on how to separate the graph, our framework considers both separation and collaboration, because the context already provides prior guidance for the separation. Moreover, our method is a general framework whose key modules can be replaced by alternative designs. Since there is no available dataset with context information, we build three new datasets based on GDELT. Extensive experiments demonstrate that our proposed framework outperforms various SOTA methods. The main contributions of our work are summarized as follows:
• We propose to introduce categorical contextual information into the structured event forecasting problem.
• To tackle the new task, we build a novel framework, SeCoGD, whose two-stage design of separation and collaboration is effective in capturing the complex patterns in the multi-context scenario.
• We build three datasets based on GDELT to facilitate current and future studies of context-aware event forecasting. Our method significantly outperforms SOTA methods on the three datasets.

PRELIMINARY
We first give a formal definition for the task of context-aware event forecasting. Then, we introduce the newly constructed datasets.

Problem Formulation
We first present the problem formulation of conventional event forecasting, which does not consider the contextual information. Then we present how to introduce the context and formulate the new task of context-aware event forecasting.
Conventional Event Forecasting. We define an event as a quadruple $(s, r, o, t)$, where $s \in \mathcal{E}$, $r \in \mathcal{R}$, and $o \in \mathcal{E}$ correspond to the subject entity, relation, and object entity, respectively; $t$ is the timestamp when this event happens; $\mathcal{E}$ and $\mathcal{R}$ are the entity and relation sets, respectively. All the quadruples at the same timestamp $t$ form an event graph, denoted as $G_t = \{(s_i, r_i, o_i, t)\}_{i=1}^{N_t}$, where $(s_i, r_i, o_i, t)$ is the $i$-th event and $N_t$ is the number of events at timestamp $t$. Given the historical event graphs at and before time $t$, denoted as $G_{\le t} = \{G_1, G_2, \cdots, G_t\}$, and a query, denoted as $(s, r, t+1)$, we aim to predict the object $o$.
Context-aware Event Forecasting. We define a context $c \in \mathcal{C}$ as a categorical value denoting a certain situation or condition shared by a group of events, where $\mathcal{C} = \{c_1, c_2, \cdots, c_K\}$ is the set of contexts and $K$ is the number of contexts. In practice, for historical events, the context can be obtained from human annotation, crowd-sourced tags, or automatic information extraction systems. We assign a context $c$ to each event, thus extending its quadruple representation into a quintuple representation, denoted as $(s, r, o, t, c)$. Correspondingly, the event graph at each timestamp $t$ is extended to $G_t = \{(s_i, r_i, o_i, t, c_i)\}_{i=1}^{N_t}$, where $c_i \in \mathcal{C}$ denotes the context assigned to the $i$-th event. Given the historical event graphs $G_{\le t}$, a query $(s, r, t+1)$, and a specified context $c$ in which the query event is supposed to occur, we aim to predict the object $o$. Please note that specifying the categorical context during inference does not leak information about the predicted object. For example, given the context of Covid-19, the answer to the query "which country will President Y cooperate with" is not revealed to the model. We also assume that it is not difficult for a user to provide such contextual information for the event they want to predict.

Dataset Construction
Existing datasets for event forecasting are different cropped versions of GDELT [28] and ICEWS [34].For example, among the datasets used by current works [15,16,22,29,30], ICEWS14, ICEWS18, ICEWS05-15 include events in the ICEWS dataset of year 2014, 2018, and 2005-2015, respectively; and GDELT covers January 2018 of the original GDELT dataset.However, all of these versions only use the existing quadruple data while overlooking the context information.
To facilitate the study of context-aware event forecasting, we build three benchmark datasets based on the GDELT dataset [28], which provides the original news article URLs of the extracted events.Following previous works [9,10], we crop three subsets of GDELT according to the regions of the events, i.e., Egypt (EG), Iran (IR), and Israel (IS), spanning from February 2015 to March 2022.According to a previous systematical study [47], the structured events extracted by GDELT have high recall while low precision, which means there are many false positive events.Such noise could be caused by the event extraction system used in GDELT or the low quality of original articles.Since the GDELT event extraction system is unavailable to the public, we aim to remove low-quality articles to eliminate these noises by the following data preprocessing steps.
First, we keep the event with a valid URL.Second, we sort the domain names of the URLs, which correspond to different news agencies.In total, there are around 20K domain names, and the top 69 cover 40% of the events.After checking these top domain names, we confirm that their news articles are of higher quality and reliability.Therefore, we remove the remaining 60% of events that are published in long-tailed domain names, which are usually from less influential agencies or personal blogs and are likely to be of low quality or even fake.Third, even though the interval of two consecutive timestamps in the original GDELT data is 15 minutes, it is unnecessary to have such precise timestamps for political events.Following ICEWS, we take the one-day time interval and collapse the 15 minutes-level timestamps of events on the same day to the day-level timestamp.Finally, we obtain the datasets and split them into training/validation/testing with a ratio of 8/1/1 over the  1.
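The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the record layout (`s`, `r`, `o`, `time`, `url`), the `coverage` parameter, the de-duplication after collapsing timestamps, and the day-ordered 8/1/1 split are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse


def preprocess(events, coverage=0.4):
    """events: list of dicts with hypothetical keys 's', 'r', 'o', 'time', 'url'."""
    # Step 1: keep only events whose URL has a parseable domain.
    valid = [e for e in events if e.get("url") and urlparse(e["url"]).netloc]
    # Step 2: rank domains by event count; keep the head covering `coverage` of events.
    counts = Counter(urlparse(e["url"]).netloc for e in valid)
    kept, covered = set(), 0
    for domain, n in counts.most_common():
        if covered >= coverage * len(valid):
            break
        kept.add(domain)
        covered += n
    valid = [e for e in valid if urlparse(e["url"]).netloc in kept]
    # Step 3: collapse 15-minute timestamps to day level (duplicates dropped: an assumption).
    quads = {(e["s"], e["r"], e["o"], e["time"].date()) for e in valid}
    return sorted(quads, key=lambda q: q[3])


def split_by_day(quads):
    """Assumed chronological 8/1/1 split over the distinct days."""
    days = sorted({q[3] for q in quads})
    a, b = len(days) * 8 // 10, len(days) * 9 // 10
    train_days, valid_days = set(days[:a]), set(days[a:b])
    train = [q for q in quads if q[3] in train_days]
    valid = [q for q in quads if q[3] in valid_days]
    test = [q for q in quads if q[3] not in train_days and q[3] not in valid_days]
    return train, valid, test
```

Splitting on days rather than on individual events keeps every event of a given day in the same partition, which avoids leaking same-day information from the test set into training.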
Since there is no context label in the original GDELT dataset, we leverage the textual content and topic model (i.e., LDA [1]) as a proxy to generate contexts.In particular, we use an LDA model to extract the topic distribution of each article, where the topic with the highest weight is treated as the context of an article.An event is assigned with its corresponding article's context.More studies about the topic models as well as alternative context generation approaches are presented in Section 4.4.To be noted, during inference, people can provide certain context for the query to be predicted.
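To make the context assignment concrete, the sketch below takes per-article topic weights (as an LDA model would produce) and gives each event the highest-weight topic of its source article. The function name and the dictionary layouts are hypothetical; fitting the topic model itself is out of scope here.

```python
def assign_contexts(article_topics, event_articles):
    """Assign each event the dominant topic of its source article.

    article_topics: dict article_id -> list of topic weights (hypothetical layout).
    event_articles: dict event_id -> article_id.
    """
    # The context of an article is the index of its highest-weight topic.
    article_context = {
        article: max(range(len(weights)), key=weights.__getitem__)
        for article, weights in article_topics.items()
    }
    # An event inherits the context of the article it was extracted from.
    return {event: article_context[article] for event, article in event_articles.items()}
```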

METHODOLOGY
To solve the new problem of context-aware event forecasting, we propose a novel framework Separation and Collaboration Graph Disentanglement (SeCoGD), as shown in Figure 2. It consists of two stages: the separation stage and the collaboration stage.

Separation
In the separation stage, we first use the context as a prior guidance to disentangle the event graph into multiple sub-graphs. Then we devise a context-specific modeling module to capture the relational and temporal patterns within each context.

Context-aware Graph Disentanglement.
Generally, events in the same context exhibit similar or correlated patterns, while events in different contexts demonstrate distinctive characteristics. Current works [10,22,30,36] connect all the quadruples at the same timestamp as a unified event graph and learn a single embedding for each entity and relation via GNN models. However, such unified entity and relation embeddings are highly entangled w.r.t. diverse contexts [32], failing to capture the context-specific patterns.
Inspired by recent progress in disentangled representation learning [32,46,48], we seek graph disentanglement for context-aware event forecasting. Most existing works rely solely on the inherent structural information for graph disentanglement. For example, MacridVAE [33] and DGCF [45] utilize user-item interactions to learn disentangled representations for different intents; DisenKGAT [48] tackles heterogeneous knowledge graphs and disentangles the entity embedding with respect to different topics and clusters. Nonetheless, these methods are incapable of disentangling event graphs, since the events are too coarse-grained for pure structural information to disentangle the graph well.
We employ the context as a prior guidance to disentangle the event graph. Formally, given $K$ contexts, we separate the original entangled event graph $G_t$ into $K$ sub-graphs $\{G_t^{c_1}, G_t^{c_2}, \cdots, G_t^{c_K}\}$, where $G_t^{c_k} = \{(s_i, r_i, o_i, t, c_k)\}_{i=1}^{N_t^{c_k}}$ and $N_t^{c_k}$ is the number of events at timestamp $t$ within context $c_k$. Note that we make use of external prior knowledge, i.e., the context, to disentangle the original graph. In contrast, previous works exploit end-to-end solutions, which either adopt an attention mechanism [32] or incorporate an auxiliary distance regularizer [45] to directly learn disentangled representations relying solely on the graph data. This prior-guided disentanglement is better than the end-to-end solutions at separating the graph because it incorporates external knowledge. Based on the disentangled event graph, the core of the model is to capture the patterns within each context and across multiple contexts.
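The separation step itself amounts to grouping the quintuples of one timestamp by their context label. A minimal sketch, assuming events are stored as plain `(s, r, o, t, c)` tuples (the function name is hypothetical):

```python
from collections import defaultdict


def separate_by_context(event_graph):
    """Split one event graph (a list of (s, r, o, t, c) quintuples) into per-context sub-graphs."""
    subgraphs = defaultdict(list)
    for s, r, o, t, c in event_graph:
        # Each sub-graph keeps only the quadruple part; the context is the dictionary key.
        subgraphs[c].append((s, r, o, t))
    return dict(subgraphs)
```

Contexts with no event at a given timestamp simply have no entry, which reflects the sparsity that motivates the later collaboration stage.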
3.1.2 Context-specific Modeling. Each separated graph preserves distinct characteristics for both concurrent relations and evolving patterns under its corresponding context. To capture them, we build a context-specific modeling module for each context. Given a list of historical event graphs $G^{c}_{\le t}$ under a certain context $c$, the context-specific modeling module aims to learn entity and relation representations. We inherit the design of RE-GCN [30] to build this module, which encompasses two parts: concurrent event modeling and temporal pattern modeling.
Concurrent Event Modeling is designed to model the relationships among events occurring at the same timestamp. We use RGCN [37], which is capable of modeling multi-relational graphs, as the graph kernel to learn the entity representations. At timestamp $t$ under context $c$, for each layer $l$ of the graph propagation, the message obtained by each object $o$ is $\mathbf{e}^{l}_{o,t,c} \in \mathbb{R}^{d}$, defined as:

$$\mathbf{e}^{l}_{o} = f\Big(\frac{1}{|\mathcal{E}_o|}\sum_{(s,r,o)\in\mathcal{E}_o} \mathbf{W}^{l}_{1}\big(\mathbf{e}^{l-1}_{s} + \mathbf{r}\big) + \mathbf{W}^{l}_{2}\,\mathbf{e}^{l-1}_{o}\Big),$$

where $d$ is the dimensionality of the message, $\mathcal{E}_o$ stands for all the events of which $o$ is the object, $\mathbf{W}^{l}_{1}, \mathbf{W}^{l}_{2} \in \mathbb{R}^{d \times d}$ are the parameters of the convolutional kernel in layer $l$, and $f(\cdot)$ is the activation function, for which we use RReLU. Note that $\mathbf{e}^{l}_{o}$, $\mathcal{E}_o$, $\mathbf{e}^{l-1}_{s}$, $\mathbf{r}$, and $\mathbf{e}^{l-1}_{o}$ all stand for their corresponding representations at time $t$ in context $c$, where we omit the subscript $(t, c)$ for simplicity. After performing multi-layer message passing, we aggregate the messages obtained from multi-layer propagation and yield the entity representation at time $t$ under context $c$, defined as:

$$\mathbf{e}_{s,t,c} = \sum_{l=0}^{L} \mathbf{e}^{l}_{s,t,c},$$

where $\mathbf{e}^{0}_{s,t,c} = \mathbf{e}^{0}_{s,c} \in \mathbb{R}^{d}$ is randomly initialized for each entity $s$ under each context $c$. The representations for all entities at timestamp $t$ within context $c$ are denoted as $\tilde{\mathbf{E}}_{t,c}$.

Temporal Pattern Modeling is designed to capture the temporal evolution of entities and relations. Following the previous study [30], we devise a learnable gate mechanism to preserve the entities' evolving patterns. It is formally defined as:

$$\mathbf{E}_{t,c} = \mathbf{U}_{t,c} \otimes \tilde{\mathbf{E}}_{t,c} + (1 - \mathbf{U}_{t,c}) \otimes \mathbf{E}_{t-1,c},$$

where $\mathbf{U}_{t,c} \in \mathbb{R}^{d \times d}$ is the learnable gate, which is calculated by a nonlinear transformation:

$$\mathbf{U}_{t,c} = \sigma\big(\mathbf{W}_{4}\,\mathbf{E}_{t-1,c} + \mathbf{b}\big),$$

where $\sigma(\cdot)$ is the sigmoid activation function, and $\mathbf{W}_4$ and $\mathbf{b}$ are trainable parameters for the gate. For efficiency during implementation, we take the most recent $D$ steps of historical graphs to capture the temporal evolving patterns, following the typical TKG (Temporal Knowledge Graph) solutions [5]. The entity embeddings in the last step then preserve all the context-aware relational and temporal patterns, and we denote them as $\mathbf{E}_{c}$.
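The two operations in this module, relation-aware message passing within one timestamp and gated mixing across timestamps, can be illustrated with a small NumPy sketch. It is a simplified illustration under several assumptions, not the released implementation: messages combine subject and relation by addition and are mean-normalized (an RE-GCN-style choice), RReLU is applied in its deterministic evaluation form, and the gate is computed per entity and dimension. All function names are hypothetical.

```python
import numpy as np


def rrelu_eval(x, lower=1 / 8, upper=1 / 3):
    # RReLU at evaluation time: negative inputs are scaled by the mean of the slope range.
    return np.where(x >= 0, x, x * (lower + upper) / 2)


def message_layer(E_prev, R, events, W1, W2):
    """One propagation layer over the events (s, r, o) of a single timestamp and context."""
    n, d = E_prev.shape
    agg = np.zeros((n, d))
    deg = np.zeros(n)
    for s, r, o in events:
        # Message from subject s through relation r to object o (assumed form: W1 (e_s + r)).
        agg[o] += (E_prev[s] + R[r]) @ W1.T
        deg[o] += 1
    deg = np.maximum(deg, 1)  # entities with no incoming event keep only the self-loop term
    return rrelu_eval(agg / deg[:, None] + E_prev @ W2.T)


def time_gate(E_new, E_old, W4, b):
    """Mix current-step and previous-step entity embeddings with a sigmoid gate."""
    U = 1 / (1 + np.exp(-(E_old @ W4 + b)))
    return U * E_new + (1 - U) * E_old
```

In a full model these two functions would be applied once per timestamp over the most recent history window, separately for each context, with `W1`, `W2`, `W4`, and `b` learned by gradient descent.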
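For the relation representation, we concatenate its embedding with those of its associated entities, so each relation embedding is updated as:

$$\mathbf{r}'_{t,c} = \Big[\mathbf{r}_{c} \,;\, \frac{1}{|\mathcal{V}_{r,t,c}|}\sum_{s \in \mathcal{V}_{r,t,c}} \mathbf{e}_{s,t,c}\Big],$$

where $\mathcal{V}_{r,t,c}$ is the set of entities that connect to the relation $r$, $\mathbf{e}_{s,t,c} \in \mathbb{R}^{d}$ is the representation of entity $s$ in $\mathbf{E}_{t,c}$, and $[\cdot\,;\cdot]$ is the concatenation operation. Then a GRU is applied to obtain the temporal relation representation $\mathbf{r}_{t,c}$, calculated by:

$$\mathbf{r}_{t,c} = \mathrm{GRU}\big(\mathbf{r}_{t-1,c},\, \mathbf{r}'_{t,c}\big).$$

All the relations' representations at time $t$ for context $c$ are defined as $\mathbf{R}_{t,c} \in \mathbb{R}^{|\mathcal{R}| \times d}$. After $D$ steps of recurrent units, we obtain all the relations' representations that retain relational and temporal information conditioned on context $c$, denoted as $\mathbf{R}_{c}$.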

Collaboration
In the collaboration stage, we leverage hypergraphs to model the cross-context collaborative associations. Then we perform context-aware prediction and optimization.

Cross-context Modeling.
Even though the same entity demonstrates different characteristics in various contexts, these contexts are not independent but correlated with each other.For example, given the contexts of Covid-19 Pandemic and Russia-Ukraine War, many countries must consider them simultaneously to make economic policies, in order to minimize the influence on their economy as well as social stability.To this end, capturing such correlation is crucial for some events that are affected by multiple contexts.Furthermore, after disentangling the event graph into multiple contexts, each sub-graph will be sparser than the original unified graph.Some entities that do not have sufficient occurrence in a certain context will not be well-trained for accurate forecasting.For such few-shot entities and relations, transferring knowledge from other contexts that have sufficient training data is a promising solution.
Based on the above motivations, we devise a collaboration module to model the collaborative effects among multiple contexts, aiming to achieve potential knowledge transfer for sparse entities. It is worth mentioning that we do not have supervised information to quantify the correlations among contexts, so we are unable to model the collaborative effects explicitly. Considering this, we resort to hypergraphs to model the latent collaborations. Concretely, for each entity $s$, we construct a hypergraph among its sub-embeddings in different contexts, where the nodes are the separated embeddings of all entities (relations) in different contexts and every hyper-edge connects the separated embeddings of the same entity (relation). Then we leverage a multi-layer LightGCN [17] to propagate over every hypergraph, and $\hat{\mathbf{e}}^{l}_{s,c} \in \mathbb{R}^{d}$ is the information propagated in the $l$-th layer to node $s$ under context $c$, obtained by:

$$\hat{\mathbf{e}}^{l}_{s,c} = \frac{1}{|\mathcal{C}_s|}\sum_{c' \in \mathcal{C}_s} \hat{\mathbf{e}}^{l-1}_{s,c'},$$

where $\mathcal{C}_s$ are all the contexts that the entity $s$ has been in. After $L'$ layers of propagation, we aggregate each layer's embedding and yield the final entity representation:

$$\hat{\mathbf{e}}_{s,c} = \sum_{l=0}^{L'} \hat{\mathbf{e}}^{l}_{s,c},$$

where $\hat{\mathbf{e}}^{0}_{s,c}$ is the representation of entity $s$ in $\mathbf{E}_{c}$. After the hypergraph propagation, all entities are represented as $\hat{\mathbf{E}}_{c}$.
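A small sketch may help to see what propagation over a single hyper-edge does. Here one entity has one embedding per context, each layer replaces every context-specific embedding with the mean over all of that entity's contexts (the current context included, which is an assumption), and the final representation sums all layers, LightGCN-style. The function name and the dictionary layout are hypothetical.

```python
def collaborate(emb, num_layers=2):
    """emb: dict context -> embedding (list of floats) of ONE entity in each context it appears in."""
    contexts = list(emb)
    dim = len(emb[contexts[0]])
    layers = [{c: list(emb[c]) for c in contexts}]  # layer 0: context-specific embeddings
    for _ in range(num_layers):
        prev = layers[-1]
        # Parameter-free propagation: average over the hyper-edge (all contexts of this entity).
        mean = [sum(prev[c][j] for c in contexts) / len(contexts) for j in range(dim)]
        layers.append({c: list(mean) for c in contexts})
    # Final representation: sum of the embeddings from every layer, including layer 0.
    return {c: [sum(layer[c][j] for layer in layers) for j in range(dim)] for c in contexts}
```

Because layer 0 is kept in the sum, each context retains its own signal while receiving a shared component from the other contexts, which is the knowledge-transfer effect the text describes for sparse entities.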
Analogous to entities, relations' representations in different contexts are also totally isolated during the context-specific modeling. Therefore, for each relation, we also build a hypergraph and use a multi-layer LightGCN kernel to capture the collaborative associations among different contexts, defined as:

$$\hat{\mathbf{r}}^{l}_{r,c} = \frac{1}{|\mathcal{C}_r|}\sum_{c' \in \mathcal{C}_r} \hat{\mathbf{r}}^{l-1}_{r,c'},$$

where $\hat{\mathbf{r}}^{l}_{r,c} \in \mathbb{R}^{d}$ is the information propagated to relation $r$ in layer $l$, and $\mathcal{C}_r$ is the set of contexts that relation $r$ has been in over the historical observations. With $L'$ layers of graph propagation, we aggregate multiple layers' representations and obtain the final relation embedding $\hat{\mathbf{r}}_{r,c}$, formally written as:

$$\hat{\mathbf{r}}_{r,c} = \sum_{l=0}^{L'} \hat{\mathbf{r}}^{l}_{r,c},$$

where $\hat{\mathbf{r}}^{0}_{r,c}$ is relation $r$'s embedding in $\mathbf{R}_{c}$.

3.2.2 Context-aware Prediction and Optimization. With the context-specific and cross-context modeling modules, we learn entity and relation representations that not only capture context-aware characteristics but also preserve knowledge transferred from other contexts. Following the established approach to event forecasting [29,30], we devise a decoder based on ConvTransE [38]. In particular, given a query quadruple $(s, r, t, c)$, we first use ConvTransE to produce the query's representation, then score the candidate entities $\mathcal{E}$ via the inner product between the query and candidate representations. Formally, we calculate the prediction scores for all candidate entities given the query $(s, r)$ at time $t+1$ under context $c$ as follows:

$$p(o \mid s, r, t+1, c) = \mathrm{softmax}\big(\hat{\mathbf{E}}_{c}\,\mathrm{ConvTransE}(\hat{\mathbf{e}}_{s,c}, \hat{\mathbf{r}}_{r,c})\big),$$

where $\mathrm{softmax}(\cdot)$ is the softmax function, $\mathrm{ConvTransE}(\cdot)$ is the ConvTransE decoder, and $\hat{\mathbf{e}}_{s,c}$ and $\hat{\mathbf{r}}_{r,c}$ are the representations for $s$ and $r$, respectively. The predicted object is given by:

$$\hat{o} = \arg\max_{o \in \mathcal{E}}\; p(o \mid s, r, t+1, c).$$

We employ the cross-entropy loss to optimize the whole framework in an end-to-end fashion, and the loss is defined as:

$$\mathcal{L} = -\sum_{t=0}^{T-1}\sum_{(s,r,o,t+1,c) \in G_{t+1}} \mathbf{y}^{\top}_{(s,r,t+1,c)} \log p(o \mid s, r, t+1, c),$$

where $T$ is the total number of timestamps in the training set, and $\mathbf{y}_{(s,r,t+1,c)}$ is the one-hot representation of the ground-truth object $o$.
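The scoring and loss computation can be illustrated in plain Python. To keep the sketch short, the ConvTransE decoder is replaced by a stand-in query encoder (the element-wise product of subject and relation embeddings); this substitution and all function names are assumptions for illustration. The inner-product scoring, softmax, arg-max prediction, and cross-entropy follow the description above.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def score_candidates(query, entity_embs):
    """Inner product between the query vector and every candidate entity embedding."""
    logits = [sum(q * e for q, e in zip(query, emb)) for emb in entity_embs]
    return softmax(logits)


def predict_and_loss(subj, rel, entity_embs, true_obj):
    # Stand-in for ConvTransE(subj, rel): element-wise product (an assumption, not the paper's decoder).
    query = [a * b for a, b in zip(subj, rel)]
    probs = score_candidates(query, entity_embs)
    pred = max(range(len(probs)), key=probs.__getitem__)
    loss = -math.log(probs[true_obj])  # cross-entropy against the one-hot ground truth
    return pred, loss
```

In SeCoGD the entity and relation embeddings fed to the decoder would be those of the specified context, so the same `(s, r)` pair can yield different rankings under different contexts.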

Discussion
To further highlight the key contributions of this work, we discuss the generalization capability of SeCoGD, as well as the rationale behind separation and collaboration.

Generalization Capability.
We argue that our method SeCoGD is a general framework rather than a specific model. The key contribution of SeCoGD lies in two aspects: 1) it makes use of context as a prior guidance to disentangle the event graph; and 2) it proposes a novel graph disentanglement idea under prior-guided disentanglement, namely modeling the collaborative associations among the disentangled representations. First, the context can be flexibly defined according to the application scenario. For example, the tag of the news article that an event belongs to can be used as its context. Alternatively, similar to our solution based on a latent topic model, various automatic text clustering algorithms, such as K-means or GMM (Gaussian Mixture Model), can plausibly identify the latent contexts of events. Second, each component of SeCoGD has various alternatives. For example, RE-NET [22] can replace the RE-GCN module for context-specific modeling, the hypergraph can be replaced by regularizers (e.g., L2 distance) that pull the disentangled representations closer, and multiple modules [12,40,49] could serve as the decoder.
3.3.2 Separation and/or Collaboration. Our work emphasizes that it is crucial to incorporate the collaboration stage on top of the separation stage. However, most previous works on graph disentanglement focus solely on the separation part. For example, they either leverage regularization terms to maximize the mutual information [45] among multiple chunked representations or use an attention mechanism [32] to make different disentangled representations attend to different sub-graphs. We attribute this difference in modeling philosophy to two reasons. First, we have prior knowledge as the guidance for the disentanglement; therefore, we do not need any heuristically designed disentanglement strategies, such as mutual information maximization or attention. Second and more importantly, we believe that the crux of an effective graph disentanglement model lies in a good balance of separation and collaboration. Current works are built upon a unified graph model, which is highly intertwined, so a separation module is necessary to eliminate the entanglement. In contrast, we disentangle event graphs by the prior contextual information, where the sub-graphs are well separated or even over-separated, so a collaboration module is required to rectify the separation.

EXPERIMENTS
We aim to answer the following research questions:
• RQ1: Does our framework outperform the SOTA methods?
• RQ2: Is the design of the two stages, i.e., separation and collaboration, effective in terms of event forecasting?
• RQ3: How does the context affect event forecasting?

Experimental Settings
We conduct experiments on the three datasets that we constructed, i.e., EG, IR, and IS. The construction and statistics of the datasets can be found in Section 2.2. Following previous settings [30], we use Mean Reciprocal Rank (MRR) and HIT@{1, 3, 10} as the evaluation metrics. We use MRR to select the best model based on the validation set and record its corresponding performance on the testing set.
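For reference, the two metrics can be computed from per-query candidate scores as follows. This is a minimal sketch with hypothetical function names; it ranks against all candidates without any filtering of other known true objects, and ties are resolved in favor of the target.

```python
def rank_of(scores, target):
    """1-based rank of the target entity: one plus the number of strictly higher-scored candidates."""
    return 1 + sum(1 for s in scores if s > scores[target])


def mrr_and_hits(all_scores, targets, ks=(1, 3, 10)):
    """all_scores: one score list per query; targets: index of the true object per query."""
    ranks = [rank_of(scores, t) for scores, t in zip(all_scores, targets)]
    mrr = sum(1 / r for r in ranks) / len(ranks)
    hits = {k: sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return mrr, hits
```

MRR rewards placing the true object near the top of the ranking, whereas HIT@k only checks whether it appears among the first k candidates, which is why the two can prefer different settings.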
4.1.1 Compared Methods. Since existing works have not studied the newly proposed problem of context-aware event forecasting on temporal event graph data, we select several strands of the most relevant works to compare with our proposed method.
• Static KG completion methods treat event forecasting as a link prediction task on the static event graph. We select the following representative methods: DistMult [49], ConvE [12], ConvTransE [38], RotatE [40], and RGCN [37].
• Temporal KG forecasting methods are designed for temporal event forecasting. These methods consider both relational and temporal information for link prediction at the next timestamp. We consider the following SOTA methods: TANGO [16], RE-NET [22], RE-GCN [30], EvoKG [36], and HiSMatch [29].
• Temporal event forecasting methods with texts incorporate textual information into the event forecasting model, while the standard temporal KG forecasting methods only use structural information. In particular, we implement two versions of a representative method, CMF [10]: 1) an ontology-text version, which faithfully follows the settings of the original work and incorporates the textual description of each event type defined in the CAMEO [4] ontology into the structural event forecasting model; and 2) an article-text version, which differs from the first by using the original article embeddings extracted by doc2vec [26] instead of the texts in the ontology. CMF was originally designed for binary classification of whether an event happens or not. We replace its task head with a typical ConvTransE decoder to enable link prediction.
• Graph disentanglement methods aim to separate the intertwined relational information into disentangled representations. We take into account two representative graph disentanglement methods: DisenGCN [32] and DisenKGAT [48]. Both methods are designed for static graphs.
4.1.2 Hyper-parameter Settings. We implement all the static methods using OpenKE. For TANGO, RE-NET, RE-GCN, and EvoKG, we use their released code. For CMF and HiSMatch, we re-implement them ourselves, since their code has not been released.

Performance Comparison (RQ1)
Table 2 shows the overall performance of our model and the baselines. First of all, our method outperforms all the baselines on all three datasets. Among all the metrics, the improvement on HIT@1 is the highest, which is particularly helpful in practice. Second, among all the baselines, RE-GCN achieves the best performance and even beats the models with textual inputs (i.e., the two CMF variants), demonstrating its superiority in modeling temporal event graphs. This is why we select RE-GCN for context-specific modeling in our implementation. Third, the methods with textual inputs, i.e., the two CMF variants, perform well, beating most of the static and temporal methods. The results imply that the additional textual information offers valuable clues that are crucial to forecasting future events. However, they are not the strongest baselines, probably because they were originally designed for binary event classification and the link prediction head is not perfectly adapted. Finally, the two graph disentanglement-based methods, i.e., DisenGCN and DisenKGAT, do not perform very well. There are two possible reasons: 1) they rely on the static global graph, which cannot model the temporal evolving patterns; and 2) more importantly, the events in current datasets are coarse-grained and less discriminative, so methods that rely solely on structured data fail to learn disentangled representations. The results also justify our approach of leveraging the context, instead of the graph structure, as a prior guidance to separate the event graph.
Table 2: The overall performance comparison between SeCoGD and baselines.

Study of Key Modules (RQ2)
We conduct model studies to analyze the effect of the key modules in the two stages, i.e., separation and collaboration.

Study of the Separation Stage.
For the concurrent event modeling, we use the RGCN kernel. We tune the number of propagation layers, and the results are shown in Figure 3. In general, two and three layers are better than one layer, indicating that higher-order information propagation over the concurrent event graph is beneficial for capturing the context-specific signals. We also replace RGCN with CompGCN [42], and Figure 4 illustrates the results. Overall, CompGCN and RGCN perform similarly on the three datasets and differ only slightly across evaluation metrics. This shows that our framework is not sensitive to the choice of relational modeling kernel.
For the temporal pattern modeling module, we tune the length D of the historical graph sequence used to generate the entity and relation embeddings. We try historical lengths D within {1, 3, 7} and visualize the results in Figure 5. For HIT@10, the longer the historical length, the better the performance. For MRR, however, EG and IR achieve their best performance with D=1 and D=2, respectively. This difference reminds practitioners to select evaluation metrics according to the application scenario. For example, on the EG dataset, if we care more about the rank of the correct prediction, we should choose MRR and set D=1; if we care more about the hit rate among the top-10 predictions, HIT@10 with D=7 is the better option. In addition, a longer history incurs extra computational cost, so in practice there is a trade-off between efficacy and efficiency.

From the ablation results in Table 3, we can see that removing either the relation or the entity hypergraph performs worse than SeCoGD but better than removing both, demonstrating the efficacy of both hypergraphs. More interestingly, the performance drop from removing the entity hypergraph is generally larger than that from removing the relation hypergraph, implying that the collaboration of entities is more valuable.

During prediction, our context-aware event forecasting pairs each query (s, r, t + 1) with an auxiliary context c. Specifying the context selects the corresponding branch of the decoder, which then performs the forecasting. We argue that such context-aware prediction narrows down the candidate space and therefore performs better. To test this hypothesis, we build a variant that does not specify the context during inference and instead averages the prediction scores from all the context decoders; it corresponds to the row "Avr. Context" in Table 3. "Avr. Context" performs much worse than SeCoGD, indicating that specifying the proper context during inference is crucial to SeCoGD and supporting our hypothesis that context plays a pivotal role in accurate event forecasting.
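The difference between context-specified prediction and the "Avr. Context" variant, and the way MRR and HIT@k are derived from the resulting scores, can be sketched as follows. This is a minimal illustration under stated assumptions: the per-context score dictionaries stand in for the outputs of the context decoders, the candidate names and score values are invented purely for illustration (they are not experimental results), and ties are broken optimistically.

```python
def rank_of(scores, true_obj):
    """1-based rank of the true object when candidates are sorted by
    descending score (ties are resolved in favor of the true object)."""
    return 1 + sum(1 for s in scores.values() if s > scores[true_obj])


def mrr(ranks):
    """Mean reciprocal rank over a list of 1-based ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hit_at_k(ranks, k):
    """Fraction of queries whose true object is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def select_context(scores_per_context, context):
    """Context-aware prediction: use only the branch of the given context."""
    return scores_per_context[context]


def average_contexts(scores_per_context):
    """'Avr. Context'-style variant: average scores over all branches."""
    contexts = list(scores_per_context)
    candidates = scores_per_context[contexts[0]]
    return {
        obj: sum(scores_per_context[c][obj] for c in contexts) / len(contexts)
        for obj in candidates
    }


# Hypothetical decoder scores for one query, one dictionary per context.
toy_scores = {
    "context_1": {"X": 0.7, "Y": 0.2, "Z": 0.1},
    "context_2": {"X": 0.1, "Y": 0.8, "Z": 0.1},
    "context_3": {"X": 0.1, "Y": 0.3, "Z": 0.6},
}

# Suppose the true object is "X" and the query is paired with context_1.
rank_selected = rank_of(select_context(toy_scores, "context_1"), "X")
rank_averaged = rank_of(average_contexts(toy_scores), "X")
print("rank with specified context:", rank_selected)
print("rank with averaged contexts:", rank_averaged)
print("MRR over both ranks:", mrr([rank_selected, rank_averaged]))
print("HIT@1 over both ranks:", hit_at_k([rank_selected, rank_averaged], 1))
```

The toy scores are constructed so that averaging dilutes the signal of the relevant context. MRR rewards every improvement in the rank of the true object, whereas HIT@k only checks whether it falls within the top k, which is why the two metrics can prefer different settings.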

Study of the Context (RQ3)
4.4.1 Effect of the Number of Contexts. We vary the number of LDA topics K when generating the contexts from the news articles, resulting in multiple versions of each dataset with different numbers of contexts. We run SeCoGD on all versions and present the results in Figure 6. In general, more contexts yield better performance. This is reasonable: as the number of contexts increases, each context becomes more specific, so more fine-grained information is injected into each event. However, more contexts inevitably introduce more computational expense. We leave the study of efficiency and scalability improvements to future work.

4.4.2 Effect of the Context Curation Methods. We define the context as a categorical label for each event; LDA is just one automatic method for obtaining it that avoids the extensive labor and cost of manual context annotation. We argue that alternative automatic approaches also work within our framework. To illustrate this property, we generate contexts with two prominent text clustering methods, K-means and GMM (Gaussian Mixture Model), applied to article embeddings pretrained by doc2vec [26]. Results based on the newly generated contexts are shown in Table 4. From the results, we conclude that SeCoGD generally outperforms RE-GCN when using the contexts generated by these alternative clustering methods, which further illustrates that our method is robust to diverse context sources. Nonetheless, using LDA to generate contexts performs best, so we take it as the default setting.
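As a rough illustration of clustering-based context curation, the sketch below assigns each article embedding to one of K clusters with a plain K-means loop and uses the cluster index as the categorical context label. It is a minimal sketch under assumptions: the two-dimensional toy vectors stand in for doc2vec embeddings, the deterministic initialization is chosen only for reproducibility, and the function names are hypothetical rather than taken from the authors' code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans_contexts(embeddings, num_contexts, num_iters=10):
    """Return one context label (cluster index) per article embedding."""
    # Assumption: initialize centroids with the first `num_contexts` points
    # so the example is deterministic; real setups usually randomize this.
    centroids = [list(e) for e in embeddings[:num_contexts]]
    labels = [0] * len(embeddings)
    for _ in range(num_iters):
        # Assignment step: each article goes to its nearest centroid.
        labels = [nearest(e, centroids) for e in embeddings]
        # Update step: move each centroid to the mean of its members.
        for c in range(num_contexts):
            members = [e for e, l in zip(embeddings, labels) if l == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical article embeddings (stand-ins for doc2vec vectors).
toy_embeddings = [
    (0.0, 0.1),
    (5.0, 5.1),
    (0.2, 0.0),
    (5.2, 4.9),
    (0.1, 0.2),
]

context_labels = kmeans_contexts(toy_embeddings, num_contexts=2)
print(context_labels)
```

Each event would then inherit the label of the article it was extracted from. Because the framework only consumes a categorical label per event, any method that yields such a label (LDA topics, K-means, GMM) can be used in its place.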

Case Study.
We examine the content of each context to illustrate how different events are predicted under distinct contexts. As shown in Figure 7, for each dataset, we illustrate the top words of each context as a word cloud [19], where the size of each word is proportional to its weight in the LDA topic distribution. From the word clouds, we observe rich information within each context and clear content differences among contexts. For example, each context in the EG dataset covers background information such as popular actors, important cities, and critical actions, while the three contexts concern economic, military, and political events, respectively.

We also pick an example query (s, r, t + 1) from the testing set of each dataset to demonstrate concretely the benefits of context-aware event forecasting. We list the object with the highest score predicted by RE-GCN and by SeCoGD, and we observe that SeCoGD generates more accurate predictions than RE-GCN. With a given context c, general event types such as 'Consult', 'Negotiate', and 'Host a visit' are enriched with the supplementary information modeled in the context, leading to better results. We also observe that, given the same query, SeCoGD sometimes predicts distinct objects under different contexts. For example, for the example query in the IS dataset, SeCoGD predicts that Israel will host a visit for Ukraine students under Context 2, and predicts UK instead under Context 3. Both events exist in the dataset and are thus both correct, and the two predictions are in line with the contents of their contexts. As shown in the word clouds for IS Contexts 2 and 3, the former prediction might take more military factors into consideration, while the latter is more related to government affairs. This demonstrates the flexibility that context provides in depicting events.

RELATED WORK

Temporal Event Forecasting
Temporal event forecasting aims to forecast future events based on a list of observed historical events. It has been studied in various application scenarios, including criminal activities [44], disease outbreaks [11], stock markets [2], and international political events [28,34]. Different problem formulations are used for different event types, such as time series forecasting, natural language generation, and link prediction. In this work, we follow the typical formulation of link prediction, which is also called temporal knowledge graph (TKG) completion. It inherits from static knowledge graph completion, where the key is to learn relational embeddings via various scoring functions, such as TransE [3], DistMult [49], ComplEx [41], RotatE [40], ConvE [12], and ConvTransE [38]. To capture temporally evolving patterns and forecast future events, recurrent neural networks (RNNs) [20] have been introduced. RE-NET [22] uses RGCN [37] to capture the relational patterns at each timestamp and GRU [7] to model the dynamics of embeddings over time. RE-GCN [30] additionally incorporates a static graph to learn the static properties of the entities and adopts ConvTransE [38] as the decoder. TANGO [16] models the structure of candidate entities via neural ordinary differential equations; EvoKG [36] considers the time information for event forecasting; HiSMatch [29] reformulates event forecasting as a query-candidate matching problem and proposes a two-branch framework to match a query to candidate entities. More related works [15,18,27,39,53] can be found in the survey [5].

Most of these TKG methods operate only on purely structured data, overlooking the rich semantic and contextual information. To address this limitation, Glean [9] and CMF [10] propose to use textual information. However, they simplify the event forecasting problem from fine-grained link prediction to an easier binary classification problem, i.e., predicting whether an event will happen or not. In addition, the textual information is available only for historical events and not for future events, so it cannot directly narrow down the candidate space. Some works also use context for event prediction, but they target either event status classification [8] or time series forecasting [35], which are different tasks.
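For readers unfamiliar with the scoring functions mentioned above, the following minimal sketch shows the two simplest ones, TransE and DistMult, on plain Python lists. The embedding values are toy numbers chosen for illustration, and the L2 distance is used for TransE (the L1 distance is an equally common choice).

```python
import math


def transe_score(head, rel, tail):
    """TransE: a triple is plausible when head + rel is close to tail,
    so the score is the negative L2 distance between them."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))


def distmult_score(head, rel, tail):
    """DistMult: trilinear product of head, relation, and tail embeddings."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))


# Hypothetical 2-dimensional embeddings.
head = [1.0, 0.0]
rel = [0.0, 1.0]
true_tail = [1.0, 1.0]    # equals head + rel, so TransE distance is zero
false_tail = [-1.0, 0.0]  # far from head + rel

print(transe_score(head, rel, true_tail), transe_score(head, rel, false_tail))

# DistMult is symmetric in head and tail, a known limitation of the model.
a, r, b = [1.0, 0.5], [2.0, 1.0], [0.5, 2.0]
print(distmult_score(a, r, b), distmult_score(b, r, a))
```

In link prediction, such a function scores every candidate object for a query, and the candidates are ranked by score. Temporal methods such as RE-GCN keep this decoding step but feed it embeddings that evolve over time.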

Graph Disentanglement
Graph neural networks [14,25,43] have become the de facto solution for graph representation learning. Graph disentanglement extends disentangled representation learning from the general domain to graph data. Disentangled representation learning focuses on separating a unified representation into multiple disentangled components, thereby achieving desirable modeling properties such as enhanced representation capability and explainability. It has been studied in CV [6], NLP [23], and recommender systems [21,33,45]. For graph representation learning, disentanglement has also garnered particular attention. DisenGCN [32] is one of the pioneering works that uses multiple disentangled graph convolutional kernels to learn disentangled node representations. FactorGCN [50] factorizes the node embedding into multiple blocks that capture interpretable global topological semantics. IPGDN [31] leverages the Hilbert-Schmidt Independence Criterion (HSIC) to achieve disentanglement. ADGCN [52] introduces adversarial learning to disentangled graph representation learning. DisenHAN [46] is designed for heterogeneous graphs, where multiple node and relation types are involved. The most relevant work for event forecasting is DisenKGAT [48], which aims to learn disentangled representations for knowledge graphs.

Despite these studies, our work differs from and extends existing work in two aspects. First, we are the first to introduce graph disentanglement learning to temporal event forecasting. Second, most existing works disentangle the graph using only the graph's own features and ignore contextual information, whereas we use the context as prior guidance to separate the event graph.

CONCLUSION AND FUTURE WORK
In this work, we explored the incorporation of context into the problem of event forecasting and proposed a novel task of context-aware event forecasting. To tackle this problem, we borrowed the idea of graph disentanglement and designed an overall framework, SeCoGD. Specifically, we used the context as prior guidance to separate the event graph and incorporated a context-specific modeling module to capture the relational and temporal patterns in each context. In addition, we designed a cross-context modeling module to model the collaborative associations among multiple contexts. Since there are no available datasets for this new task, we built three large-scale datasets based on GDELT. Extensive experiments on these three datasets demonstrated that our framework outperforms the compared SOTA methods. Further model studies examined the effectiveness of the key modules and of the various contexts in more detail.

Despite the progress achieved by this work, it has several limitations, which motivate multiple research directions. First, context generation is based on unsupervised methods, while human-generated contexts, such as tags and categories, could be more useful in practice. Second, the original articles of the events are used only as a proxy to generate contexts, so only a small part of their information is exploited; more effective approaches to mining useful patterns from raw text are promising. Third, more advanced graph disentanglement methods could be explored to further enhance performance. Finally, in addition to next-step prediction, the more important yet challenging problem of multi-horizon forecasting should be studied in the future.

Figure 1: The motivation of context-aware event forecasting. (a) Most current events fall into the coarse-grained, higher-level types of the ontology, while more informative fine-grained events are fewer. (b) Out-of-ontology and diverse contexts affect events. Context can provide more fine-grained information to enhance event forecasting performance.

Figure 2: The overall framework of SeCoGD consists of two stages: separation and collaboration. The separation stage includes the context-aware graph disentanglement and context-specific modeling modules, and the collaboration stage comprises the cross-context modeling and context-aware prediction modules.

Figure 3: Comparison of results with different numbers of propagation layers L in the context-specific modeling module.

Figure 4: Results of using different graph kernels.

Figure 5: Results with different historical lengths D.

4.3.2 Study of the Collaboration Stage. We construct a hypergraph over the sub-embeddings of each entity and relation to retain the collaborative associations across multiple contexts. To test the efficacy of the collaboration stage and the implementation of the hypergraph, we design several ablated models by progressively removing the two hypergraphs of entity and relation. In Table 3, "w/o Ent HG", "w/o Rel HG", and "w/o Ent or Rel HG" refer to the variants without the entity hypergraph, without the relation hypergraph, and without either hypergraph, respectively.

Figure 6: Results with different numbers of contexts.

Figure 7: Case study on three datasets. In each sub-figure, the number of contexts K is set to three; the top shows the word cloud of each context, and the bottom illustrates several example forecasting results by SeCoGD and RE-GCN.

Table 3: Study of the cross-context modeling and context-aware prediction.

Table 4: Alternative context generation methods.