Causal Discovery in Temporal Domain from Interventional Data

Causal learning from observational data has garnered attention as controlled experiments can be costly. To enhance identifiability, incorporating interventional data has become a mainstream approach. However, these methods have yet to be explored in the context of time series data, despite their success in static data. To address this research gap, this paper presents a novel contribution. Firstly, a temporal interventional dataset with causal labels is introduced, derived from a data center IT room of a cloud service company. Secondly, this paper introduces TECDI, a novel approach for temporal causal discovery. TECDI leverages the smooth, algebraic characterization of acyclicity in causal graphs to efficiently uncover causal relationships. Experimental results on simulated and proposed real-world datasets validate the effectiveness of TECDI in accurately uncovering temporal causal relationships. The introduction of the temporal interventional dataset and the superior performance of TECDI contribute to advancing research in temporal causal discovery. Our datasets and codes have been released at \href{https://github.com/lpwpower/TECDI}{https://github.com/lpwpower/TECDI}.


INTRODUCTION
The exploration of causal knowledge has stood as a fundamental undertaking in scientific research. Causal knowledge aids in comprehending intricate systems, including cloud service systems, and guides further optimization decisions. Nonetheless, even for very powerful large-scale models, it is challenging to handle causal discovery and causal inference tasks [9,21]. Conducting randomized controlled trials, which serve as the gold standard for causal discovery, often proves arduous due to factors such as experimental costs and ethical constraints. As a result, causal discovery algorithms relying on observational data have garnered substantial attention in recent years. However, from observational data, the true causal graph is only identifiable up to a Markov equivalence class under the faithfulness assumption [6]. Encouragingly, identifiability can be enhanced by absorbing additional information from interventional data [4,19], which has been applied in causal discovery from static data [3,8].
Temporal data is prevalent across diverse fields that need causal knowledge, such as industry [10], finance [7], meteorology [11], and neuroscience [2]. Unlike static data, temporal data provides a dynamic perspective, enabling us to capture the evolution of causal relationships. Therefore, causal discovery within the temporal domain extends beyond the mere identification of causal links between variables (such as $X$ causes $Y$), incorporating the discovery of causal lags (such as $X_{t-\tau}$ causes $Y_t$ with lag $\tau$). The approaches to causal discovery from temporal data roughly fall into three categories, namely, Granger causality, constraint-based methods, and score-based methods. Besides the sole work [5], there is currently no method available that utilizes interventional data for temporal causal discovery. The main reason for this gap may be that real interventional time series data are difficult to obtain.
Inspired by recent advancements in anomaly detection on time series [20], this paper introduces a novel approach for constructing a temporal interventional dataset based on monitoring data from a data center operated by Alibaba. In our dataset, we consider a source anomaly, where no other anomalies were detected in the previous period, as the intervention. Building upon this dataset, we design a method called temporal causal discovery based on interventional data (TECDI). TECDI leverages the smooth, algebraic characterization of acyclicity of causal graphs [12,22] to achieve differentiable causal structure learning. Powered by the theoretically-grounded method based on static interventional data [4], our method is able to capture both contemporaneous and time-lagged causal relationships simultaneously with identifiability guarantees. To validate the effectiveness of TECDI, we conducted experiments using both simulated and proposed real-world datasets. The results clearly demonstrate the superiority of our method in discovering temporal causal relationships accurately.
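The smooth, algebraic characterization of acyclicity referenced above can be illustrated with a short sketch. Under the NOTEARS-style formulation used by [12,22], a weighted adjacency matrix $W$ encodes a DAG exactly when $h(W) = \mathrm{tr}\,(e^{W \circ W}) - d = 0$; the function name `acyclicity` and the example matrices below are our own illustrative choices, not code from the paper.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W: np.ndarray) -> float:
    """NOTEARS-style acyclicity measure: h(W) = tr(exp(W * W)) - d.

    h(W) == 0 iff the weighted adjacency matrix W encodes a DAG;
    h(W) > 0 whenever W contains a directed cycle, because the matrix
    exponential sums weighted closed walks of every length.
    """
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)

# An upper-triangular matrix (a DAG) satisfies the constraint ...
dag = np.array([[0.0, 1.5],
                [0.0, 0.0]])
# ... while a 2-cycle violates it.
cyc = np.array([[0.0, 1.0],
                [1.0, 0.0]])
```

Because $h$ is differentiable, the hard combinatorial DAG constraint can be enforced with continuous optimization, which is what enables the differentiable structure learning described above.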
This paper makes two significant contributions. Firstly, to the best of our knowledge, TECDI is the first attempt to employ interventional data for real-world temporal causal discovery. This novel approach opens up new avenues for research in this field and expands the possibilities for understanding causal relationships in practical complex systems. Secondly, we construct the first-ever real-world temporal interventional dataset, which we believe will greatly facilitate future investigations and advancements in this area of research.

RELATED WORK
Theoretical evidence confirms that the availability of interventional data can greatly enhance the identification of the underlying causal structure [4,18]. However, research progress in this direction has been limited due to the substantial challenges associated with designing intervention experiments and obtaining the necessary data. The study conducted in [8] addressed the challenge of learning causal graphs with latent variables from a combination of observational and interventional distributions, where the interventional targets are unknown; the proposed approach utilized a Ψ-Markov property to tackle this problem. In a different context, the work presented in [1] introduced a randomized algorithm that recovers the complete causal graph while minimizing the intervention cost; this algorithm relied on a novel characterization based on -colliders. Additionally, DCDI [3] is a versatile method for causal discovery that operates under continuous constraints. It can effectively leverage different types of interventional data and incorporate expressive neural architectures like normalizing flows. These works, however, do not address the problem of temporal causal discovery using interventional data. The only study available on temporal data, as presented in [5], does not include an evaluation on real-world datasets.

3.2 The score for imperfect interventions

Model conditional densities.
To begin with, we use neural networks to model conditional densities. Firstly, we encode the DAG $\mathcal{G}$ with a binary adjacency matrix $M^{G} \in \{0,1\}^{d(\tau+1) \times d(\tau+1)}$, which acts as a mask on the neural network inputs. Similarly, we encode the interventional family $\mathcal{I}$ with a binary matrix $M^{I} \in \{0,1\}^{K \times d(\tau+1)}$, where $M^{I}_{kj} = 1$ means that $j$ is a target in $I_k$. Then, following equation (2), we further model the joint density of the $k$th intervention by
$$ f^{(k)}(x; M^{G}, M^{I}, \phi) := \prod_{j \notin I_k} \tilde{f}\big(x_j; \mathrm{NN}(M^{G}_{\cdot j} \odot x; \phi^{(1)}_j)\big) \prod_{j \in I_k} \tilde{f}\big(x_j; \mathrm{NN}(M^{G}_{\cdot j} \odot x; \phi^{(k)}_j)\big), $$
where $\phi := \{\phi^{(1)}, \ldots, \phi^{(K)}\}$ and the NN's are neural networks parameterized by the $\phi^{(k)}_j$. This yields the score
$$ \mathcal{S}_{\mathcal{I}^*}(\mathcal{G}) := \sup_{\phi} \sum_{k=1}^{K} \mathbb{E}_{X \sim p^{(k)}} \log f^{(k)}(X; M^{G}, M^{I^*}, \phi) - \lambda |\mathcal{G}|, $$
where the ground truth interventional family $\mathcal{I}^* := (I^*_1, \ldots, I^*_K)$ is known and $p^{(k)}$ stands for the $k$th ground truth interventional distribution. By maximizing it, we can get an estimated DAG $\hat{\mathcal{G}}$ that is $\mathcal{I}^*$-Markov equivalent to the true DAG $\mathcal{G}^*$ [3]. Then, we take $M^{G}$ as a random matrix, where $M^{G}_{ij} \sim \mathrm{Ber}(\sigma(\lambda_{ij}))$, $\sigma$ is the sigmoid function and $\lambda_{ij}$ is a scalar parameter. We group these $\lambda_{ij}$s into a matrix $\Lambda \in \mathbb{R}^{d(\tau+1) \times d(\tau+1)}$. After that, we rely on the augmented Lagrangian procedure [22] to maximize the following score under the acyclicity constraint:
$$ \sup_{\Lambda} \hat{\mathcal{S}}_{\mathcal{I}^*}(\Lambda), \quad \text{s.t. } \mathrm{Tr}\, e^{\sigma(\Lambda)} - d(\tau+1) = 0. $$
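Two mechanical pieces of this formulation can be sketched in a few lines: sampling the binary mask $M^{G}$ from entrywise Bernoulli distributions with probabilities $\sigma(\Lambda)$, and evaluating the acyclicity constraint on $\sigma(\Lambda)$. This is a minimal illustration under our own naming (`sample_mask`, `acyclicity_constraint`), not the paper's implementation, which additionally handles the lagged block structure and gradient estimation.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def sample_mask(Lam: np.ndarray) -> np.ndarray:
    """Sample the binary adjacency mask M^G with M^G_ij ~ Bernoulli(sigmoid(Lam_ij))."""
    probs = sigmoid(Lam)
    return (rng.random(Lam.shape) < probs).astype(float)

def acyclicity_constraint(Lam: np.ndarray) -> float:
    """Tr exp(sigmoid(Lam)) - d, the quantity driven to zero by the
    augmented Lagrangian so that edge probabilities concentrate on a DAG."""
    d = Lam.shape[0]
    return float(np.trace(expm(sigmoid(Lam))) - d)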

EXPERIMENTS 4.1 Baselines
To evaluate the effectiveness of our method, we compare with the following models:
• DYNOTEARS [12]: As DYNOTEARS is a method focused on fitting the exact values of time series, it outputs quantitative weight values. We set the threshold value of each $W$ and $A$ to be 0.5.
• PCMCI [16]: We used the results of PCMCI with a significance level of 0.01.
• NeuralGC [17]: Since NeuralGC only learns contemporaneous relationships, we used $G_{\text{full}}$, representing the overall relations between variables, to compare our method with it. $G_{\text{full}}$ is defined as the union of the intra-slice and inter-slice edges.
Since the baseline models are unable to utilize interventional data, we have ensured a fair comparison to some extent by keeping the overall sample size consistent across all models, while using only observational data (or normal data in real datasets) for the baseline models, and both observational and interventional data (or abnormal data in real datasets) for the proposed method. Therefore, we validate that by making use of extra information from interventional data, our proposed method outperforms other models that only apply observational data.

4.2 On simulation datasets
• Generate time series consistent with the sampled weighted graph following the standard structural vector autoregressive (SVAR) model [15]:
$$ X_t = X_t W + X_{t-1} A + \epsilon_t, $$
where $\epsilon_t$ is a random vector drawn from the normal distribution. Then, sample interventional targets from the nodes at lag 0, and generate imperfect interventional data by adding a random vector drawn from $U([-0.5, -0.25] \cup [0.25, 0.5])$ to $W_{\cdot j}$ and $A_{\cdot j}$, where $j$ is a variable in the interventional targets. If $W_{ij}, A_{ij} > 0$, the change is added to it; if $W_{ij}, A_{ij} < 0$, the change is subtracted from it. Before training, all data are normalized by subtracting the mean and dividing by the standard deviation. We experimented on two simulated datasets: Dataset 1 contains 5 nodes, their 1-time-lagged variables, and 5 different interventional targets, each of which covers a single different node. Dataset 2 contains 10 nodes, their 1-time-lagged variables, and 10 different interventional targets, each of which covers a single different node.
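The simulation procedure above can be sketched as follows. This is a minimal SVAR(1) generator under our own assumptions (acyclic contemporaneous weights $W$ so that $I - W$ is invertible, shifts applied to whole columns of the target variable, magnitudes in $[0.25, 0.5]$ moved away from zero as described); function names are illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_svar(W: np.ndarray, A: np.ndarray, T: int, noise_std: float = 1.0) -> np.ndarray:
    """Generate X_t = X_t @ W + X_{t-1} @ A + eps_t for an SVAR(1) model.

    W: contemporaneous (intra-slice) weights; must be acyclic so that
       (I - W) is invertible and X_t can be solved for explicitly.
    A: lagged (inter-slice) weights.
    """
    d = W.shape[0]
    inv = np.linalg.inv(np.eye(d) - W)  # solve the simultaneous equations for X_t
    X = np.zeros((T, d))
    for t in range(1, T):
        eps = rng.normal(0.0, noise_std, size=d)
        X[t] = (X[t - 1] @ A + eps) @ inv
    return X

def intervene(W: np.ndarray, A: np.ndarray, targets: list) -> tuple:
    """Imperfect intervention: for each target j, shift the nonzero entries of
    W[:, j] and A[:, j] by a random magnitude in [0.25, 0.5], added when the
    weight is positive and subtracted when it is negative (i.e., away from zero)."""
    W2, A2 = W.copy(), A.copy()
    for j in targets:
        for M in (W2, A2):
            delta = rng.uniform(0.25, 0.5, size=M.shape[0])
            M[:, j] += np.sign(M[:, j]) * delta  # sign(0) == 0: zero weights stay zero
    return W2, A2
```

Generating observational data with `(W, A)` and interventional data with `intervene(W, A, [j])` mirrors the dataset construction described above.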

Evaluation metrics.
We leverage the following two main metrics to evaluate the performance of the proposed method on learning causal graphs: i) the structural Hamming distance (SHD), which calculates the number of different edges (either reversed, missing or redundant) between two DAGs; ii) the structural intervention distance (SID), which represents the difference between two DAGs according to their causal inference conditions [14].
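The SHD computation can be sketched directly from its definition. This is a common convention (a reversed edge counted as one error, not two); the function name is our own.

```python
import numpy as np

def shd(G_true: np.ndarray, G_est: np.ndarray) -> int:
    """Structural Hamming distance between two binary adjacency matrices.

    Counts missing and redundant edges; a reversed edge (i->j learned as
    j->i) is counted once rather than twice.
    """
    diff = np.abs(G_true - G_est)
    # A reversed edge shows up as two mismatched entries, (i, j) and (j, i);
    # collapse each such pair into a single error.
    both = np.logical_and(diff, diff.T)
    return int(diff.sum() - np.triu(both, k=1).sum())
```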

Results
The results on simulation data are reported in Table 1. We can find that on both datasets, our method achieves significantly better results than baseline models on all four metrics: SHD and SID of the overall structure, and SHD of $W$ (intra-slice structure) and $A$ (inter-slice structure). Figure 1 shows results on dataset 1.

Computer room air conditioners (CRACs) supply cooling on both sides of the room, creating a closed-loop structure that maintains a stable environment. Multiple sensors located in the cold aisle provide real-time temperature monitoring to ensure a continuous supply of cold air. This approach enables timely adjustments to maintain stable IT equipment operation.

Data acquisition. The data used in this study was obtained from a specific data center at Alibaba. It covers monitoring data from the cooling system of a particular room from January 1st, 2023 to May 1st, 2023, and includes 38 variables in total. These variables comprise 18 cold aisle temperatures from sensors and 20 air conditioning supply temperatures from CRACs. We collected several time series of these 38 variables during normal as well as abnormal states. For the latter, data was sampled within 20 minutes of the occurrence of the abnormality, with each sampling interval being 10 seconds. Anomaly points were identified by learning the normal distribution range from historical data, using the $k\sigma$ method: any data points that fall outside the $k\sigma$ range (e.g., $k$ from 3 to 5) of their own historical distribution are extracted as anomaly time points.
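The $k\sigma$ anomaly rule described above reduces to a few lines. This sketch assumes the band is learned from the series itself with a fixed $k$; the production pipeline would estimate the band from historical data per sensor.

```python
import numpy as np

def ksigma_anomalies(series: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Flag time points lying outside the mean +/- k*std band of the series;
    k is typically chosen between 3 and 5."""
    mu, sd = series.mean(), series.std()
    return np.abs(series - mu) > k * sd
```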

Evaluation metrics.
Given the absence of ground truth DAGs in the real dataset, we employ two types of performance metrics based on prior knowledge to assess the algorithms' efficacy in learning causal graphs. These metrics mainly consider the physical location relationships within the room. Firstly, we consider any edge from a cold aisle temperature (downstream) to an air conditioning supply temperature (upstream) to be necessarily incorrect. Therefore, we calculate i) C2A False, the number of such edges the model has learned. Moreover, the closer a sensor is to a certain air conditioner, the greater the influence on this sensor. We assume that the edges from each air conditioning supply temperature to the temperatures of the two adjacent cold aisles exist (shown in Figure 2), and calculate ii) A2C False, the number of these true edges the algorithm has not learned; and iii) A2C True, the number of these true edges the algorithm has learned. Our method performs best, not only committing fewer C2A False errors but also learning a higher number of A2C True relations.
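The three prior-knowledge metrics can be computed directly from a learned binary graph once the CRAC and sensor indices and the assumed adjacency edges are known. This is a hypothetical sketch with our own names (`prior_metrics`, `crac_idx`, `sensor_idx`, `prior_edges`); the paper does not specify its evaluation code.

```python
import numpy as np

def prior_metrics(G: np.ndarray, crac_idx: list, sensor_idx: list, prior_edges: list) -> tuple:
    """Count the prior-knowledge metrics on a learned binary graph G:
    - c2a_false: learned sensor -> CRAC edges (physically impossible),
    - a2c_true / a2c_false: expected CRAC -> adjacent-sensor edges that
      were learned / missed. `prior_edges` lists the (crac, sensor) pairs
      assumed to exist from the room layout.
    """
    c2a_false = sum(int(G[s, c]) for s in sensor_idx for c in crac_idx)
    a2c_true = sum(int(G[c, s]) for (c, s) in prior_edges)
    a2c_false = len(prior_edges) - a2c_true
    return c2a_false, a2c_true, a2c_false
```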

CONCLUSION
In conclusion, this paper addresses the limitations of temporal causal learning from observational data.The novel contributions include the introduction of a temporal interventional dataset with causal labels and the proposed TECDI approach.The experimental results demonstrate that incorporating interventional data effectively improves the accuracy of temporal causal discovery.In the future, we plan to apply the learned causal graph to practical tasks, such as root cause localization.

3.1.2 Intervention.
An intervention on a variable $X_j$ corresponds to replacing its conditional density $f_j(x_j \mid x_{\pi_j^{\mathcal{G}}})$ by a new one. The operator $\odot$ denotes the Hadamard product (element-wise) and $M^{G}_{\cdot j}$ denotes the $j$th column of $M^{G}$, which enables selecting the parents of node $j$ in the graph $\mathcal{G}$.
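The parent-selection masking can be made concrete in one line: multiplying a sample elementwise by the $j$th column of the binary adjacency matrix zeros out every non-parent before the sample is fed to node $j$'s conditional-density network. The function name below is our own illustrative choice.

```python
import numpy as np

def parent_input(x: np.ndarray, M_G: np.ndarray, j: int) -> np.ndarray:
    """Mask the full sample x with the j-th column of the binary adjacency
    matrix M^G (Hadamard product), so the network modeling node j's
    conditional density only sees the values of j's parents in G."""
    return M_G[:, j] * x
```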

3.2.2 Maximize the score.
Finally, we form the regularized maximum log-likelihood score:
$$ \mathcal{S}_{\mathcal{I}^*}(\mathcal{G}) := \sup_{\phi} \sum_{k=1}^{K} \mathbb{E}_{X \sim p^{(k)}} \log f^{(k)}(X; M^{G}, M^{I^*}, \phi) - \lambda |\mathcal{G}|. $$

Figure 1 :
Figure 1: A showcase of the results on simulation data.

4.3 On real datasets
4.3.1 Datasets. Scene description. In modern data centers, stable IT equipment operation is crucial. Advanced air conditioning systems are used to regulate the heat generated by the equipment and maintain a stable indoor temperature.

Figure 2 :
Figure 2: A typical data center cooling system diagram.