A Look into Causal Effects under Entangled Treatment in Graphs: Investigating the Impact of Contact on MRSA Infection

Methicillin-resistant Staphylococcus aureus (MRSA) is a type of bacteria resistant to certain antibiotics, making MRSA infections difficult to prevent. Among decades of efforts to conquer infectious diseases caused by MRSA, many studies have sought to estimate the causal effects of close contact (treatment) on MRSA infection (outcome) from observational data. In this problem, the treatment assignment mechanism plays a key role, as it determines the patterns of missing counterfactuals, which are the fundamental challenge of causal effect estimation. Most existing observational studies for causal effect learning assume that the treatment is assigned individually for each unit. However, on many occasions, the treatments are assigned in a pairwise manner to units that are connected in graphs, i.e., the treatments of different units are entangled. Neglecting the entangled treatments can impede the causal effect estimation. In this paper, we study the problem of causal effect estimation with treatment entangled in a graph. Despite a few explorations of entangled treatments, this problem remains difficult for the following reasons: (1) the entanglement brings difficulties in modeling and leveraging the unknown treatment assignment mechanism; (2) there may exist hidden confounders which lead to confounding biases in causal effect estimation; (3) the observational data is often time-varying. To tackle these challenges, we propose a novel method NEAT, which explicitly leverages the graph structure to model the treatment assignment mechanism, and mitigates confounding biases based on the treatment assignment modeling. We also extend our method into a dynamic setting to handle time-varying observational data. Experiments on both synthetic datasets and a real-world MRSA dataset validate the effectiveness of the proposed method, and provide insights for future applications.


INTRODUCTION
In the past few decades, a burgeoning body of studies [13,21,42] has been proposed for preventing infectious diseases such as those caused by Methicillin-resistant Staphylococcus aureus (MRSA). MRSA is a type of bacteria that is resistant to antibiotics, including methicillin and other penicillins. It can cause infections in the skin, respiratory tract, and urinary tract and can be spread through close contact with infected individuals or contaminated surfaces. In these scenarios, in-person contact relations are crucial for MRSA-related studies, and graphs are naturally used for modeling these relations. An important question that medical specialists are interested in is: "What is the causal effect of close contact (treatment) on the spread of MRSA (outcome) in a room-sharing network?" Encouragingly, an emerging field that aims to investigate causal effects rather than the statistical correlations between variables in graph data has attracted increasing attention recently [12,26]. In general, causal effect learning [20,33] aims to estimate the causal effect of a certain treatment on an outcome for different units. On graph data, causal effect learning has great potential in many real-world applications such as epidemiology [10,25]. The progress in this area provides us with effective tools for investigating the impact of contact on MRSA infection.
As discussed in [20], the fundamental challenge of causal effect learning is missing data: only one potential outcome (the one that corresponds to the treatment assignment) can be observed for each unit. For example, for a patient with frequent physical contact with others, the potential outcome for this individual with infrequent contact (i.e., the counterfactual) is unavailable. As the treatment assignment mechanism (i.e., how the treatment is assigned to different units) determines which part of the data is missing, treatment assignment plays an essential role in causal studies. Currently, most existing studies are based on the individualistic treatment assignment [20], where the treatment is assigned individually for each unit. However, in graphs, the treatment is often assigned in a pairwise manner to units that are connected. For example, the in-room contact in a room-sharing network is often not individually applied to each person. Instead, it often happens between a pair of people. In these scenarios, treatments are not individually applied to each unit (i.e., treatments cannot be determined only by each unit's own properties). This setting is referred to as entangled treatment [44]. Motivated by these scenarios, in this work, we study the problem of causal effect learning in graphs under entangled treatment.
A few previous works [44,45] have made preliminary explorations of this problem, but many challenges remain unaddressed: 1) As discussed in [44], treatment entanglement increases the risk of misspecification of the treatment effect estimator. If the entanglement through the graph is not considered, causal effect estimators tend to incorrectly attribute the observed treatment assignments to each unit's individual properties, which degrades the performance of causal effect estimation. To handle this entanglement problem, existing works [44] assume that the treatment assignment is determined by a pre-determined function over the graph (e.g., the treatment can be the node degree on the graph). However, on many occasions, this function is unknown. 2) Existing works [44,45] rely on the unconfoundedness assumption [36] (or its weaker version) that there do not exist unobserved confounders (confounders are variables that causally influence both the treatment and the outcome). For example, patients' behavior habits are hidden confounders that influence their physical contact and infection risk. Such hidden confounders often exist in the real world and could lead to confounding biases. 3) Existing works are often limited to a static setting. However, the graph, treatment, outcome, and unit covariates are naturally dynamic in many real-world scenarios. For example, patient data evolves over time, and the causal association across different timestamps brings more difficulties in learning causal effects.
To address the aforementioned challenges, in this paper, we propose a novel framework NEAT to estimate causal effects under Network EntAngled Treatments. Specifically: 1) To handle the entangled treatment, for each node, we explicitly leverage its relevant graph topology to model the unknown treatment assignment with a learnable neural network module. 2) To tackle the hidden confounders, we take the graph structure regarding each node as an instrumental variable (IV) [16]. IV can eliminate the biases brought by hidden confounders in causal effect estimation. In the previous example, the room-sharing network is a valid IV if it is assumed to be independent of the patient's behavior habits, and its influence on the MRSA infection is fully mediated by the physical contact. A valid IV can provide a sort of randomization in the process of causal effect estimation and improve the estimation performance.
3) To learn causal effects in a dynamic setting, we generalize the setting and develop our framework to handle this problem across multiple timestamps.
Notice that our work differs from two other areas of causal effect learning on graphs: 1) interference: these works [27,28] assume that the treatment of each unit could causally affect the outcome of other units; 2) network deconfounding [12,26]: these works assume that hidden confounders are buried in the graph structure. These two lines of work and our paper study separate research problems with different assumptions and application scenarios.
In this work, our contributions can be summarized as follows:
• Problem. Motivated by the MRSA clinical studies, we investigate the important problem of causal effect estimation under entangled treatment. We address the challenges of treatment entanglement, hidden confounders, and a time-evolving environment. To the best of our knowledge, this is the first work addressing these challenges of this problem.
• Method. We propose a novel method, NEAT, to address this problem. NEAT estimates causal effects with treatments entangled through a graph. This method leverages the graph topology w.r.t. each node to better model the treatment assignment and facilitate treatment effect estimation even with hidden confounders. This method works for both static and dynamic settings.
• Experiment. We conduct extensive experiments to evaluate our method on both synthetic and real-world graphs. In particular, we include real-world clinical data for MRSA infection. The results validate the effectiveness of our proposed method in different aspects.

PRELIMINARIES

Notations and Definitions
The observational data is denoted by $\{\mathbf{X}, \mathbf{G}, \mathbf{T}, \mathbf{Y}\}$, i.e., the node features, the graph structure, the treatments, and the outcomes. In most studies, treatment is assumed to be a binary value, but in this work, we allow it to be a $d_t$-size vector (e.g., a vector that describes patients' close contact patterns). The observed outcomes at timestamp $p$ are denoted by $\mathbf{Y}^p = \{Y_i^p\}_{i \in [n]}$. For each unit $i$ at timestamp $p$, $Y_i^p \in \mathbb{R}$. In this paper, we use bold letters (e.g., $\mathbf{X}^p$) to denote variables for all units, and use unbold letters (e.g., $X_i^p$) to denote variables for a single unit. For simplicity, we use the same notation for both variables and data. The subscript $(\cdot)_i$ denotes the index of a unit. If it is not necessary to emphasize the index of a specific unit, we drop the subscript to denote a random unit. The causal graph for this study is shown in Fig. 1; in this case, not all the confounders can be directly observed or measured, and thus they can often lead to biased treatment effect estimation. The hidden confounders are denoted by $\mathbf{U}$. This work is based on the well-known Neyman-Rubin potential outcome framework [37]. The potential outcome is defined as the outcome that would have been realized had the treatment assignment been set to a certain value. We denote the potential outcome under treatment $T = t$ as $Y(t)$.
Consider a baseline treatment $T = t_0$. For a treatment $T = t$, the treatment effect conditioned on covariates $x$ in a static setting is defined as:
$$\tau(x) = \mathbb{E}[\,Y(t) - Y(t_0) \mid X = x\,]. \tag{1}$$
In a dynamic setting, we denote the historical information before timestamp $p$ as $\mathbf{H}^p = \{\mathbf{X}, \mathbf{G}, \mathbf{T}, \mathbf{Y}\}^{1,\dots,p-1}$. Similarly, we denote the historical information regarding unit $i$ before timestamp $p$ as $H_i^p$. When estimating causal effects at timestamp $p$, only the data no later than timestamp $p$ can be used. We define the treatment effect at timestamp $p$ as:
$$\tau(x^p, h^p) = \mathbb{E}[\,Y^p(t) - Y^p(t_0) \mid X^p = x^p, H^p = h^p\,]. \tag{2}$$
Similar to [38], we define the treatment effect for each unit $i$ at timestamp $p$ as $\tau_i^p = \tau(X_i^p, H_i^p)$. We define the entangled treatment as follows:
Definition 2.1. (Entangled treatment) The treatment here can be a function $\mathcal{T}(\cdot)$ over the graph structure, the observed features, and the hidden confounders:
$$\mathbf{T} = \mathcal{T}(\mathbf{G}, \mathbf{X}, \mathbf{U}). \tag{3}$$
In a dynamic setting, the treatment is also a function over historical information:
$$\mathbf{T}^p = \mathcal{T}(\mathbf{G}^p, \mathbf{X}^p, \mathbf{U}^p, \mathbf{H}^p). \tag{4}$$
Notice that as the treatment function has the graph structure as an input, the treatments across different units are no longer individualistic (i.e., $T_i$ cannot be determined only based on variables of unit $i$). A typical example of the treatment function $\mathcal{T}(\cdot)$ is the degree of each node. But under many real-world circumstances, $\mathcal{T}(\cdot)$ is an unknown function.
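To make the node-degree example concrete, the following minimal sketch computes a degree-based treatment from an adjacency matrix and shows that adding one edge changes the treatment of two units at once. The function name `degree_treatment` and the toy graph are illustrative assumptions, not part of the proposed method.

```python
import numpy as np

def degree_treatment(adj):
    """Hypothetical treatment function: each node's treatment is its degree,
    i.e., the row sum of a symmetric 0/1 adjacency matrix."""
    return np.asarray(adj).sum(axis=1)

# Toy room-sharing graph with 4 patients and edges 0-1, 0-2, 2-3.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
])
treatment = degree_treatment(adj)

# Adding the single edge 1-3 changes the treatments of BOTH node 1 and node 3,
# which is why the assignment cannot be modeled unit by unit.
adj_after = adj.copy()
adj_after[1, 3] = adj_after[3, 1] = 1
treatment_after = degree_treatment(adj_after)

print(treatment.tolist(), treatment_after.tolist())
```

In this toy graph the treatments change from [2, 1, 2, 1] to [2, 2, 2, 2], although no node-level covariate was modified.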
The problem we study in this work is formally defined as:
Definition 2.2. (Causal effect estimation under entangled treatments) Given the observational data $\{\mathbf{X}, \mathbf{G}, \mathbf{T}, \mathbf{Y}\}^{1,\dots,P}$, we aim to estimate the treatment effect $\tau(x^p, h^p)$ for different units at each timestamp $p$ with treatments entangled in the graph.

Assumptions
We assume that the outcome is generated by treatment, features, historical information, and hidden confounders as follows:
$$Y^p = \mathcal{Y}(T^p, X^p, H^p) + \epsilon(U^p),$$
where $\mathcal{Y}$ and $\epsilon$ are unknown (nonlinear) functions. We assume $\mathbb{E}[\epsilon(U^p)] = 0$. In this work, we take the graph structure as an instrumental variable for IV analysis. An implicit assumption of our work is that the graph information of each node $i$ can be represented as a variable $G_i$, and that its samples in the observational data are sufficient for us to capture the patterns by which it influences the treatment assignment.
The following assumptions make the graph structure a valid IV. Inspired by recent IV studies [16], we use the above assumptions to effectively leverage the graph structure as an IV for treatment effect estimation. More analysis can be found in Appendix A.

THE PROPOSED FRAMEWORK
In this section, we introduce the proposed NEAT framework for causal effect learning under entangled treatment on the graph. Fig. 2 shows an illustration of the proposed framework. Specifically, this framework contains three modules: node representation learning, entangled treatment modeling, and outcome prediction.

Overall Pipeline
The whole framework follows a classical two-stage IV study pipeline [1,16]. Generally, in this pipeline, the first stage predicts the treatment with IVs, and the second stage estimates the potential outcomes based on the treatment predicted by the first stage.
The key intuition behind this design is that, as the IVs are unconfounded, the predicted treatment from the first stage can provide more randomization, and thus it can help mitigate the confounding bias brought by hidden confounders.
In our framework, in the first stage, we train a treatment modeling module to predict treatment assignments for each node at each timestamp. In this module, we leverage the graph structure as an IV, and utilize it to capture the patterns of entangled treatment in the graph. Simultaneously, we learn a representation for each node to encode its properties, including its current features and historical information. In the second stage, we predict potential outcomes based on the original node features, the learned node representations, and the predicted treatment. In this two-stage IV framework, the biases brought by hidden confounders can be effectively eliminated.
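The intuition behind the two-stage pipeline can be illustrated with a linear toy example. The sketch below uses classical two-stage least squares with a scalar instrument as a stand-in for the neural modules of NEAT; the data-generating process, the coefficients, and the helper `fit_line` are assumptions made only for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit y ~ a + b * x; returns (a, b)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]

rng = np.random.default_rng(0)
n = 20000
z = rng.normal(size=n)                         # instrument (stand-in for graph information)
u = rng.normal(size=n)                         # hidden confounder
t = z + u + 0.1 * rng.normal(size=n)           # treatment depends on instrument and confounder
y = 2.0 * t + 3.0 * u + 0.1 * rng.normal(size=n)  # true causal effect of t on y is 2.0

# Naive regression of y on t absorbs the confounder and is biased upward.
_, naive_effect = fit_line(t, y)

# Stage 1: predict the treatment from the instrument only.
a1, b1 = fit_line(z, t)
t_hat = a1 + b1 * z
# Stage 2: regress the outcome on the predicted treatment.
_, iv_effect = fit_line(t_hat, y)

print(f"naive: {naive_effect:.2f}, two-stage: {iv_effect:.2f}")
```

Because the predicted treatment varies only through the instrument, it is uncorrelated with the hidden confounder, so the second-stage estimate stays close to the true effect while the naive estimate does not.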

[Fig. 2: Illustration of the proposed framework (panel labels: Treatment Modeling, Outcome Prediction).]

Node Representation Learning
The treatment effects are often different for nodes with different properties. For example, close contact may influence patients of different ages differently. To model such heterogeneity, we capture the properties of each node through node representation learning.
For each node $i$, we learn a representation $r_i$ to encode its properties based on its node features $X_i$, i.e., $r_i = \Phi(X_i)$. Here, $\Phi(\cdot)$ is implemented by a neural network module with learnable parameters.
Dynamic setting. In a time-evolving environment, as illustrated in Fig. 1 (b), the current properties of each node can be influenced by the historical data in previous timestamps. To capture the time-evolving properties and model the causal mechanism in a dynamic setting, for each node $i$, we embed the historical information before each timestamp $p$ into a representation $h_i^p$ with a recurrent neural network (RNN) [7,18]. $h_i^p$ is then incorporated into $r_i^p$. At each timestamp, we update the historical embedding with the RNN, and we learn the representation $r_i^p$ for each node $i$ at timestamp $p$ with a transformation function $\Phi(\cdot)$.

Entangled Treatment Modeling
The treatment function $\mathcal{T}(\cdot)$ in Eq. (3) or Eq. (4) is unknown, so we model it with a learnable treatment modeling module $g(\cdot)$.
Treatment Entanglement. As the treatments of different units are entangled through the graph structure, to effectively capture the patterns of treatment assignment, we explicitly leverage the graph structure in the treatment modeling module. As a feasible implementation, we design this module $g(\cdot)$ based on graph neural networks (GNNs) [22,46]. Here we use a one-layer graph convolutional network (GCN) [22] to predict the treatment as follows:
$$\hat{\mathbf{T}}^p = \sigma\big(\hat{A}^p\,[\mathbf{X}^p, \mathbf{R}^p]\,W_0\big),$$
where $\sigma(\cdot)$ is an activation function such as Softmax. $\hat{A}^p$ is the normalized adjacency matrix calculated from the graph $\mathbf{G}^p$ beforehand with the renormalization trick [22]. Here $[\cdot, \cdot]$ stands for the concatenation operation, $\mathbf{R}^p$ collects the learned node representations, and $W_0$ denotes the parameters in GCNs.
Loss for treatment modeling. The loss for treatment prediction is denoted by $\mathcal{L}_T$. Generally, $\mathcal{L}_T$ is defined as in Eq. (11), where $\ell_T(\cdot)$ is a loss term to measure the prediction error of treatment modeling. Notably, in this work, we do not restrict the data type of treatment. To handle different types of treatment, we design different implementations for this module. More specifically, for discrete treatments (e.g., whether a patient has frequent close contact), we implement treatment prediction $g(\cdot)$ as a classification model with the cross-entropy loss function; for continuous treatments (e.g., values that describe the patient's contact patterns), we implement this module as a prediction task with mean square error (MSE) loss.
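A one-layer GCN treatment predictor with the renormalization trick of [22] can be sketched in NumPy as follows. This is a forward pass only, under the assumption that node features and node representations are concatenated as the layer input; the function names, dimensions, and random weights are hypothetical, and a real implementation would learn the weights by minimizing the cross-entropy loss.

```python
import numpy as np

def renormalized_adjacency(adj):
    """Renormalization trick: D^{-1/2} (A + I) D^{-1/2}, with D the degree of A + I."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def gcn_treatment_probs(adj, features, reps, weight):
    """One GCN layer over concatenated inputs; returns per-node treatment probabilities."""
    a_hat = renormalized_adjacency(adj)
    inputs = np.concatenate([features, reps], axis=1)
    return softmax(a_hat @ inputs @ weight)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the observed discrete treatments."""
    n = probs.shape[0]
    return float(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
features = rng.normal(size=(4, 3))   # node features (hypothetical dimension 3)
reps = rng.normal(size=(4, 2))       # node representations (hypothetical dimension 2)
weight = rng.normal(size=(5, 2))     # maps 3 + 2 inputs to 2 treatment classes
labels = np.array([0, 1, 1, 0])      # observed binary treatments

a_hat = renormalized_adjacency(adj)
probs = gcn_treatment_probs(adj, features, reps, weight)
loss = cross_entropy(probs, labels)
```

Each row of `probs` mixes information from a node and its one-hop neighbors, which is how the layer lets connected units share one treatment model.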

Outcome Prediction
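We train an outcome prediction module $f(\cdot)$ in the second stage, which predicts the outcome from the node features, the learned node representations, and the treatment predicted in the first stage. We denote the loss function for outcome prediction by $\mathcal{L}_Y$, where $\ell_Y(\cdot)$ is a loss function (e.g., MSE) to measure the prediction error of the outcome. For each node $i$, the potential outcome w.r.t. treatment $T = t$ is predicted by $\hat{y}_i(t) = f(t, X_i, r_i)$. We thereby estimate the treatment effect for each node $i$ as:
$$\hat{\tau}_i = \hat{y}_i(t) - \hat{y}_i(t_0).$$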
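Once an outcome predictor is available, the per-node treatment effect is the difference between two predicted potential outcomes. The sketch below assumes a generic callable that takes treatments, features, and representations; `toy_outcome_model` is a hand-written stand-in for a trained network and is purely illustrative.

```python
import numpy as np

def estimate_effects(outcome_model, t, t0, features, reps):
    """Per-node effect: predicted outcome under t minus predicted outcome under baseline t0."""
    n = features.shape[0]
    y_t = outcome_model(np.full(n, t), features, reps)
    y_t0 = outcome_model(np.full(n, t0), features, reps)
    return y_t - y_t0

def toy_outcome_model(t, features, reps):
    """Hypothetical predictor: the effect of t grows with the first feature."""
    return t * (1.0 + features[:, 0]) + reps[:, 0]

features = np.array([[0.0], [1.0]])
reps = np.array([[5.0], [7.0]])
effects = estimate_effects(toy_outcome_model, t=1, t0=0, features=features, reps=reps)
print(effects)
```

The representation term cancels in the difference, so the two nodes receive effects of 1.0 and 2.0, which reflects the heterogeneity carried by the features.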

Implementation Details
In node representation learning, we implement $\Phi(\cdot)$ with a multilayer perceptron (MLP) and use a Gated Recurrent Unit (GRU) [7] for the RNN. In entangled treatment modeling, we implement $g(\cdot)$ with a GCN layer. For discrete treatments, we use Softmax as the final layer, and take its outputs to model the probability of treatment values. For continuous treatments, we model them with a mixture of Gaussian distributions with component weights $\pi_k(X_i, r_i, G_i)$ and parameters $(\mu_k(X_i, r_i, G_i), \sigma_k(X_i, r_i, G_i))$ for each component $k$. In outcome prediction, we use an MLP module to implement $f(\cdot)$, and use the MSE loss for $\ell_Y(\cdot)$. We use two optimizers to train the first and the second stage, respectively.

Discussion
Many graph learning techniques (e.g., GCNs) mainly focus on local graph information (generally, $K$-layer GCNs can handle neighbors within $K$ hops), but if the treatment assignment is affected by a wider range on the graph (e.g., the length of the longest path which contains node $i$), it would be more difficult to capture and handle such information. However, it is worth noting that the proposed framework is not limited to the specific implementation introduced above. Instead, we can replace each component with a different implementation to achieve better specifications if relevant prior knowledge is given.

EXPERIMENTS
In this section, we validate the effectiveness of our proposed method by conducting extensive evaluations. More specifically, our experiments are designed to answer the following research questions:
(1) RQ1: How does the proposed framework perform under treatment entanglement compared with state-of-the-art baselines?
(2) RQ2: How does the proposed framework perform under different levels of treatment entanglement and hidden confounders?
(3) RQ3: How does each component of the proposed framework contribute to the final treatment effect estimation?
(4) RQ4: How does the proposed framework perform under different parameter settings?

Dataset and Simulation
In our experiment, we use four datasets with dynamic graph data, including synthetic, semi-synthetic, and real-world data. As it is notoriously hard to obtain the true causal models and counterfactuals from the real world, on the first three datasets, we follow regular practice to evaluate our method on data with simulated causal models. Nevertheless, we want our simulation to be as close to reality as possible; thus, our synthetic and semi-synthetic datasets are based on graphs that are generated by real-world relational information and node features. Based on these graph data, we simulate the time-varying hidden confounders, treatment assignments, and outcomes.
Hidden confounders. In a static setting, we simulate the hidden confounders from a multivariate Gaussian distribution. Here, $I$ denotes an identity matrix of size $d_u$ (i.e., the dimension of hidden confounders). We set $d_u = 20$ by default.
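Features. If the node features are available in the dataset, we directly use them. Otherwise, we simulate them by:
$$X_i = \psi(U_i) + \epsilon_x, \tag{16}$$
where $\psi(\cdot)$ is a linear function $\mathbb{R}^{d_u} \to \mathbb{R}^{d_x}$. Here, $d_x$ is the dimension of node features. $\epsilon_x$ is a noise vector drawn from a Gaussian distribution.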
Treatment. We simulate the treatment with the function $\mathcal{T}$ in Eq. (17), where $\Theta_{\cdot,x}$ and $\Theta_{\cdot,u}$ are parameter vectors with dimension $d_x$ and $d_u$, respectively. Each parameter in $\Theta_{\cdot,*}$ follows the Gaussian distribution $\mathcal{N}(0, 0.5^2)$. $\mathcal{N}_i$ is the set of neighbors of node $i$ in the graph. We use only one-hop neighbors by default. $\gamma \in [0, 1]$ is the parameter that controls the strength of treatment entanglement, i.e., the larger $\gamma$ is set, the more strongly the graph influences the treatment assignments. BI($\cdot$) is a function that maps the input to a binary value. A regular implementation is to transform the input to a probability using a Sigmoid function, and then sample the output from a Bernoulli distribution. Notably, we do not restrict the treatment to be a binary value. Continuous treatment can be simulated without the BI($\cdot$) function, and high-dimensional treatment with dimension $d_t$ can be simulated by replacing the parameter vector $\Theta_{\cdot,x}$ with a parameter matrix of dimension $d_x \times d_t$ (similarly for $\Theta_{\cdot,u}$). $\epsilon_t \sim \mathcal{N}(0, 0.01^2)$ is a random Gaussian noise.
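The BI(·) step and the role of the entanglement strength can be sketched as follows. The Sigmoid-then-Bernoulli step follows the description above; the convex combination of a node's own score with the mean score of its one-hop neighbors is an assumption made for illustration and should not be read as the exact simulation equation. The names `binarize`, `simulate_entangled_treatment`, and `own_score` are hypothetical.

```python
import numpy as np

def binarize(scores, rng):
    """BI(.): map scores to probabilities with a Sigmoid, then sample Bernoulli outcomes."""
    probs = 1.0 / (1.0 + np.exp(-scores))
    return rng.binomial(1, probs)

def simulate_entangled_treatment(adj, own_score, gamma, rng):
    """ASSUMED form: mix each node's own score with its neighbors' mean score.
    gamma = 0 gives an individualistic assignment; gamma = 1 is fully graph-driven."""
    deg = adj.sum(axis=1)
    neighbor_mean = np.divide(adj @ own_score, deg,
                              out=np.zeros_like(own_score), where=deg > 0)
    noise = rng.normal(0.0, 0.01, size=own_score.shape)
    return binarize((1.0 - gamma) * own_score + gamma * neighbor_mean + noise, rng)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # path graph 0 - 1 - 2
own_score = np.array([50.0, -50.0, 50.0])           # extreme scores make sampling near-deterministic

individual = simulate_entangled_treatment(adj, own_score, gamma=0.0, rng=rng)
entangled = simulate_entangled_treatment(adj, own_score, gamma=1.0, rng=rng)
print(individual.tolist(), entangled.tolist())
```

With these extreme scores, the assignment is [1, 0, 1] when only a node's own score matters and flips to [0, 1, 0] when only its neighbors matter, which illustrates how a larger entanglement strength ties a unit's treatment to the graph.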
Potential outcome. We simulate the potential outcomes as in Eq. (18), where $\Theta_t$ and $\Theta_0$ are parameter vectors of dimension $d_x$, and $\Theta_u$ is of dimension $d_u$. $\beta \ge 0$ is a parameter that controls the strength of the hidden confounder. $\epsilon_y \sim \mathcal{N}(0, 0.1^2)$ is a noise term.
Dynamic setting. In a dynamic setting, we simulate the historical data and hidden confounders over time, where $M$ is the number of previous timestamps that influence the current one. We set $M = 3$ by default. Generally, the historical information at each timestamp encodes the previous hidden confounders, node features, treatments, and outcomes. Four sets of parameters control these four types of influence from timestamp $p - m$. We generate time-varying hidden confounders with a linear transformation over the historical information, plus a noise term $\epsilon_u \sim \mathcal{N}(0, I)$. We simulate features in the same way as Eq. (16). The treatments and outcomes are also generated as described in Eq. (17) and Eq. (18), but the historical information $H_i^p$ is incorporated by concatenating it with $X_i^p$ as input.
4.1.2 Datasets. We further introduce more details about each dataset used in this paper. More details of data statistics are shown in Table 1, including the number of nodes, edges, features, and timestamps.
Random graph. This dataset contains synthetic graphs generated by the Erdős-Rényi (E-R) model [11] at each timestamp. We use NetworkX [14] to generate these graphs. Based on these graphs, we simulate other variables as described in Section 4.1.1.
Real-world graphs. We use two real-world dynamic graphs with each node representing a real person and each edge representing a certain type of connection between them. Based on the type of connection, these two datasets are referred to as Transaction and Social, respectively. We use the covariates of people in these datasets as node features, and simulate the treatments and outcomes as described in Section 4.1.1. More details of these datasets can be found in Appendix B.
MRSA. This dataset contains real-world hospital data for studying Methicillin-resistant Staphylococcus aureus (MRSA) infection. We construct a dynamic graph for the room-sharing relations between patients. At each timestamp, each node is a patient, and an edge exists between a pair of patients if and only if they have shared at least one room during this timestamp. The patient information such as medicine usage and length of stay is taken as node features. We investigate the causal effect of the number of in-room contacts (treatment) on MRSA infection test results (outcome). We consider that there exist hidden confounders such as patients' behavior habits. In this dataset, we do not use any simulated data, and do not evaluate our causal effect estimation based on simulated counterfactuals. Instead, we use the domain knowledge regarding MRSA to confirm our findings.
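The construction of the dynamic room-sharing graph can be sketched with the standard library alone. The record format `(patient_id, room_id, timestamp)` and the function names are assumptions for illustration; the actual hospital data schema is not described here.

```python
from collections import defaultdict
from itertools import combinations

def room_sharing_edges(stays):
    """Build one edge set per timestamp from hypothetical (patient, room, timestamp) records.
    Two patients are connected iff they shared at least one room in that timestamp."""
    occupants = defaultdict(set)
    for patient, room, ts in stays:
        occupants[(ts, room)].add(patient)
    edges = defaultdict(set)
    for (ts, _room), patients in occupants.items():
        for a, b in combinations(sorted(patients), 2):
            edges[ts].add((a, b))   # the set removes duplicates across rooms
    return dict(edges)

def contact_counts(edge_set):
    """Number of in-room contacts per patient; patients without contacts are omitted."""
    counts = defaultdict(int)
    for a, b in edge_set:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)

stays = [
    ("p1", "r1", 0), ("p2", "r1", 0), ("p3", "r2", 0),
    ("p1", "r2", 1), ("p2", "r2", 1), ("p3", "r2", 1),
]
edges = room_sharing_edges(stays)
counts_t1 = contact_counts(edges[1])
print(edges[0], counts_t1)
```

At timestamp 0 only p1 and p2 are connected, while at timestamp 1 all three patients share a room and each has two in-room contacts, so the treatment of every patient changes together.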

Baselines
In the experiments, we compare our method with several state-of-the-art baselines. These baselines can be divided into the following three main categories:
• Individual units. These methods are based on the assumption that different units are independent. They estimate the treatment effect by adjusting for confounders based on unit covariates. We adopt widely-used methods including S-Learner (SL) [23], causal forest (CF) [47], and counterfactual regression (CFR) [39].
• Network deconfounder. These methods assume that there is a graph connecting different units. They mitigate confounding biases by using the graph structure as a proxy for hidden confounders. We use the network deconfounder (NetDeconf) [12] and the dynamic network deconfounder (DNDC) [26].
• DeepIV. This method [16] uses instrumental variables to mitigate the confounding biases. For each node $i$, we take the $i$-th row in the adjacency matrix as its IV.
We use the implementation released in the EconML package for S-Learner, causal forest, and DeepIV.

Evaluation Metrics
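We adopt two widely-used metrics for treatment effect estimation, namely the Rooted Precision in Estimation of Heterogeneous Effect (PEHE) [17] and the Mean Absolute Error of the average treatment effect (ATE) [49], at each timestamp $p$:
$$\sqrt{\epsilon^p_{PEHE}} = \sqrt{\frac{1}{n}\sum_{i \in [n]} \big(\hat{\tau}_i^p - \tau_i^p\big)^2}, \qquad \epsilon^p_{ATE} = \Big|\frac{1}{n}\sum_{i \in [n]} \hat{\tau}_i^p - \frac{1}{n}\sum_{i \in [n]} \tau_i^p\Big|.$$
For all the experiments, we calculate the average values of these metrics over all timestamps, and still denote them by $\sqrt{\epsilon_{PEHE}}$ and $\epsilon_{ATE}$ for simplicity.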
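For reference, the two metrics can be computed as in the following sketch, which assumes their standard definitions (root mean squared error of unit-level effects, and absolute error of the average effect) and averages them over timestamps. Function names and the toy numbers are illustrative.

```python
import numpy as np

def sqrt_pehe(tau_true, tau_pred):
    """Rooted PEHE: root mean squared error between estimated and true unit-level effects."""
    diff = np.asarray(tau_pred, dtype=float) - np.asarray(tau_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def ate_error(tau_true, tau_pred):
    """Absolute error between the estimated and the true average treatment effect."""
    return float(abs(np.mean(tau_pred) - np.mean(tau_true)))

def average_over_timestamps(metric, true_by_ts, pred_by_ts):
    """Average a per-timestamp metric over all timestamps."""
    return float(np.mean([metric(t, p) for t, p in zip(true_by_ts, pred_by_ts)]))

tau_true = [1.0, 2.0, 3.0]
tau_pred = [1.0, 2.0, 5.0]
pehe = sqrt_pehe(tau_true, tau_pred)   # sqrt(4 / 3)
ate = ate_error(tau_true, tau_pred)    # |8/3 - 2|
avg_pehe = average_over_timestamps(sqrt_pehe, [tau_true, tau_true], [tau_pred, tau_true])
```

The rooted PEHE penalizes unit-level errors even when they cancel on average, whereas the ATE error only measures the population-level bias.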

Setup
For all datasets, we randomly split them into 60%/20%/20% training/validation/test data. By default, we focus on the dynamic setting and set the number of training epochs as 2000, the learning rate as 0.004, the dimensions of the node representation and the history embedding as 32 and 20, respectively, $\gamma = 0.5$, and $\beta = 0.5$. We report the mean and standard deviation of performance over ten repeated executions on test data. More details of the experiment setup are in Appendix B.

RQ1: Performance of Different Methods
To demonstrate the effectiveness of the proposed method, in Table 2, we show the treatment effect estimation performance of our method and the baselines in both static and dynamic settings. We observe that in both settings, the proposed method NEAT outperforms the baselines on both metrics. We attribute the improvement to two key factors: 1) We explicitly incorporate the graph structure to model the treatment assignment. During this process, we can better utilize the observational data for treatment effect estimation. Among the baselines, SL, CF, and CFR do not consider the graph which connects different units; NetDeconf and DNDC can leverage the graph structure, but they use the graph as a proxy to infer the hidden confounders. These methods, however, do not fit well in the problem setting studied in this paper. 2) We utilize the graph structure as an instrumental variable to eliminate the confounding biases. Among the baselines, SL, CF, and CFR are based on the unconfoundedness assumption; NetDeconf and DNDC assume the hidden confounders can be inferred from the graph structure. These assumptions cannot be satisfied in our datasets. DeepIV also takes the graph information as an instrumental variable to handle hidden confounders, but its performance is impeded by the lack of proper techniques to handle graph data.

RQ2: Performance under Different Levels of Treatment Entanglement and Confounders
To evaluate our method more comprehensively, we test it under different levels of treatment entanglement. In the simulation, we control the treatment entanglement with the parameter $\gamma$: the larger $\gamma$ is set, the more strongly the treatment assignment of each node is influenced by the graph. Fig. 3 shows the causal effect estimation performance when we set $\gamma$ to different values. Generally, we observe a more obvious performance gain when $\gamma$ is larger. This observation indicates that our method can handle entangled treatments well by leveraging the graph structure. We only show the results on the Random dataset, but similar observations can also be found on other datasets. We also evaluate our method under different levels of hidden confounders. In Fig. 4, we show the results when we change the strength of hidden confounders. Specifically, we change the strength by multiplying the hidden confounders in simulation with the parameter $\beta \ge 0$. From Fig. 4, it can be observed that compared with the baselines, our method is more robust to hidden confounders. This is because we effectively utilize the graph as an instrumental variable to mitigate confounding biases.

RQ3: Ablation Study
To verify the effectiveness of each component in our method, we conduct an ablation study including the following variants: (1) NEAT-NT: In this variant, we replace the treatment modeling module with a random sampling over the space of treatment assignment. (2) NEAT-NG: In this variant, we do not use the graph in treatment modeling, and replace the input adjacency matrix with an identity matrix. (3) NEAT-NH: In this variant, we remove the RNN in our method and do not use historical information. Fig. 5 reports the performance of our method and these variants. The results show that all the components contribute to the final superior performance of our method.

Case Study on Real-world Hospital Data
Methicillin-resistant Staphylococcus aureus (MRSA) is a difficult-to-treat pathogen (owing to multi-drug resistance) that is known to spread efficiently within hospitals via contact. One important avenue of hospitalized patient-to-patient MRSA transmission is thought to be through contamination of hospital room surfaces and equipment [29]. In addition, patients may be more or less susceptible to acquiring MRSA given individual factors [40], and MRSA transmission rates may vary according to particular hospital wards (or hospital units) [32].
The MRSA dataset contains observational data including patient covariates, room-sharing information, and MRSA test records from a real-world hospital. We construct a room-sharing network, in which an edge connects two patients (nodes) if and only if they have appeared in at least one same room simultaneously. We use our method to investigate the following causal questions: (1) How does the number of in-room contacts causally influence the MRSA infection risk? (2) How do other treatments, such as the type of hospital unit (e.g., Cardiology, Internal Medicine, etc.), causally influence the MRSA infection risk? As the ground-truth causal model is unknown, it is infeasible to evaluate our method on this dataset with the aforementioned metrics. Instead, we show some case studies and verify our key findings with domain knowledge.
For the first question, we map the number of in-room contacts into three levels of treatment.Here, treatments 0, 1, 2 represent the roommate number from low to high.We take  = 0 as the control group, and calculate the treatment effect for  = 1 and  = 2 by comparing the estimated potential outcomes of them with the case of  = 0, respectively.Table 3 shows the estimated averaged treatment effect (ATE) of roommate number on MRSA infection over all the patients, and also shows the estimated conditional averaged treatment effect (CATE) conditioned on each subpopulation of patients in a specific group of rooms.From the results, we observe that: 1) In general, the increase in roommate number could result in an increase in MRSA infection risk.This observation holds in the whole population and different subpopulations.As MRSA is contagious through physical contact, this observation is consistent with domain knowledge.2) The CATE of roommate number on MRSA infection is the strongest in Intensive Care and Gerontology.In Intensive Care, it is frequent for patients to share devices such as ventilators, which leads to a more severe risk of infection when the number of in-room contacts increases.Besides, most patients in Gerontology rooms are older adults with comorbidities associated with MRSA susceptibility (i.e., age >79, prior nursing home residence, antibiotic exposure, dementia, stroke, or diabetes), which brings a higher risk for acquiring MRSA from the environment with more physical contact [41].
For the second question, we take the hospital unit type as treatment, and show the estimated ATE of each hospital unit type on MRSA infection in Table 4. Here, we take General Surgery as the baseline treatment (control group).From Table 4, we observe that staying in Intensive Care and Gerontology rooms increases the MRSA infection risk most obviously.The reason might lie in the properties of these units (equipment sharing in the intensive care units, and more MRSA carriers in Gerontology).We also observe a relatively low treatment effect among beds in Transitional Care and Hematology/Oncology units.Most of these rooms are private (as opposed to other semi-private or 2-patient shared rooms), and may lead to less infection risk.

RELATED WORK
In this section, we introduce some representative studies related to this work, including causal inference on graph data and instrumental variable analysis.

Causal inference on graph data. Causal inference on graph data has recently attracted increasing attention [12,26,31,48,51]. Within this broad area, the topics most related to this work include: 1) Entangled treatment: A few initial explorations [44,45] have been made for entangled treatment. These works discuss the challenges of entangled treatment modeling, and extend the traditional propensity score method to this problem. In our work, we do not limit the method to be propensity score-based, and we consider a more general setting of entangled treatment with an unknown treatment function, hidden confounders, and dynamic data. 2) Network deconfounding: A line of works [12,26] leverages the graph structure among units to capture hidden confounders. Netdeconf [12] develops a GCN-based framework to learn representations of hidden confounders, and adjusts for the confounders on top of the learned representations. DNDC [26] further proposes to learn time-varying confounder representations from observational dynamic graphs. Although we also allow the existence of hidden confounders, our work differs from these works in its application scenario, as we focus on the setting in which the graph structure is an IV rather than a proxy for confounders. 3) Network interference: Traditional causal effect estimation studies are based on the Stable Unit Treatment Value Assumption (SUTVA) [34,35], which states that the treatment of each unit does not causally affect the outcomes of other units (i.e., interference does not exist). However, interference often exists between connected units in graph data [2,5,52]. Many works [2,19,27,28,43,50] have addressed the problem of causal inference under interference. Our work differs from them as we do not assume the existence of interference in graphs. Instead, we focus on the case in which the graph influences the treatment assignment.

Instrumental variable. Hidden confounders can introduce biases in causal effect estimation. Unlike most causal inference methods, which assume that all confounders are observed, instrumental variable (IV) based methods provide an alternative approach to identifying causal effects even in the presence of hidden confounders. One of the best-known lines of IV studies is two-stage methods [1,8,16,30]. The two-stage least squares method (2SLS) [1] is the most representative work in this line: it first fits a linear model to predict the treatment from the features and IVs, and then fits another linear model to predict the outcome from the features and the predicted treatment. 2SLS is based on two strong assumptions: homogeneity (the treatment effect is the same for different units) and linearity (the linear models are correctly specified). Many follow-up works relax these assumptions. DeepIV [16] is a neural network-based two-stage method that allows nonlinearity and heterogeneity. Another line of IV studies is based on the generalized method of moments (GMM) [15,24]. Among them, DeepGMM [4] leverages the moment conditions to identify the counterfactual generation function and estimate causal effects. However, most existing IV studies focus on instrumental variables with simple structures, such as scalars and vectors.
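To make the two stages of 2SLS concrete, the sketch below runs it on simulated data with a hidden confounder. It is an illustration under stated assumptions rather than code from any cited work: the function name `two_stage_least_squares`, the data-generating process, and all coefficients (including the true effect of 2.0) are invented for this example.

```python
import numpy as np

def two_stage_least_squares(x, z, t, y):
    """Estimate a homogeneous linear treatment effect with 2SLS.

    x: (n, d_x) observed features, z: (n, d_z) instruments,
    t: (n,) treatment, y: (n,) outcome.
    Returns the stage-2 coefficient on the predicted treatment.
    """
    ones = np.ones((len(y), 1))
    # Stage 1: linear model predicting the treatment from features and IVs.
    stage1 = np.hstack([ones, x, z])
    beta1, *_ = np.linalg.lstsq(stage1, t, rcond=None)
    t_hat = stage1 @ beta1
    # Stage 2: linear model predicting the outcome from the features
    # and the predicted treatment.
    stage2 = np.hstack([ones, x, t_hat[:, None]])
    beta2, *_ = np.linalg.lstsq(stage2, y, rcond=None)
    return beta2[-1]

def naive_ols(x, t, y):
    """Regress the outcome on features and observed treatment (no IV)."""
    design = np.hstack([np.ones((len(y), 1)), x, t[:, None]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[-1]

# Simulated data (assumed for illustration): u is a hidden confounder that
# affects both treatment and outcome; z affects the outcome only through t.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=(n, 1))
z = rng.normal(size=(n, 1))
u = rng.normal(size=n)
t = 0.5 * x[:, 0] + 1.0 * z[:, 0] + u + rng.normal(size=n)
y = 2.0 * t + 1.0 * x[:, 0] + 3.0 * u + rng.normal(size=n)

effect_2sls = two_stage_least_squares(x, z, t, y)
effect_ols = naive_ols(x, t, y)
```

In this simulated setup, the naive regression absorbs the effect of the hidden confounder into the treatment coefficient, whereas the 2SLS estimate should approach the true coefficient as the sample grows. Both linearity and homogeneity hold here by construction, which is exactly the regime in which 2SLS is valid.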

CONCLUSION
In this paper, motivated by the task of investigating the impact of close contact on MRSA infection in a room-sharing network, we studied the problem of causal effect estimation under entangled treatment. We discussed the related challenges and applications of this problem. To address it, we proposed a novel method, NEAT, which leverages the graph structure to better model the treatment assignments, and mitigates confounding biases by using the graph structure as an instrumental variable. Considering that observational data is often time-varying in the real world, we further generalized the problem to a dynamic setting. Extensive experiments on synthetic, semi-synthetic, and real-world graph data validate the effectiveness of the proposed method. In particular, the validation of our method on real-world data provides insights for its future applications in real-world clinical studies. In the future, interesting directions for entangled treatment modeling on graphs include incorporating different levels of graph information (e.g., local-level and global-level) in treatment modeling, and considering entanglement in different types of graph data, such as heterogeneous graphs and knowledge graphs.

Figure 1: The causal graph of the studied problem in a static setting and in a dynamic setting. Each vertex in the causal graph represents a variable, and each arrow represents a causal relation from the variable at its tail to the variable at its head. The observable variables are shown in white, while the unobserved ones are shown in grey.

Figure 2: The proposed framework NEAT. It contains three components: node representation learning, entangled treatment modeling, and outcome prediction.
) is often not predetermined. To better estimate treatment effects from observational data, we capture the treatment assignment patterns by training a module  (·) to model the conditional distribution of treatment    given   ,    ,    . The treatment modeling module is trained in the first stage together with node representation learning:    =  (  ,    ,    ) =  (  , Φ(   ,    )).

4.1.1 Simulation. We describe how we simulate the different variables as follows. More details of the simulation are in Appendix B.

Figure 3: Treatment effect estimation performance under different levels of treatment entanglement on the Random dataset.

Figure 5: Ablation study for different variants of NEAT on the Random dataset.

Figure 6: Parameter study for NEAT on the Random dataset.
∈ [ ] , and for each unit ,   ∈ R   . The graph structure connecting these units at each timestamp is an  ×  binary matrix   = { }
Assumption 1. (Relevance) Given   ,   for any random unit, the treatment is relevant to the graph structure, i.e.,   ̸⊥ ⊥   |  ,   .

Table 1: Detailed statistics of the datasets.

Table 3: Estimated treatment effect of roommate number on MRSA infection in different populations of patients.

Table 4: Estimated treatment effect of hospital unit type on MRSA infection.