Causal inference learners
Throughout, we are given covariates \(X\), a treatment indicator \(W\), and a binary outcome \(Y \in \{0,1\}\).
Inverse probability of treatment weights
Fit a model on observations \(X\) to predict the treatment \(W\), giving \(p_{w_i}(x_i)=P(W=w_i|X=x_i)\), the probability of the treatment that unit \(i\) actually received. Use the inverse of this probability, \(1/p_{w_i}(x_i)\), as a sample weight when predicting the outcome \(Y\).
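The weighting step can be sketched in a few lines. This is a minimal illustration under assumptions, not a reference implementation: the toy `data`, the helper names, and the use of stratum frequencies as the treatment model are all hypothetical, and the weighted "outcome model" is just a weighted mean of \(Y\) per treatment arm.

```python
from collections import Counter

# Hypothetical toy data: (x, w, y) with a binary covariate x that confounds treatment.
data = [
    (0, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0),
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1),
]

# Treatment model (assumption): empirical frequency of each treatment within each x stratum.
n_by_x = Counter(x for x, _, _ in data)
n_by_xw = Counter((x, w) for x, w, _ in data)


def prob_received_treatment(x, w):
    """Estimate P(W = w | X = x) for the treatment actually received."""
    return n_by_xw[(x, w)] / n_by_x[x]


# Inverse probability of treatment weights, one per unit.
weights = [1.0 / prob_received_treatment(x, w) for x, w, _ in data]


def weighted_mean_outcome(arm):
    """Weighted mean of Y among units whose treatment equals `arm`."""
    pairs = [(wt, y) for wt, (_, w, y) in zip(weights, data) if w == arm]
    return sum(wt * y for wt, y in pairs) / sum(wt for wt, _ in pairs)


risk_treated = weighted_mean_outcome(1)
risk_control = weighted_mean_outcome(0)
risk_difference = risk_treated - risk_control
```

In practice the stratum frequencies would be replaced by a fitted classifier, and the weights would be passed to whatever outcome model is being trained.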
S-learner
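A single model is trained on the observations together with the treatment indicator, which enters as an additional feature. The treatment effect at \(x\) is then estimated as the difference between the model's predictions with \(W=1\) and with \(W=0\).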
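As a rough sketch of the single-model idea, the example below fits one model on the pair (covariate, treatment) and reads the effect off as a difference of two predictions. The base learner `fit_cell_means`, the toy data, and `s_learner_cate` are hypothetical names chosen for illustration; any regressor or classifier could take the place of the cell-mean table.

```python
from collections import defaultdict


def fit_cell_means(features, outcomes):
    """Hypothetical base learner: mean outcome for each distinct feature tuple."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for feature, outcome in zip(features, outcomes):
        sums[feature] += outcome
        counts[feature] += 1
    return {feature: sums[feature] / counts[feature] for feature in sums}


# Hypothetical toy data: binary covariate x, treatment w, binary outcome y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [1, 1, 0, 0, 1, 1, 0, 0]
y = [1, 0, 0, 0, 1, 1, 0, 0]

# One model, with the treatment indicator as just another feature.
model = fit_cell_means(list(zip(x, w)), y)


def s_learner_cate(x_value):
    """Predicted outcome with W=1 minus predicted outcome with W=0, at the same x."""
    return model[(x_value, 1)] - model[(x_value, 0)]
```

Calling `s_learner_cate(0)` and `s_learner_cate(1)` gives the estimated effect in each covariate stratum of the toy data.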
T-learner
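Two models are trained, one per treatment group, each on the observations \(X\) of that group only, so the treatment indicator is not a feature. The treatment effect at \(x\) is estimated as the difference between the two models' predictions.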
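A minimal sketch of the two-model approach follows. It assumes a least-squares line as the base learner (a linear probability model for the binary outcome); `fit_line`, the toy data, and `t_learner_cate` are hypothetical and only meant to show the structure.

```python
def fit_line(xs, ys):
    """Hypothetical base learner: least-squares line, returned as a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Hypothetical toy data, already split by treatment group.
x_treated, y_treated = [0.0, 1.0, 2.0, 3.0], [0, 1, 1, 1]
x_control, y_control = [0.0, 1.0, 2.0, 3.0], [0, 0, 0, 1]

# One model per treatment group; neither model sees the treatment indicator.
mu1 = fit_line(x_treated, y_treated)
mu0 = fit_line(x_control, y_control)


def t_learner_cate(x):
    """Difference between the treated-group and control-group predictions at x."""
    return mu1(x) - mu0(x)
```

With this toy data both fitted lines have the same slope, so the estimated effect is the same at every \(x\).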
X-learner
The X-learner first estimates the response functions (\(\hat{\mu}_0\) and \(\hat{\mu}_1\)), then computes the imputed treatment effects (\(\tilde{D}_{i}^{1}\) and \(\tilde{D}_{i}^{0}\)), then estimates the conditional average treatment effect for treated and controls (\(\hat{\tau}_1\) and \(\hat{\tau}_0\)), and finally averages the estimates.
The final estimate is the weighted average \(\hat{\tau}(x)=g(x)\hat{\tau}_0(x)+(1-g(x))\hat{\tau}_1(x)\), where \(g(x)\in[0,1]\) is a weighting function chosen to minimize the variance of \(\hat{\tau}(x)\). \(g(x)\) can be estimated by the propensity score or set to a constant equal to the proportion of treated samples.
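The four stages can be sketched as follows. This is an illustration under assumptions: the base learner is a least-squares line (`fit_line`, hypothetical), the data are a toy example, \(g\) is taken as the constant proportion of treated samples, and the estimates are combined as \(g\,\hat{\tau}_0(x)+(1-g)\,\hat{\tau}_1(x)\).

```python
def fit_line(xs, ys):
    """Hypothetical base learner: least-squares line, returned as a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Hypothetical toy data, split by treatment group.
x_treated, y_treated = [0.0, 1.0, 2.0, 3.0], [0, 1, 1, 1]
x_control, y_control = [0.0, 1.0, 2.0, 3.0], [0, 0, 1, 0]

# Stage 1: response functions, one per treatment group.
mu1 = fit_line(x_treated, y_treated)
mu0 = fit_line(x_control, y_control)

# Stage 2: imputed treatment effects.
# Treated units: observed outcome minus predicted control outcome.
d1 = [y - mu0(x) for x, y in zip(x_treated, y_treated)]
# Control units: predicted treated outcome minus observed outcome.
d0 = [mu1(x) - y for x, y in zip(x_control, y_control)]

# Stage 3: regress the imputed effects on x within each group.
tau1 = fit_line(x_treated, d1)
tau0 = fit_line(x_control, d0)

# Stage 4: weighted average with a constant g (assumption: proportion treated).
g = len(x_treated) / (len(x_treated) + len(x_control))


def x_learner_cate(x):
    """Combine the two group-specific effect estimates at x."""
    return g * tau0(x) + (1 - g) * tau1(x)
```

With linear base learners the two stage-3 estimates coincide, so the choice of \(g\) has no effect in this toy case; with more flexible learners or unbalanced groups it generally does.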
G-Computation
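G-computation estimates the mean potential outcomes \(E[Y^1]\) and \(E[Y^0]\), where \(Y^w\) is the outcome that would be observed under treatment \(W=w\). Parameters we can derive from the G-Computation: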
- For a binary outcome, \(Y\in\{0,1\}\), \(P(Y=1)=E[Y]\).
- \(E[Y^1]-E[Y^0]\) is the causal risk difference due to treatment
- \(E[Y^1]/E[Y^0]\) is the causal relative risk
- \(\frac{E[Y^1]\,(1-E[Y^0])}{E[Y^0]\,(1-E[Y^1])}\) is the causal odds ratio due to treatment
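The three parameters above can be computed directly once \(E[Y^1]\) and \(E[Y^0]\) are available. The sketch below assumes they are obtained by averaging per-unit predicted probabilities from an outcome model evaluated with \(W=1\) and with \(W=0\); the function name `causal_parameters` and the input values are hypothetical.

```python
def causal_parameters(p_treated, p_control):
    """Derive causal contrasts from per-unit predicted risks.

    p_treated[i]: predicted P(Y=1) for unit i with treatment set to 1.
    p_control[i]: predicted P(Y=1) for unit i with treatment set to 0.
    Assumes both averages lie strictly between 0 and 1.
    """
    ey1 = sum(p_treated) / len(p_treated)  # estimate of E[Y^1]
    ey0 = sum(p_control) / len(p_control)  # estimate of E[Y^0]
    return {
        "risk_difference": ey1 - ey0,
        "relative_risk": ey1 / ey0,
        "odds_ratio": (ey1 * (1 - ey0)) / (ey0 * (1 - ey1)),
    }


# Hypothetical predictions for two units under each treatment.
params = causal_parameters([0.75, 0.25], [0.25, 0.25])
```

Here the averages are \(E[Y^1]=0.5\) and \(E[Y^0]=0.25\), which gives a risk difference of 0.25, a relative risk of 2, and an odds ratio of 3.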