Asif Rahman

Causal inference learners

Given covariates \(X\), treatment indicator \(W\) and a binary outcome \(Y \in \{0,1\}\).

Inverse probability of treatment weights

Fit a model on the covariates \(X\) to predict the treatment \(W\), giving \(p_{w_i}(x_i)=P(W=w_i|X=x_i)\), the probability of each unit receiving the treatment it actually received. The inverse of this probability is then used as a sample weight when fitting the outcome model for \(Y\).

\[ IPTW_i=\frac{1}{p_{w_i}(x_i)} \]
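The two steps above can be sketched as follows. This is a minimal illustration on synthetic data; the variable names and the choice of logistic regression for both models are assumptions, not prescriptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic data: treatment depends on X (confounding)
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
w = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 1] + w))))

# Step 1: propensity model for P(W=1 | X)
ps_model = LogisticRegression().fit(X, w)
p_treated = ps_model.predict_proba(X)[:, 1]

# p_{w_i}(x_i): probability of the treatment actually received
p_wi = np.where(w == 1, p_treated, 1 - p_treated)
iptw = 1.0 / p_wi

# Step 2: outcome model for Y, weighted by the IPTW
out_model = LogisticRegression().fit(
    np.column_stack([X, w]), y, sample_weight=iptw
)
```

Because \(p_{w_i}(x_i)\le 1\), every weight is at least 1; units that received an unlikely treatment get up-weighted, balancing the two arms.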

S-learner

A single model is trained on observations and treatments.

\[ \begin{eqnarray} \hat{\mu} = M(Y \sim (X,W))\\ \hat{\tau}(x)=\hat{\mu}(x,1) - \hat{\mu}(x,0) \end{eqnarray} \]
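A minimal S-learner sketch, assuming synthetic data; gradient boosting is an illustrative choice for the single base model \(M\).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative synthetic data with a binary outcome
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
w = rng.binomial(1, 0.5, size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + w))))

# Single model mu(x, w) trained on the concatenated features (X, W)
mu = GradientBoostingClassifier().fit(np.column_stack([X, w]), y)

# CATE: predict with W forced to 1 and to 0, then take the difference
X1 = np.column_stack([X, np.ones(n)])
X0 = np.column_stack([X, np.zeros(n)])
tau = mu.predict_proba(X1)[:, 1] - mu.predict_proba(X0)[:, 1]
```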

T-learner

Two models, one per treatment arm, are trained on the covariates only (treatment is not a feature). Here \(X^0, Y^0\) denote the control observations and \(X^1, Y^1\) the treated observations.

\[ \begin{eqnarray} \hat{\mu}_0 = M_0(Y^0 \sim X^0)\\ \hat{\mu}_1 = M_1(Y^1 \sim X^1)\\ \hat{\tau}(x)=\hat{\mu}_1(x)-\hat{\mu}_0(x) \end{eqnarray} \]
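A minimal T-learner sketch on assumed synthetic data, fitting one outcome model per arm:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic data: treatment shifts the outcome logit by +1
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
w = rng.binomial(1, 0.5, size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + w))))

# mu_0 fit on controls only, mu_1 fit on treated only
mu0 = LogisticRegression().fit(X[w == 0], y[w == 0])
mu1 = LogisticRegression().fit(X[w == 1], y[w == 1])

# CATE estimate for every unit
tau = mu1.predict_proba(X)[:, 1] - mu0.predict_proba(X)[:, 1]
```

Unlike the S-learner, each model sees only its own arm's data, so a small arm can make its model noisy; that weakness motivates the X-learner below.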

X-learner

The X-learner first estimates the response functions (\(\hat{\mu}_0\) and \(\hat{\mu}_1\)), then computes the imputed treatment effects (\(\tilde{D}_{i}^{1}\) and \(\tilde{D}_{i}^{0}\)), then estimates the conditional average treatment effect for treated and controls (\(\hat{\tau}_1\) and \(\hat{\tau}_0\)), and finally averages the estimates.

\[ \begin{eqnarray} \hat{\mu}_0 = M_1(Y^0 \sim X^0)\\ \hat{\mu}_1 = M_2(Y^1 \sim X^1)\\ \tilde{D}_{i}^{1}=Y_i^1 - \hat{\mu}_0(X_i^1)\\ \tilde{D}_{i}^{0}=\hat{\mu}_1(X_i^0) - Y_i^0\\ \hat{\tau}_1=M_3(\tilde{D}^{1} \sim X^1)\\ \hat{\tau}_0=M_4(\tilde{D}^{0} \sim X^0)\\ \hat{\tau}(x)=g(x)\hat{\tau}_0(x)+(1-g(x))\hat{\tau}_1(x) \end{eqnarray} \]

\(g(x)\in[0,1]\) is a weighting function chosen to minimize the variance of \(\hat{\tau}(x)\). \(g(x)\) can be estimated by the propensity score, or set to a constant equal to the ratio of treated to untreated samples.
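The four stages above can be sketched as follows on assumed synthetic data. The response models use predicted probabilities for the binary outcome, the stage-two models are plain regressions on the imputed effects, and \(g(x)\) is taken to be the propensity score; all of these choices are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Illustrative synthetic data
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
w = rng.binomial(1, 0.5, size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + w))))

X0, y0 = X[w == 0], y[w == 0]   # controls
X1, y1 = X[w == 1], y[w == 1]   # treated

# Stage 1: response functions mu_0 and mu_1
mu0 = LogisticRegression().fit(X0, y0)
mu1 = LogisticRegression().fit(X1, y1)

# Stage 2: imputed treatment effects
d1 = y1 - mu0.predict_proba(X1)[:, 1]   # for treated units
d0 = mu1.predict_proba(X0)[:, 1] - y0   # for control units

# Stage 3: CATE models fit to the imputed effects
tau1 = LinearRegression().fit(X1, d1)
tau0 = LinearRegression().fit(X0, d0)

# Stage 4: weighted combination, with g(x) = propensity score
g = LogisticRegression().fit(X, w).predict_proba(X)[:, 1]
tau = g * tau0.predict(X) + (1 - g) * tau1.predict(X)
```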

G-Computation

Parameters we can derive from G-computation: