Asif Rahman

Causal inference learners

We are given covariates \(X\), a treatment indicator \(W\), and a binary outcome \(Y \in \{0,1\}\).

Inverse probability of treatment weights

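Fit a model on the covariates \(X\) to predict the treatment \(W\), giving \(p_{w_i}(x_i)=P(W=w_i|X=x_i)\), the estimated probability that sample \(i\) receives the treatment it actually received. The inverse of this probability is then used as a sample weight when fitting the model that predicts the outcome \(Y\).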

\[ IPTW_i=\frac{1}{p_{w_i}(x_i)} \]
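As a minimal sketch of the weighting step, assume a single discrete covariate so that the propensity model can be a plain stratum frequency. The helper names `iptw_weights` and `weighted_mean` and the toy data are illustrative assumptions, not part of the original notes.

```python
from collections import Counter

def iptw_weights(xs, ws):
    """Return 1 / P(W = w_i | X = x_i) for every sample.

    Assumption: X is discrete, so the probability is estimated by the
    share of samples in stratum x_i that received treatment w_i.
    """
    n_x = Counter(xs)
    n_xw = Counter(zip(xs, ws))
    return [n_x[x] / n_xw[(x, w)] for x, w in zip(xs, ws)]

def weighted_mean(values, weights):
    return sum(v * k for v, k in zip(values, weights)) / sum(weights)

# Toy data, made up for illustration only.
xs = [0, 0, 0, 0, 1, 1, 1, 1]   # covariate
ws = [1, 0, 0, 0, 1, 1, 1, 0]   # treatment indicator
ys = [1, 0, 0, 1, 1, 1, 0, 0]   # binary outcome

weights = iptw_weights(xs, ws)

# Weighted mean outcome in each treatment arm.
treated = [(y, k) for y, w, k in zip(ys, ws, weights) if w == 1]
control = [(y, k) for y, w, k in zip(ys, ws, weights) if w == 0]
risk_treated = weighted_mean(*zip(*treated))
risk_control = weighted_mean(*zip(*control))
print(risk_treated - risk_control)
```

In practice the stratum frequencies would be replaced by a fitted classifier, and the weights would be passed as sample weights to the outcome model.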


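The S-learner trains a single model on the covariates and the treatment indicator together, then takes the difference between its predictions with the treatment set to 1 and to 0.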

\[ \begin{eqnarray} \hat{\mu} = M(Y \sim (X,W))\\ \hat{\tau}(x)=\hat{\mu}(x,1) - \hat{\mu}(x,0) \end{eqnarray} \]
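The single-model estimator above (commonly called the S-learner) can be sketched as follows. The base learner \(M\) is a hypothetical cell-mean estimator, `fit_cell_means`, chosen only so that the example needs no dependencies; any regressor or classifier that accepts \((X, W)\) as features could take its place. The toy data are made up.

```python
from collections import defaultdict

def fit_cell_means(features, targets):
    """Hypothetical base learner M: predicts the mean target of each
    distinct feature value seen during training."""
    total = defaultdict(float)
    count = defaultdict(int)
    for f, t in zip(features, targets):
        total[f] += t
        count[f] += 1
    means = {f: total[f] / count[f] for f in total}
    return lambda f: means[f]

# Toy data, made up for illustration only.
xs = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]   # covariate
ws = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]   # treatment indicator
ys = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0]   # binary outcome

# One model trained on the joint features (x, w).
mu = fit_cell_means(list(zip(xs, ws)), ys)

def tau(x):
    """Estimated treatment effect: mu(x, 1) - mu(x, 0)."""
    return mu((x, 1)) - mu((x, 0))

print(tau(0), tau(1))
```

The treatment indicator is just another feature here, so a regularized base learner may shrink its contribution; the cell-mean model used in this sketch does not.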


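The T-learner trains two models, one per treatment group, each on the covariates only. Here \(X^0, Y^0\) denote the covariates and outcomes of the control samples and \(X^1, Y^1\) those of the treated samples.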

\[ \begin{eqnarray} \hat{\mu}_0 = M_0(Y^0 \sim X^0)\\ \hat{\mu}_1 = M_1(Y^1 \sim X^1)\\ \hat{\tau}(x)=\hat{\mu}_1(x)-\hat{\mu}_0(x) \end{eqnarray} \]
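A minimal sketch of the two-model estimator above (commonly called the T-learner), again using a hypothetical cell-mean base learner `fit_cell_means` and made-up toy data as stand-ins for \(M_0\), \(M_1\) and a real dataset:

```python
from collections import defaultdict

def fit_cell_means(features, targets):
    """Hypothetical base learner: predicts the mean target of each
    distinct feature value seen during training."""
    total = defaultdict(float)
    count = defaultdict(int)
    for f, t in zip(features, targets):
        total[f] += t
        count[f] += 1
    means = {f: total[f] / count[f] for f in total}
    return lambda f: means[f]

# Toy data, made up for illustration only.
xs = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]   # covariate
ws = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]   # treatment indicator
ys = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0]   # binary outcome

# Split the samples by treatment group.
x0 = [x for x, w in zip(xs, ws) if w == 0]
y0 = [y for y, w in zip(ys, ws) if w == 0]
x1 = [x for x, w in zip(xs, ws) if w == 1]
y1 = [y for y, w in zip(ys, ws) if w == 1]

# One model per group, each trained on the covariates only.
mu0 = fit_cell_means(x0, y0)
mu1 = fit_cell_means(x1, y1)

def tau(x):
    """Estimated treatment effect: mu1(x) - mu0(x)."""
    return mu1(x) - mu0(x)

print(tau(0), tau(1))
```

With a saturated cell-mean learner this gives the same numbers as the single-model approach; the two differ once the base learner pools or regularizes across samples.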


The X-learner first estimates the response functions (\(\hat{\mu}_0\) and \(\hat{\mu}_1\)), then computes the imputed treatment effects (\(\tilde{D}_{i}^{1}\) and \(\tilde{D}_{i}^{0}\)), then estimates the conditional average treatment effect separately from the treated and the control samples (\(\hat{\tau}_1\) and \(\hat{\tau}_0\)), and finally combines the two estimates in a weighted average.

\[ \begin{eqnarray} \hat{\mu}_0 = M_1(Y^0 \sim X^0)\\ \hat{\mu}_1 = M_2(Y^1 \sim X^1)\\ \tilde{D}_{i}^{1}=Y_i^1 - \hat{\mu}_0(X_i^1)\\ \tilde{D}_{i}^{0}=\hat{\mu}_1(X_i^0) - Y_i^0\\ \hat{\tau}_1=M_3(\tilde{D}^{1} \sim X^1)\\ \hat{\tau}_0=M_4(\tilde{D}^{0} \sim X^0)\\ \hat{\tau}(x)=g(x)\hat{\tau}_0(x)+(1-g(x))\hat{\tau}_1(x) \end{eqnarray} \]

\(g(x)\in[0,1]\) is a weighting function that is chosen to minimize the variance of \(\hat{\tau}(x)\). \(g(x)\) can be estimated by the propensity score, or set to a constant such as the proportion of treated samples.
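The four stages above can be sketched as follows. This is an illustration under stated assumptions: `fit_cell_means` is a hypothetical cell-mean base learner standing in for \(M_1\) to \(M_4\), \(g(x)\) is taken to be the stratum-wise propensity estimate, and the toy data are made up.

```python
from collections import defaultdict

def fit_cell_means(features, targets):
    """Hypothetical base learner: predicts the mean target of each
    distinct feature value seen during training."""
    total = defaultdict(float)
    count = defaultdict(int)
    for f, t in zip(features, targets):
        total[f] += t
        count[f] += 1
    means = {f: total[f] / count[f] for f in total}
    return lambda f: means[f]

# Toy data, made up for illustration only.
xs = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]   # covariate
ws = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]   # treatment indicator
ys = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0]   # binary outcome

treated = [(x, y) for x, w, y in zip(xs, ws, ys) if w == 1]
control = [(x, y) for x, w, y in zip(xs, ws, ys) if w == 0]
x1, y1 = zip(*treated)
x0, y0 = zip(*control)

# Stage 1: response functions, one per treatment group.
mu0 = fit_cell_means(x0, y0)
mu1 = fit_cell_means(x1, y1)

# Stage 2: imputed treatment effects.
d1 = [y - mu0(x) for x, y in treated]   # observed minus predicted control outcome
d0 = [mu1(x) - y for x, y in control]   # predicted treated outcome minus observed

# Stage 3: effect models fitted on the imputed effects.
tau1 = fit_cell_means(x1, d1)
tau0 = fit_cell_means(x0, d0)

# Stage 4: weighted average, with g(x) estimated as P(W = 1 | X = x).
g = fit_cell_means(xs, ws)

def tau(x):
    return g(x) * tau0(x) + (1 - g(x)) * tau1(x)

print(tau(0), tau(1))
```

Because the cell-mean learner is saturated, \(\hat{\tau}_0\) and \(\hat{\tau}_1\) coincide in this toy example and the choice of \(g\) has no effect; with smoother base learners the two estimates generally differ and the weighting matters.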


Parameters we can derive from G-computation, where \(Y^1\) and \(Y^0\) denote the potential outcomes under treatment and under control:

  • For a binary outcome, \(Y\in\{0,1\}\), \(P(Y=1)=E[Y]\).
  • \(E[Y^1]-E[Y^0]\) is the causal risk difference due to treatment
  • \(E[Y^1]/E[Y^0]\) is the causal relative risk
  • \(\frac{E[Y^1](1-E[Y^0])}{E[Y^0](1-E[Y^1])}\) is the causal odds ratio due to treatment
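A minimal sketch of how these three parameters could be computed. It assumes the standard G-computation recipe of averaging the outcome model's predictions over all samples with the treatment set to 1 and then to 0; the cell-mean outcome model `fit_cell_means` and the toy data are hypothetical stand-ins.

```python
from collections import defaultdict

def fit_cell_means(features, targets):
    """Hypothetical outcome model: predicts the mean target of each
    distinct feature value seen during training."""
    total = defaultdict(float)
    count = defaultdict(int)
    for f, t in zip(features, targets):
        total[f] += t
        count[f] += 1
    means = {f: total[f] / count[f] for f in total}
    return lambda f: means[f]

# Toy data, made up for illustration only.
xs = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]   # covariate
ws = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]   # treatment indicator
ys = [1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0]   # binary outcome

# Outcome model on (x, w); every (x, w) cell must be observed.
mu = fit_cell_means(list(zip(xs, ws)), ys)

# Average predicted outcome if everyone were treated / untreated.
ey1 = sum(mu((x, 1)) for x in xs) / len(xs)
ey0 = sum(mu((x, 0)) for x in xs) / len(xs)

risk_difference = ey1 - ey0
relative_risk = ey1 / ey0
odds_ratio = (ey1 * (1 - ey0)) / (ey0 * (1 - ey1))

print(risk_difference, relative_risk, odds_ratio)
```

The relative risk and odds ratio are undefined when \(E[Y^0]\) is 0 (and the odds ratio also when \(E[Y^1]\) is 1), so real code would need to guard against those cases.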