# Causal inference learners

Throughout, we are given covariates \(X\), a treatment indicator \(W\), and a binary outcome \(Y \in \{0,1\}\).

## Inverse probability of treatment weights

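Fit a model on the covariates \(X\) to predict the treatment \(W\), giving \(p_{w_i}(x_i)=P(W=w_i|X=x_i)\), the probability of the treatment each unit actually received. Use the inverse of this probability, \(1/p_{w_i}(x_i)\), as the sample weight when fitting the model that predicts the outcome \(Y\).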
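The two steps can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the toy data is invented, and `fit_mean_model` is a hypothetical stand-in for a real classifier or regressor that simply averages the target within each value of a single binary covariate. The weighted outcome model is reduced to a weighted mean of \(Y\) per treatment arm.

```python
from collections import defaultdict

# Toy data, invented for illustration: (x, w, units, units with y=1) per cell.
cells = [(0, 1, 2, 1), (0, 0, 4, 1), (1, 1, 4, 4), (1, 0, 2, 1)]
data = [(x, w, int(i < k)) for x, w, n, k in cells for i in range(n)]

def fit_mean_model(rows):
    """Hypothetical stand-in for a learner: mean target per feature value."""
    total, count = defaultdict(float), defaultdict(int)
    for features, target in rows:
        total[features] += target
        count[features] += 1
    return {f: total[f] / count[f] for f in count}

# Step 1: propensity model P(W=1 | X=x), fitted on the covariates only.
propensity = fit_mean_model([(x, w) for x, w, _ in data])

def iptw_weight(x, w):
    """Inverse probability of the treatment the unit actually received."""
    p_received = propensity[x] if w == 1 else 1 - propensity[x]
    return 1 / p_received

# Step 2: weighted outcome model, here the weighted mean of y in one arm.
def weighted_mean(arm):
    rows = [(iptw_weight(x, w), y) for x, w, y in data if w == arm]
    return sum(wt * y for wt, y in rows) / sum(wt for wt, _ in rows)

risk_treated, risk_control = weighted_mean(1), weighted_mean(0)
print(risk_treated, risk_control, risk_treated - risk_control)
```

On this toy data the weighted risks come out at about 0.75 (treated) and 0.375 (control). The weights are only defined when every stratum contains both treated and control units, otherwise `p_received` can be zero.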

## S-learner

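A single model is trained on the covariates together with the treatment, which enters as an ordinary feature. The treatment effect for a unit is the difference between the model's predictions with the treatment set to treated and to control.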
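A minimal sketch of the S-learner, under the assumption that the effect is read off as the difference between the single model's predictions with the treatment switched on and off. The toy data and the `fit_mean_model` lookup "learner" are hypothetical; any regressor or classifier that accepts \((X, W)\) as features could take its place.

```python
from collections import defaultdict

# Toy data, invented for illustration: (x, w, units, units with y=1) per cell.
cells = [(0, 1, 2, 1), (0, 0, 4, 1), (1, 1, 4, 4), (1, 0, 2, 1)]
data = [(x, w, int(i < k)) for x, w, n, k in cells for i in range(n)]

def fit_mean_model(rows):
    """Hypothetical stand-in for a learner: mean target per feature value."""
    total, count = defaultdict(float), defaultdict(int)
    for features, target in rows:
        total[features] += target
        count[features] += 1
    return {f: total[f] / count[f] for f in count}

# One model, with the treatment w as just another feature next to x.
mu = fit_mean_model([((x, w), y) for x, w, y in data])

def cate(x):
    """Predicted outcome with w=1 minus predicted outcome with w=0."""
    return mu[(x, 1)] - mu[(x, 0)]

xs = [x for x, _, _ in data]
ate = sum(cate(x) for x in xs) / len(xs)
print(cate(0), cate(1), ate)
```

With a regularized model the treatment feature can be shrunk or ignored, which pulls the estimated effect toward zero; the lookup model used here is too simple to show that.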

## T-learner

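Two models are trained, one per treatment group, each on the covariates of the units that received that treatment. The treatment effect for a unit is the difference between the two models' predictions.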
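A minimal sketch of the T-learner for a binary treatment, assuming the effect is the difference between the two fitted models. As before, the toy data and the `fit_mean_model` lookup "learner" are hypothetical placeholders for real data and a real estimator.

```python
from collections import defaultdict

# Toy data, invented for illustration: (x, w, units, units with y=1) per cell.
cells = [(0, 1, 2, 1), (0, 0, 4, 1), (1, 1, 4, 4), (1, 0, 2, 1)]
data = [(x, w, int(i < k)) for x, w, n, k in cells for i in range(n)]

def fit_mean_model(rows):
    """Hypothetical stand-in for a learner: mean target per feature value."""
    total, count = defaultdict(float), defaultdict(int)
    for features, target in rows:
        total[features] += target
        count[features] += 1
    return {f: total[f] / count[f] for f in count}

# Two models, each fitted on the covariates of one treatment group only.
mu1 = fit_mean_model([(x, y) for x, w, y in data if w == 1])
mu0 = fit_mean_model([(x, y) for x, w, y in data if w == 0])

def cate(x):
    """Treated-model prediction minus control-model prediction."""
    return mu1[x] - mu0[x]

print(cate(0), cate(1))
```

Because each model sees only its own group, a small treatment group gives a noisy `mu1` even when the control group is large.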

## X-learner

The X-learner first estimates the response functions (\(\hat{\mu}_0\) and \(\hat{\mu}_1\)), then computes the imputed treatment effects (\(\tilde{D}_{i}^{1}\) and \(\tilde{D}_{i}^{0}\)), then estimates the conditional average treatment effect separately from the treated and the control units (\(\hat{\tau}_1\) and \(\hat{\tau}_0\)), and finally combines the two estimates in a weighted average.

The final estimate is \(\hat{\tau}(x)=g(x)\hat{\tau}_0(x)+(1-g(x))\hat{\tau}_1(x)\), where \(g(x)\in[0,1]\) is a weighting function chosen to minimize the variance of \(\hat{\tau}(x)\). \(g(x)\) can be estimated by the propensity score or set to a constant, such as the proportion of treated samples.
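The four stages can be sketched as follows. The sketch assumes the usual definitions, which the text does not spell out: \(\tilde{D}_i^1 = Y_i - \hat{\mu}_0(X_i)\) for treated units, \(\tilde{D}_i^0 = \hat{\mu}_1(X_i) - Y_i\) for controls, and the final estimate \(g(x)\hat{\tau}_0(x)+(1-g(x))\hat{\tau}_1(x)\) with \(g\) set to the propensity score. The toy data and the `fit_mean_model` lookup "learner" are hypothetical.

```python
from collections import defaultdict

# Toy data, invented for illustration: (x, w, units, units with y=1) per cell.
cells = [(0, 1, 2, 1), (0, 0, 4, 1), (1, 1, 4, 4), (1, 0, 2, 1)]
data = [(x, w, int(i < k)) for x, w, n, k in cells for i in range(n)]

def fit_mean_model(rows):
    """Hypothetical stand-in for a learner: mean target per feature value."""
    total, count = defaultdict(float), defaultdict(int)
    for features, target in rows:
        total[features] += target
        count[features] += 1
    return {f: total[f] / count[f] for f in count}

treated = [(x, y) for x, w, y in data if w == 1]
control = [(x, y) for x, w, y in data if w == 0]

# Stage 1: response functions, one per treatment group.
mu1 = fit_mean_model(treated)
mu0 = fit_mean_model(control)

# Stage 2: imputed treatment effects.
d1 = [(x, y - mu0[x]) for x, y in treated]  # observed minus predicted control
d0 = [(x, mu1[x] - y) for x, y in control]  # predicted treated minus observed

# Stage 3: CATE models fitted to the imputed effects of each group.
tau1 = fit_mean_model(d1)
tau0 = fit_mean_model(d0)

# Stage 4: weighted average, with g(x) taken to be the propensity score.
g = fit_mean_model([(x, w) for x, w, _ in data])

def tau(x):
    return g[x] * tau0[x] + (1 - g[x]) * tau1[x]

print(tau(0), tau(1))
```

With this saturated lookup model, `tau0` and `tau1` coincide, so the choice of `g` has no effect on the result; the weighting matters once the stage models are smoothed or regularized and the groups are unbalanced.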

## G-Computation

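G-computation fits a model for the outcome given treatment and covariates, predicts every unit's outcome under treatment and under control, and averages the predictions to estimate \(E[Y^1]\) and \(E[Y^0]\), the mean outcomes if everyone were treated or untreated. For a binary outcome \(Y\in\{0,1\}\), \(P(Y=1)=E[Y]\), so these means are risks.

Parameters we can derive from the G-Computation:

- \(E[Y^1]-E[Y^0]\) is the causal risk difference due to treatment.
- \(E[Y^1]/E[Y^0]\) is the causal relative risk due to treatment.
- \(\frac{E[Y^1](1-E[Y^0])}{E[Y^0](1-E[Y^1])}\) is the causal odds ratio due to treatment.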
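The parameters above can be computed with a short sketch, assuming G-computation in its standard form: fit one outcome model on \((X, W)\), predict every unit's outcome with \(W\) set to 1 and to 0, and average. The toy data and the `fit_mean_model` lookup "learner" are hypothetical.

```python
from collections import defaultdict

# Toy data, invented for illustration: (x, w, units, units with y=1) per cell.
cells = [(0, 1, 2, 1), (0, 0, 4, 1), (1, 1, 4, 4), (1, 0, 2, 1)]
data = [(x, w, int(i < k)) for x, w, n, k in cells for i in range(n)]

def fit_mean_model(rows):
    """Hypothetical stand-in for a learner: mean target per feature value."""
    total, count = defaultdict(float), defaultdict(int)
    for features, target in rows:
        total[features] += target
        count[features] += 1
    return {f: total[f] / count[f] for f in count}

# Outcome model for E[Y | X, W].
mu = fit_mean_model([((x, w), y) for x, w, y in data])

# Predict for every unit under both treatments and average over the sample.
xs = [x for x, _, _ in data]
ey1 = sum(mu[(x, 1)] for x in xs) / len(xs)  # estimate of E[Y^1]
ey0 = sum(mu[(x, 0)] for x in xs) / len(xs)  # estimate of E[Y^0]

risk_difference = ey1 - ey0
relative_risk = ey1 / ey0
odds_ratio = (ey1 * (1 - ey0)) / (ey0 * (1 - ey1))
print(risk_difference, relative_risk, odds_ratio)
```

On this toy data the estimates are roughly 0.375 for the risk difference, 2 for the relative risk, and 5 for the odds ratio. The ratios are undefined when `ey0` is 0 or `ey1` is 1.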