Causal inference learners
IPW, S-learner, T-learner, X-learner, G-Computation
Given covariates $X$, a treatment indicator $W$, and a binary outcome $Y \in \{0,1\}$.
Inverse probability of treatment weights
Fit a model on the covariates $X$ to predict the treatment $W$, giving $p_{w_i}(x_i)=P(W=w_i|X=x_i)$, and use the inverse of the probability of the treatment actually received as a sample weight when fitting the outcome model for $Y$.
$$IPTW_i=\frac{1}{p_{w_i}(x_i)}$$
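A minimal sketch of the weighting step with scikit-learn; the logistic-regression propensity model and the function name `iptw_weights` are illustrative choices, not prescribed above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iptw_weights(X, w):
    """Estimate p_{w_i}(x_i) = P(W = w_i | X = x_i) and return 1 / p_{w_i}(x_i)."""
    propensity_model = LogisticRegression(max_iter=1000).fit(X, w)
    p_treated = propensity_model.predict_proba(X)[:, 1]          # P(W = 1 | X)
    p_received = np.where(w == 1, p_treated, 1.0 - p_treated)    # P(W = w_i | X)
    return 1.0 / p_received

# The weights are then passed to any outcome model that accepts sample weights:
#   outcome_model.fit(X, y, sample_weight=iptw_weights(X, w))
```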
S-learner
A single model is trained on the covariates with the treatment indicator included as an extra feature; the treatment effect is the difference between its predictions with the treatment set to 1 and to 0.
$$\begin{aligned} \hat{\mu} = M(Y \sim (X,W))\\ \hat{\tau}(x)=\hat{\mu}(x,1) - \hat{\mu}(x,0) \end{aligned}$$
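A minimal S-learner sketch, assuming a scikit-learn-style classifier for the binary outcome; the gradient-boosting model and the name `s_learner_cate` are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def s_learner_cate(X, w, y):
    """Fit mu(x, w) on (X, W); return tau(x) = mu(x, 1) - mu(x, 0)."""
    Xw = np.column_stack([X, w])                  # treatment appended as a feature
    mu = GradientBoostingClassifier().fit(Xw, y)
    X1 = np.column_stack([X, np.ones(len(X))])    # everyone set to treated
    X0 = np.column_stack([X, np.zeros(len(X))])   # everyone set to untreated
    return mu.predict_proba(X1)[:, 1] - mu.predict_proba(X0)[:, 1]
```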
T-learner
Two models, one per treatment arm, are trained on the covariates only; the treatment indicator is used to split the data rather than as a feature.
$$\begin{aligned} \hat{\mu}_0 = M_0(Y^0 \sim X^0)\\ \hat{\mu}_1 = M_1(Y^1 \sim X^1)\\ \hat{\tau}(x)=\hat{\mu}_1(x)-\hat{\mu}_0(x) \end{aligned}$$
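A minimal T-learner sketch under the same assumptions as above (scikit-learn classifiers, illustrative model choice and function name).

```python
from sklearn.ensemble import GradientBoostingClassifier

def t_learner_cate(X, w, y):
    """Fit one outcome model per arm; return tau(x) = mu_1(x) - mu_0(x)."""
    mu0 = GradientBoostingClassifier().fit(X[w == 0], y[w == 0])  # controls
    mu1 = GradientBoostingClassifier().fit(X[w == 1], y[w == 1])  # treated
    return mu1.predict_proba(X)[:, 1] - mu0.predict_proba(X)[:, 1]
```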
X-learner
The X-learner first estimates the response functions ($\hat{\mu}_0$ and $\hat{\mu}_1$), then computes the imputed treatment effects ($\tilde{D}_{i}^{1}$ and $\tilde{D}_{i}^{0}$), then estimates the conditional average treatment effects for the treated and the controls ($\hat{\tau}_1$ and $\hat{\tau}_0$), and finally averages the two estimates.
$$\begin{aligned} \hat{\mu}_0 = M_1(Y^0 \sim X^0)\\ \hat{\mu}_1 = M_2(Y^1 \sim X^1)\\ \tilde{D}_{i}^{1}=Y_i^1 - \hat{\mu}_{0}(X_i^1)\\ \tilde{D}_{i}^{0}=\hat{\mu}_1(X_i^0) - Y_{i}^{0}\\ \hat{\tau}_1=M_3(\tilde{D}^{1} \sim X^{1})\\ \hat{\tau}_0=M_4(\tilde{D}^{0} \sim X^{0})\\ \hat{\tau}(x)=g(x)\hat{\tau}_0(x)+(1-g(x))\hat{\tau}_1(x) \end{aligned}$$
$g(x)\in[0,1]$ is a weighting function chosen to minimize the variance of $\hat{\tau}(x)$. It can be estimated by the propensity score, or set to a constant such as the proportion of treated samples.
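A minimal X-learner sketch following the steps above; the scikit-learn models, the constant choice of $g(x)$ (proportion of treated units), and the function name are illustrative assumptions.

```python
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

def x_learner_cate(X, w, y):
    X0, y0 = X[w == 0], y[w == 0]   # controls
    X1, y1 = X[w == 1], y[w == 1]   # treated

    # Step 1: response functions, one per arm
    mu0 = GradientBoostingClassifier().fit(X0, y0)
    mu1 = GradientBoostingClassifier().fit(X1, y1)

    # Step 2: imputed treatment effects
    d1 = y1 - mu0.predict_proba(X1)[:, 1]   # treated: observed outcome minus imputed control outcome
    d0 = mu1.predict_proba(X0)[:, 1] - y0   # controls: imputed treated outcome minus observed outcome

    # Step 3: regress the imputed effects on the covariates
    tau1 = GradientBoostingRegressor().fit(X1, d1)
    tau0 = GradientBoostingRegressor().fit(X0, d0)

    # Step 4: weighted average; here g(x) is a constant, the share of treated units
    g = w.mean()
    return g * tau0.predict(X) + (1 - g) * tau1.predict(X)
```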
G-Computation
Parameters we can derive from G-Computation: