Double/Debiased Machine Learning for Treatment and Structural Parameters
V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, J. Robins
Double/Debiased Machine Learning, proposed by Chernozhukov et al., tackles the problem of inference on a low-dimensional parameter of interest, \(\theta_0\), in settings with a potentially high-dimensional set of control variables/confounders. To do so, the method harnesses generic ML algorithms that perform well in high-dimensional settings. DML can be applied to learn the ATE in a partially linear regression model and the LATE in a partially linear instrumental variables model. It can also be used to estimate the ATE and LATE in the interactive model and interactive instrumental variables model, where treatment effects are fully heterogeneous.
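For concreteness, in the partially linear model the setup (in the paper's notation) is

\[
Y = D\theta_0 + g_0(X) + U, \qquad \mathbb{E}[U \mid X, D] = 0,
\]
\[
D = m_0(X) + V, \qquad \mathbb{E}[V \mid X] = 0.
\]

Naively plugging an ML estimate of \(g_0\) into the first equation transmits regularization bias into \(\hat{\theta}_0\). DML instead uses a Neyman-orthogonal moment: predict \(Y\) and \(D\) from \(X\) with cross-fitted ML models, then regress the residualized outcome on the residualized treatment, in the spirit of Robinson (1988) and the Frisch-Waugh-Lovell theorem.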
This post is going to show how to implement estimation of the ATE and LATE in both the partially linear and interactive settings. This will be a pretty minimal prototype, implemented almost entirely in base R. I'll use a couple of other packages for the ML methods, but the choice of models is entirely separate from the general method outlined here.
Packages
```r
library(estimatr)
library(glmnet)
library(hdm)
library(ranger)
```
Data
For clarity, we’ll use one of the same empirical examples used in the actual paper. This example is described as follows:
> We use the DML method to estimate the effect of 401(k) eligibility, the treatment variable, and 401(k) participation, a self-selected decision to receive the treatment that we instrument for with assignment to the treatment state, on accumulated assets. In this example, the treatment variable is not randomly assigned and we aim to eliminate the potential biases due to the lack of random assignment by flexibly controlling for a rich set of variables.
```r
data(pension)
pension <- pension[sample(1:nrow(pension), nrow(pension)), ]

# Construct model inputs
form <- ~ (
  poly(age, 2) + poly(inc, 2) + poly(educ, 2) + poly(fsize, 2)
  + as.factor(marr) + as.factor(twoearn) + as.factor(db) + as.factor(pira)
  + as.factor(hown)
)^2
X <- model.matrix(form, data = pension)
X_RF <- pension[, c("age", "inc", "fsize", "educ", "db", "marr", "twoearn", "pira", "hown")]
D <- pension$e401
Y <- pension$net_tfa
```
Now, let’s define a few simple functions for applying three different ML models (Random Forest, LASSO, and ElasticNet with \(\alpha = 0.5\)) to our data. These three models will be used across each of the four DML estimation strategies.
```r
# LASSO learner and prediction functions
y_learner_lasso <- \(x, y) cv.glmnet(x, y, nfolds = 5)
d_learner_lasso <- \(x, y) cv.glmnet(x, y, nfolds = 5, family = "binomial")
pred_fn_lasso <- \(mod, data) predict(mod, data, s = "lambda.min", type = "response")

# ElasticNet learner and prediction functions
y_learner_enet <- \(x, y) cv.glmnet(x, y, nfolds = 5, alpha = 0.5)
d_learner_enet <- \(x, y) cv.glmnet(x, y, nfolds = 5, alpha = 0.5, family = "binomial")
pred_fn_enet <- \(mod, data) predict(mod, data, s = "lambda.min", type = "response")

# Random Forest learner and prediction functions
y_learner_rf <- \(x, y) ranger(x = x, y = y, num.trees = 1000)
d_learner_rf <- \(x, y) ranger(x = x, y = y, num.trees = 1000, probability = TRUE)
pred_fn_rf_y <- \(mod, data) predict(mod, data)$predictions
pred_fn_rf_d <- \(mod, data) predict(mod, data)$predictions[, "1"]
```
Estimating the ATE of 401(k) Eligibility on Net Financial Assets
The paper describes the data as follows:

> In the example in this paper, we use the same data as in Chernozhukov and Hansen (2004). We use net financial assets – defined as the sum of IRA balances, 401(k) balances, checking accounts, US saving bonds, other interest-earning accounts in banks and other financial institutions, other interest-earning assets (such as bonds held personally), stocks, and mutual funds less non-mortgage debt – as the outcome variable, Y, in our analysis. Our treatment variable, D, is an indicator for being eligible to enroll in a 401(k) plan. The vector of raw covariates, X, consists of age, income, family size, years of education, a married indicator, a two-earner status indicator, a defined benefit pension status indicator, an IRA participation indicator, and a home-ownership indicator.
Partially Linear Regression Model
First we will calculate the ATE of 401(k) eligibility on net financial assets in the partially linear model.
The following function implements DML for the partially linear model.
```r
dml_plm <- function(X, Y, D, y_learner, d_learner, pred_fn_y, pred_fn_d = pred_fn_y) {

  # Create folds for sample splitting
  fold1 <- sample(1:nrow(X), size = round((1/3)*nrow(X)))
  fold2 <- sample((1:nrow(X))[-fold1], size = round((1/3)*nrow(X)))
  fold3 <- (1:nrow(X))[-c(fold1, fold2)]

  # Create data.frame to store residuals
  resids <- data.frame()

  # Generate outcome model across folds
  for (fold in list(fold1, fold2, fold3)) {
    # Get the training/prediction indices
    idx_train <- (1:nrow(X))[-fold]
    idx_predict <- fold

    # Outcome prediction model
    outcome_model <- y_learner(
      x = X[idx_train, , drop = FALSE],
      y = Y[idx_train]
    )
    # Residualize outcome
    outcome_predictions <- pred_fn_y(outcome_model, X[idx_predict, , drop = FALSE])
    outcome_resids <- Y[idx_predict] - outcome_predictions

    # Treatment prediction model
    treatment_model <- d_learner(
      x = X[idx_train, , drop = FALSE],
      y = D[idx_train]
    )
    # Residualize treatment, trimming the propensities away from 0 and 1
    treatment_predictions <- pred_fn_d(treatment_model, X[idx_predict, , drop = FALSE])
    treatment_predictions <- pmin(pmax(treatment_predictions, 0.01), 0.99)
    treatment_resids <- (
      # This is necessary to deal with cases where D is a factor
      as.numeric(as.character(D[idx_predict])) - treatment_predictions
    )

    # Collect residuals
    new_resids <- data.frame(
      "y_resid" = unname(outcome_resids),
      "d_resid" = unname(treatment_resids),
      "y_hat" = unname(outcome_predictions),
      "d_hat" = unname(treatment_predictions),
      "idx" = idx_predict
    )
    resids <- do.call(rbind, list(resids, new_resids))
  }

  # Return data in original ordering
  return(resids[order(resids[, "idx", drop = TRUE]), , drop = FALSE])
}
```
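To see the mechanics in isolation, here is a self-contained toy version of the same cross-fitting logic on simulated data, with plain OLS standing in for both ML learners and two folds instead of three (the data-generating process and all names here are my own, purely for illustration):

```r
set.seed(42)
n <- 4000
x <- matrix(rnorm(n * 3), ncol = 3)
theta_true <- 2
d <- drop(x %*% c(1, -1, 0.5)) + rnorm(n)                  # treatment depends on x
y <- theta_true * d + drop(x %*% c(0.5, 2, -1)) + rnorm(n) # outcome depends on d and x

# Two-fold cross-fitting: predict each half with a model fit on the other half
folds <- list(1:(n/2), (n/2 + 1):n)
y_resid <- d_resid <- numeric(n)
for (hold in folds) {
  train <- setdiff(1:n, hold)
  y_fit <- lm.fit(cbind(1, x[train, ]), y[train])$coefficients
  d_fit <- lm.fit(cbind(1, x[train, ]), d[train])$coefficients
  y_resid[hold] <- y[hold] - drop(cbind(1, x[hold, ]) %*% y_fit)
  d_resid[hold] <- d[hold] - drop(cbind(1, x[hold, ]) %*% d_fit)
}

# Final stage: residual-on-residual OLS recovers the true effect
coef(lm(y_resid ~ d_resid))["d_resid"]  # close to 2
```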
Now that we’ve defined the function, let’s estimate the ATE using our three different ML methods.
```r
# Estimate DML first stage
set.seed(123)
first_stage_lasso <- dml_plm(
  X = X,
  Y = Y,
  D = D,
  y_learner = y_learner_lasso,
  d_learner = d_learner_lasso,
  pred_fn_y = pred_fn_lasso
)
first_stage_enet <- dml_plm(
  X = X,
  Y = Y,
  D = D,
  y_learner = y_learner_enet,
  d_learner = d_learner_enet,
  pred_fn_y = pred_fn_enet
)
first_stage_rf <- dml_plm(
  X = X_RF,
  Y = Y,
  D = factor(D),
  y_learner = y_learner_rf,
  d_learner = d_learner_rf,
  pred_fn_y = pred_fn_rf_y,
  pred_fn_d = pred_fn_rf_d
)

# Estimate the ATE using OLS
ate_lasso <- lm_robust(y_resid ~ d_resid, first_stage_lasso, se_type = "HC1")
ate_enet <- lm_robust(y_resid ~ d_resid, first_stage_enet, se_type = "HC1")
ate_rf <- lm_robust(y_resid ~ d_resid, first_stage_rf, se_type = "HC1")
```
| | LASSO | ElasticNet | RandomForest |
|---|---|---|---|
| Estimate | 9738.496 | 9816.064 | 8986.493 |
| Std. Error | 1372.968 | 1412.982 | 1274.455 |
| RMSE Y | 53255.882 | 53939.421 | 54589.826 |
| RMSE D | 0.444 | 0.444 | 0.447 |
Interactive Regression Model
Now, we’ll define the function to estimate the ATE in the fully heterogeneous model.
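The key object here is the doubly robust (AIPW) score. With \(\hat{g}(d, x)\) a cross-fitted estimate of \(\mathbb{E}[Y \mid D = d, X = x]\) and \(\hat{m}(x)\) a cross-fitted estimate of the propensity score \(\Pr(D = 1 \mid X = x)\), each observation contributes

\[
\hat{\psi}_i = \hat{g}(1, X_i) - \hat{g}(0, X_i)
+ \frac{D_i \left( Y_i - \hat{g}(1, X_i) \right)}{\hat{m}(X_i)}
- \frac{(1 - D_i) \left( Y_i - \hat{g}(0, X_i) \right)}{1 - \hat{m}(X_i)},
\]

and the ATE estimate is the sample average of the \(\hat{\psi}_i\), which the code builds as `effect_estimates`.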
```r
dml_irm <- function(X,
                    Y,
                    D,
                    y_learner,
                    d_learner,
                    pred_fn_y,
                    pred_fn_d = pred_fn_y) {
  # Create folds for sample splitting
  fold1 <- sample(1:nrow(X), size = round((1/3)*nrow(X)))
  fold2 <- sample((1:nrow(X))[-fold1], size = round((1/3)*nrow(X)))
  fold3 <- (1:nrow(X))[-c(fold1, fold2)]

  # Create data.frame to store residuals
  resids <- data.frame()

  # Generate outcome model across folds
  for (fold in list(fold1, fold2, fold3)) {
    # Get the training/prediction indices
    idx_train <- (1:nrow(X))[-fold]
    idx_predict <- fold
    numD <- as.numeric(as.character(D))

    # Treatment prediction model
    treatment_model <- d_learner(
      x = X[idx_train, , drop = FALSE],
      y = D[idx_train]
    )
    # Residualize treatment, trimming the propensities away from 0 and 1
    treatment_predictions <- pred_fn_d(treatment_model, X[idx_predict, , drop = FALSE])
    treatment_predictions <- pmin(pmax(treatment_predictions, 0.01), 0.99)
    treatment_resids <- numD[idx_predict] - treatment_predictions

    # Outcome prediction model, with the treatment included as a regressor
    XD <- cbind("D" = numD, X)
    outcome_model <- y_learner(
      x = XD[idx_train, , drop = FALSE],
      y = Y[idx_train]
    )
    # Outcome predictions under control
    XD[, "D"] <- 0
    outcome_preds_d0 <- pred_fn_y(outcome_model, XD[idx_predict, , drop = FALSE])

    # Outcome predictions under treatment
    XD[, "D"] <- 1
    outcome_preds_d1 <- pred_fn_y(outcome_model, XD[idx_predict, , drop = FALSE])
    outcome_predictions <- (
      numD[idx_predict]*outcome_preds_d1
      + (1 - numD[idx_predict])*outcome_preds_d0
    )

    # Calculate individual level weights
    outcome_resids <- Y[idx_predict] - outcome_predictions
    inv_prop_scores <- (
      numD[idx_predict]/treatment_predictions
      - (1 - numD[idx_predict])/(1 - treatment_predictions)
    )
    expected_diff <- outcome_preds_d1 - outcome_preds_d0
    effect_estimates <- expected_diff + inv_prop_scores*outcome_resids

    # Collect residuals
    new_resids <- data.frame(
      "effect_estimates" = unname(effect_estimates),
      "y_resid" = unname(outcome_resids),
      "d_resid" = unname(treatment_resids),
      "y_hat" = unname(outcome_predictions),
      "d_hat" = unname(treatment_predictions),
      "idx" = idx_predict
    )
    resids <- do.call(rbind, list(resids, new_resids))
  }

  # Return data in original ordering
  return(resids[order(resids[, "idx", drop = TRUE]), , drop = FALSE])
}
```
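As a sanity check on the score itself, here is a minimal self-contained sketch on simulated data with a known ATE of 2, using a single split instead of three folds and plain glm/lm as the nuisance learners (all of this is illustrative and not part of the original post):

```r
set.seed(42)
n <- 5000
x <- rnorm(n)
m <- plogis(0.5 * x)          # true propensity score
d <- rbinom(n, 1, m)
y <- 2 * d + x + rnorm(n)     # true ATE is 2

# Fit nuisances on one half, evaluate the score on the other half
train <- 1:(n/2)
score <- (n/2 + 1):n
p_fit <- glm(d ~ x, family = binomial, subset = train)
g_fit <- lm(y ~ d + x, subset = train)

m_hat <- predict(p_fit, newdata = data.frame(x = x[score]), type = "response")
m_hat <- pmin(pmax(m_hat, 0.01), 0.99)
g1 <- predict(g_fit, newdata = data.frame(d = 1, x = x[score]))
g0 <- predict(g_fit, newdata = data.frame(d = 0, x = x[score]))

# Doubly robust score; its mean estimates the ATE
psi <- g1 - g0 +
  d[score] * (y[score] - g1) / m_hat -
  (1 - d[score]) * (y[score] - g0) / (1 - m_hat)
mean(psi)  # close to 2
```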
Let’s estimate the ATE!
```r
# Estimate DML first stage
set.seed(123)
first_stage_lasso <- dml_irm(
  X = X,
  Y = Y,
  D = D,
  y_learner = y_learner_lasso,
  d_learner = d_learner_lasso,
  pred_fn_y = pred_fn_lasso
)
first_stage_enet <- dml_irm(
  X = X,
  Y = Y,
  D = D,
  y_learner = y_learner_enet,
  d_learner = d_learner_enet,
  pred_fn_y = pred_fn_enet
)
first_stage_rf <- dml_irm(
  X = X_RF,
  Y = Y,
  D = factor(D),
  y_learner = y_learner_rf,
  d_learner = d_learner_rf,
  pred_fn_y = pred_fn_rf_y,
  pred_fn_d = pred_fn_rf_d
)

# Estimate the ATE using OLS
ate_lasso <- lm_robust(effect_estimates ~ 1, first_stage_lasso, se_type = "HC1")
ate_enet <- lm_robust(effect_estimates ~ 1, first_stage_enet, se_type = "HC1")
ate_rf <- lm_robust(effect_estimates ~ 1, first_stage_rf, se_type = "HC1")
```
| | LASSO | ElasticNet | RandomForest |
|---|---|---|---|
| Estimate | 9078.850 | 9311.623 | 8100.171 |
| Std. Error | 1412.741 | 1462.927 | 1149.504 |
| RMSE Y | 53085.113 | 53732.086 | 54427.880 |
| RMSE D | 0.444 | 0.444 | 0.447 |
Estimating the LATE of 401(k) Participation on Net Financial Assets
We now will estimate the LATE of 401(k) participation on net financial assets, using 401(k) eligibility as our instrument and 401(k) participation as our treatment.
```r
D <- pension$p401
Z <- pension$e401
```
Partially Linear IV Model
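In this model we residualize all three of \(Y\), \(D\), and \(Z\) against \(X\). Writing \(\tilde{Y} = Y - \hat{\ell}(X)\), \(\tilde{D} = D - \hat{r}(X)\), and \(\tilde{Z} = Z - \hat{m}(X)\) for cross-fitted estimates of the respective conditional means, the estimator is

\[
\hat{\theta}_0 = \frac{\widehat{\mathrm{Cov}}\left(\tilde{Z}, \tilde{Y}\right)}{\widehat{\mathrm{Cov}}\left(\tilde{Z}, \tilde{D}\right)},
\]

which is numerically the same as running two-stage least squares of \(\tilde{Y}\) on \(\tilde{D}\) with \(\tilde{Z}\) as the instrument.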
The following function estimates the partially linear IV model.
```r
dml_plivm <- function(X,
                      Y,
                      D,
                      Z,
                      y_learner,
                      d_learner,
                      z_learner = d_learner,
                      pred_fn_y,
                      pred_fn_d = pred_fn_y,
                      pred_fn_z = pred_fn_d) {
  # Create folds for sample splitting
  fold1 <- sample(1:nrow(X), size = round((1/3)*nrow(X)))
  fold2 <- sample((1:nrow(X))[-fold1], size = round((1/3)*nrow(X)))
  fold3 <- (1:nrow(X))[-c(fold1, fold2)]

  # Create data.frame to store residuals
  resids <- data.frame()

  # Generate outcome model across folds
  for (fold in list(fold1, fold2, fold3)) {
    # Get the training/prediction indices
    idx_train <- (1:nrow(X))[-fold]
    idx_predict <- fold

    # Convert treatment and instrument to numeric
    numD <- as.numeric(as.character(D[idx_predict]))
    numZ <- as.numeric(as.character(Z[idx_predict]))

    # Outcome prediction model
    outcome_model <- y_learner(
      x = X[idx_train, , drop = FALSE],
      y = Y[idx_train]
    )
    # Residualize outcome
    outcome_predictions <- pred_fn_y(outcome_model, X[idx_predict, , drop = FALSE])
    outcome_resids <- Y[idx_predict] - outcome_predictions

    # Treatment prediction model
    treatment_model <- d_learner(
      x = X[idx_train, , drop = FALSE],
      y = D[idx_train]
    )
    # Residualize treatment
    treatment_predictions <- pred_fn_d(treatment_model, X[idx_predict, , drop = FALSE])
    treatment_resids <- numD - treatment_predictions

    # Instrument prediction model
    instrument_model <- z_learner(
      x = X[idx_train, , drop = FALSE],
      y = Z[idx_train]
    )
    # Residualize instrument, trimming the propensities away from 0 and 1
    instrument_predictions <- pred_fn_z(instrument_model, X[idx_predict, , drop = FALSE])
    instrument_predictions <- pmin(pmax(instrument_predictions, 0.01), 0.99)
    instrument_resids <- numZ - instrument_predictions

    # Collect residuals
    new_resids <- data.frame(
      "y_resid" = unname(outcome_resids),
      "d_resid" = unname(treatment_resids),
      "z_resid" = unname(instrument_resids),
      "y_hat" = unname(outcome_predictions),
      "d_hat" = unname(treatment_predictions),
      "z_hat" = unname(instrument_predictions),
      "idx" = idx_predict
    )
    resids <- do.call(rbind, list(resids, new_resids))
  }

  # Return data in original ordering
  return(resids[order(resids[, "idx", drop = TRUE]), , drop = FALSE])
}
```
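Here is a small self-contained illustration of why the residual-on-residual IV step works, on simulated data with an unobserved confounder and a true effect of 2 (OLS learners, one split, and a made-up data-generating process of my own):

```r
set.seed(42)
n <- 4000
x <- rnorm(n)
u <- rnorm(n)                              # unobserved confounder
z <- rbinom(n, 1, plogis(x))               # instrument, valid conditional on x
d <- 0.7 * z + 0.5 * x + u + rnorm(n)      # treatment, endogenous through u
y <- 2 * d + x - 2 * u + rnorm(n)          # true effect is 2

# Naive OLS is biased by the unobserved confounder
coef(lm(y ~ d + x))["d"]  # well below the true effect of 2

# Residualize y, d, and z on x with OLS learners (one split for brevity)
train <- 1:(n/2)
hold <- (n/2 + 1):n
res <- function(v) {
  fit <- lm(v ~ x, subset = train)
  v[hold] - predict(fit, newdata = data.frame(x = x[hold]))
}
y_res <- res(y)
d_res <- res(d)
z_res <- res(z)

# Residual IV (equivalent to TSLS of y_res on d_res instrumented by z_res)
sum(z_res * y_res) / sum(z_res * d_res)  # close to 2
```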
Let’s estimate the LATE now!
```r
# Estimate DML first stage
set.seed(123)
first_stage_lasso <- dml_plivm(
  X = X,
  Y = Y,
  D = D,
  Z = Z,
  y_learner = y_learner_lasso,
  d_learner = d_learner_lasso,
  pred_fn_y = pred_fn_lasso
)
first_stage_enet <- dml_plivm(
  X = X,
  Y = Y,
  D = D,
  Z = Z,
  y_learner = y_learner_enet,
  d_learner = d_learner_enet,
  pred_fn_y = pred_fn_enet
)
first_stage_rf <- dml_plivm(
  X = X_RF,
  Y = Y,
  D = factor(D),
  Z = factor(Z),
  y_learner = y_learner_rf,
  d_learner = d_learner_rf,
  pred_fn_y = pred_fn_rf_y,
  pred_fn_d = pred_fn_rf_d
)

# TSLS with LASSO residuals
d_hat_lasso <- lm(d_resid ~ z_resid, first_stage_lasso)$fitted.values
ate_lasso <- lm_robust(first_stage_lasso$y_resid ~ d_hat_lasso, se_type = "HC1")

# TSLS with ElasticNet residuals
d_hat_enet <- lm(d_resid ~ z_resid, first_stage_enet)$fitted.values
ate_enet <- lm_robust(first_stage_enet$y_resid ~ d_hat_enet, se_type = "HC1")

# TSLS with RF residuals
d_hat_rf <- lm(d_resid ~ z_resid, first_stage_rf)$fitted.values
ate_rf <- lm_robust(first_stage_rf$y_resid ~ d_hat_rf, se_type = "HC1")
```
| | LASSO | ElasticNet | RandomForest |
|---|---|---|---|
| Estimate | 13982.593 | 13809.441 | 12781.930 |
| Std. Error | 1980.323 | 1977.618 | 1938.429 |
| RMSE Y | 53280.924 | 53123.786 | 56275.848 |
| RMSE D | 0.415 | 0.414 | 0.418 |
| RMSE Z | 0.444 | 0.444 | 0.449 |
Interactive IV Model
Finally, the following function estimates the interactive IV model.
```r
dml_iivm <- function(X,
                     Y,
                     D,
                     Z,
                     y_learner,
                     d_learner,
                     z_learner = d_learner,
                     pred_fn_y,
                     pred_fn_d = pred_fn_y,
                     pred_fn_z = pred_fn_d,
                     always_takers = TRUE,
                     never_takers = TRUE) {
  # Create folds for sample splitting
  fold1 <- sample(1:nrow(X), size = round((1/3)*nrow(X)))
  fold2 <- sample((1:nrow(X))[-fold1], size = round((1/3)*nrow(X)))
  fold3 <- (1:nrow(X))[-c(fold1, fold2)]

  # Create data.frame to store residuals
  resids <- data.frame()

  # Generate outcome model across folds
  for (fold in list(fold1, fold2, fold3)) {
    # Get the training/prediction indices
    idx_train <- (1:nrow(X))[-fold]
    idx_predict <- fold

    # Convert treatment and instrument to numeric
    numD <- as.numeric(as.character(D))
    numZ <- as.numeric(as.character(Z))

    # Outcome model, with the instrument included as a regressor
    XZ <- cbind("Z" = numZ, X)
    outcome_model <- y_learner(
      x = XZ[idx_train, , drop = FALSE],
      y = Y[idx_train]
    )
    # Outcome predictions under Instrument == 0
    XZ[, "Z"] <- 0
    outcome_preds_z0 <- pred_fn_y(outcome_model, XZ[idx_predict, , drop = FALSE])

    # Outcome predictions under Instrument == 1
    XZ[, "Z"] <- 1
    outcome_preds_z1 <- pred_fn_y(outcome_model, XZ[idx_predict, , drop = FALSE])

    # Outcome predictions and residuals
    outcome_predictions <- (
      numZ[idx_predict]*outcome_preds_z1
      + (1 - numZ[idx_predict])*outcome_preds_z0
    )
    outcome_resids <- Y[idx_predict] - outcome_predictions

    # Treatment model
    XZ <- cbind("Z" = numZ, X)
    treatment_model <- d_learner(
      x = XZ[idx_train, , drop = FALSE],
      y = D[idx_train]
    )
    # Treatment predictions under Instrument == 0
    # (no always-takers means nobody takes the treatment when Z == 0)
    if (always_takers == FALSE) {
      treatment_preds_z0 <- rep(0, length(D[idx_predict]))
    } else {
      XZ[, "Z"] <- 0
      treatment_preds_z0 <- pred_fn_d(treatment_model, XZ[idx_predict, , drop = FALSE])
    }
    # Treatment predictions under Instrument == 1
    # (no never-takers means everybody takes the treatment when Z == 1)
    if (never_takers == FALSE) {
      treatment_preds_z1 <- rep(1, length(D[idx_predict]))
    } else {
      XZ[, "Z"] <- 1
      treatment_preds_z1 <- pred_fn_d(treatment_model, XZ[idx_predict, , drop = FALSE])
    }
    # Treatment predictions and residuals
    treatment_predictions <- (
      numZ[idx_predict]*treatment_preds_z1
      + (1 - numZ[idx_predict])*treatment_preds_z0
    )
    treatment_resids <- numD[idx_predict] - treatment_predictions

    # Instrument prediction model
    instrument_model <- z_learner(
      x = X[idx_train, , drop = FALSE],
      y = Z[idx_train]
    )
    # Instrument predictions and residuals, trimming the propensities away
    # from 0 and 1 since we divide by them below
    instrument_predictions <- pred_fn_z(instrument_model, X[idx_predict, , drop = FALSE])
    instrument_predictions <- pmin(pmax(instrument_predictions, 0.01), 0.99)
    instrument_resids <- numZ[idx_predict] - instrument_predictions

    # Construct effect estimates
    effect_estimates <- (
      (outcome_preds_z1 - outcome_preds_z0)
      + (numZ[idx_predict]*(Y[idx_predict] - outcome_preds_z1))
        /instrument_predictions
      - ((1 - numZ[idx_predict])*(Y[idx_predict] - outcome_preds_z0))
        /(1 - instrument_predictions)
      - (
        (treatment_preds_z1 - treatment_preds_z0)
        + (numZ[idx_predict]*(numD[idx_predict] - treatment_preds_z1))
          /instrument_predictions
        - ((1 - numZ[idx_predict])*(numD[idx_predict] - treatment_preds_z0))
          /(1 - instrument_predictions)
      )
    )

    # Collect residuals
    new_resids <- data.frame(
      "effect_estimates" = unname(effect_estimates),
      "y_resid" = unname(outcome_resids),
      "d_resid" = unname(treatment_resids),
      "z_resid" = unname(instrument_resids),
      "y_hat" = unname(outcome_predictions),
      "d_hat" = unname(treatment_predictions),
      "z_hat" = unname(instrument_predictions),
      "idx" = idx_predict
    )
    resids <- do.call(rbind, list(resids, new_resids))
  }

  # Return data in original ordering
  return(resids[order(resids[, "idx", drop = TRUE]), , drop = FALSE])
}
```
Let’s estimate the LATE now!
```r
# Estimate DML first stage
set.seed(123)
first_stage_lasso <- dml_iivm(
  X = X,
  Y = Y,
  D = D,
  Z = Z,
  y_learner = y_learner_lasso,
  d_learner = d_learner_lasso,
  pred_fn_y = pred_fn_lasso,
  always_takers = FALSE
)
first_stage_enet <- dml_iivm(
  X = X,
  Y = Y,
  D = D,
  Z = Z,
  y_learner = y_learner_enet,
  d_learner = d_learner_enet,
  pred_fn_y = pred_fn_enet,
  always_takers = FALSE
)
first_stage_rf <- dml_iivm(
  X = X_RF,
  Y = Y,
  D = factor(D),
  Z = factor(Z),
  y_learner = y_learner_rf,
  d_learner = d_learner_rf,
  pred_fn_y = pred_fn_rf_y,
  pred_fn_d = pred_fn_rf_d,
  always_takers = FALSE
)

# LATEs
ate_lasso <- lm_robust(first_stage_lasso$effect_estimates ~ 1, se_type = "HC1")
ate_enet <- lm_robust(first_stage_enet$effect_estimates ~ 1, se_type = "HC1")
ate_rf <- lm_robust(first_stage_rf$effect_estimates ~ 1, se_type = "HC1")
```
| | LASSO | ElasticNet | RandomForest |
|---|---|---|---|
| Estimate | 9155.272 | 8686.631 | 7687.822 |
| Std. Error | 1458.567 | 1323.662 | 1213.721 |
| RMSE Y | 53083.322 | 52912.153 | 55839.966 |
| RMSE D | 0.275 | 0.275 | 0.277 |
| RMSE Z | 0.444 | 0.444 | 0.449 |
Conclusion
And that’s it! There are some additional tidbits, for example estimating the ATTE instead of the ATE, constructing confidence intervals with a multiplier bootstrap, and handling multiple treatments, but this covers (IMO) the essentials. Working through the implementation from scratch really helped me solidify my understanding of the underlying estimation strategy.