Bootstrap-enhanced Lasso
This function implements model-consistent Lasso estimation through the bootstrap (Bolasso). It supports parallel processing via the future package, so the user can choose among the parallelization strategies that future supports (for example, by calling future::plan() before fitting). The method was developed as a variable-selection algorithm, but this package also supports ensemble predictions on new data using the bagged Lasso models.
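To make the selection idea concrete, here is a minimal base-R sketch of the Bolasso procedure described by Bach (2008): fit the Lasso on each bootstrap resample and keep the variables whose coefficients are non-zero in every replicate (the intersection of the selected supports). It is an illustration under stated assumptions, not how this package works: bolasso() delegates each fit to glmnet::cv.glmnet or gamlr::cv.gamlr and picks the penalty by cross-validation, can run in parallel, and lets selected_vars() relax the intersection with a threshold, whereas this sketch uses a hand-written coordinate-descent Lasso at a single fixed lambda, runs sequentially, and applies the strict intersection. The functions soft_threshold(), lasso_cd() and bolasso_sketch() are hypothetical names introduced only for this example.

```r
# Minimal Bolasso sketch in base R (illustrative only; not the package's code).

# Soft-thresholding operator used by lasso coordinate descent.
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Lasso at a single fixed penalty `lambda`, fit by cyclic coordinate descent on
# standardised predictors. Minimises
#   (1 / (2n)) * ||y - mean(y) - Xs %*% b||^2 + lambda * sum(abs(b)).
lasso_cd <- function(X, y, lambda, max_iter = 500, tol = 1e-8) {
  n <- nrow(X)
  mu <- colMeans(X)
  Xc <- sweep(X, 2, mu)                # centre each column
  s <- sqrt(colMeans(Xc^2))
  s[s == 0] <- 1                       # guard against exactly constant columns
  Xs <- sweep(Xc, 2, s, "/")           # scale each column to unit (1/n) variance
  xx <- colMeans(Xs^2)                 # 1 for non-constant columns, 0 otherwise
  beta <- numeric(ncol(X))
  r <- y - mean(y)                     # residual of the centred response
  for (it in seq_len(max_iter)) {
    max_change <- 0
    for (j in seq_len(ncol(X))) {
      if (xx[j] == 0) next
      old <- beta[j]
      # correlation of column j with the partial residual (excluding column j)
      rho <- sum(Xs[, j] * r) / n + xx[j] * old
      beta[j] <- soft_threshold(rho, lambda) / xx[j]
      if (beta[j] != old) {
        r <- r - Xs[, j] * (beta[j] - old)
        max_change <- max(max_change, abs(beta[j] - old))
      }
    }
    if (max_change < tol) break
  }
  setNames(beta / s, colnames(X))      # coefficients on the original scale
}

# Bolasso: fit the lasso on bootstrap resamples and keep the variables that are
# selected (non-zero) in every replicate.
bolasso_sketch <- function(X, y, lambda, n_boot = 50) {
  n <- nrow(X)
  selected <- matrix(FALSE, n_boot, ncol(X), dimnames = list(NULL, colnames(X)))
  for (b in seq_len(n_boot)) {
    idx <- sample.int(n, n, replace = TRUE)  # bootstrap resample of the rows
    selected[b, ] <- lasso_cd(X[idx, , drop = FALSE], y[idx], lambda) != 0
  }
  freq <- colMeans(selected)                 # selection frequency per variable
  list(freq = freq, support = names(freq)[freq == 1])
}

# Usage on simulated data where only x1 and x2 affect the response.
set.seed(42)
n <- 200
X <- matrix(rnorm(n * 8), n, 8, dimnames = list(NULL, paste0("x", 1:8)))
y <- 3 * X[, "x1"] - 2 * X[, "x2"] + rnorm(n)
fit <- bolasso_sketch(X, y, lambda = 0.1, n_boot = 30)
fit$support
```

Requiring selection in every replicate is what gives Bolasso its consistency argument; lowering the requirement to a fraction of replicates trades some of that strictness for robustness when the number of replicates is large.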
Usage
bolasso(
  formula,
  data,
  n.boot = 100,
  progress = TRUE,
  implement = "glmnet",
  x = NULL,
  y = NULL,
  ...
)
Arguments
- formula
An optional object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. Can be omitted when x and y are non-missing.
- data
An optional object of class data.frame that contains the modeling variables referenced in formula. Can be omitted when x and y are non-missing.
- n.boot
An integer specifying the number of bootstrap replicates.
- progress
A logical indicating whether to display progress across bootstrap replicates.
- implement
A character string, either "glmnet" or "gamlr", specifying which Lasso implementation to use. For specific modeling details, see glmnet::cv.glmnet or gamlr::cv.gamlr.
- x
An optional predictor matrix, used in lieu of formula and data.
- y
An optional response vector, used in lieu of formula and data.
- ...
Additional arguments passed to either glmnet::cv.glmnet or gamlr::cv.gamlr.
Value
An object of class bolasso. This object is a list of length n.boot of cv.glmnet or cv.gamlr objects.
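Because the returned object holds one fit per bootstrap replicate, variable selection reduces to counting how often each coefficient is non-zero across replicates. The base-R sketch below assumes the per-replicate coefficients have already been extracted as named numeric vectors (for a cv.glmnet fit, coef(fit, s = "lambda.min") returns a one-column sparse matrix that would first need converting). summarise_selection() is a hypothetical helper; its output resembles the selected_vars() tables in the examples, but whether selected_vars() applies exactly this rule, in particular how mean_coef is computed, is an assumption.

```r
# Summarise per-replicate lasso coefficients (one named numeric vector per
# bootstrap replicate) into selection frequencies and mean coefficients.
# summarise_selection() is a hypothetical helper, not part of the package.
summarise_selection <- function(coef_list, threshold = 0.9) {
  vars <- names(coef_list[[1]])
  # replicates x variables matrix, with columns aligned by name
  coefs <- do.call(rbind, lapply(coef_list, function(b) b[vars]))
  freq <- colMeans(coefs != 0)   # share of replicates with a non-zero coefficient
  keep <- freq >= threshold      # threshold = 1 gives Bach's strict intersection
  data.frame(
    variable = vars[keep],
    freq = unname(freq[keep]),
    mean_coef = unname(colMeans(coefs)[keep]),  # averaged over all replicates
    stringsAsFactors = FALSE
  )
}

# Toy, made-up coefficients for three hypothetical replicates:
reps <- list(
  c("(Intercept)" = 1.0, x1 = 2.0, x2 = 0),
  c("(Intercept)" = 1.2, x1 = 1.8, x2 = 0.3),
  c("(Intercept)" = 0.8, x1 = 2.2, x2 = 0)
)
out <- summarise_selection(reps, threshold = 0.9)
out
```

Here x2 is non-zero in only one of three replicates, so it falls below the 0.9 threshold and is dropped, while the intercept and x1 are kept.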
References
Bach FR (2008). “Bolasso: model consistent Lasso estimation through the bootstrap.” CoRR, abs/0804.1302. https://arxiv.org/abs/0804.1302.
See also
glmnet::cv.glmnet and gamlr::cv.gamlr for full details on the respective implementations and on the additional arguments that can be supplied through ... in bolasso().
Examples
mtcars[, c(2, 10:11)] <- lapply(mtcars[, c(2, 10:11)], as.factor)
idx <- sample(nrow(mtcars), 22)
mtcars_train <- mtcars[idx, ]
mtcars_test <- mtcars[-idx, ]
## Formula Interface
# Train model
set.seed(123)
bolasso_form <- bolasso(
formula = mpg ~ .,
data = mtcars_train,
n.boot = 20,
nfolds = 5,
implement = "glmnet"
)
#> Loaded glmnet 4.1-4
# Extract selected variables
selected_vars(bolasso_form, threshold = 0.9, select = "lambda.min")
#> # A tibble: 2 × 2
#> variable mean_coef
#> <chr> <dbl>
#> 1 Intercept 21.7
#> 2 wt -3.04
# Bagged ensemble prediction on test data
predict(bolasso_form,
new.data = mtcars_test,
select = "lambda.min")
#> boot1 boot2 boot3 boot4 boot5 boot6
#> Mazda RX4 20.76134 20.34704 21.239324 19.60828 21.12979 20.96082
#> Hornet 4 Drive 18.75163 19.29475 16.366901 17.61425 18.05342 18.98135
#> Merc 280 17.89915 21.65255 15.322505 18.66957 18.48299 18.09306
#> Lincoln Continental 10.54323 17.21361 9.862591 11.41671 10.01480 10.50363
#> Chrysler Imperial 10.70333 17.99750 10.009883 11.76223 10.23903 10.60520
#> Fiat 128 26.77030 26.77135 27.474993 23.85337 23.79135 25.40743
#> Fiat X1-9 27.66538 25.81630 28.044054 24.08995 24.65012 26.28904
#> Porsche 914-2 26.79507 24.71485 28.210405 23.07841 22.23800 25.33830
#> Ford Pantera L 17.80783 19.85633 19.977539 15.11513 17.53030 17.47561
#> Volvo 142E 24.50529 27.16728 26.479446 22.39905 21.61929 23.01562
#> boot7 boot8 boot9 boot10 boot11 boot12
#> Mazda RX4 22.869079 21.753744 16.65284 21.499026 18.54609 21.926434
#> Hornet 4 Drive 20.137220 19.802472 19.63291 18.515911 18.85144 18.973212
#> Merc 280 19.042205 19.916379 17.66827 18.766076 18.53630 17.318266
#> Lincoln Continental 9.449886 9.771935 10.64444 8.692528 10.96614 8.313098
#> Chrysler Imperial 9.446983 9.829435 11.11865 8.964457 11.69595 8.346580
#> Fiat 128 27.805458 31.486748 28.09133 24.452312 24.51643 24.757907
#> Fiat X1-9 28.673155 32.126942 27.82723 25.265440 24.51643 25.816437
#> Porsche 914-2 26.173894 25.930607 25.29917 25.917949 26.01814 28.816658
#> Ford Pantera L 18.310563 17.252871 15.61905 16.662751 15.53147 16.669863
#> Volvo 142E 25.021818 26.471130 25.48484 24.898070 24.58647 26.069353
#> boot13 boot14 boot15 boot16 boot17 boot18
#> Mazda RX4 22.57936 21.17064 21.723214 22.335163 21.758327 20.03276
#> Hornet 4 Drive 21.53215 19.63922 18.502808 19.958580 19.274229 21.62764
#> Merc 280 19.32290 18.09167 18.558067 18.891055 19.274863 18.55365
#> Lincoln Continental 10.08783 10.20804 9.484839 9.771742 6.778726 10.31880
#> Chrysler Imperial 10.03757 11.06782 9.727419 9.892499 7.363223 10.93002
#> Fiat 128 26.92219 27.88908 26.046254 26.421907 27.797364 21.60413
#> Fiat X1-9 27.88049 28.18078 27.042491 27.480385 28.607440 22.58776
#> Porsche 914-2 26.57926 31.30031 25.560203 26.336914 27.406556 31.75930
#> Ford Pantera L 17.14143 18.72760 17.872990 18.138485 17.059597 16.65913
#> Volvo 142E 23.86174 27.25885 23.793789 23.546842 26.453907 28.53635
#> boot19 boot20
#> Mazda RX4 22.490042 22.404499
#> Hornet 4 Drive 19.558860 18.551515
#> Merc 280 18.369479 18.837312
#> Lincoln Continental 8.167621 7.361824
#> Chrysler Imperial 8.338122 7.652546
#> Fiat 128 27.559192 25.187786
#> Fiat X1-9 28.766635 26.346452
#> Porsche 914-2 27.516819 23.150085
#> Ford Pantera L 18.038968 17.580329
#> Volvo 142E 24.373386 22.148463
## Alternative Matrix Interface
# Train model
set.seed(123)
bolasso_mat <- bolasso(
x = model.matrix(mpg ~ . - 1, mtcars_train),
y = mtcars_train[, 1],
data = mtcars_train,
n.boot = 20,
nfolds = 5,
implement = "glmnet"
)
# Extract selected variables
selected_vars(bolasso_mat, threshold = 0.9, select = "lambda.min")
#> # A tibble: 2 × 2
#> variable mean_coef
#> <chr> <dbl>
#> 1 Intercept 21.9
#> 2 wt -3.05
# Bagged ensemble prediction on test data
predict(bolasso_mat,
new.data = model.matrix(mpg ~ . - 1, mtcars_test),
select = "lambda.min")
#> boot1 boot2 boot3 boot4 boot5 boot6
#> Mazda RX4 20.76134 20.34704 21.239324 19.60828 21.12979 20.96082
#> Hornet 4 Drive 18.75163 19.29475 16.366901 17.61425 18.05342 18.98135
#> Merc 280 17.89915 21.65255 15.322505 18.66957 18.48299 18.09306
#> Lincoln Continental 10.54323 17.21361 9.862591 11.41671 10.01480 10.50363
#> Chrysler Imperial 10.70333 17.99750 10.009883 11.76223 10.23903 10.60520
#> Fiat 128 26.77030 26.77135 27.474993 23.85337 23.79135 25.40743
#> Fiat X1-9 27.66538 25.81630 28.044054 24.08995 24.65012 26.28904
#> Porsche 914-2 26.79507 24.71485 28.210405 23.07841 22.23800 25.33830
#> Ford Pantera L 17.80783 19.85633 19.977539 15.11513 17.53030 17.47561
#> Volvo 142E 24.50529 27.16728 26.479446 22.39905 21.61929 23.01562
#> boot7 boot8 boot9 boot10 boot11 boot12
#> Mazda RX4 22.869079 21.753744 16.65284 21.499026 18.54609 21.926434
#> Hornet 4 Drive 20.137220 19.802472 19.63291 18.515911 18.85144 18.973212
#> Merc 280 19.042205 19.916379 17.66827 18.766076 18.53630 17.318266
#> Lincoln Continental 9.449886 9.771935 10.64444 8.692528 10.96614 8.313098
#> Chrysler Imperial 9.446983 9.829435 11.11865 8.964457 11.69595 8.346580
#> Fiat 128 27.805458 31.486748 28.09133 24.452312 24.51643 24.757907
#> Fiat X1-9 28.673155 32.126942 27.82723 25.265440 24.51643 25.816437
#> Porsche 914-2 26.173894 25.930607 25.29917 25.917949 26.01814 28.816658
#> Ford Pantera L 18.310563 17.252871 15.61905 16.662751 15.53147 16.669863
#> Volvo 142E 25.021818 26.471130 25.48484 24.898070 24.58647 26.069353
#> boot13 boot14 boot15 boot16 boot17 boot18
#> Mazda RX4 22.57936 21.17064 21.723214 22.335163 21.758327 18.83813
#> Hornet 4 Drive 21.53215 19.63922 18.502808 19.958580 19.274229 21.93688
#> Merc 280 19.32290 18.09167 18.558067 18.891055 19.274863 18.76669
#> Lincoln Continental 10.08783 10.20804 9.484839 9.771742 6.778726 10.16809
#> Chrysler Imperial 10.03757 11.06782 9.727419 9.892499 7.363223 10.76310
#> Fiat 128 26.92219 27.88908 26.046254 26.421907 27.797364 20.92558
#> Fiat X1-9 27.88049 28.18078 27.042491 27.480385 28.607440 22.14549
#> Porsche 914-2 26.57926 31.30031 25.560203 26.336914 27.406556 30.85193
#> Ford Pantera L 17.14143 18.72760 17.872990 18.138485 17.059597 15.75061
#> Volvo 142E 23.86174 27.25885 23.793789 23.546842 26.453907 28.62207
#> boot19 boot20
#> Mazda RX4 22.490042 22.404499
#> Hornet 4 Drive 19.558860 18.551515
#> Merc 280 18.369479 18.837312
#> Lincoln Continental 8.167621 7.361824
#> Chrysler Imperial 8.338122 7.652546
#> Fiat 128 27.559192 25.187786
#> Fiat X1-9 28.766635 26.346452
#> Porsche 914-2 27.516819 23.150085
#> Ford Pantera L 18.038968 17.580329
#> Volvo 142E 24.373386 22.148463
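The predict() calls above return one column of predictions per bootstrap replicate. A single bagged (ensemble) prediction is conventionally the average across replicates; whether predict() offers an option that does this directly is not documented on this page, so the base-R sketch below averages the columns itself and also reports their spread. bag_predictions() and the toy matrix are hypothetical, and applying it to a real fit assumes predict() returns a numeric matrix as printed above.

```r
# Collapse a predictions matrix (rows = observations, columns = bootstrap
# replicates) into a bagged prediction plus a simple spread measure.
# bag_predictions() is a hypothetical helper, not part of the package.
bag_predictions <- function(pred_mat) {
  cbind(
    mean = rowMeans(pred_mat),            # bagged (averaged) prediction
    sd = apply(pred_mat, 1, stats::sd)    # spread across replicates
  )
}

# Toy matrix with made-up values, shaped like the predict() output above:
preds <- matrix(
  c(20, 22, 21,
    10, 12, 11),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("car_a", "car_b"), paste0("boot", 1:3))
)
bagged <- bag_predictions(preds)
bagged

# With a fitted model, the same helper could be applied as, e.g.,
# bag_predictions(predict(bolasso_form, new.data = mtcars_test, select = "lambda.min"))
```

The spread column is not part of bagging itself, but it gives a quick sense of how much the individual bootstrap models disagree for each observation.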