
This function implements model-consistent Lasso estimation through the bootstrap. It supports parallel processing through the future package, so the user can choose among many parallelization backends. Bolasso was developed as a variable-selection algorithm, but this package also supports making ensemble predictions on new data using the bagged Lasso models.
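As a minimal sketch of the parallel setup mentioned above, assuming the future package is installed: a plan is registered before fitting, and code that relies on future then uses it. The worker count of 2 is an arbitrary illustrative choice, and the bolasso() call appears only in a comment.

```r
library(future)

# Register a parallel backend; 2 workers is an arbitrary illustrative choice.
plan(multisession, workers = 2)
n_workers <- nbrOfWorkers()

# A call such as bolasso(mpg ~ ., data = mtcars, n.boot = 100) made at this
# point would be expected to run its bootstrap replicates on these workers.

# Return to sequential processing when done.
plan(sequential)
```

Other backends provided by future can be swapped in by changing the plan() call, without touching the modeling code.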

Usage

bolasso(
  formula,
  data,
  n.boot = 100,
  progress = TRUE,
  implement = "glmnet",
  x = NULL,
  y = NULL,
  ...
)

Arguments

formula

An optional object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. Can be omitted when x and y are non-missing.

data

An optional object of class data.frame that contains the modeling variables referenced in formula. Can be omitted when x and y are non-missing.

n.boot

An integer specifying the number of bootstrap replicates.

progress

A logical indicating whether to display progress across bootstrap replicates.

implement

A character string, either 'glmnet' or 'gamlr', specifying which Lasso implementation to use. For specific modeling details, see glmnet::cv.glmnet or gamlr::cv.gamlr.

x

An optional predictor matrix in lieu of formula and data.

y

An optional response vector in lieu of formula and data.

...

Additional parameters to pass to either glmnet::cv.glmnet or gamlr::cv.gamlr.

Value

An object of class bolasso. This object is a list of length n.boot of cv.glmnet or cv.gamlr objects.
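To make the structure of such a list concrete, the sketch below builds a plain list of cv.glmnet fits on bootstrap resamples and derives per-variable selection frequencies from it. This is not the package's own code: the simulated data, the object names (fits, selected, sel_freq), and the reading of a 0.9 threshold as "selected in at least 90% of bootstrap fits" are all assumptions made for illustration.

```r
library(glmnet)

set.seed(1)
# Simulated data (hypothetical): only x1 and x2 drive the response.
n <- 100
p <- 6
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- 3 * x[, 1] - 2 * x[, 2] + rnorm(n)

# Fit one cross-validated Lasso per bootstrap resample, giving a list of
# cv.glmnet objects of length n.boot.
n.boot <- 10
fits <- lapply(seq_len(n.boot), function(b) {
  idx <- sample(n, replace = TRUE)
  cv.glmnet(x[idx, ], y[idx], nfolds = 5)
})

# For each fit, flag the predictors with a nonzero coefficient at lambda.min.
selected <- sapply(fits, function(fit) {
  beta <- as.matrix(coef(fit, s = "lambda.min"))[-1, 1]  # drop the intercept
  beta != 0
})

# Share of bootstrap fits in which each predictor was selected.
sel_freq <- rowMeans(selected)

# Predictors kept under an assumed 90% selection threshold.
names(sel_freq)[sel_freq >= 0.9]
```

Lowering the threshold admits variables that are selected less consistently across resamples; a threshold of 1 would keep only variables selected in every bootstrap fit.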

References

Bach FR (2008). “Bolasso: model consistent Lasso estimation through the bootstrap.” CoRR, abs/0804.1302. https://arxiv.org/abs/0804.1302.

See also

glmnet::cv.glmnet and gamlr::cv.gamlr for full details on the respective implementations and the arguments that can be passed through the ... argument.

Examples

mtcars[, c(2, 10:11)] <- lapply(mtcars[, c(2, 10:11)], as.factor)
idx <- sample(nrow(mtcars), 22)
mtcars_train <- mtcars[idx, ]
mtcars_test <- mtcars[-idx, ]

## Formula Interface

# Train model
set.seed(123)
bolasso_form <- bolasso(
  formula = mpg ~ .,
  data = mtcars_train,
  n.boot = 20,
  nfolds = 5,
  implement = "glmnet"
)
#> Loaded glmnet 4.1-4

# Extract selected variables
selected_vars(bolasso_form, threshold = 0.9, select = "lambda.min")
#> # A tibble: 2 × 2
#>   variable  mean_coef
#>   <chr>         <dbl>
#> 1 Intercept     21.7 
#> 2 wt            -3.04

# Bagged ensemble prediction on test data
predict(bolasso_form,
        new.data = mtcars_test,
        select = "lambda.min")
#>                        boot1    boot2     boot3    boot4    boot5    boot6
#> Mazda RX4           20.76134 20.34704 21.239324 19.60828 21.12979 20.96082
#> Hornet 4 Drive      18.75163 19.29475 16.366901 17.61425 18.05342 18.98135
#> Merc 280            17.89915 21.65255 15.322505 18.66957 18.48299 18.09306
#> Lincoln Continental 10.54323 17.21361  9.862591 11.41671 10.01480 10.50363
#> Chrysler Imperial   10.70333 17.99750 10.009883 11.76223 10.23903 10.60520
#> Fiat 128            26.77030 26.77135 27.474993 23.85337 23.79135 25.40743
#> Fiat X1-9           27.66538 25.81630 28.044054 24.08995 24.65012 26.28904
#> Porsche 914-2       26.79507 24.71485 28.210405 23.07841 22.23800 25.33830
#> Ford Pantera L      17.80783 19.85633 19.977539 15.11513 17.53030 17.47561
#> Volvo 142E          24.50529 27.16728 26.479446 22.39905 21.61929 23.01562
#>                         boot7     boot8    boot9    boot10   boot11    boot12
#> Mazda RX4           22.869079 21.753744 16.65284 21.499026 18.54609 21.926434
#> Hornet 4 Drive      20.137220 19.802472 19.63291 18.515911 18.85144 18.973212
#> Merc 280            19.042205 19.916379 17.66827 18.766076 18.53630 17.318266
#> Lincoln Continental  9.449886  9.771935 10.64444  8.692528 10.96614  8.313098
#> Chrysler Imperial    9.446983  9.829435 11.11865  8.964457 11.69595  8.346580
#> Fiat 128            27.805458 31.486748 28.09133 24.452312 24.51643 24.757907
#> Fiat X1-9           28.673155 32.126942 27.82723 25.265440 24.51643 25.816437
#> Porsche 914-2       26.173894 25.930607 25.29917 25.917949 26.01814 28.816658
#> Ford Pantera L      18.310563 17.252871 15.61905 16.662751 15.53147 16.669863
#> Volvo 142E          25.021818 26.471130 25.48484 24.898070 24.58647 26.069353
#>                       boot13   boot14    boot15    boot16    boot17   boot18
#> Mazda RX4           22.57936 21.17064 21.723214 22.335163 21.758327 20.03276
#> Hornet 4 Drive      21.53215 19.63922 18.502808 19.958580 19.274229 21.62764
#> Merc 280            19.32290 18.09167 18.558067 18.891055 19.274863 18.55365
#> Lincoln Continental 10.08783 10.20804  9.484839  9.771742  6.778726 10.31880
#> Chrysler Imperial   10.03757 11.06782  9.727419  9.892499  7.363223 10.93002
#> Fiat 128            26.92219 27.88908 26.046254 26.421907 27.797364 21.60413
#> Fiat X1-9           27.88049 28.18078 27.042491 27.480385 28.607440 22.58776
#> Porsche 914-2       26.57926 31.30031 25.560203 26.336914 27.406556 31.75930
#> Ford Pantera L      17.14143 18.72760 17.872990 18.138485 17.059597 16.65913
#> Volvo 142E          23.86174 27.25885 23.793789 23.546842 26.453907 28.53635
#>                        boot19    boot20
#> Mazda RX4           22.490042 22.404499
#> Hornet 4 Drive      19.558860 18.551515
#> Merc 280            18.369479 18.837312
#> Lincoln Continental  8.167621  7.361824
#> Chrysler Imperial    8.338122  7.652546
#> Fiat 128            27.559192 25.187786
#> Fiat X1-9           28.766635 26.346452
#> Porsche 914-2       27.516819 23.150085
#> Ford Pantera L      18.038968 17.580329
#> Volvo 142E          24.373386 22.148463

## Alternative Matrix Interface

# Train model
set.seed(123)
bolasso_mat <- bolasso(
  x = model.matrix(mpg ~ . - 1, mtcars_train),
  y = mtcars_train[, 1],
  n.boot = 20,
  nfolds = 5,
  implement = "glmnet"
)

# Extract selected variables
selected_vars(bolasso_mat, threshold = 0.9, select = "lambda.min")
#> # A tibble: 2 × 2
#>   variable  mean_coef
#>   <chr>         <dbl>
#> 1 Intercept     21.9 
#> 2 wt            -3.05

# Bagged ensemble prediction on test data
predict(bolasso_mat,
        new.data = model.matrix(mpg ~ . - 1, mtcars_test),
        select = "lambda.min")
#>                        boot1    boot2     boot3    boot4    boot5    boot6
#> Mazda RX4           20.76134 20.34704 21.239324 19.60828 21.12979 20.96082
#> Hornet 4 Drive      18.75163 19.29475 16.366901 17.61425 18.05342 18.98135
#> Merc 280            17.89915 21.65255 15.322505 18.66957 18.48299 18.09306
#> Lincoln Continental 10.54323 17.21361  9.862591 11.41671 10.01480 10.50363
#> Chrysler Imperial   10.70333 17.99750 10.009883 11.76223 10.23903 10.60520
#> Fiat 128            26.77030 26.77135 27.474993 23.85337 23.79135 25.40743
#> Fiat X1-9           27.66538 25.81630 28.044054 24.08995 24.65012 26.28904
#> Porsche 914-2       26.79507 24.71485 28.210405 23.07841 22.23800 25.33830
#> Ford Pantera L      17.80783 19.85633 19.977539 15.11513 17.53030 17.47561
#> Volvo 142E          24.50529 27.16728 26.479446 22.39905 21.61929 23.01562
#>                         boot7     boot8    boot9    boot10   boot11    boot12
#> Mazda RX4           22.869079 21.753744 16.65284 21.499026 18.54609 21.926434
#> Hornet 4 Drive      20.137220 19.802472 19.63291 18.515911 18.85144 18.973212
#> Merc 280            19.042205 19.916379 17.66827 18.766076 18.53630 17.318266
#> Lincoln Continental  9.449886  9.771935 10.64444  8.692528 10.96614  8.313098
#> Chrysler Imperial    9.446983  9.829435 11.11865  8.964457 11.69595  8.346580
#> Fiat 128            27.805458 31.486748 28.09133 24.452312 24.51643 24.757907
#> Fiat X1-9           28.673155 32.126942 27.82723 25.265440 24.51643 25.816437
#> Porsche 914-2       26.173894 25.930607 25.29917 25.917949 26.01814 28.816658
#> Ford Pantera L      18.310563 17.252871 15.61905 16.662751 15.53147 16.669863
#> Volvo 142E          25.021818 26.471130 25.48484 24.898070 24.58647 26.069353
#>                       boot13   boot14    boot15    boot16    boot17   boot18
#> Mazda RX4           22.57936 21.17064 21.723214 22.335163 21.758327 18.83813
#> Hornet 4 Drive      21.53215 19.63922 18.502808 19.958580 19.274229 21.93688
#> Merc 280            19.32290 18.09167 18.558067 18.891055 19.274863 18.76669
#> Lincoln Continental 10.08783 10.20804  9.484839  9.771742  6.778726 10.16809
#> Chrysler Imperial   10.03757 11.06782  9.727419  9.892499  7.363223 10.76310
#> Fiat 128            26.92219 27.88908 26.046254 26.421907 27.797364 20.92558
#> Fiat X1-9           27.88049 28.18078 27.042491 27.480385 28.607440 22.14549
#> Porsche 914-2       26.57926 31.30031 25.560203 26.336914 27.406556 30.85193
#> Ford Pantera L      17.14143 18.72760 17.872990 18.138485 17.059597 15.75061
#> Volvo 142E          23.86174 27.25885 23.793789 23.546842 26.453907 28.62207
#>                        boot19    boot20
#> Mazda RX4           22.490042 22.404499
#> Hornet 4 Drive      19.558860 18.551515
#> Merc 280            18.369479 18.837312
#> Lincoln Continental  8.167621  7.361824
#> Chrysler Imperial    8.338122  7.652546
#> Fiat 128            27.559192 25.187786
#> Fiat X1-9           28.766635 26.346452
#> Porsche 914-2       27.516819 23.150085
#> Ford Pantera L      18.038968 17.580329
#> Volvo 142E          24.373386 22.148463
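The matrices returned by predict() above hold one column per bootstrap model. One simple way to collapse them into a single bagged prediction per observation, sketched here as an assumption rather than as the package's own method, is to average across columns. The small matrix below copies the first three bootstrap columns for two cars from the output above; the names preds, bagged, and spread are illustrative.

```r
# First three bootstrap predictions for two test cars, copied from the
# prediction output shown earlier.
preds <- matrix(
  c(20.76134, 20.34704, 21.239324,
    18.75163, 19.29475, 16.366901),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Mazda RX4", "Hornet 4 Drive"), paste0("boot", 1:3))
)

# Average across bootstrap models to get one prediction per car.
bagged <- rowMeans(preds)

# Standard deviation across bootstrap models, as a rough indication of
# how much the individual models disagree for each car.
spread <- apply(preds, 1, sd)
```

The same two lines apply unchanged to the full 10 x 20 prediction matrix, since rowMeans() and apply() operate on however many bootstrap columns are present.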