Predictive Models with Cross Validation
CV allows the user to specify a cross validation scheme with complete flexibility in the model, data splitting function, and performance metrics, among other essential parameters.
Public fields
learner
Predictive modeling function.
scorer
List of performance metric functions.
splitter
Function that splits data into cross validation folds.
Methods
Method fit()
fit performs cross validation with user-specified parameters.
Arguments
formula
An object of class formula: a symbolic description of the model to be fitted.
data
An optional data frame, or other object containing the variables in the model. If data is not provided, how formula is handled depends on $learner.
x
Predictor data (independent variables), alternative interface to data with formula.
y
Response vector (dependent variable), alternative interface to data with formula.
progress
Logical; indicating whether to print progress across cross validation folds.
Details
fit follows standard R modeling convention by surfacing a formula modeling interface as well as an alternate matrix option. The user should use whichever interface is supported by the specified $learner function.
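The difference between the two interfaces can be illustrated with base R alone. The sketch below is not part of CV: it uses lm() (formula interface) and lm.fit() (matrix interface) on the built-in mtcars data purely as stand-ins for a learner of each kind.

```r
# Formula interface: the model is described symbolically and the
# variables are looked up in `data`.
fit_formula <- lm(mpg ~ wt + hp, data = mtcars)

# Matrix interface: predictors and response are passed directly as
# `x` and `y`. lm.fit() does not add an intercept, so one is added by hand.
x <- cbind(Intercept = 1, as.matrix(mtcars[, c("wt", "hp")]))
y <- mtcars$mpg
fit_matrix <- lm.fit(x = x, y = y)

# Compare the coefficient estimates from the two calls.
same_coefs <- all.equal(
  unname(coef(fit_formula)),
  unname(fit_matrix$coefficients)
)
```

A learner of the first kind would be used with the formula and data arguments of fit, and a learner of the second kind with the x and y arguments.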
Returns
An object of class FittedCV.
Examples
if (require(rpart) && require(rsample) && require(yardstick)) {
  # Shuffle the rows and recode the outcome as a two-level factor
  # ("FALSE"/"TRUE") indicating whether the species is virginica
  iris_new <- iris[sample(1:nrow(iris), nrow(iris)), ]
  iris_new$Species <- factor(iris_new$Species == "virginica")

  ### Basic Example
  iris_cv <- CV$new(
    learner = rpart::rpart,
    learner_args = list(method = "class"),
    splitter = rsample::vfold_cv,
    splitter_args = list(v = 3),
    scorer = list(
      "accuracy" = yardstick::accuracy_vec
    ),
    # A single set of predict() arguments, used for every metric
    prediction_args = list(type = "class")
  )
  iris_cv_fitted <- iris_cv$fit(formula = Species ~ ., data = iris_new)

  ### Example with multiple metric functions
  iris_cv <- CV$new(
    learner = rpart::rpart,
    learner_args = list(method = "class"),
    splitter = rsample::vfold_cv,
    splitter_args = list(v = 3),
    scorer = list(
      "f_meas" = yardstick::f_meas_vec,
      "accuracy" = yardstick::accuracy_vec,
      "roc_auc" = yardstick::roc_auc_vec,
      "pr_auc" = yardstick::pr_auc_vec
    ),
    # One set of predict() arguments per metric: class labels for
    # f_meas and accuracy, class probabilities for roc_auc and pr_auc
    prediction_args = list(
      "f_meas" = list(type = "class"),
      "accuracy" = list(type = "class"),
      "roc_auc" = list(type = "prob"),
      "pr_auc" = list(type = "prob")
    ),
    # One converter per metric, in the same order as `scorer`. Class
    # predictions need no conversion. For the probability metrics, keep
    # the "FALSE" column: yardstick assumes the first factor level
    # ("FALSE" here) is the event.
    convert_predictions = list(
      NULL,
      NULL,
      function(i) i[, "FALSE"],
      function(i) i[, "FALSE"]
    )
  )
  iris_cv_fitted <- iris_cv$fit(formula = Species ~ ., data = iris_new)
}
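The converter functions in the second example can be looked at in isolation. The sketch below uses a small hand-made probability matrix (hypothetical values, not output from rpart) with one column per class level, which is the shape the example's converters expect, and applies the same function(i) i[, "FALSE"] to reduce it to a single vector of probabilities.

```r
# Hypothetical class-probability matrix: one row per observation,
# one column per factor level. Values are made up for illustration.
prob <- matrix(
  c(0.9, 0.1,
    0.2, 0.8,
    0.6, 0.4),
  ncol = 2,
  byrow = TRUE,
  dimnames = list(NULL, c("FALSE", "TRUE"))
)

# Same converter as in the example: keep only the "FALSE" column.
to_event_prob <- function(i) i[, "FALSE"]

# The result is a plain numeric vector with one probability per row.
event_prob <- to_event_prob(prob)
```

Metrics such as roc_auc_vec and pr_auc_vec take a numeric vector of event probabilities as their estimate, which is why the example supplies a converter for those two metrics and NULL for the class-based ones.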
Method new()
Create a new CV object.
Usage
CV$new(
  learner = NULL,
  splitter = NULL,
  scorer = NULL,
  learner_args = NULL,
  splitter_args = NULL,
  scorer_args = NULL,
  prediction_args = NULL,
  convert_predictions = NULL
)
Arguments
learner
Function that estimates a predictive model. It is essential that this function support either a formula interface with formula and data arguments, or an alternate matrix interface with x and y arguments.
splitter
A function that computes cross validation folds from an input data set or a pre-computed list of cross validation fold indices. If splitter is a function, it must have a data argument for the input data, and it must return a list of cross validation fold indices. If splitter is a list of integers, the number of cross validation folds is length(splitter) and each element contains the indices of the data observations that are included in that fold.
scorer
A named list of metric functions to evaluate model performance on each cross validation fold. Any provided metric function must have truth and estimate arguments, for true outcome values and predicted outcome values respectively, and must return a single numeric metric value.
learner_args
A named list of additional arguments to pass to learner.
splitter_args
A named list of additional arguments to pass to splitter.
scorer_args
A named list of additional arguments to pass to scorer. scorer_args must either be length 1 or length(scorer) in the case where different arguments are being passed to each scoring function.
prediction_args
A named list of additional arguments to pass to predict. prediction_args must either be length 1 or length(scorer) in the case where different arguments are being passed to each scoring function.
convert_predictions
A list of functions to convert predicted values prior to being evaluated by the metric functions supplied in scorer. This list should either be length 1, in which case the same function will be applied to all predicted values, or length(scorer) in which case each function in convert_predictions will correspond with each function in scorer.
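The shapes described for splitter and scorer can be sketched in base R. This is an illustration only, not the package's implementation: the folds are built by hand, rmse is a hypothetical metric written for this sketch, and the loop assumes that each fold's indices mark the held-out observations.

```r
set.seed(1)
n <- nrow(mtcars)
k <- 3

# Pre-computed folds: a list of integer vectors, one per fold.
# length(folds) is the number of cross validation folds.
folds <- unname(split(sample(seq_len(n)), rep_len(seq_len(k), n)))

# A metric function with `truth` and `estimate` arguments that returns
# a single numeric value (root mean squared error).
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Hand-rolled loop, for illustration only. Assumption: the indices in
# each fold are treated as the held-out observations.
scores <- vapply(folds, function(idx) {
  model <- lm(mpg ~ wt + hp, data = mtcars[-idx, ])
  preds <- predict(model, newdata = mtcars[idx, ])
  rmse(truth = mtcars$mpg[idx], estimate = preds)
}, numeric(1))

scores  # one metric value per fold
```

A list like folds could be supplied as splitter, and a named list such as list("rmse" = rmse) as scorer, provided the chosen learner and predict arguments produce numeric predictions.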