Title: | Automatic Stacked Ensemble for Regression Tasks |
---|---|
Description: | Stacked ensemble for regression tasks based on 'mlr3' framework with a pipeline for preprocessing numeric and factor features and hyper-parameter tuning using grid or random search. |
Authors: | Giancarlo Vercellino |
Maintainer: | Giancarlo Vercellino <[email protected]> |
License: | GPL-3 |
Version: | 1.1.0 |
Built: | 2025-03-24 05:24:26 UTC |
Source: | https://github.com/cran/sense |
A data frame for a regression task, generated with mlbench.friedman1 from the 'mlbench' package.
benchmark
A data frame with 11 columns and 150 rows.
mlbench, friedman1
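A data set of the same shape can be simulated as in the minimal sketch below. It assumes the 'mlbench' package is installed; the seed, the noise level `sd = 1`, and the column name `y` are illustrative assumptions, so the result will not reproduce the bundled `benchmark` object exactly.

```r
# Sketch: simulate a Friedman 1 regression problem with 150 rows.
# The seed and sd value are assumptions, not the settings used for `benchmark`.
set.seed(42)
sim <- mlbench::mlbench.friedman1(n = 150, sd = 1)

# sim$x is a 150 x 10 numeric matrix of features, sim$y the numeric target
df <- data.frame(sim$x, y = sim$y)
dim(df)
```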
Stacked ensemble for regression tasks based on 'mlr3' framework.
sense(
  df,
  target_feat,
  benchmarking = "all",
  super = "avg",
  algos = c("glmnet", "ranger", "xgboost", "rpart", "kknn", "svm"),
  sampling_rate = 1,
  metric = "mae",
  collapse_char_to = 10,
  num_preproc = "scale",
  fct_preproc = "one-hot",
  impute_num = "sample",
  missing_fusion = FALSE,
  inner = "holdout",
  outer = "holdout",
  folds = 3,
  repeats = 3,
  ratio = 0.5,
  selected_filter = "information_gain",
  selected_n_feats = NULL,
  tuning = "random_search",
  budget = 30,
  resolution = 5,
  n_evals = 30,
  minute_time = 10,
  patience = 0.3,
  min_improve = 0.01,
  java_mem = 64,
  decimals = 2,
  seed = 42
)
df |
A data frame with features and target. |
target_feat |
String. Name of the numeric feature for the regression task. |
benchmarking |
Positive integer or "all". Number of base learners to stack. Default: "all". |
super |
String. Super learner of choice among the available learners. Default: "avg". |
algos |
String vector. Available learners are: "glmnet", "ranger", "xgboost", "rpart", "kknn", "svm". |
sampling_rate |
Positive numeric. Sampling rate before applying the stacked ensemble. Default: 1. |
metric |
String. Evaluation metric for outer and inner cross-validation. Default: "mae". |
collapse_char_to |
Positive integer. Conversion of characters to factors with predefined maximum number of levels. Default: 10. |
num_preproc |
String. Options for scalar pre-processing: "scale" or "range". Default: "scale". |
fct_preproc |
String. Options for factor pre-processing: "encodeimpact", "encodelmer", "one-hot", "treatment", "poly", "sum", "helmert". Default: "one-hot". |
impute_num |
String. Options for imputing missing numeric values: "sample" or "hist". Default: "sample". For factors, the default mode is Out-Of-Range. |
missing_fusion |
Logical. Whether to add missing-indicator features. Default: FALSE. |
inner |
String. Cross-validation inner cycle: "holdout", "cv", "repeated_cv", "subsampling". Default: "holdout". |
outer |
String. Cross-validation outer cycle: "holdout", "cv", "repeated_cv", "subsampling". Default: "holdout". |
folds |
Positive integer. Number of folds used in "cv" and "repeated_cv". Default: 3. |
repeats |
Positive integer. Number of repetitions used in "subsampling" and "repeated_cv". Default: 3. |
ratio |
Positive numeric. Proportion (between 0 and 1) used for "holdout" and "subsampling". Default: 0.5. |
selected_filter |
String. Filters available for regression tasks: "carscore", "cmim", "correlation", "find_correlation", "information_gain", "relief", "variance". Default: "information_gain". |
selected_n_feats |
Positive integer. Number of features to select through the chosen filter. Default: NULL. |
tuning |
String. Available options are "random_search" and "grid_search". Default: "random_search". |
budget |
Positive integer. Maximum number of trials during random search. Default: 30. |
resolution |
Positive integer. Grid resolution for each hyper-parameter. Default: 5. |
n_evals |
Positive integer. Number of evaluations for termination. Default: 30. |
minute_time |
Positive integer. Maximum run time, in minutes, before termination. Default: 10. |
patience |
Positive numeric. Percentage of stagnating evaluations before termination. Default: 0.3. |
min_improve |
Positive numeric. Minimum error improvement required before termination. Default: 0.01. |
java_mem |
Positive integer. Memory allocated to Java. Default: 64. |
decimals |
Positive integer. Decimal format of prediction. Default: 2. |
seed |
Positive integer. Random seed for reproducibility. Default: 42. |
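To make the resampling arguments (inner, outer, folds, repeats, ratio) more concrete, the base-R sketch below builds the kind of row-index splits each strategy implies. It is an illustration only, not the package's internal code, and it assumes the 'mlr3' convention that ratio is the share of rows used for training. All object names are hypothetical.

```r
# Illustrative only: base-R index splits, not the package's internal code.
set.seed(42)
n <- 150  # number of rows, as in the bundled benchmark data

# "holdout" with ratio = 0.5: one random split
# (assumption: ratio is the training share, as in mlr3)
train_idx <- sample(n, size = round(0.5 * n))
test_idx <- setdiff(seq_len(n), train_idx)

# "cv" with folds = 3: every row lands in exactly one of three test folds
fold_id <- sample(rep(seq_len(3), length.out = n))
cv_test_sets <- split(seq_len(n), fold_id)

# "repeated_cv" with folds = 3 and repeats = 3: the fold assignment is redrawn 3 times
repeated_cv <- replicate(
  3,
  split(seq_len(n), sample(rep(seq_len(3), length.out = n))),
  simplify = FALSE
)

# "subsampling" with ratio = 0.5 and repeats = 3: three independent holdout splits
subsamples <- replicate(3, sample(n, size = round(0.5 * n)), simplify = FALSE)
```

With holdout the model is fitted once per cycle, whereas "repeated_cv" with the defaults above would fit 3 x 3 = 9 times, so the choice of strategy directly affects run time.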
This function returns a list including:
benchmark_error: comparison between the base learners
resampled_model: mlr3 standard description of the analytic pipeline.
plot: mlr3 standard graph of the analytic pipeline.
selected_n_feats: selected features and score according to the filtering method used.
model_error: error measure for outer cycle of cross-validation.
testing_frame: data set used for calculating the test metrics.
test_metrics: metrics reported are mse, rmse, mae, mape, mdae, rae, rse, rrse, smape.
model_predict: prediction function to apply to new data on the same scheme.
time_log: computation time.
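For readers unfamiliar with the abbreviations in test_metrics, the sketch below computes them in base R. The helper `regression_metrics` is hypothetical, and the formulas are the commonly used definitions (as in 'mlr3measures'); that the package computes them in exactly this way is an assumption.

```r
# Hypothetical helper: common definitions of the reported metrics (an assumption,
# not code taken from the package).
regression_metrics <- function(truth, response) {
  err <- truth - response
  dev <- truth - mean(truth)  # errors of a baseline that always predicts the mean
  rse <- sum(err^2) / sum(dev^2)
  c(
    mse   = mean(err^2),                     # mean squared error
    rmse  = sqrt(mean(err^2)),               # root mean squared error
    mae   = mean(abs(err)),                  # mean absolute error
    mape  = mean(abs(err / truth)),          # mean absolute percentage error
    mdae  = median(abs(err)),                # median absolute error
    rae   = sum(abs(err)) / sum(abs(dev)),   # relative absolute error
    rse   = rse,                             # relative squared error
    rrse  = sqrt(rse),                       # root relative squared error
    smape = mean(2 * abs(err) / (abs(truth) + abs(response)))  # symmetric mape
  )
}

m <- regression_metrics(truth = c(1, 2, 3, 4), response = c(1, 2, 3, 8))
print(round(m, 3))
```

Under these definitions, rae, rse and rrse compare the model with a mean-only baseline, so values below 1 indicate an improvement over that baseline.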
Giancarlo Vercellino [email protected]
## Not run:
sense(benchmark, "y", algos = c("glmnet", "rpart"))
## End(Not run)