Package 'sense' reference manual

Title:	Automatic Stacked Ensemble for Regression Tasks
Description:	Stacked ensemble for regression tasks based on 'mlr3' framework with a pipeline for preprocessing numeric and factor features and hyper-parameter tuning using grid or random search.
Authors:	Giancarlo Vercellino
Maintainer:	Giancarlo Vercellino <[email protected]>
License:	GPL-3
Version:	1.1.0
Built:	2025-03-24 05:24:26 UTC
Source:	https://github.com/cran/sense

benchmark data set

Description

A data frame for a regression task, generated with the friedman1 generator from 'mlbench'.

Usage

benchmark

Format

A data frame with 11 columns and 150 rows.

Source

mlbench, friedman1
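
For orientation, the Friedman 1 benchmark draws ten independent uniform features on [0, 1] and builds the target from the first five only, plus Gaussian noise. The base R sketch below writes that generator out by hand. The function name `make_friedman1`, the feature names `x1` to `x10`, the seed, and the noise level are assumptions for illustration, so the output will not reproduce the values stored in `benchmark`.

```r
# Hand-written Friedman 1 generator (illustrative; not the package's code).
make_friedman1 <- function(n = 150, noise_sd = 1, seed = 42) {
  set.seed(seed)
  # Ten independent U(0, 1) features; only the first five affect the target.
  x <- matrix(runif(n * 10), nrow = n, ncol = 10)
  y <- 10 * sin(pi * x[, 1] * x[, 2]) +
    20 * (x[, 3] - 0.5)^2 +
    10 * x[, 4] +
    5 * x[, 5] +
    rnorm(n, sd = noise_sd)
  # Feature names are assumed; "y" follows the target name in the manual's example.
  out <- as.data.frame(x)
  names(out) <- paste0("x", 1:10)
  out$y <- y
  out
}

toy <- make_friedman1()
dim(toy)  # 150 rows, 11 columns
```

A frame built this way has the same shape as the documented data set and can stand in for it when experimenting with the arguments of `sense`.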

sense

Description

Stacked ensemble for regression tasks based on the 'mlr3' framework.

Usage

sense(
  df,
  target_feat,
  benchmarking = "all",
  super = "avg",
  algos = c("glmnet", "ranger", "xgboost", "rpart", "kknn", "svm"),
  sampling_rate = 1,
  metric = "mae",
  collapse_char_to = 10,
  num_preproc = "scale",
  fct_preproc = "one-hot",
  impute_num = "sample",
  missing_fusion = FALSE,
  inner = "holdout",
  outer = "holdout",
  folds = 3,
  repeats = 3,
  ratio = 0.5,
  selected_filter = "information_gain",
  selected_n_feats = NULL,
  tuning = "random_search",
  budget = 30,
  resolution = 5,
  n_evals = 30,
  minute_time = 10,
  patience = 0.3,
  min_improve = 0.01,
  java_mem = 64,
  decimals = 2,
  seed = 42
)

Arguments

`df`	A data frame with features and target.
`target_feat`	String. Name of the numeric feature for the regression task.
`benchmarking`	Positive integer or "all". Number of base learners to stack. Default: "all".
`super`	String. Super learner of choice among the available learners. Default: "avg".
`algos`	String vector. Available learners are: "glmnet", "ranger", "xgboost", "rpart", "kknn", "svm".
`sampling_rate`	Positive numeric. Sampling rate before applying the stacked ensemble. Default: 1.
`metric`	String. Evaluation metric for outer and inner cross-validation. Default: "mae".
`collapse_char_to`	Positive integer. Conversion of characters to factors with predefined maximum number of levels. Default: 10.
`num_preproc`	String. Options for numeric pre-processing: "scale" or "range". Default: "scale".
`fct_preproc`	String. Options for factor pre-processing: "encodeimpact", "encodelmer", "one-hot", "treatment", "poly", "sum", "helmert". Default: "one-hot".
`impute_num`	String. Options for imputing missing numeric values: "sample" or "hist". Default: "sample". For factors, the default mode is out-of-range.
`missing_fusion`	Logical. Whether to add missing-indicator features. Default: FALSE.
`inner`	String. Cross-validation inner cycle: "holdout", "cv", "repeated_cv", "subsampling". Default: "holdout".
`outer`	String. Cross-validation outer cycle: "holdout", "cv", "repeated_cv", "subsampling". Default: "holdout".
`folds`	Positive integer. Number of folds used in "cv" and "repeated_cv". Default: 3.
`repeats`	Positive integer. Number of repetitions used in "subsampling" and "repeated_cv". Default: 3.
`ratio`	Positive numeric. Split ratio, expressed as a proportion, for "holdout" and "subsampling". Default: 0.5.
`selected_filter`	String. Filters available for regression tasks: "carscore", "cmim", "correlation", "find_correlation", "information_gain", "relief", "variance". Default: "information_gain".
`selected_n_feats`	Positive integer. Number of features to select through the chosen filter. Default: NULL.
`tuning`	String. Available options are "random_search" and "grid_search". Default: "random_search".
`budget`	Positive integer. Maximum number of trials during random search. Default: 30.
`resolution`	Positive integer. Grid resolution for each hyper-parameter. Default: 5.
`n_evals`	Positive integer. Number of evaluations before termination. Default: 30.
`minute_time`	Positive integer. Maximum run time, in minutes, before termination. Default: 10.
`patience`	Positive numeric. Proportion of stagnating evaluations before termination. Default: 0.3.
`min_improve`	Positive numeric. Minimum error improvement required before termination. Default: 0.01.
`java_mem`	Positive integer. Memory allocated to Java. Default: 64.
`decimals`	Positive integer. Decimal format of prediction. Default: 2.
`seed`	Positive integer. Default: 42.
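
As a sketch of how the arguments above combine, the call below switches both resampling cycles to 3-fold cross-validation and uses grid search for tuning. Every value comes from the options listed in the argument table. The `settings` list is a hypothetical convenience, and loading the data with `data("benchmark", package = "sense")` is an assumption about how the data set is shipped. The fit is guarded so that it only runs in an interactive session with 'sense' installed, since it may take several minutes.

```r
# Settings taken from the argument table; all values are documented options.
settings <- list(
  target_feat = "y",
  algos = c("glmnet", "rpart"),
  inner = "cv",
  outer = "cv",
  folds = 3,
  tuning = "grid_search",
  resolution = 5,
  seed = 42
)

# Guarded: the (potentially slow) fit runs only interactively and only if
# the package is installed.
if (interactive() && requireNamespace("sense", quietly = TRUE)) {
  data("benchmark", package = "sense")
  fit <- do.call(sense::sense, c(list(df = benchmark), settings))
  names(fit)
}
```

Arguments left out of `settings` keep the defaults shown in the Usage section.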

Value

This function returns a list including:

benchmark_error: comparison between the base learners.
resampled_model: mlr3 standard description of the analytic pipeline.
plot: mlr3 standard graph of the analytic pipeline.
selected_n_feats: selected features and score according to the filtering method used.
model_error: error measure for outer cycle of cross-validation.
testing_frame: data set used for calculating the test metrics.
test_metrics: metrics reported are mse, rmse, mae, mape, mdae, rae, rse, rrse, smape.
model_predict: prediction function to apply to new data on the same scheme.
time_log: computation time.
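
The abbreviations listed under test_metrics can be made concrete with a small base R sketch. The formulas below are assumed to follow the usual definitions (as in 'mlr3measures'); the manual does not state how the package computes or scales them, and `regression_metrics` is a hypothetical helper, not part of 'sense'.

```r
# Error metrics computed by hand; definitions are assumed, not taken from 'sense'.
regression_metrics <- function(truth, response) {
  err <- truth - response
  dev <- truth - mean(truth)  # deviations from the mean-only baseline
  c(
    mse   = mean(err^2),                    # mean squared error
    rmse  = sqrt(mean(err^2)),              # root mean squared error
    mae   = mean(abs(err)),                 # mean absolute error
    mape  = mean(abs(err / truth)),         # undefined if any truth is 0
    mdae  = median(abs(err)),               # median absolute error
    rae   = sum(abs(err)) / sum(abs(dev)),  # relative absolute error
    rse   = sum(err^2) / sum(dev^2),        # relative squared error
    rrse  = sqrt(sum(err^2) / sum(dev^2)),  # root relative squared error
    smape = mean(2 * abs(err) / (abs(truth) + abs(response)))
  )
}

truth <- c(1, 2, 3, 4)
response <- c(1.5, 2, 2.5, 4)
round(regression_metrics(truth, response), 3)
```

The relative measures (rae, rse, rrse) compare the model against a baseline that always predicts the mean of the observed target, so values below 1 indicate an improvement over that baseline.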

Author(s)

Giancarlo Vercellino <[email protected]>

Examples

## Not run: 
sense(benchmark, "y", algos = c("glmnet", "rpart"))
## End(Not run)
