Title: | Fast Extrapolation of Time Features using K-Nearest Neighbors |
---|---|
Description: | Fast extrapolation of univariate and multivariate time features using K-Nearest Neighbors. The compact set of hyper-parameters is tuned via grid or random search. |
Authors: | Giancarlo Vercellino |
Maintainer: | Giancarlo Vercellino <[email protected]> |
License: | GPL-3 |
Version: | 1.3.0 |
Built: | 2025-01-23 05:27:16 UTC |
Source: | https://github.com/cran/jenga |
A data frame with daily and cumulative cases of Covid infections and deaths in Europe since March 2021.
covid_in_europe
A data frame with 5 columns and 163 rows.
www.ecdc.europa.eu
Automatic projections of time features using KNN
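As a rough illustration of the idea behind KNN-based extrapolation, the base R sketch below finds the historical windows most similar to the latest one and averages what followed them, weighted by a Gaussian kernel. It is a generic teaching example under stated assumptions, not the package's internal algorithm: `knn_extrapolate`, `window` and `horizon` are hypothetical names, and the distance scaling is an arbitrary choice.

```r
# Generic KNN extrapolation sketch in base R (NOT the jenga implementation).
# knn_extrapolate, window and horizon are hypothetical names.
knn_extrapolate <- function(x, window = 4, horizon = 4, k = 3) {
  n <- length(x)
  # Last admissible start: the candidate window plus its continuation
  # of length `horizon` must fit inside the observed series.
  last_start <- n - window - horizon + 1
  stopifnot(last_start >= k)

  # The query is the most recent window of the series.
  query <- x[(n - window + 1):n]

  # Euclidean distance between every candidate window and the query.
  starts <- seq_len(last_start)
  dist <- vapply(starts, function(i) {
    sqrt(sum((x[i:(i + window - 1)] - query)^2))
  }, numeric(1))

  # Keep the k closest windows and turn distances into Gaussian kernel weights.
  nearest <- order(dist)[seq_len(k)]
  scaled <- dist[nearest] / (max(dist[nearest]) + 1e-12)
  weights <- dnorm(scaled)
  weights <- weights / sum(weights)

  # Collect the values that followed each neighbor (horizon x k).
  futures <- vapply(starts[nearest], function(i) {
    x[(i + window):(i + window + horizon - 1)]
  }, numeric(horizon))

  # Kernel-weighted average of the neighbors' continuations.
  as.numeric(futures %*% weights)
}

# A perfectly periodic toy series: the projection repeats the cycle.
toy <- rep(c(1, 2, 3, 4), 10)
projection <- knn_extrapolate(toy, window = 4, horizon = 4, k = 3)
```

On real data the neighbors will not match exactly, so closer windows contribute more to the projected values than distant ones.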
jenga(
  df,
  seq_len = NULL,
  smoother = FALSE,
  k = NULL,
  method = NULL,
  kernel = NULL,
  ci = 0.8,
  n_windows = 10,
  mode = NULL,
  n_sample = 30,
  search = "random",
  dates = NULL,
  error_scale = "naive",
  error_benchmark = "naive",
  seed = 42
)
Argument | Description |
---|---|
df | A data frame with time features on columns (numerical or categorical features, but not both). |
seq_len | Positive integer. Number of time steps in the projected sequence. Default: NULL. |
smoother | Logical. Perform optimal smoothing using standard loess (only for numerical features). Default: FALSE. |
k | Positive integer. Number of neighbors used in the kernel average. Minimum: 3. Default: NULL (automatic selection). |
method | String. Distance method for calculating neighbors. Possible options are: "euclidean", "manhattan", "minkowski". Default: NULL (automatic selection). |
kernel | String. Distribution used to calculate kernel densities. Possible options are: "norm", "cauchy", "unif", "t". Default: NULL (automatic selection). |
ci | Numeric. Confidence level for the prediction intervals. Default: 0.8. |
n_windows | Positive integer. Number of validation tests to measure/sample error. Default: 10. |
mode | String. Sequencing method: deterministic ("segmented") or non-deterministic ("sampled"). Default: NULL (automatic selection). |
n_sample | Positive integer. Number of samples for grid or random search. Default: 30. |
search | String. Two options available: "grid", "random". Default: "random". |
dates | Date. Vector with dates for time features. Default: NULL. |
error_scale | String. Scale for the scaled error metrics. Two options: "naive" (average of naive one-step absolute error for the historical series) or "deviation" (standard error of the historical series). Default: "naive". |
error_benchmark | String. Benchmark for the relative error metrics. Two options: "naive" (sequential extension of last value) or "average" (mean value of true sequence). Default: "naive". |
seed | Positive integer. Random seed. Default: 42. |
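To make the `error_scale` and `error_benchmark` options more concrete, here is a minimal base R sketch of how such scaled and relative errors could be computed. It only illustrates the definitions given above and is not taken from the package source: the helper names (`mae`, `scaled_error`, `relative_error`) are hypothetical, mean absolute error is used as an example metric, and "deviation" is assumed here to mean the standard deviation of the historical series.

```r
# Illustrative helpers (hypothetical names, not part of jenga).
mae <- function(actual, predicted) mean(abs(actual - predicted))

# Scaled error: model error divided by a scale computed on the history.
scaled_error <- function(actual, predicted, history,
                         error_scale = c("naive", "deviation")) {
  error_scale <- match.arg(error_scale)
  scale <- switch(error_scale,
    # average absolute one-step change of the historical series
    naive = mean(abs(diff(history))),
    # ASSUMPTION: "deviation" read as the standard deviation of the history
    deviation = sd(history)
  )
  mae(actual, predicted) / scale
}

# Relative error: model error divided by the error of a simple benchmark.
relative_error <- function(actual, predicted, history,
                           error_benchmark = c("naive", "average")) {
  error_benchmark <- match.arg(error_benchmark)
  benchmark <- switch(error_benchmark,
    # repeat the last observed value over the whole horizon
    naive = rep(history[length(history)], length(actual)),
    # constant forecast equal to the mean of the true sequence
    average = rep(mean(actual), length(actual))
  )
  mae(actual, predicted) / mae(actual, benchmark)
}

history   <- c(1, 2, 4, 7, 11)
actual    <- c(16, 22)
predicted <- c(15, 20)

scaled_naive     <- scaled_error(actual, predicted, history, "naive")
scaled_deviation <- scaled_error(actual, predicted, history, "deviation")
relative_naive   <- relative_error(actual, predicted, history, "naive")
relative_average <- relative_error(actual, predicted, history, "average")
```

Values below 1 indicate that the predictions are closer to the true sequence than the chosen scale or benchmark would suggest.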
This function returns a list including:
exploration: list of all models, complete with predictions, test metrics, prediction stats and plot
history: a table with the sampled models, hyper-parameters, validation errors
best_model: results for the best model, including:
predictions: min, max, q25, q50, q75, quantiles at selected ci, and different statistics for numerical and categorical variables
testing_errors: training and testing errors, both one-step and full-sequence, for each time series feature (different measures for numerical and categorical variables)
time_log
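The components listed above can be inspected with ordinary list access. The sketch below assumes the jenga package is installed and reuses the call from the package examples; only the element names documented above are accessed, and the exact shape of each element is not shown because it depends on the data.

```r
# Runs only if jenga is installed; otherwise the block is skipped.
if (requireNamespace("jenga", quietly = TRUE)) {
  library(jenga)

  # Single sampled model on two columns of the bundled dataset.
  out <- jenga(covid_in_europe[, c(2, 3)], n_sample = 1)

  # Top-level components of the returned list.
  print(names(out))

  # Sampled models with hyper-parameters and validation errors.
  print(out$history)

  # Results for the best model.
  print(out$best_model$predictions)
  print(out$best_model$testing_errors)

  # Elapsed time for the whole run.
  print(out$time_log)
}
```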
Giancarlo Vercellino [email protected]
jenga(covid_in_europe[, c(2, 3)], n_sample = 1)
jenga(covid_in_europe[, c(4, 5)], n_sample = 1)