Title: | Probabilistic Time Series Forecasting with XGBoost and Conformal Inference |
---|---|
Description: | Implements a probabilistic approach to time series forecasting combining XGBoost regression with conformal inference methods. The package provides functionality for generating predictive distributions, evaluating uncertainty, and optimizing hyperparameters using Bayesian, coarse-to-fine, or random search strategies. |
Authors: | Giancarlo Vercellino [aut, cre, cph] |
Maintainer: | Giancarlo Vercellino <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2025-03-25 05:41:46 UTC |
Source: | https://github.com/cran/xpect |
This function implements probabilistic time series forecasting by combining gradient-boosted regression (XGBoost) with conformal inference techniques. It produces predictive distributions that capture forecast uncertainty and optimizes hyperparameters through Bayesian, coarse-to-fine, or random search methods. The approach uses historical observations from the predictor series to estimate future values of a specified target series. Users can customize the forecasting model by setting parameters for model complexity, regularization, and conformal calibration.
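To make the conformal step concrete, here is a minimal base R sketch of split conformal calibration on a toy series. It is not the package's internal code: `lm()` stands in for XGBoost so the example needs no extra packages, and the variable names (`calib_err`, `q_hat`, `alpha_level`) are assumptions chosen for illustration. The chronological split and the resampling of calibration errors mirror the roles that `calib_rate` and `n_sim` are described as playing.

```r
# Split-conformal sketch in base R (assumption: lm() stands in for XGBoost).
set.seed(42)
n <- 200
y <- cumsum(rnorm(n))                        # toy random-walk series
d <- data.frame(y = y[-1], lag1 = y[-n])     # one-step-ahead design: y_t ~ y_{t-1}

calib_rate <- 0.5                            # fraction held out for calibration
n_fit <- floor(nrow(d) * (1 - calib_rate))
fit_set   <- d[seq_len(n_fit), ]             # earlier observations: model fitting
calib_set <- d[(n_fit + 1):nrow(d), ]        # later observations: calibration

model <- lm(y ~ lag1, data = fit_set)        # point-forecast model

# Calibration errors: observed minus predicted on the held-out block
calib_err <- as.numeric(calib_set$y - predict(model, newdata = calib_set))

# Point forecast for the next step, from the last observed value
point <- as.numeric(predict(model, newdata = data.frame(lag1 = y[n])))

# Standard split-conformal interval at 90% nominal coverage
alpha_level <- 0.1
m <- length(calib_err)
k <- ceiling((m + 1) * (1 - alpha_level))    # finite-sample conformal rank
q_hat <- sort(abs(calib_err))[min(k, m)]
conf_interval <- c(point - q_hat, point + q_hat)

# Simulated predictive sample: point forecast plus resampled calibration errors
n_sim <- 1000
pred_sample <- point + sample(calib_err, n_sim, replace = TRUE)
interval <- quantile(pred_sample, c(0.05, 0.95))
```

The coverage guarantee of split conformal inference relies on exchangeability, which time series data generally violate, so intervals from a sketch like this should be read as approximate.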
xpect(
  predictors,
  target,
  future,
  past = 1L,
  coverage = 0.5,
  max_depth = 3L,
  eta = 0.1,
  gamma = 0,
  alpha = 0,
  lambda = 1,
  subsample = 0.8,
  colsample_bytree = 0.8,
  search = "none",
  calib_rate = 0.5,
  n_sim = 1000,
  nrounds = 200,
  n_samples = 10,
  n_exploration = 10,
  n_phases = 3,
  top_k = 3,
  seed = 42
)
predictors | A data frame containing multiple time series predictors and the target series to forecast. |
target | Character string specifying the name of the target series to forecast within the predictors dataset. |
future | Integer specifying the number of future time steps to forecast. |
past | Integer or numeric vector specifying the past observations used as input features. A single value fixes the parameter (default: 1); NULL uses the standard range (1L-30L); two values define a custom range. |
coverage | Numeric or numeric vector for the fraction of total variance preserved during SVD. A single value fixes the parameter (default: 0.5); NULL uses the standard range (0.05-0.95); two values define a custom range. |
max_depth | Integer or numeric vector for the maximum depth of XGBoost trees. A single value fixes the parameter (default: 3); NULL uses the standard range (3L-10L); two values define a custom range. |
eta | Numeric or numeric vector for the XGBoost learning rate. A single value fixes the parameter (default: 0.1); NULL uses the standard range (0.01-0.3); two values define a custom range. |
gamma | Numeric or numeric vector for the minimum loss reduction required to split a leaf node. A single value fixes the parameter (default: 0); NULL uses the standard range (0-5); two values define a custom range. |
alpha | Numeric or numeric vector for L1 regularization strength. A single value fixes the parameter (default: 0); NULL uses the standard range (0-1); two values define a custom range. |
lambda | Numeric or numeric vector for L2 regularization strength. A single value fixes the parameter (default: 1); NULL uses the standard range (0-1); two values define a custom range. |
subsample | Numeric or numeric vector (0-1) for the instance subsampling ratio per tree. A single value fixes the parameter (default: 0.8); NULL uses the standard range (0-1); two values define a custom range. |
colsample_bytree | Numeric or numeric vector (0-1) for the column subsampling ratio per tree. A single value fixes the parameter (default: 0.8); NULL uses the standard range (0-1); two values define a custom range. |
search | Character string specifying the hyperparameter search method. Options: "none" (default), "random_search", "bayesian", "coarse_to_fine". |
calib_rate | Numeric fraction (default: 0.5) of observations allocated to conformal calibration, which influences the uncertainty estimation. |
n_sim | Integer (default: 1000) setting the number of simulated calibration error samples used during conformal inference. |
nrounds | Integer (default: 200) specifying the maximum number of boosting iterations allowed during model training. |
n_samples | Integer (default: 10) specifying the number of parameter configurations evaluated during random search or initial Bayesian sampling. |
n_exploration | Integer (default: 10) specifying the number of exploratory evaluations during Bayesian optimization, used to balance exploration and exploitation. |
n_phases | Integer (default: 3) specifying how many iterative refinement phases are performed in coarse-to-fine optimization. |
top_k | Integer (default: 3) indicating how many top-performing parameter configurations are retained in each coarse-to-fine optimization iteration. |
seed | Integer (default: 42) setting the random seed for reproducibility. |
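The tunable arguments above share one convention: a single value fixes the parameter, NULL selects the standard range, and two values give a custom range. The sketch below illustrates that convention together with a simple random search draw. The helpers `resolve_range()` and `draw_value()` are hypothetical and not part of xpect, and uniform sampling is an assumption; the package may sample candidates differently.

```r
# Hypothetical helper (not part of xpect): map an argument to a search range.
resolve_range <- function(value, standard) {
  if (is.null(value)) return(standard)             # NULL -> standard range
  if (length(value) == 1) return(rep(value, 2))    # single value -> fixed
  if (length(value) == 2) return(sort(value))      # two values -> custom range
  stop("expected NULL, one value, or two values")
}

# Hypothetical helper: draw one candidate uniformly; integer ranges stay integer.
draw_value <- function(range) {
  if (range[1] == range[2]) return(range[1])       # fixed parameter, nothing to draw
  if (is.integer(range)) return(sample(seq(range[1], range[2]), 1))
  runif(1, range[1], range[2])
}

set.seed(123)
ranges <- list(
  past      = resolve_range(c(5L, 20L), c(1L, 30L)),   # custom range
  max_depth = resolve_range(3L, c(3L, 10L)),           # fixed value
  eta       = resolve_range(NULL, c(0.01, 0.3)),       # standard range
  gamma     = resolve_range(NULL, c(0, 5))             # standard range
)

# Draw n_samples candidate configurations, one row per configuration
n_samples <- 3
configs <- do.call(rbind, lapply(seq_len(n_samples), function(i) {
  as.data.frame(lapply(ranges, draw_value))
}))
print(configs)
```

Each row of `configs` is one candidate configuration; a search loop would fit a model per row and keep the best-scoring one.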
A list containing:
A data frame logging each evaluated hyperparameter configuration and its associated cross-entropy performance against the selected benchmark.
The optimal forecasting model, including probability density functions (pdf), cumulative distribution functions (cdf), inverse cumulative distribution functions (icdf), and random sampling functions (sampler) for each point in the forecasted horizon.
A named vector detailing the selected hyperparameters of the best-performing forecasting model.
A visualization displaying the optimal forecasts alongside confidence bands derived from conformal intervals, facilitating intuitive uncertainty interpretation.
Duration tracking the computational time required for the complete optimization and model-building process.
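For readers unfamiliar with the four distribution functions mentioned above (pdf, cdf, icdf, sampler), the following base R sketch shows one way such functions could be built from a numeric sample of simulated forecasts for a single horizon step. The constructor `make_dist()` is hypothetical and the kernel density and empirical CDF choices are assumptions; the package may construct its functions differently.

```r
# Sketch: build pdf/cdf/icdf/sampler closures from a numeric sample (base R only).
make_dist <- function(x) {
  dens <- density(x)                                    # kernel density estimate
  list(
    pdf     = approxfun(dens$x, dens$y, yleft = 0, yright = 0),  # interpolated density
    cdf     = ecdf(x),                                  # empirical CDF
    icdf    = function(p) as.numeric(quantile(x, probs = p)),    # empirical quantiles
    sampler = function(n) sample(x, n, replace = TRUE)  # bootstrap draws
  )
}

set.seed(1)
dist_h1 <- make_dist(rnorm(1000, mean = 10, sd = 2))    # toy sample for one horizon step

dist_h1$cdf(10)                # estimated P(Y <= 10)
dist_h1$icdf(c(0.05, 0.95))    # bounds of a central 90% interval
dist_h1$pdf(10)                # estimated density at 10
dist_h1$sampler(5)             # five random draws
```

With functions of this kind, interval bounds at any level come from `icdf`, and exceedance probabilities from `1 - cdf(threshold)`.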
Giancarlo Vercellino [email protected]
Maintainer: Giancarlo Vercellino [email protected] [copyright holder]
dummy_data <- data.frame(
  target_series = cumsum(rnorm(100)),
  predictor1 = cumsum(rnorm(100))
)

result <- xpect(
  predictors = dummy_data,
  target = "target_series",
  future = 3,
  past = c(5L, 20L),        # custom range
  coverage = 0.9,
  max_depth = c(3L, 8L),    # custom range
  eta = c(0.01, 0.05),
  gamma = NULL,             # standard range
  alpha = NULL,             # standard range
  lambda = NULL,            # standard range
  subsample = 0.8,
  colsample_bytree = 0.8,
  search = "random_search",
  n_samples = 3,
  seed = 123
)