diff --git a/NEWS.md b/NEWS.md
index ed7ab225..1b903715 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -6,6 +6,8 @@
 
 * Fixed example for `nested_cv()` (@seb09, #520).
 
+* `rolling_origin()` is now superseded by `sliding_window()`, `sliding_index()`, and `sliding_period()` which provide more flexibility and control (@nmercadeb, #524).
+
 * Removed trailing space in printing of `mc_cv()` objects (@ccani007, #464).
 
 * Improved documentation for `initial_split()` and friends (@laurabrianna, #519).
diff --git a/R/rolling_origin.R b/R/rolling_origin.R
index 1a49993b..a9099d46 100644
--- a/R/rolling_origin.R
+++ b/R/rolling_origin.R
@@ -1,9 +1,18 @@
 #' Rolling Origin Forecast Resampling
 #'
+#' @description
+#' `r lifecycle::badge("superseded")`
+#'
 #' This resampling method is useful when the data set has a strong time
 #' component. The resamples are not random and contain data points that are
 #' consecutive values. The function assumes that the original data set are
 #' sorted in time order.
+#'
+#' This function is superseded by [sliding_window()], [sliding_index()], and
+#' [sliding_period()] which provide more flexibility and control. Superseded
+#' functions will not go away, but active development will be focused on the new
+#' functions.
+#'
 #' @details The main options, `initial` and `assess`, control the number of
 #' data points from the original data that are in the analysis and assessment
 #' set, respectively. When `cumulative = TRUE`, the analysis set will grow as
@@ -59,6 +68,13 @@
 #' @export
 rolling_origin <- function(data, initial = 5, assess = 1, cumulative = TRUE, skip = 0, lag = 0, ...) {
+
+  lifecycle::signal_stage(
+    stage = "superseded",
+    what = "rolling_origin()",
+    with = I("`sliding_window()`, `sliding_index()` and `sliding_period()`")
+  )
+
   check_dots_empty()
   n <- nrow(data)
diff --git a/man/rolling_origin.Rd b/man/rolling_origin.Rd
index 27597678..48f4c39e 100644
--- a/man/rolling_origin.Rd
+++ b/man/rolling_origin.Rd
@@ -42,10 +42,17 @@ and a column called \code{id} that has a character string with the
 resample identifier.
 }
 \description{
+\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#superseded}{\figure{lifecycle-superseded.svg}{options: alt='[Superseded]'}}}{\strong{[Superseded]}}
+
 This resampling method is useful when the data set has a strong time
 component. The resamples are not random and contain data points that are
 consecutive values. The function assumes that the original data set are
 sorted in time order.
+
+This function is superseded by \code{\link[=sliding_window]{sliding_window()}}, \code{\link[=sliding_index]{sliding_index()}}, and
+\code{\link[=sliding_period]{sliding_period()}} which provide more flexibility and control. Superseded
+functions will not go away, but active development will be focused on the new
+functions.
 }
 \details{
 The main options, \code{initial} and \code{assess}, control the number of
diff --git a/vignettes/Common_Patterns.Rmd b/vignettes/Common_Patterns.Rmd
index 039cf0ac..a7c51356 100644
--- a/vignettes/Common_Patterns.Rmd
+++ b/vignettes/Common_Patterns.Rmd
@@ -223,11 +223,13 @@ sliding_period(Chicago, date, "year") %>%
   head(2)
 ```
 
-All of these functions produce analysis sets of the same size, with the start and end of the analysis set "sliding" down your data frame. If you'd rather have your analysis set get progressively larger, so that you're predicting new data based upon a growing set of older observations, you can use the `rolling_origin()` function:
+All of these functions produce analysis sets of the same size, with the start and end of the analysis set "sliding" down your data frame. If you'd rather have your analysis set get progressively larger, so that you're predicting new data based upon a growing set of older observations, you can use the `sliding_window()` function with `lookback = Inf`:
 
 ```{r}
-rolling_origin(Chicago) %>%
+sliding_window(Chicago, lookback = Inf) %>%
   head(2)
 ```
 
+This is commonly referred to as "evaluation on a rolling forecasting origin", or more colloquially, "rolling origin cross-validation".
+
 Note that all of these time-based resampling functions are deterministic: unlike the rest of the package, running these functions repeatedly under different random seeds will always return the same results.