Title: | Plausibility Analysis of Epidemiological Signals |
---|---|
Description: | Provides functionality to prepare data and analyze plausibility of both forecasted and reported epidemiological signals. The functions implement a set of plausibility algorithms that are agnostic to geographic and time resolutions and are calculated independently then presented as a combined score. |
Authors: | VP Nagraj [aut, cre] , Desiree Williams [aut], Amy Benefield [aut] |
Maintainer: | VP Nagraj <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2024-11-12 04:27:22 UTC |
Source: | https://github.com/signaturescience/rplanes |
This unexported helper is used internally in valid_dates to optionally issue a warning for potential completeness of seed and signal data based on dates provided.
check_incomplete(seed_date, signal_date, resolution)
check_incomplete(seed_date, signal_date, resolution)
seed_date |
Last date available in seed object |
signal_date |
First date available in signal object |
resolution |
Character vector specifying the temporal resolution (e.g., "weeks", "months") |
Operates as side-effect and returns a warning()
if there are the seed and signal dates combined indicate an incomplete week or month.
This unexported helper function is used within plane_shape()
to generate sliding windows from a vector and return a data frame where each row is a subset (a sliding window) of a time series. The length of the each windowed time series (and therefore number of columns) is equal to "window_size". The number of windows is equal to (length(vector) - window_size) + 1
. For example, given a time series of length 38 and a window size of length 4, then there will be 35 windowed time series (rows), with 4 time stamps each (columns).
create_sliding_windows_df(vector, window_size)
create_sliding_windows_df(vector, window_size)
vector |
A numeric or integer vector that is the time series to be used to create sliding windows |
window_size |
An integer specifying the size (i.e., number of elements) of the windowed time series desired |
A data.frame
where each row is a subset (a sliding window) of a time series.
This unexported helper function takes an input number for an observed difference and cuts it into a categorical description (e.g., "increase", "decrease", or "stable") of the change.
cutter(x, threshold = 1)
cutter(x, threshold = 1)
x |
Vector of length 1 with scaled difference to be categorized |
threshold |
Limit used to define the categorical differences; default is |
Character vector of length 1 with the categorical description of difference
This unexported helper identifies the date of the first day for the epiweek of the given date. The function is used internally inside of valid_dates.
epiweek_start(date)
epiweek_start(date)
date |
Date to be queried |
Date of the first day of the epiweek for the input date.
This unexported helper function is used to identify the shape in the plane_shape()
function's scaled difference ("sdiff") method.
get_shapes(input_data, window_size)
get_shapes(input_data, window_size)
input_data |
A data frame containing at least two columns, one of which must be named "value" with the value assessed and another named "dates" with the date for the observed data |
window_size |
The number of of categorical differences used to define the shape |
A vector with the shapes identified. Each element of the vector will include a shape, which is a cluster of categorical differences (of the same size as the specified "window_size") collapsed with ";" (e.g., c("decrease;stable;stable;stable","stable;stable;stable;increase","stable;stable;increase;increase")
).
This function checks if the object is of class signal
and forecast
.
is_forecast(x)
is_forecast(x)
x |
Input object to be checked |
Logical as to whether or not the input object inherits the "signal" and "forecast" classes.
## get path to example forecast file fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") ex_forecast <- read_forecast(fp) sig <- to_signal(ex_forecast, outcome="flu.admits", type="forecast", horizon=4, resolution="weeks") is_forecast(sig)
## get path to example forecast file fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") ex_forecast <- read_forecast(fp) sig <- to_signal(ex_forecast, outcome="flu.admits", type="forecast", horizon=4, resolution="weeks") is_forecast(sig)
This function checks if the object is of class signal
and observed
.
is_observed(x)
is_observed(x)
x |
Input object to be checked |
Logical as to whether or not the input object inherits the "signal" and "observed" classes.
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") sig <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") is_observed(sig)
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") sig <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") is_observed(sig)
This unexported helper identifies the date of the first day of the month for the given date. The function is used internally inside of valid_dates.
month_start(date)
month_start(date)
date |
Date to be queried |
Date of the first day of the month for the input date.
This function evaluates whether or not the evaluated signal interval covers the last observed value. The interval used in this plausibility component is drawn from the upper and lower bounds of the forecasted prediction interval. As such, the only accepted signal format is forecast, which will include upper and lower bounds.
plane_cover(location, input, seed)
plane_cover(location, input, seed)
location |
Character vector with location code; the location must appear in input and seed |
input |
Input signal data to be scored; object must be one of forecast |
seed |
Prepared seed |
A list
with the following values:
indicator: Logical as to whether or not the last value falls outside of the interval (e.g., not in between lower and upper bounds of prediction interval) of the evaluated signal
last_value: A vector with the last value recorded in the seed
bounds: A list with a two elements corresponding to the upper and lower bounds of the evaluated signal interval
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_cover(location = "08", input = prepped_forecast, seed = prepped_seed) plane_cover(location = "47", input = prepped_forecast, seed = prepped_seed)
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_cover(location = "08", input = prepped_forecast, seed = prepped_seed) plane_cover(location = "47", input = prepped_forecast, seed = prepped_seed)
This function implements the point-to-point difference plausibility component. Differences in evaluated signals are calculated from input values iteratively subtracted from the previous values (i.e., for each x at time point i, the difference will be calculated as xi - xi-1). The plausibility analysis uses the evaluated differences to compare against the maximum difference observed and recorded in the seed.
plane_diff(location, input, seed)
plane_diff(location, input, seed)
location |
Character vector with location code; the location must appear in input and seed |
input |
Input signal data to be scored; object must be one of forecast or observed |
seed |
Prepared seed |
A list
with the following values:
indicator: Logical as to whether or not the absolute value of any of the evaluated differences exceeds the maximum difference
values: A vector with the values assessed including the last value in seed concatenated with the evaluated signal values
evaluated_differences: A vector with the consecutive differences for the values
maximum_difference: A vector with one value for the maximum difference observed in seed
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_diff(location = "10", input = prepped_forecast, seed = prepped_seed) plane_diff(location = "51", input = prepped_forecast, seed = prepped_seed)
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_diff(location = "10", input = prepped_forecast, seed = prepped_seed) plane_diff(location = "51", input = prepped_forecast, seed = prepped_seed)
This function evaluates whether consecutive values in observations or forecasts are repeated a k number of times. This function takes in a forecast or observed object that is either from an observed dataset or forecast dataset. Note that if a signal is contant (i.e., the same value is repeated for all time points) then the repeat component will return FALSE
.
plane_repeat(location, input, seed, tolerance = NULL, prepend = NULL)
plane_repeat(location, input, seed, tolerance = NULL, prepend = NULL)
location |
Character vector with location code; the location must appear in input and seed |
input |
Input signal data to be scored; object must be one of forecast or observed |
seed |
Prepared seed |
tolerance |
Integer value for the number of allowed repeats before flag is raised. Default is |
prepend |
Integer value for the number of values from seed to add before the evaluated signal. Default is |
A list
with the following values:
indicator: Logical as to whether or not the value is repeated sequentially k number of times.
repeats: A tibble
with repeating values found. If there are no repeats (i.e., indicator is FALSE
) then the tibble
will have 0 rows.
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component ## use defaults plane_repeat(location = "12", input = prepped_forecast, seed = prepped_seed) ## set tolerated repeats to 2 plane_repeat(location = "12", input = prepped_forecast, seed = prepped_seed, tolerance = 2) ## use defaults plane_repeat(location = "49", input = prepped_forecast, seed = prepped_seed) ## set number of values prepended for evaluation to 4 plane_repeat(location = "49", input = prepped_forecast, seed = prepped_seed, prepend = 4)
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component ## use defaults plane_repeat(location = "12", input = prepped_forecast, seed = prepped_seed) ## set tolerated repeats to 2 plane_repeat(location = "12", input = prepped_forecast, seed = prepped_seed, tolerance = 2) ## use defaults plane_repeat(location = "49", input = prepped_forecast, seed = prepped_seed) ## set number of values prepended for evaluation to 4 plane_repeat(location = "49", input = prepped_forecast, seed = prepped_seed, prepend = 4)
This function wraps PLANES scoring for specified components across all locations in single step.
plane_score(input, seed, components = "all", args = NULL, weights = NULL)
plane_score(input, seed, components = "all", args = NULL, weights = NULL)
input |
Input signal data to be scored; object must be one of forecast or observed |
seed |
Prepared seed |
components |
Character vector specifying component; must be either |
args |
Named list of arguments for component functions. List elements must be named to match the given component and arguments passed as a nested list (e.g., |
weights |
Named vector with weights to be applied; default is |
A list
with scoring results for all locations.
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane scoring with all components plane_score(input = prepped_forecast, seed = prepped_seed) ## run plane scoring with select components plane_score(input = prepped_forecast, seed = prepped_seed, components = c("cover","taper")) ## run plane scoring with all components and additional args trend_args <- list("sig_lvl" = 0.05) repeat_args <- list("prepend" = 4, "tolerance" = 8) shape_args <- list("method" = "dtw") comp_args <- list("trend" = trend_args, "repeat" = repeat_args, "shape" = shape_args) plane_score(input = prepped_forecast, seed = prepped_seed, args = comp_args) ## run plane scoring with specific components and weights comps <- c("cover", "taper", "diff") wts <- c("cover" = 1.5, "taper" = 1, "diff" = 4) plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts)
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane scoring with all components plane_score(input = prepped_forecast, seed = prepped_seed) ## run plane scoring with select components plane_score(input = prepped_forecast, seed = prepped_seed, components = c("cover","taper")) ## run plane scoring with all components and additional args trend_args <- list("sig_lvl" = 0.05) repeat_args <- list("prepend" = 4, "tolerance" = 8) shape_args <- list("method" = "dtw") comp_args <- list("trend" = trend_args, "repeat" = repeat_args, "shape" = shape_args) plane_score(input = prepped_forecast, seed = prepped_seed, args = comp_args) ## run plane scoring with specific components and weights comps <- c("cover", "taper", "diff") wts <- c("cover" = 1.5, "taper" = 1, "diff" = 4) plane_score(input = prepped_forecast, seed = prepped_seed, components = comps, weights = wts)
This function wraps the seed_engine to operate across all locations in the input signal.
plane_seed(input, cut_date = NULL)
plane_seed(input, cut_date = NULL)
input |
Input signal data used for seeding; must be an observed signal object |
cut_date |
Maximum date (inclusive) for which seeding should be performed; default is |
A named list
of length n, where multiple elements corresponding to seed characteristics and metadata for each of the n locations are nested in independent lists.
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## prepare seed with no cut date plane_seed(prepped_observed) ## prepare seed with cut date plane_seed(prepped_observed, cut_date = "2022-10-29")
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## prepare seed with no cut date plane_seed(prepped_observed) ## prepare seed with cut date plane_seed(prepped_observed, cut_date = "2022-10-29")
This function identifies the shape of the trajectory for a forecasted signal to compare against existing shapes in seed data. If the shape is identified as novel, a flag is raised, and the signal is considered implausible. See the Details section for further information.
plane_shape(location, input, seed, method = "sdiff")
plane_shape(location, input, seed, method = "sdiff")
location |
Character vector with location code; the location must appear in input and seed |
input |
Input signal data to be scored; object must be one of forecast |
seed |
Prepared seed |
method |
The method for determining shapes; must be one of "sdiff" or "dtw" (see Details); default is "sdiff" |
The approach for determining shapes can be customized by the user with the plane_shape()
"method" argument. The two methods available are "sdiff" (default) and "dtw". Compared with "sdiff", the "dtw" method has been shown to have a higher sensitivity, lower specificity, and much greater computational cost in some circumstances. The "sdiff" method is recommended if computational efficiency is a concern.
The "sdiff" method will use consecutive scaled differences to construct shapes. The algorithm operates in three steps:
The prepared seed data is combined with forecasted point estimates and each point-to-point difference is calculated.
The differences are centered and scaled, then cut into categories. Differences greater than or equal to one standard deviation above the mean of differences are considered an "increase". Differences less than or equal to one standard deviation below the mean of differences are considered a "decrease". All other differences are considered "stable".
The categorical differences are then combined into windows of equal size to the forecasted horizon. Collectively these combined categorical differences create a "shape" (e.g., "increase;stable;stable;decrease").
Lastly, the algorithm compares the shape for the forecast to all of the shapes observed. If the shape assessed has not been previously observed in the time series then a flag is raised and the indicator returned is TRUE
.
The "dtw" method uses a Dynamic Time Warping (DTW) algorithm to identify shapes within the seed data and then compares the shape of the forecast input signal to the observed shapes. This is done in three broad steps:
The prepared seed data is divided into a set of sliding windows with a step size of one, each representing a section of the overall time series. The length of these windows is determined by the horizon length of the input data signal (e.g., 2 weeks). For example, if the seed data was a vector, c(1, 2, 3, 4, 5)
, and the horizon length was 2, then the sliding windows for the observed seed data would be: c(1, 2)
, c(2, 3)
, c(3, 4)
, and c(4, 5)
. Each sliding window is a subset of the total trajectory shape of the observed data.
Shape-based DTW distances are calculated for every 1x1 combination of the observed sliding windows and are stored in a distance matrix. These distances calibrate the function for identifying outlying shapes in forecast data. The algorithm finds the minimum distances for each windowed time series to use as a baseline for "observed distances" between chunks of the larger observed time series. The maximum of those minimum distances across the observed time series is set as the threshold. If the minimum of the forecast:observed distance matrix is greater than the threshold, then the forecast is inferred to be unfamiliar (i.e., a novel shape).
Next, the algorithm calculates the shape-based DTW distances between the forecast signal (including the point estimate, lower, and upper bounds) and every observed sliding window. If the distance between the forecast and any observed sliding window is less than or equal to the threshold defined above, then this shape is not novel and no flag is raised (indicator is FALSE
).
A list
with the following values:
indicator: Logical as to whether or not the the shape of the evaluated signal is novel (TRUE
if shape is novel, FALSE
if a familiar shape exists in the seed)
Toni Giorgino. Computing and Visualizing Dynamic Time Warping Alignments in R: The dtw Package. Journal of Statistical Software, 31(7), 1-24. doi:10.18637/jss.v031.i07
Tormene, P.; Giorgino, T.; Quaglini, S. & Stefanelli, M. Matching incomplete time series with dynamic time warping: an algorithm and an application to post-stroke rehabilitation. Artif Intell Med, 2009, 45, 11-34. doi:10.1016/j.artmed.2008.11.007
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) tmp_hosp <- hosp %>% dplyr::select(date, location, flu.admits) %>% dplyr::mutate(date = as.Date(date)) prepped_observed <- to_signal(tmp_hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal prepped_forecast <- read_forecast(system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_shape(location = "37", input = prepped_forecast, seed = prepped_seed) ## run plane component with DTW method plane_shape(location = "37", input = prepped_forecast, seed = prepped_seed, method = "dtw")
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) tmp_hosp <- hosp %>% dplyr::select(date, location, flu.admits) %>% dplyr::mutate(date = as.Date(date)) prepped_observed <- to_signal(tmp_hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal prepped_forecast <- read_forecast(system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_shape(location = "37", input = prepped_forecast, seed = prepped_seed) ## run plane component with DTW method plane_shape(location = "37", input = prepped_forecast, seed = prepped_seed, method = "dtw")
This function evaluates whether or not the evaluated signal interval tapers (i.e., decreases in width) as horizons progress. The interval used in this plausibility component is drawn from the upper and lower bounds of the forecasted prediction interval. As such, the only accepted signal format is forecast, which will include upper and lower bounds.
plane_taper(location, input, seed)
plane_taper(location, input, seed)
location |
Character vector with location code; the location must appear in input and seed |
input |
Input signal data to be scored; object must be one of forecast |
seed |
Prepared seed |
A list
with the following values:
indicator: Logical as to whether or not the prediction interval width tapers with advancing horizons
widths: Consecutive interval widths for forecasted data
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_taper(location = "19", input = prepped_forecast, seed = prepped_seed) plane_taper(location = "44", input = prepped_forecast, seed = prepped_seed)
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_taper(location = "19", input = prepped_forecast, seed = prepped_seed) plane_taper(location = "44", input = prepped_forecast, seed = prepped_seed)
This function identifies any change points in the forecast data or in the final observed data point. Change points are identified by any significant change in magnitude or direction of the slope of the time series.
plane_trend(location, input, seed, sig_lvl = 0.1)
plane_trend(location, input, seed, sig_lvl = 0.1)
location |
Character vector with location code; the location must appear in input and seed |
input |
Input signal data to be scored; object must be forecast |
seed |
Prepared seed |
sig_lvl |
The significance level at which to identify change points (between zero and one); default is |
This function uses e.divisive(), which implements a hierarchical divisive algorithm to identify change points based on distances between segments (calculated using equations 3 and 5 in Matteson and James, 2014; the larger the distance, the more likely a change point). Then a permutation test is used to calculate an approximate p-value.
The input to e.divisive()
is transformed using differencing (i.e., diff(x)
instead of the raw data, x
). This slightly changes the way that change points are identified, as the index aligns with the gap between points rather than the points themselves. Instead of identifying a change point based on the change in size between two points, it identifies change points based on the change in the change itself. The dataframe below illustrates the difference between x
and diff(x)
:
Index | x | diff(x) |
1 | 3 | 6 |
2 | 9 | 0 |
3 | 9 | 28 |
4 | 37 | 37 |
5 | 74 | 1 |
6 | 75 | 0 |
7 | 75 | 0 |
Given this data, e.divisive(x)
would identify index 5 (74) as the change point, because there was a jump of +37 between index 4 and 5. But e.divisive(diff(x))
would pick both index 3 (28) and 5 (1), because there was a jump of +28 from index 2 and 3, and there was a jump of -36 between index 4 and 5.
Internally, the trend function uses an extra argument to e.divisive()
for min.size = 2
, which requires a gap of at least 2 points between detecting change points. This can indirectly increase the significance level or decrease the number of change points identified.
A list
with the following values:
indicator: Logical as to whether or not the any forecast data or the final observed data point are a significant change point
output: An n x 7 tibble. The length of the forecast plus the observed data determine the length of n. The columns are:
Location: A character vector with the location code
Index: An integer index of all observed and forecast data
Date: The dates corresponding to all observed and forecast data (formatted as date)
Value: The incidence of all observed and forecast data (e.g., hospitalization rates)
Type: Indicates whether the data row is observed or forecast data
Changepoint: Logical identifying any change point (whether in observed or forecast data). A TRUE is returned if any point is determined a change point based on the user defined significance level (sig_lvl).
Flagged: Logical indicating whether or not the change point was flagged. Change points are only flagged if they are in the forecast data or are the final observed data point. A TRUE is returned if the Changepoint is TRUE and is a final observed data point or any forecast point.
flagged_dates: The date of any flagged change point(s). If there are none, NA is returned
Matteson, D. S., & James, N. A. (2014). A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109(505), 334–345. https://doi.org/10.1080/01621459.2013.849605
Matteson DS, James NA (2013). “A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.” ArXiv e-prints. To appear in the Journal of the American Statistical Association, 1306.4933.
Gandy, A. (2009) "Sequential implementation of Monte Carlo tests with uniformly bounded resampling risk." Journal of the American Statistical Association.
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) tmp_hosp <- hosp %>% dplyr::select(date, location, flu.admits) %>% dplyr::mutate(date = as.Date(date)) prepped_observed <- to_signal(tmp_hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal prepped_forecast <- read_forecast(system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_trend(location = "05", input = prepped_forecast, seed = prepped_seed, sig_lvl = .2) ## change location plane_trend(location = "09", input = prepped_forecast, seed = prepped_seed, sig_lvl = .2) ## change sig_lvl plane_trend(location = "06", input = prepped_forecast, seed = prepped_seed, sig_lvl = .05)
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) tmp_hosp <- hosp %>% dplyr::select(date, location, flu.admits) %>% dplyr::mutate(date = as.Date(date)) prepped_observed <- to_signal(tmp_hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal prepped_forecast <- read_forecast(system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes")) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_trend(location = "05", input = prepped_forecast, seed = prepped_seed, sig_lvl = .2) ## change location plane_trend(location = "09", input = prepped_forecast, seed = prepped_seed, sig_lvl = .2) ## change sig_lvl plane_trend(location = "06", input = prepped_forecast, seed = prepped_seed, sig_lvl = .05)
This function checks for the presence of any value(s) equal to zero in the evaluated signal. If there are any zeros found, then the function assesses whether or not any zeros have been observed in the seed for the given location. If so, the function will consider the evaluated zero plausible and no flag will be raised (i.e., indicator returned as FALSE
). If not, the function will consider the evaluated zero implausible and a flag will be raised (i.e., indicator returned as TRUE
).
plane_zero(location, input, seed)
plane_zero(location, input, seed)
location |
Character vector with location code; the location must appear in input and seed |
input |
Input signal data to be scored; object must be one of forecast or observed |
seed |
Prepared seed |
A list
with the following values:
indicator: Logical as to whether or not there are zeros in evaluated signal but not in seed data
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_zero(location = "10", input = prepped_forecast, seed = prepped_seed) plane_zero(location = "51", input = prepped_forecast, seed = prepped_seed)
## read in example observed data and prep observed signal hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) hosp$date <- as.Date(hosp$date, format = "%Y-%m-%d") prepped_observed <- to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") ## read in example forecast and prep forecast signal fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") prepped_forecast <- read_forecast(fp) %>% to_signal(., outcome = "flu.admits", type = "forecast", horizon = 4) ## prepare seed with cut date prepped_seed <- plane_seed(prepped_observed, cut_date = "2022-10-29") ## run plane component plane_zero(location = "10", input = prepped_forecast, seed = prepped_seed) plane_zero(location = "51", input = prepped_forecast, seed = prepped_seed)
This unexported helper generates a vector of lower bound, median, and upper bound for the prediction interval of specified width. The function is used internally inside of read_forecast.
q_boundary(pi_width)
q_boundary(pi_width)
pi_width |
Interval width as an integer |
Vector of quantiles corresponding to lower and upper bounds centered on median.
This function reads a probabilistic ("quantile") forecast csv file and prepares it for the to_signal function and downstream plausibility analysis. The quantile forecast file can be either a "legacy" or "hubverse" format (see Details for more information). The object returned is a tibble
with summarized forecast data (i.e., prediction interval) for each location and horizon in the original file.
read_forecast(file, pi_width = 95, format = "legacy")
read_forecast(file, pi_width = 95, format = "legacy")
file |
Path to csv file containing quantile forecasts |
pi_width |
Width of prediction interval as integer; default |
format |
Format of the probabilistic format file; must be one of |
The probabilistic forecast format has been used by multiple forecasting hubs. In general, this format includes one row per combination of quantile, location, target, and horizon. At each row the forecasted value is provided. The specific format, including columns required, has changed over time. This function accommodates the "legacy" as well as more recent "hubverse" formats. For more details on specific columns and see the links in the References.
A tibble
with the following columns:
location: Geographic unit such as FIPS code
date: Date corresponding the forecast horizon
horizon: Forecast horizon
lower: Lower limit of the prediction interval for the forecast
point: Point estimate for the forecast
upper: Upper limit of the prediction interval for the forecast
Hubverse: https://hubdocs.readthedocs.io/en/latest/user-guide/model-output.html
Legacy: https://github.com/cdcepi/Flusight-forecast-data/tree/master/data-forecasts#forecast-file-format
## read in example forecast and prep forecast signal (legacy format) fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") read_forecast(fp) fp2 <- system.file("extdata/forecast/2023-11-04-SigSci-TSENS.csv", package = "rplanes") read_forecast(fp2, format = "hubverse")
## read in example forecast and prep forecast signal (legacy format) fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") read_forecast(fp) fp2 <- system.file("extdata/forecast/2023-11-04-SigSci-TSENS.csv", package = "rplanes") read_forecast(fp2, format = "hubverse")
This helper function uses argument matching to resolve the resolution from input. The function also handles casing. This will allow, for example, an input resolution of "daily" or "day" to be resolved to "days".
resolve_resolution(resolution)
resolve_resolution(resolution)
resolution |
Character vector specifying the temporal resolution (e.g., "days", "weeks", "months") |
If the resolution matches to "days", "weeks", or "months" then the match will be returned. If not, the function will throw an error.
rplanes
explorer app launcherThe rplanes
explorer app allows a user to interactively upload their own data (or view an internal example) to explore the plausibility analysis functionality.
rplanes_explorer(...)
rplanes_explorer(...)
... |
Additional arguments to be passed to shiny::runApp |
This function operates as a side-effect and starts the rplanes
Shiny app.
## Not run: # Launch the explorer app rplanes_explorer(host = "0.0.0.0", launch.browser = TRUE, port = 80) ## End(Not run)
## Not run: # Launch the explorer app rplanes_explorer(host = "0.0.0.0", launch.browser = TRUE, port = 80) ## End(Not run)
This helper function is used inside of plane_seed to evaluate characteristics of observed data to use for downstream plausibility analysis.
seed_engine(input, location, cut_date = NULL)
seed_engine(input, location, cut_date = NULL)
input |
Input signal data used for seeding; must be an observed signal object |
location |
Character vector with location code |
cut_date |
Maximum date (inclusive) for which seeding should be performed; default is |
A list
of length 1 with multiple elements corresponding to seed characteristics and metadata for the given location.
This unexported helper function creates a list with contents of a vector spit into chunks. The user can specify how large each chunk should be with the "size" argument.
to_chunk(x, size)
to_chunk(x, size)
x |
Vector to be split into chunks as large as the "size" specified |
size |
Width of the chunks for "x" vector |
A list with as many elements as the number of chunks created. Each element will include vector with a length equal to the "size" specified.
This function creates an object of the S3 class "signal". The user can conditionally specify either a "forecast" or "observed" signal.
to_signal( input, outcome, type = "observed", resolution = "weeks", horizon = NULL )
to_signal( input, outcome, type = "observed", resolution = "weeks", horizon = NULL )
input |
Data to be converted to signal; see "Details" for more information |
outcome |
Name of the outcome column in the input data |
type |
Signal type; must be one of "observed" or "forecast"; default is "observed" |
resolution |
The temporal resolution of the signal; data can be aggregated daily, weekly, or monthly; default is |
horizon |
Number of time steps ahead for forecast signals; only used if |
The input signal data may be either "observed" or "forecast" data. Depending on the type, the input data must conform to certain format prior to submission. In both cases, the data must be passed as a data frame.
For "observed" data the data frame must at minimum include columns for location (geographic unit such as FIPS code) and date (date of reported value; must be date
class). The data should also include a column that contains the outcome (e.g., case count).
For "forecast" data the data frame must include columns for location (geographic unit such as FIPS code), date (date corresponding to forecast horizon; must be date
class or character formatted as 'YYYY-MM-DD'), horizon (forecast horizon), lower (the lower limit of the prediction interval for the forecast), point (the point estimate for the forecast), and upper (the upper limit of the prediction interval for the forecast). Note that the read_forecast function returns data in this format.
The input data must at the daily, weekly, or monthly resolution. The "resolution" parameter is designed to use string matching. This allows flexibility for the user, such that, for example, an input of "day", "days", or "daily" would all resolve to a resolution of days. The same rules apply for designating weekly or monthly resolution.
An object of the class signal
. The object will have a second class of either observed
or forecast
depending on the value passed to the "type" argument.
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") ex_forecast <- read_forecast(fp) to_signal(ex_forecast, outcome = "flu.admits", type = "forecast", horizon = 4, resolution = "weeks")
hosp <- read.csv(system.file("extdata/observed/hdgov_hosp_weekly.csv", package = "rplanes")) to_signal(hosp, outcome = "flu.admits", type = "observed", resolution = "weeks") fp <- system.file("extdata/forecast/2022-10-31-SigSci-TSENS.csv", package = "rplanes") ex_forecast <- read_forecast(fp) to_signal(ex_forecast, outcome = "flu.admits", type = "forecast", horizon = 4, resolution = "weeks")
This function validates that there are no gaps or overlaps between dates specified in the "seed_date" and "signal_date". During plausibility component analyses, the function is called to validate the seed against the evaluated signal.
valid_dates(seed_date, signal_date, resolution, warn_incomplete = FALSE)
valid_dates(seed_date, signal_date, resolution, warn_incomplete = FALSE)
seed_date |
Last date available in seed object |
signal_date |
First date available in signal object |
resolution |
Character vector specifying the temporal resolution (e.g., "days", "weeks", "months") |
warn_incomplete |
Logical as to whether or not the validation should warn for completeness of seed and signal; default is |
The validation will return with a stop()
if there is an overlap or gap between seed and signal dates. Otherwise the function will invisibly return TRUE
indicating that the date span is valid.
seed_date <- as.Date("2023-03-08") signal_date <- as.Date("2023-03-15") valid_dates(seed_date = seed_date, signal_date = signal_date, resolution="weeks") x <- try(valid_dates(seed_date = seed_date, signal_date = signal_date, resolution="days"), silent=TRUE) x x <- try(valid_dates(seed_date = seed_date, signal_date = signal_date, resolution="months"), silent=TRUE) x
seed_date <- as.Date("2023-03-08") signal_date <- as.Date("2023-03-15") valid_dates(seed_date = seed_date, signal_date = signal_date, resolution="weeks") x <- try(valid_dates(seed_date = seed_date, signal_date = signal_date, resolution="days"), silent=TRUE) x x <- try(valid_dates(seed_date = seed_date, signal_date = signal_date, resolution="months"), silent=TRUE) x
This unexported helper is used inside of the individual plausibility component functions (e.g., plane_diff()
) to validate that the location specified appears in both the input signal and seed and that the location has as many values as other locations in the seed.
valid_location(location, input, seed)
valid_location(location, input, seed)
location |
Character vector with location code; the location must appear in input and seed |
input |
Input signal data to be scored; object must be forecast |
seed |
Prepared seed |
The validation will return with a stop()
if the location is not found in the seed or input signal. Otherwise the function will invisibly return TRUE
indicating that the location is valid.