Title: Forecasting Influenza in Support of Public Health Decision Making
Description: Miscellaneous functions for retrieving data, creating and evaluating time series forecasting models for influenza-like illness (ILI) and influenza hospitalizations in the United States.
Authors: VP Nagraj [aut, cre], Desiree Williams [aut], Shakeel Jessa [aut], Chris Hulme-Lowe [aut], Stephen Turner [aut]
Maintainer: VP Nagraj <[email protected]>
License: GPL (>= 3)
Version: 2.0.2
Built: 2024-11-12 04:42:23 UTC
Source: https://github.com/signaturescience/fiphde
This helper is used in the ilinet and who_nrevss functions to clean the column names of values returned from the APIs.
.mcga(tbl)
tbl: Input tibble
A tibble with clean column names
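Because .mcga() is unexported, it must be accessed with the ::: operator. The snippet below is a minimal usage sketch; the input tibble is hypothetical, and the exact cleaning rules depend on the helper's implementation.
# Minimal usage sketch; the input tibble is hypothetical
raw <- tibble::tibble(`Total Patients` = 10L, `%UNWEIGHTED ILI` = 1.2)
fiphde:::.mcga(raw)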
This function provides a naive nowcasting method for clinical laboratory percent positive flu data. The method simply averages the last 4 weeks of available data and uses that average as the value for each week being replaced. The function will always add 1 additional week to the observed data and (optionally) replace the number of weeks specified in the "weeks_to_replace" argument. This is useful given that there is reporting lag in the NREVSS clinical laboratory percent positive flu data.
clin_nowcast(clin, weeks_to_replace = 1)
clin: Data prepared with get_cdc_clin
weeks_to_replace: Number of retrospective weeks to replace with nowcast; default is 1
A tibble with the following columns:
abbreviation: Abbreviation for the location
location: FIPS code for the location
epiyear: Year of reporting (in epidemiological week calendar)
epiweek: Week of reporting (in epidemiological week calendar)
week_start: Date of beginning (Sunday) of the given epidemiological week
p_positive: Percentage of positive specimens
n_positive: Total number of positive specimens
total: Total number of specimens tested
## Not run: 
# Get data for Texas
tx_clin <- get_cdc_clin(region = "state") %>%
  dplyr::filter(location == "48")
# Look at most recent observations
tx_clin %>%
  dplyr::arrange(week_start) %>%
  tail()
# Now augment with default 1 week nowcast
tx_clin %>%
  clin_nowcast(., weeks_to_replace = 1) %>%
  dplyr::arrange(week_start) %>%
  tail()
# And again augmented with 2 week nowcast instead
tx_clin %>%
  clin_nowcast(., weeks_to_replace = 2) %>%
  dplyr::arrange(week_start) %>%
  tail()
## End(Not run)
This unexported helper function is used to build a distribution from quantile forecasts and then calculate the probability density for the thresholds associated with each category forecasted: large increase, increase, stable, decrease, large decrease.
density_probs(df, n_horizons = 5, ...)
df: Data frame with forecasts and categorical thresholds joined
n_horizons: Number of horizons ahead; default is 5
...: Additional arguments passed through to the internal distribution construction
Data frame with probabilities for each rate change category
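The sketch below illustrates the general idea (not the package internals): approximate a cumulative distribution function from the forecast quantiles by interpolation, then take differences of that CDF at category thresholds to obtain the probability of each rate change category. The quantile levels, forecasted values, and thresholds shown here are hypothetical.
# Hypothetical quantile levels and forecasted values for one horizon
q_levels <- c(0.025, 0.25, 0.5, 0.75, 0.975)
q_values <- c(80, 110, 130, 155, 200)
# Interpolated CDF built from the quantiles
cdf <- stats::approxfun(q_values, q_levels, yleft = 0, yright = 1)
# Hypothetical thresholds separating the five rate change categories
thresholds <- c(100, 120, 140, 170)
probs <- c(cdf(thresholds[1]), diff(cdf(thresholds)), 1 - cdf(thresholds[4]))
names(probs) <- c("large decrease", "decrease", "stable", "increase", "large increase")
probs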
The explorer app allows a user to view plots of forecasts, inspect tabular output of submission files, and download subsets of forecast submission data. The app includes an interface to interactively select locations to include in the plots, table, and download. This function wraps shiny::runApp
and accepts arguments for the observed data against which the forecasts should be plotted, as well as the directory containing submission files, both of which are temporarily attached to the global environment for use during the app session. Additional arguments passed to ...
will be inherited by shiny::runApp.
The explorer is meant to review candidate submission files. As such, the app is written to expect that submission files in the "submission_dir" argument are named with ".candidate.csv" suffix. For examples of submission files and the naming convention see system.file("extdata", "submission-example", package = "fiphde")
. Note that submission files can be included for multiple models in model-specific sub-directories.
fiphde_launcher(.data, submission_dir, app_dir = NULL, ...)
.data: A tibble of observed data against which the forecasts will be plotted
submission_dir: Full path to directory of submission files containing forecast submissions to explore; submission files in the directory must be named with ".candidate.csv" suffix
app_dir: Full path to directory of explorer app; default is NULL
...: Additional arguments to be passed to shiny::runApp
This function starts a Shiny app. On exit it removes objects (see ".data" and "submission_dir") that are temporarily attached and used by the app session.
## Not run: 
# Path to the submission example
submission_dir <- system.file("extdata", "submission-example", package = "fiphde")
# Prepare data for explorer app
prepped_hosp <- get_hdgov_hosp(limitcols = TRUE) %>%
  prep_hdgov_hosp(statesonly=TRUE, min_per_week = 0, remove_incomplete = TRUE) %>%
  dplyr::filter(abbreviation != "DC")
# Launch the explorer app
fiphde_launcher(.data = prepped_hosp,
                submission_dir = submission_dir,
                host = "0.0.0.0",
                launch.browser = TRUE,
                port = 80)
## End(Not run)
This function takes probabilistic flu hospitalization forecast input and converts the forecasted values for each location to a categorical "change" indicator. The criteria for each level ("large decrease", "decrease", "stable", "increase", "large increase") were defined by the CDC (see link in references). The algorithm evaluates absolute changes in counts and rates (per 100k individuals) for the most recently observed week and a 2 week ahead forecasted horizon. This procedure runs independently for each location, and results in a formatted tabular output that includes each possible level and its corresponding probability of being observed (calculated from probabilistic quantiles) for every location.
forecast_categorical( .forecast, .observed, method = "density", format = "hubverse", horizon = 4 )
.forecast: A tibble with "submission-ready" probabilistic flu hospitalization forecast data (i.e., a tibble contained in the list element returned from format_for_submission)
.observed: A tibble with observed flu admission data (i.e., tibble output from prep_hdgov_hosp)
method: The categorical forecasting method to use; must be one of "density" or "interpolation"; default is "density"
format: The submission format to be used; must be one of "hubverse" or "legacy"; default is "hubverse"
horizon: The number of horizons ahead to forecast; default is 4
A tibble with formatted categorical forecasts.
If format is "hubverse", the tibble will have the following columns:
reference_date: Date of reference for forecast submission
horizon: Horizon for the given forecast
target: Name of forecasted target
target_end_date: Last date of the forecasted target (e.g., Saturday of the given epidemiological week)
location: Name or geographic identifier (e.g., FIPS code) for location for the given forecast
output_type: Type of forecasted value (e.g., "quantile")
output_type_id: The quantile for the forecasted value if output_type is "quantile"
value: The forecasted value
If format is "legacy"
the tibble will have the following columns:
forecast_date: Date of forecast
target: Horizon and name of forecasted target
target_end_date: Last date of the forecasted target (e.g., Saturday of the given epidemiological week)
location: Name or geographic identifier (e.g., FIPS code) for location for the given forecast
type: One of either "point" or "quantile" for the forecasted value
quantile: The quantile for the forecasted value; NA if "type" is "point"
value: The forecasted value
https://github.com/cdcepi/Flusight-forecast-data/blob/master/data-experimental/README.md
## Not run: 
# Retrieve hospitalization data
h_raw <- get_hdgov_hosp(limitcols=TRUE)
# Prepare and summarize hospitalization data to weekly resolution
prepped_hosp <- prep_hdgov_hosp(h_raw)
# Create a keyed time series tibble with only locations of interest
prepped_tsibble <- make_tsibble(prepped_hosp,
                                epiyear = epiyear,
                                epiweek=epiweek,
                                key=location)
# Run with default constrained ARIMA, nonseasonal ETS, no NNETAR
hosp_fitfor <- ts_fit_forecast(prepped_tsibble,
                               horizon=4L,
                               outcome="flu.admits",
                               covariates=TRUE)
# Prepare forecast for quantile submission format
forc <- format_for_submission(hosp_fitfor$tsfor, method = "ts", format = "legacy")
# Run categorical summary of quantiles for the time series ensemble
forecast_categorical(forc$ensemble, prepped_hosp, method = "interpolation", format = "legacy")
## End(Not run)
This function forecasts ILI up to a specified future horizon. The models used can be parameterized with a "models" argument (for more details see ts_fit_forecast). By default, the function will use an ARIMA approach to model all locations in the input historical ILI data and then use the fitted models to forecast out to each of the horizons.
forecast_ili( ilidat, horizon = 4L, trim_date = NULL, models = list(arima = "PDQ(0,0,0)+pdq(1:2,0:2,0)") )
ilidat: Data returned from get_cdc_ili
horizon: Optional horizon periods through which the forecasts should be generated; default is 4
trim_date: Earliest start date you want to use for ILI data; default is NULL
models: The list of model parameters passed to ts_fit_forecast; defaults to list(arima = "PDQ(0,0,0)+pdq(1:2,0:2,0)")
A named list containing:
ilidat: The data sent into the function, filtered to the location and the trim_date. Select columns returned.
ilidat_tsibble: The tsibble class object returned by running make_tsibble on the data above.
ili_fit: The fit from fabletools::model.
ili_forecast: The forecast from fabletools::forecast at the specified horizon.
ili_future: The horizon-number of weeks of ILI data forecasted into the future.
ili_bound: The observed data (ilidat above) bound to the forecasted data (ili_future above).
arima_params: A tibble with ARIMA model parameters for each location (if type="arima").
locstats: A tibble with missing data information on all locations.
removed: A tibble with locations removed because of high missing ILI data.
## Not run: 
# Retrieve ILI data
ilidat <- get_cdc_ili(region = c("national", "state", "hhs"),
                      years = 2010:lubridate::year(lubridate::today()))
# Using data only from march 2020 forward, for US only
ilidat_us <- ilidat %>% dplyr::filter(location=="US")
# Replace most recent week with nowcast data, and nowcast last week
ilidat_us <- ilidat_us %>% replace_ili_nowcast(weeks_to_replace=1)
ilifor_us <- forecast_ili(ilidat_us, horizon=4L, trim_date="2020-03-01")
# Take a look at objects that come out ILI forecasting procedure
ilifor_us$ili_fit
ilifor_us$arima_params
ilifor_us$ili_forecast
head(ilifor_us$ili_bound)
tail(ilifor_us$ili_bound, 10)
## End(Not run)
This function prepares forecasts to adhere to probabilistic forecast submission guidelines for consortia such as FluSight.
format_for_submission( .forecasts, method = "ts", .target = "wk ahead inc flu hosp", format = "hubverse", horizon_shift = 1 )
.forecasts: Forecasts to be formatted for submission; if method is "ts", the forecast output from ts_fit_forecast (e.g., the tsfor element shown in the examples)
method: Method for forecasting; default is "ts"
.target: Name of the target in the forecast; default is "wk ahead inc flu hosp"
format: The submission format to be used; must be one of "hubverse" or "legacy"; default is "hubverse"
horizon_shift: Number of horizons to shift backwards to align with reference date; only used if format is "hubverse"; default is 1
A named list of tibbles with probabilistic forecasts (one for each model), formatted for submission.
If format is "hubverse"
each tibble will have the following columns:
reference_date: Date of reference for forecast submission
horizon: Horizon for the given forecast
target: Name of forecasted target
target_end_date: Last date of the forecasted target (e.g., Saturday of the given epidemiological week)
location: Name or geographic identifier (e.g., FIPS code) for location for the given forecast
output_type: Type of forecasted value (e.g., "quantile")
output_type_id: The quantile for the forecasted value if output_type is "quantile"
value: The forecasted value
If format is "legacy"
each tibble will have the following columns:
forecast_date: Date of forecast
target: Horizon and name of forecasted target
target_end_date: Last date of the forecasted target (e.g., Saturday of the given epidemiological week)
location: Name or geographic identifier (e.g., FIPS code) for location for the given forecast
type: One of either "point" or "quantile" for the forecasted value
quantile: The quantile for the forecasted value; NA if "type" is "point"
value: The forecasted value
https://github.com/cdcepi/Flusight-forecast-data/blob/master/data-forecasts/README.md
## Not run: 
# Get raw data from healthdata.gov
h_raw <- get_hdgov_hosp(limitcols=TRUE)
# Prep, and make a tsibble
prepped_hosp <- prep_hdgov_hosp(h_raw, statesonly=TRUE)
prepped_hosp_tsibble <- make_tsibble(prepped_hosp,
                                     epiyear = epiyear,
                                     epiweek=epiweek,
                                     key=location)
# Limit to only Virginia and US
prepped_hosp_tsibble <- prepped_hosp_tsibble %>%
  dplyr::filter(location %in% c("US", "51"))
# Fit a model
hosp_fitfor <- ts_fit_forecast(prepped_hosp_tsibble,
                               horizon=4L,
                               outcome="flu.admits",
                               covariates=TRUE)
# Format for submission
formatted_list <- format_for_submission(hosp_fitfor$tsfor, method = "ts", format = "legacy")
formatted_list
## End(Not run)
This function returns weekly state and/or national clinical laboratory percent positivity data from the NREVSS reporting instrument via the CDC FluView API.
get_cdc_clin(region = "both", years = NULL)
region: Either "state", "national", or "both"; defaults to "both"
years: A vector of years to retrieve data for; CDC has data going back to 1997; default (NULL) retrieves all available years
A tibble with the following columns:
abbreviation: Abbreviation for the location
location: FIPS code for the location
epiyear: Year of reporting (in epidemiological week calendar)
epiweek: Week of reporting (in epidemiological week calendar)
week_start: Date of beginning (Sunday) of the given epidemiological week
p_positive: Percentage of positive specimens
n_positive: Total number of positive specimens
total: Total number of specimens tested
https://gis.cdc.gov/grasp/fluview/Phase_6_Cleared_Help.pdf
## Not run: 
# Get all clinical lab flu positivity data
all_clin <- get_cdc_clin()
all_clin
# Alternatively look at a specific location and time
# This 2021 will return weekly data
# Starting at beginning of 2021/22 season
# Ending the week before start of 2022/23 season
va_clin <- get_cdc_clin(region = "state", years = 2021) %>%
  dplyr::filter(location == "51")
va_clin
## End(Not run)
This function retrieves historical FluSurv-NET hospitalization data via the CDC FluView API.
get_cdc_hosp(years = NULL)
years: A vector of years to retrieve data for (i.e., 2014 for CDC flu season 2014-2015); CDC has data going back to 2009 and up until the previous flu season; default (NULL) retrieves all available years
A tibble with the following columns:
location: FIPS code for the location
abbreviation: Abbreviation for the location
region: Name of the region
epiyear: Year of reporting (in epidemiological week calendar)
epiweek: Week of reporting (in epidemiological week calendar)
week_start: Date of beginning (Sunday) of the given epidemiological week
week_end: Date of end (Saturday) of the given epidemiological week
rate: The cumulative rate per 100k
weekly_rate: The weekly rate per 100k
season: The flu season to which the given epidemiological week belongs
https://gis.cdc.gov/GRASP/fluview/FluViewPhase3QuickReferenceGuide.pdf
## Not run: 
# Retrieve FluSurv-Net hospitalization data for specific year(s)
get_cdc_hosp(years=2019)
## End(Not run)
This function pulls ILINet data from the CDC FluView API. Data are available historically and can be pulled at the state, national, or HHS region level.
get_cdc_ili(region = c("national", "state", "hhs"), years = NULL)
region: Either "state", "national", or "hhs"; defaults to c("national", "state", "hhs") for all three
years: A vector of years to retrieve data for; CDC has data going back to 1997; default (NULL) retrieves all available years
A tibble with the following columns:
location: FIPS code for the location
region_type: The type of location
abbreviation: Abbreviation for the location
region: Name of the region
epiyear: Year of reporting (in epidemiological week calendar)
epiweek: Week of reporting (in epidemiological week calendar)
week_start: Date of beginning (Sunday) of the given epidemiological week
weighted_ili: Population-weighted percentage of ILI outpatient visits
unweighted_ili: Unweighted percentage of ILI outpatient visits
ilitotal: Total number of ILI outpatient visits reported
num_providers: Number of providers reporting
total_patients: Total number of outpatient visits reported
population: Total population for the given location
https://gis.cdc.gov/grasp/fluview/FluViewPhase1QuickReferenceGuide.pdf
## Not run: 
# Retrieve ILI data for specific years and regions
get_cdc_ili(region="national", years=2021)
get_cdc_ili(region="hhs", years=2021)
get_cdc_ili(region="state", years=2021) %>% dplyr::filter(abbreviation=="VA")
get_cdc_ili(region=c("national", "state"), years=2021)
## End(Not run)
This function retrieves hospital utilization time series data distributed through healthdata.gov. Data are aggregated to the state granularity from facility level reports via HHS TeleTracking, HHS Protect, and the National Healthcare Safety Network (historically). Users can optionally filter to include all fields or restrict to a prespecified set of fields relevant to COVID and influenza hospital utilization. The results are returned as a tibble.
get_hdgov_hosp( endpoint = "https://healthdata.gov/api/views/g62h-syeh/rows.csv", app_token = Sys.getenv("HEALTHDATA_APP_TOKEN"), limitcols = FALSE, shift_back = TRUE )
endpoint: URL to healthdata.gov endpoint
app_token: App token from healthdata.gov; default is to look for an environment variable called HEALTHDATA_APP_TOKEN
limitcols: Logical as to whether or not to limit to a prespecified set of columns (see "Value" section for more details); default is FALSE
shift_back: Logical as to whether or not the dates for the retrieved data should be shifted back to reflect the previous day; default is TRUE
The data retrieval will proceed whether or not an API token has been supplied via the app_token argument. However, to avoid possible rate limits it is recommended to retrieve a token for the healthdata.gov API (https://healthdata.gov/profile/edit/developer_settings), and add that token as an entry to .Renviron with HEALTHDATA_APP_TOKEN="yourtokenhere".
A tibble with at least the following columns:
state: Abbreviation of the state
date: Date of report
flu.admits: Count of flu cases among admitted patients on previous day
flu.admits.cov: Coverage (number of hospitals reporting) for incident flu cases
flu.deaths: Count of flu deaths on previous day
flu.deaths.cov: Coverage (number of hospitals reporting) for flu deaths
flu.icu: Count of flu cases among ICU patients on previous day
flu.icu.cov: Coverage (number of hospitals reporting) for flu ICU cases
flu.tot: Count of total flu cases among admitted patients
flu.tot.cov: Coverage (number of hospitals reporting) for total flu cases
cov.admits: Count of COVID cases among admitted patients on previous day
cov.admits.cov: Coverage (number of hospitals reporting) for incident COVID cases
cov.deaths: Count of COVID deaths on previous day
cov.deaths.cov: Coverage (number of hospitals reporting) for COVID deaths
If limitcols=TRUE, then the only columns returned will be those listed above. However, if limitcols=FALSE, the function will additionally return all other fields in the state-aggregated hospitalization data.
https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh
https://dev.socrata.com/foundry/healthdata.gov/g62h-syeh
## Not run: 
# Retrieve hospitalization data (all columns)
get_hdgov_hosp()
# Retrieve hospitalization data (limited columns)
get_hdgov_hosp(limitcols=TRUE)
## End(Not run)
This function pulls the ILI nowcast from CMU Delphi's ILI Nearby API. Observed ILINet data is typically reported with a lag, and the ILI nowcast can be used to augment the ILI data stream. The functionality here depends on availability of the ILI Nearby API (see 'Details' section).
get_nowcast_ili( epiyearweeks = NULL, dates = lubridate::today() - c(14, 7), state = NULL, boundatzero = TRUE )
epiyearweeks: A vector of epiyear-epiweeks to retrieve data for, e.g., 202150, 202151, etc.; exclusive with dates
dates: A vector of dates to retrieve data for, e.g., "2021-12-12" or "2021-12-19"; exclusive with epiyearweeks; defaults to the previous two weeks (14 and 7 days prior)
state: A vector of states to retrieve (two-letter abbreviation); default (NULL) retrieves all states
boundatzero: Logical as to whether or not the values should be truncated at 0 (i.e., non-negative); default is TRUE
As of October 2022, ILI Nearby is no longer being updated. As such, get_nowcast_ili() will likely return NA. See https://github.com/cmu-delphi/delphi-epidata/issues/993.
Either NA (if the API can't be reached) or a tibble with the following columns:
location: FIPS code for the location
abbreviation: Abbreviation for the location
epiyear: Year of reporting (in epidemiological week calendar)
epiweek: Week of reporting (in epidemiological week calendar)
weighted_ili_now: Nowcasted ILI value
https://delphi.cmu.edu/nowcast/
## Not run: 
# Defaults to the previous two weeks for all states
get_nowcast_ili()
# Otherwise specify one or the other, not both
get_nowcast_ili(epiyearweeks=c("202150", "202151"), dates=NULL)
get_nowcast_ili(epiyearweeks=NULL, dates=c("2021-12-12", "2021-12-19"))
# Get just one state for the last years worth of data (back 52 weeks to 1 week)
get_nowcast_ili(epiyearweeks=NULL, dates=lubridate::today()-seq(52*7, 7, -7), state="FL")
## End(Not run)
This helper function is used in glm_wrap to fit a list of models and select the best one. The model selection procedure will use the root mean square error (RMSE) metric implemented in yardstick::rmse to select the best model.
glm_fit(.data, .models, complete = TRUE)
.data: Data including all explanatory and outcome variables needed for modeling; must include a column for "location"
.models: List of models defined as trending::trending_model objects
complete: Logical as to whether or not all observations for covariates must be available in a given model; default is TRUE
A tibble containing characteristics from the "best" glm model, including:
model_class: The "type" of model for the best fit
fit: The fitted model object for the best fit stored as a list column
location: The geographic unit being modeled
data: Original model fit data as a tibble in a list column
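A minimal sketch of calling glm_fit() directly, assuming data prepared as in the glm_wrap() example elsewhere in this documentation; the candidate model list mirrors that example.
## Not run: 
hosp_va <- get_hdgov_hosp(limitcols = TRUE) %>%
  prep_hdgov_hosp(statesonly = TRUE) %>%
  dplyr::filter(abbreviation == "VA")
models <- list(
  poisson = trending::glm_model(flu.admits ~ hosp_rank + ili_rank, family = "poisson"),
  negbin = trending::glm_nb_model(flu.admits ~ hosp_rank + ili_rank)
)
fit <- glm_fit(hosp_va, .models = models)
# The selected model object is stored in the first element of the "fit" column
fit$fit[[1]]
## End(Not run)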
This function uses the fitted model object from glm_fit and future covariate data to create probabilistic forecasts at specific quantiles derived from the "alpha" parameter.
glm_forecast( .data, new_covariates = NULL, fit, alpha = c(0.01, 0.025, seq(0.05, 0.45, by = 0.05)) * 2 )
.data: Data including all explanatory and outcome variables needed for modeling
new_covariates: Tibble with one column per covariate, and n rows for n horizons being forecasted
fit: Fitted model object from glm_fit; note this must be accessed from the first element in the "fit" column
alpha: Vector specifying the threshold(s) to be used for prediction intervals; an alpha of 0.05 corresponds to a 95% PI; default is c(0.01, 0.025, seq(0.05, 0.45, by = 0.05)) * 2
A tibble with forecasted data including the following columns:
epiweek: The epidemiological week for the forecasted horizon
epiyear: The epidemiological year for the forecasted horizon
quantile: The quantile for the forecasted value; NA for point estimate
value: The forecasted value
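A minimal sketch of calling glm_forecast() with a previously selected model (see the glm_fit() sketch above); the future covariate construction follows the glm_wrap() example elsewhere in this documentation.
## Not run: 
# Build future covariates for 4 horizons (one row per forecasted week)
new_cov <- dplyr::tibble(
  date = max(hosp_va$week_start) + c(7, 14, 21, 28),
  epiweek = lubridate::epiweek(date),
  epiyear = lubridate::epiyear(date)
) %>%
  dplyr::left_join(fiphde:::historical_severity, by = "epiweek") %>%
  dplyr::select(-epiweek, -epiyear)
# Forecast from the selected model (first element of the "fit" column)
glm_forecast(hosp_va, new_covariates = new_cov, fit = fit$fit[[1]])
## End(Not run)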
This helper function runs the trending::predict.trending_fit method on a fitted model at specified values of "alpha" in order to create a range of prediction intervals. The processing also includes steps to convert the alpha to corresponding quantile values at upper and lower bounds. See "Details" for more information on the translation of "alpha" to quantile values. This function is used internally in glm_forecast.
glm_quibble( fit, new_data, alpha = c(0.01, 0.025, seq(0.05, 0.45, by = 0.05)) * 2 )
fit: Fitted model object from glm_fit; note this must be accessed from the first element in the "fit" column
new_data: A tibble of covariate data for the forecasted horizons (one column per covariate, one row per horizon)
alpha: Vector specifying the threshold(s) to be used for prediction intervals (PI); an alpha of 0.05 corresponds to a 95% PI; default is c(0.01, 0.025, seq(0.05, 0.45, by = 0.05)) * 2
The "alpha" parameter defines the width of prediction interval (PI). For example, an alpha = 0.05 would correspond to a 95% PI. This function uses the PI(s) (per the alpha value(s) specified) to construct a range of quantiles that fall at lower and upper bound of each PI. Continuing from the example of alpha = 0.05, the quantile estimates returned would fall at 0.025 (lower bound of PI) and 0.975 (upper bound of PI).
A tibble with forecasted data including the following columns:
epiweek: The epidemiological week for the forecasted horizon
epiyear: The epidemiological year for the forecasted horizon
quantile: The quantile for the forecasted value; NA for point estimate
value: The forecasted value
This is a wrapper function that pipelines influenza hospitalization modeling (glm_fit) and forecasting (glm_forecast).
glm_wrap( .data, .models, new_covariates = NULL, horizon = 4, alpha = c(0.01, 0.025, seq(0.05, 0.45, by = 0.05)) * 2 )
.data: Data including all explanatory and outcome variables needed for modeling
.models: List of models defined as trending::trending_model objects
new_covariates: A tibble with one column per covariate, and n rows for n horizons being forecasted
horizon: Number of weeks ahead for forecasting; default is 4
alpha: Vector specifying the threshold(s) to be used for prediction intervals (PI); an alpha of 0.05 corresponds to a 95% PI; default is c(0.01, 0.025, seq(0.05, 0.45, by = 0.05)) * 2
Named list with two elements:
model: Output from glm_fit with selected model fit
forecasts: Output from glm_forecast with forecasts from each horizon combined as a single tibble
## Not run: 
# Retrieve data to be used in fitting models
hosp_va <- get_hdgov_hosp(limitcols=TRUE) %>%
  prep_hdgov_hosp(statesonly=TRUE, min_per_week = 0, remove_incomplete = TRUE) %>%
  dplyr::filter(abbreviation == "VA")
# Define list of models
models <- list(
  poisson = trending::glm_model(flu.admits ~ hosp_rank + ili_rank, family = "poisson"),
  quasipoisson = trending::glm_model(flu.admits ~ hosp_rank + ili_rank, family = "quasipoisson"),
  negbin = trending::glm_nb_model(flu.admits ~ hosp_rank + ili_rank)
)
# Create new covariate data to feed into forecast procedure
new_cov <- dplyr::tibble(
  date = max(hosp_va$week_start) + c(7,14,21,28),
  epiweek = lubridate::epiweek(date),
  epiyear = lubridate::epiyear(date)
) %>%
  dplyr::left_join(fiphde:::historical_severity, by="epiweek") %>%
  dplyr::select(-epiweek,-epiyear)
# Run the glm wrapper to fit and forecast
va_glm_res <- glm_wrap(.data = hosp_va,
                       .models = models,
                       new_covariates = new_cov,
                       horizon = 4)
va_glm_res
## End(Not run)
Adapted from cdcfluview::hospitalizations.
This unexported helper function leverages the CDC FluView API to pull influenza hospitalizations collected by surveillance instruments (including FluSurv-NET). The data retrieved can be parameterized by geographic granularity and/or flu season, and includes hospitalization rates by age group. The function is used internally by get_cdc_hosp.
hospitalizations( surveillance_area = c("flusurv", "eip", "ihsp"), region = "all", years = NULL )
surveillance_area: One of "flusurv", "eip", or "ihsp"
region: Individual region within the surveillance area selected; default is "all"
years: A vector of years to retrieve data for (i.e., 2014 for CDC flu season 2014-2015); default (NULL) retrieves all available years
NOTE: The list of regions was compiled in February 2023 by querying the CDC FluView API. Individual regions may not be accessible in all cases. As of late 2023, the query was only returning results for the "Entire Network" selection.
Each possible value of "surveillance_area" ("flusurv", "eip", or "ihsp") can be further queried by region. The following is a list of valid regions:
flusurv: "Entire Network"
eip: "Entire Network", "California", "Colorado", "Connecticut", "Georgia", "Maryland", "Minnesota", "New Mexico", "New York - Albany", "New York - Rochester", "Oregon, "Tennessee"
ihsp: "Entire Network", "Idaho", "Iowa", "Michigan", "Ohio", "Oklahoma", "Rhode Island", "South Dakota", "Utah"
This unexported helper is used internally in format_for_submission. It specifically updates formatting to follow hubverse guidelines.
hubverse_format(dat, horizon_shift = 1)
dat: Forecast prepped in "legacy" format
horizon_shift: Number of horizons to shift backwards to align with reference date; default is 1
Formatted tibble
https://github.com/cdcepi/Flusight-forecast-data/blob/master/data-forecasts/README.md
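A minimal sketch; hubverse_format() is unexported, so it is accessed with the ::: operator. The input reuses the "legacy"-formatted ensemble produced in the format_for_submission() example earlier in this documentation.
## Not run: 
legacy_list <- format_for_submission(hosp_fitfor$tsfor, method = "ts", format = "legacy")
fiphde:::hubverse_format(legacy_list$ensemble, horizon_shift = 1)
## End(Not run)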
Adapted from cdcfluview::ilinet.
This unexported helper function retrieves current and historical ILINet surveillance data for the identified region via the CDC FluView API. The function is used internally in get_cdc_ili. Data returned include weighted and unweighted ILI percentage, as well as age-specific ILI outpatient visit counts for each location / epidemiological week.
ilinet(region = c("national", "hhs", "census", "state"), years = NULL)
region: One of "national", "hhs", "census", or "state"
years: A vector of years to retrieve data for (i.e., 2014 for CDC flu season 2014-2015); default (NULL) retrieves all available years
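A minimal usage sketch; ilinet() is unexported, so it is accessed with the ::: operator.
## Not run: 
fiphde:::ilinet(region = "national", years = 2021)
fiphde:::ilinet(region = "state", years = 2021)
## End(Not run)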
This is a helper function to see if today is Monday.
is_monday()
Logical indicating whether or not today is Monday
is_monday()
This function converts an input tibble with columns for lubridate::epiyear and lubridate::epiweek into a tsibble::tsibble object. The tsibble has columns specifying indices for the time series as well as a date for the Monday of the epiyear/epiweek combination at each row.
make_tsibble(df, epiyear, epiweek, key = location)
df: A tibble with columns for the epiyear and epiweek
epiyear: Unquoted variable name containing the MMWR epiyear
epiweek: Unquoted variable name containing the MMWR epiweek
key: Unquoted variable name containing the name of the column to be the tsibble key; see tsibble::as_tsibble
A tsibble containing additional columns monday indicating the date for the Monday of that epiweek, and yweek (a yearweek vctr class object) that indexes the tsibble in 1 week increments.
# Create an example tibble
d <- tibble::tibble(epiyear=c(2020, 2020, 2021, 2021),
                    epiweek=c(52, 53, 1, 2),
                    location="US",
                    somedata=101:104)
# Convert to tsibble (keyed time series tibble)
make_tsibble(d, epiyear = epiyear, epiweek=epiweek, key=location)
Adapted from cdcfluview::mmwr_week_to_date.
This function transforms MMWR epidemiological year+week (or year+week+day) to a date object. This was implemented based on the cdcfluview::mmwr_week_to_date function, which adapted similar functionality from the MMWRweek package.
mmwr_week_to_date(year, week, day = NULL)
year: Vector of epidemiological year(s); must be the same length as "week" and "day" (unless "day" is NULL)
week: Vector of epidemiological week(s); must be the same length as "year" and "day" (unless "day" is NULL)
day: Vector of day(s); must be the same length as "week" and "year" (unless set to NULL, the default)
Vector of date objects with as many elements as the input year(s), week(s), and day(s)
mmwr_week_to_date(2020, 1)
mmwr_week_to_date(2020, 1, 5)
mmwr_week_to_date(c(2020, 2021, 2022), c(1, 2, 8), c(1, 1, 7))
Helper function to get the minimum non-zero positive value from a vector. Used internally in mnz_replace.
mnz(x)
x: A numeric vector
The minimum non-zero positive value from x
x <- c(.1, 0, -.2, NA, .3, .4, .0001, -.3, NA, 999)
x
mnz(x)
Replace zeros and negative values with the minimum non-zero positive value from a vector.
mnz_replace(x)
x: A numeric vector
A vector of the same length with negatives and zeros replaced with the minimum nonzero value of that vector.
x <- c(.1, 0, -.2, NA, .3, .4, .0001, -.3, NA, 999)
x
mnz(x)
mnz_replace(x)
tibble::tibble(x) %>% dplyr::mutate(x2=mnz_replace(x))
This function serves as a plotting mechanism for prepped forecast submission data. The plots show the historical trajectory of the truth data supplied along with the forecasted point estimates and (optionally) the prediction interval. All plots are faceted by location.
Note that the ".data" and "submission" arguments to this function expect incoming data prepared in a certain format. See the argument documentation and "Details" for more information.
plot_forecast( .data, submission, location = "US", pi = 0.95, .model = NULL, .outcome = "flu.admits", format = "legacy" )
.data: A data frame with historical truth data for all locations and outcomes in submission targets
submission: Formatted submission (e.g., a tibble returned by format_for_submission)
location: Vector specifying locations to filter to; default is "US"
pi: Width of prediction interval to plot; default is 0.95
.model: Name of the model used to generate forecasts; default is NULL
.outcome: The name of the outcome variable you're plotting in the historical data; defaults to "flu.admits"
format: The submission format to be used; must be one of "hubverse" or "legacy"; default is "legacy"
To plot the forecasted output alongside the observed historical data, both the ".data" and "submission" data must be prepared at the same geographic and temporal resolutions. The data frame passed to ".data" must include the column specified in the ".outcome" argument as well as the following columns:
location: FIPS location code
week_end: Date of the last day (Saturday) in the given epidemiological week
If format is "legacy" the "submission" data should be a probabilistic forecast prepared as a tibble
with at minimum the following columns:
forecast_date: Date of forecast
target: Horizon and name of forecasted target
target_end_date: Last date of the forecasted target (e.g., Saturday of the given epidemiological week)
location: FIPS code for location
type: One of either "point" or "quantile" for the forecasted value
quantile: The quantile for the forecasted value; NA if "type" is "point"
value: The forecasted value
If format is "hubverse" the "submission" data should be a probabilistic forecast prepared as a tibble
with at minimum the following columns:
reference_date: Date of reference for forecast submission
horizon: Horizon for the given forecast
target: Name of forecasted target
target_end_date: Last date of the forecasted target (e.g., Saturday of the given epidemiological week)
location: Name or geographic identifier (e.g., FIPS code) for location for the given forecast
output_type: Type of forecasted value (e.g., "quantile")
output_type_id: The quantile for the forecasted value if output_type is "quantile"
value: The forecasted value
The "submission" data may optionally include a column with the name of the model used, such that multiple models can be visualized in the same plot.
A ggplot2 plot object with line plots for outcome trajectories faceted by location
## Not run: 
# Get some data
h_raw <- get_hdgov_hosp(limitcols=TRUE)
# Prep all the data
prepped_hosp_all <- prep_hdgov_hosp(h_raw)
# What are the last four weeks of recorded data?
last4 <- prepped_hosp_all %>%
  dplyr::distinct(week_start) %>%
  dplyr::arrange(week_start) %>%
  tail(4)
# Remove those
prepped_hosp <- prepped_hosp_all %>%
  dplyr::anti_join(last4, by="week_start")
# Make a tsibble
prepped_hosp_tsibble <- make_tsibble(prepped_hosp,
                                     epiyear = epiyear,
                                     epiweek=epiweek,
                                     key=location)
# Limit to just one state and US
prepped_hosp_tsibble <- prepped_hosp_tsibble %>%
  dplyr::filter(location %in% c("US", "51"))
# Fit models and forecasts
hosp_fitfor <- ts_fit_forecast(prepped_hosp_tsibble,
                               horizon=4L,
                               outcome="flu.admits",
                               trim_date=NULL,
                               covariates=TRUE)
# Format for submission
hosp_formatted <- ts_format_for_submission(hosp_fitfor$tsfor)
# Plot with current and all data
plot_forecast(prepped_hosp, hosp_formatted$ensemble)
plot_forecast(prepped_hosp_all, hosp_formatted$ensemble)
plot_forecast(prepped_hosp, hosp_formatted$ensemble, location=c("US", "51"))
plot_forecast(prepped_hosp_all, hosp_formatted$ensemble, location=c("US", "51"))
plot_forecast(prepped_hosp, hosp_formatted$ets)
plot_forecast(prepped_hosp_all, hosp_formatted$ets)
plot_forecast(prepped_hosp, hosp_formatted$arima)
plot_forecast(prepped_hosp_all, hosp_formatted$arima)
# Demonstrating multiple models
prepped_hosp <- h_raw %>%
  prep_hdgov_hosp(statesonly=TRUE, min_per_week = 0, remove_incomplete = TRUE) %>%
  dplyr::filter(abbreviation != "DC") %>%
  dplyr::filter(week_start < as.Date("2022-01-08", format = "%Y-%m-%d"))
tsens_20220110 <- system.file("extdata/2022-01-10-SigSci-TSENS.csv", package="fiphde") %>%
  readr::read_csv(show_col_types = FALSE)
creg_20220110 <- system.file("extdata/2022-01-10-SigSci-CREG.csv", package="fiphde") %>%
  readr::read_csv(show_col_types = FALSE)
combo_20220110 <- dplyr::bind_rows(
  dplyr::mutate(tsens_20220110, model = "SigSci-TSENS"),
  dplyr::mutate(creg_20220110, model = "SigSci-CREG")
)
plot_forecast(prepped_hosp, combo_20220110, location = "24")
plot_forecast(prepped_hosp, tsens_20220110, location = "24")
plot_forecast(prepped_hosp, combo_20220110, location = c("34","36"))
plot_forecast(prepped_hosp, creg_20220110, location = "US", .model = "SigSci-CREG")
plot_forecast(prepped_hosp, creg_20220110, location = "US", .model = "SigSci-CREG")
## demonstrating different prediction interval widths
plot_forecast(prepped_hosp, combo_20220110, location = "24", pi = 0.5)
plot_forecast(prepped_hosp, combo_20220110, location = "24", pi = 0.9)
plot_forecast(prepped_hosp, combo_20220110, location = "24", pi = 0.95)
plot_forecast(prepped_hosp, combo_20220110, location = "24", pi = NULL)
## End(Not run)
This function creates a bar plot for categorical forecasts. See examples for demonstration of usage.
plot_forecast_categorical(categorical_forecast, format = "hubverse")
categorical_forecast: Either a "hubverse" or "legacy" formatted categorical forecast (e.g., output from forecast_categorical)
format: Either "hubverse" or "legacy"; the "hubverse" format will require an input forecast that includes output for "pmf" (see Details); default is "hubverse"
The categorical plotting function works both with "legacy" formatting (i.e., the format used in the 2022-23 FluSight season) and "hubverse" formatting (i.e., the format used in the 2023-24 FluSight season). Unlike the "legacy" format, the "hubverse" format allows for quantile and categorical forecasts to be co-mingled in the same submission object. If the format is specified as "hubverse", then the plot_forecast_categorical() function will internally look for the "pmf" forecasts.
A ggplot2 object with categorical forecasts shown as a stacked bar plot.
## Not run: 
# Retrieve hospitalization data
h_raw <- get_hdgov_hosp(limitcols=TRUE)
# Prepare and summarize hospitalization data to weekly resolution
prepped_hosp <- prep_hdgov_hosp(h_raw)
# Create a keyed time series tibble with only locations of interest
prepped_tsibble <- make_tsibble(prepped_hosp,
                                epiyear = epiyear,
                                epiweek=epiweek,
                                key=location)
# Run with default constrained ARIMA, nonseasonal ETS, no NNETAR
hosp_fitfor <- ts_fit_forecast(prepped_tsibble,
                               horizon=4L,
                               outcome="flu.admits")
# Prepare forecast for quantile submission format
prepped_forecast <- format_for_submission(hosp_fitfor$tsfor, method = "ts")
# Run categorical summary of quantiles for the time series ensemble
categorical_forecast <- forecast_categorical(prepped_forecast$ensemble, prepped_hosp)
# Plot the categorical forecast
plot_forecast_categorical(categorical_forecast, format = "legacy")
## End(Not run)
This function is a helper that forecasts Poisson counts for near-term horizons based on characteristics of recently observed count data. The function effectively takes a rolling average of most recent observations (augmenting with each forecasted horizon as the horizons progress), then uses this average as the parameter for Lambda in a random draw from a Poisson distribution.
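The sketch below illustrates the rolling-average Poisson idea described above; it is not the package internals, and the observed counts are hypothetical.
set.seed(1)
obs <- c(120, 135, 150, 160)  # hypothetical recent weekly counts
forc <- numeric(4)
for (h in seq_len(4)) {
  # Lambda is the mean of the 4 most recent values, rolling in prior forecasts
  lambda <- mean(utils::tail(c(obs, forc[seq_len(h - 1)]), 4))
  forc[h] <- stats::rpois(1, lambda)
}
forc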
pois_forc(.data, .location, .var, horizon = 4)
.data: Data frame with incoming data that includes a variable with counts (see ".var" argument), a location (must be stored in a column called "location"), and a variable for sorting by date (must be stored in a column called "week_start")
.location: The name of the location of interest
.var: Bare, unquoted name of the variable with counts to be forecasted
horizon: The number of horizons ahead to forecast; default is 4
Vector with Poisson forecasts for the number of horizons specified.
## Not run: 
all_clin <- get_cdc_clin()
va_ahead <- dplyr::tibble(
  n_positive = pois_forc(all_clin, .location = "51", n_positive),
  total = pois_forc(all_clin, .location = "51", total),
  p_positive = n_positive / total
)
va_ahead
## End(Not run)
This function prepares hospitalization data retrieved using get_hdgov_hosp for downstream forecasting. The function optionally limits to states only, trims to a given date, removes incomplete weeks, and removes locations with little reporting over the last month.
prep_hdgov_hosp( hdgov_hosp, statesonly = TRUE, trim = list(epiyear = 2020, epiweek = 43), remove_incomplete = TRUE, min_per_week = 1 )
hdgov_hosp: Daily hospital utilization data from get_hdgov_hosp
statesonly: Logical as to whether or not to limit to US+DC+States only (i.e., drop territories); default is TRUE
trim: Named list with elements for epiyear and epiweek corresponding to the minimum epidemiological week to retain; defaults to list(epiyear = 2020, epiweek = 43)
remove_incomplete: Logical as to whether or not to remove the last week if incomplete; default is TRUE
min_per_week: The minimum number of flu.admits per week needed to retain that state; the default (1) removes states with fewer than 1 flu admission per week over the last 30 days
A tibble with hospitalization data summarized to epiyear/epiweek with the following columns:
abbreviation: Abbreviation for the location
location: FIPS code for the location
week_start: Date of beginning (Sunday) of the given epidemiological week
monday: Date of Monday of the given epidemiological week
week_end: Date of end (Saturday) of the given epidemiological week
epiyear: Year of reporting (in epidemiological week calendar)
epiweek: Week of reporting (in epidemiological week calendar)
flu.admits: Count of flu cases among admitted patients on previous day
flu.admits.cov: Coverage (number of hospitals reporting) for incident flu cases
ili_mean: Estimate of historical ILI activity for the given epidemiological week
ili_rank: Rank of the given epidemiological week in terms of ILI activity across season (1 being highest average activity)
hosp_mean: Estimate of historical flu hospitalization rate for the given epidemiological week
hosp_rank: Rank of the given epidemiological week in terms of flu hospitalizations across season (1 being highest average activity)
## Not run: 
# Retrieve hospitalization data
hdgov_hosp <- get_hdgov_hosp(limitcols=TRUE)
# Prepare and summarize to weekly resolution
h <- prep_hdgov_hosp(hdgov_hosp)
h
## End(Not run)
This function replaces the weighted ILI retrieved from get_cdc_ili with nowcast data for each of the locations in the original data. The function will first attempt to use ILI Nearby nowcasts pulled using get_nowcast_ili. If the ILI Nearby nowcasts are unavailable, the function will optionally fallback to a pseudo nowcast method that averages the observed ILI for the 4 most recent weeks. The nowcast data will be used to add 1 additional week to the observed ILI data and (optionally) replace the number of weeks specified in the "weeks_to_replace" argument.
replace_ili_nowcast( ilidat, start_date = NULL, weeks_to_replace = 1, fallback = TRUE, try_api = TRUE )
ilidat: ILI data retrieved via get_cdc_ili
start_date: Date from which to start nowcasting; default is NULL (which internally uses lubridate::today())
weeks_to_replace: Number of retrospective weeks of observed ILI to replace with nowcast data; default is 1
fallback: Logical as to whether or not to fall back to a pseudo nowcast (average of the last 4 ILI weeks in the given location) if nowcast data is unavailable; default is TRUE
try_api: Logical as to whether or not the function should try the ILI Nearby nowcast API; default is TRUE
A tibble with the following columns:
location: FIPS code for the location
region_type: The type of location
abbreviation: Abbreviation for the location
region: Name of the region
epiyear: Year of reporting (in epidemiological week calendar)
epiweek: Week of reporting (in epidemiological week calendar)
week_start: Date of beginning (Sunday) of the given epidemiological week
weighted_ili: Population-weighted percentage of ILI outpatient visits
## Not run: 
ilidat <- get_cdc_ili(years=2021)
ilidat <- ilidat %>%
  dplyr::filter(location=="US" | abbreviation=="VA") %>%
  dplyr::group_by(location) %>%
  dplyr::slice_max(week_start, n=4) %>%
  dplyr::select(location:weighted_ili)
ilidat
iliaug <- replace_ili_nowcast(ilidat, weeks_to_replace=1)
iliaug
## End(Not run)
This unexported helper is used to ensure that categorical forecasts are rounded to sum to 1.
round_preserve(x, digits = 0)
x: Numeric vector with values to round
digits: The number of digits to use in precision; default is 0
Vector of same length as "x" with values rounded
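A minimal sketch; round_preserve() is unexported, so it is accessed with the ::: operator. The probabilities below are hypothetical.
p <- c(0.285, 0.285, 0.43)
fiphde:::round_preserve(p, digits = 2)
sum(fiphde:::round_preserve(p, digits = 2))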
This helper function calculates a weighted average of the last n observations.
smoothie(x, n = 4, weights = c(1, 2, 3, 4))
x: Incoming vector of observations
n: Number of recent observations to smooth; default is 4
weights: Vector of weights to be applied to the last n observations during averaging; default is c(1, 2, 3, 4)
Vector of length 1 with the weighted average of the last n observations.
## Not run:
## pull and prep weekly US flu hospitalization data
hosp_us <- get_hdgov_hosp() %>%
  prep_hdgov_hosp() %>%
  dplyr::filter(location == "US")
## what do the last 4 observations look like?
tail(hosp_us$flu.admits, 4)
## smooth over last 4 with default weights
smoothie(hosp_us$flu.admits, n=4, weights=c(1,2,3,4))
## try smoothing over last 4 with different weights (exponential this time)
smoothie(hosp_us$flu.admits, n=4, weights=exp(1:4))
## End(Not run)
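Assuming smoothie() computes a standard weighted mean of the trailing observations, the arithmetic can be checked without any API calls; the toy values below are made up for illustration:

# Toy observations (made up) to illustrate the weighted-average arithmetic
x <- c(10, 12, 15, 20, 30)
w <- c(1, 2, 3, 4)
sum(tail(x, 4) * w) / sum(w)         # 22.2
stats::weighted.mean(tail(x, 4), w)  # same result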
This function is a helper to get the date for the Monday of the current week. The function determines the current week based on epidemiological week orientation (i.e., week begins with Sunday).
this_monday()
Date for the Monday of the current week.
this_monday()
This function is a helper to get the date for the Saturday of the current week. The function determines the current week based on epidemiological week orientation (i.e., week begins with Sunday).
this_saturday()
Date for the Saturday of the current week.
this_saturday()
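Neither helper's internal code is shown here, but a minimal lubridate-based sketch of the same idea (flooring to the Sunday that starts the epidemiological week, then offsetting) could look like this:

# Sketch only; not necessarily how this_monday()/this_saturday() are implemented
library(lubridate)
week_sunday <- floor_date(today(), unit = "week", week_start = 7)  # Sunday starting the current epiweek
week_sunday + days(1)  # Monday of the current week (cf. this_monday())
week_sunday + days(6)  # Saturday of the current week (cf. this_saturday())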
This unexported helper is used in the ilinet function to strip special characters and empty space and convert a character vector to numeric.
to_num(x)
x | Input character vector for which special characters should be stripped and converted
Numeric vector
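The internal implementation is not shown here, but a base R sketch (to_num_sketch is a hypothetical stand-in) captures the idea of stripping special characters and whitespace before coercing to numeric:

# Illustrative only; fiphde's to_num() may handle additional edge cases
to_num_sketch <- function(x) {
  as.numeric(gsub("[^0-9.-]", "", x))
}
to_num_sketch(c("1,234", " 56 %", "7.8"))  # 1234.0 56.0 7.8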
This function allows the user to fit time series models and forecast values out to a specified horizon. Starting from a tsibble object (see make_tsibble), the function fits the models specified as a list in the "models" argument. The "Details" section provides more information on how to parameterize the models used. Note that if the input tsibble is "keyed" (e.g., grouped by location) then the procedure will fit and forecast independently for each grouping.
ts_fit_forecast(
  prepped_tsibble,
  outcome = "flu.admits",
  horizon = 4L,
  trim_date = "2021-01-01",
  models = list(arima = "PDQ(0, 0, 0) + pdq(1:2, 0:2, 0)",
                ets = "season(method=\"N\")",
                nnetar = NULL),
  covariates = TRUE,
  ensemble = TRUE
)
prepped_tsibble | A tsibble of time series data prepared with make_tsibble
outcome | The outcome variable to model; default is "flu.admits"
horizon | Number of weeks ahead to forecast; default is 4
trim_date | The date (YYYY-MM-DD) at which time series models should start fitting; default is "2021-01-01"
models | A list of right-hand side formula contents for the models to run; default is list(arima = "PDQ(0, 0, 0) + pdq(1:2, 0:2, 0)", ets = "season(method='N')", nnetar = NULL) (see Details)
covariates | Logical as to whether or not flu hospitalization-specific covariates should be modeled with the time series; if TRUE, the historical hospitalization and ILI rank for each epidemiological week (brought in with prep_hdgov_hosp) are added to the ARIMA model; default is TRUE
ensemble | Logical as to whether or not the models should be ensembled (using mean); default is TRUE
When fitting time series models, the set of models used (and their parameters) can be defined via a named list passed to the "models" argument. The list should contain elements that define the right-hand side of model formulas. The function internally uses the fable package, and any models provided must be part of the fable ecosystem of time series models. The models passed must be named as "arima", "ets", and "nnetar". To skip any one of these models, set the named element for the given model to NULL. The "models" argument defaults to list(arima = "PDQ(0, 0, 0) + pdq(1:2, 0:2, 0)", ets = "season(method='N')", nnetar = NULL). To run an unconstrained ARIMA: list(arima='PDQ() + pdq()') (see fable::ARIMA). To run a seasonal exponential smoothing: list(ets='season(method=c("A", "M", "N"), period="3 months")') (see fable::ETS). To run an autoregressive neural net with P=1: list(nnetar="AR(P=1)") (see fable::NNETAR).
A list of the time series fit, time series forecast, and model formulas:
tsfit: A mdl_df class "mable" with one row for each location, columns for arima and ets models
tsfor: A fbl_ts class "fable" with one row per location-model-timepoint up to horizon number of time points
formulas: A list of ARIMA, ETS, and/or NNETAR formulas
## Not run:
# Retrieve hospitalization data
h_raw <- get_hdgov_hosp(limitcols=TRUE)
# Prepare and summarize hospitalization data to weekly resolution
prepped_hosp <- prep_hdgov_hosp(h_raw)
# Create a keyed time series tibble with only locations of interest
prepped_tsibble <- make_tsibble(prepped_hosp,
                                epiyear = epiyear,
                                epiweek = epiweek,
                                key = location) %>%
  dplyr::filter(location %in% c("US", "51"))
# Run with default constrained ARIMA, nonseasonal ETS, no NNETAR
hospfor1 <- ts_fit_forecast(prepped_tsibble,
                            horizon = 4L,
                            outcome = "flu.admits",
                            covariates = TRUE)
# Run an unconstrained ARIMA, seasonal ETS, no NNETAR
hospfor2 <- ts_fit_forecast(prepped_tsibble,
                            horizon = 4L,
                            outcome = "flu.admits",
                            covariates = TRUE,
                            models = list(arima = 'PDQ() + pdq()',
                                          ets = 'season(method=c("A", "M", "N"), period="3 months")',
                                          nnetar = NULL))
# Run an unconstrained ARIMA, seasonal ETS, NNETAR
hospfor3 <- ts_fit_forecast(prepped_tsibble,
                            horizon = 4L,
                            outcome = "flu.admits",
                            covariates = TRUE,
                            models = list(arima = 'PDQ() + pdq()',
                                          ets = 'season(method=c("A", "M", "N"), period="3 months")',
                                          nnetar = "AR(P=1)"))
## End(Not run)
This function specifically formats time series forecasts generated with ts_fit_forecast to adhere to probabilistic forecast submission guidelines for consortia such as FluSight. It is used as a helper in format_for_submission.
ts_format_for_submission( tsfor, .target = "wk ahead inc flu hosp", .counts = TRUE )
tsfor | The forecast from ts_fit_forecast
.target | Name of the target in the forecast; default is "wk ahead inc flu hosp"
.counts | Logical; default is TRUE
Uses the quantiles c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99) stored in the built-in fiphde:::q object, along with the accessory table fiphde:::quidk.
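For reference, that quantile set can be constructed directly to confirm its size (23 levels):

q_levels <- c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)
length(q_levels)  # 23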
A named list of tibbles with probabilistic forecasts (one for each model), formatted for submission with the following columns:
forecast_date: Date of forecast
target: Horizon and name of forecasted target
target_end_date: Last date of the forecasted target (e.g., Saturday of the given epidemiological week)
location: FIPS code for location
type: One of either "point" or "quantile" for the forecasted value
quantile: The quantile for the forecasted value; NA if "type" is "point"
value: The forecasted value
https://github.com/cdcepi/Flusight-forecast-data/blob/master/data-forecasts/README.md
## Not run:
# Get raw data from healthdata.gov
h_raw <- get_hdgov_hosp(limitcols=TRUE)
# Prep, and make a tsibble
prepped_hosp <- prep_hdgov_hosp(h_raw, statesonly=TRUE)
prepped_hosp_tsibble <- make_tsibble(prepped_hosp,
                                     epiyear = epiyear,
                                     epiweek = epiweek,
                                     key = location)
# Limit to only Virginia and US
prepped_hosp_tsibble <- prepped_hosp_tsibble %>%
  dplyr::filter(location %in% c("US", "51"))
# Fit a model
hosp_fitfor <- ts_fit_forecast(prepped_hosp_tsibble,
                               horizon = 4L,
                               outcome = "flu.admits",
                               covariates = TRUE)
# Format for submission
formatted_list <- ts_format_for_submission(hosp_fitfor$tsfor)
formatted_list
## End(Not run)
This function will take the prepped forecast data from format_for_submission and run a series of tests to validate the format.
validate_forecast(subdat)
subdat | A tibble with prepped forecast data from format_for_submission
Named list with elements for each test (including a logical for whether or not the test passed and a message if failed) and an overall "valid" logical with TRUE if all tests passed and FALSE if at least one failed
## Not run:
# Get raw data from healthdata.gov
h_raw <- get_hdgov_hosp(limitcols=TRUE)
# Prep, and make a tsibble
prepped_hosp <- prep_hdgov_hosp(h_raw, statesonly=TRUE)
prepped_hosp_tsibble <- make_tsibble(prepped_hosp,
                                     epiyear = epiyear,
                                     epiweek = epiweek,
                                     key = location)
# Limit to only Virginia and US
prepped_hosp_tsibble <- prepped_hosp_tsibble %>%
  dplyr::filter(location %in% c("US", "51"))
# Fit a model
hosp_fitfor <- ts_fit_forecast(prepped_hosp_tsibble,
                               horizon = 4L,
                               outcome = "flu.admits",
                               covariates = TRUE)
# Format for submission
formatted_list <- format_for_submission(hosp_fitfor$tsfor, method = "ts")
# Validate one of the forecasts
# Note that this expects forecast is prepared with forecast date = Monday of the current week
ens_forc <- formatted_list$ensemble
ens_forc$forecast_date <- this_monday()
validate_forecast(ens_forc)
## End(Not run)
Adapted from cdcfluview::who_nrevss.
This unexported helper function leverages the CDC FluView API to pull flu surveillance data collected from U.S. World Health Organization (WHO) Collaborating Laboratories and National Respiratory and Enteric Virus Surveillance System (NREVSS) laboratories. The data retrieved can be parameterized by geographic granularity and/or flu season. The function is used internally by get_cdc_clin.
who_nrevss(region = c("national", "hhs", "census", "state"), years = NULL)
region | One of "national", "hhs", "census", or "state"
years | A vector of years to retrieve data for; default is NULL