How to retrieve the coefficients which calculated by rolling lm() function - r

Using function slidify() in package timetk to create lm_roll(), how to retrieve the coefficients calculated by lm_roll() and calculate slope ? Below code , the last muate can't work . Thanks!
library(timetk)
library(tidyverse)
library(tidyquant)
lm_roll <- slidify(~lm(.x ~ .y),.period=90,.unlist=FALSE,.align ="right")
FB <- FANG %>% filter(symbol == "FB")
FB %>%
drop_na() %>%
mutate(numeric_date = as.numeric(date)) %>%
mutate(rolling_lm = lm_roll(adjusted,numeric_date)) %>%
filter(!is.na(rolling_lm)) %>% mutate(intercept= coef(rolling_lm)[1],
numeric_date_index = coef(rolling_lm)[2],
slope=coef(rolling_lm)[1]/coef(rolling_lm)[2])

Related

Forecasting time series groups based on trends using nested data in R

Reminiscent of this question: Forecasting Time Series Groups with tslm() & tidyverse
except I want to use Matt Dancho’s code at https://cran.rstudio.com/web/packages/sweep/vignettes/SW01_Forecasting_Time_Series_Groups.html
How can I use forecast::tslm() to produce grouped time series predictions from a nested data frame? The solution Rob Hyndman kindly provided used the tsibble and fable packages without nesting.
The map() of tslm() at the bottom of this code generates:
Caused by error in `formula.default()`:
! invalid formula
starter_time <- Sys.time()
library(dplyr)
library(timetk)
library(tidyr)
library(purrr)
library(lubridate)
library(forecast)
library(broom)
library(sweep)
library(zoo)
monthly_qty_by_cat2 <- bike_sales %>%
mutate(order.month = as_date(as.yearmon(order.date))) %>%
group_by(category.secondary, order.month) %>%
summarise(total.qty = sum(quantity))
monthly_qty_by_cat2_nest <- monthly_qty_by_cat2 %>%
group_by(category.secondary) %>%
nest()
monthly_qty_by_cat2_ts <- monthly_qty_by_cat2_nest %>%
mutate(data.ts = map(.x = data,
.f = tk_ts,
select = -order.month,
start = 2011,
freq = 12))
## invalid formula ERROR
monthly_qty_by_cat2_fit <- monthly_qty_by_cat2_ts %>%
mutate(fit.ets = map(data.ts, tslm, total.qty ~ trend))

Future dataset is incomplete when using Fable Prophet

I'm trying to view the out of sample performance scores after running fable prophet. Please note, the forecast is grouped based on type and the forecast is looking 5 observations ahead.
Here is the code:
library(tibble)
library(tsibble)
library(fable.prophet)
lax_passengers <- read.csv("https://raw.githubusercontent.com/mitchelloharawild/fable.prophet/master/data-raw/lax_passengers.csv")
library(dplyr)
library(lubridate)
lax_passengers <- lax_passengers %>%
mutate(datetime = mdy_hms(ReportPeriod)) %>%
group_by(month = yearmonth(datetime), type = Domestic_International) %>%
summarise(passengers = sum(Passenger_Count)) %>%
ungroup()
lax_passengers <- as_tsibble(lax_passengers, index = month, key = type)
fit <- lax_passengers %>%
model(
mdl = prophet(passengers ~ growth("linear") + season("year", type = "multiplicative")),
)
fit
test_tr <- lax_passengers %>%
slice(1:(n()-5)) %>%
stretch_tsibble(.init = 12, .step = 1)
fc <- test_tr %>%
model(
mdl = prophet(passengers ~ growth("linear") + season("year", type = "multiplicative")),
) %>%
forecast(h = 5)
fc %>% accuracy(lax_passengers)
When I run fc %>% accuracy(lax_passenger), I get the following warning:
Warning message:
The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
5 observations are missing between 2019 Apr and 2019 Aug
How do make the future dataset complete as I believe the performance score isn't accurate based on the missing 5 observations.
It seems like when I try to stretch the tsibble, it doesn't slice correctly as it doesn't remove the last 5 observations from each type.
The slice() function removes rows from the entire dataset, so it is only removing the last 5 rows from your last key (type=="International"). To remove the last 5 rows from all keys, you'll need to group by keys and slice.
test_tr <- lax_passengers %>%
group_by_key() %>%
slice(1:(n()-5)) %>%
ungroup() %>%
stretch_tsibble(.init = 12, .step = 1)

Extract all columns of nested tibble data based on condition

For example I want to extract and add all variables based on minimal value of one variable (i.e. year in nested gapminder by country)
library(tidyverse)
data("gapminder")
gap_nested <- gapminder %>%
nest(data = -country) %>%
mutate(year = map(data, ~ min(.x$year)))
How do I do this? )
You can use the filter function
You can use the filter function from they dplyr package (included in tidyverse), like in this example:
gap_nested <- gapminder %>%
nest(data = -country) %>%
mutate(year = map(data, ~ min(.x$year))) %>%
filter(year == 1960)
This will return only countries which have minimum year equals to 1960.
Hope this helps.

Bootstrapping by multiple groups in the tidyverse: rsample vs. broom

In this SO Question bootstrapping by several groups and subgroups seemed to be easy using the broom::bootstrap function specifying the by_group argument with TRUE.
My desired output is a nested tibble with n rows where the data column contains the bootstrapped data generated by each bootstrap call (and each group and subgroup has the same amount of cases as in the original data).
In broom I did the following:
# packages
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)
library(rsample)
library(broom)
# some data to bootstrap
set.seed(123)
data <- tibble(
group=rep(c('group1','group2','group3','group4'), 25),
subgroup=rep(c('subgroup1','subgroup2','subgroup3','subgroup4'), 25),
v1=rnorm(100),
v2=rnorm(100)
)
# the actual approach using broom::bootstrap
tibble(id = 1:100) %>%
mutate(data = map(id, ~ {data %>%
group_by(group,subgroup) %>%
broom::bootstrap(100, by_group=TRUE)}))
Since the broom::bootstrap function is deprecated, I rebuild my approach with the desired output using rsample::bootstraps. It seems to be much more complicated to get my desired output. Am I doing something wrong or have things gotten more complicated in the tidyverse when generating grouped bootstraps?
data %>%
dplyr::mutate(group2 = group,
subgroup2 = subgroup) %>%
tidyr::nest(-group2, -subgroup2) %>%
dplyr::mutate(boot = map(data, ~ rsample::bootstraps(., 100))) %>%
pull(boot) %>%
purrr::map(., "splits") %>%
transpose %>%
purrr::map(., ~ purrr::map_dfr(., rsample::analysis)) %>%
tibble(id = 1:length(.), data = .)

R Many Models Multiple Functions

I've written a routine that extracts information from lmer models to compute the ICC and get the LRT from lmerTest's ranova function. What I have below works but I suspect it could be improved by (a) combining the two functions into one and returning a list, but I can't seem to access the list elements with purrr's map function, and (b) using multiple mutate/purrr lines to get all the needed data in one place rather than having to join later. My code follows using the "Peet" dataset provided in Hox (2002) and available at the UCLA IDRE site:
library(foreign)
library(lme4)
library(tidyverse)
library(purrr)
#Peet family data described and used in Hox
peet.dat<-read.dta("https://stats.idre.ucla.edu/stat/stata/examples/mlm_ma_hox/peetmis.dta")
names(peet.dat)
#convert to long format
peet.long.dat <- peet.dat %>%
tidyr::gather(type, score, -family,-sex,-person) %>%
arrange(type)
names(peet.long.dat)
#need two functions, one for the MLM estimates and the other for
#ranova p-test for variance--merge later by type
aov_model <- function(df) {
lmr.model <- lmerTest::lmer(score~ 1 + (1|family), data=df)
}
aov_test <- function(df) {
lmr.model <- lmerTest::lmer(score~ 1 + (1|family), data=df)
ll.test <- lmerTest::ranova(lmr.model)
}
#get the model estimates
models <- peet.long.dat %>%
nest(-type) %>%
mutate(aov_obj = map(data, aov_model),
summaries = map(aov_obj, broom.mixed::tidy)) %>%
unnest(summaries, .drop = T) %>%
select(type, effect, estimate, term) %>%
filter(effect != "fixed") %>%
mutate(variance = estimate^2) %>%
select(-estimate, -effect) %>%
spread(term, variance) %>%
rename(group.var = `sd__(Intercept)`, residual = `sd__Observation`) %>%
mutate(ICC = group.var/(group.var+residual))
models
#get the ranova LRTs
tests <- peet.long.dat %>%
nest(-type) %>%
mutate(test_obj = map(data, aov_test),
test_summaries = map(test_obj, broom.mixed::tidy)) %>%
unnest(test_summaries, .drop = T) %>%
filter(!is.na(LRT))
#join estimates with LRT p values
models %>% left_join(tests[c("type","p.value")])
Any help greatly appreciated.
I think the key here is to split() your data.frame based on the variable type:
# convert to list by type
peet.ls <- peet.dat %>%
tidyr::gather(type, score, -family,-sex,-person) %>%
split(.$type)
# map to fit models on subsets and return summaries
peet.ls %>%
map(function(df.x) {
# fit the model
lmr_model <- lmerTest::lmer(score~ 1 + (1|family), data = df.x)
#get the model estimates
mlm_est <- lmr_model %>%
broom.mixed::tidy() %>%
select(effect, estimate, term) %>%
filter(effect != "fixed") %>%
mutate(variance = estimate^2) %>%
select(-estimate, -effect) %>%
spread(term, variance) %>%
rename(group.var = `sd__(Intercept)`,
residual = `sd__Observation`) %>%
mutate(ICC = group.var/(group.var+residual))
# get the ranova LRTs & add to other estimates
mlm_est$p.value <- lmr_model %>%
lmerTest::ranova() %>%
broom.mixed::tidy() %>%
filter(!is.na(LRT)) %>%
pull(p.value)
# return summaries
mlm_est
}) %>%
# combine data.frames and add the variable 'type'
bind_rows(.id = "type") %>%
select(type, everything())

Resources