I am new to R programming. I've generated a hierarchical time series using the hts package. I need to plot the time series at each level of the hierarchy separately using dygraphs.
library(hts)
abc <- ts(5 + matrix(sort(rnorm(1000)), ncol = 10, nrow = 100))
colnames(abc) <- c("A10A", "A10B", "A10C", "A20A", "A20B",
"B30A", "B30B", "B30C", "B40A", "B40B")
y <- hts(abc, characters = c(1, 2, 1))
fcasts1 <- forecast(y, method = "bu" ,h=4, fmethod = "arima",
parallel = TRUE)
dygraph(fcasts1,y)
I keep getting this error message:
Error in UseMethod("as.xts") :
no applicable method for 'as.xts' applied to an object of class "c('gts', 'hts')"
Is there a solution for this issue? Maybe someone could tell me how to pass the variables to dygraph correctly.
It is not possible to plot hts objects directly with dygraph. Instead, convert the bts component (the bottom-level series, here fcasts1$bts) into a matrix and then into an ordinary time series using the ts() function.
Here is an example I've worked out.
library(hts)
library(dygraphs)
abc <- ts(5 + matrix(sort(rnorm(1000)), ncol = 10, nrow = 100))
colnames(abc) <- c("A10A", "A10B", "A10C", "A20A", "A20B",
"B30A", "B30B", "B30C", "B40A", "B40B")
y <- hts(abc, characters = c(1, 2, 1))
fcasts1 <- forecast.gts(y, method = "bu" ,h=4, fmethod = "arima",
parallel = TRUE)
ts1 <- as.matrix(fcasts1$bts)
ts1 <- ts(ts1,start = c(2016,3), frequency = 12)
dygraph(ts1[,"A10A"],main='Sample dygraph ',ylab = 'Demand')
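To see the forecast next to the observed values, one option is to bind the two series into a single multi-column ts before handing it to dygraph. The sketch below uses simulated stand-ins rather than the hts objects above; the names history, forecasted and combined are hypothetical, and the start dates are chosen only so that the forecast begins in 2016-03, matching the ts() call in the answer.

```r
# Hypothetical stand-ins for one bottom-level series: 100 observed months
# followed by a 4-step forecast. With the objects from the answer these
# would be taken from y$bts[, "A10A"] and fcasts1$bts[, "A10A"].
set.seed(1)
history    <- ts(5 + sort(rnorm(100)), start = c(2007, 11), frequency = 12)
forecasted <- ts(rep(history[100], 4), start = c(2016, 3), frequency = 12)

# cbind() on ts objects aligns them on the union of their time indices
# and pads with NA where a series has no observation.
combined <- cbind(history = history, forecast = forecasted)

# Only build the widget if dygraphs is installed.
if (requireNamespace("dygraphs", quietly = TRUE)) {
  plot_obj <- dygraphs::dygraph(combined, main = "A10A: history and forecast",
                                ylab = "Demand")
  # print(plot_obj) would display it in an interactive session
}
```

The same pattern should carry over to any other column name of the bottom-level matrix.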
My data has three columns: an ID tag identifying the outlet, a time tag, and the measurement.
I need to create forecasts for 100 different series (outlets). First I need to subset by ID for the first outlet, then fit ARIMA models, and finally collect 7-day-ahead forecasts for every outlet. Moreover, I also need hourly, daily, and weekly dummies in my model, so I need to pass xregs to the auto.arima procedure.
However, I have not been able to wrap the code below in a loop that runs for all 100 different IDs.
df11 <-subset(df10,ID==288)%>%select(Tag,Measure)
sales.xts <- xts(df11[ ,c(-1)],order.by = df11$Tag)
sales.xts_m<-sales.xts["2020-07-22/2020-10-04"]
dummies_hd <- xts(Seasonaldummies_all[,-1],order.by = Seasonaldummies_all$Tag)
dummies_hd_m<-dummies_hd["2020-07-22/2020-10-04"]
model<-auto.arima(sales.xts_m,xreg=dummies_hd_m, biasadj = TRUE,max.p=7,max.q=7,seasonal=FALSE,test=c("kpss"),lambda = "auto",num.cores=15,stationary = TRUE)
Can you show me a quick way to do that job by apply or loop functions?
If you want to use the forecast package, you need to convert your data into a ts (mts) object. To do that, first transform your data from long format to wide format (from the image you posted above, I assume your data is in long format). Then use the ts() function to create a ts object; see the example below.
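As a rough illustration of the long-to-wide step, here is a base R sketch. The column names ID, Tag and Measure follow the question; the data values, the three outlet IDs, and frequency = 7 (daily data with a weekly cycle) are assumptions made only for this example.

```r
# Hypothetical long-format data: one row per (outlet ID, time tag) pair.
long <- data.frame(
  ID      = rep(c(288, 289, 290), each = 6),
  Tag     = rep(seq(as.Date("2020-07-22"), by = "day", length.out = 6), times = 3),
  Measure = 1:18
)

# One row per time tag, one Measure column per outlet ID.
wide <- reshape(long, idvar = "Tag", timevar = "ID", direction = "wide")
wide <- wide[order(wide$Tag), ]

# Drop the time column and build a multivariate ts (mts) object.
# frequency = 7 is an assumption for daily observations.
sales <- ts(as.matrix(wide[, -1]), frequency = 7)
```

Each column of sales can then be passed to auto.arima() in a loop, as in the example that follows.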
Let's generate some example ts data
sales.xts_m <- ts(data.frame(AA = arima.sim(list(order=c(1,0,0), ar=.5), n=100,
mean = 12),
AB = arima.sim(list(order=c(1,0,0), ar=.5), n=100,
mean = 12),
AC = arima.sim(list(order=c(1,0,0), ar=.5), n=100,
mean = 11),
BA = arima.sim(list(order=c(1,0,0), ar=.5), n=100,
mean = 10),
BB = arima.sim(list(order=c(1,0,0), ar=.5), n=100,
mean = 14)), start = c(2000, 1),
frequency = 12)
nts <- ncol(sales.xts_m) # number of time series
h <- 12 # forecast horizon
Example xreg
dummies_hd_m <- forecast::seasonaldummy(sales.xts_m[,1])
dummies_hd_m_future <- forecast::seasonaldummy(sales.xts_m[,1], h = h)
mylist <- list()
fc <- matrix(nrow = h, ncol = nts)
if you need to keep the models --------------------
models will be in mylist and point forecast in fc for each ts
for (i in 1:nts) {
mylist[[i]] <- auto.arima(sales.xts_m[,i],xreg=dummies_hd_m, biasadj = TRUE,
max.p=7,max.q=7,seasonal= FALSE,test=c("kpss"),
lambda = "auto",num.cores=15,stationary = TRUE )
fc[,i] <- forecast(mylist[[i]], h=h, xreg = dummies_hd_m_future)$mean
}
#ts names
colnames(fc) <- colnames(sales.xts_m)
if you do not need to keep models --------------------
fc <- matrix(nrow = h, ncol = nts)
for (i in 1:nts) {
fc[,i] <- forecast(auto.arima(sales.xts_m[,i],xreg=dummies_hd_m, biasadj = TRUE,
max.p=7,max.q=7,seasonal=FALSE,test=c("kpss"),
lambda = "auto",num.cores=15,stationary = TRUE ), h=h,
xreg = dummies_hd_m_future)$mean
}
#ts names
colnames(fc) <- colnames(sales.xts_m)
If you want to use ML models for your projects
devtools::install_github("Akai01/caretForecast")
library(caretForecast)
nts <- ncol(sales.xts_m) # number of time series
h <- 12 # forecast horizon
fc <- matrix(nrow = h, ncol = nts)
Example: Support Vector Machines with Linear Kernel. To use another model, you only need to change the caret_method argument, for example caret_method = "ridge" or caret_method = "rf". Ref: https://github.com/Akai01/caretForecast
for (i in 1:nts) {
fc[,i] <- forecast(ARml(sales.xts_m[,i], maxlag = 12, xreg = dummies_hd_m,
caret_method = "svmLinear", seasonal = FALSE ),
h=h, xreg = dummies_hd_m_future)$mean
}
colnames(fc) <- colnames(sales.xts_m)
I am using Prophet for a multi-channel leads forecast. I've been able to make the prediction using the method described by @RLave in the post "Prophet Forecasting using R for multiple items". I would like to add regressors to my forecast. Below is the example given by @RLave, along with my attempt to add a regressor and the resulting error. How do I make add_regressor work with my list of leads series?
# also contains the purrr package
library(tidyverse)
set.seed(123)
tb1 <- tibble(
ds = seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day"),
y = sample(365),
regressor = rnorm(365, mean = 0, sd = 1)
)
tb2 <- tibble(
ds = seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day"),
y = sample(365),
regressor = rnorm(365, mean = 0, sd = 1)
)
# two separate time series
ts_list <- list(tb1, tb2)
Build and prediction:
library(prophet)
# prophet call
m_list <- map(ts_list, prophet)
# makes future obs
future_list <- map(m_list, make_future_dataframe, periods = 40)
# map2 because we have two inputs
forecast_list <- map2(m_list, future_list, predict)
Attempt to Add Regressor
m <- prophet()
m <- add_regressor(m, "regressor")
# prophet call
m_list2 <- map2(m, ts_list, fit.prophet )
Error: Mapped vectors must have consistent lengths:
.x has length 31
.y has length 14
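The error arises because map2 iterates over the components of m (an unfitted prophet object is itself a list, here with 31 elements) in parallel with ts_list, rather than reusing one model for every data frame. One possible way around this, sketched below, is to build a fresh model inside a small helper and map that helper over the list. The helper name fit_with_regressor is hypothetical, lapply stands in for purrr::map, and the fit only runs if prophet is installed.

```r
# Two example series with an extra regressor column, as in the question.
set.seed(123)
make_series <- function() {
  data.frame(
    ds = seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day"),
    y = sample(365),
    regressor = rnorm(365, mean = 0, sd = 1)
  )
}
ts_list <- list(make_series(), make_series())

# Hypothetical helper: create an unfitted model, register the regressor,
# then fit it to a single data frame.
fit_with_regressor <- function(df) {
  m <- prophet::prophet()                      # no df, so nothing is fitted yet
  m <- prophet::add_regressor(m, "regressor")  # must happen before fitting
  prophet::fit.prophet(m, df)
}

if (requireNamespace("prophet", quietly = TRUE)) {
  m_list2 <- lapply(ts_list, fit_with_regressor)
}
```

Note that predicting from such models needs values of regressor for the future dates as well, so the data frames returned by make_future_dataframe would have to be extended with a regressor column before calling predict.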
I want to iterate over a list of linear models and apply "clustered" standard errors to each model using the vcovCL function. My goal is to do this as efficiently as possible (I am running a linear model across many columns of a dataframe). My problem is trying to specify additional arguments inside of the anonymous function. Below I simulate some fake data. Precincts represent my cross-sectional dimension; months represent my time dimension (5 units observed across 4 months). The variable int is a dummy for when an intervention takes place.
df <- data.frame(
precinct = c( rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 4), rep(5, 4) ),
month = rep(1:4, 5),
crime = rnorm(20, 10, 5),
int = c(c(0, 1, 1, 0), rep(0, 4), rep(0, 4), c(1, 1, 1, 0), rep(0, 4))
)
df[1:10, ]
outcome <- df[3]
est <- lapply(outcome, FUN = function(x) { lm(x ~ as.factor(precinct) + as.factor(month) + int, data = df) })
se <- lapply(est, function(x) { sqrt(diag(vcovCL(x, cluster = ~ precinct + month))) })
I receive the following error message when adding the cluster argument inside of the vcovCL function.
Error in eval(expr, envir, enclos) : object 'x' not found
The only way around it, in my estimation, would be to index the dataframe, i.e., df$, and then specify the 'clustering' variables. Could this be achieved by specifying an additional argument for df inside of the function call? Is this code efficient?
Maybe specifying the model equation formulaically is a better way to go, I suppose.
Any thoughts/comments are always helpful :)
Here is one approach that would retrieve clustered standard errors for multiple models:
library(sandwich)
# I am going to use the same model three times to get the "sequence" of linear models.
mod <- lm(crime ~ as.factor(precinct) + as.factor(month) + int, data = df)
# define function to retrieve standard errors:
robust_se <- function(mod) {sqrt(diag(vcovCL(mod, cluster = list(df$precinct, df$month))))}
# apply function to all models:
se <- lapply(list(mod, mod, mod), robust_se)
If you want to get the entire output adjusted, the following might be helpful:
library(lmtest)
adj_stats <- function(mod) {coeftest(mod, vcovCL(mod, cluster = list(df$precinct, df$month)))}
adjusted_models <- lapply(list(mod, mod, mod), adj_stats)
To address the multiple column issue:
In case you are struggling with running linear models over several columns, the following might be helpful. All the above would stay the same, except that you are passing your list of models to lapply.
First, let's use this dataframe here:
df <- data.frame(
precinct = c( rep(1, 4), rep(2, 4), rep(3, 4), rep(4, 4), rep(5, 4) ),
month = rep(1:4, 5),
crime = rnorm(20, 10, 5),
crime2 = rnorm(20, 10, 5),
crime3 = rnorm(20, 10, 5),
int = c(c(0, 1, 1, 0), rep(0, 4), rep(0, 4), c(1, 1, 1, 0), rep(0, 4))
)
Let's define the outcome columns:
outcome_columns <- c("crime", "crime2", "crime3")
Now, let's run a regression with each outcome:
models <- lapply(outcome_columns,
function(outcome) lm( eval(parse(text = paste0(outcome, " ~ as.factor(precinct) + as.factor(month) + int"))), data = df) )
And then you would just call
adjusted_models <- lapply(models, adj_stats)
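As an alternative to eval(parse(text = ...)), the formulas can be built with base R's reformulate(), which avoids parsing a hand-assembled string. This is only a sketch of the same idea: the data frame is re-simulated here so the block runs on its own, and rhs is a name introduced for illustration.

```r
# Same layout as the data frame above, re-simulated for a standalone example.
set.seed(1)
df <- data.frame(
  precinct = rep(1:5, each = 4),
  month    = rep(1:4, 5),
  crime    = rnorm(20, 10, 5),
  crime2   = rnorm(20, 10, 5),
  crime3   = rnorm(20, 10, 5),
  int      = c(0, 1, 1, 0, rep(0, 8), 1, 1, 1, 0, rep(0, 4))
)
outcome_columns <- c("crime", "crime2", "crime3")

# Right-hand-side terms shared by every model.
rhs <- c("as.factor(precinct)", "as.factor(month)", "int")

# setNames(nm = ...) names the list by outcome, so models$crime2 works.
models <- lapply(setNames(nm = outcome_columns), function(outcome) {
  lm(reformulate(rhs, response = outcome), data = df)
})
```

Because each model now stores a real column name as its response, the resulting list can be passed to the adj_stats helper in the same way as before.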
Regarding efficiency:
The above code is efficient in that it is easily adjustable and quick to write up. For most use cases, it will be perfectly fine. For computational efficiency, note that your design matrix is the same in all cases, i.e. by precomputing the common elements (e.g. inv(X'X)*X'), you could save some computations. You would however lose out on the convenience of many built-in functions.
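To illustrate the point about the shared design matrix: lm() accepts a matrix response, in which case all outcomes are fitted against a single decomposition of the design matrix. The sketch below, on re-simulated data, checks that the coefficients agree with a separate fit. Whether vcovCL handles the resulting "mlm" object is not verified here, which is the loss of convenience mentioned above.

```r
set.seed(1)
df <- data.frame(
  precinct = rep(1:5, each = 4),
  month    = rep(1:4, 5),
  crime    = rnorm(20, 10, 5),
  crime2   = rnorm(20, 10, 5),
  crime3   = rnorm(20, 10, 5),
  int      = c(0, 1, 1, 0, rep(0, 8), 1, 1, 1, 0, rep(0, 4))
)

# One call, three outcomes: the response is a matrix, the fit is an "mlm".
fit_all <- lm(cbind(crime, crime2, crime3) ~
                as.factor(precinct) + as.factor(month) + int, data = df)

# Reference fit for a single outcome, for comparison.
fit_one <- lm(crime2 ~ as.factor(precinct) + as.factor(month) + int, data = df)

# coef(fit_all) is a matrix with one column per outcome.
same <- isTRUE(all.equal(coef(fit_all)[, "crime2"], coef(fit_one)))
```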
There are two things I need to do. Firstly I would like to be able to create new variables in a coda mcmc object that have been calculated from existing variables so that I can run chain diagnostics on the new variable. Secondly I would like to be able to index single variables in some of the coda plot functions while still viewing all chains.
Toy data. Bayesian t-test on the sleep data using JAGS and rjags.
library(rjags) # also attaches coda
library(ggplot2)
data(sleep)
# read in data
y <- sleep$extra
x <- as.numeric(as.factor(sleep$group))
nTotal <- length(y)
nGroup <- length(unique(x))
mY <- mean(y)
sdY <- sd(y)
# make dataList
dataList <- list(y = y, x = x, nTotal = nTotal, nGroup = nGroup, mY = mY, sdY = sdY)
# model string
modelString <- "
model{
for (oIdx in 1:nTotal) {
y[oIdx] ~ dnorm(mu[x[oIdx]], 1/sigma[x[oIdx]]^2)
}
for (gIdx in 1:nGroup) {
mu[gIdx] ~ dnorm(mY, 1/sdY)
sigma[gIdx] ~ dunif(sdY/10, sdY*10)
}
}
"
writeLines(modelString, con = "tempModel.txt")
# chains
# 1. adapt
jagsModel <- jags.model(file = "tempModel.txt",
data = dataList,
n.chains = 3,
n.adapt = 1000)
# 2. burn-in
update(jagsModel, n.iter = 1000)
# 3. generate
codaSamples <- coda.samples(model = jagsModel,
variable.names = c("mu", "sigma"),
thin = 15,
n.iter = 10000*15/3)
Problem one
If I convert the coda object to a dataframe I can calculate the difference between the estimates for the two groups and plot this new variable, like so...
df <- as.data.frame(as.matrix(codaSamples))
names(df) <- gsub("\\[|\\]", "", names(df), perl = TRUE) # remove brackets
df$diff <- df$mu1 - df$mu2
ggplot(df, aes(x = diff)) +
geom_histogram(bins = 100, fill = "skyblue") +
geom_vline(xintercept = mean(df$diff), colour = "red", size = 1, linetype = "dashed")
...but how do I get a traceplot? I can get one for existing variables within the coda object like so...
traceplot(codaSamples[[1]][,1])
...but I would like to be able to get them for the new diff variable.
Problem Two
Which brings me to the second problem. I would like to be able to get a traceplot (among other things) for individual variables. As I have shown above I can get them for a single variable if I only want to see one chain but I'd like to see all chains. I can see all chains for all variables in the model with the simple
plot(codaSamples)
...but what if I don't want or need to see all variables? What if I just want to see the trace and/or density plots for one, or even two, variables (but not all variables) but with all chains in the plot?
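One possible approach to both problems, sketched on simulated chains rather than the JAGS output above: add the derived column to every chain's matrix, rebuild an mcmc.list, and then subset that list by variable name, which keeps all chains. The names chains, add_diff and samples are hypothetical, and the coda calls only run if the package is installed.

```r
# Simulated stand-ins for three chains with columns "mu[1]" and "mu[2]".
set.seed(42)
chains <- replicate(
  3,
  cbind("mu[1]" = rnorm(50, 0.75), "mu[2]" = rnorm(50, 2.3)),
  simplify = FALSE
)

# Hypothetical helper: append the difference of the two means as a column.
add_diff <- function(m) cbind(m, diff = m[, "mu[1]"] - m[, "mu[2]"])
chains_diff <- lapply(chains, add_diff)

if (requireNamespace("coda", quietly = TRUE)) {
  samples <- coda::mcmc.list(lapply(chains_diff, coda::mcmc))
  coda::traceplot(samples[, "diff"])   # new variable, all chains overlaid
  plot(samples[, c("mu[1]", "diff")])  # trace and density for two variables
}
```

With the real rjags output, the same helper could be applied to as.matrix() of each element of codaSamples, carrying over the start and thin values reported by coda::mcpar() when rebuilding each chain with coda::mcmc().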
I want to estimate Realized GARCH (1,1) model. In my dataset I have the following time series:
ret <- replicate(1, rnorm(100))
RV <- replicate(1, rnorm(100))
date <- c(1:100)
I do the following:
install.packages("rugarch")
library(rugarch)
attspec <- ugarchspec(mean.model = list(armaOrder = c(0, 0), include.mean = FALSE), variance.model = list(model = 'realGARCH', garchOrder = c(1, 1)))
fit <- ugarchfit(spec=attspec, data=ret, solver = 'hybrid', realizedVol = RV[, 1])
After the last line I get an error: realizedVol must be an xts object
I tried to convert my RV matrix into xts object using the examples given in the description of the xts package:
require(xts)
rownames(RV) <- date
matrix_xts <- as.xts(RV,dateFormat='Date')
or
df_xts <- as.xts(as.data.frame(RV))
In both cases the error is character string is not in a standard unambiguous format
So, what should I do to create an xts object in a suitable format for the realizedVol argument?
Both ret and RV should be xts objects. They can be initialized in the following way:
times<-Sys.time()+date
ret_xts<-xts(ret,order.by = times)
RV_xts <- xts(RV,order.by = times)
and then you can successfully call:
fit <- ugarchfit(spec=attspec, data=ret_xts, solver = 'hybrid', realizedVol = RV_xts)
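The as.xts attempts in the question fail because the row names "1" to "100" are parsed as dates. Passing an explicit index through order.by avoids that. If a calendar index is preferred over timestamps derived from Sys.time(), a Date sequence can be used in the same way. In the sketch below the start date is an arbitrary assumption, and whether ugarchfit accepts a Date index as readily as a POSIXct one is assumed rather than verified.

```r
set.seed(1)
ret <- replicate(1, rnorm(100))
RV  <- replicate(1, rnorm(100))

# Hypothetical daily index; the start date is an arbitrary assumption.
days <- seq(as.Date("2020-01-01"), by = "day", length.out = nrow(ret))

if (requireNamespace("xts", quietly = TRUE)) {
  ret_xts <- xts::xts(ret, order.by = days)
  RV_xts  <- xts::xts(RV, order.by = days)
}
```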