Is it possible to adapt standard prediction interval code for dlm in R to other distributions?

Using the dlm package in R, I fit a dynamic linear model to a time series of 20 observations. I then use the dlmForecast function to predict future values (which I can validate against the actual data for that period).
I use the following code to create a prediction interval:
ciTheory <- (outer(sapply(fut1$Q, FUN=function(x) sqrt(diag(x))), qnorm(c(0.05,0.95))) +
as.vector(t(fut1$f)))
However, my data do not follow a normal distribution, and I wondered whether it would be possible to
replace the qnorm function with the quantile function of another distribution. I have tried qt, but am unable to apply qgamma.
Does anyone know how to go about this?
Below is a reproduced version of my code:
library(dlm)
data <- c(20.68502, 17.28549, 12.18363, 13.53479, 15.38779, 16.14770, 20.17536, 43.39321, 42.91027, 49.41402, 59.22262, 55.42043)
mod.build <- function(par) {
dlmModPoly(1, dV = exp(par[1]), dW = exp(par[2]))
}
# Returns most likely estimate of relevant values for parameters
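# a2, a1, v and w are not defined in the post; assumed here to be the data and the MLE variances
mle <- dlmMLE(data, rep(0,2), mod.build); #nileMLE$conv
if(mle$convergence==0) print("converged") else print("did not converge")
v <- exp(mle$par[1]); w <- exp(mle$par[2])
mod1 <- dlmModPoly(dV = v, dW = c(0, w))
mod1Filt <- dlmFilter(data, mod1)
fut1 <- dlmForecast(mod1Filt, n = 7)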
Cheers
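One possible way to read the interval code above is as "forecast mean plus forecast standard deviation times a quantile". The sketch below is only an illustration of that reading, not an answer from the thread: the vectors f and q are hypothetical stand-ins for the forecast means and variances in fut1$f and fut1$Q, and the gamma interval is obtained by moment matching (choosing shape and rate so that the gamma mean and variance equal f and q), which is a heuristic rather than something the dlm model itself implies.

```r
# Hypothetical forecast means and variances (stand-ins for fut1$f and fut1$Q)
f <- c(55, 56, 57)
q <- c(30, 45, 60)
probs <- c(0.05, 0.95)

# Normal interval, same construction as the code in the question
ci_norm <- f + outer(sqrt(q), qnorm(probs))

# Gamma interval by moment matching: mean = shape / rate, variance = shape / rate^2
shape <- f^2 / q
rate  <- f / q
ci_gamma <- cbind(qgamma(probs[1], shape = shape, rate = rate),
                  qgamma(probs[2], shape = shape, rate = rate))

ci_norm
ci_gamma
```

A Student t interval fits the first pattern directly, for example by replacing qnorm(probs) with qt(probs, df = 5), where the degrees of freedom are again an assumption. The gamma version only makes sense when the forecast means are positive.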

Related

Why do R and PROCESS give different results for a mediation model (one is significant, the other is not)?

As a newcomer who has just started with R, I am confused by the results of a mediation analysis.
My model is simple: IV 'T1Incivi', mediator 'T1Envied', DV 'T2PSRB'. I ran the same model in SPSS using PROCESS, where the indirect effect was not significant; in R, however, it is significant. Since I am not that familiar with R, could you please check whether there is anything wrong with my code, and tell me why the result is significant in R but not in SPSS? Thanks a bunch!
My code in R:
# X predicts M
apath <- lm(T1Envied~T1Incivi, data=dat)
summary(apath)
# X and M predict Y
bpath <- lm(T2PSRB~T1Envied+T1Incivi, data=dat)
summary(bpath)
# Bootstrapping for indirect effect
getindirect <- function(dataset,random){
d=dataset[random,]
apath <- lm(T1Envied~T1Incivi, data=d)
bpath <- lm(T2PSRB~T1Envied+T1Incivi, data=dat)
indirect <- apath$coefficients["T1Incivi"]*bpath$coefficients["T1Envied"]
return(indirect)
}
library(boot)
set.seed(6452234)
Ind1 <- boot(data=dat,
statistic=getindirect,
R=5000)
boot.ci(Ind1,
conf = .95,
type = "norm") # PSRB as outcome
In your function getindirect, all linear regressions should be fitted to the resampled data in d.
However, the line
bpath <- lm(T2PSRB~T1Envied+T1Incivi, data=dat)
refers to the full data set dat, which should not be used inside this function. As a result, only the a path is re-estimated in each bootstrap replicate while the b path stays fixed, and that alone can explain the inconsistent results.
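A minimal sketch of the corrected statistic, with both regressions fitted to the resampled rows. The data frame dat here is simulated purely for illustration (the effect sizes are made up), and the resampling loop is a plain base-R stand-in for what boot does.

```r
# Simulated stand-in for the real data (hypothetical effect sizes)
set.seed(123)
n <- 200
T1Incivi <- rnorm(n)
T1Envied <- 0.5 * T1Incivi + rnorm(n)
T2PSRB   <- 0.4 * T1Envied + 0.1 * T1Incivi + rnorm(n)
dat <- data.frame(T1Incivi, T1Envied, T2PSRB)

# Both paths are estimated on the same resampled data d
getindirect <- function(dataset, random) {
  d <- dataset[random, ]
  apath <- lm(T1Envied ~ T1Incivi, data = d)
  bpath <- lm(T2PSRB ~ T1Envied + T1Incivi, data = d)
  unname(coef(apath)["T1Incivi"] * coef(bpath)["T1Envied"])
}

# Manual bootstrap: resample row indices with replacement
boot_est <- replicate(500, getindirect(dat, sample(nrow(dat), replace = TRUE)))
quantile(boot_est, c(0.025, 0.975))  # percentile interval for the indirect effect
```

The same getindirect function can be passed to boot() as the statistic argument, exactly as in the question.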

How can I load a library in R to call it from Excel with bert-toolkit?

Bert-toolkit is a very nice package to call R functions from Excel. See: https://bert-toolkit.com/
I have used bert-toolkit to call a fitted neural net (avNNet fitted with caret) within a wrapper function in R from Excel VBA. This runs perfectly. This is the code to load the model within the wrapper function in bert-toolkit:
load("D:/my_model_avNNet.rda")
neuraln <- function(x1,x2,x3){
xx <- data.frame(x1,x2,x3)
z <- predict(my_model_avNNET, xx)
z
}
I then tried to do the same with a fitted GAM (fitted with the mgcv package), but without success. If I call the fitted GAM from Excel VBA, it gives error 2015. If I call it from a cell, it gives #VALUE! At the same time, the correct result of the calculation is shown in the BERT console!
This is the code to load the model in the wrapper function in bert-toolkit:
library(mgcv)
load("D:/gam_y_model.rda")
testfunction <- function(k1,k2){
z <- predict(gam_y, data.frame(x = k1, x2 = k2))
print (z)
}
The difference between the avNNet model (caret) and the GAM model (mgcv) is that the avNNet model does NOT need the caret library to be loaded to generate a prediction, while the GAM model DOES need the mgcv library to be loaded.
It does not seem to be sufficient to load the mgcv library in the script that loads the GAM model into a wrapper function in bert-toolkit, as I did in the code above. The correct model output is shown in the BERT console, but it is not returned to Excel.
I wonder how this is possible and how it can be solved. It seems to me that there may be two instances of R running in bert-toolkit.
How can I load the mgcv library in such a way that it can be used by the GAM model within the function called from Excel?
This is some example code to fit the GAM with mgcv and save the model (after running this code, the model can be loaded in bert-toolkit with the code above):
library(mgcv)
# construct some sample data:
x <- seq(0, pi * 2, 0.1)
x2 <- seq(0, pi * 20, 1)
sin_x <- sin(x)
tan_x2 <- tan(x2)
y <- sin_x + rnorm(n = length(x), mean = 0, sd = sd(sin_x / 2))
Sample_data <- data.frame(y,x,x2)
# fit gam:
gam_y <- gam(y ~ s(x) + s(x2), method = "REML")
# Make predictions with the fitted model:
x_new <- seq(0, max(x), length.out = 100)
x2_new <- seq(0, max(x2), length.out = 100)
y_pred <- predict(gam_y, data.frame(x = x_new, x2 = x2_new))
# save model, to load it later in bert-toolkit:
setwd("D:/")
save(gam_y, file = "gam_y_model.rda")
One of R's signature features is method dispatch: users call the same generic function, such as predict, but internally a different variant is run, such as predict.lm, predict.glm, or predict.gam, depending on the model object passed to it. Therefore, calling predict on an avNNet model is not the same as calling predict on a gam model. And just as the function changes with the input, so does the output.
According to the MSDN documentation on the Excel #VALUE! error, which VBA exposes as Error 2015:
#VALUE is Excel's way of saying, "There's something wrong with the way your formula is typed. Or, there's something wrong with the cells you are referencing."
Fundamentally, and without seeing the actual results, Excel may not be able to translate the result R returns from the gam model into an Excel range or VBA type, especially since, as you describe, R raises no error.
For example, per docs, the return value of the standard predict.lm is:
predict.lm produces a vector of predictions or a matrix of predictions...
However, per docs, the return value of predict.gam is a bit more nuanced:
If type=="lpmatrix" then a matrix is returned which will give a vector of linear predictor values (minus any offest) at the supplied covariate values, when applied to the model coefficient vector. Otherwise, if se.fit is TRUE then a 2 item list is returned with items (both arrays) fit and se.fit containing predictions and associated standard error estimates, otherwise an array of predictions is returned. The dimensions of the returned arrays depends on whether type is "terms" or not: if it is then the array is 2 dimensional with each term in the linear predictor separate, otherwise the array is 1 dimensional and contains the linear predictor/predicted values (or corresponding s.e.s). The linear predictor returned termwise will not include the offset or the intercept.
Altogether, consider adjusting parameters of your predict call to render a numeric vector for easy Excel interpretation and not a matrix/array or some other higher dimension R type that Excel cannot render:
testfunction <- function(k1,k2){
z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type = "response")
return(z)
}
testfunction <- function(k1,k2){
z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type = "lpmatrix")
return(z)
}
testfunction <- function(k1,k2){
# se.fit = TRUE makes predict.gam return a list with elements fit and se.fit
z <- mgcv::predict.gam(gam_y, data.frame(x = k1, x2 = k2), type = "link", se.fit = TRUE)
return(z$fit) # NOTICE fit ELEMENT USED
}
...
Further diagnostics:
Check the object returned by predict.gam with str(obj) and class(obj)/typeof(obj) to see its dimensions and underlying elements, and compare it with what predict returns for the caret model;
Check whether high decimal precision is an issue, given Excel's limit of 15 significant digits;
Check the amount of data returned (does it exceed Excel's sheet row limit of 2^20 or the cell limit of 32,767 characters?).
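As a rough illustration of the first diagnostic, the sketch below mimics in base R the 1-dimensional array that the quoted predict.gam documentation describes, and shows how it differs from a plain numeric vector. The object z and the helper to_plain_numeric are hypothetical, and whether the array shape is what actually trips up BERT is an assumption, not something confirmed in the thread.

```r
# Mimic a single prediction returned as a named 1-d array (hypothetical value)
z <- array(0.42, dim = 1L, dimnames = list("1"))

str(z)         # shows the dim and dimnames attributes
class(z)       # "array", not "numeric"
is.vector(z)   # FALSE, because of the dim attribute

# Hypothetical helper: strip dim/dimnames so only a bare numeric vector is returned
to_plain_numeric <- function(pred) as.numeric(pred)

out <- to_plain_numeric(z)
str(out)
is.vector(out) # TRUE
```

In the wrapper function, the equivalent step would be returning as.numeric(z) instead of z or print(z).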

arima model for multiple seasonalities in R

I'm learning to create a forecasting model for a time series that has multiple seasonalities. The dataset consists of hourly data points, and I wish to include daily as well as weekly seasonality in my ARIMA model. The following is a subset of the dataset:
data= c(4,4,1,2,6,21,105,257,291,172,72,10,35,42,77,72,133,192,122,59,29,25,24,5,7,3,3,0,7,15,91,230,284,147,67,53,54,55,63,73,114,154,137,57,27,31,25,11,4,4,4,2,7,18,68,218,251,131,71,43,55,62,63,80,120,144,107,42,27,11,10,16,8,10,7,1,4,3,12,17,58,59,68,76,91,95,89,115,107,107,41,40,25,18,14,15,6,12,2,4,1,6,9,14,43,67,67,94,100,129,126,122,132,118,68,26,19,12,9,5,4,2,5,1,3,16,89,233,304,174,53,55,53,52,59,92,117,214,139,73,37,28,15,11,8,1,2,5,4,22,103,258,317,163,58,29,37,46,54,62,95,197,152,58,32,30,17,9,8,1,3,1,3,16,109,245,302,156,53,34,47,46,54,65,102,155,116,51,30,24,17,10,7,4,8,0,11,0,2,225,282,141,4,87,44,60,52,74,135,157,113,57,44,26,29,17,8,7,4,4,2,10,57,125,182,100,33,27,41,39,35,50,69,92,66,30,11,10,11,9,6,5,10,4,1,7,9,17,24,21,29,28,48,38,30,21,26,25,35,10,9,4,4,4,3,5,4,4,4,3,5,10,16,28,47,63,40,49,28,22,18,27,18,10,5,8,7,3,2,2,4,1,4,19,59,167,235,130,57,45,46,42,40,49,64,96,54,27,17,18,15,7,6,2,3,1,2,21,88,187,253,130,77,47,49,48,53,77,109,147,109,45,41,35,16,13)
The code I'm trying to use is following:
tsdata = ts (data, frequency = 24)
aicvalstemp = NULL
aicvals= NULL
for (i in 1:5) {
for (j in 1:5) {
xreg1 = fourier(tsdata,i,24)
xreg2 = fourier(tsdata,j,168)
xregs = cbind(xreg1,xreg2)
armodel = auto.arima(bike_TS_west, xreg = xregs)
aicvalstemp = cbind(i,j,armodel$aic)
aicvals = rbind(aicvals,aicvalstemp)
}
}
The cbind call in the above code fails because the numbers of rows in xreg1 and xreg2 are different. I even tried passing 1:length(data) to the fourier function, but that also gave me an error. If someone can correct the mistakes in the above code to produce a forecast of the next 24 hours using an ARIMA model with the minimum AIC value, it would be really helpful. Also, if you could include data splitting in your code by creating training and testing data sets, it would be totally awesome. Thanks for your help.
I don't understand the desire to fit a weekly "season" to these data, as there is no evidence for one in the data subset you provided. Also, you should really log-transform the data because, as is, they do not reflect a Gaussian process.
So, here's how you could fit models with some form of hourly signal.
## load the forecast package for auto.arima(), fourier() and forecast()
library(forecast)
## the data are not normal, so log transform to meet assumption of Gaussian errors
## (the series contains zeros, so 1 is added before taking logs; this offset is an assumption)
ln_dat <- log(tsdata + 1)
## number of hours to forecast
hrs_out <- 24
## max number of Fourier terms
max_F <- 5
## empty list for model fits
mod_res <- vector("list", max_F)
## fit models with increasing Fourier terms
for (i in 1:max_F) {
xreg <- fourier(ln_dat, K = i)
mod_res[[i]] <- auto.arima(ln_dat, xreg = xreg)
}
## table of AIC results
aic_tbl <- data.frame(F=seq(max_F), AIC=sapply(mod_res, AIC))
## number of Fourier terms in best model
F_best <- which.min(aic_tbl$AIC)
## forecast from best model (on the log scale)
fore <- forecast(mod_res[[F_best]], xreg=fourier(ln_dat, K = F_best, h = hrs_out))
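For readers who still want to see the mechanics of combining two seasonal periods, here is a base-R sketch under stated assumptions. The helper fourier_terms is hypothetical (it is not the forecast package's fourier function), the series y is simulated, and the AR(1) order and numbers of harmonics are arbitrary. The point is only that both sets of regressors are built from the same time index, so they have the same number of rows and cbind works.

```r
# Hypothetical helper: sine/cosine pairs for harmonics 1..K of one seasonal period
fourier_terms <- function(t, period, K) {
  out <- do.call(cbind, lapply(seq_len(K), function(k) {
    cbind(sin(2 * pi * k * t / period), cos(2 * pi * k * t / period))
  }))
  colnames(out) <- paste0(rep(c("S", "C"), times = K),
                          rep(seq_len(K), each = 2), "_", period)
  out
}

# Simulated hourly series covering two weeks (not the data from the question)
set.seed(1)
n <- 336
t_idx <- seq_len(n)
y <- 5 + 2 * sin(2 * pi * t_idx / 24) + sin(2 * pi * t_idx / 168) + rnorm(n, sd = 0.3)

# Daily (24) and weekly (168) terms share the same rows
xregs <- cbind(fourier_terms(t_idx, 24, 2), fourier_terms(t_idx, 168, 1))

fit <- arima(y, order = c(1, 0, 0), xreg = xregs)

# Regressors for the next 24 hours continue the same time index
t_new <- n + seq_len(24)
xregs_new <- cbind(fourier_terms(t_new, 24, 2), fourier_terms(t_new, 168, 1))
fc <- predict(fit, n.ahead = 24, newxreg = xregs_new)
```

A train/test split follows the same pattern: fit on the first part of the index and build the test-period regressors from the remaining index values.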

Weighted Portmanteau Test for Fitted GARCH process

I have fitted a GARCH process to a time series and analyzed the ACF of the squared and absolute residuals to check the model's goodness of fit. But I also want to do a formal test, and after searching the internet, the Weighted Portmanteau Test (originally by Li and Mak) seems to be the one.
It's from the WeightedPortTest package and is one of the few (perhaps the only one?) that properly tests GARCH residuals.
While going through the instructions in various documents, I can't wrap my head around what the "h.t" argument wants. The R help says that I need to supply "a numeric vector of the conditional variances". This may be simple to an experienced user, but I'm struggling to understand it. What do I need to do, and, preferably, how would I code it in R?
Thankful for any kind of help.
Taken directly from the documentation:
h.t: a numeric vector of the conditional variances
A little toy example using the fGarch package follows:
library(fGarch)
library(WeightedPortTest)
spec <- garchSpec(model = list(alpha = 0.6, beta = 0))
simGarch11 <- garchSim(spec, n = 300)
fit <- garchFit(formula = ~ garch(1, 0), data = simGarch11)
Weighted.LM.test(fit@residuals, fit@h.t, lag = 10)
And using garch() from the tseries package:
library(tseries)
fit2 <- garch(as.numeric(simGarch11), order = c(0, 1))
summary(fit2)
# comparison of fitted values:
tail(fit2$fitted.values[,1]^2)
tail(fit@h.t)
# comparison of residuals after unstandardizing:
unstd <- fit2$residuals*fit2$fitted.values[,1]
tail(unstd)
tail(fit@residuals)
Weighted.LM.test(unstd, fit2$fitted.values[,1]^2, lag = 10)
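To make "conditional variances" concrete, the following base-R sketch simulates an ARCH(1) process and builds the conditional variance series by hand, using the recursion h[t] = omega + alpha * eps[t-1]^2. The parameter values are assumptions chosen for illustration. In a fitted model, the same recursion evaluated at the estimated parameters gives the vector that the h.t argument expects.

```r
set.seed(42)
n <- 300
omega <- 0.1   # assumed constant term
alpha <- 0.6   # assumed ARCH(1) coefficient

eps <- numeric(n)  # the (unstandardized) residual series
h   <- numeric(n)  # conditional variances, one per time point

h[1]   <- omega / (1 - alpha)        # start at the unconditional variance
eps[1] <- sqrt(h[1]) * rnorm(1)
for (t in 2:n) {
  h[t]   <- omega + alpha * eps[t - 1]^2  # variance given the previous shock
  eps[t] <- sqrt(h[t]) * rnorm(1)
}

# Standardized residuals: residuals divided by the conditional standard deviation
std_resid <- eps / sqrt(h)
```

Here eps plays the role of the residual vector and h the role of h.t in the test call shown above.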

How to extract info from package in R and use in function?

I apologize for the vague question title. What I want to do is run a regression in R using geeglm from the geepack R package, then use information from the fitted model to calculate a quasi-likelihood information criterion (QIC; Pan 2001). I can do this fairly easily for single models, but I would like to write a general function that can do this for a variety of different types of models. I guess my real question is whether there is a better alternative to a long series of nested ifelse statements.
Here's my current code:
library(geepack)
data(dietox) #data from the geepack package
# Run gee regression
dietox$Cu <- as.factor(dietox$Cu)
mf <- formula(Weight ~ Cu * (Time + I(Time^2) + I(Time^3)))
gee1 <- geeglm(mf, data = dietox, id = Pig, family = gaussian, corstr = "ar1")
Then I can run a function to calculate the quasilikelihood:
QlogLik.normal <- function(model.R) {
library(MASS)
mu.R <- model.R$fitted.values
y <- model.R$y
# Quasi Likelihood for Normal
quasi.R <- sum(((y - mu.R)^2)/-2)
quasi.R
}
However, I would like to write a function that is more general because the quasilikelihood function is different for every distribution. The above function would work for gee1 because it had a gaussian (normal) distribution. If I wanted to generalize it for a variety of distributions I could use a series of nested ifelse statements (below), but I don't know if this is the best way to do this. Does anyone have other options or a better solution? This just doesn't seem very elegant to say the least (clearly I don't have much programming or R experience).
QlogLik <- function(model.R) {
library(MASS)
mu.R <- model.R$fitted.values
y <- model.R$y
ifelse(model.R$modelInfo$variance == "poisson",
# Quasi Likelihood for Poisson
quasi.R <- sum((y*log(mu.R)) - mu.R),
ifelse(model.R$modelInfo$variance == "gaussian",
# Quasi Likelihood for Normal
quasi.R <- sum(((y - mu.R)^2)/-2),
ifelse(model.R$modelInfo$variance == "binomial",
# Quasilikelihood for Binomial
quasi.R <- sum(y*log(mu.R/(1 - mu.R)) + log(1 - mu.R)),
quasi.R <- "Error: distribution not recognized")))
quasi.R
}
In this example, I used the model output from geeglm to extract the type of distribution used to model the variance
model.R$modelInfo$variance
but there may be other ways to determine what distribution was used in the geeglm model. Any help would be appreciated.
You should be able to rewrite your function like this:
QlogLik <- function(model.R) {
library(MASS)
mu.R <- model.R$fitted.values
y <- model.R$y
type <- family(model.R)$family
switch(type,
poisson = sum((y*log(mu.R)) - mu.R),
gaussian = sum(((y - mu.R)^2)/-2),
binomial = sum(y*log(mu.R/(1 - mu.R)) + log(1 - mu.R)),
stop("Error: distribution not recognized"))
}
As @baptise points out, switch is useful in these cases. You can use family(model.R)$family to automatically detect which family type should be used with switch.
Also, if your commands for what to do in different cases run beyond one line, you can wrap the lines with curly brackets ({ do something here }) instead.
switch(type,
type1 = { something <- do(this)
thisis(something) },
type2 = do(that))
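The same dispatch pattern can be tried without geepack. The sketch below is an illustration only: it uses a base-R glm fit on simulated data in place of a geeglm object, and the function name quasi_loglik is hypothetical. It relies on family(fit)$family returning the family name as a string, as in the rewritten function above.

```r
# Hypothetical helper: quasi-likelihood chosen by family name via switch()
quasi_loglik <- function(y, mu, type) {
  switch(type,
         poisson  = sum(y * log(mu) - mu),
         gaussian = sum((y - mu)^2) / -2,
         binomial = sum(y * log(mu / (1 - mu)) + log(1 - mu)),
         stop("distribution not recognized: ", type))  # unnamed last argument is the default
}

# Simulated Poisson data and a base-R glm fit (stand-in for a geeglm model)
set.seed(1)
x <- runif(50)
y <- rpois(50, lambda = exp(0.5 + x))
fit <- glm(y ~ x, family = poisson)

q_pois <- quasi_loglik(fit$y, fitted(fit), family(fit)$family)
q_pois
```

An unrecognized family name falls through to the final unnamed argument and raises an error instead of silently returning a string.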
I hope this helps!
You may also use model.R$family$family, which gives the type of distribution used to model the variance, but I don't know of a way to eliminate those ifelse statements. The quasi.R in your code differs among distributions, so you have to define each of them separately.
BTW, it is a good question, and thanks for posting it: I have had similar situations in the past and hope to get some advice on how to write code more efficiently.
