I'm currently estimating a VAR model and then computing generalized impulse response functions. To obtain standard errors for those, I'm supposed to do some bootstrapping first.
This procedure starts with "estimating the parameters of the VAR model and extracting the estimation residuals, denoted Ût."
Now, I'm estimating my VAR model with the vars package as follows:
varendo <- data.frame(value_ts, value2_ts, price_ts, price2_ts)
library(vars)
fitvar<- VAR(varendo, type = c("both"), season = christmas, lag.max = 12,ic = c("AIC"))
summary(fitvar)
The model contains 5 variables with 104 observations, a trend, constant and a dummy for the Christmas period and outputs a result with 5 lags.
Now when I extract its residuals with residuals(fitvar), I get 99 values per variable.
I'm supposed to use these residuals to generate bootstrap residuals (randomly drawing with replacement from the obtained ones) and use these with the estimated equations to generate new, bootstrapped time series to re-estimate the VAR and IRFs (and in the end obtain SEs for my estimations).
Since I'm supposed to recursively compute the new time series as follows:
shouldn't I get a list of 104 residuals per variable instead of 99? I'm a bit confused with this whole generating process.
Any help is more than appreciated.
In an autoregressive (AR) model, variables are forecast using linear combinations of past values of the variable. Since you have set lag.max = 12, you are allowing VAR to select a model that uses, at most, 12 lagged values as predictors.
Since your model uses 5 lags, VAR cannot fit values to the first 5 observations of your variables. This is because those first 5 observations are being used to fit a value to the 6th observation. Therefore, the number of residuals will be the number of observations minus the AR model order.
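For the bootstrap itself, this means the first 5 observations serve as initial values and only the remaining 99 time points need a resampled residual, so the regenerated series still has 104 rows. The following is a minimal base-R sketch of one bootstrap replication under stated assumptions: the data are simulated placeholders, the VAR is fitted by plain OLS with a constant only (no trend or seasonal dummy), and the names `U_star` and `y_star` are hypothetical. It is not the vars package's own implementation.

```r
set.seed(1)
n <- 104; p <- 5; k <- 2
# Hypothetical data: two white-noise series, for illustration only
y <- matrix(rnorm(n * k), n, k, dimnames = list(NULL, c("y1", "y2")))

# embed() puts y_t in the first k columns and lags 1..p in the following ones
E <- embed(y, p + 1)
Y <- E[, 1:k]                               # (n - p) x k responses
X <- cbind(1, E[, -(1:k)])                  # constant + p lags of every variable
B <- solve(crossprod(X), crossprod(X, Y))   # OLS coefficients, (1 + k*p) x k
U <- Y - X %*% B                            # residuals: one row per fitted time point
nrow(U)                                     # n - p = 99

# Draw whole residual rows with replacement (keeps the cross-equation correlation)
U_star <- U[sample(nrow(U), replace = TRUE), , drop = FALSE]

# Rebuild the series recursively; the first p rows stay as initial values
y_star <- y
for (t in (p + 1):n) {
  # Stack y*_{t-1}, ..., y*_{t-p} in the same column order used by embed()
  lags <- as.vector(t(y_star[(t - 1):(t - p), , drop = FALSE]))
  y_star[t, ] <- c(1, lags) %*% B + U_star[t - p, ]
}
dim(y_star)                                 # n rows again
```

Re-estimating the VAR and the impulse responses on `y_star`, and repeating this many times, would give the bootstrap distribution from which the standard errors are computed.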
Can you please help with this question in R? I need to get more than one predictor:
Fit multiple linear regression without an intercept with the function lm() to train data
using variable (y.train) as a goal variable and variables (X.mat.train) as
predictors. Look at the vector of estimated coefficients of the model and compare it with
the vector of ’true’ values beta.vec graphically
(Tip: build a plot for the differences of the absolute values of estimated and true values).
I have already tried it with the code posted at the end, but it gives me only one predictor, whereas in this example I need to get more than one.
I think the problem is in the first line, but I couldn't find a way to fix it.
I can't post the data set here because it is large, but I have a variable that stores 190 observations from a vector (y.train) and another that stores 190 observations from a matrix (X.mat.train). This should give more than one predictor, but for me it gives only one.
simple.fit = lm(y.train~0+ X.mat.train) #goal var #intercept # predictor
summary(simple.fit)# Showing the linear regression output
plot(simple.fit)
abline(simple.fit)
n <- summary(simple.fit)$coefficients
estimated_coeff <- n[ , 1]
estimated_coeff
plot(estimated_coeff)
#Coefficients: X.mat.train 0.5018
v <- sum(beta.vec)
#0.5369
plot(beta.vec)
plot(beta.vec, simple.fit)
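For comparison, here is a minimal sketch with simulated data (the dimensions and the values in `beta.vec` are hypothetical, chosen only for illustration). When `X.mat.train` is a numeric matrix with several columns, `lm()` returns one coefficient per column, so a single coefficient usually indicates that the object passed in has only one column. The plot at the end follows the tip in the exercise.

```r
set.seed(1)
n <- 190; p <- 5
# Hypothetical training data: a 190 x 5 predictor matrix and known coefficients
X.mat.train <- matrix(rnorm(n * p), n, p)
beta.vec    <- c(0.5, -1, 2, 0, 1.5)
y.train     <- as.vector(X.mat.train %*% beta.vec + rnorm(n))

# "0 +" drops the intercept; a matrix on the right-hand side
# contributes one coefficient per column
fit <- lm(y.train ~ 0 + X.mat.train)
estimated_coeff <- coef(fit)
length(estimated_coeff)          # one estimate per column of X.mat.train

# Differences of the absolute values of estimated and true coefficients
plot(abs(estimated_coeff) - abs(beta.vec), type = "h",
     xlab = "Coefficient index", ylab = "|estimate| - |true|")
abline(h = 0, lty = 2)
```

Checking `dim(X.mat.train)` before fitting is a quick way to confirm how many predictors the model will receive.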
I've been trying to estimate a pVAR using GMM on R using the package panelvar. I'm estimating a dynamic panel VAR with two-step GMM using first-differences.
I have a balanced panel with 378 observations, with a group variable (id) and a time variable (year). I have 14 observations per group (the same for every group) and 27 groups in total. In total, I have 120 instruments. I'm a bit concerned about the results of the Hansen J-test and I'm looking for some explanations: I have a Hansen J-test statistic of 0 with a p-value of 1. To my understanding, this would mean that the model is correctly specified. But the fact that the p-value is so high (1.000) might mean that something deeper is going on.
In my estimation, I have 7 dependent variables, 2 exogenous variables, and I'm using 4 lagged instruments per dependent variable. Why is the p-value of the Hansen test very high?
Thanks in advance!
I estimate an ARIMA model on a training dataset using the auto.arima function in R. Afterwards I use the forecast function to make, say, 50 predictions and calculate accuracy measures such as RMSE and MAE.
If I use the forecast function, it uses only the observations in the training set, and then makes the predictions at each time unit t using the values predicted at time t-1. What I am trying to do, is to make 1 prediction at time, adding at each time t an observed value to the training set, without reestimating the ARIMA model. So instead of considering the predicted values at time t-1, I would consider the real values. So if ARIMA has been estimated on the training dataset of 100 observations, the first forecast will be done considering the training dataset of length 100, the second forecast will consider the training set of length 101, the third forecast will take the training set of length 102 and so on.
The auto.arima output contains the dataset "x", which is the training set I use to estimate the model, and the dataset "fitted", which contains the fitted values. It also has the element "nobs", which is the length of the dataset "x". I am trying to replace the "x" element of the auto.arima output with a new training dataset where the last observations are the true values, added one at a time. I also modify "nobs" so that it gives the length of the new "x". But I noticed that the one-step-ahead forecast always considers the old training set. For instance, I added one observed value at a time to the training set and made the one-step-ahead prediction 50 times, but all the predictions are equal to the first one. It is as if the forecast function ignores the fact that I replaced the "x" series inside the auto.arima output. I tried replacing the "fitted" values, with the same result.
Does someone know how exactly the function "forecast" determines the training set on which to base the predictions? What should I modify inside the auto.arima output at each time t to get the one-step-ahead predictions based on the real values at the previous times, instead of the estimated ones? Or is there a way to tell the "forecast" function to consider a different training dataset?
I don't want to refit ARIMA model on the new training dataset (using Arima function) and reestimate the residual variance, it takes literally forever...
Any suggestion would be helpful
Thank you in advance
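One possible approach, sketched here with base R only and under stated assumptions: keep every coefficient at its training-set estimate through the `fixed` argument of `stats::arima()`, so that nothing is re-optimised and only the filter is re-run on the extended series. The data and the ARIMA(1,0,0) order below are hypothetical placeholders for whatever auto.arima selected. The forecast package's `Arima()` also accepts a `model` argument intended for applying a previously fitted model to new data without re-estimating its parameters.

```r
set.seed(42)
# Hypothetical data: an AR(1) series of length 150; the first 100 points are the training set
y <- as.numeric(arima.sim(list(ar = 0.7), n = 150))
n_train <- 100
h <- length(y) - n_train

# Estimate once, on the training set only
fit <- arima(y[1:n_train], order = c(1, 0, 0))

one_step <- numeric(h)
for (i in seq_len(h)) {
  # Same order, all coefficients held at the training-set estimates:
  # no optimisation takes place, the model is only applied to the longer series
  upd <- arima(y[1:(n_train + i - 1)], order = c(1, 0, 0),
               fixed = coef(fit), transform.pars = FALSE)
  one_step[i] <- predict(upd, n.ahead = 1)$pred[1]
}

# Accuracy of the one-step-ahead forecasts on the hold-out observations
actual <- y[(n_train + 1):length(y)]
rmse <- sqrt(mean((actual - one_step)^2))
mae  <- mean(abs(actual - one_step))
```

Each forecast now conditions on the observed value at the previous time point, so the 50 predictions differ from one another even though the coefficients never change.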
everybody!
I have a response variable that counts successful days in a month and has a peculiar distribution (see above). About 50% of the values are zeros, and there is a heavy tail. Because of the overdispersion and the excess of zeros, I was advised to predict it with a Zero-Inflated Negative Binomial regression model.
However, no matter how significant a model I obtain, it reflects little of those distributional features (see below). For example, the peaks are always around 4, and no predictions fall beyond 20.
Is this usual in fitting overdispersed, heavy-tailed count data? Are there other ways to improve the fitting? Any suggestions would be appreciated. Thank you!
P. S.
I also tried logistic regression to predict zero/non-zero only. But none of the fitted models performs better than simply guessing zero for all cases.
I suppose you plotted a histogram of the fitted values. That only reflects the fitted means (possibly multiplied by a zero-component ratio or probability, depending on the model you use). It is not supposed to recreate the distribution of the data, because the spread of the data is captured by the dispersion parameter, not by the means.
We can use an example from the pscl package:
library(pscl)
data("bioChemists")
fit <- hurdle(art ~ ., data = bioChemists,dist="negbin",zero.dist="binomial")
par(mfrow=c(1,2))
hist(fit$y,main="Observed")
hist(fit$fitted.values,main="Fitted")
As mentioned before, in this hurdle model the fitted values are the predicted count-component means multiplied by the zero-hurdle ratio, that is, the ratio of probabilities for observing a non-zero count:
head(fit$fitted.values)
1 2 3 4 5 6
1.9642025 1.2887343 1.3033753 1.3995826 2.4560884 0.8783207
head(predict(fit,type="zero")*predict(fit,type="count"))
1 2 3 4 5 6
1.9642025 1.2887343 1.3033753 1.3995826 2.4560884 0.8783207
To simulate the data based on the fitted model, we extract out the parameters:
Theta=fit$theta
Means=predict(fit,type="count")
Zero_p = predict(fit,type="prob")[,1]
Define a function to simulate the counts:
simulateCounts = function(mu,theta,zero_p){
N = length(mu)
x = rnbinom(N,mu=mu,size=theta)
x[runif(x)<zero_p] = 0
x
}
Then run this simulation a number of times to get the full spectrum of values:
set.seed(100)
simulated = replicate(10,simulateCounts(Means,Theta,Zero_p))
simulated = unlist(simulated)
par(mfrow=c(1,2))
hist(bioChemists$art,main="Observed")
hist(simulated,main="simulated")
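The same point can be checked without any fitted model object. The sketch below uses base R and assumed parameter values (`mu`, `theta`, and `pi0` are hypothetical, standing in for estimates from a zero-inflated negative binomial fit). It compares the observed frequency of each count with the frequency implied by the model, which is a more informative check than a histogram of fitted means.

```r
set.seed(100)
n <- 2000
x <- runif(n, -1, 1)
mu    <- exp(1 + 0.5 * x)   # assumed count-component means
theta <- 1.2                # assumed negative binomial dispersion
pi0   <- 0.3                # assumed probability of a structural zero

# Zero-inflated negative binomial draws standing in for the observed data
y <- ifelse(runif(n) < pi0, 0, rnbinom(n, mu = mu, size = theta))

# What a histogram of fitted values would show: the conditional means only
fitted_mean <- (1 - pi0) * mu

# Model-implied probability of each count, averaged over observations
counts <- 0:10
expected <- sapply(counts, function(j)
  mean(pi0 * (j == 0) + (1 - pi0) * dnbinom(j, mu = mu, size = theta)))
observed <- sapply(counts, function(j) mean(y == j))

round(rbind(counts, observed, expected), 3)
range(fitted_mean)   # narrow: the means never reach the tail
range(y)             # wide: zeros and large counts both occur
```

The fitted means stay in a narrow band while the data range from zero into the tail, yet the observed and expected frequencies line up, which is what a well-fitting count model should show.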
I'm fitting an ARIMA model with an external regressor. Suppose I have n observations. The predict.Arima function from the forecast package only makes predictions from observation n + 1 onward.
I need to make a prediction for observation n (the last value of the series) while changing the value of the external regressor, i.e., I need to predict the value of observation n given a specific value of the external regressor.
library(forecast)
set.seed(123)
aux <- 1:24
covari <- aux + rnorm(24,0,2)
vari <- ts(aux * runif(24,0,3), start=c(2010,1), freq=12)
mod <- auto.arima(vari, xreg=covari)
predict(mod, newxreg=20)
This code fits a model and shows how to generate a prediction. I can control the number of periods ahead by setting the parameter n.ahead.
predict(mod, newxreg=runif(4,15,25), n.ahead=4)
This code will generate predictions for the next 4 values of the series.
What I need is an n.ahead=-1, i.e., a prediction for a value inside the series, but with a different external regressor.
If I'm using just one external regressor the task is not complicated: since the model is additive, I can add the coefficient of xreg multiplied by the difference between the value I want and the observed xreg value. However, it gets more complicated as the number of external regressors increases.
Is there any way to predict values that are not beyond the end of the series of an ARIMA model?
What do you mean by "predict"? With time series, that is an estimate of a future value conditional on the observed past values. So a "prediction" of an observed value is simply the observed value.
But perhaps you mean fitted value. That is, the one-step forecast of an observation conditional on all previous observations. In that case, you can get what you want using fitted(mod).
By the way, predict.Arima() is not part of the forecast package. The forecast package provides the forecast.Arima() function as a replacement. For example:
forecast(mod, xreg=20)
forecast(mod, xreg=runif(4,15,25), h=4)
Update:
As explained in the comment, the OP wants a "prediction" of a past observation assuming a different value of the regressor had been observed. There are several ways of interpreting that.
First, where the coefficients are updated to reflect the new information, and only past data are used. In that case, just re-fit the model and get the fitted values.
Second, where the coefficients are not updated, and only past data are used. There is no function for that, and I'm not sure why anyone would need to do it. But it can be done as follows:
fitted(mod) + mod$coef["covari"] * (newx - oldx)
Third, where the coefficients are not updated, and all data are used. Then we get
observed + mod$coef["covari"] * (newx - oldx)
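With several external regressors, the same adjustment becomes a sum over coefficients. The following is a minimal sketch using base `stats::arima()` rather than the forecast package, with simulated data; the names `X`, `new_x`, and `mod_multi`, the AR(1) order, and all numeric values are hypothetical. Fitted values are obtained as observed minus residuals.

```r
set.seed(123)
n <- 24
# Hypothetical regressors and response, for illustration only
X <- cbind(x1 = 1:n + rnorm(n, 0, 2), x2 = runif(n))
y <- ts(2 + 0.5 * X[, "x1"] - 1.5 * X[, "x2"] + rnorm(n),
        start = c(2010, 1), frequency = 12)

# Regression with AR(1) errors; coefficients are named after the xreg columns
mod_multi <- arima(y, order = c(1, 0, 0), xreg = X)
beta <- coef(mod_multi)[colnames(X)]

old_x <- X[n, ]                  # regressor values actually observed at time n
new_x <- c(x1 = 20, x2 = 0.5)    # hypothetical counterfactual values
shift <- sum(beta * (new_x - old_x))

fitted_n <- as.numeric(y - residuals(mod_multi))[n]

fitted_n + shift            # second interpretation: coefficients fixed, past data only
as.numeric(y)[n] + shift    # third interpretation: coefficients fixed, all data
```

Because the regressors enter additively, the number of regressors only changes the length of the vectors in `sum(beta * (new_x - old_x))`.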