I am using the forecast package in R and created an MA(1) model with the Arima function. I plotted the time series itself (the $x component of ma_model), the fitted values (the $fitted component) and the residuals (the $residuals component). Strangely, the time series looks identical to the fitted values although the residuals are non-zero. Here is the code that I used:
library(forecast)
ma_model <- Arima(ts(generationData$Price[1:200]), order = c(0, 1, 0))
plot(ma_model$fitted, main = "Fitted")
plot(ma_model$x, main = "X")
plot(ma_model$residuals, main = "Residuals")
Here is the result:
Basically the fitted values can't be identical to the real time series when the residuals are non-zero. Can anyone explain this to me? I'd appreciate every comment.
Update: I tried order=c(0,0,20), which gives me an MA(20) or AR(20) model (I am not sure which parameter stands for MA and which for AR). Now the fitted curve and the original time series look quite similar (but not exactly equal). Is this possible and usual? I'd appreciate every further comment.
I am not sure about your output, but from the code it seems that you only differenced the series: order=c(0,1,0) contains no MA term.
It should be order=c(0,0,1) instead of order=c(0,1,0) to fit an MA(1) model. In order=c(p,d,q), p is the AR order, d is the degree of differencing, and q is the MA order.
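To illustrate the difference, here is a minimal sketch using a simulated series as a stand-in for generationData$Price (which isn't shown in the question):
library(forecast)
set.seed(42)
# Simulated MA(1)-like series; a hypothetical stand-in for the price data
y <- ts(arima.sim(model = list(ma = 0.6), n = 200))
rw_fit <- Arima(y, order = c(0, 1, 0))  # differencing only, no AR/MA terms
ma_fit <- Arima(y, order = c(0, 0, 1))  # an actual MA(1) model
plot(y, main = "Series vs. fitted values")
lines(fitted(rw_fit), col = "blue")  # tracks the series almost exactly
lines(fitted(ma_fit), col = "red")
With order=c(0,1,0) and no constant, the fitted value at time t is essentially the previous observation, so the fitted curve tracks the series almost exactly even though the residuals (the one-step changes) are non-zero. That explains the plots in the question.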
I'm new to R but have some experience with ARIMA models. Now I wanted to learn a bit about neural networks for forecasting.
I tried to repeat the procedure from Rob's post. It worked great for the data set he used. It also worked great for imaginary datasets I created.
But then I tried it on real-life data (seven years of monthly revenue data) and the resulting forecasts are strangely flat. My code:
read.csv("Revenue.csv",header=TRUE)
x <-read.csv("Revenue.csv",header=TRUE)
y<-ts(x,freq=12,start=c(2011,1))
(fit<-nnetar(y))
fcast <- forecast(fit, PI=TRUE, h=20, bootstrap=TRUE)
autoplot(fcast)
The result is an almost straight line (attached as picture 1). That strikes me as odd, because the trend has been positive so far: revenue grew by more than 100% every year. Still, nnetar predicts that the revenue will stabilise. How is that possible?
As a comparison I used auto.arima on the same data set (picture 2). It shows a clear upward trend.
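For reference, the comparison was along these lines (a sketch, assuming the same y as above):
fit2 <- auto.arima(y)
autoplot(forecast(fit2, h = 20))  # shows a clear upward trend for this data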
One suggestion, even if it's hard to help without a data sample:
It appears that nnetar is not capturing the trend in your data very well.
You could try supplying a trend as an external regressor (the xreg argument).
For example, for a deterministic trend:
# start and end are placeholders for the first and last time indices of y
Trend <- seq(from = start, to = end, by = 1)
(fit <- nnetar(y, xreg = Trend))
(f <- forecast(fit, h = h, xreg = seq(from = end + 1, to = end + h, by = 1)))
An alternative would be to use more lags or seasonal lags (the p and P arguments of nnetar), as in the sketch below.
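A minimal sketch of both options, assuming a monthly series y (the lag choices here are illustrative, not tuned):
library(forecast)
# Option 1: deterministic trend as an external regressor
Trend <- seq_along(y)
fit_trend <- nnetar(y, xreg = Trend)
f_trend <- forecast(fit_trend, h = 20, xreg = seq(length(y) + 1, length(y) + 20))
# Option 2: more non-seasonal (p) and seasonal (P) lags
fit_lags <- nnetar(y, p = 12, P = 2)
f_lags <- forecast(fit_lags, h = 20)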
I am working through the "Forecasting Using R" DataCamp course. I have completed the entire thing except for the last part of one particular exercise (link here, if you have an account), where I'm totally lost, and the error hint it gives me isn't helping either. I'll put the parts of the task below, each with the code I'm using to solve it:
Produce time plots of only the daily demand and maximum temperatures with facetting.
autoplot(elec[, c("Demand", "Temperature")], facets = TRUE)
Index elec accordingly to set up the matrix of regressors to include MaxTemp for the maximum temperatures, MaxTempSq which represents the squared value of the maximum temperature, and Workday, in that order.
xreg <- cbind(MaxTemp = elec[, "Temperature"],
              MaxTempSq = elec[, "Temperature"]^2,
              Workday = elec[, "Workday"])
Fit a dynamic regression model of the demand column with ARIMA errors and call this fit.
fit <- auto.arima(elec[,"Demand"], xreg = xreg)
If the next day is a working day (indicator is 1) with maximum temperature forecast to be 20°C, what is the forecast demand? Fill out the appropriate values in cbind() for the xreg argument in forecast().
This is where I'm stuck. The sample code they supply looks like this:
forecast(___, xreg = cbind(___, ___, ___))
I have managed to work out that the first blank is fit, so I'm trying code that looks like this:
forecast(fit, xreg = cbind(elec[,"Workday"]==1, elec[, "Temperature"]==20, elec[,"Demand"]))
But that is giving me the error hint "Make sure to forecast the next day using the inputs given in the instructions." Which... doesn't tell me anything useful. Any ideas what I should be doing instead?
When you forecast ahead, you use new data that was not part of elec (the data set you used to fit your model). The new data is given in the question (maximum temperature 20°C and workday indicator 1). Therefore, you do not need elec in your forecast call; just pass the new values:
forecast(fit, xreg = cbind(20, 20^2, 1))
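Note that the values must appear in the same order as the columns of the xreg matrix used when fitting (MaxTemp, MaxTempSq, Workday). Naming them makes that explicit; a sketch:
fc <- forecast(fit, xreg = cbind(MaxTemp = 20, MaxTempSq = 20^2, Workday = 1))
fc$mean  # the forecast demand for the next day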
I am working on a daily time series problem (unequally spaced observations) and want to detect the order of seasonality (and/or the frequency of the data, if necessary).
I can see the seasonality in the time series plot and the ACF plot; the seasonal features are obvious. My code looks like the following:
plot(mydates, mydata, type="l")
Acf(mydata)
I tried to fit the data using auto.arima, but it returns a non-seasonal fit.
auto.arima(mydata)
# Series: mydata
# ARIMA(1,0,1) with zero mean
# Coefficients: ....
I also tried the nsdiffs function, and it doesn't work either.
nsdiffs(mydata)
# Error in nsdiffs(tslist[[1]]) : Non seasonal data
nsdiffs(ts(mydata, frequency = 90))
# returns 0
I technically cannot use the ts function because I don't know the frequency of my data (which is exactly what I intend to find out). But I tested it anyway with some random guesses at the frequency, and it returns 0 every time.
Could anyone help me with this?
Thank you!
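A minimal sketch of one possible starting point, assuming the observations can first be regularised onto an evenly spaced daily grid (findfrequency from the forecast package estimates the dominant period via spectral analysis):
library(forecast)
# mydata is assumed here to be regularly spaced (e.g. after filling/aggregating)
freq <- findfrequency(mydata)  # spectral estimate of the dominant period
y <- ts(mydata, frequency = freq)
nsdiffs(y)  # retry the seasonal-differencing test with the estimated frequency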
I'm doing some survival analysis in R, and looking to tidy up/simplify my code.
At the moment I'm doing several steps in my data analysis:
make a Surv object (time variable with indication as to whether each observation was censored);
fit this Surv object against a categorical predictor, for plotting and for estimating median survival times; and
calculate a log-rank test to ask whether there is evidence of "significant" differences in survival between the groups.
As an example, here is a mock-up using the lung dataset from R's survival package. The following code is much simplified in terms of the predictor set, but close enough to what I want to do (which is why I want to simplify the code: so I don't make inconsistent calls across models).
library(survival)
# Step 1: Make a survival object with time-to-event and censoring indicator.
# Following works with defaults as status = 2 = dead in this dataset.
# Create survival object
lung.Surv <- with(lung, Surv(time=time, event=status))
# Step 2: Fit survival curves to object based on patient sex, plot this.
lung.survfit <- survfit(lung.Surv ~ lung$sex)
print(lung.survfit)
plot(lung.survfit)
# Step 3: Calculate log-rank test for difference in survival objects
lung.survdiff <- survdiff(lung.Surv ~ lung$sex)
print(lung.survdiff)
Now this is all fine and dandy, and I can live with this but would like to do better.
So my question is about step 3. What I would like is to reuse the formula stored in the lung.survfit object when calculating the difference in survival curves, i.e. in the call to survdiff. This is where my limited programming skills hit a wall. Below is my current attempt; I'd appreciate any help you can give! Once I get this sorted out, I should be able to wrap the solution up in a function.
lung.survdiff <- survdiff(parse(text=(lung.survfit$call$formula)))
## Which returns following:
# Error in survdiff(parse(text = (lung.survfit$call$formula))) :
# The 'formula' argument is not a formula
As I commented above, I actually sorted out the answer to this shortly after having written this question.
So step 3 above could be replaced by:
lung.survdiff <- survdiff(formula(lung.survfit$call$formula))
But as Ben Barnes points out in the comment to the question, the formula from the survfit object can be more directly extracted with
lung.survdiff <- survdiff(formula(lung.survfit))
Which is exactly what I wanted and hoped would be available -- thanks Ben!
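For the wrap-up mentioned above, here is a minimal sketch of such a function (the name and return structure are just illustrative):
library(survival)
# Fit survival curves, plot them, and run a log-rank test from a single formula
surv_compare <- function(formula, data) {
  fit  <- survfit(formula, data = data)
  test <- survdiff(formula, data = data)
  plot(fit)
  list(fit = fit, logrank = test)
}
res <- surv_compare(Surv(time, status) ~ sex, data = lung)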
I've fitted a VECM model in R and converted it to its VAR representation. I would like to use this model to predict the future value of a response variable under different scenarios for the explanatory variables.
Here is the code for the model:
library(urca)
library(vars)
input <- read.csv("data.csv")
# Avoid naming the object "ts", which masks stats::ts
tsdata <- ts(input[16:52, ], start = c(2000, 1), frequency = 4)
# Name the columns so they can be referenced later (e.g. in predict()/irf())
dat1 <- cbind(dx = tsdata[, "dx"], u = tsdata[, "u"], cci = tsdata[, "cci"],
              bci = tsdata[, "bci"], cpi = tsdata[, "cpi"], gdp = tsdata[, "gdp"])
args("ca.jo")  # inspect the available arguments
vecm <- ca.jo(dat1, type = "trace", K = 2, season = NULL, spec = "longrun", dumvar = NULL)
vecm.var <- vec2var(vecm, r = 2)
Now what I would like to do is predict "dx" into the future while varying the other variables. I am not sure whether something like "predict dx if u=30, cpi=15, bci=50, gdp=..." in the next period would work. What I have in mind is along the lines of: increase "u" by 15% in the next period (which would obviously affect all the other variables as well, including "dx") and trace the impact into the future.
Also, I am not sure if the "vec2var" step is necessary, so please ignore it if you think it is redundant.
Thanks
Karl
This subject is covered very nicely in Chapters 4 and 8 of Bernhard Pfaff's book, "Analysis of Integrated and Cointegrated Time Series with R", for which the vars and urca packages were written.
The vec2var step is necessary if you want to use the predict functionality that's available.
A more complete answer was provided on the R-Sig-Finance list. See also this related thread.
Here you go: ??forecast points to vars::predict, the predict method for objects of class varest and vec2var, which looks like precisely what you want. Increasing "u" sounds like impulse response analysis, so look up irf() in the vars package!
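A minimal sketch of both pieces, continuing from the vecm.var object above (the horizons are illustrative, and the names "dx" and "u" assume dat1 was built with named columns as shown earlier):
library(vars)
# Unconditional forecasts from the VAR representation of the VECM
pred <- predict(vecm.var, n.ahead = 4)
plot(pred)
# Impulse response analysis: how dx reacts to a shock in u
ir <- irf(vecm.var, impulse = "u", response = "dx", n.ahead = 8, boot = TRUE)
plot(ir)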