I have data from multiple countries in different years and around 65 variables.
I've thought of using PLS to find the predictors that best explain a variable (Purchasing.power.parity) and then use auto.arima to forecast the next year's results.
I'm learning about PLS and PCA so I've tried to understand all the math behind it and this is what I came up with so far:
library(pls)
library(forecast)
modelo_pls = plsr(Purchasing.power.parity~. , ncomp=4, data=imputed, validation="CV", scale=T)
summary(modelo_pls)
plot(RMSEP(modelo_pls))
tsdata<-ts(imputed$Purchasing.power.parity)
modelo<-auto.arima(tsdata,xreg =modelo_pls$scores[,1:4])
summary(modelo)
pronostico<- forecast(modelo,xreg =modelo_pls$Yscores[,1:4])
plot(pronostico)
The issue is that auto.arima returns a (0,0,0) model even though the 4 PLS components explain nearly 98% of the variance.
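Two things may be worth checking here. First, with xreg the ARIMA orders describe the regression errors, so ARIMA(0,0,0) errors are plausible when the regressors already explain almost everything. Second, the forecast call passes Yscores, while the model was fitted on the X scores; the future regressors should be the same score columns computed for new rows. Below is a minimal sketch on synthetic data. The data, the train/test split, and the column names pls1 and pls2 are assumptions for illustration, not taken from the question.

```r
library(pls)
library(forecast)

set.seed(1)
n <- 50
# Synthetic predictors and response (hypothetical stand-ins for the real data)
X <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("x", 1:6)))
y <- as.numeric(X %*% c(1, 0.5, 0, 0, -0.5, 0)) +
  0.5 * as.numeric(arima.sim(list(ar = 0.6), n = n))
dat <- data.frame(y = y, X)
train <- dat[1:45, ]
test  <- dat[46:50, ]   # rows whose predictors are known but whose y is to be forecast

# Fit PLS on the training rows only
fit_pls <- plsr(y ~ ., ncomp = 2, data = train, scale = TRUE)

# X scores for the training rows, and the same scores projected for the new rows
sc_train <- unclass(fit_pls$scores)[, 1:2, drop = FALSE]
sc_test  <- predict(fit_pls, newdata = test, type = "scores")
colnames(sc_train) <- colnames(sc_test) <- c("pls1", "pls2")

# Regression with ARIMA errors; the selected orders describe the residual structure
fit_arima <- auto.arima(ts(train$y), xreg = sc_train)

# Forecast using the projected X scores (not Yscores) as future regressors
fc <- forecast(fit_arima, xreg = sc_test)
```

This still requires predictor values for the forecast period. With several countries stacked in one data frame, the series would also have to be modelled per country rather than as a single ts.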
I am very new to machine learning. I am trying to explore fitting random forests with the ranger library in R. My dependent variable is continuous - so it would be a regression tree (and not just classification). Upon trying out the functions, I have noticed that there seems to be a discrepancy between ranger and predict ranger. The following lines result in different predictions in results and results_alternative:
rf_reg <- ranger(formula = y ~ ., data = training_df)
results <- rf_reg$predictions
results_alternative <- predict(rf_reg, data = training_df)$predictions
Could anybody please explain why there is a discrepancy and what is causing it? Which one is correct? I have tried it with classification on iris data and that seemed to give the same results. Many thanks!
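A likely explanation, based on how ranger documents its output: the predictions element of a fitted forest holds out-of-bag predictions (each row is predicted only by trees that did not see it), whereas predict() on the training data uses every tree, including those that were trained on that row. Neither is wrong; they answer different questions. A minimal sketch on mtcars (the dataset and seed are assumptions for illustration):

```r
library(ranger)

# Fit a regression forest on a built-in dataset
rf_reg <- ranger(mpg ~ ., data = mtcars, num.trees = 500, seed = 123)

# Out-of-bag predictions stored in the fitted object
oob_pred <- rf_reg$predictions

# In-sample predictions: every tree votes, including trees that saw the row
insample_pred <- predict(rf_reg, data = mtcars)$predictions

# The in-sample error is optimistic; the out-of-bag error is closer to
# what can be expected on unseen data
oob_mse      <- mean((mtcars$mpg - oob_pred)^2)
insample_mse <- mean((mtcars$mpg - insample_pred)^2)
```

On an easy classification problem such as iris, most out-of-bag and in-sample class labels coincide, which may be why no difference showed up there.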
This is a follow-up to the following discussion about fitting an ARIMA model with external regressors:
From Auto.arima to forecast in R
I was able to forecast perfectly for the next 5 months, given that I had future values for the predictors explaining my response variable (churn_rate).
arima_model_churn_rate <- auto.arima(tsm_churn_rate, stepwise = FALSE,
                                     approximation = FALSE,
                                     xreg = xreg_in_out_p_month_1)
number_of_future_month <- 5
forecast_churn_rate <- forecast(arima_model_churn_rate,
                                xreg = xreg_fut_in_out_p_month_churn_rate,
                                h = number_of_future_month)
plot(forecast_churn_rate)
My question is this: since I need to predict the future, I cannot wait for the predictors to be measured before forecasting future months, can I?
If I have to wait until the end of the month, I can simply calculate the churn rate directly.
My goal is to predict the next 3 months. In that case, how should I get future values for my predictors?
I am somewhat confused by this whole scenario as discussed in the blog. For an ARIMA model with external regressors we need future values of the regressors. It worked perfectly in the example case, where I trained my model on 2 years of data and used the next 5 months of measurements for the predictors as future values.
But what if I want to predict 3 months, 6 months, or even a year ahead? If I have to wait for the future values, then I am already at that time point, and the prediction no longer makes sense.
Can someone please explain this whole concept to me? Sorry if I have not explained the scenario well; I tried my best.
Thanks in advance!
If you don't have values for your future predictors, then you need to either forecast them first, or use a different model.
You could try a model without those predictors, or you could include lagged values of the predictors where the lag is at least as long as the forecast horizon.
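The lagged-predictor option can be sketched as follows. This is a minimal illustration on synthetic data, assuming a single hypothetical predictor x and a horizon of h = 3; the names x_lag3, y_fit, and x_future are made up for the example.

```r
library(forecast)

set.seed(1)
n <- 60   # number of observed months
h <- 3    # forecast horizon

# Hypothetical predictor, and a response driven by the predictor h months earlier
x <- as.numeric(arima.sim(list(ar = 0.7), n = n))
y <- numeric(n)
y[(h + 1):n] <- 0.8 * x[1:(n - h)] + rnorm(n - h, sd = 0.3)

# Align y[t] with x[t - h]; the first h months have no lagged value and are dropped
y_fit <- ts(y[(h + 1):n])
x_lag <- matrix(x[1:(n - h)], ncol = 1, dimnames = list(NULL, "x_lag3"))

fit <- auto.arima(y_fit, xreg = x_lag)

# The regressors for months n+1 .. n+h are the last h observed values of x,
# so nothing about the future needs to be known
x_future <- matrix(x[(n - h + 1):n], ncol = 1, dimnames = list(NULL, "x_lag3"))
fc <- forecast(fit, xreg = x_future)
```

The trade-off is that any information in the predictor from the most recent h months cannot be used for the nearest forecasts.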
I've tried to use the dlm package in R to forecast with the Kalman filter.
My model looks like this, using five variables:
build <- function(u) {
  reg <- dlmModReg(vars[,1:5], start=startM, end=endM, dV=exp(u[1]), dW=exp(u[2:7]))
  return(reg)
}
I get time-varying coefficients over the modelling period. However, when I try to forecast, the coefficients become constant over the forecast period, like this: https://www.dropbox.com/s/boig4ln5acxrtra/coefficients.png
I forecast using the approach described here:
Gaussian state space forecasting with regression effects
https://stats.stackexchange.com/questions/5090/gaussian-state-space-forecasting-with-regression-effects
So, if anyone knows how to get time-varying coefficients from dlmFilter when forecasting, please let me know. I appreciate your help. Thanks.
I have two temporal processes. I would like to see whether one temporal process (X_{t,2}) can be used to produce better forecasts of the other process (X_{t,1}). I have multiple sources providing temporal data on X_{t,2} (e.g. 3 time series measuring X_{t,2}). All time series require a seasonal component.
I found MARSS' notation to be pretty natural to fit this type of model and the code looks like this:
Z=factor(c("R","S","S","S")) # observation matrix
B=matrix(list(1,0,"beta",1),2,2) #evolution matrix
A="zero" #demeaned
R=matrix(list(0),4,4); diag(R)=c("r","s","s","s")
Q="diagonal and unequal"
U="zero"
period = 12
per.1st = 1 # Now create factors for seasons
c.in = diag(period)
for(i in 2:(ceiling(TT/period))) {c.in = cbind(c.in,diag(period))}
c.in = c.in[,(1:TT)+(per.1st-1)]
rownames(c.in) = month.abb
C = "unconstrained" #2 x 12 matrix
dlmfit = MARSS(data, model=list(Z=Z,B=B,Q=Q,C=C, c=c.in,R=R,A=A,U=U))
I got a beta estimate implying that the second temporal process is useful in forecasting the first process, but to my dismay, MARSS gives me an error when I use MARSSsimulate to forecast, because one of the matrices (related to seasonality) is time-varying.
Does anyone know a way around this issue in the MARSS package? If not, any tips on fitting an analogous model using, say, the dlm package?
I was able to represent my state-space model in a form suitable for the dlm package, but I encountered some problems with dlm too. First, the ML estimates are very unstable. I bypassed this issue by constructing the dlm model from the MARSS estimates. However, dlmFilter is not working properly. I think the issue is that dlmFilter is not designed to deal with models that have multiple sources for one time series plus additional seasonal components. dlmForecast, on the other hand, does give me the forecasts I need.
In summary for my multivariate time series model (with multiple sources providing data for one of the temporal processes), the MARSS library gave me reasonable estimates of the parameters and allowed me to obtain filtered and smoothed values of the states. Forecast values were not possible. On the other hand, dlm gave fishy estimates for my model and the dlmFilter didn't work, but I was able to use dlmForecast to forecast values using the model I fitted in MARSS and reexpressed in dlm appropriate form.
I have fit my discrete count data using a variety of functions for comparison. I fit a GEE model using geepack, a linear mixed effect model on the log(count) using lme (nlme), a GLMM using glmer (lme4), and a GAMM using gamm4 (gamm4) in R.
I am interested in comparing these models and would like to plot the expected (predicted) values for a new set of data (predictor variables). My goal is to compare the predicted effects for each model under particular conditions (x variables). Of particular interest is the comparison between marginal (GEE) and conditional estimates.
I think my main problem might be getting the new data in the correct form with the correct labels and attributes and such. I am still very much an R novice and struggle with this stuff (no course on this at my university unfortunately).
I currently have fitted models
gee1 lme1 lmer1 gamm1
and can extract their fixed effect coefficients and standard errors without a problem. I also don't have a problem converting them from the log scale or estimating confidence intervals accounting for the random effects.
I also have my new dataframe newdat which has 365 observations of 23 variables (average environmental data for each day of the year).
I am stuck on how to predict new count estimates from this. I played around with the model.matrix function but couldn't get it to work. For example, I tried:
mm = model.matrix(terms(glmm1), newdat) # Error in model.frame.default(object,
# data, xlev = xlev) : object is not a matrix
newdat$pcount = mm %*% fixef(glmm1)
Any suggestions or good references would be greatly appreciated. Can anyone help with the error above?
Getting predictions for lme() and lmer() is documented on http://glmm.wikidot.com/faq
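For the glmer case, a minimal sketch of population-level (fixed-effects only) predictions on new data is shown below. The data are simulated, and the variable names site, temp, and count are assumptions for illustration; the real model would use its own 23 predictors. The design matrix is built from a one-sided formula containing only the fixed-effects terms, so newdat does not need a response column.

```r
library(lme4)

set.seed(42)
# Simulated count data with a random intercept per site (hypothetical)
d <- data.frame(site = factor(rep(1:20, each = 10)), temp = rnorm(200))
site_eff <- rnorm(20, sd = 0.5)
d$count <- rpois(200, exp(0.5 + 0.3 * d$temp + site_eff[d$site]))

glmm1 <- glmer(count ~ temp + (1 | site), data = d, family = poisson)

# New predictor values; column names must match those used in the model
newdat <- data.frame(temp = seq(-2, 2, length.out = 5))

# Option 1: predict() with re.form = NA ignores the random effects
newdat$pcount <- predict(glmm1, newdata = newdat, re.form = NA, type = "response")

# Option 2: by hand, using the fixed-effects right-hand side only
mm  <- model.matrix(~ temp, newdat)
eta <- as.numeric(mm %*% fixef(glmm1))   # linear predictor on the log scale
pcount_manual <- exp(eta)                # back-transform to expected counts
```

Both options give predictions conditional on a random effect of zero, which in a log-link model is not the same as the marginal mean estimated by a GEE, so some difference between the two sets of curves is expected.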