Using the 'forecast' function in R in the context of 'neuralnet'?

Hello everyone,
I am using the neuralnet package in R, with three 'x' variables predicting a 'y' variable. I am working with time series data with 82 observations. I divided the data set into training and test sets, and used the 'compute' function to evaluate the fitted neuralnet model on the test data.
However, I want to forecast into future periods for which I have no x data at hand. I am trying to use the 'forecast' function. Is there any way to use the forecast function with the neuralnet package in R? I appreciate any help in this regard.
Find the R code below:
## Creating Training (1:72) and Test Slices (73:82)
library(caret)
timeSlices <- createTimeSlices(1:nrow(data1),
                               initialWindow = 72, horizon = 10, fixedWindow = TRUE)
There are 82 observations in total. I used createTimeSlices from the caret package to create a training set (1:72) and a test set (73:82). Now I use the neuralnet package to train the model as shown below:
library(neuralnet)
set.seed(101)
Model1 <- neuralnet(ppy ~ ppx1 + ppx2 + ppx3, data = mytraindata,
                    hidden = 4, threshold = 0.001, stepmax = 2e+05, rep = 1,
                    lifesign = "full", lifesign.step = 1000,
                    algorithm = "rprop+", err.fct = "sse",
                    act.fct = "logistic", linear.output = TRUE)
Now I predict on the test set (73:82) with the following code:
Predmodel1 <- compute(Model1, Ntestdata)
Up to this point everything works. However, I need to forecast future y values. I tried to use the 'forecast' function to predict y for the next 7 periods (i.e., periods 83:89) as follows:
library(forecast)
predmodel2 <- forecast(Model1$net.result[[1]], 7)
I got the following error:
Error in ets(object, lambda = lambda, allow.multiplicative.trend = allow.multiplicative.trend, :
  y should be a univariate time series
Any thoughts on how to forecast y for the 7 future periods (83:89)? For this forecast, assume that Model1 is trained on observations 1:82. I greatly appreciate your feedback. Thank you.

The forecast() function is for univariate time series models only, not for multivariate approaches like neuralnet: it has no way to know how your x variables will evolve, which is why passing the network's fitted values to forecast() falls through to ets() and fails.
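One common workaround (not from the original answer; a sketch only) is to forecast each x variable with its own univariate model and then feed those future x values to compute(). Assuming ppx1, ppx2 and ppx3 are ts objects on the same preprocessed scale used for training, something like:
library(forecast)
library(neuralnet)
h <- 7  # periods 83:89
# Univariate forecasts for each predictor (auto.arima is one choice of many)
newx <- data.frame(
  ppx1 = as.numeric(forecast(auto.arima(ppx1), h = h)$mean),
  ppx2 = as.numeric(forecast(auto.arima(ppx2), h = h)$mean),
  ppx3 = as.numeric(forecast(auto.arima(ppx3), h = h)$mean))
# Feed the forecast predictors to the trained network
pred_future <- compute(Model1, newx)$net.result
Bear in mind this stacks the predictor forecast error on top of the network's own error, so treat the resulting y values with caution.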

Related

Linear Mixed-Effects Models for a big spatial auto-correlated dataset

So, I am working with a big dataset (55965 points). I am trying to run an LME accounting for spatial correlation, but R returns this error:
Error: 'sumLenSq := sum(table(groups)^2)' = 3.13208e+09 is too large.
Too large or no groups in your correlation structure?
I cannot subset the data since I need all the points. My questions are:
Is there a setting I can change in the function?
If not, is there another package with a similar function that can handle such a big dataset?
Here is a reproducible example:
require(nlme)
my.data <- as.data.frame(matrix(data = 0, nrow = 55965, ncol = 3))
my.data$dummy <- rep(1, 55965)
my.data$V1 <- seq(780, 56744)
my.data$V2 <- seq_len(55965)
my.data$X <- seq(49.708, 56013.708)
my.data$Y <- seq(-12.74094, -55977.7409)
null.model <- lme(fixed = V1 ~ V2, data = my.data, random = ~ 1 | dummy, method = "ML")
spatial_model <- update(null.model, correlation = corGaus(1, form = ~ X + Y), method = "ML")
Since you have assigned a grouping factor with only one level, there are no groups in the data, which is what the error message reports. If you just want to account for spatial autocorrelation, with no other random effects, use gls from the same package.
Edit: A further note on two different approaches to modelling spatial autocorrelation: corGaus (and the other corSpatial-type structures) implement spatial correlation models for the regression residuals, which is different from, say, a spatial random effect added to the model based on county/district/grid identity.
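As a minimal sketch of that suggestion, with the variable names from the example above (note that with 55965 points the dense spatial correlation matrix will still be very memory-hungry, so test on a subsample first):
library(nlme)
# gls() fits the fixed effects with spatially correlated residuals
# and needs no grouping factor.
spatial.gls <- gls(V1 ~ V2, data = my.data,
                   correlation = corGaus(form = ~ X + Y),
                   method = "ML")
summary(spatial.gls)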

Trying to do cross-validation of a survival tree

I'm trying to do survival analysis using decision trees in rpart, similar to here: Using a survival tree from the 'rpart' package in R to predict new observations. To compare the decision-tree survival model to other models, such as Cox regression, I'd like to use cross-validation to get Dxy and compare the c-index. When I try to use validate.rpart with an rpart fit that includes a Surv object, I get an error. Borrowing the example from the previous question:
library(rms)
# Make Data:
set.seed(4)
dat = data.frame(X1 = sample(x = c(1,2,3,4,5), size = 100, replace=T))
dat$t = rexp(100, rate=dat$X1)
dat$t = dat$t / max(dat$t)
dat$e = rbinom(n = 100, size = 1, prob = 1-dat$t )
# Survival Fit:
sfit = survfit(Surv(t, event = e) ~ 1, data=dat)
plot(sfit)
# Tree Fit:
require(rpart)
tfit = rpart(formula = Surv(t, event = e) ~ X1, data = dat, model = TRUE,
             control = rpart.control(minsplit = 30, cp = 0.01))
plot(tfit); text(tfit)
validate(tfit)
The error:
Error in unclass(x)[i, , drop = FALSE] :
(subscript) logical subscript too long
Any idea for a workaround for this problem? Is there any other way to get the c-index from an rpart survival model?
The validate.rpart function in the R rms package does not implement survival models (which are in effect simple exponential distribution models) at present. I have improved the code to do this, and the functionality will be in the next release of the rms package on CRAN in a few weeks. New source code can be obtained at https://github.com/harrelfe/rms by tomorrow, though that won't help very much on its own because validate.rpart is a method.
Do note that the sample size needed for recursive partitioning to produce a reliable and stable tree can be very large, e.g., 100,000 subjects in some cases.
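In the meantime, one workaround (a sketch, not part of the answer above) is to compute the c-index directly from the tree's predictions with rcorr.cens from Hmisc, which rms already loads:
library(rms)  # loads Hmisc and survival
# predict() on a survival rpart tree returns a relative event rate;
# negate it so that higher predicted risk pairs with shorter survival.
pred <- predict(tfit)
rcorr.cens(-pred, Surv(dat$t, dat$e))  # reports C Index and Dxy
Note this gives an apparent (resubstitution) estimate; for an honest estimate you would repeat the calculation inside your own cross-validation loop.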

Calculate prediction intervals of a predicted value using the caret package in R

I used different neural network packages within the caret package for my predictions. The code used with the nnet package is:
library(caret)
# training model using nnet method
data <- na.omit(data)
xtrain <- data[, c("temperature", "prevday1", "prevday2", "prev_instant1", "prev_instant2", "prev_2_hour")]
ytrain <- data$power
train_model <- train(x = xtrain, y = ytrain, method = "nnet",
                     linout = TRUE, na.action = na.exclude, trace = FALSE)
# prediction using training model created
pred_ob <- predict(train_model, newdata = dframe, type = "raw")
The predict function simply returns point predictions, but I also need prediction intervals (2-sigma). On searching, I found a relevant answer at stackoverflow link, but it does not give what I need. The solution suggests using the finalModel variable as:
predict(train_model$finalModel, newdata = dframe, interval = "confidence", type = "raw")
Is there any other way to calculate prediction intervals? The training data used is the dput() of my previous question at stackoverflow link and the dput() of my prediction dataframe (test data) is
dframe <- structure(list(temperature = 27, prevday1 = 1607.69296666667,
prevday2 = 1766.18103333333, prev_instant1 = 1717.19306666667,
prev_instant2 = 1577.168915, prev_2_hour = 1370.14983583333), .Names = c("temperature",
"prevday1", "prevday2", "prev_instant1", "prev_instant2", "prev_2_hour"
), class = "data.frame", row.names = c(NA, -1L))
UPDATE
I used the nnetpredint package as suggested at link. To my surprise it results in an error which I find difficult to debug. Here is my updated code so far:
library(nnetpredint)
nnetPredInt(train_model, xTrain = xtrain, yTrain = ytrain, newData = dframe)
It results in the following error:
Error: Number of observations for xTrain, yTrain, yFit are not the same
[1] 0
I can check that xtrain, ytrain and dframe have the correct dimensions, but I have no idea about yFit; according to the examples in the nnetpredint vignette I should not need it.
caret doesn't generate prediction intervals; that relies on the individual package. If that package cannot do this, then neither can the train objects. I agree that nnetPredInt is the appropriate way to go.
Two other notes:
you most likely should center and scale your data if you have not already (see the sketch below).
using the finalModel object is somewhat dangerous since it has no idea what was done to the data (e.g. dummy variables, centering and scaling, or other preprocessing methods) before it was created.
Max
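To the first point, centering and scaling can be requested directly in train() via its preProcess argument, so the transformation is learned inside each resample and applied automatically at prediction time. A minimal sketch with the xtrain/ytrain objects from above:
library(caret)
train_model <- train(x = xtrain, y = ytrain, method = "nnet",
                     linout = TRUE, trace = FALSE,
                     preProcess = c("center", "scale"))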
Thanks for your question. A simple answer to your problem is: right now the nnetPredInt function only supports the following S3 objects, "nnet", "nn" and "rsnns", produced by the different neural network packages, while the train function in the caret package returns a "train" object. That is why nnetPredInt cannot get the yFit vector, i.e. the fitted values of the training dataset, from your train_model.
1. Quick way to use the model from the caret package:
Get the finalModel result from the 'train' object:
nnetObj = train_model$finalModel # return the 'nnet' model which the caret package has found.
yPredInt = nnetPredInt(nnetObj, xTrain = xtrain, yTrain = ytrain, newData = dframe)
For example, use the iris dataset and the 'nnet' method from the caret package for regression prediction.
library(caret)
library(nnetpredint)
# Setosa 0 and Versicolor 1
ird <- data.frame(rbind(iris3[,,1], iris3[,,2]), species = c(rep(0, 50), rep(1, 50)))
samp = sample(1:100, 80)
xtrain = ird[samp,][1:4]
ytrain = ird[samp,]$species
# Training
train_model <- train(x = xtrain, y = ytrain, method = "nnet", linout = FALSE, na.action = na.exclude, trace = FALSE)
class(train_model) # [1] "train"
nnetObj = train_model$finalModel
class(nnetObj) # [1] "nnet.formula" "nnet"
# Constructing Prediction Interval
xtest = ird[-samp,][1:4]
ytest = ird[-samp,]$species
yPredInt = nnetPredInt(nnetObj, xTrain = xtrain, yTrain = ytrain, newData = xtest)
# Compare Results: ytest and yPredInt
ytest
yPredInt
2. The hard way
Use the generic nnetPredInt function and pass all the neural-net-specific parameters to it:
nnetPredInt(object = NULL, xTrain, yTrain, yFit, node, wts, newData, alpha = 0.05, lambda = 0.5, funName = 'sigmoid', ...)
xTrain  # training dataset
yTrain  # training target values
yFit    # fitted values for the training data
node    # structure of your network, like c(4, 5, 5, 1)
wts     # weight parameters, in the specific order found by your neural network
newData # new data for prediction
Tips:
Right now the nnetpredint package only supports standard multilayer neural network regression with an activated output, not a linear output.
It will support more types of models in the future.
You can use the nnetPredInt function {package:nnetpredint}. Check out the function's help page here.
If you are open to writing your own implementation, there is another option: you can get prediction intervals from a trained net using the same machinery you would write for standard non-linear regression (assuming back-propagation was used to do the estimation).
This paper goes through the methodology and is fairly straightforward: http://www.cis.upenn.edu/~ungar/Datamining/Publications/yale.pdf.
There are, as with everything, some cons to this approach (outlined in the paper), but it is definitely worth knowing as an option.
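For reference (my paraphrase, not part of the original answer), the interval in that paper takes the standard non-linear regression form
yhat0 ± t(alpha/2, n - p) * s * sqrt(1 + f0' (F'F)^-1 f0)
where F is the Jacobian of the network output with respect to the weights over the training data, f0 is the corresponding gradient at the new input, s^2 is the residual variance estimate, n is the number of training cases, and p is the number of weights.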

caret::train: specify training data parameters

I am designing a neural network model that predicts estimates of the van Genuchten water retention parameters (theta_r, theta_s, alpha, n) using inputs ranging from limited to more extended data, such as texture, bulk density, and one or two water retention points. Investigating neural networks in R I found the RSNNS package, and I used it to create and train multiple multi-layer perceptrons (MLPs), tuning the number of hidden units and the learning rate. The general performance, characterized by training and testing RMSEs, is really poor and erratic. In fact, I used log-transformed values of the alpha and n parameters to avoid bias and account for their approximately lognormal distributions, but this does not help much. I was recommended to work with the nnet and caret packages, but I've had trouble adapting the code and don't know what I'm doing wrong. Any suggestions?
#input dataset
basic <- read.table(url("https://dl.dropboxusercontent.com/s/m8qe4k5swz1m3ij/basic.txt?dl=1&token_hash=AAH6Z3d6fWTLoQZYi04Ys72sdufdERE5gm4v7eF0cgMlkQ"), header=T, sep=" ")
#output dataset
fitted <- read.table(url("https://dl.dropboxusercontent.com/s/rjx745ej80osbbu/fitted.txt?dl=1&token_hash=AAHP1zcPQyw4uSe8rw8swVm3Buqe3TP7I1j-4_SOeeUTvw"), header=T, sep=" ")
# Use log-transformed values of alpha and n output parameters
fitted$alpha <- log(fitted$alpha)
fitted$n <- log(fitted$n)
#Fit model with caret package
library(caret)
model <- train(x = basic, y = fitted, method = 'nnet', linout = TRUE, trace = FALSE,
               # Grid of tuning parameters to try:
               tuneGrid = expand.grid(.size = c(1, 5, 10), .decay = c(0, 0.001, 0.1)))
caret is just a wrapper around the algorithms it calls, so you can specify any parameter of the underlying algorithm even if it is not an option in caret's tuning grid. This is accomplished via the "..." in caret's train() function, which basically means you can pass any extra parameters to the method you are calling. I'm not sure which parameters you want to adjust in your nnet call (and I'm getting errors accessing your dropbox data), so here is a trivial example passing specific values for maxit and Hess:
> library(caret)
> m1 <- train(Species~.,data=iris, method='nnet', linout=TRUE, trace = FALSE,trControl=trainControl("cv"))
> # this time pass in values for maxit and Hess
> m2 <- train(Species~., data=iris, method='nnet', linout=TRUE, trace=FALSE, trControl=trainControl("cv"), maxit=10, Hess=T)
> m1$finalModel$call
nnet.formula(formula = modFormula, data = data, size = tuneValue$.size,
decay = tuneValue$.decay, linout = TRUE, trace = FALSE)
> m2$finalModel$call
nnet.formula(formula = modFormula, data = data, size = tuneValue$.size,
    decay = tuneValue$.decay, linout = TRUE, trace = FALSE, maxit = 10,
    Hess = ..4)

SARIMAX model in R

I would like to fit a SARIMAX model with temperature as an exogenous variable in R. Can I do that with the xreg argument available in the TSA package?
I thought of fitting the model as:
fit1 = arima(x, order=c(p,d,q), seasonal=list(order=c(P,D,Q), period=S), xreg=temp)
Is that correct, or do I have to use another R function?
If it isn't correct, which steps should I follow?
Thanks.
Check out the forecast package, it's great:
# some random data
x <- ts(rnorm(120,0,3) + 1:120 + 20*sin(2*pi*(1:120)/12), frequency=12)
temp = rnorm(length(x), 20, 30)
require(forecast)
# build the model (check ?auto.arima)
model = auto.arima(x, xreg = data.frame(temp = temp))
# some random predictors
temp.reg = data.frame(temp = rnorm(10, 20, 30))
# forecasting
forec = forecast(model, xreg = temp.reg)
# quick way to visualize things
plot(forec)
# model diagnosis
tsdiag(model)
# model info
summary(forec)
I won't suggest you use auto.arima() blindly. Depending on the model you want to fit, it may return poor results: for example, when working with some complex SARIMA models, the difference between models fitted manually and with auto.arima() was noticeable, and auto.arima() did not even return white-noise innovations (as is expected), while the manual fits, of course, did.
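For the manual route, a minimal sketch with forecast::Arima (the orders below are placeholders to be identified from ACF/PACF plots, not recommendations):
require(forecast)
# Fit the SARIMAX by hand, keeping temperature as the exogenous regressor
fit <- Arima(x, order = c(1, 0, 1),
             seasonal = list(order = c(1, 1, 0), period = 12),
             xreg = temp)
tsdiag(fit)  # standardized residuals should look like white noise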
