I want to train a Holt model on training data and then make a forecast on test data without estimating the parameters again. On several sites I found this solution, but for an ARIMA model:
library(forecast)
m_train <- holt(rnorm(500) + runif(500))      # fit on the training series
m_test  <- holt(rnorm(100), model = m_train)  # re-use the fitted model on new data
m_train$model
m_test$model
But when I print both models, I can see that the smoothing parameters differ between them.
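A sketch of one way to hold the parameters fixed (this assumes the forecast package's ets() interface, since holt() is equivalent to an ETS(A,A,N) model): pass the fitted model object via ets()'s model argument, which re-applies the same smoothing parameters to the new series instead of re-estimating them; use.initial.values = TRUE also keeps the initial states.
library(forecast)

# Hypothetical train/test series
train <- rnorm(500) + runif(500)
test  <- rnorm(100)

# holt() is equivalent to ETS(A,A,N)
m_train <- ets(train, model = "AAN")

# Passing a fitted ets object as `model` re-uses its smoothing
# parameters on the new data rather than re-estimating them
m_test <- ets(test, model = m_train, use.initial.values = TRUE)

coef(m_train)  # alpha, beta, and initial states
coef(m_test)   # should match m_train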
I only have a small data set of 30 samples, so I have a training set but no test set, and I want to use cross-validation to assess the model. I have run PLS models in R using cross-validation and LOO. The mvr output has fitted values and validation$pred values, and these are different. For the final R2 and RMSE on just the training set, should I use the final fitted values or the validation$pred values?
Short answer: if you want to know how good the model is at predicting, use validation$pred, because those predictions are made on data held out from fitting. The values under $fitted.values are obtained by fitting the final model on all your training data, meaning the same training data is used both to construct the model and to make the predictions. Errors computed from this final fit will therefore overstate the performance of your model on unseen data.
You probably need to explain what you mean by "valid" (in your comments).
Cross-validation is used to find the best hyperparameter, in this case the number of components for the model.
During cross-validation, one part of the data is withheld from fitting and serves as a test set. This provides a rough estimate of how the model will perform on unseen data. See this image from scikit-learn for how CV works.
LOO works in a similar way. After finding the best parameter, you supposedly obtain a final model to be used on the test set. In this case, mvr trains models across the range of 2-6 PCs, but $fitted.values comes from a model trained on all the training data.
You can also see below how different they are. First I fit a model:
library(pls)
library(mlbench)

data(BostonHousing)
set.seed(1010)

# 400 observations for training, the rest for testing
idx <- sample(nrow(BostonHousing), 400)
trainData <- BostonHousing[idx, ]
testData  <- BostonHousing[-idx, ]

# PLS regression with 4 components and cross-validation
mdl <- mvr(medv ~ ., 4, data = trainData, validation = "CV",
           method = "oscorespls")
Then we calculate the mean squared error in CV, for the model fitted on the full training data, and on the test data, using 4 PCs (note the helper computes an MSE, not an RMSE, since it omits the square root):
calc_MSE <- function(pred, actual) { mean((pred - actual)^2) }
# error in CV
calc_MSE(mdl$validation$pred[,,4], trainData$medv)
[1] 43.98548
# error of the fit on the full training data -- not very useful
calc_MSE(mdl$fitted.values[,,4], trainData$medv)
[1] 40.99985
# error on test data
calc_MSE(predict(mdl, testData, ncomp = 4), testData$medv)
[1] 42.14615
You can see that the cross-validation error is closer to the error you get on real test data. Again, how close really depends on your data.
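As an aside, the pls package has a built-in RMSEP() helper that reports these quantities directly, so you don't have to hand-roll the calculation (a sketch; note RMSEP returns root mean squared errors, i.e. the square roots of the MSE values above):
# CV and training error estimates from the fitted mvr object
RMSEP(mdl, estimate = c("train", "CV"), ncomp = 4)
# test-set error computed on new data
RMSEP(mdl, estimate = "test", newdata = testData, ncomp = 4)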
As is common with other machine learning methods, I divided my original data set 70:30 into a training set and a test set.
Here is my code.
install.packages("randomForestSRC")
library(randomForestSRC)
data(pbc, package="randomForestSRC")
data <- na.omit(pbc)
train <- sample(1:nrow(data), round(nrow(data) * 0.70))
# Fit a random survival forest on the training rows
data.grow <- rfsrc(Surv(days, status) ~ .,
                   data = data[train, ],
                   ntree = 100,
                   tree.err = TRUE,
                   importance = TRUE,
                   nsplit = 1,
                   proximity = TRUE)
# Predict on the held-out rows
data.pred <- predict(data.grow,
                     data[-train, ],
                     importance = TRUE,
                     tree.err = TRUE)
My question is about the predict function in this code.
Originally, I wanted to construct a prediction model based on a random survival forest to predict disease development.
For example, after building the prediction model with the training data set, I wanted to know the probability of disease development for test data that has no information about disease incidence for each individual, because I would like to estimate the probability of disease development from a subject's general characteristics such as age, BMI, sex, and so on.
However, contrary to my intention to build a prediction model as described above, the predict function in this package did not work on data that has no status information (event/censored).
It seems that predict must be given the outcome information (event/censored).
Therefore, I cannot understand what the predict function means here.
If predict works only with outcome information, how can I make a prediction of disease development based on a subject's general characteristics in the future?
In addition, if the prediction in this model is constructed with the outcome information, what does "predict" mean in the random survival forest model?
Please let me know what the predict function in this package does.
Thank you for reading my long question.
The predict for this type of model, i.e. predict.rfsrc, works much like you'd expect if you've used predict with glm, lm, RRF, or other models.
The predict statement does not require you to know the outcome for the prediction data set. I am trying to understand why you thought that it did.
Your example rfsrc statement does not work because it refers to columns that are not in the example data set.
I think the best plan is to show you with a reproducible example below. If you have further questions, you can ask me in a comment.
# Train an RFSRC model
mtcars.mreg <- rfsrc(Surv(mpg, cyl) ~ ., data = mtcars[1:30, ],
                     tree.err = TRUE, importance = TRUE)
# Simulate new data
new_data <- mtcars[31:32, ]
# Predict
predicted <- predict(mtcars.mreg, new_data)
predicted
Sample size of test (predict) data: 2
Number of grow trees: 1000
Average no. of grow terminal nodes: 4.898
Total no. of grow variables: 9
Analysis: RSF
Family: surv-CR
Test set error rate: NA
predicted$predicted
event.1 event.2 event.3
[1,] 0.4781338 2.399299 14.71493
[2,] 3.2185606 4.720809 2.15895
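To the specific worry about outcomes: as far as I can tell from predict.rfsrc, the test data can omit the outcome columns entirely; you simply get no test-set error rate. A sketch along those lines (the column selection is just for this mtcars example):
# New data with the outcome columns (mpg, cyl) dropped entirely
new_x <- mtcars[31:32, setdiff(names(mtcars), c("mpg", "cyl"))]
# Prediction still works; the test-set error rate is reported as NA
# because there are no observed outcomes to compare against
predict(mtcars.mreg, new_x)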
library(forecast)
data(Nile, package = "datasets")
train <- Nile[1:50]             # I want to use this to train the model
test  <- Nile[51:length(Nile)]
m1 <- auto.arima(train)
My question is: now that I have an ARIMA model, how can I use this model, combined with the old data (train), to forecast the values in the test set dynamically? What I want is like an OLS regression: once I have a model, I can use other data to test it. Finally I can draw a picture.
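A sketch of the usual forecast-package recipe for this (assuming the model is fit on the training window only): Arima() accepts a previously fitted model via its model argument and applies it to new data without re-estimating the coefficients, and fitted() then gives the one-step-ahead forecasts over the test period.
library(forecast)
data(Nile, package = "datasets")

train <- window(Nile, end = 1920)    # first 50 observations
test  <- window(Nile, start = 1921)  # remaining observations

m1 <- auto.arima(train)

# Re-apply the fitted model to the test data without re-estimating
# the coefficients, then extract the one-step-ahead forecasts
refit   <- Arima(test, model = m1)
onestep <- fitted(refit)

plot(Nile)
lines(onestep, col = "red")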
I built a coxph model on training data and I am having trouble using it for predictions on the validation data set. I used Surv to build a survival object and used it as the response variable in the coxph model. I then used predict to get predictions on both the training and test data sets, but both times it predicted using only the training data.
my.surv <- Surv(train$time_to_event, train$event, type = "right")
basic.coxph <- coxph(my.surv ~ train$basin + train$region + train$district_code)
prediction_train <- predict(basic.coxph, train, type = "risk")
prediction_test <- predict(object = basic.coxph, newdata = validate, type = "risk")
The results of prediction_train and prediction_test have the same dimensions and are exactly the same, even though the training and validation data sets have different numbers of rows (they have the same columns). Any suggestion on what I am doing wrong here?
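For what it's worth, the usual cause of this symptom is building the formula out of train$... vectors: that hard-wires the training columns into the model, so predict() cannot match them against newdata and falls back to the training data. A sketch of the conventional form, using the column names from the question:
library(survival)

# Refer to columns by name and pass the data frame via `data`,
# so predict() can find the same columns in `newdata`
basic.coxph <- coxph(Surv(time_to_event, event, type = "right") ~
                       basin + region + district_code,
                     data = train)

prediction_train <- predict(basic.coxph, newdata = train, type = "risk")
prediction_test  <- predict(basic.coxph, newdata = validate, type = "risk")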
Is there a way to create a holdout/back-test sample for the following ARIMA model with exogenous regressors? Let's say I want to estimate the model using the first 50 observations and then evaluate its performance on the remaining 20 observations, where the x-variables are pre-populated for all 70 observations. What I really want at the end is a graph that plots the actual and fitted values over both the development period and the validation/holdout period (also known as back testing in time series).
library(TSA)
xreg <- cbind(GNP, Time_Scaled_CO)  # two time series objects
fit_A <- arima(Charge_Off, order = c(1,1,0), xreg = xreg)  # Charge_Off is another TS object
plot(Charge_Off, col = "red")
lines(predict(fit_A, Data), col = "green")  # Data contains Charge_Off, GNP, Time_Scaled_CO
You don't seem to be using the TSA package at all, and you don't need to for this problem. Here is some code that should do what you want.
library(forecast)

xreg <- cbind(GNP, Time_Scaled_CO)

# First 50 observations for model development, last 20 held out
training <- window(Charge_Off, end = 50)
test <- window(Charge_Off, start = 51)

# Fit on the training window with the matching rows of the regressors
fit_A <- Arima(training, order = c(1,1,0), xreg = xreg[1:50, ])

# Forecast the holdout period using the pre-populated regressors
fc <- forecast(fit_A, h = 20, xreg = xreg[51:70, ])

plot(fc)
lines(test, col = "red")
accuracy(fc, test)
See http://otexts.com/fpp/9/1 for an intro to using R with these models.