I have fitted an averaged neural network ("avNNet") in R with caret. See the code below. Does the term "averaged" mean that the average is based on the outcomes of 1000 neural networks (since there are 1000 iterations, maxit = 1000, in this case)?
Thanks.
library(AppliedPredictiveModeling)
data(solubility)
### Create a control function that will be used across models. We
### create the fold assignments explicitly instead of relying on the
### random number seed being set to identical values.
library(caret)
set.seed(100)
indx <- createFolds(solTrainY, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = indx)
################################################################################
### Section 7.1 Neural Networks
### Optional: parallel processing can be used via the 'do' packages,
### such as doMC, doMPI etc. We used doMC (not on Windows) to speed
### up the computations.
### WARNING: Be aware of how much memory is needed to parallel
### process. It can very quickly overwhelm the available hardware. We
### estimate the memory usage (VSIZE = total memory size) to be
### 2677M/core.
library(doMC)
registerDoMC(10)
library(caret)
nnetGrid <- expand.grid(decay = c(0, 0.01, .1),
                        size = c(1, 3, 5, 7, 9, 11, 13),
                        bag = FALSE)
set.seed(100)
nnetTune <- train(x = solTrainXtrans, y = solTrainY,
                  method = "avNNet",
                  tuneGrid = nnetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"),
                  linout = TRUE,
                  trace = FALSE,
                  MaxNWts = 13 * (ncol(solTrainXtrans) + 1) + 13 + 1,
                  maxit = 1000,
                  allowParallel = FALSE)
nnetTune
plot(nnetTune)
testResults <- data.frame(obs = solTestY,
                          NNet = predict(nnetTune, solTestXtrans))
################################################################################
See also:
https://scientistcafe.com/post/nnet.html
avNNet is a model where the same neural network model is fit using different random number seeds. All of the resulting models are used for prediction. For regression, the outputs from the networks are averaged. For classification, the model scores are first averaged, then translated to predicted classes. Source.
The number of models fit is controlled by the argument repeats, which is passed down to the underlying model by caret via the ... argument of train().
repeats: the number of neural networks fit with different random number seeds. By default this is 5, so five models will be averaged. caret's definition of the model does not change this default.
If the bag argument is set to TRUE, model fitting and aggregation are performed by bootstrap aggregation (bagging), which in my opinion is almost guaranteed to give better predictive performance if the number of models is high enough.
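So with the code in the question, five networks are averaged (the default repeats), each trained for at most maxit = 1000 iterations; the 1000 is not the number of networks. If you want more networks in the average, repeats can be passed through train()'s ... down to avNNet. Here is a minimal sketch reusing the objects from the question (illustrative only, I have not re-run it on the solubility data):
## Sketch: average 10 networks instead of the default 5 by passing `repeats`
## through train()'s ... to avNNet (all other objects as defined above).
set.seed(100)
nnetTune10 <- train(x = solTrainXtrans, y = solTrainY,
                    method = "avNNet",
                    tuneGrid = nnetGrid,
                    trControl = ctrl,
                    preProc = c("center", "scale"),
                    linout = TRUE,
                    trace = FALSE,
                    MaxNWts = 13 * (ncol(solTrainXtrans) + 1) + 13 + 1,
                    maxit = 1000,
                    repeats = 10,       # number of networks averaged per fit
                    allowParallel = FALSE)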
Related
I have a few questions about the difference between rpart and caret (using rpart):
When using rpart to fit a decision tree, calling dt$cptable displays a table of complexity parameters and their associated cross-validation errors. When pruning a tree, we would want to select the CP with the lowest cross-validation error. How are these cross-validation errors calculated? From reading rpart's vignette, it seems that rpart does the following:
a) It fits the full tree based on the user-specified parameters. As the tree is being built, the algorithm calculates the complexity parameter at each split.
b) It then splits the data into k folds and, for each CP, essentially performs cross-validation using these folds. It then calculates the average error across all of the folds to get the 'xerror' value we see in dt$cptable.
If we were to use caret with cross-validation to find the optimal tree, how does it run? Basically, does the algorithm split the dataset into k folds and then call the rpart function, with each call of the rpart function doing the same thing described in point 1 above? In other words, is it using cross-validation within cross-validation, whereas rpart just uses cross-validation once?
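To make point 1 concrete, here is a small self-contained check on the built-in iris data (not my data) showing that the xerror column of cptable comes from rpart's own internal cross-validation, whose number of folds is set by xval in rpart.control:
library(rpart)
set.seed(100)
# xval controls rpart's internal cross-validation (default xval = 10);
# the xerror/xstd columns of cptable are averaged over those internal folds
fit <- rpart(Species ~ ., data = iris,
             control = rpart.control(cp = 0, xval = 10))
fit$cptable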
Below is some code, even though I'm asking more about how the algorithm functions, maybe it will be useful:
library(rpart)
library(rpart.plot)
library(caret)
set.seed(100)
data.class <- data.termlife[, 2:ncol(data.termlife)]
data.class$TERM_FLAG <- as.factor(data.class$TERM_FLAG)
train.indices <- createDataPartition(data.class$TERM_FLAG, p = .8, list = FALSE)
data.class.t <- data.class[train.indices, ]
data.class.v <- data.class[-train.indices, ]
#Using Rpart
rpart.ctrl <- rpart.control(minsplit = 5, minbucket = 5, cp = .01)
f <- as.formula(paste0("TERM_FLAG ~ ", paste0(names(data.class.t)[2:9], collapse = "+")))
dt <- rpart(formula = f, data = data.class.t, control = rpart.ctrl, parms = list(split = "gini"))
cp.best.rpart <- dt$cptable[which.min(dt$cptable[, "xerror"]), "CP"]
#Using Caret
train.ctrl <- trainControl(method = "cv", number = 10)
tGrid <- expand.grid(cp = seq(0, .02, .0001))
dt.caret <- train(form = f, data = data.class.t, method = "rpart",
                  metric = "Accuracy", trControl = train.ctrl, tuneGrid = tGrid)
cp.best.caret <- dt.caret$bestTune$cp
print(paste("Rpart's best CP: ", cp.best.rpart))
print(paste("Caret's best CP: ", cp.best.caret))
[1] "Rpart's best CP: 0.0194444444444444"
[1] "Caret's best CP: 0.02"
The results are very similar, so when would you ever want to use Caret with Rpart? Thank you!!!
I'm trying to implement some functions to compare five different machine learning models for predicting values in a regression problem.
My intention is to build a suite of functions that train the different models and organize the results. The models I selected are: lasso, random forest, SVM, linear model and neural network. To tune some of the models I intend to use Max Kuhn's reference: https://topepo.github.io/caret/available-models.html.
However, since each model requires different tuning parameters, I'm in doubt about how to set them:
First I set up the grid for tuning the 'nnet' model. Here I selected different numbers of nodes in the hidden layer and different values of the decay coefficient:
my.grid <- expand.grid(size = seq(from = 1, to = 10, by = 1),
                       decay = seq(from = 0.1, to = 0.5, by = 0.1))
Then I construct the function that will run the five models with 5 repeats of 6-fold cross-validation:
my_list_model <- function(model) {
  set.seed(1)
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                repeats = 5,
                                returnResamp = "all",
                                savePredictions = "all")
  # The tuning configuration of the machine learning models:
  set.seed(1)
  fit_m <- train(ST1 ~ .,
                 data = train,        # my original data frame, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,          # linear activation function for the output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = my.grid)  # here is how I pass the 'nnet' tuning grid
  return(fit_m)
}
Lastly, I execute the five models:
lapply(list(Lass = "lasso",
            RF = "rf",
            SVM = "svmLinear",
            OLS = "lm",
            NN = "nnet"),
       my_list_model) -> model_list
However, when I run this, it shows:
Error: The tuning parameter grid should not have columns fraction
From what I understand, I have not specified the tuning parameters correctly. If I replace the 'nnet' model in the penultimate line with, for example, an XGBoost model, it seems to work and the results are calculated. That is, the problem seems to be with the 'nnet' tuning parameters.
So my real question is: how do I configure these different model parameters, in particular for the 'nnet' model? In addition, since I didn't need to set up the parameters for lasso, random forest, svmLinear and the linear model, how were they tuned by the caret package?
my_list_model <- function(model, grd = NULL) {
  train.control <- trainControl(method = "repeatedcv",
                                number = 6,
                                returnResamp = "all",
                                savePredictions = "all")
  # The tuning configuration of the machine learning models:
  set.seed(1)
  fit_m <- train(Y ~ .,
                 data = df,          # my original data frame, not shown in this code
                 method = model,
                 metric = "RMSE",
                 preProcess = "scale",
                 trControl = train.control,
                 linout = 1,         # linear activation function for the output
                 trace = FALSE,
                 maxit = 1000,
                 tuneGrid = grd)     # model-specific tuning grid (NULL lets caret build a default grid)
  return(fit_m)
}
First run modelLookup() to see the tunable parameters of each model:
modelLookup('rf')
modelLookup('svmLinear')
Now make a grid for each model based on the lookup above:
svmGrid <- expand.grid(C = c(3, 2, 1))
rfGrid <- expand.grid(mtry = c(5, 10, 15))
Create a list of all the models' grids and make sure each element's name matches the method name passed to train():
grd_all <- list(svmLinear = svmGrid,
                rf        = rfGrid)
model_list <- lapply(c("rf", "svmLinear"),
                     function(x) my_list_model(x, grd_all[[x]]))
model_list
[[1]]
Random Forest
17 samples
3 predictor
Pre-processing: scaled (3)
Resampling: Cross-Validated (6 fold, repeated 1 times)
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ...
Resampling results across tuning parameters:
mtry RMSE Rsquared MAE
5 63.54864 0.5247415 55.72074
10 63.70247 0.5255311 55.35263
15 62.13805 0.5765130 54.53411
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 15.
[[2]]
Support Vector Machines with Linear Kernel
17 samples
3 predictor
Pre-processing: scaled (3)
Resampling: Cross-Validated (6 fold, repeated 1 times)
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ...
Resampling results across tuning parameters:
C RMSE Rsquared MAE
1 59.83309 0.5879396 52.26890
2 66.45247 0.5621379 58.74603
3 67.28742 0.5576000 59.55334
RMSE was used to select the optimal model using the smallest value.
The final value used for the model was C = 1.
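The same pattern extends to the nnet grid from the question; the only requirement is that the list element's name matches the method string. An untested sketch (the original data frame is not shown, so this assumes the same my_list_model function as above):
modelLookup('nnet')                      # tunable parameters: size, decay
nnetGrid <- expand.grid(size  = seq(1, 10, by = 1),
                        decay = seq(0.1, 0.5, by = 0.1))
grd_all <- list(svmLinear = svmGrid,
                rf        = rfGrid,
                nnet      = nnetGrid)
model_list <- lapply(c("rf", "svmLinear", "nnet"),
                     function(x) my_list_model(x, grd_all[[x]]))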
I am training an SVM with a radial kernel on a big dataset of 244,058 customers for churn prediction. I do k = 5 cross-validation and want to train on the basis of a custom grid. I use parallel processing with 5 cores, and it still takes many hours to run the model; I just quit it because it takes so much time and my computer heats up. I have a 4-core CPU and 12 GB RAM, so I do not think memory or processing power is the problem. I also scaled and centered the data, tried PCA, removed near-zero-variance predictors and removed correlated variables. The code is shown below. Any tips? I have also tried RapidMiner instead of R, but the educational license only allows the use of 1 CPU core.
doParallel::registerDoParallel(cores = 5)
svmradialfit <- train(CHURN ~ ., data = train2scale[,-c(1,11, 13:14,16:17)],
method = 'svmRadial' , metric = 'AUC', maximize = TRUE, tuningGrid = expand.grid(C=100, sigma = 0.01) ,trControl = trainControl(method = 'cv', number = 5, verboseIter = TRUE, classProbs = TRUE, savePredictions = TRUE, preProcOptions = 'nzv' ,summaryFunction = prSummary, allowParallel = TRUE) )
I would like to study the optimal tradeoff between bias/variance for model tuning. I'm using caret for R which allows me to plot the performance metric (AUC, accuracy...) against the hyperparameters of the model (mtry, lambda, etc.) and automatically chooses the max. This typically returns a good model, but if I want to dig further and choose a different bias/variance tradeoff I need a learning curve, not a performance curve.
For the sake of simplicity, let's say my model is a random forest, which has just one hyperparameter 'mtry'
I would like to plot the learning curves of both training and test sets. Something like this:
(red curve is the test set)
On the y axis I put an error metric (number of misclassified examples or something like that); on the x axis 'mtry' or alternatively the training set size.
Questions:
Does caret have the functionality to iteratively train models on training subsets of different sizes? If I have to code it by hand, how can I do that?
If I want to put the hyperparameter on the x axis, I need all of the models trained by caret::train, not just the final model (the one with the maximum performance obtained after CV). Are these "discarded" models still available after train?
caret will iteratively fit and cross-validate lots of models for you if you set up the resampling with the trainControl() function and the candidate parameter values (e.g. mtry) with a tuneGrid. Both of these are then passed as control options to the train() function. The specifics of the tuneGrid parameters (e.g. mtry, ntree) will be different for each model type.
Yes, the final trainFit model will contain the error rate (however you specified it) for all folds of your CV.
So you could specify, for example, a 10-fold CV times a grid with 10 values of mtry, which would be 100 iterations. You might want to go get a cup of tea, or possibly lunch.
If this sounds complicated ... there is a very good example here; caret is one of the best-documented packages around.
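A minimal sketch of that setup, using the built-in iris data purely for illustration (iris only allows mtry values 1 to 4, but the pattern is the same):
library(caret)
set.seed(42)
ctrl <- trainControl(method = "cv", number = 10)   # 10-fold CV
grid <- expand.grid(mtry = 1:4)                    # candidate values of mtry
rfFit <- train(Species ~ ., data = iris,
               method = "rf",
               trControl = ctrl,
               tuneGrid = grid)
rfFit$results    # resampled performance for every mtry, not just the chosen one
rfFit$resample   # per-fold results for the selected model
rfFit$results keeps the cross-validated performance of every candidate value, which is what you need for a plot with the hyperparameter on the x axis.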
Here's my code for how I approached this issue of plotting a learning curve in R while using the caret package to train the model. I use the Motor Trend Car Road Tests data (mtcars) in R for illustrative purposes. To begin, I randomize and split the mtcars dataset into training and test sets: 21 records for training and 11 records for the test set. The response feature is mpg in this example.
# load caret for createDataPartition(), train(), and postResample()
library(caret)
# set seed for reproducibility
set.seed(7)
# randomize mtcars
mtcars <- mtcars[sample(nrow(mtcars)), ]
# split the mtcars data into training and test sets
mtcarsIndex <- createDataPartition(mtcars$mpg, p = .625, list = F)
mtcarsTrain <- mtcars[mtcarsIndex, ]
mtcarsTest <- mtcars[-mtcarsIndex, ]
# create empty data frame to hold the learning-curve points
learnCurve <- data.frame(m = integer(21),
                         trainRMSE = integer(21),
                         cvRMSE = integer(21))
# test data response feature
testY <- mtcarsTest$mpg
# Run algorithms using 10-fold cross validation with 3 repeats
trainControl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
metric <- "RMSE"
# loop over training set sizes
for (i in 3:21) {
    learnCurve$m[i] <- i
    # train the learning algorithm on the first i training rows
    fit.lm <- train(mpg ~ ., data = mtcarsTrain[1:i, ], method = "lm", metric = metric,
                    preProc = c("center", "scale"), trControl = trainControl)
    learnCurve$trainRMSE[i] <- fit.lm$results$RMSE
    # use the trained model to predict on the test data
    prediction <- predict(fit.lm, newdata = mtcarsTest[, -1])
    rmse <- postResample(prediction, testY)
    learnCurve$cvRMSE[i] <- rmse[1]
}
pdf("LinearRegressionLearningCurve.pdf", width = 7, height = 7, pointsize=12)
# plot learning curves of training set size vs. error measure
# for training set and test set
plot(log(learnCurve$trainRMSE),type = "o",col = "red", xlab = "Training set size",
ylab = "Error (RMSE)", main = "Linear Model Learning Curve")
lines(log(learnCurve$cvRMSE), type = "o", col = "blue")
legend('topright', c("Train error", "Test error"), lty = c(1,1), lwd = c(2.5, 2.5),
col = c("red", "blue"))
dev.off()
The output plot is as shown below:
At some point, probably after this question was asked, the caret package added the learning_curve_dat function which helps assess model performance across a range of training set sizes.
Here is the example from the function documentation:
library(caret)
set.seed(1412)
class_dat <- twoClassSim(1000)
set.seed(29510)
lda_data <- learning_curve_dat(dat = class_dat,
                               outcome = "Class",
                               test_prop = 1/4,
                               ## `train` arguments:
                               method = "lda",
                               metric = "ROC",
                               trControl = trainControl(classProbs = TRUE,
                                                        summaryFunction = twoClassSummary))
ggplot(lda_data, aes(x = Training_Size, y = ROC, color = Data)) +
  geom_smooth(method = loess, span = .8)
The performance metric(s) are found for each Training_Size and saved in lda_data along with the Data variable ("Resampling", "Training", and optionally "Testing").
Here is a link to the function documentation: https://rdrr.io/cran/caret/man/learning_curve_dat.html
To be clear, this answers the first part of the question but not the second part.
NOTE: until at least August 2020 there was a typo in the caret package code and documentation. The function was called learing_curve_dat before it was corrected to learning_curve_dat. I've updated my answer to reflect this change. Make sure you are using a recent version of the caret package.
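If you are unsure which name your installed version exports, a quick check in base R is:
packageVersion("caret")
exists("learning_curve_dat", envir = asNamespace("caret"))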
I am designing a neural network model that predicts estimates of the van Genuchten water retention parameters (theta_r, theta_s, alpha, n) from limited to more extended input data such as texture, bulk density, and one or two water retention points. While investigating neural networks in R I found the RSNNS package, and I created and trained multiple multi-layer perceptrons (MLPs), tuning the number of hidden units and the learning rate. The general performance, characterized by training and testing RMSEs, is really poor and random. In fact, I used log-transformed values of the alpha and n parameters to avoid bias and account for their approximately lognormal distributions, but this does not help much :(. I was recommended to work with the nnet and caret packages, but I've had trouble adapting the code and I don't know what I'm doing wrong. Any suggestions?
#input dataset
basic <- read.table(url("https://dl.dropboxusercontent.com/s/m8qe4k5swz1m3ij/basic.txt?dl=1&token_hash=AAH6Z3d6fWTLoQZYi04Ys72sdufdERE5gm4v7eF0cgMlkQ"), header=T, sep=" ")
#output dataset
fitted <- read.table(url("https://dl.dropboxusercontent.com/s/rjx745ej80osbbu/fitted.txt?dl=1&token_hash=AAHP1zcPQyw4uSe8rw8swVm3Buqe3TP7I1j-4_SOeeUTvw"), header=T, sep=" ")
# Use log-transformed values of alpha and n output parameters
fitted$alpha <- log(fitted$alpha)
fitted$n <- log(fitted$n)
#Fit model with caret package
library(caret)
model <- train(x = basic, y = fitted, method = 'nnet', linout = TRUE, trace = FALSE,
               # Grid of tuning parameters to try:
               tuneGrid = expand.grid(.size = c(1, 5, 10), .decay = c(0, 0.001, 0.1)))
caret is just a wrapper around the algorithms it is calling, so you can specify any parameter of the underlying algorithm even if it is not an option in caret's tuning grid. This is accomplished via the "..." in caret's train() function, which basically means you can pass any extra parameters on to the method you are calling. I'm not sure what parameters you want to adjust in your nnet call (and I'm getting errors accessing your Dropbox data), so here is a trivial example passing in specific values for maxit and Hess:
> library(caret)
> m1 <- train(Species~.,data=iris, method='nnet', linout=TRUE, trace = FALSE,trControl=trainControl("cv"))
> # this time pass in values for maxit and Hess
> m2 <- train(Species~.,data=iris, method='nnet', linout=TRUE, trace = FALSE,trControl=trainControl("cv"),maxit=10,Hess=T)
> m1$finalModel$call
nnet.formula(formula = modFormula, data = data, size = tuneValue$.size,
decay = tuneValue$.decay, linout = TRUE, trace = FALSE)
> m2$finalModel$call
nnet.formula(formula = modFormula, data = data, size = tuneValue$.size,
decay = tuneValue$.decay, linout = TRUE, trace = FALSE, maxit = 10,
Hess = ..4)
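Applied to your code, the same pass-through lets you hand extra nnet arguments such as maxit alongside the tuning grid. One caveat (an assumption about your setup, since I could not load the Dropbox files): train() expects a single outcome vector, so each van Genuchten parameter would be modelled separately. A sketch, assuming the fitted data frame has a theta_r column:
library(caret)
set.seed(100)
# fit one output (theta_r) at a time, passing maxit through '...'
fit_theta_r <- train(x = basic, y = fitted$theta_r,
                     method = 'nnet', linout = TRUE, trace = FALSE,
                     maxit = 500,                       # extra nnet argument via '...'
                     trControl = trainControl("cv"),
                     # .size/.decay follow the older caret syntax used in the question;
                     # newer caret versions drop the leading dots (size, decay)
                     tuneGrid = expand.grid(.size = c(1, 5, 10),
                                            .decay = c(0, 0.001, 0.1)))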