Hyperparameter tuning for neural net (nnet) in caret in R

I am constructing a neural net model in R using the caret package, and my code is as follows:
model <- train(RS_LAI ~ S2REP_LF + PSRI_ES + IRECI + NDVIRE +
                 B11_ES + B7 + TCARI_LF + MCARI + WDRVI,
               data = Data,
               method = "nnet",
               trControl = controlparameters,
               linout = TRUE)
When the model finishes running, the result I get is the final value of size and decay. Here I suppose size is the number of hidden layers, but I am confused: what is the number of nodes it is using in each layer? How can I get that? I think the number of nodes is also an important parameter to tune, but caret doesn't give that option.

You are using nnet. If you read its help page:
Fit single-hidden-layer neural network, possibly with skip-layer connections.
So it is one hidden layer, and the size parameter is the number of nodes (units) in that layer, as you can see from the same help page:
size: number of units in the hidden layer. Can be zero if there are skip-layer units.
You can try neuralnet instead; it supports up to three hidden layers, and your hyperparameters would be the number of nodes in each layer.
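A sketch of that route through caret, again assuming the question's Data and controlparameters objects: caret's "neuralnet" method exposes one node-count tuning parameter per layer (layer1, layer2, layer3; in its default grid, unused layers are set to 0), and as far as I know this method handles regression only, which matches your linout = TRUE setup.
grid <- expand.grid(layer1 = c(4, 8),  # nodes in hidden layer 1
                    layer2 = c(0, 4),  # 0 drops the layer
                    layer3 = 0)
model2 <- train(RS_LAI ~ S2REP_LF + PSRI_ES + IRECI + NDVIRE +
                  B11_ES + B7 + TCARI_LF + MCARI + WDRVI,
                data = Data,
                method = "neuralnet",
                trControl = controlparameters,
                tuneGrid = grid)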

Related

R caret "besttune" for CV & repeatedCV

I"m trying to understand how caret is coming to the decision it's making on the best-tuned model. I have looked through the documentation and I have not found (which could easily be my fault) a place to adjust how this decision is made. I'm using something similar to :
train(
  y ~ .,
  data = X,
  num.trees = 1000,
  method = "ranger",
  trControl = trainControl(
    method = "repeatedcv",
    number = 100,
    repeats = 100,
    verboseIter = TRUE
  )
)
I'm trying to use caret more often, and I'm sure there is a smart way it makes the decision; I'm just trying to understand how, and whether I can adjust it.
There is a lot of documentation, but the best place to look for your question is here.
Basically, for grid search, multiple combinations of tuning parameters are evaluated using resampling. Each combination gets an associated resampling estimate of performance (let's say it is accuracy).
train() knows that accuracy should be maximized, so by default it picks the parameter combination with the largest value and fits one final model using those values and the entire training set.
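If you want to adjust that decision, trainControl() accepts a selectionFunction argument: besides the default "best", caret ships "oneSE" (the simplest model within one standard error of the best) and "tolerance". A sketch, with illustrative resampling settings:
library(caret)
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     repeats = 5,
                     selectionFunction = "oneSE")  # prefer simpler models within 1 SE of the best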

GBM model : why is the validation accuracy fluctuating on grid search in h2o R

I'm using the H2O package in R and I'm trying to improve my score with a GBM model. I tried a grid search using a training set and a validation set.
But when it finished, the log-loss curves for the two sets were very different. There is overfitting on my training set, so its accuracy is higher than on my validation set.
Here are my GBM's parameters on H2O:
ntrees = 100,
max_depth = 3,
learn_rate = 0.01,
nfolds = 5,
seed = 1234
Could you suggest some ways to resolve this problem?
For help on tuning an H2O GBM in R I would recommend reviewing this tuning guide: https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/tutorials/gbm/gbmTuning.Rmd.
There are a lot of reasons you could be seeing overfitting: the predictors you use, the features you engineer, the way you split up your data, and finally the way you tune your model.
Without seeing your specific dataset and the code you ran, it is hard to give an exact reason for why you are having issues with overfitting.
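That said, a few levers in h2o.gbm() commonly help against overfitting, in particular early stopping and row/column subsampling. A sketch, where train_h2o, predictors, and response are placeholders for your own frame and column names:
library(h2o)
h2o.init()
gbm <- h2o.gbm(x = predictors, y = response,
               training_frame = train_h2o,
               ntrees = 1000,               # high cap; early stopping picks the effective count
               learn_rate = 0.01,
               max_depth = 3,
               sample_rate = 0.8,           # row subsampling
               col_sample_rate = 0.8,       # column subsampling
               stopping_rounds = 5,         # stop once the metric stops improving
               stopping_metric = "logloss",
               stopping_tolerance = 1e-4,
               nfolds = 5,
               seed = 1234)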

What is the proper way to use glmnet with caret?

I was reading the glmnet documentation and I found this:
Note also that the results of cv.glmnet are random, since the folds are selected at random. Users can reduce this randomness by running cv.glmnet many times, and averaging the error curves.
The following code uses caret with repeated CV.
library(caret)
ctrl <- trainControl(method = "repeatedcv",
                     repeats = 10,
                     verboseIter = TRUE,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)
fit <- train(x, y, method = "glmnet", metric = "ROC", trControl = ctrl)
Is that the best way to run glmnet with cross-validation through caret, or is it better to run glmnet directly?
You need to define what "best" means here. Do you want to use:
A regularized regression alone on a dataset for feature selection? In that case, use glmnet; Max Kuhn has implied that you may be better off using models with built-in CV features, as they will have been optimized for both predictor selection and minimizing error. See below.
"In many cases, using these models with built-in feature selection will be more efficient than algorithms where the search routine for
the right predictors is external to the model. Built-in feature
selection typically couples the predictor search algorithm with the
parameter estimation and are usually optimized with a single
objective function (e.g. error rates or likelihood)." (Kuhn, caret
package documentation: caret feature selection overview)
Or are you comparing different models, one of which is glmnet? In which case, caret may be a great choice.
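If you do run glmnet directly, the averaging the documentation recommends is easy to do by hand. A sketch, assuming x is a predictor matrix and y a binary outcome, with a fixed lambda sequence so the error curves line up across runs:
library(glmnet)
lambdas <- glmnet(x, y, family = "binomial")$lambda  # shared lambda path
cvms <- replicate(10, cv.glmnet(x, y, family = "binomial", lambda = lambdas)$cvm)
avg_cvm <- rowMeans(cvms)                   # averaged error curve
best_lambda <- lambdas[which.min(avg_cvm)]  # deviance is minimized by default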

How can you reduce the default ntree=500 parameter passed to RF from caret?

I believe the "rf" (randomForest) method in caret sets the default number of trees at 500. Unfortunately, this causes the time complexity to grow out of control for larger datasets. Is there any quick way to reduce the number of trees without creating a custom method? I know that the only tuneable parameter for rf is mtry.
Just to clarify: I'm not looking to tune the number of trees. I simply want to fix it to a lower value so that I can run rf in a reasonable amount of time.
You can specify the ntree parameter when you call train, like so (train passes extra arguments through to the underlying randomForest call):
rf <- train(X, y, method = "rf",
            preProcess = c("center", "scale"),
            ntree = 100,
            trControl = fitControl)
One suggestion would be to use the randomForest library directly. I have always found it simpler to use than going through caret, and it has a parameter to set the number of trees.
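For reference, a minimal sketch of that direct route, with X and y as in the caret call above:
library(randomForest)
rf_direct <- randomForest(x = X, y = y, ntree = 100)  # ntree set explicitly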

Recursive feature elimination in 'caret' for 'randomForest': set different ntree parameter for the first forest

I am currently trying to optimize a random forest classifier for a very high-dimensional dataset (p > 200k) using recursive feature elimination (RFE). The caret package has a nice implementation for doing this (the rfe() function). However, I am also thinking about optimizing RAM and CPU usage. That's why I wonder whether there is a way to set a different (larger) number of trees to train the first forest (without feature elimination) and then use its importances to build the remaining ones (with RFE), using for example 500 trees with 10- or 5-fold cross-validation. I know that this option is available in varSelRF, but how about caret? I didn't manage to find anything regarding this in the manual.
You can do that. The rfFuncs list has an element called fit that defines how the model is fit. One argument to this function, first, is TRUE on the first fit (there is also a last argument). You can set ntree based on this.
See the feature selection vignette for more details.
Max
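A minimal sketch of that approach; the tree counts and resampling settings below are illustrative, and importances are computed only on the first (full) fit:
library(caret)
library(randomForest)
customFuncs <- rfFuncs
customFuncs$fit <- function(x, y, first, last, ...) {
  ntree <- if (first) 2000 else 500  # larger forest only for the initial ranking fit
  randomForest(x, y, ntree = ntree, importance = first, ...)
}
ctrl <- rfeControl(functions = customFuncs, method = "cv", number = 5)
# fit <- rfe(x, y, sizes = c(100, 1000, 10000), rfeControl = ctrl)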
