Set the number of trees in R ~ caret package

I am currently wondering how to set 10 trees when using the random forest algorithm from the caret package, and would appreciate some assistance.
Below is my syntax:
tr <- trainControl(method = "repeatedcv", number = 20)
fit <- train(y ~ ., method = "rf", data = example, trControl = tr)
Following some research on http://www.inside-r.org/packages/cran/randomForest/docs/randomForest,
setting ntree = 10 as an argument in randomForest() (or n.trees when using gbm) would have helped, but I am interested in doing this through the caret package.
Any feedback would be much appreciated.
Thanks

Caret's train() uses the randomForest() function when you specify method = "rf" in the train call.
You simply need to pass ntree = 10 to train(), and it will be passed on to randomForest().
Therefore, your call would look like this:
fit <- train(y ~ ., method = "rf", data = example, trControl = tr, ntree = 10)
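If you want to double-check afterwards that the value reached randomForest(), the fitted object exposes it (a quick sketch, assuming the fit object above):
fit$finalModel$ntree   # for method = "rf" this should print 10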

For anyone in my position who landed here while using the ranger flavour of random forest (Google still directed me here when I added "ranger" to my search term): the argument is num.trees.
num.trees = 20
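For example (my own sketch, not from the original answer, reusing tr and example from the question), the argument is again passed through train()'s ... straight to ranger():
fit <- train(y ~ ., data = example, method = "ranger",
             trControl = tr, num.trees = 20)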

I think ntree is the parameter you are looking for.

Related

How to obtain sensitivity when applying cross validation? [duplicate]

This question already has an answer here: Optimising caret for sensitivity still seems to optimise for ROC (1 answer).
Let's consider the data
set.seed(20)
y <- sample(0:1, 100, replace = TRUE)
x <- data.frame(rnorm(100), rexp(100))
I want to perform cross validation and output sensitivity and specificity. I found out that I can pass an additional 'metric' argument to the train function to specify which metric I want. So:
# train the model on the training set
library(caret)
cross <- train(as.factor(y) ~ .,
               data = cbind(y, x),
               metric = 'Sensitivity',
               trControl = trainControl(method = "cv", number = 5),
               method = "glm",
               family = binomial())
However, I see the problem:
The metric "Sensitivity" was not in the result set. Accuracy will be used instead.
Is there any solution so that sensitivity and specificity can be used in cross validation?
You can subset the output of confusionMatrix() with $ or [] and this will probably give you what you need.
You can also use functions like negPredValue() to get sensitivity and specificity.
The 'Sensitivity' metric does not exist for train() in the caret package.
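As a rough sketch of the subsetting idea above (my own addition, reusing the cross, x and y objects from the question; these are in-sample predictions, purely to illustrate the subsetting):
preds <- predict(cross, newdata = cbind(y, x))
cm <- confusionMatrix(preds, as.factor(y))
cm$byClass[c("Sensitivity", "Specificity")]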
Since you are using caret, you can find some of the answer in the documentation of this package. It states that the metric parameter is ...
a string that specifies what summary metric will be used to select the
optimal model. By default, possible values are "RMSE" and "Rsquared"
for regression and "Accuracy" and "Kappa" for classification. If
custom performance metrics are used (via the summaryFunction argument
in trainControl, the value of metric should match one of the
arguments. If it does not, a warning is issued and the first metric
given by the summaryFunction is used. (NOTE: If given, this argument
must be named.)
So by default, a 'Sensitivity' metric does not exist, but you can define such a metric yourself. One approach is to use the trainControl function to pass a custom summary function that calculates sensitivity. See Optimising caret for sensitivity still seems to optimise for ROC for an example.
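For instance, instead of writing a custom summary function, a minimal sketch using caret's built-in twoClassSummary (which reports ROC, Sens and Spec) could look like this; note that classProbs = TRUE requires factor levels that are valid R names, so the 0/1 outcome is relabelled:
library(caret)
set.seed(20)
y <- factor(sample(0:1, 100, replace = TRUE), labels = c("no", "yes"))
x <- data.frame(x1 = rnorm(100), x2 = rexp(100))
cross <- train(x, y,
               method = "glm",
               family = binomial(),
               metric = "Sens",
               trControl = trainControl(method = "cv", number = 5,
                                        classProbs = TRUE,
                                        summaryFunction = twoClassSummary))
cross$results   # contains ROC, Sens and Spec for the resamples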

Does the rfeControl function in caret create stratified folds?

I want to do feature selection for my random forest model following the rfe approach of the caret package. As my data set contains only about 100 labeled samples and is highly unbalanced (which reflects the real-life balance), I need/want to do stratified cross validation. However, I did not find any documentation for the rfeControl function regarding stratified cross validation.
Does anybody know whether the rfeControl function creates stratified folds if I use
ctrl <- rfeControl(functions = rfFuncs,
                   method = "cv",
                   verbose = FALSE)
With method = "cv", rfe() should use createFolds() to create your folds, and these will be balanced based on your outcome variable.
See ?createFolds for details on how this is implemented.
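If you want to see the stratification for yourself, a quick check might look like this (my own sketch, not part of the original answer):
library(caret)
y <- factor(c(rep("rare", 10), rep("common", 90)))
folds <- createFolds(y, k = 5)              # list of held-out indices per fold
sapply(folds, function(idx) table(y[idx]))  # each fold keeps roughly the 1:9 class ratio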

How to use size and decay in nnet

I am quite new to the neural network world, so I ask for your understanding. I am running some tests and have a question about the parameters size and decay. I use the caret package and the nnet method. Example dataset:
require(mlbench)
require(caret)
require(nnet)

data(Sonar)
mydata <- Sonar[, 1:12]
set.seed(54878)
ctrl <- trainControl(method = "cv", number = 10, returnResamp = "all")
for_train <- createDataPartition(mydata$V12, p = .70, list = FALSE)
my_train <- mydata[for_train, ]
my_test <- mydata[-for_train, ]

t.grid <- expand.grid(size = 5, decay = 0.2)
mymodel <- train(V12 ~ ., data = my_train, method = "nnet", metric = "Rsquared",
                 trControl = ctrl, tuneGrid = t.grid)
So, I have two questions. First, is this the best way to use the nnet method with caret? Second, I have read about size and decay (e.g. Purpose of decay parameter in nnet function in R?) but I cannot understand how to use them in practice here. Can anyone help?
Brief Caret explanation
The Caret package lets you train different models and tune hyper-parameters using cross validation (hold-out or k-fold) or bootstrap.
There are two different ways to tune the hyper-parameters using Caret: grid search and random search. If you use grid search (brute force) you need to define the grid for every parameter according to your prior knowledge, or you can fix some parameters and iterate over the remaining ones. If you use random search you need to specify a tuning length (maximum number of iterations) and Caret is going to use random values for the hyper-parameters until the stopping criterion holds.
No matter which method you choose, Caret is going to use each combination of hyper-parameters to train the model and compute performance metrics as follows:
Split the initial training samples into two different sets, training and validation (for bootstrap or hold-out cross validation), or into k sets (for k-fold cross validation).
Train the model using the training set and predict on the validation set (for hold-out cross validation and bootstrap), or train on k-1 of the sets and predict on the k-th one (for k-fold cross validation).
On the validation set Caret computes performance metrics such as ROC, Accuracy, ...
Once the grid search has finished or the tune length is exhausted, Caret uses the performance metrics to select the best model according to the criteria previously defined (you can use ROC, Accuracy, Sensitivity, Rsquared, RMSE, ...).
You can create some plots to understand the resampling profile and to pick the best model (keep in mind performance and complexity).
If you need more information about Caret you can check the Caret web page.
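For example, a random search could be requested roughly like this (a sketch under my own assumptions, reusing my_train and the numeric outcome V12 from the question; search = "random" needs a reasonably recent caret version):
ctrl_rs <- trainControl(method = "cv", number = 10, search = "random")
fit_rs <- train(V12 ~ ., data = my_train, method = "nnet",
                trControl = ctrl_rs,
                tuneLength = 20,   # evaluate 20 random size/decay combinations
                linout = TRUE,     # linear output unit, since V12 is numeric
                trace = FALSE)     # silence nnet's iteration log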
Neural Network Training Process using Caret
When you train a neural network (nnet) using Caret you need to specify two hyper-parameters: size and decay. Size is the number of units in the hidden layer (nnet fits a single-hidden-layer neural network) and decay is the regularization parameter that helps avoid over-fitting. Keep in mind that the names of the hyper-parameters can differ from one R package to another.
An example of training a Neural Network using Caret for classification:
fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 5,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

nnetGrid <- expand.grid(size = seq(from = 1, to = 10, by = 1),
                        decay = seq(from = 0.1, to = 0.5, by = 0.1))

nnetFit <- train(Label ~ .,
                 data = Training,
                 method = "nnet",
                 metric = "ROC",
                 trControl = fitControl,
                 tuneGrid = nnetGrid,
                 verbose = FALSE)
Finally, you can make some plots to understand the resampling results. The following plot was generated from a GBM training process:
[Plot: GBM training process using Caret]
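For the nnet fit above, the analogous inspection would be along these lines (assuming the nnetFit object exists):
plot(nnetFit)      # resampling profile: ROC across the size/decay grid
nnetFit$bestTune   # the size and decay that were finally selected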

Is there a way to try all feature subsets using neural networks (caret)?

I'm working with caret and the avNNet method. I would like to try all subsets of variables while doing cross validation, so I can determine the best predictors and parameters (a brute-force approach).
I have used stepAIC with glm; is there something similar?
In the caret manual you will find the "pcaNNet" method, which is Neural Networks with Feature Extraction.
An example using it:
library(caret)

# define training control
train_control <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                              classProbs = TRUE)

# train the model
model <- train(Status ~ ., data = My_data, trControl = train_control,
               method = "pcaNNet", metric = "Kappa")

# summarize results
print(model)

# confusion matrix (equivalently confusionMatrix(model); %>% needs magrittr/dplyr loaded)
model %>% confusionMatrix()

Using caret for survival analysis (random survival forest)

Is there a way to use caret for survival analysis? I really like how easy it is to use. I tried fitting a random survival forest using the party package, which is on caret's list of models.
This works:
library(survival)
library(caret)
library(party)

fitcforest <- cforest(Surv(futime, death) ~ sex + age, data = flchain,
                      controls = cforest_classical(ntree = 1000))
but using caret I get an error:
fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           repeats = 2)

cforestfit <- train(Surv(futime, death) ~ sex + age, data = flchain,
                    method = "cforest", trControl = fitControl)
I get this error:
Error: nrow(x) == length(y) is not TRUE
Is there a way to make these Surv objects work with caret?
Can I use other survival-analysis-oriented packages with caret?
Thanks
Not yet. That is one of two major updates that should be coming soon (the other expands pre-processing).
Contact me offline if you are interested in helping the development and/or testing of those features.
Thanks,
Max
I have found no way to train survival models with caret. As an alternative, the mlr framework (1) has a set of survival learners (2). I have found mlr to be extremely user-friendly and useful.
(1) mlr: http://mlr-org.github.io/mlr-tutorial/release/html/
(2) survival learners in mlr: http://mlr-org.github.io/mlr-tutorial/release/html/integrated_learners/index.html#survival-analysis-15
There is an increasing number of packages in R that model survival data; for example:
For lasso and elastic nets: BioSpear.
For random forests: randomForestSRC (see the sketch below).
Best, Loic
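For instance, a random survival forest fitted outside caret with randomForestSRC (mentioned above) might look roughly like this (my own sketch, reusing the flchain data from the question):
library(survival)
library(randomForestSRC)
data(flchain, package = "survival")
fitrsf <- rfsrc(Surv(futime, death) ~ sex + age, data = flchain, ntree = 1000)
print(fitrsf)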
