caret::train: specify further non-tuning parameters for mlpWeightDecay (RSNNS package)

I have a problem specifying the learning rate when using the caret package with the method "mlpWeightDecay" from the RSNNS package.
The tuning parameters of "mlpWeightDecay" are size and decay.
An example leaving size constant at 4 and tuning decay over c(0,0.0001, 0.001, 0.002):
data(iris)
TrainData <- iris[, 1:4]
TrainClasses <- iris[, 5]

fit1 <- train(TrainData, TrainClasses,
              method = "mlpWeightDecay",
              preProcess = c("center", "scale"),
              tuneGrid = expand.grid(.size = 4, .decay = c(0, 0.0001, 0.001, 0.002)),
              trControl = trainControl(method = "cv"))
But I also want to manipulate the learning rate of the model rather than just using the default learning rate of 0.2.
I know that I can pass further arguments of the mlpWeightDecay method from RSNNS via the "..." parameter of train.
"learnFuncParams" is the RSNNS argument I would need to set. It takes four values (learning rate, weight decay, dmin, dmax).
Going on with the example it looks like this:
fit1 <- train(TrainData, TrainClasses,
              method = "mlpWeightDecay",
              preProcess = c("center", "scale"),
              tuneGrid = expand.grid(.size = 4, .decay = c(0, 0.0001, 0.001, 0.002)),
              trControl = trainControl(method = "cv"),
              learnFuncParams = c(0.4, 0, 0, 0))
BUT the documentation of the caret train function tells me for the "..." parameter:
arguments passed to the classification or regression routine (such as randomForest). Errors will occur if values for tuning parameters are passed here.
The problem is that one of the four "learnFuncParams" values (weight decay) IS a tuning parameter.
Consequently I get an error and warnings:
Error in train.default(TrainData, TrainClasses, method = "mlpWeightDecay",  :
  final tuning parameters could not be determined
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Warning messages:
1: In method$fit(x = if (!is.data.frame(x)) as.data.frame(x) else x, ... :
  Over-riding weight decay value in the 'learnFuncParams' argument you passed in. Other values are retained
2: In eval(expr, envir, enclos) :
  model fit failed for Fold01: size=4, decay=0e+00 Error in mlp.default(x = structure(list(Sepal.Length = c(-0.891390168709482,  :
  formal argument "learnFuncParams" matched by multiple actual arguments
How can I set the learning rate without conflicting with the tuning parameter "decay", given that both are set via the same argument "learnFuncParams"?
Thanks!

It looks like you can specify your own learnFuncParams in "...". caret checks whether you have provided your own set of parameters and will only override learnFuncParams[3] (which is the decay); the values you provide for learnFuncParams[1], [2], and [4] are retained.
A very convenient way to find out what caret does is to type getModelInfo("mlpWeightDecay") and then scroll up to the $mlpWeightDecay$fit part. It shows how caret will call the real training function:
$mlpWeightDecay$fit
if (any(names(theDots) == "learnFuncParams")) {
    prms <- theDots$learnFuncParams
    prms[3] <- param$decay
    warning("Over-riding weight decay value in the 'learnFuncParams' argument you passed in. Other values are retained")
}
It checks if you've provided your own learnFuncParams. If you did, it uses it, but inserts its own decay. You can ignore the warning.
I think the error you got ("final tuning parameters could not be determined") has another cause. Have you tried a lower learning rate?
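A minimal sketch of that suggestion, reusing the iris example above (the 0.1 learning rate is just an illustrative value): keep passing learnFuncParams through "..." and let caret overwrite the decay slot with each value from the tuning grid:
fit1 <- train(TrainData, TrainClasses,
              method = "mlpWeightDecay",
              preProcess = c("center", "scale"),
              tuneGrid = expand.grid(.size = 4, .decay = c(0, 0.0001, 0.001, 0.002)),
              trControl = trainControl(method = "cv"),
              learnFuncParams = c(0.1, 0, 0, 0))  # element 1 = learning rate; the decay slot is overwritten by caret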

Related

Errors with dredge() function in MuMIn

I'm trying to use the dredge() function to evaluate models by completing every combination of variables (up to five variables per model) and comparing models using AIC corrected for small sample size (AICc).
However, I'm presented with one error and two warning messages as follows:
Fixed term is "(Intercept)"
Warning messages:
1: In dredge(MaxN.model, m.min = 2, m.max = 5) :
  comparing models fitted by REML
2: In dredge(MaxN.model, m.min = 2, m.max = 5) :
  arguments 'm.min' and 'm.max' are deprecated, use 'm.lim' instead
I've tried changing to 'm.lim' as specified but it comes up with the error:
Error in dredge(MaxN.model, m.lim = 5) : invalid 'm.lim' value
In addition: Warning message:
In dredge(MaxN.model, m.lim = 5) : comparing models fitted by REML
The code I'm using is:
MaxN.model <- lme(T_MaxN ~ Seagrass.cover + composition.pca1 + composition.pca2 +
                    Sg.Richness + traits.pca1 + land.use.pc1 + land.use.pc2 +
                    seascape.pc2 + D.landing.site + T_Depth,
                  random = ~1|site, data = sgdf, na.action = na.fail, method = "REML")
Dd_MaxN <- dredge(MaxN.model, m.min = 2, m.max = 5)
What am I doing wrong?
Your error message shows that you passed m.lim = 5, a single number, but ?dredge says:
m.lim ...optionally, the limits ‘c(lower, upper)’ for number of terms in a single model
so you should specify a two-element numeric (integer) vector.
You should definitely be using method="ML" rather than method="REML". The warning/error about REML is very serious; comparing models with different fixed effects that are fitted via REML will lead to nonsense.
So you should try:
MaxN.model <- lme(..., method = "ML")  ## where ... is the rest of your fit
Dd_MaxN <- dredge(MaxN.model, m.lim = c(2, 5))
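Putting both fixes together with the model from the question, the refit would look something like this (a sketch, untested against the actual sgdf data):
MaxN.model <- lme(T_MaxN ~ Seagrass.cover + composition.pca1 + composition.pca2 +
                    Sg.Richness + traits.pca1 + land.use.pc1 + land.use.pc2 +
                    seascape.pc2 + D.landing.site + T_Depth,
                  random = ~1|site, data = sgdf, na.action = na.fail, method = "ML")
Dd_MaxN <- dredge(MaxN.model, m.lim = c(2, 5))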

How to retrieve elastic net coefficients?

I am using the caret package to train an elastic net model on my dataset modDat. I take a grid search approach paired with repeated cross-validation to select the optimal values of the lambda and fraction parameters required by the elastic net function. My code is shown below.
library(caret)
library(elasticnet)
grid <- expand.grid(
  lambda = seq(0.5, 0.7, by = 0.1),
  fraction = seq(0, 1, by = 0.1)
)

ctrl <- trainControl(
  method = 'repeatedcv',
  number = 5,   # folds
  repeats = 10, # repeats
  classProbs = FALSE
)

set.seed(1)
enetTune <- train(
  y ~ .,
  data = modDat,
  method = 'enet',
  metric = 'RMSE',
  tuneGrid = grid,
  verbose = FALSE,
  trControl = ctrl
)
I can get predictions using y_hat <- predict(enetTune, modDat), but I cannot view the coefficients underlying the predictions.
I have tried coef(enetTune$finalModel), but the only thing returned is NULL. I suspect that I have to give the coef() function more information, but I am not sure how.
In addition, I would like to produce a box plot of the 50 sets of coefficients (10 repeats of 5 folds) associated with the optimal lambda and fraction parameters.
To see the coefficients, use predict:
predict(enetTune$finalModel, type = "coefficients")
See ?predict.enet for more information on how to get specific coefficients.
Following on from the answer by @Weihuang Wong, you can get the coefficients from the final model using the following code:
predict.enet(enetTune$finalModel, s = enetTune$bestTune[1, "fraction"],
             type = "coef", mode = "fraction")$coefficients
To me what works best is stats::predict, as in @Weihuang Wong's answer. However, as the OP pointed out in a comment, that provides a list of coefficients for every value of the penalty tested.
The important thing to understand here is that when you are using predict, your intention is precisely to predict the value of the parameters, not really to retrieve them. You should be aware of that and explore the options available.
In this case, you can use the same function with the argument s for the penalty parameter lambda. Remember that you are still predicting, but this time you will get the coefficients you are looking for.
stats::predict(enetTune$finalModel, type = "coefficients", s = enetTune$bestTune$lambda)
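For the box-plot part of the question: caret does not store per-resample coefficients, so one approach (a sketch, assuming all predictors in modDat are numeric) is to refit enet on each of the 50 resample index sets that train stores in enetTune$control$index and collect the coefficients at the chosen lambda and fraction:
library(elasticnet)

best <- enetTune$bestTune
coef_list <- lapply(enetTune$control$index, function(idx) {
  x <- as.matrix(modDat[idx, setdiff(names(modDat), "y")])
  fit <- enet(x, modDat$y[idx], lambda = best$lambda)
  predict(fit, s = best$fraction, type = "coefficients",
          mode = "fraction")$coefficients
})
coef_mat <- do.call(rbind, coef_list)  # 50 rows, one per resample
boxplot(coef_mat, las = 2)             # one box per coefficient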

R - Decreasing memory usage of using caret to train a random forest

I am trying to create a random forest from ~100 thousand inputs. To accomplish this, I am using train from the caret package with method = "parRF". Unfortunately, my machine with 128 GB of memory still runs out. Therefore, I need to cut down on how much memory I use.
Right now, the training method I am running is:
> trControl <- trainControl(method = "LGOCV", p = 0.9, savePredictions = T)
> model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
+                      trControl = trControl)
However, because each forest is kept, the system quickly runs out of memory. If my understanding of train and randomForest is correct, each random forest stores at least about 500 * 100,000 doubles. Therefore, I would like to throw away the random forests I no longer need. I tried passing keep.forest = FALSE through to randomForest using:
> model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
+                      trControl = trControl, keep.forest = FALSE)
Error in train.default(x = data_preds, y = data_resp, method = "parRF",  :
  final tuning parameters could not be determined
In addition, this warning was thrown repeatedly:
In eval(expr, envir, enclos) :
  predictions failed for Resample01: mtry=2 Error in predict.randomForest(modelFit, newdata) :
  No forest component in the object
It seems that caret requires the forests to be kept so it can predict on the hold-out data and compare models. Is there any way I can use caret with less memory?
Keep in mind that, if you use M cores, you need to store the data set M+1 times (one copy per worker plus the master's copy). Try using fewer workers.
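As a sketch of what "fewer workers" looks like in practice (parRF parallelises via whichever foreach backend is registered; doParallel is just one choice):
library(doParallel)

cl <- makeCluster(2)   # two workers instead of all cores -> fewer copies of the data
registerDoParallel(cl)

trControl <- trainControl(method = "LGOCV", p = 0.9, savePredictions = T)
model_parrf <- train(x = data_preds, y = data_resp, method = "parRF",
                     trControl = trControl)

stopCluster(cl)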

Issues with predict function when building a CART model via CrossValidation using the train command

I am trying to build a CART model via cross-validation using the train function of the "caret" package.
My data is a 4500 x 110 data frame, where all the predictor variables (except the first two, UserId and YOB (Year of Birth), which I am not using for model building) are factors with 2 levels, and the dependent variable is of type integer (although it has only two values, 1 and 0). Gender is one of the independent variables.
When I ran the rpart command to get a CART model (using the package "rpart"), I didn't have any problem with the predict function. However, I wanted to improve the model via cross-validation, so I used the train function from the package "caret" with the following command:
tr = train(y ~ ., data = subImpTrain, method = "rpart", trControl = tr.control, tuneGrid = cp.grid)
This built the model with the following warning:
Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.
But it did give me a final model (best.tree). However, when I try to run the predict function with the following command:
best.tree.pred = predict(best.tree, newdata = subImpTest)
on the test data, it is giving me the following error:
Error in eval(expr, envir, enclos) : object 'GenderMale' not found
The Gender variable has two values: Female, Male
Can anybody help me understand the error?
As @lorelai suggested, caret dummy-codes your variables if you supply it a formula. An alternative is to provide it the variables themselves, like so:
tr = train(y = subImpTrain$y, x = subImpTrain[, setdiff(names(subImpTrain), "y")],
           method = "rpart", trControl = tr.control, tuneGrid = cp.grid)
More importantly, however, you shouldn't use predict.rpart and instead use predict.train, like so:
predict(tr, subImpTest)
In which case it would work just fine with the formula interface.
I have had a similar problem in the past, although concerning another algorithm.
Basically, some algorithms transform the factor variables into dummy variables and rename them accordingly.
My solution was to create my own dummies and leave them in numerical format.
I read that decision trees manage to work properly even so.
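As a sketch of that manual-dummies approach using caret's own dummyVars (variable and object names are the ones from the question), you can build a numeric design matrix once and reuse the same encoding for the test set:
dummies <- dummyVars(y ~ ., data = subImpTrain)
trainX <- predict(dummies, newdata = subImpTrain)  # numeric dummy columns
testX  <- predict(dummies, newdata = subImpTest)   # same encoding as training
tr <- train(x = trainX, y = subImpTrain$y,
            method = "rpart", trControl = tr.control, tuneGrid = cp.grid)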

Using neuralnet with caret train and adjusting the parameters

So I've read a paper that used neural networks to model a dataset similar to one I'm currently using. I have 160 descriptor variables that I want to model for 160 cases (regression modelling). The paper used the following parameters:
'For each split, a model was developed for each of the 10 individual train-test folds. A three layer back-propagation net with 33 input neurons and 16 hidden neurons was used with online weight updates, 0.25 learning rate, and 0.9 momentum. For each fold, learning was conducted from a total of 50 different random initial weight starting points and the network was allowed to iterate through learning epochs until the mean absolute error (MAE) for the validation set reached a minimum. '
Now they used specialist software called Emergent to do this, which is a very specialised neural network package. However, as I've done my previous models in R, I want to stick with it. So I'm using the caret train function to do 10-fold cross-validation, repeated 10 times, with the neuralnet package. I did the following:
cadets.nn <- train(RT..seconds. ~ ., data = cadet, method = "neuralnet",
                   algorithm = "backprop", learningrate = 0.25, hidden = 3,
                   trControl = ctrl, linout = TRUE)
I did this to try and tune the parameters as closely as possible to the ones used in the paper, but I get the following output and error:
  layer1 layer2 layer3 RMSE Rsquared RMSESD RsquaredSD
1      1      0      0  NaN      NaN     NA         NA
2      3      0      0  NaN      NaN     NA         NA
3      5      0      0  NaN      NaN     NA         NA
Error in train.default(x, y, weights = w, ...) :
  final tuning parameters could not be determined
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Do you know what I'm doing wrong? It works when I use nnet, but with that I can't tune the parameters to match the ones used in the paper I'm trying to mimic.
This is what I get in warnings(), repeated fifty times:
1: In eval(expr, envir, enclos) :
  model fit failed for Fold01.Rep01: layer1=1, layer2=0, layer3=0
  Error in neuralnet(form, data = data, hidden = nodes, ...) :
    formal argument "hidden" matched by multiple actual arguments
2: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
3: In eval(expr, envir, enclos) :
  model fit failed for Fold01.Rep01: layer1=3, layer2=0, layer3=0
  Error in neuralnet(form, data = data, hidden = nodes, ...) :
    formal argument "hidden" matched by multiple actual arguments
4: In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded
5: In eval(expr, envir, enclos) :
  model fit failed for Fold01.Rep01: layer1=5, layer2=0, layer3=0
  Error in neuralnet(form, data = data, hidden = nodes, ...) :
    formal argument "hidden" matched by multiple actual arguments
Thanks!
train sets hidden for you (based on the values given by layer1 through layer3). You are trying to specify that argument twice, hence:
formal argument "hidden" matched by multiple actual arguments
HTH,
Max
I think for beginners it is not obvious at all that the layer specification cannot be passed directly into the train function.
One must read the documentation very carefully to understand the following passage for "...":
Errors will occur if values for tuning parameters are passed here.
So first, you must realize that the hidden parameter of neuralnet::neuralnet is defined as a tuning parameter and therefore may not be passed directly to the train function (via "..."). You can find the tuning parameter definitions with:
getModelInfo("neuralnet")$neuralnet$parameters
  parameter   class                     label
1    layer1 numeric #Hidden Units in Layer 1
2    layer2 numeric #Hidden Units in Layer 2
3    layer3 numeric #Hidden Units in Layer 3
Instead, you must pass the hidden layer definition via the tuneGrid parameter, which is not obvious at all because that argument is normally reserved for tuning parameters, not for fixing them.
So you can define the hidden layers as follows:
tune.grid.neuralnet <- expand.grid(
  layer1 = 10,
  layer2 = 10,
  layer3 = 10
)
and then pass that to the caret::train function call as:
model.neuralnet.caret <- caret::train(
  formula.nps,
  data = training.set,
  method = "neuralnet",
  linear.output = TRUE,
  tuneGrid = tune.grid.neuralnet,  # cannot pass parameter 'hidden' directly!
  metric = "RMSE",
  trControl = trainControl(method = "none", seeds = seed)
)
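Applying this to the original question in this thread, a sketch (note that neuralnet's argument is linear.output, not nnet's linout; algorithm and learningrate pass through "..." because they are not tuning parameters):
tune.grid <- expand.grid(layer1 = 16, layer2 = 0, layer3 = 0)  # 16 hidden units, one layer
cadets.nn <- train(RT..seconds. ~ ., data = cadet, method = "neuralnet",
                   algorithm = "backprop", learningrate = 0.25,
                   tuneGrid = tune.grid,  # hidden layers go here, not via hidden =
                   trControl = ctrl, linear.output = TRUE)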
