The website I am trying to run the code is using an old version of R and does not accept ranger as the library. I have to use the caret package. I am trying to process about 800,000 lines in my train data frame and here is the code I use
control <- trainControl(method = 'repeatedcv',
number = 3,
repeats = 1,
search = 'grid')
tunegrid <- expand.grid(.mtry = c(sqrt(ncol(train_1))))
fit <- train(value~.,
data = train_1,
method = 'rf',
ntree = 73,
tuneGrid = tunegrid,
trControl = control)
Looking at previous posts, I tried to tune my control parameters, is there any way I can make the model run faster? Am I able to specify a specific setting so that it just generates a model with the parameters I set, and not try multiple options?
This is my code from ranger which I optimized and currently having accurate model
fit <- ranger(value ~ .,
data = train_1,
num.trees = 73,
max.depth = 35,mtry = 7,importance='impurity',splitrule = "extratrees")
Thank you so much for your time
When you specify method='rf', caret is using the randomForest package to build the model. If you don't want to do all the cross-validation that caret is useful for, just build your model using the randomForest package directly. e.g.
library(randomForest)
fit <- randomForest(value ~ ., data=train_1)
You can specify values for ntree, mtry etc.
Note that the randomForest package is slow (or just won't work) for large datasets. If ranger is unavailable, have you tried the Rborist package?
I want to perform backward feature selection using the function fastbw from the rms package. I use a sample dataset PimaIndiansDiabetes as below:
library(mlbench)
data(PimaIndiansDiabetes)
library(caret)
trControl <- trainControl(method = "repeatedcv",
repeats = 3,
classProbs = TRUE,
number = 10,
savePredictions = TRUE,
summaryFunction = twoClassSummary)
caret_model <- train(diabetes~.,
data=PimaIndiansDiabetes,
method="glm",
trControl=trControl)
library(rms)
reduced_model <- fastbw(caret_model$finalModel)
This gives me an error:
Error in fastbw(caret_model$finalModel) : fit does not have design
information
May I know what this means and how to resolve it?
You're probably stuck. fastbw() works only with models from rms, i.e. ?fastbw says:
fit: fit object with ‘Varcov(fit)’ defined (e.g., from ‘ols’,
‘lrm’, ‘cph’, ‘psm’, ‘glmD’)
I tried your fit with method="lrm" (lrm is rms's logistic regression tool), but got
Error: Model lrm is not in caret's built-in library
I think you're going to have to find another way to do stepwise regression, e.g. see this question: i.e. using library(MASS) and then method="glmStepAIC" (within caret), or stepAIC (from scratch).
It's not obvious to me why you're training a model and then doing stepwise regression ...
I am writing a machine learning code using caret package in R. A sample of code could be
weighted_fit <- train(outcome,
data = train,
method = 'glmnet',
trControl = ctrl)
As you know, some methods in caret package have built-in feature selection such as elastic net. My question is that is there any way to deactivate the built in feature selection in this code?
Thanks in advance for any comment.
#I will try to answer this question to the best of my ability:
#The train function in caret package comes with a parameter tuneGrid which can be used to create a data-frame of tuning parameters.
#The tuning parameter of elastic net regularization in glmnet() is alpha, so create the following:
glmgrid <- expand.grid(alpha = 0) will give ridge regularization.
glmgrid <- expand.grid(alpha = 1) will give lasso regularization.
#and then use
weighted_fit <- train(outcome,
data = train,
method = 'glmnet',
trControl = ctrl,
tuneGrid = glmgrid)
#In glmnet in r , the alpha values can be in the range [0,1] i.e. 0 to 1 including 0 and 1.
# GLMNET - https://www.rdocumentation.org/packages/glmnet/versions/2.0-18/topics/glmnet
# CARET - https://topepo.github.io/caret/index.html
Not sure if this more of a statistics question but the closest similar problem I could find is here, although I couldn't get it to work for my case.
I am trying to develop a pooled, penalized logistic regression model. I used mice to create a mids object and then fit a model to each dataset using caret repeated cross-validation with elastic net regression (glmnet) to tune parameters. The fitted object is not of class "mira" but I think I fixed that by changing the object class with the right list items. The major issue is that glmnet does not have an associated vcov method, which is required by pool().
I would like to use penalized regression based on the amount of variables and uncertainty over which ones are the best predictors. My data consists of 4x numeric variables and 9x categorical variables of varying levels and I anticipate including interactions.
Does anyone know how I might be able to create my own vcov method or otherwise address this issue? I am not sure if this is possible.
Example data and code are enclosed, noting that I am not able to share the actual data.
library(mice)
library(caret)
dat <- as.data.frame(list(time=c(4,3,1,1,2,2,3,5,2,4,5,1,4,3,1,1,2,2,3,5,2,4,5,1),
status=c(1,1,1,0,2,2,0,0,NA,1,2,0,1,1,1,NA,2,2,0,0,1,NA,2,0),
x=c(0,2,1,1,NA,NA,0,1,1,2,0,1,0,2,1,1,NA,NA,0,1,1,2,0,1),
sex=c("M","M","M","M","F","F","F","F","M","F","F","M","F","M","M","M","F","F","M","F","M","F","M","F")))
imp <- mice(dat,m=5, seed=192)
control = trainControl(method = "repeatedcv",
number = 10,
repeats=3,
verboseIter = FALSE)
mod <- list(analyses=vector("list", imp$m))
for(i in 1:imp$m){
mod$analyses[[i]] <- train(sex ~ .,
data = complete(imp, i),
method = "glmnet",
family="binomial",
trControl = control,
tuneLength = 10,
metric="Kappa")
}
obj <- as.mira(mod)
obj <- list(call=mod$analyses[[1]]$call, call1=imp$call, nmis=imp$nmis, analyses=mod$analyses)
oldClass(obj) <- "mira"
pool(obj)
Produces:
Error in pool(obj) : Object has no vcov() method.
I'm trying to build a predictive model in caret using PCA as pre-processing. The pre-processing would be as follows:
preProc <- preProcess(IL_train[,-1], method="pca", thresh = 0.8)
Is it possible to pass the thresh argument directly to caret's train() function? I've tried the following, but it doesn't work:
modelFit_pp <- train(IL_train$diagnosis ~ . , preProcess="pca",
thresh= 0.8, method="glm", data=IL_train)
If not, how can I pass the separate preProc results to the train() function?
As per the documentation, you specify additional preprocessing arguments with trainControl
?trainControl
...
preProcOptions
A list of options to pass to preProcess. The type of pre-processing
(e.g. center, scaling etc) is passed in via the preProc option in train.
...
Since your dataset is not reproducible, let's look at an example. I will use the Sonar dataset from mlbench and use the pls algorithm just for fun.
library(caret)
library(mlbench)
data(Sonar)
ctrl <- trainControl(preProcOptions = list(thresh = 0.95))
mod <- train(Class ~ .,
data = Sonar,
method = "pls",
trControl = ctrl)
Although documentation isn't the most exciting read, definitely make sure to try to go through it. Package authors work hard to create documentation and there are many wonders to be found within.