rbind.fill error in train() function in R caret package

A similar question was closed, and the accepted solution was to check that the caret package was installed correctly. As instructed there, I checked that caret is installed and loaded correctly. I have reloaded the package and it is available in the current session. The following call to train(...) produces the error:
model <- train(
  price ~ ., diamonds,
  method = "lm",
  trControl = trainControl(
    method = "cv", number = 10,
    verboseIter = TRUE
  )
)
Here I am trying to fit a linear model to the well-known diamonds dataset with 10-fold cross-validation. However, I get the following error:
Error: All inputs to rbind.fill must be data.frames
It doesn't provide any further information about the error. My warnings are on. Is there any way I can debug this?

I copied and pasted your code into my console and it worked just fine. Try updating your caret package.
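On the debugging question: in an interactive session, calling traceback() right after the error shows which function raised it. The sketch below shows the same idea non-interactively in base R. Note that inner() and outer() are hypothetical stand-ins for the failing call, not caret code.

```r
# Hypothetical stand-ins for a call that fails deep inside a package.
inner <- function(x) stop("All inputs to rbind.fill must be data.frames")
outer <- function(x) inner(x)

captured <- NULL
result <- tryCatch(
  withCallingHandlers(
    outer(1),
    error = function(e) {
      # Record the call stack at the moment the error is signalled.
      captured <<- sys.calls()
    }
  ),
  # After the stack is recorded, turn the error into its message.
  error = function(e) conditionMessage(e)
)

# Render each captured call as one line of text (outermost call first).
stack_text <- vapply(
  as.list(captured),
  function(cl) paste(deparse(cl), collapse = " "),
  character(1)
)
print(stack_text)
```

The printed stack includes outer(1) and inner(x), which is the kind of information needed to see where a data frame was expected but something else was passed.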

Related

Caret: family specification in glmboost doesn't work

I'm trying to run a boosted robust regression with caret (using the Huber family), but I get an error when training the model:
library(caret)
X <- rnorm(300, 0, 100)
Y <- rnorm(300, 0, 100000)
data <- cbind(X,Y)
model <- train(Y~X, method="glmboost", data=data, family=Huber())
I get the error 'could not find function Huber()', even though this function is explicitly included in the mboost package (the one on which glmboost is based).
Any help would be really appreciated.
If you just run library(caret) and then use method="glmboost", caret will load the mboost package, but it will not attach mboost to your search path. Packages are discouraged from automatically attaching other packages, since doing so may bring in functions that conflict with other functions you have loaded. Thus most packages load their dependencies privately. If you fully qualify the function name with the package name, you can use it in your model:
model <- train(Y~X, method="glmboost", data=data, family=mboost::Huber())
Alternatively, run library(mboost) to attach the package to your search path so that you don't have to include the package name prefix.
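The difference between a loaded and an attached package can be seen with base R alone. This is a small sketch that uses the tools package (which ships with R but is not attached in a default session) as a stand-in for mboost; the file name is made up for illustration.

```r
# Load the namespace without attaching it, as a package does for its dependencies.
loadNamespace("tools")

is_loaded   <- isNamespaceLoaded("tools")        # namespace is in memory
is_attached <- "package:tools" %in% search()     # is it on the search path?
print(c(loaded = is_loaded, attached = is_attached))

# A fully qualified call works whether or not the package is attached.
ext <- tools::file_ext("model.rds")
print(ext)
```

The same reasoning applies to mboost::Huber(): the namespace is loaded by caret, but the unqualified name Huber is only found once mboost is attached.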

feature selection function in caret package

I am posting this because the post "feature selection in caret" hasn't helped with my issue. I have two questions about the feature selection function in the caret package.
When I run the code below on my gene expression matrix allsamplecombat, with 5 classes defined in y:
control <- rfeControl(functions=rfFuncs, method="cv", number=10)
results <- rfe(t(allsamplecombat[filter,]), y = factor(info$clust), sizes=c(300,400,500,600,700,800,1000,1200), rfeControl=control)
I get output like this
So, I want to know whether I can extract the top features for each class, because predictors(results) just gives me the selected features without indicating their importance for each class.
My second problem is that when I change the rfeControl functions to treebagFuncs and use the 'parRF' method:
control <- rfeControl(functions=treebagFuncs, method="cv", number=5)
results <- rfe(t(allsamplecombat[filter,]), y = factor(info$clust), sizes=c(400,500,600,700,800), rfeControl=control, method="parRF")
I get the error: Error in { : task 1 failed - "subscript out of bounds".
What is wrong with my code?
For the importances, there is a sub-object called variables that contains this information for each step of the elimination.
treebagFuncs is designed to work with ipred's bagging function and isn't related to random forest.
You would probably want to use caretFuncs and pass method to that. However, if you are going to parallelize something, parallelize the resampling loop and not the model function; this is generally more efficient. Note that if you parallelize both with M workers, you might actually end up with M^3 workers (one level each for rfe, train, and parRF). There are options in rfe and train to turn their parallelism off.
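A minimal sketch of that suggestion, assuming the caret and rpart packages are installed. It uses iris as a stand-in for the expression matrix and method = "rpart" only to keep the example light (swap in "rf" for a random forest); the data, sizes, and fold counts are illustrative assumptions. allowParallel is the trainControl (and rfeControl) argument that switches that level's parallelism off.

```r
library(caret)

set.seed(42)
x <- iris[, 1:4]      # stand-in predictors (assumption, not the original data)
y <- iris$Species     # stand-in class labels

# caretFuncs makes rfe fit each subset with train(), so `method` is passed through.
# The outer resampling loop is left free to run in parallel if a backend is registered.
control <- rfeControl(functions = caretFuncs, method = "cv", number = 5)

# Keep the inner train() calls sequential so parallelism is not nested.
inner_control <- trainControl(method = "cv", number = 3, allowParallel = FALSE)

results <- rfe(x, y, sizes = c(2, 3), rfeControl = control,
               method = "rpart", trControl = inner_control)

predictors(results)        # selected features
head(results$variables)    # importance recorded at each elimination step
```

The variables sub-object is where the per-step importance values mentioned above are stored; its exact columns depend on the model's variable importance method.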

model averaged coefficients of linear mixed models in glmulti? Fix no longer works

I'm using the glmulti package to do variable selection on the fixed effects of a mixed model in lme4. I had the same problem retrieving coefficients and confidence intervals that was solved by the author of the package in this thread. Namely, using coef or coef.multi gives a check.names error, and the coefficients are listed as NULL when calling the predict method. So I tried the solution listed in the thread linked above, using:
setMethod('getfit', 'merMod', function(object, ...) {
  summ=summary(object)$coef
  summ1=summ[,1:2]
  if (length(dimnames(summ)[[1]])==1) {
    summ1=matrix(summ1, nr=1, dimnames=list(c("(Intercept)"),c("Estimate","Std. Error")))
  }
  cbind(summ1, df=rep(10000,length(fixef(object))))
})
I fixed the missing " in the original post and the code ran. But now, instead of getting
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 1, 0
I get this error for every single model...
Error in calculation of the Satterthwaite's approximation. The output
of lme4 package is returned summary from lme4 is returned some
computational error has occurred in lmerTest
I'm using lmerTest and it doesn't surprise me that it would fail if glmulti can't pull the correct info from the model. So really it's the first two lines of the error that are probably what should be focussed on.
A description of the original fix is on the developer's website here. Clearly the package hasn't been updated in a while, and yes, I should probably learn a new package, but until then I'm hoping for a fix. I'll contact the developer directly through his website. In the meantime, has anyone tried this and found a fix?
lme4, glmulti, rJava, and other related packages have all been updated to the latest version.

How to pass saved models to caretEnsemble

Reasonably new to this so sorry if I'm being thick.
Is there a way to pass existing models to caretEnsemble?
I have several models, run on the same training data, that I would like to ensemble with caretEnsemble.
Each model takes several hours to run, so I save them, then reload them when needed rather than re-run.
model_xgb <- train(oi_in_4_24_months~., method="xgbTree", data=training, trControl=train_control)
saveRDS(model_xgb, "model_xgb.rds")
model_logit <- train(oi_in_4_24_months~., method="LogitBoost", data=training, trControl=train_control)
saveRDS(model_logit, "model_logit.rds")
model_xgb <- readRDS("model_xgb.rds")
model_logit <- readRDS("model_logit.rds")
I want to pass these saved models to caretEnsemble, but as far as I can make out I can only pass a list of model types, e.g. "LogitBoost", "xgbTree", and caretEnsemble will both run the initial models, then ensemble them.
Is there a way to pass existing models, trained on the same data, to caretEnsemble?
The package author has an example script (https://gist.github.com/zachmayer/5152157) that suggests the following:
all_models <- list(model_xgb, model_logit)
names(all_models) <- sapply(all_models, function(x) x$method)
greedy <- caretEnsemble(all_models, iter=1000L)
But that produces an error
"Error: is(list_of_models, "caretList") is not TRUE".
I think that use of caretList previously wasn't compulsory, but now is.
I don't think you still need the solution to this but answering for anyone else that has the same question.
You can add models to be used by caretEnsemble or caretStack by using as.caretList(list(rpart2 = model_list1, gbm = model_list2)).
But remember to use the same indexes for cross-validation/bootstrapping in every model. If the indexes differ, or if something the ensemble needs was not stored because trainControl was not specified (or was specified wrongly), caretEnsemble and caretStack will throw an error, which is the expected behavior. This issue on GitHub has very clear and simple instructions.
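A minimal sketch of this, assuming the caret and caretEnsemble packages are installed. mtcars and the two model types are illustrative stand-ins for the saved models, and the fold count is arbitrary; the point is that both models share one index and save their final predictions.

```r
library(caret)
library(caretEnsemble)

set.seed(42)
# One shared set of training indexes so every model sees identical folds.
folds <- createFolds(mtcars$mpg, k = 5, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = folds, savePredictions = "final")

# Stand-ins for the long-running models; in practice these come from readRDS().
model_lm  <- train(mpg ~ ., data = mtcars, method = "lm",  trControl = ctrl)
model_knn <- train(mpg ~ ., data = mtcars, method = "knn", trControl = ctrl)

# Wrap the existing train objects in a named caretList.
all_models <- as.caretList(list(lm = model_lm, knn = model_knn))
class(all_models)
```

The resulting all_models object is what caretEnsemble or caretStack expects in place of a plain list.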

Difficulty getting Caret GLM with Repeated CV to execute

I have been doing 10X10-fold cv logistic models for a long time using homebrew code, but recently have figured that it might be nice to let caret handle the messy stuff for me.
Unfortunately, I seem to be missing some of the nuances that caret needs to be able to function.
Specifically, I keep getting this error:
>Error in { : task 1 failed - "argument is not interpretable as logical"
Please see if you can pick up what I am doing wrong...
Thanks in advance!
Data set is located here.
dataset <- read.csv("Sample Data.csv")
library(caret)
my_control <- trainControl(
  method="repeatedcv",
  number=10,
  repeats = 10,
  savePredictions="final",
  classProbs=TRUE
)
This next block of code was put in there to make caret happy. My original dependent variable was a binary that I had turned into a factor, but caret had issues with the factor levels being "0" and "1". Not sure why.
dataset$Temp <- "Yes"
dataset$Temp[which(dataset$Dep.Var=="0")] <- "No"
dataset$Temp <- as.factor(dataset$Temp)
Now I (try) to get caret to run the 10X10-fold glm model for me...
testmodel <- train(Temp ~ Param.A + Param.G + Param.J + Param.O, data = dataset,
                   method = "glm",
                   trControl = my_control,
                   metric = "Kappa")
testmodel
> Error in { : task 1 failed - "argument is not interpretable as logical"
Though you already found a fix by updating R and caret, I'd like to point out there is (was) a bug in your code which caused the error, and which I can reproduce here with an older version of R and caret:
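In that older version of caret, the savePredictions argument of trainControl had to be set to either TRUE or FALSE, not 'final'. It seems you mixed it up with the returnResamp parameter, which does accept 'final'. Newer caret releases also accept "all", "final", and "none" for savePredictions, which would explain why updating made the error go away.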
BTW: R and caret have restrictions on level names of factors, which is why caret complained when you handed 0 and 1 level names for the dependent variable to it. Using a simple dataset$Dep.Var <- factor(paste0('class', dataset$Dep.Var)) should do the trick in such cases.
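The level-name restriction can be checked in base R. This is a small sketch with a made-up 0/1 vector standing in for Dep.Var: make.names() shows what a valid R name would look like, and the paste0() approach from above yields levels that pass the check.

```r
# Illustrative stand-in for dataset$Dep.Var (assumption, not the original data).
dep_var <- c(0, 1, 1, 0)

raw_levels <- levels(factor(dep_var))
# A level is a valid R name only if make.names() leaves it unchanged.
raw_ok <- identical(make.names(raw_levels), raw_levels)

fixed <- factor(paste0("class", dep_var))
fixed_ok <- identical(make.names(levels(fixed)), levels(fixed))

print(raw_levels)       # levels before the fix
print(levels(fixed))    # levels after the fix
print(c(raw_ok = raw_ok, fixed_ok = fixed_ok))
```

With classProbs = TRUE, caret uses the class levels as column names for the predicted probabilities, which is why levels such as "0" and "1" are rejected.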
I don't have enough reputation to comment, so I am posting this as an answer. I ran your exact code, and it worked fine for me, twice. I did get this warning:
glm.fit: fitted probabilities numerically 0 or 1 occurred
As per the author, this error had something to do with the savePredictions parameter. Have a look at this issue:
https://github.com/topepo/caret/issues/304
Thanks to @Sumedh, I figured that the problem might not be with my code, and I updated all my packages.
Surprise! Now it works. So I wasn't crazy after all.
Sorry all for the fire drill.
