"promise already under evaluation" error in R caret's rfe function - r

I have a matrix X and a vector Y which I pass as arguments to the rfe function from the caret package. The call is as simple as rfe(X, Y).
I get a weird error which I can't decipher:
promise already under evaluation: recursive default argument reference or earlier problems?
EDIT:
Here is a reproducible example for the first 5 rows of my data:
library(caret)
X_values = c(29.04,96.57,4.57,94.23,66.81,26.71,69.01,77.06,49.52,97.59,47.57,64.07,24.25,11.27,77.30,90.99,44.05,30.96,96.32,16.04)
X = matrix(X_values, nrow = 5, ncol=4)
Y = c(5608.11,2916.61,5093.05,3949.35,2482.52)
rfe(X, Y)
My R version is 3.2.3. Caret package is 6.0-76.
Does anybody know what this is?

There are two problems with your code.
You need to specify the function/algorithm that you want to fit. (This is what causes the error message you get; I am unsure why rfe throws such a cryptic message, which indeed makes it difficult to debug.)
You need to name your columns in the input data.
The following works:
library(caret)
X_values = c(29.04,96.57,4.57,94.23,66.81,26.71,69.01,77.06,49.52,97.59,47.57,64.07,24.25,11.27,77.30,90.99,44.05,30.96,96.32,16.04)
X = matrix(X_values, nrow = 5, ncol=4)
Y = c(5608.11,2916.61,5093.05,3949.35,2482.52)
ctrl <- rfeControl(functions = lmFuncs)
colnames(X) <- letters[1:ncol(X)]
set.seed(123)
rfe(X, Y, rfeControl = ctrl)
I chose a linear model for the rfe.
The reason for the warning messages is the low number of observations in your data during cross validation. You probably also want to set the sizes argument to get a meaningful feature elimination.
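As a hedged sketch of what setting sizes might look like with the toy data above (the subset sizes here are illustrative, not recommendations, and the tiny data set will still produce warnings):

```r
library(caret)

# Same toy data as above
X_values <- c(29.04,96.57,4.57,94.23,66.81,26.71,69.01,77.06,49.52,97.59,
              47.57,64.07,24.25,11.27,77.30,90.99,44.05,30.96,96.32,16.04)
X <- matrix(X_values, nrow = 5, ncol = 4)
colnames(X) <- letters[1:ncol(X)]
Y <- c(5608.11,2916.61,5093.05,3949.35,2482.52)

# sizes asks rfe to evaluate subsets of 1, 2 and 3 predictors
ctrl <- rfeControl(functions = lmFuncs)
set.seed(123)
result <- rfe(X, Y, sizes = 1:3, rfeControl = ctrl)
result
```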


What format of x and y inputs does R glmnet expect?

I have a data set that looks like this:
I'm interested in the best possible multilinear regression, which is why I'm trying the LASSO method.
R, which stands for stock market returns, should be the dependent variable, whereas all the others (except D/Date and P/Price) are independent variables.
Here's what I've tried so far:
library(Matrix)
library(foreach)
library(glmnet)
trainX <- spxdata[c(4:11)]
trainY <- spxdata[c(3)]
CV = cv.glmnet(x = trainX, y = trainY, alpha = 1, nlambda = 100)
and this gives me the following error message:
Error in storage.mode(y) <- "double" : (list) object cannot be coerced to type 'double'
I'm not accustomed to R and only use it rarely, so I'm not sure how to go about this problem. I guess it has something to do with the format of my trainX and trainY subset, but what exactly have I done wrong here?
The predictor matrix should be a matrix, and not a data frame, which is what you have there. Similarly, the response should be a vector, and not a one-column data frame.
You can get these with
trainX <- as.matrix(spxdata[4:11])
trainY <- spxdata[[3]] # not [3]
But in general, you may want to avoid these and other issues by using my glmnetUtils package, which implements a formula interface to glmnet. This lets you use it the same way you'd use glm or rpart or other modelling functions.
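A minimal sketch of that formula interface, using made-up data standing in for spxdata (glmnetUtils must be installed; the column layout here is an assumption):

```r
library(glmnetUtils)

# Made-up stand-in for spxdata: R is the response, eight numeric predictors
set.seed(1)
spx_demo <- data.frame(R = rnorm(50), matrix(rnorm(50 * 8), ncol = 8))

# No as.matrix() or [[ ]] extraction needed with the formula interface
CV <- cv.glmnet(R ~ ., data = spx_demo, alpha = 1, nlambda = 100)
coef(CV, s = "lambda.min")
```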

Errors while performing caret tuning in R

I am building a predictive model with caret/R and I am running into the following problems:
When trying to execute the training/tuning, I get this error:
Error in if (tmps < .Machine$double.eps^0.5) 0 else tmpm/tmps :
missing value where TRUE/FALSE needed
After some research it appears that this error occurs when there missing values in the data, which is not the case in this example (I confirmed that the data set has no NAs). However, I also read somewhere that the missing values may be introduced during the re-sampling routine in caret, which I suspect is what's happening.
In an attempt to solve problem 1, I tried "pre-processing" the data during the re-sampling in caret by removing zero-variance and near-zero-variance predictors and automatically imputing missing values with caret's knn imputation method, preProcess(c('zv','nzv','knnImpute')), but now I get the following error:
Error: Matrices or data frames are required for preprocessing
Needless to say, I checked and confirmed that the input data sets are indeed matrices, so I don't understand why I get this second error.
The code follows:
x.train <- predict(dummyVars(class ~ ., data = train.transformed),train.transformed)
y.train <- as.matrix(select(train.transformed,class))
vbmp.grid <- expand.grid(estimateTheta = c(TRUE,FALSE))
adaptive_trctrl <- trainControl(method = 'adaptive_cv',
number = 10,
repeats = 3,
search = 'random',
adaptive = list(min = 5, alpha = 0.05,
method = "gls", complete = TRUE),
allowParallel = TRUE)
fit.vbmp.01 <- train(
x = (x.train),
y = (y.train),
method = 'vbmpRadial',
trControl = adaptive_trctrl,
preProcess(c('zv','nzv','knnImpute')),
tuneGrid = vbmp.grid)
The only difference between the code for problem (1) and (2) is that in (1), the pre-processing line in the train statement is commented out.
In summary,
-There are no missing values in the data
-Both x.train and y.train are definitely matrices
-I tried using a standard 'repeatedcv' method instead of 'adaptive_cv' in trainControl, with the same exact outcome
-Forgot to mention that the outcome class has 3 levels
Anyone has any suggestions as to what may be going wrong?
As always, thanks in advance
reyemarr
I had the same problem with my data; after some digging I found that I had some Inf (infinite) values in one of the columns.
After taking them out (df <- df %>% filter(!is.infinite(variable))) the computation ran without error.
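A quick base-R way to locate such values, shown on a made-up data frame (the original data isn't posted):

```r
# Made-up data frame with an Inf hiding in column x
df <- data.frame(x = c(1, Inf, 3), y = c(2, 5, 6))

# Count infinite values per column to find the culprit
counts <- sapply(df, function(col) sum(is.infinite(col)))
counts  # x: 1, y: 0

# Drop the offending rows without dplyr
df_clean <- df[!apply(df, 1, function(r) any(is.infinite(r))), ]
nrow(df_clean)  # 2
```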

VSURF and randomForest

I'm trying to use VSURF and randomForest in R, but the functions in the libraries like predict.VSURF, predict.randomForest and plot.VSURF are not working and I'm getting the following error:
Error: could not find function "predict.VSURF"
Here's a reproducible example:
library(randomForest)
library(VSURF)
data(cars)
fit <- VSURF(x = cars[1:402,2:ncol(cars)], y = cars[1:402,1])
#At this step I get the error: Error: could not find function "predict.VSURF"
preds <- predict.VSURF(fit, newdata = cars[403:804,2:ncol(cars)])
predict.VSURF is not meant to be called by name: R will recognize fit as a VSURF-class object and dispatch to predict.VSURF for it. Just use predict() instead.
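The mechanism is ordinary S3 dispatch, which a toy class (made up for this demo) illustrates:

```r
# predict() is a generic in stats; defining predict.myclass is enough
predict.myclass <- function(object, ...) "dispatched to predict.myclass"

obj <- structure(list(), class = "myclass")
predict(obj)  # R finds predict.myclass by itself
```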
Also, looking at your example, VSURF seems to fail when given only one x variable, throwing this error:
Error in matrix(NA, nrow = nfor.thres, ncol = ncol(x)) :
non-numeric matrix extent
Using mtcars and only predict(), VSURF works fine for me.
data("mtcars")
fit <- VSURF(x = mtcars[1:25,2:ncol(mtcars)], y = mtcars[1:25,1])
preds <- predict(fit, newdata = mtcars[26:32, 2:ncol(mtcars)])

"nrow(x) == n is not TRUE" when using train in Caret; already set as factors

I'm using the dataset found here: http://archive.ics.uci.edu/ml/datasets/Qualitative_Bankruptcy
When running code:
library(caret)
bank <- read.csv("Qualitative_Bankruptcy.data.txt", header=FALSE, na.strings = "?",
strip.white = TRUE)
x=bank[1:6]
y=bank[7]
bank.knn <- train(x, y, method= "knn", trControl = trainControl(method = "cv"))
I get the following error:
Error: nrow(x) == n is not TRUE
The only example I've found is Error: nrow(x) == n is not TRUE when using Train in Caret; my Y is already a factor vector with two classes, and all the X features are factors as well. I've tried using as.matrix and as.data.frame on both the X and Y without success.
nrow(x) is equal to 250, but I'm not sure what the n is referring to in the package.
y is not actually a vector, but a data.frame with one column because bank[7] does not convert the 7th column into a vector, so length(y) is 1. Use bank[, 7] instead. It does not make a difference for x but it could as well be generated by bank[, 1:6].
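The difference is easy to see on a toy data frame (not the bankruptcy data):

```r
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

length(df[2])    # 1 -- df[2] is still a one-column data.frame
length(df[[2]])  # 3 -- df[[2]] is the underlying vector
class(df[2])     # "data.frame"
class(df[[2]])   # "numeric"
```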
Additionally, to make KNN work you probably have to convert the x data.frame, which consists of factor variables, to numeric dummy variables.
x=model.matrix(~. - 1, bank[, 1:6])
y=bank[, 7]
bank.knn <- train(x, y, method= "knn",
trControl = trainControl(method = "cv"))
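What model.matrix does to a factor column can be seen on a tiny made-up example:

```r
d <- data.frame(f = factor(c("a", "b", "a")))

# ~ . - 1 drops the intercept, so every factor level gets its own dummy column
m <- model.matrix(~ . - 1, d)
colnames(m)  # "fa" "fb"
```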
I'm not a caret user but I think you have two problems. The extraction method you used did not deliver an atomic vector but rather a list (a one-column data frame) that contains the vector: if you ask for length(y) you get 1 rather than 250. The first error is easily solved by changing to this definition of y:
y <- bank[[7]] # extract a vector rather than a sublist
Then things get messy. The KNN method expects continuous data (the error messages you get indicate that caret's author considers it a "regression method" here), but you are passing factor data, so you need to choose a classification method instead.

Select Features for Naive Bayes Classification in R

I want to use a naive Bayes classifier to make some predictions.
So far I can make the prediction with the following (sample) code in R:
library(klaR)
library(caret)
Faktor<-x <- sample( LETTERS[1:4], 10000, replace=TRUE, prob=c(0.1, 0.2, 0.65, 0.05) )
alter<-abs(rnorm(10000,30,5))
HF<-abs(rnorm(10000,1000,200))
Diffalq<-rnorm(10000)
Geschlecht<-sample(c("Mann","Frau", "Firma"),10000,replace=TRUE)
data<-data.frame(Faktor,alter,HF,Diffalq,Geschlecht)
set.seed(5678)
flds<-createFolds(data$Faktor, 10)
train<-data[-flds$Fold01 ,]
test<-data[flds$Fold01 ,]
features <- c("HF","alter","Diffalq", "Geschlecht")
formel<-as.formula(paste("Faktor ~ ", paste(features, collapse= "+")))
nb<-NaiveBayes(formel, train, usekernel=TRUE)
pred<-predict(nb,test)
test$Prognose<-as.factor(pred$class)
Now I want to improve this model by feature selection. My real data has about 100 features.
So my question is: what would be the best way to select the most important features for naive Bayes classification?
Is there any paper for reference?
I tried the following line of code, but unfortunately it did not work:
rfe(train[, 2:5],train[, 1], sizes=1:4,rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))
EDIT: It gives me the following error message
Fehler in { : task 1 failed - "nicht-numerisches Argument für binären Operator"
Calls: rfe ... rfe.default -> nominalRfeWorkflow -> %op% -> <Anonymous>
Because this message is in German (it translates to "non-numeric argument to binary operator"), you may want to reproduce this on your machine.
How can i adjust the rfe() call to get a recursive feature elimination?
This error appears to be due to the ldaFuncs. Apparently they do not like factors when using matrix input. The main problem can be re-created with your test data using
mm <- ldaFuncs$fit(train[2:5], train[,1])
ldaFuncs$pred(mm,train[2:5])
# Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) :
# non-numeric argument to binary operator
And this only seems to happen if you include the factor variable. For example,
mm <- ldaFuncs$fit(train[2:4], train[,1])
ldaFuncs$pred(mm,train[2:4])
does not return the same error (and appears to work correctly). Again, this only appears to be a problem when you use the matrix syntax. If you use the formula/data syntax, you don't have the same problem. For example
mm <- ldaFuncs$fit(Faktor ~ alter + HF + Diffalq + Geschlecht, train)
ldaFuncs$pred(mm,train[2:5])
appears to work as expected. This means you have a few different options. Either you can use the rfe() formula syntax like
rfe(Faktor ~ alter + HF + Diffalq + Geschlecht, train, sizes=1:4,
rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))
Or you could expand the dummy variables yourself with something like
train.ex <- cbind(train[,1], model.matrix(~.-Faktor, train)[,-1])
rfe(train.ex[, 2:6],train.ex[, 1], ...)
But this doesn't remember which variables are paired in the same factor so it's not ideal.
