VSURF and randomForest - r

I'm trying to use VSURF and randomForest in R, but the functions in the libraries like predict.VSURF, predict.randomForest and plot.VSURF are not working and I'm getting the following error:
Error: could not find function "predict.VSURF"
Here's a reproducible example:
library(randomForest)
library(VSURF)
data(cars)
fit <- VSURF(x = cars[1:402,2:ncol(cars)], y = cars[1:402,1])
# The next call fails with: Error: could not find function "predict.VSURF"
preds <- predict.VSURF(fit, newdata = cars[403:804,2:ncol(cars)])

R recognizes fit as an object of class VSURF and dispatches to the predict.VSURF method for it, so you do not call the method by its full name. Just use predict() instead.
Also, looking at your example, VSURF seems to fail when there is only one x variable, throwing this error:
Error in matrix(NA, nrow = nfor.thres, ncol = ncol(x)) :
non-numeric matrix extent
Using mtcars and only predict(), VSURF works fine for me.
data("mtcars")
fit <- VSURF(x = mtcars[1:25,2:ncol(mtcars)], y = mtcars[1:25,1])
preds <- predict(fit, newdata = mtcars[26:32, 2:ncol(mtcars)])
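To illustrate the dispatch mechanism described above, here is a minimal base R sketch. The class toyfit and its predict method are hypothetical stand-ins, not part of VSURF or randomForest.

```r
# Toy S3 object standing in for a fitted model (hypothetical, not VSURF itself)
fit_toy <- structure(list(slope = 2), class = "toyfit")

# A predict method for that class; predict() is the generic from stats
predict.toyfit <- function(object, newdata, ...) {
  object$slope * newdata
}

# Calling the generic dispatches on class(fit_toy)
preds <- predict(fit_toy, newdata = c(1, 2, 3))
print(preds)

# getS3method() looks up the method that the generic would dispatch to
m <- getS3method("predict", "toyfit")
print(identical(m, predict.toyfit))
```

For a package whose methods are registered but not exported, calling the generic is the usual way to reach them, which would match the "could not find function" error in the question.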

Related

Error when passing the "weights" argument to the coxph function using riskRegression in R

I am trying to use inverse probability of treatment weighting in a cause-specific Cox regression using the CSC function in the riskRegression package.
I calculated the weights without a problem, but when I try to pass the weights to the CSC function I get the following error message:
Error in eval(extras, data, env) :
..1 used in an incorrect context, no ... to look in
A complete reproducible example looks like this:
library(ipw)
library(cmprsk)
library(survival)
library(riskRegression)
data(mgus2)
# get some example data
mgus2$etime <- with(mgus2, ifelse(pstat==0, futime, ptime))
mgus2$event <- with(mgus2, ifelse(pstat==0, 2*death, 1))
mgus2$event <- factor(mgus2$event, 0:2, labels=c("censor", "pcm", "death"))
mgus2$age_cat <- cut(mgus2$age, breaks=seq(0, 100, 25))
mgus2$sex <- ifelse(mgus2$sex=="F", 0, 1)
# remove NA
mgus2 <- subset(mgus2, !is.na(mspike))
# estimate inverse probability weights
weights <- ipwpoint(sex, "binomial", "logit", denominator= ~ age_cat + mspike,
data=mgus2)
mgus2$weights <- weights$ipw.weights
# rerun cox model using weights
mod2 <- CSC(Hist(etime, event) ~ sex + age_cat + mspike, cause="pcm",
surv.type="hazard", fitter="coxph", data=mgus2,
weights=weights)
I know from the documentation that the CSC function calls the coxph function internally, passing additional arguments to it using ... syntax. Other arguments could be passed to the function just fine, but the weight argument always produces the error message above.
How can I fix this?
UPDATE:
I have contacted the Package Maintainer and he has fixed the bug already. It should work fine now, with one little difference: Instead of weights=weights one has to use weights=mgus2$weights.
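For readers who hit the same message elsewhere: it seems to be a general pitfall when a wrapper forwards weights through ... to a model-fitting function that uses non-standard evaluation. The sketch below uses lm as a stand-in for coxph, and the wrapper names fit_dots and fit_col are made up for illustration; it is not the fix that was applied inside riskRegression.

```r
# Stand-in wrapper: forwards extra arguments to lm() through `...`
fit_dots <- function(formula, data, ...) {
  lm(formula, data = data, ...)
}

set.seed(1)
w <- runif(nrow(mtcars))

# Passing weights through `...` is expected to fail while the model frame is built
res <- tryCatch(fit_dots(mpg ~ wt, mtcars, weights = w),
                error = function(e) conditionMessage(e))
print(res)

# Workaround sketch: store the weights as a column and refer to it by name
fit_col <- function(formula, data, weights) {
  data$.w <- weights            # `.w` is a hypothetical helper column
  lm(formula, data = data, weights = .w)
}
ok <- fit_col(mpg ~ wt, mtcars, weights = w)
print(coef(ok))
```

The second wrapper works because the weights expression is then resolved inside the data frame rather than through the wrapper's ... arguments.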

Error in panel regression in case of different independent variable r

I am trying to run a Fama-MacBeth regression with the following code:
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~max_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
This works when I use the independent variable named 'max_1'. However, when I switch to another independent variable named 'ivol_1', I get an error. The code is:
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
the error message is like this:
Error in pmg(return ~ ivol_1, df_all_11, index = c("yearmonth", "firms")) :
Insufficient number of time periods
or sometimes the error is like this
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, :
object is not a matrix
For your convenience, I am sharing my data with you. The data link is
data frame
I am wondering why this happens for a different variable in the same data frame. I would be grateful for any help in solving this problem.
This problem can be solved with the mice function:
library(mice)
library(dplyr)
require(foreign)
require(plm)
require(lmtest)
df_all_11<-read.csv("df_all_11.csv.part",sep = ",",header = TRUE,stringsAsFactors = FALSE)
x<-data.frame(ivol_1=df_all_11$ivol_1,month=df_all_11$Month)
imputed_Data <- mice(x, m=3, maxit =5, method = 'pmm', seed = 500)
completeData <- complete(imputed_Data, 3)
df_all_11<-mutate(df_all_11,ivol_1=completeData$ivol_1)
fpmg2 <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms"))
coeftest(fpmg2)
The problem arises because the variable ivol_1 has a lot of NA values, so you should impute them first and then run the pmg function.
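Before imputing, it can help to confirm that missing values really are the difference between the two regressors. A minimal sketch on a made-up toy panel (the column names mirror the question, the values are invented):

```r
# Hypothetical toy panel; values are made up for illustration
toy <- data.frame(
  firms     = rep(c("A", "B"), each = 4),
  yearmonth = rep(1:4, times = 2),
  max_1     = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8),
  ivol_1    = c(NA, 0.2, NA, NA, 0.5, NA, NA, 0.8)
)

# Missing values per candidate regressor
na_counts <- colSums(is.na(toy[, c("max_1", "ivol_1")]))
print(na_counts)

# Non-missing ivol_1 observations left in each group
obs_per_firm <- tapply(!is.na(toy$ivol_1), toy$firms, sum)
print(obs_per_firm)
```

If some groups are left with very few complete observations, an "Insufficient number of time periods" error would be plausible, though that link is an assumption here and not something verified against the original data.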

Make lsmeans work when called inside a function with a glm.nb object

I need to make a function (fit.function) that calls lsmeans with different formulas and data, based on a negative binomial model fit with glm.nb from MASS.
I get the following error when I try to call lsmeans inside the function:
Error in terms.formula(formula, data = data) :
'data' argument is of the wrong type
Error in ref.grid(object = list(coefficients = c(1.69377906086784,
2.30790181649084, :
Perhaps a 'data' or 'params' argument is needed
It seems like the error has something to do with the environment of the ref.grid function.
Could anyone help me to fix the error? Any idea for a workaround?
My code is as follows:
library(lsmeans)
library(MASS)
df1 <-data.frame(y=rnbinom(100,size=0.75,mu =5 ), x="A")
df2 <-data.frame(y=rnbinom(100,size=0.75,mu =50 ), x="B")
df3 <-data.frame(y=rnbinom(100,size=0.75,mu =500 ), x="C")
df <- rbind(df1,df2,df3)
nb.fit<-function(formula,data){
glm.nb(formula,data=data)
}
fit.function <- function(formula, data){
lsmeans(glm.nb(formula, data = data), "x", adjust = "tuckey")
}
# lsmeans are calculated when both lsmeans and glm.nb are explicitly called
main.fit <- lsmeans(glm.nb(y ~ x,data=df), "x", adjust = "tuckey")
main.fit
CLD <- cld(main.fit, type= "response")
plot(CLD)
# no problem wrapping glm.nb into nb.fit
class(glm.nb(y ~ x,df))
nb.model <-nb.fit(y ~ x,df)
class(nb.model)
# The Error appears once I wrap lsmeans into fit.function
func.fit <- fit.function(y ~ x,df)
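One plausible reading of the "'data' argument is of the wrong type" message, sketched here with lm as a stand-in for glm.nb: a model fitted inside a function stores the call with the argument name data, not the actual data frame, so anything that later re-evaluates that call outside the function may find a different object. The function name fit_in_function is made up for illustration.

```r
fit_in_function <- function(formula, data) {
  lm(formula, data = data)
}
m <- fit_in_function(mpg ~ wt, mtcars)

# The stored call refers to the symbol `data`, not to mtcars
print(m$call)

# Looked up from the global environment (assuming no global object named
# `data` exists), the symbol resolves to the utils::data function
looked_up <- eval(m$call$data, envir = globalenv())
print(class(looked_up))
```

If that is what happens here, following the hint in the error message and supplying the data explicitly (for example data = data in the lsmeans call inside fit.function) may help. This is an untested suggestion.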

"promise already under evaluation" error in R caret's rfe function

I have a matrix X and a vector Y which I pass as arguments to the rfe function from the caret package. It's as simple as:
rfe(X, Y)
I get a weird error which I can't decipher:
promise already under evaluation: recursive default argument reference or earlier problems?
EDIT:
Here is a reproducible example for the first 5 rows of my data:
library(caret)
X_values = c(29.04,96.57,4.57,94.23,66.81,26.71,69.01,77.06,49.52,97.59,47.57,64.07,24.25,11.27,77.30,90.99,44.05,30.96,96.32,16.04)
X = matrix(X_values, nrow = 5, ncol=4)
Y = c(5608.11,2916.61,5093.05,3949.35,2482.52)
rfe(X, Y)
My R version is 3.2.3. Caret package is 6.0-76.
Does anybody know what this is?
There are two problems with your code:
1. You need to specify the function/algorithm that you want to fit. This is what causes the error message you get. I am unsure why rfe throws such a cryptic error message; it does make debugging difficult.
2. You need to name the columns in your input data.
The following works:
library(caret)
X_values = c(29.04,96.57,4.57,94.23,66.81,26.71,69.01,77.06,49.52,97.59,47.57,64.07,24.25,11.27,77.30,90.99,44.05,30.96,96.32,16.04)
X = matrix(X_values, nrow = 5, ncol=4)
Y = c(5608.11,2916.61,5093.05,3949.35,2482.52)
ctrl <- rfeControl(functions = lmFuncs)
colnames(X) <- letters[1:ncol(X)]
set.seed(123)
rfe(X, Y, rfeControl = ctrl)
I chose a linear model for the rfe.
The reason for the warning messages is the low number of observations in your data during cross validation. You probably also want to set the sizes argument to get a meaningful feature elimination.
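As a rough sketch of the last point, here is the same kind of call with an explicit sizes argument on simulated data. The data, the choice of 5-fold cross-validation and the candidate sizes 1:3 are all assumptions made for illustration.

```r
library(caret)

set.seed(123)
n <- 100
X_sim <- matrix(rnorm(n * 4), nrow = n, ncol = 4)        # simulated predictors
colnames(X_sim) <- letters[1:4]
Y_sim <- 3 * X_sim[, "a"] - 2 * X_sim[, "b"] + rnorm(n)  # only a and b carry signal

# Linear-model helper functions with 5-fold cross-validation
ctrl <- rfeControl(functions = lmFuncs, method = "cv", number = 5)

# Evaluate subsets of 1, 2 and 3 predictors in addition to the full set
res <- rfe(X_sim, Y_sim, sizes = 1:3, rfeControl = ctrl)

print(res)
print(predictors(res))
```

With more observations per fold, the resampling warnings from the five-row example should become less of an issue.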

Select Features for Naive Bayes Classification in R

I want to use a naive Bayes classifier to make some predictions.
So far I can make the prediction with the following (sample) code in R:
library(klaR)
library(caret)
Faktor<-x <- sample( LETTERS[1:4], 10000, replace=TRUE, prob=c(0.1, 0.2, 0.65, 0.05) )
alter<-abs(rnorm(10000,30,5))
HF<-abs(rnorm(10000,1000,200))
Diffalq<-rnorm(10000)
Geschlecht<-sample(c("Mann","Frau", "Firma"),10000,replace=TRUE)
data<-data.frame(Faktor,alter,HF,Diffalq,Geschlecht)
set.seed(5678)
flds<-createFolds(data$Faktor, 10)
train<-data[-flds$Fold01 ,]
test<-data[flds$Fold01 ,]
features <- c("HF","alter","Diffalq", "Geschlecht")
formel<-as.formula(paste("Faktor ~ ", paste(features, collapse= "+")))
nb<-NaiveBayes(formel, train, usekernel=TRUE)
pred<-predict(nb,test)
test$Prognose<-as.factor(pred$class)
Now I want to improve this model by feature selection. My real data has about 100 features.
So my question is: what would be the best way to select the most important features for naive Bayes classification?
Is there any paper for reference?
I tried the following line of code, but unfortunately it did not work:
rfe(train[, 2:5],train[, 1], sizes=1:4,rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))
EDIT: It gives me the following error message (translated from German):
Error in { : task 1 failed - "non-numeric argument to binary operator"
Calls: rfe ... rfe.default -> nominalRfeWorkflow -> %op% -> <Anonymous>
The original message was in German, so please reproduce it on your machine if you need the exact wording.
How can I adjust the rfe() call to get a recursive feature elimination?
This error appears to be due to the ldaFuncs. Apparently they do not like factors when using matrix input. The main problem can be re-created with your test data using
mm <- ldaFuncs$fit(train[2:5], train[,1])
ldaFuncs$pred(mm,train[2:5])
# Error in FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...) :
# non-numeric argument to binary operator
And this only seems to happen if you include the factor variable. For example
mm <- ldaFuncs$fit(train[2:4], train[,1])
ldaFuncs$pred(mm,train[2:4])
does not return the same error (and appears to work correctly). Again, this only appears to be a problem when you use the matrix syntax. If you use the formula/data syntax, you don't have the same problem. For example
mm <- ldaFuncs$fit(Faktor ~ alter + HF + Diffalq + Geschlecht, train)
ldaFuncs$pred(mm,train[2:5])
appears to work as expected. This means you have a few different options. Either you can use the rfe() formula syntax like
rfe(Faktor ~ alter + HF + Diffalq + Geschlecht, train, sizes=1:4,
rfeControl = rfeControl(functions = ldaFuncs, method = "cv"))
Or you could expand the dummy variables yourself with something like
train.ex <- cbind(train[,1], model.matrix(~.-Faktor, train)[,-1])
rfe(train.ex[, 2:6],train.ex[, 1], ...)
But this doesn't remember which variables are paired in the same factor so it's not ideal.
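A small base R sketch of the dummy expansion on made-up data (the column names mirror the question, the values are invented). Unlike the one-liner above, it keeps the outcome as a separate factor, because cbind() would coerce a factor to its integer codes.

```r
# Hypothetical toy data; values are made up for illustration
toy <- data.frame(
  Faktor     = factor(rep(c("A", "B"), length.out = 12)),
  alter      = seq(20, 42, by = 2),
  Geschlecht = factor(rep(c("Mann", "Frau", "Firma"), length.out = 12))
)

# Expand the predictors to dummy columns and drop the intercept column
x_dummies <- model.matrix(~ . - Faktor, data = toy)[, -1]
print(colnames(x_dummies))

# Keep the outcome as a factor so that it is still treated as a class label
y <- toy$Faktor
print(levels(y))
```

The dummy column names start with the factor name (here Geschlecht), which gives at least a manual way to see which columns belong to the same original variable.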
