R: randomForest error message - r

I'm trying to run Random Forest on a data set that has about 400 samples and about 360 variables in data frame df:
I'm trying to use the variables (s10, s100, etc.) to predict the Genotype. This is the code I'm using:
rf <-randomForest(Genotype ~ ., data = df, importance = TRUE, proximity = TRUE)
but I keep getting the error message:
Error in if (n == 0) stop("data (x) has 0 rows") :
argument is of length zero
What am I doing wrong?

First, don't give your objects the same names as R functions (e.g., "df"). Second, try the non-formula interface to randomForest. Let's see where that gets you.
( rf <-randomForest(y=my.df[,"Genotype"], x=my.df[,2:ncol(my.df)], importance = TRUE, proximity = TRUE) )
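The "argument is of length zero" part of the message means that the value tested inside `if (n == 0)` was NULL rather than a number. A minimal base-R sketch of one way this can happen is shown below; the toy data frame and its column names are assumptions for illustration, and this is not necessarily what happened with the asker's data.

```r
# Hypothetical toy data; the column names are assumptions for illustration
toy <- data.frame(Genotype = factor(c("A", "B", "A")),
                  s10 = c(1.2, 3.4, 5.6))

x_vec <- toy[, 2]                # a single column collapses to a plain vector
x_df  <- toy[, 2, drop = FALSE]  # drop = FALSE keeps it a data frame

n_vec <- nrow(x_vec)  # NULL, because a vector has no row dimension
n_df  <- nrow(x_df)   # 3

# Comparing NULL with 0 gives logical(0), which `if` cannot evaluate
msg <- tryCatch({
  if (n_vec == 0) stop("data (x) has 0 rows")
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

If x is built by subsetting columns, checking `is.data.frame(x)` and `dim(x)` before calling randomForest is a cheap way to rule this out.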

Related

R - GAM error - Missing value where TRUE/FALSE needed

I'm trying to train a binomial GAM model with the mgcv package, and running into this error:
Error in if (length(grad) > 0 && sum(uconv.ind) > 0) { : missing value where TRUE/FALSE needed
There are no columns in my data frame (of the columns which are included in the model) that have NAs in them. When I look at the unique values of the response column, it shows [1] 0 1 as expected.
Here is the code used to train the model:
mgcv::bam(formula = formula, family = binomial, data = df, select = T, discrete = T, method = 'fREML', nthreads = 32, drop.unused.levels = FALSE)
Any help would be greatly appreciated!
As requested, here is a screenshot of a random sample of the data. The data is related to my company so I can't give too much information away:
The final column is the response, and it is a numeric column. When I type df[!complete.cases(df), ], the result has 0 rows.

Fail to predict woe in R

I used the following code to get the woe:
library("woe")
woe.object <- woe(data, Dependent="target", FALSE,
Independent="shop_id", C_Bin=20, Bad=0, Good=1)
Then I want to predict woe for the test data
test.woe <- predict(woe.object, newdata = test, replace = TRUE)
And it gives me an error
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "data.frame"
Any suggestions please?
For prediction, you cannot use the woe package. You need to use the klaR package instead. Take note of the masking of the function woe, see below:
# let's say woe was loaded first and then klaR was loaded
library(klaR)
data = data.frame(target=sample(0:1,100,replace=TRUE),
shop_id = sample(1:3,100,replace=TRUE),
another_var = sample(letters[1:3],100,replace=TRUE))
#make sure both dependent and independent are factors
data$target=factor(data$target)
data$shop_id = factor(data$shop_id)
data$another_var = factor(data$another_var)
You need two or more independent variables:
woemodel <- klaR::woe(target~ shop_id+another_var,
data = data)
If you only provide one, you have an error:
woemodel <- klaR::woe(target~ shop_id,
data = data)
Error in woe.default(x, grouping, weights = weights, ...) : All
factors with unique levels. No woes calculated! In addition: Warning
message: In woe.default(x, grouping, weights = weights, ...) : Only
one single input variable. Variable name in resulting object$woe is
only conserved in formula call.
If you want to predict the dependent variable with only one independent variable, something like logistic regression will work:
mdl = glm(target ~ shop_id,data=data,family="binomial")
prob = predict(mdl,data,type="response")
# R indexes from 1; glm models the probability of the second factor level
predicted_label = ifelse(prob>0.5,levels(data$target)[2],levels(data$target)[1])
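As a quick sanity check on the thresholding step, the sketch below fits the same kind of model on simulated data and tabulates predicted against actual labels. The data are random (an assumption for illustration, not the asker's data), so the fit itself is meaningless; the point is only which factor level the fitted probability refers to and how the levels are indexed.

```r
set.seed(1)
# Simulated stand-in data; names mirror the example but values are random
toy <- data.frame(target  = factor(sample(0:1, 100, replace = TRUE)),
                  shop_id = factor(sample(1:3, 100, replace = TRUE)))

mdl  <- glm(target ~ shop_id, data = toy, family = "binomial")
prob <- predict(mdl, toy, type = "response")

# For a factor response, glm treats the first level as "failure",
# so prob is the fitted probability of the second level
lev <- levels(toy$target)
predicted_label <- factor(ifelse(prob > 0.5, lev[2], lev[1]), levels = lev)

# Index 0 selects nothing in R, which is why levels must be indexed from 1
empty <- lev[0]

print(table(predicted = predicted_label, actual = toy$target))
```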

Errors with dredge() function in MuMIn

I'm trying to use the dredge() function to evaluate models by completing every combination of variables (up to five variables per model) and comparing models using AIC corrected for small sample size (AICc).
However, I'm presented with one error and two warning messages as follows:
Fixed term is "(Intercept)"
Warning messages: 1: In dredge(MaxN.model,
m.min = 2, m.max = 5) : comparing models fitted by REML 2: In
dredge(MaxN.model, m.min = 2, m.max = 5) : arguments 'm.min' and
'm.max' are deprecated, use 'm.lim' instead
I've tried changing to 'm.lim' as specified but it comes up with the error:
Error in dredge(MaxN.model, m.lim = 5) : invalid 'm.lim' value In
addition: Warning message: In dredge(MaxN.model, m.lim = 5) :
comparing models fitted by REML
The code I'm using is:
MaxN.model<-lme(T_MaxN~Seagrass.cover+composition.pca1+composition.pca2+Sg.Richness+traits.pca1+
land.use.pc1+land.use.pc2+seascape.pc2+D.landing.site+T_Depth,
random=~1|site, data = sgdf, na.action = na.fail, method = "REML")
Dd_MaxN<-dredge(MaxN.model, m.min = 2 , m.max = 5)
What am I doing wrong?
You didn't tell us what you tried to specify for m.lim. ?dredge says:
m.lim ...optionally, the limits ‘c(lower, upper)’ for number of terms in a single model
so you should specify a two-element numeric (integer) vector.
You should definitely be using method="ML" rather than method="REML". The warning/error about REML is very serious; comparing models with different fixed effects that are fitted via REML will lead to nonsense.
So you should try:
MaxN.model <- lme(..., method = "ML") ## where ... is the rest of your fit
Dd_MaxN <- dredge(MaxN.model, m.lim=c(2,5))
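A small self-contained sketch of the same two steps, using the Orthodont example data that comes with nlme as a stand-in for the asker's data (the model formula here is an assumption chosen only for illustration). The dredge call is guarded so the block still runs when MuMIn is not installed.

```r
library(nlme)  # recommended package shipped with standard R installations

# Stand-in global model on nlme's Orthodont data (not the asker's data)
fit_reml <- lme(distance ~ age + Sex, random = ~ 1 | Subject,
                data = Orthodont, na.action = na.fail, method = "REML")

# Refit the same model by maximum likelihood before comparing fixed effects
fit_ml <- update(fit_reml, method = "ML")
print(fit_ml$method)

# m.lim takes a two-element vector c(lower, upper); run only if MuMIn exists
if (requireNamespace("MuMIn", quietly = TRUE)) {
  dd <- MuMIn::dredge(fit_ml, m.lim = c(1, 2))
  print(dd)
}
```

Using update() avoids retyping a long formula, which is convenient when the global model has as many terms as the one in the question.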

How to fix error in MuMIn package, dredge() function?

I am trying to run the dredge function of the MuMIn package and keep getting an error saying "result is empty". I don't know why and cannot find information on the meaning of this error message.
As far as I can tell from studying this function and package, the code below "should" be correct. Basically, I have a General Linear Mixed Model being run and I want to use the dredge function to run a model selection procedure based on AICc.
options(na.action = "na.fail") # Required for dredge to run
glmm1 <- lmer(cpue_diff ~ year + p.afraid + s.frequency.monitoring + (1 | f1.name ), data = dat, REML=FALSE)
summary(glmm1)
model_dredge <- dredge(glmm1, beta = FALSE, evaluate = TRUE, rank = "AICc")
options(na.action = "na.omit") # set back to default
The error message is:
"Fixed term is "(Intercept)" Error in dredge(glmm1, beta = FALSE,
evaluate = TRUE, rank = "AICc") : result is empty"
Any ideas anyone what this message means and how to correct it?
Much appreciated!
Check the dimensions of your data frame. I had the same error message, and when I checked my data frame, there were more than a thousand "extra rows" full of NAs.
After subsetting the data frame to include only the true rows, everything worked well with the dredge function.
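The check and the subsetting step described above can be sketched in base R as follows. The toy data frame is hypothetical; only the column names are borrowed from the question.

```r
# Hypothetical data with two trailing rows that contain only NAs
dat <- data.frame(cpue_diff = c(0.5, -1.2, NA, NA),
                  year      = c(2010, 2011, NA, NA))

print(dim(dat))  # more rows than real observations

# A row is "extra" when every one of its columns is NA
all_na <- rowSums(!is.na(dat)) == 0
print(sum(all_na))

# Keep only the rows that hold at least one real value
dat_clean <- dat[!all_na, , drop = FALSE]
print(dim(dat_clean))
```

Rows with only some values missing are kept by this filter, so they would still need to be handled separately because dredge requires na.action = "na.fail".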

tune() function e1071 / libsvm -error with rpart

I am trying to tune rpart. I have already split my data into a training and cv set. The tune.rpart convenience function doesn't seem to have a way to specify a cv set, so I am using the regular tune() function.
I have 595 potential variables in my dataset, so I don't want to specify them using a formula. I get the following error when I do this:
Error in tune(rpart, train.x = trainset[, -1], train.y = trainset[, 1], :
Dependent variable has wrong type!
In addition: Warning message:
In if (y) ans$y <- Y :
the condition has length > 1 and only the first element will be used
Code:
load('train.dat')
load('cv.dat')
trainset$class<-factor(trainset$class)
cvset$class<-factor(cvset$class)
rpart.tune<-tune(rpart,train.x= trainset[,-1], train.y=trainset[,1],
validation.x=cvset[,-1], validation.y=cvset[,1],
ranges = list(
cp = c(0.002,0.005,0.01,0.015,0.02,0.03)),
tunecontrol = tune.control(sampling = "fix"))
Data is available at:
https://docs.google.com/folder/d/0B2_rKFnvrjMAM3FGbnFvZm5laUk/edit
You have to check the prediction output of the classifier you are using. For classification, tune() expects the true and predicted values to satisfy the following condition:
(is.logical(true.y) || is.factor(true.y)) && (is.logical(pred) || is.factor(pred) || is.character(pred))
The prediction with rpart, for example, produces a matrix as output. You can try svm, which handles that correctly, or try to give just two classes.
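To see the difference in prediction types, the sketch below fits a classification tree on the built-in iris data (a stand-in, not the asker's data) and applies the type condition quoted above to both kinds of prediction. The helper type_ok is a hypothetical name introduced here for illustration.

```r
library(rpart)  # recommended package shipped with standard R installations

fit <- rpart(Species ~ ., data = iris, method = "class")

pred_default <- predict(fit, iris)                  # class-probability matrix
pred_class   <- predict(fit, iris, type = "class")  # factor of predicted labels

true_y <- iris$Species

# Hypothetical helper that restates the condition quoted in the answer
type_ok <- function(pred) {
  (is.logical(true_y) || is.factor(true_y)) &&
    (is.logical(pred) || is.factor(pred) || is.character(pred))
}

print(type_ok(pred_default))
print(type_ok(pred_class))
```

According to ?tune, tune() also accepts a predict.func argument, so passing a wrapper such as function(...) predict(..., type = "class") may be worth trying; that call is not run here and should be checked against the installed e1071 version.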
