Error when trying to make a confusion matrix in R

cm = table(obs = test[,14], pred)
Error in if (xi > xj) 1L else -1L: missing value where TRUE/FALSE needed
I am trying to output the confusion matrix of my random forest model on the testing data, but I'm getting this error. Any ideas what the issue might be?
Thank you in advance!

The error message tells us that one of the elements of test[,14] or pred is missing (NA), and the table() function you are using cannot handle missing values. You should be able to get a confusion matrix by first dropping the elements of both vectors at positions where either one is NA.
Note that the table() function you are using does not seem to be base R's table(); I expect it comes from a package you have loaded.
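A minimal sketch of the suggested fix, using made-up obs and pred vectors in place of test[,14] and the model predictions: drop the positions where either vector is NA, then tabulate.

```r
# Hypothetical stand-ins for test[,14] and the random forest predictions
obs  <- c("a", "b", NA, "a", "b")
pred <- c("a", "b", "a", NA, "b")

# Keep only positions where BOTH vectors are non-missing
keep <- !is.na(obs) & !is.na(pred)
cm <- table(obs = obs[keep], pred = pred[keep])
cm
```

With these toy vectors, two positions are dropped and the remaining three observations are tabulated.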

Related

prcomp() function is giving me error "infinite or missing values in 'x'"

Eset is built from an expression matrix and I'm trying to run a PCA, but I keep getting this error. I thought maybe it was because "exprs" is not numeric, but I checked and it is double. How can I solve this?
Eset <- ExpressionSet(as.matrix(exp))
pData(Eset) <- meta
featureData(Eset) <- as(feat, "AnnotatedDataFrame")
exprs <- Biobase::exprs(Eset)
exprs <- t(exprs)
exprs <- as.numeric(exprs)
typeof(exprs)
PCA <- prcomp(exprs, scale = FALSE)
I was not expecting this error because I made sure that "exprs" was numeric, but it's still not working.
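A diagnostic sketch (with a made-up matrix, since the data isn't available): note that as.numeric() on a matrix drops its dimensions, leaving a plain vector, and prcomp() fails on any non-finite values. Coercing in place with storage.mode() keeps the matrix shape, and non-finite columns can be checked for and dropped before the PCA.

```r
# Toy matrix standing in for the transposed expression matrix
m <- matrix(c(1, 2, 3, 4, NA, 6), nrow = 2)

storage.mode(m) <- "double"   # coerces to numeric WITHOUT dropping dims
any(!is.finite(m))            # TRUE here: prcomp() would error on this

# Drop columns containing NA/NaN/Inf, then run the PCA
m_clean <- m[, colSums(!is.finite(m)) == 0, drop = FALSE]
pca <- prcomp(m_clean, scale = FALSE)
```

Whether dropping columns (genes) is appropriate depends on the analysis; imputation is the usual alternative.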

Compute the mean in R subject to conditions

I tried following the advice in this post (Conditional mean statement), but it did not work for me; I get the error Error in x[j] : only 0's may be mixed with negative subscripts.
I have a data frame called data with several columns and many rows. One column, z, is an indicator taking the value 0 or 1. I want to compute the mean of the column base conditional on z, i.e. separately for z=0 and z=1. I used the following line of code:
mean(data[data$z==1, data$base], na.rm = TRUE)
But as mentioned, I get the error Error in x[j] : only 0's may be mixed with negative subscripts. I don't actually understand the error, or what I could/should do instead.
Thanks.
The comment by GKi solved my problem. In the end I used
mean(data$base[data$z==1], na.rm = TRUE)
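A toy example illustrating the fix (column names match the question, values are made up): the original mean(data[data$z==1, data$base], ...) uses the values of base as column indices, which is what triggers the subscript error. Subsetting the column itself works:

```r
# Illustrative data frame with an indicator z and a value column base
data <- data.frame(z    = c(0, 1, 1, 0, 1),
                   base = c(10, 20, NA, 40, 60))

# Subset the base column by the logical condition, then average
mean(data$base[data$z == 1], na.rm = TRUE)   # mean of 20 and 60, i.e. 40
```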

Imputation using robCompositions gives error message

I have a dataset with a rather large portion of missing values. I'm trying to impute them using robCompositions, but I keep getting the error message: "Error in quantile.default(d, k/length(d)) : missing values and NaN's not allowed if 'na.rm' is FALSE". This does not make sense to me: why would missing values not be allowed if I'm trying to impute missing values? Here's a small subset of the data, with code to reproduce the error:
library(robCompositions)
p <- c(1.000000, 2.083333, 1.333333, 1.166667, 4.250000, 1.083333, 2.083333, 1.166667, 1.000000, 1.000000)
i <- c(1101.25, 1675.00, 2500.00, 1612.50, NA, 1750.00, 600.00, 0.00, 1530.00, 3158.50)
s <- c(34000, 1550, NA, 2750, 375, 1750, 30000, 20000, NA, NA)
x <- data.frame(p,i,s)
imp <- impCoda(x)
After contacting one of the authors of robCompositions it became apparent that I need to use irmi() from the VIM package to impute data for non-compositional models.
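A sketch of that workaround: VIM's irmi() (iterative robust model-based imputation) is designed for non-compositional data with missing values. The call is guarded so it only runs where VIM is installed; the data frame is the subset from the question.

```r
# Rebuild the question's data subset (4 NAs across columns i and s)
p <- c(1.000000, 2.083333, 1.333333, 1.166667, 4.250000,
       1.083333, 2.083333, 1.166667, 1.000000, 1.000000)
i <- c(1101.25, 1675.00, 2500.00, 1612.50, NA, 1750.00, 600.00, 0.00, 1530.00, 3158.50)
s <- c(34000, 1550, NA, 2750, 375, 1750, 30000, 20000, NA, NA)
x <- data.frame(p, i, s)

# Impute with VIM::irmi() instead of robCompositions::impCoda()
if (requireNamespace("VIM", quietly = TRUE)) {
  imp <- VIM::irmi(x)   # returns the data with NAs filled in
}
```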

Error in family$linkinv(eta) : Argument eta must be a nonempty numeric vector

The title of this question is the error itself because I simply do not know how to interpret it, no matter how much I research. Whenever I run a logistic regression with bigglm() (from the biglm package, designed to run regressions over large amounts of data), I get:
Error in family$linkinv(eta) : Argument eta must be a nonempty numeric vector
This is what my bigglm() call looks like:
fit <- bigglm(f, data = df, family=binomial(link="logit"), chunksize=100, maxit=10)
where f is the formula and df is the data frame (a little over a million rows and about 210 variables).
So far I have tried changing my dependent variable to a numeric class but that didn't work. My dependent variable has no missing values.
Judging from the error message I wonder if this might have to do anything with the family argument in the bigglm() function. I have found numerous other websites with people asking about the same error and most of them are either unanswered, or for a completely different case.
The error Argument eta must be a nonempty numeric vector suggests to me that your data contains empty values or NAs, so please check your data. Whatever advice we provide here cannot be tested until we see your code or the steps that lead to the error.
Try this:
any(is.na(df))      # TRUE means NAs are present
df[is.na(df)] <- 0  # not sure replacing NA with 0 is appropriate for your model
Or pass the na.rm = TRUE argument to whichever line of code is generating the NAs. Again, we can only speculate. Hope it helps.
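A more targeted diagnostic sketch (df here is a made-up stand-in for the real data frame): rather than blanket-replacing NAs with 0, first find which columns actually contain them, then decide between imputation and dropping incomplete rows.

```r
# Illustrative data frame with one NA in predictor x1
df <- data.frame(y  = c(1, 0, 1, 0),
                 x1 = c(2.5, NA, 3.1, 4.0),
                 x2 = c(1, 2, 3, 4))

# Count NAs per column; only report the offenders
na_counts <- colSums(is.na(df))
na_counts[na_counts > 0]

# One option: keep only rows with no missing values at all
df_complete <- df[complete.cases(df), ]
```

For a million-row data frame this per-column summary is far more informative than printing is.na(df) itself.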

Error in the huge R package with criterion "stars"

I am trying to build an association network from some expression data I have. The data is really large: roughly 300 samples and 30,000 genes. I would like to apply a Gaussian graphical model to it using the huge R package.
Here is the code I am using:
dim(data)
#[1] 317 32291
huge.out <- huge.npn(data)
huge.stars <- huge.select(huge.out, criterion="stars")
However in this last step I got an error:
Error in cor(x) : ling....in progress:10%
Missing values present in input variable 'x'. Consider using use = 'pairwise.complete.obs'
Any help would be very appreciated.
You posted this exact question on Rhelp today. Both SO and Rhelp deprecate cross-posting but if you do choose to switch venues it is at the very least courteous to inform the readership.
You responded to the suggestion here on SO that there were missing data in your data-object named 'data' by claiming there were no missing data. So what does this code return:
lapply(data , function(x) sum(is.na(x)))
That would be a first-level check, but the error could also be caused by a later step that encountered a missing value in the matrix of correlation coefficients computed from 'huge.out'. That could happen if there were (a) infinities in the calculations or (b) a column that is constant:
> cor(c(1:10,Inf), 1:11)
[1] NaN
> cor(rep(2,7), rep(2,7))
[1] NA
Warning message:
In cor(rep(2, 7), rep(2, 7)) : the standard deviation is zero
So the next check is:
sum( is.na(huge.out) )
That will at least give you some basis for defending your claim of no missings, and will also give you a plausible theory as to the source of the error. To locate a column that is entirely constant (i.e. has only one unique value) you might do something like this (assuming it were a data frame):
which(sapply(sapply(data, unique), length) == 1)
If it's a matrix, you need to use apply.
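A sketch of the matrix version of that constant-column check, using a small made-up matrix: apply() over columns, flagging any column with a single unique value.

```r
# Toy matrix: column b is constant, a and c vary
m <- cbind(a = c(1, 2, 3),
           b = c(2, 2, 2),
           c = c(0, 1, 0))

# TRUE for columns with exactly one unique value (zero standard deviation)
constant <- apply(m, 2, function(x) length(unique(x)) == 1)
which(constant)   # names the constant column(s)
```

Any column flagged here would produce an NA correlation (with the "standard deviation is zero" warning shown above) and should be removed before huge.select().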
