What does "argument to 'which' is not logical" mean in FactoMineR MCA? - r

I'm trying to run an MCA on a data.table using FactoMineR. It contains only 0/1 numeric columns, and its size is 200,000 rows by 20 columns.
require(FactoMineR)
result <- MCA(data[, colnames, with=F], ncp = 3)
I get the following error :
Error in which(unlist(lapply(listModa, is.numeric))) :
argument to 'which' is not logical
I didn't really know what to do with this error. Then I tried to turn every column to character, and everything worked. I thought it could be useful to someone else, and that maybe someone would be able to explain the error to me ;)
Cheers

Are the classes of your variables character or factor? I was having this problem. My solution was to change all variables to factor.
# my data.frame was "aux.da"
i <- 0
while (i < ncol(aux.da)) {
  i <- i + 1
  aux.da[, i] <- as.factor(aux.da[, i])
}
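A more compact way to do the same conversion, as a minimal sketch. It assumes a small made-up data frame named aux.da (hypothetical values) holding 0/1 numeric columns like the ones in the question, and uses only base R:

```r
# Hypothetical 0/1 data standing in for the real table
aux.da <- data.frame(a = c(0, 1, 1, 0), b = c(1, 1, 0, 0))

# Convert every column to factor in one step; the empty brackets
# keep aux.da a data.frame instead of replacing it with a list
aux.da[] <- lapply(aux.da, factor)

# Check that every column is now a factor before calling MCA()
all_factors <- all(sapply(aux.da, is.factor))
print(all_factors)
```

If all_factors is FALSE, the columns that failed the check are the ones to look at first.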

It's difficult to tell without further input, but what you can do is:
Find the function where the error occurred (via traceback()),
Set a breakpoint and debug it:
trace(tab.disjonctif, browser)
I did the following (offline) to find the name of tab.disjonctif:
Found the package on the CRAN mirror on GitHub
Searched its source for the particular expression that gives the error
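The message itself can be reproduced in base R without FactoMineR, which may help when reading the traceback. The sketch below assumes (this is a guess, not something confirmed from the package source) that the list passed to lapply() was empty, for example because no column was a factor. Here listModa is only a stand-in that reuses the name shown in the error:

```r
# Stand-in for the internal object named in the error; assumed empty here
listModa <- list()

# unlist() of an empty list is NULL, and which() requires a logical vector
flags <- unlist(lapply(listModa, is.numeric))

# Capture the error message instead of stopping the script
msg <- tryCatch(
  which(flags),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that assumption holds, it would also explain why converting the columns to character or factor makes the error disappear.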

I only started learning R yesterday, but the error comes from the fact that MCA is meant for categorical data, which is why your data cannot be numeric. To be more precise, before the MCA is run, a "tableau disjonctif" (in English, a complete disjunctive table) is created.
So FactomineR is using this function :
https://github.com/cran/FactoMineR/blob/master/R/tab.disjonctif.R
where I think it's looking for categorical values that can be matched to a numerical value (like Y = 1, N = 0).
For others, be careful: in R, categorical data means the factor type, so even if you have character columns you could get this error.
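As an illustration of what a complete disjunctive table looks like, here is a minimal base R sketch. The helper disjunctive() and the toy data are hypothetical and written for this example; this is not the code FactoMineR uses in tab.disjonctif:

```r
# Build a complete disjunctive (indicator) table from factor columns
disjunctive <- function(df) {
  blocks <- lapply(names(df), function(v) {
    f <- df[[v]]
    stopifnot(is.factor(f))  # categorical input is required
    # One column per level: 1 where the row has that level, 0 otherwise
    m <- outer(as.integer(f), seq_len(nlevels(f)), "==") * 1L
    colnames(m) <- paste(v, levels(f), sep = "_")
    m
  })
  do.call(cbind, blocks)
}

# Hypothetical toy data with two categorical variables
toy <- data.frame(
  smoker = factor(c("Y", "N", "Y")),
  sex    = factor(c("F", "M", "F"))
)
tab <- disjunctive(toy)
print(tab)
```

Each row contains exactly one 1 per original variable, which is why the input has to be categorical rather than numeric.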

To build off @marques, @Khaled, and @Pierre Gourseaud:
Yes, changing the format of your variables to factor should address the error message, but you shouldn't change the format of numerical data to factor if it's supposed to be continuous numerical data. Rather, if you have both continuous and categorical variables, try running a Factor Analysis for Mixed Data (FAMD) in the same FactoMineR package.
If you go the FAMD route, you can change the format of just your categorical variable columns to factor with this:
data[,c(3:5,10)] <- lapply(data[,c(3:5,10)] , factor)
(assuming column numbers 3,4,5 and 10 need to be changed).

This will not work for only numeric variables. If you only have numeric use PCA. Otherwise, add a factor variable to your data frame. It seems like for your case you need to change your variables to binary factors.

Same problem here, and changing to factor did not solve it either, because I had declared every variable as supplementary.
What I did first was transform all my numeric data to factor :
Xfac = factor(X[,1], ordered = TRUE)
for (i in 2:29){
tfac = factor(X[,i], ordered = TRUE)
Xfac = data.frame(Xfac, tfac)
}
colnames(Xfac)=labels(X[1,])
Still, it would not work. My second problem was that I had included EVERY factor as a supplementary variable!
So these :
MCA(Xfac, quanti.sup = c(1:29), graph=TRUE)
MCA(Xfac, quali.sup = c(1:29), graph=TRUE)
Would generate the same error, but this one works :
MCA(Xfac, graph=TRUE)
Not transforming the data to factors also generated the problem.
I posted the same answer to a related topic : https://stackoverflow.com/a/40737335/7193352

Related

How to solve error factor has bad level in R

I have some difficulties applying the inhomogeneous G-function to my point pattern in R.
In order to use GmultiInhom, I first tried to convert my point pattern bci.tree8pppa to a multitype pattern:
bci.tree8multi = ppp(bci.tree8pppa$x, bci.tree8pppa$y, window=owin(c(0,1000), c(0,500)), marks = factor(bci.tree8pppa$marks[,3]))
Then applied the G-function as follows:
G = GmultiInhom(bci.tree8multi, marks(bci.tree8multi) == species1, marks(bci.tree8multi) == species2, lambdaI = lambda1points, lambdaJ = lambda2points, lambdamin = min(lambda2points), r = c(0,r1,r2,r3))
But this yields the error: "Error in split.default(X, group) : factor has bad level"
How can I solve this?
Thank you in advance!
For the benefit of any R programmers out there: I traced the error message "factor has bad level" to the C source code for .Internal(split.default(x,f)) in the base R system. This error message can occur only when f is a list rather than a factor. The code converts f to a factor using the function interaction which performs character string manipulations. This conversion can go wrong, in the sense that the resulting factor has "bad levels": the integer representation of the factor includes values less than 1 or greater than the number of levels of the factor. Then the error occurs.
The original post has not provided a working example, so it's difficult to figure out exactly how the code in spatstat::GmultiInhom caused the wrong type of data to be fed to split.default. However, it must be related to the misuse of the argument r. The code in spatstat will be tightened to enforce stricter requirements on the format of r.
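To illustrate the conversion described above, here is a small base R sketch with made-up vectors: when f is a list, split() groups by the combinations of its elements, as interaction() would. It only shows the normal, working case, not the failure inside GmultiInhom:

```r
x  <- 1:4
f1 <- c("a", "a", "b", "b")
f2 <- c("x", "y", "x", "y")

# f given as a list: groups are the combinations of f1 and f2
groups <- split(x, list(f1, f2))

# The equivalent explicit factor built with interaction()
g <- interaction(f1, f2)
print(names(groups))
print(levels(g))
```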

Coercing a vector to numeric mode in R

So, I have a set of data, and what I'm trying to do is find all the local maxima on the resulting curve. I read in a CSV file, which has x-values in the first column and y-values in the second, first step done, easy.
To find the maxima, I tried to use the findpeaks() function from the pracma package. However, each time I tried to run it, I got the same error:
Error: is.vector(x, mode = "numeric") is not TRUE
So, I first tried just converting this to a vector. Still got the same issue, however is.vector(x, mode = "any") was now returning true. I found some other help threads (which I can no longer find, so I can't share them, sorry!), and decided to try using lapply to coerce each entry in the new vector using as.numeric. Didn't work. Looked into ?as.numeric, and it mentioned that as.double might be better suited. Didn't work. Now I'm at a loss and not sure what to do - current working code is shown below.
plot <- read_csv("AFGP60 UV-05-04-16.csv",
col_names = FALSE, na = "null", skip = 2,n_max = numrow)
diffplot <- c(plot[1:601,2])
diffplot <- lapply(diffplot,as.double)
findpeaks(diffplot)
Try diffplot <- as.numeric(as.vector(plot[1:600, 2])).
The problem was that the data was read as character or as factor. The above code should change that. However, there are multiple issues with your code. First, plot is a base function used for plotting. Naming a variable with such a name is bad practice.
Second, the diffplot variable is a vector (first 600 rows from the second column), so there is no need to change each element separately with the lapply function.
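The check that fails can be explored in base R. The sketch below uses a made-up data frame (hypothetical values, with the y column read as character) and mimics what c() on a one-column table returns, namely a list; whether this matches the original file is an assumption:

```r
# Hypothetical data: y values stored as character, as after a bad import
spectrum <- data.frame(
  x = 1:5,
  y = c("0.1", "0.5", "0.2", "0.9", "0.3"),
  stringsAsFactors = FALSE
)

# c() on a one-column data frame gives a list, not a numeric vector
as_list <- c(spectrum[1:5, 2, drop = FALSE])
list_ok <- is.vector(as_list, mode = "numeric")

# Extract the column itself with [[ ]], then coerce to numeric
y <- as.numeric(spectrum[[2]][1:5])
vec_ok <- is.vector(y, mode = "numeric")

print(c(list_ok, vec_ok))
```

The `[[` extraction returns the column as a plain vector whether the table is a data.frame or a tibble, so the result should satisfy the same is.vector() test that findpeaks() reports.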

Error in family$linkinv(eta) : Argument eta must be a nonempty numeric vector

The reason the title of the question is the error I am getting is because I simply do not know how to interpret it, no matter how much I research. Whenever I run a logistic regression with bigglm() (from the biglm package, designed to run regressions over large amounts of data), I get:
Error in family$linkinv(eta) : Argument eta must be a nonempty numeric vector
This is what my bigglm() call looks like:
fit <- bigglm(f, data = df, family=binomial(link="logit"), chunksize=100, maxit=10)
Where f is the formula and df is the dataframe (of little over a million rows and about 210 variables).
So far I have tried changing my dependent variable to a numeric class but that didn't work. My dependent variable has no missing values.
Judging from the error message I wonder if this might have to do anything with the family argument in the bigglm() function. I have found numerous other websites with people asking about the same error and most of them are either unanswered, or for a completely different case.
The error "Argument eta must be a nonempty numeric vector" looks to me like your data has either empty values or NA, so please check your data. Whatever advice we provide here cannot be tested until we see your code or the steps that result in the error.
Try this:
anyNA(df) # if TRUE, the data contains NA values
df[is.na(df)] <- 0 # replaces NA with 0; not sure what effect this will have on your model
Or, on whatever line of code is generating the NAs, pass the na.rm = TRUE argument.
Again, we can only speculate. Hope it helps.
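Before replacing anything, it may help to see where the missing values are. A minimal base R sketch on a made-up data frame (hypothetical column names and values):

```r
# Hypothetical data with some missing values
df <- data.frame(
  y = c(1, 0, NA, 1),
  x = c(2.5, NA, NA, 1.0)
)

# Number of NA values in each column
na_counts <- colSums(is.na(df))
print(na_counts)

# Alternative to filling with 0: keep only the rows without any NA
complete <- df[complete.cases(df), ]
print(nrow(complete))
```

Dropping incomplete rows changes the sample rather than the values, which may or may not be preferable to filling with 0 for a given model.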

R programming MCA() in FactoMineR error message

I was using the MCA() function from FactoMineR package in R to do the multiple correspondence analysis on a set of around 160 variables with around 2000 observations. Around 150 of the variables are continuous, so I first used the cut() function to convert those continuous variables to categorical variables and then used MCA() function.
My code is very simple like this:
library(FactoMineR)
data<-read.csv('demographics.csv')
for (i in 9:length(data)){
temp<-unlist(data[i],use.names=FALSE)
data[i]<-cut(temp,breaks=5,labels=c('A','B','C','D','E'))
}
MC<-MCA(data,ncp=10,graph=TRUE)
After I run the code, I got the following error message.
Error in dimnames(res) <- list(attributes(tab)$row.names, listModa) : length of 'dimnames' [2] not equal to array extent
I am wondering why this error occurs and how to fix it. There is no missing data in my table and all of the variables are categorical.
If anyone has encountered similar problems and would love to help, I would really appreciate it. Thanks a lot.
I have had this error before because the function requires the variables to be factors (and the data I was passing it wasn't fully converted into factors). Unlike a lot of other R functions, this one does not convert the data for you even if all columns are categorical.
I'm not quite sure what your data is, but it is likely that one or more columns are not factors. If your columns 1 to 8 are already factors, then the cause may be in the read.csv call: string variables are converted into factors when you read them in from the CSV (the default before R 4.0.0; later versions need stringsAsFactors = TRUE), but numeric ones are not.
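One way to find the offending columns, sketched in base R. The CSV contents below are made up and stand in for demographics.csv; the two-level cut is likewise only an illustration:

```r
# Hypothetical file contents: one string column, one numeric column
csv_text <- "group,score\nA,1.5\nB,2.0\nA,3.5"
data <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Names of the columns that are not factors yet
not_factor <- names(data)[!sapply(data, is.factor)]
print(not_factor)

# Convert only those columns; cut() returns a factor
for (v in not_factor) {
  data[[v]] <- cut(data[[v]], breaks = 2, labels = c("low", "high"))
}
print(sapply(data, class))
```

Running the same check on columns 1 to 8 of the real data would show whether any of them was left as numeric or character.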

How does the first argument of the following `t.test()` work? How can it be a numeric vector of data values?

For the built-in data frame ToothGrowth:
tooth.1mg <- subset(ToothGrowth, dose == 1)
tt <- t.test(len ~ supp, tooth.1mg, alternative = "two.sided",
             var.equal = FALSE, conf.level = 0.95)
How does len ~ supp work? What does the symbol ~ indicate? Why can't I write supp ~ len?
This is what it does with a formula interface. Also see ?t.test.
Should be clear from that and the error message why supp~len can't work - the rhs of the formula is supposed to be a categorical variable, i.e. a factor, with precisely two categories (and lhs are the values in each category).
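A short sketch of what the formula does, using only the built-in ToothGrowth data: len ~ supp means "len grouped by supp", so it gives the same test as passing the two groups of len as separate numeric vectors. The variable names oj, vc, tt2 and n_groups are introduced here for illustration:

```r
tooth.1mg <- subset(ToothGrowth, dose == 1)

# The right-hand side must define exactly two groups
n_groups <- nlevels(tooth.1mg$supp)

# len is split by supp, then the two group means are compared (Welch test)
tt <- t.test(len ~ supp, data = tooth.1mg)

# Same test written with two numeric vectors instead of a formula
oj <- tooth.1mg$len[tooth.1mg$supp == "OJ"]
vc <- tooth.1mg$len[tooth.1mg$supp == "VC"]
tt2 <- t.test(oj, vc)

print(tt$p.value)
print(tt2$p.value)
```

Written the other way round, supp ~ len would ask for supp to be grouped by len, and len is a numeric variable with many distinct values rather than a two-level factor.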

Resources