R programming MCA() in FactoMineR error message - r

I was using the MCA() function from FactoMineR package in R to do the multiple correspondence analysis on a set of around 160 variables with around 2000 observations. Around 150 of the variables are continuous, so I first used the cut() function to convert those continuous variables to categorical variables and then used MCA() function.
My code is very simple like this:
library(FactoMineR)
data<-read.csv('demographics.csv')
for (i in 9:length(data)){
temp<-unlist(data[i],use.names=FALSE)
data[i]<-cut(temp,breaks=5,labels=c('A','B','C','D','E'))
}
MC<-MCA(data,ncp=10,graph=TRUE)
After running the code, I got the following error message:
Error in dimnames(res) <- list(attributes(tab)$row.names, listModa) : length of 'dimnames' [2] not equal to array extent
I am wondering why this error occurs and how to fix it. There is no missing data in my table and all of the variables are categorical.
If anyone has encountered similar problems and would love to help, I would really appreciate it. Thanks a lot.

I have had this error before because the function requires the variables to be factors (and the data I was passing it wasn't fully converted into factors). Unlike a lot of other R functions, this one does not convert the data for you even if all columns are categorical.
I'm not quite sure what your data is, but it is likely that one or more columns are not factor variables. Your loop only converts columns 9 onward, so if columns 1 to 8 are not already factors, the cause may be in the read.csv call: string variables are automatically converted into factors when you read them in from the csv (in R versions before 4.0.0, where stringsAsFactors defaults to TRUE), but numeric ones are not.
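As a rough illustration of the point above, the following sketch checks which columns of a data frame are not yet factors and converts them before the table would be handed to MCA(). The data frame demo and its column names are made up for the example. Only base R is used, and the MCA() call is left as a comment.

```r
# Hypothetical stand-in for the asker's data: a string column,
# a numeric code column and a continuous column.
demo <- data.frame(
  region = c("north", "south", "north", "east", "south", "east"),
  kids   = c(0, 2, 1, 2, 0, 1),
  income = c(21.5, 48.0, 33.2, 75.9, 52.4, 18.7),
  stringsAsFactors = FALSE
)

# Which columns are not factors yet?
not_factor <- names(demo)[!vapply(demo, is.factor, logical(1))]
print(not_factor)

# Bin the continuous column, as in the question
demo$income <- cut(demo$income, breaks = 3, labels = c("low", "mid", "high"))

# Convert every remaining non-factor column to a factor
demo[] <- lapply(demo, function(col) if (is.factor(col)) col else factor(col))

# Every column is a factor now, so the table could be passed to, e.g.,
# FactoMineR::MCA(demo, ncp = 2, graph = FALSE)
str(demo)
```

Whether a numeric code column such as kids should be treated as categorical is a modelling decision; the sketch only shows the mechanics of the conversion.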

Related

PCA in R: Error in svd(x, nu=0, nv=k) : Infinite or missing values in 'x'

My dataframe contains about 26k rows with 129 variables. I've made sure all of the variables are numeric and do not have any NA values (used na.omit). Using the function prcomp() on my dataframe tells me "Infinite or missing values in x". What might I be overlooking then?
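Did you also make sure none of them are infinite? That is the other part of the message.
You can easily check this with the line below (is.finite() does not accept a data frame directly, so convert it to a matrix first):
all( is.finite( as.matrix( your.data.frame ) ) )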
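A per-column count can make it easier to see where the non-finite values are. The sketch below uses a small made-up data frame dat; the column names and values are assumptions for illustration only.

```r
# Hypothetical data with one Inf and one NaN
dat <- data.frame(a = c(1, 2, Inf), b = c(0.5, NaN, 3), c = c(1, 2, 3))

# Number of non-finite entries (NA, NaN, Inf, -Inf) in each column
bad_counts <- vapply(dat, function(col) sum(!is.finite(col)), integer(1))
print(bad_counts)

# Keep only the rows in which every value is finite
clean <- dat[rowSums(!is.finite(as.matrix(dat))) == 0, , drop = FALSE]
print(clean)
```

Note that na.omit() removes rows containing NA or NaN but leaves Inf and -Inf in place, which is one way a data frame can pass an NA check and still trigger this error.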

How to solve error factor has bad level in R

I have some difficulties applying the inhomogeneous G-function to my point pattern in R.
In order to use GmultiInhom, I first tried to convert my point pattern bci.tree8pppa to a multitype pattern:
bci.tree8multi = ppp(bci.tree8pppa$x, bci.tree8pppa$y, window=owin(c(0,1000), c(0,500)), marks = factor(bci.tree8pppa$marks[,3]))
Then applied the G-function as follows:
G = GmultiInhom(bci.tree8multi, marks(bci.tree8multi) == species1, marks(bci.tree8multi) == species2, lambdaI = lambda1points, lambdaJ = lambda2points, lambdamin = min(lambda2points), r = c(0,r1,r2,r3))
But this yields the error: "Error in split.default(X, group) : factor has bad level"
How can I solve this?
Thank you in advance!
For the benefit of any R programmers out there: I traced the error message "factor has bad level" to the C source code for .Internal(split.default(x,f)) in the base R system. This error message can occur only when f is a list rather than a factor. The code converts f to a factor using the function interaction which performs character string manipulations. This conversion can go wrong, in the sense that the resulting factor has "bad levels": the integer representation of the factor includes values less than 1 or greater than the number of levels of the factor. Then the error occurs.
The original post has not provided a working example, so it's difficult to figure out exactly how the code in spatstat::GmultiInhom caused the wrong type of data to be fed to split.default. However, it must be related to the misuse of the argument r. The code in spatstat will be tightened to enforce stricter requirements on the format of r.

Error in family$linkinv(eta) : Argument eta must be a nonempty numeric vector

The title of this question is the error I am getting, because I simply do not know how to interpret it, no matter how much I research. Whenever I run a logistic regression with bigglm() (from the biglm package, designed to run regressions over large amounts of data), I get:
Error in family$linkinv(eta) : Argument eta must be a nonempty numeric vector
This is what my bigglm() call looks like:
fit <- bigglm(f, data = df, family=binomial(link="logit"), chunksize=100, maxit=10)
Here f is the formula and df is the data frame (a little over a million rows and about 210 variables).
So far I have tried changing my dependent variable to a numeric class but that didn't work. My dependent variable has no missing values.
Judging from the error message I wonder if this might have to do anything with the family argument in the bigglm() function. I have found numerous other websites with people asking about the same error and most of them are either unanswered, or for a completely different case.
To me, the error "Argument eta must be a nonempty numeric vector" looks like your data has either empty values or NAs, so please check your data. Whatever advice we provide here cannot be tested until we see your code or the steps that result in the error.
Try this:
anyNA(df) # if TRUE, the data contains NAs
df[is.na(df)] <- 0 # replaces NAs with 0; not sure what effect this will have on your model
Or, for whichever line of code is generating the NAs, pass the na.rm=TRUE argument.
Again, we can only speculate. Hope it helps.
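Before overwriting anything, it may help to see where the missing values actually are. The following is a minimal sketch on a made-up data frame survey (the names and values are assumptions): it counts NAs per column and keeps only complete rows rather than replacing NAs with 0.

```r
# Hypothetical data with a missing response and a missing predictor
survey <- data.frame(
  y  = c(1, 0, 1, NA, 0),
  x1 = c(2.3, NA, 1.1, 0.7, 4.2),
  x2 = c("a", "b", "a", "b", "a")
)

# Number of NAs in each column
na_per_column <- colSums(is.na(survey))
print(na_per_column)

# Keep only the rows without any NA
complete <- survey[complete.cases(survey), ]
print(nrow(complete))
```

Dropping rows and replacing NAs with 0 lead to different models, so which one is appropriate depends on the data. This check alone does not confirm that NAs are the cause of the bigglm() error.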

What does "argument to 'which' is not logical" mean in FactoMineR MCA?

I'm trying to run an MCA on a data.table using FactoMineR. It contains only 0/1 numeric columns, and its size is 200,000 x 20.
require(FactoMineR)
result <- MCA(data[, colnames, with=F], ncp = 3)
I get the following error :
Error in which(unlist(lapply(listModa, is.numeric))) :
argument to 'which' is not logical
I didn't really know what to do with this error. Then I tried to turn every column to character, and everything worked. I thought it could be useful to someone else, and that maybe someone would be able to explain the error to me ;)
Cheers
Are the classes of your variables character or factor? I was having this problem. My solution was to change all variables to factor.
#my data.frame was "aux.da"
i=0
while(i < ncol(aux.da)){
i=i+1
aux.da[,i] = as.factor(aux.da[,i])
}
It's difficult to tell without further input, but what you can do is:
Find the function where the error occurred (via traceback()),
Set a breakpoint and debug it:
trace(tab.disjonctif, browser)
I did the following (offline) to find the name of tab.disjonctif:
Found the package on the CRAN mirror on GitHub,
Searched for the particular expression that gives the error.
I just started to learn R yesterday, but the error comes from the fact that MCA is for categorical data, which is why your data cannot be numeric. To be more precise, before the MCA a "tableau disjonctif" (in English: complete disjunctive table) is created.
So FactoMineR is using this function:
https://github.com/cran/FactoMineR/blob/master/R/tab.disjonctif.R
where I think it's looking for categorical values that can be matched to a numerical value (like Y = 1, N = 0).
For others, be careful: in R, categorical data corresponds to the factor type, so even if you have characters you could get this error.
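To make the idea of a complete disjunctive table concrete, here is a small base R sketch that builds one by hand: each factor level becomes its own 0/1 indicator column. This is only an illustration of the concept, not FactoMineR's tab.disjonctif implementation, and the data frame answers is made up.

```r
# Hypothetical categorical data
answers <- data.frame(
  smoker = factor(c("Y", "N", "Y")),
  sex    = factor(c("F", "F", "M"))
)

# One block of 0/1 indicator columns per factor, one column per level
indicator <- do.call(cbind, lapply(names(answers), function(nm) {
  f <- answers[[nm]]
  block <- outer(as.character(f), levels(f), "==") * 1L
  colnames(block) <- paste(nm, levels(f), sep = "_")
  block
}))

print(indicator)

# Each row has exactly one 1 per original variable
print(rowSums(indicator))
```

The construction relies on levels(), which is why the columns need to be factors: a plain numeric 0/1 column has no levels from which to build the indicator columns.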
To build off @marques, @Khaled, and @Pierre Gourseaud:
Yes, changing the format of your variables to factor should address the error message, but you shouldn't change the format of numerical data to factor if it's supposed to be continuous numerical data. Rather, if you have both continuous and categorical variables, try running a Factor Analysis for Mixed Data (FAMD) in the same FactoMineR package.
If you go the FAMD route, you can change the format of just your categorical variable columns to factor with this:
data[,c(3:5,10)] <- lapply(data[,c(3:5,10)] , factor)
(assuming column numbers 3,4,5 and 10 need to be changed).
MCA will not work with only numeric variables. If you only have numeric variables, use PCA. Otherwise, add a factor variable to your data frame. In your case, it seems you need to change your variables to binary factors.
I had the same problem, and changing to factor did not solve it either, because I had declared every variable as supplementary.
What I did first was transform all my numeric data to factor:
Xfac = factor(X[,1], ordered = TRUE)
for (i in 2:29){
tfac = factor(X[,i], ordered = TRUE)
Xfac = data.frame(Xfac, tfac)
}
colnames(Xfac)=labels(X[1,])
Still, it would not work. My second problem was that I had included EVERY factor as a supplementary variable!
So these:
MCA(Xfac, quanti.sup = c(1:29), graph=TRUE)
MCA(Xfac, quali.sup = c(1:29), graph=TRUE)
would generate the same error, but this one works:
MCA(Xfac, graph=TRUE)
Not transforming the data to factors also generated the problem.
I posted the same answer to a related topic : https://stackoverflow.com/a/40737335/7193352

R using cell values from a data frame as arguments in an already defined custom function

I am relatively new to R and programming in general, so my question might be due to a lack of experience and cryptic error messages. I have done a fair amount of investigation and experimenting with different versions of apply and functions in the plyr package. The root of my question is how to have the value from a cell in a data frame be supplied as an argument in my function? I'll do my best to provide example data.
I am working with survey data in R, so I have a data frame with many columns and rows. I created a custom function to process some of the data. I run the script for the function first, so that it is loaded in the workspace in RStudio. The function has two arguments:
myfunction <- function(id, info){
# various data processing
}
myfunction does not return anything. When using real data, it outputs some .CSVs for me, so I don't need to get anything back from it - just need it to run using the values from every row.
For the sake of this example, let's say my data frame (called mydata) only has two columns (and in fact, I can subset it down to just these two columns in the overall process if needed for the solution).
ID Gender
1 M
2 F
3 F
4 M
What I would like to happen, is have R go through each row and provide the values of the cells as the two arguments in myfunction:
# So for the first row, it should do
myfunction("1", "M")
# And the second:
myfunction("2", "F")
The closest I've gotten is this:
a_ply(mydata, c(1,2), print)
ID
1 1
2 2
3 3
4 4
Gender
1 M
2 F
3 F
4 M
Which seems like it is in the right direction, but whenever I put myfunction in the a_ply I can't get it to work the way I want. I either get this error message:
Error in eval(expr, envir, enclos) : object 'X' not found
## Which I believe is actually an error from myfunction, which would mean the
## ID value is not passing through to it correctly
Or when playing around with different versions of that a_ply command, I get this error:
Error in file(file, "rt") : invalid 'description' argument
Thanks in advance for any help. I've been able to make it this far by reading documentation and lots of other posts here, but I can't seem to find anything explaining this.
(For completion and closing the question):
apply(mydata,1, function(x) myfunction(x[1],x[2]))
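One thing to keep in mind with the apply() approach is that the data frame is first coerced to a matrix, so with mixed column types every value reaches the function as a character string. A possible alternative, sketched below, is to iterate over the columns in parallel with Map(). The body of myfunction here is a hypothetical stand-in that just records its arguments, since the real function is not shown in the question.

```r
# Example data from the question
mydata <- data.frame(
  ID     = 1:4,
  Gender = c("M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the real function: it only records each call
calls <- character(0)
myfunction <- function(id, info) {
  calls <<- c(calls, paste(id, info))
  invisible(NULL)
}

# Call myfunction once per row, pairing the columns element by element
invisible(Map(myfunction, as.character(mydata$ID), mydata$Gender))

print(calls)
```

Map() pairs the two vectors element by element, so the first call receives "1" and "M", the second "2" and "F", and so on. The as.character() conversion is only there to match the string arguments shown in the question.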
