Compute the mean in R subject to conditions

Compute the mean in R subject to conditions - r

I tried following the advice on this post (Conditional mean statement), however it did not seem to work for me. I get the error Error in x[j] : only 0's may be mixed with negative subscripts.
So I have a database called data with a few different columns and many rows. There is one indicator column, z, which takes value 0 or 1. I want to compute the mean of the column base if z depending on whether z=0 or z=1. So I have used the following line of code:
mean(data[data$z==1, data$base], na.rm = TRUE)
But as mentioned, I get the error Error in x[j] : only 0's may be mixed with negative subscripts. I'm unsure why I am getting this error, or what I could/should do instead. I do not actually understand the error.
Thanks.

Using the comment by GKi worked and solved my problem. In the end I used
mean(data$base[data$z==1], na.rm = TRUE)

Related

Error when trying to make confusion matrix

cm = table(obs = test[,14], pred)
Error in if (xi > xj) 1L else -1L: missing value where TRUE/FALSE needed
I am trying to output the confusion matrix of my random forest model on the testing data, but I'm getting this error. Any ideas what the issue might be?
Thank you in advance!

The error function tells us that one of the items in test[,14] or pred is missing (NA), and the table() function you are using cannot handle missing values. I expect you can get a confusion matrix by first eliminating elements of both vectors where either vector is NA.
Note that the table() function you are using does not seem to be the base R table() function. I expect it is part of a package you have loaded.

Error in family$linkinv(eta) : Argument eta must be a nonempty numeric vector

The reason the title of the question is the error I am getting is because I simply do not know how to interpret it, no matter how much I research. Whenever I run a logistic regression with bigglm() (from the biglm package, designed to run regressions over large amounts of data), I get:
Error in family$linkinv(eta) : Argument eta must be a nonempty numeric vector
This is how my bigglm() function looks like:
fit <- bigglm(f, data = df, family=binomial(link="logit"), chunksize=100, maxit=10)
Where f is the formula and df is the dataframe (of little over a million rows and about 210 variables).
So far I have tried changing my dependent variable to a numeric class but that didn't work. My dependent variable has no missing values.
Judging from the error message I wonder if this might have to do anything with the family argument in the bigglm() function. I have found numerous other websites with people asking about the same error and most of them are either unanswered, or for a completely different case.

The error Argument eta must be a nonempty numeric vector to me looks like your data has either empty values or NA. So, please check your data. Whatever advice we provide here, cannot be tested until we see your code or the steps involved resulting an error.
try this
is.na(df) # if TRUE, then replace them with 0
df[is.na(df)] <- 0 # Not sure replacing NA with 0 will have effect on your model
or whatever line of the code is resulting in NAs generation pass na.rm=Targument
Again, we can only speculate. Hope it helps.

What does "argument to 'which' is not logical" mean in FactoMineR MCA?

I'm trying to run an MCA on a datatable using FactoMineR. It contains only 0/1 numerical columns, and its size is 200.000 * 20.
require(FactoMineR)
result <- MCA(data[, colnames, with=F], ncp = 3)
I get the following error :
Error in which(unlist(lapply(listModa, is.numeric))) :
argument to 'which' is not logical
I didn't really know what to do with this error. Then I tried to turn every column to character, and everything worked. I thought it could be useful to someone else, and that maybe someone would be able to explain the error to me ;)
Cheers

Are the classes of your variables character or factor?I was having this problem. My solution was to change al variables to factor.
#my data.frame was "aux.da"
i=0
while(i < ncol(aux.da)){
i=i+1 aux.da[,i] = as.factor(aux.da[,i])
}

It's difficult to tell without further input, but what you can do is:
Find the function where the error occurred (via traceback()),
Set a breakpoint and debug it:
trace(tab.disjonctif, browser)
I did the following (offline) to find the name of tab.disjonctif:
Found the package on the CRAN mirror on GitHub
Search for that particular expression that gives the error

I just started to learn R yesterday, but the error comes from the fact that the MCA is for categorical data, so that's why your data cannot be numeric. Then to be more precise, before the MCA a "tableau disjonctif" (sorry i don't know the word in english : Complete disjunctive matrix) is created.
So FactomineR is using this function :
https://github.com/cran/FactoMineR/blob/master/R/tab.disjonctif.R
Where i think it's looking for categorical values that can be matched to a numerical value (like Y = 1, N = 0).
For others ; be careful : for R categorical data is related to factor type, so even if you have characters you could get this error.

To build off #marques, #Khaled, and #Pierre Gourseaud:
Yes, changing the format of your variables to factor should address the error message, but you shouldn't change the format of numerical data to factor if it's supposed to be continuous numerical data. Rather, if you have both continuous and categorical variables, try running a Factor Analysis for Mixed Data (FAMD) in the same FactoMineR package.
If you go the FAMD route, you can change the format of just your categorical variable columns to factor with this:
data[,c(3:5,10)] <- lapply(data[,c(3:5,10)] , factor)
(assuming column numbers 3,4,5 and 10 need to be changed).

This will not work for only numeric variables. If you only have numeric use PCA. Otherwise, add a factor variable to your data frame. It seems like for your case you need to change your variables to binary factors.

Same problem as well and changing to factor did not solve my answer either, because I had put every variable as supplementary.
What I did first was transform all my numeric data to factor :
Xfac = factor(X[,1], ordered = TRUE)
for (i in 2:29){
tfac = factor(X[,i], ordered = TRUE)
Xfac = data.frame(Xfac, tfac)
}
colnames(Xfac)=labels(X[1,])
Still, it would not work. But my 2nd problem was that I included EVERY factor as supplementary variable !
So these :
MCA(Xfac, quanti.sup = c(1:29), graph=TRUE)
MCA(Xfac, quali.sup = c(1:29), graph=TRUE)
Would generate the same error, but this one works :
MCA(Xfac, graph=TRUE)
Not transforming the data to factors also generated the problem.
I posted the same answer to a related topic : https://stackoverflow.com/a/40737335/7193352

Error in huge R package when criterion "stars"

I am trying to do an association network using some expression data I have, the data is really huge: 300 samples and ~30,000 genes. I would like to apply a Gaussian graphical model to my data using the huge R package.
Here is the code I am using
dim(data)
#[1] 317 32291
huge.out <- huge.npn(data)
huge.stars <- huge.select(huge.out, criterion="stars")
However in this last step I got an error:
Error in cor(x) : ling....in progress:10%
Missing values present in input variable 'x'. Consider using use = 'pairwise.complete.obs'
Any help would be very appreciated.

You posted this exact question on Rhelp today. Both SO and Rhelp deprecate cross-posting but if you do choose to switch venues it is at the very least courteous to inform the readership.
You responded to the suggestion here on SO that there were missing data in your data-object named 'data' by claiming there were no missing data. So what does this code return:
lapply(data , function(x) sum(is.na(x)))
That would be a first level check, but there could also be an error caused by a later step that encountered a missing value in the matrix of correlation coefficients in the matrix 'huge.out". That could happen if there were: a) infinities in the calculations or b) if one of the columns were constant:
> cor(c(1:10,Inf), 1:11)
[1] NaN
> cor(rep(2,7), rep(2,7))
[1] NA
Warning message:
In cor(rep(2, 7), rep(2, 7)) : the standard deviation is zero
So the next check is:
sum( is.na(huge.out) )
That will at least give you some basis for defending your claim of no missings and will also give you a plausible theory as to the source of the error. To locate a column that is entirely constant you might do something like this (assuming it were a dataframe):
which(sapply(sapply(data, unique), length) > 1)
If it's a matrix, you need to use apply.

simulation a while loop

there might be some threads on while loops but I am struggling with them. It would be great if someone could help an R beginner out.
So I am trying to do 10000 simulations from a an out of sample regression forecast using the forecast parameters: mean, sd. Thankfully, my data is normal.
This is what I have
N<-10000
i<-1:N
k<-vector(,N)
while(i<N+1){k(,i)=vector(,rnorm(N,mean=.004546,sd=.00464163))}
...and I get this error
Error in vector(, rnorm(5000, mean = 0.004546, sd = 0.00464163)) :
invalid 'length' argument
In addition: Warning message:
In while (i < N + 1) { : the condition has length > 1 and only the first element will be used
I can't seem to get my head around it.

No reason to create a loop here. If you want to put 10000 samples, normal distributed around mean = 0.004546 and sd = 0.00464163 into vector k, just do:
k <- rnorm(10000,mean = 0.004546, sd = 0.00464163)

try this
N<-10
i<-1
k<-matrix(0,1,N)
while(i<N+1){k[i]=rnorm(1,mean=.004546,sd=.00464163)
i=i+1
}
print(k)

To solve your problem, use #Esben Friis' answer. You are taking a hard approach to an easy problem.
To adress the questions you had about the error messages you got however:
Error in vector(, rnorm(5000, mean = 0.004546, sd = 0.00464163)) :
invalid 'length' argument
This is the wrong way to go as vector() will produce a vector of a set length instead of a set of values. You are thinking about the as.vector() function:
as.vector(rnorm(5000, mean = 0.004546, sd = 0.00464163))
This is however not needed as this will only create a new vector of your values, which are already in a vector structure of the type double. Using this function will therefore not change anything.
It is best to simply use:
rnorm(5000, mean=0.004546, sd=0.00464163)
Further:
In addition: Warning message:
In while(i<N+1){: the condition has length>1 and only the first element will be used
This warning stems from i being a vector 1:N with a length larger than 1. The warning states that only the first index in i will be recycled (used in all instances of the loop) which is the same as doing i[1] .
while(i<N+1){ }
#is the same as
while(i[1]<N+1){ }
Instead you want to loop a new value to N. Furthermore you can use the <= (less or equal to) operator instead of doing <N+1 .
while(newVal<=N){ }
This method will bring up new problems which could be solved by using a for() loop instead, but that is however out of the scope of the question and really not the right approach to your problem, as stated in the beginning. Hope you learned something and good luck!

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Compute the mean in R subject to conditions - r

Using the comment by GKi worked and solved my problem. In the end I used mean(data$base[data$z==1], na.rm = TRUE)

Related

Error when trying to make confusion matrix

Error in family$linkinv(eta) : Argument eta must be a nonempty numeric vector

What does "argument to 'which' is not logical" mean in FactoMineR MCA?

Error in huge R package when criterion "stars"

simulation a while loop

Categories

Resources