I want to vectorize a function that checks a condition and, depending on whether the condition is TRUE or FALSE, returns the outcome of one of two functions. The problem is that the first function cannot be evaluated for elements where the condition is FALSE. ifelse returns the correct values, but it also produces a warning. I would like a function that does not produce warnings.
I have tried ifelse(), but it does not work. I was expecting that this command would skip the evaluation of the first function when the condition is FALSE.
Here is an illustrative piece of R code
p = c(-1,1,-1,1,-1,-1,-1,1)
ifelse(p>0, sqrt(p), p^2)
which returns
[1] 1 1 1 1 1 1 1 1
Warning message:
In sqrt(p) : NaNs produced
As you can see, the outcome is correct but, for some reason, the first function is also evaluated for the elements where the condition is FALSE. I would like to avoid this.
We can create a numeric vector and then fill the elements based on the condition put forward by 'p'
out <- numeric(length(p))
out[p > 0] <- sqrt(p[p > 0])
out[p <= 0] <- p[p <= 0]^2
With ifelse we need to have all arguments of the same length. According to ?ifelse
ifelse(test, yes, no)
A vector of the same length and attributes (including dimensions and
"class") as test and data values from the values of yes or no
What happens is that both calculations are done on the entire vector, and the elements of the result are then picked from one or the other based on the test condition. For sqrt, the negative values give a warning and produce NaN. While the NaN elements don't show up in the output, the warning has already been raised. The warning is a friendly one, but it can be suppressed with suppressWarnings.
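As a minimal sketch of two ways to avoid the warning (the names quiet and clamped are illustrative, not from the original post): either wrap the call in suppressWarnings(), or make the yes branch safe for every element, here with pmax(), so that sqrt() never sees a negative value. Note that suppressWarnings() hides every warning raised inside the call, not only the one from sqrt().

```r
p <- c(-1, 1, -1, 1, -1, -1, -1, 1)

# Option 1: silence the warning raised while sqrt() runs on the whole vector.
quiet <- suppressWarnings(ifelse(p > 0, sqrt(p), p^2))

# Option 2: clamp negative values to 0 first, so sqrt() has nothing to warn about.
# The clamped elements are never selected because the test is FALSE for them.
clamped <- ifelse(p > 0, sqrt(pmax(p, 0)), p^2)

print(quiet)
print(clamped)
```

Both vectors hold the same values as the original ifelse() call; only the warning differs.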
Avoidance through ifelse probably isn't possible. My understanding of the ifelse process is
Create a vector of values based on the expression in yes
Create a vector of values based on the expression in no
Use the result of test to decide whether each element comes from yes or no.
If an error occurs while evaluating either yes or no, ifelse will fail.
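The steps above can be sketched as a simplified re-implementation. This is only an illustration under stated assumptions: simple_ifelse is a hypothetical name, and the real ifelse() additionally handles NA in test, keeps attributes of test, and skips a branch that is not needed at all.

```r
# Simplified, hypothetical version of the three steps described above.
simple_ifelse <- function(test, yes, no) {
  n <- length(test)
  yes_vals <- rep(yes, length.out = n)  # step 1: yes is computed for ALL elements
  no_vals  <- rep(no,  length.out = n)  # step 2: no is computed for ALL elements
  out <- no_vals
  pick <- which(test)                   # step 3: positions where test is TRUE
  out[pick] <- yes_vals[pick]
  out
}

p <- c(-1, 1, -1, 1, -1, -1, -1, 1)
# sqrt(p) is evaluated on the full vector, so the warning still occurs;
# it is suppressed here only to keep the output clean.
res <- suppressWarnings(simple_ifelse(p > 0, sqrt(p), p^2))
print(res)
```

Because yes is an ordinary argument, the whole expression sqrt(p) is evaluated as soon as it is used, regardless of which elements end up being selected.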
To get around this, you need to evaluate each expression only where it will succeed, as in akrun's answer; a variant is given here for completeness.
p = c(-1,1,-1,1,-1,-1,-1,1)
condition <- p > 0
result <- numeric(length(p))
result[condition] <- sqrt(p[condition])
result[!condition] <- p[!condition]^2
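If this pattern is needed in several places, it could be wrapped in a small helper that takes the two branches as functions, so each one only ever sees the elements it is meant for. The following is a sketch, not an established API: apply_by_condition is a hypothetical name, and it assumes test contains no NA and both functions return numeric vectors of the same length as their input.

```r
# Hypothetical helper: yes_fun runs only where test is TRUE,
# no_fun runs only where test is FALSE.
apply_by_condition <- function(x, test, yes_fun, no_fun) {
  out <- numeric(length(x))
  out[test]  <- yes_fun(x[test])
  out[!test] <- no_fun(x[!test])
  out
}

p <- c(-1, 1, -1, 1, -1, -1, -1, 1)
res <- apply_by_condition(p, p > 0, sqrt, function(v) v^2)
print(res)
```

Since sqrt() is only called on the positive elements, no warning is produced.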
Related
I am trying to run a ifelse() command but getting some weird behavior...
Running:
1 <= 50
I get:
TRUE
Where typeof(1 <= 50) and class(1 <= 50) both return
[1] "logical"
However, once I put this into a ifelse() loop I get some weird behavior...
ifelse(1 <= 50, print("Yay"), print("Boo"))
[1] "Yay"
[1] "Yay"
It prints the true condition action twice....
I am thinking this is the reason I get this error:
Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
incompatible types (from S4 to logical) in subassignment type fix
When I write more complicated code:
ifelse(length(List[[1]]) >= 50, List[[1]][1], print("Error"))
Which is interesting because if I have the yes statement of ifelse() assign something to a variable, I still get the error but the resulting object is correct....
> ifelse(length(List[[1]]) >= 50, test <- List[[1]][1], print("Error"))
> test
What am I not understanding....
You are slightly misunderstanding the purpose of ifelse(): this function is made to pick elements from either of two vectors/matrices. The online help describes it as follows:
ifelse returns a value with the same shape as test which is filled
with elements selected from either yes or no depending on whether the
element of test is TRUE or FALSE.
Using arguments with side effects is allowed, but somewhat odd. I believe you should use if ... else for your case.
So what is going on for ifelse(1 <= 50, print("Yay"), print("Boo"))? The first argument is a logical vector of length 1 with just the value TRUE. So ifelse() returns a single element. Since the value is TRUE, it gets the value from the second argument. Evaluating that argument prints "Yay", and also returns "Yay" to the ifelse() function. This returned "Yay" is then selected as the output and returned from the ifelse() call. After the call completes, this result is printed to the terminal, giving you the second line of "Yay".
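A short sketch of the if ... else alternative, plus a way to see that the second "Yay" comes from the return value being auto-printed (the variable names msg and res are illustrative):

```r
# if ... else evaluates only the chosen branch and is the usual tool
# for a single TRUE/FALSE decision.
msg <- if (1 <= 50) "Yay" else "Boo"
print(msg)

# Assigning the result of ifelse() suppresses the automatic printing of
# its return value, so only the print() side effect shows up, once.
res <- ifelse(1 <= 50, print("Yay"), print("Boo"))
```

Here res still holds "Yay", which is the value that was printed a second time in the original call.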
My goal is to categorize the rows on my dataset depending on the values of two different dates.
if(!exists(MY_DATA$Date_1) & exists(MY_DATA$Date_2)) {
MY_DATA$NEW_COL <- c("Category_1")
} else {
MY_DATA$NEW_COL <- c("Category_2")
}
But it isn't working, I'm currently trying a simplified version as follows:
if(!exists(MY_DATA$Date_1)){
MY_DATA$NEW_COL <- c("Category_1")
}
However, it seems that this only reads the value on the first row, and it either gives me a column with all values as Category_1 or no column at all.
Also I have tried this with is.na(), is.null() and exists().
However, it seems that this only reads the value on the first row, and it either gives me a column with all values as Category_1 or no column at all.
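This is because an if statement requires a condition of length 1. When given a vector with more than one element, R versions before 4.2.0 read only the first element to decide between TRUE and FALSE (with a warning); from R 4.2.0 onwards this is an error.
The ifelse function accepts a vector as its test argument and returns a vector of the same length, with each element taken from the yes or no value depending on whether the test is TRUE or FALSE. It may be suitable for your needs.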
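As a rough sketch of how this could look for the question's data, assuming the intended rule is "Date_1 missing and Date_2 present" (read from the question's pseudo-code) and that missing dates are stored as NA. The contents of MY_DATA below are made up for illustration.

```r
# Made-up example data; the real MY_DATA is not shown in the question.
MY_DATA <- data.frame(
  Date_1 = as.Date(c("2020-01-01", NA, NA)),
  Date_2 = as.Date(c("2020-02-01", "2020-03-01", NA))
)

# The test is evaluated row by row, so every row gets its own category.
MY_DATA$NEW_COL <- ifelse(
  is.na(MY_DATA$Date_1) & !is.na(MY_DATA$Date_2),
  "Category_1",
  "Category_2"
)

print(MY_DATA)
```

Only the second row has a missing Date_1 together with a present Date_2, so it is the only one labelled Category_1.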
Rephrasing what was originally a comment by @r2evans: exists() checks whether a variable is already defined in the R environment. exists() takes a character vector of length 1 as argument, otherwise it will check only the first member.
a = 1
b = 1
exists("a")
[1] TRUE
exists(c("a", "b"))
[1] TRUE
exists(c("ab", "a", "b"))
[1] FALSE
However it's worth noting that exists() does not check if a value is inside a vector. If you are trying to check if a value is in a vector, you'll want operator %in% instead.
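To make the distinction concrete, here is a small sketch contrasting the three checks. The vector values and the name not_defined_xyz are made up, and the second line assumes no object called not_defined_xyz exists in your session.

```r
a <- 1
var_defined   <- exists("a")                # is a variable named "a" defined?
var_undefined <- exists("not_defined_xyz")  # assumed not to exist

values <- c(10, 20, NA)
has_20     <- 20 %in% values                # does the value 20 occur in the vector?
is_missing <- is.na(values)                 # element-wise check for missing values

print(var_defined)
print(var_undefined)
print(has_20)
print(is_missing)
```

exists() answers a question about names, %in% about values, and is.na() about missing entries, so they are not interchangeable.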
The solution will largely depend on your precise implementations.
p.s. This is originally intended as a comment, but is too long as a comment.
Thanks everyone for your support, ifelse did the trick.
The following worked for me:
MY_DATA$NEW_COL <- c("Category_2")
MY_DATA$NEW_COL <- ifelse(!is.na(MY_DATA$Date_1),"Category_1","Category_2")
I am trying to see if the data.frame column has any null values to move to the next loop. I am currently using the code below:
if (is.na(df[,relevant_column]) == TRUE ){next}
which spits out the warning:
In if (is.na(df_cell_client[, numerator]) == TRUE) { ... : the
condition has length > 1 and only the first element will be used
How do I check if any of the values are null and not just the first row?
(I assume by "null" you really mean NA, since a data.frame cannot contain NULL in that sense.)
Your problem is that if expects a single logical, but is.na(df[,relevant_column]) is returning a vector of logicals. any reduces a vector of logicals into a single global "or" of the vector:
Try:
if (any(is.na(df[,relevant_column]))) {next}
BTW: == TRUE is unnecessary. Keep it if you feel you want the clarity in your code, but I think you'll find most R code does not use that. (I've also seen something == FALSE, equally "odd/wrong", where ! something should work ... but I digress.)
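A minimal sketch of this check inside a loop, using a made-up data frame and looping over its columns. anyNA(x) is a base R shortcut for any(is.na(x)); the variable complete_cols is illustrative.

```r
# Made-up data: column b contains a missing value.
df <- data.frame(a = c(1, 2, 3), b = c(1, NA, 3), c = c(4, 5, 6))

complete_cols <- character(0)
for (relevant_column in names(df)) {
  # Skip this iteration if ANY value in the column is NA.
  if (anyNA(df[, relevant_column])) next
  complete_cols <- c(complete_cols, relevant_column)
}

print(complete_cols)
```

Because anyNA() always returns a single TRUE or FALSE, the if condition has length 1 no matter how many rows the column has.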
Why do the if-else construct and the function ifelse() behave differently?
mylist <- list(list(a=1, b=2), list(x=10, y=20))
l1 <- ifelse(sum(sapply(mylist, class) != "list")==0, mylist, list(mylist))
l2 <-
if(sum(sapply(mylist, class) != "list") == 0){ # T: all list elements are lists
mylist
} else {
list(mylist)
}
all.equal(l1,l2)
# [1] "Length mismatch: comparison on first 1 components"
From the ifelse documentation:
‘ifelse’ returns a value with the same shape as ‘test’ which is
filled with elements selected from either ‘yes’ or ‘no’ depending
on whether the element of ‘test’ is ‘TRUE’ or ‘FALSE’.
Your test has length one, so the output is truncated to length 1.
You can also see this illustrated with a more simple example:
ifelse(TRUE, c(1, 3), 7)
# [1] 1
if (cond) { yes } else { no } is a control structure. It was designed to effect programming forks rather than to process a sequence. I think many people come from SPSS or SAS, whose authors chose "IF" to implement conditional assignment within their DATA or TRANSFORM functions, and so they expect R to behave the same. SAS and SPSS both have implicit FOR-loops in their Data steps, whereas R came from a programming tradition. R's implicit for-loops are built in to the many vectorized functions (including ifelse). The lapply/sapply functions are the more R-savvy way to implement most sequential processing, although they don't succeed at doing lagged variable access, especially if there are any randomizing features whose "effects" get cumulatively handled.
ifelse takes an expression that builds a vector of logical values as its first argument. The second and third arguments should be vectors of the same length as the first (shorter ones are recycled), and for each element either the yes value or the no value gets chosen. This is similar to the SPSS/SAS IF commands, which have an implicit by-row mode of operation.
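The difference can be seen side by side in a short sketch (the vector x and the variable names are made up for illustration):

```r
x <- c(5, -2, 8)

# ifelse() works element by element and returns one value per element of the test.
elementwise <- ifelse(x > 0, "positive", "not positive")

# if ... else makes ONE decision, so the condition must be a single TRUE/FALSE.
single <- if (all(x > 0)) "all positive" else "not all positive"

# if ... else returns the chosen branch unchanged, whatever its length.
whole <- if (TRUE) c(1, 3) else 7

print(elementwise)
print(single)
print(whole)
```

The result of ifelse() takes its length from the test, while the result of if ... else takes its length from the branch that was selected.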
For some reason this is marked as a duplicate of
Why does ifelse() return single-value output?
So a workaround for that question is to wrap the vector in a list and extract it afterwards:
a=3
yo <- ifelse(a==1, 1, list(c(1,2)))
yo[[1]]
Can someone tell me what is wrong with this if-else loop in R? I frequently can't get if-else loops to work. I get an error:
if(match('SubjResponse',names(data))==NA) {
observed <- data$SubjResponse1
}
else {
observed <- data$SubjResponse
}
Note that data is a data frame.
The error is
Error in if (match("SubjResponse", names(data)) == NA) { :
missing value where TRUE/FALSE needed
This is not a full example as we do not have the data but I see these issues:
You cannot test for NA with ==, you need is.na()
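Similarly, match() returns NA by default when nothing matches, so its output should be tested with is.na(); functions such as which() or grep() return a zero-length vector instead, which is tested with length()==0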
I tend to write } else { on one line.
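A small sketch of why the == NA test fails and what works instead. The data frame below is made up and deliberately lacks a SubjResponse column; the %in% variant is an alternative not mentioned in the original answer.

```r
# Made-up data frame without a SubjResponse column.
data <- data.frame(SubjResponse1 = c(1, 2, 3))

idx <- match("SubjResponse", names(data))  # no match, so idx is NA
comparison <- idx == NA                    # comparing with NA yields NA, never TRUE/FALSE
missing_col <- is.na(idx)                  # is.na() gives a usable TRUE/FALSE

# %in% returns TRUE/FALSE directly, so it can be used in if without extra checks.
observed <- if ("SubjResponse" %in% names(data)) {
  data$SubjResponse
} else {
  data$SubjResponse1
}

print(comparison)
print(missing_col)
print(observed)
```

Since comparison is NA, passing it to if is exactly what triggers "missing value where TRUE/FALSE needed".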
As @DirkEddelbuettel noted, you can't test for NA that way. But you can make match not return NA.
By using nomatch=0 and reversing the if clause (since 0 is treated as FALSE), the code can be simplified. Furthermore, another useful coding idiom is to assign the result of the if clause; that way you won't mistype the variable name in one of the branches...
So I'd write it like this:
observed <- if(match('SubjResponse',names(data), nomatch=0)) {
data$SubjResponse # match found
} else {
data$SubjResponse1 # no match found
}
By the way if you "frequently" have problems with if-else, you should be aware of two things:
The object to test must not contain NA or NaN, or be a string (mode character) or some other type that can't be coerced into a logical value. Numeric is OK: 0 is FALSE, and anything else (except NA/NaN) is TRUE.
The length of the object should be exactly 1 (a scalar value). If it is longer, you get a warning in R versions before 4.2.0 and an error from R 4.2.0 onwards. If it is shorter, you get an error.
Examples:
len3 <- 1:3
if(len3) 'foo' # WARNING before R 4.2.0 (only the first element is used); ERROR since R 4.2.0: the condition has length > 1
len0 <- numeric(0)
if(len0) 'foo' # ERROR: argument is of length zero
badVec1 <- NA
if(badVec1) 'foo' # ERROR: missing value where TRUE/FALSE needed
badVec2 <- 'Hello'
if(badVec2) 'foo' # ERROR: argument is not interpretable as logical
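One defensive option, not part of the original answer, is to wrap the condition in isTRUE(), which returns TRUE only for a single, non-missing logical TRUE and FALSE for everything else. The helper name run_if_true below is hypothetical.

```r
# Hypothetical helper: never errors or warns, whatever cond happens to be.
run_if_true <- function(cond) {
  if (isTRUE(cond)) "foo" else "skipped"
}

results <- c(
  ok      = run_if_true(TRUE),
  len3    = run_if_true(1:3),         # longer than 1
  len0    = run_if_true(numeric(0)),  # length zero
  badVec1 = run_if_true(NA),          # missing value
  badVec2 = run_if_true("Hello")      # not a logical
)
print(results)
```

The trade-off is that problematic inputs are silently treated as FALSE, so this suits cases where skipping is acceptable rather than cases where a bad condition should be reported.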