I'm using the iris dataset.
I ran the following code:
functionq3 <- function(x) {
  if (x[['Sepal.Length']] > 5) {
    return("greater than 5")
  } else {
    return("less than 5")
  }
}
outputq3 <- apply(iris,1,functionq3)
print(outputq3)
It returns "greater than 5" even if the value is 5. I'm expecting "less than 5". What's going wrong?
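apply() first converts the iris data frame to a matrix, and because the Species column is a factor, every element becomes a character string. Then, in your function, the comparison operator > coerces the numeric 5 on the right-hand side of x[['Sepal.Length']] > 5 to the character string "5".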
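So the comparison actually performed is "5.0" > "5", because the value passed to your function for that row is the string "5.0", not the number 5. Strings are compared lexicographically using the locale's collating sequence, and in typical locales "5" sorts before "5.0" because it is a prefix of it, so the result is TRUE.
See ?Comparison: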
Comparison of strings in character vectors is lexicographic within the
strings using the collating sequence of the locale in use ...
... If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
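A minimal sketch of the coercion and two possible fixes. It only uses base R and the built-in iris data; out1 and out2 are illustrative names, and the labels are kept from the question even though "less than 5" also covers exactly 5.

```r
# apply() turns the data frame into a matrix first; because Species is a
# factor, every cell becomes a character string
m <- as.matrix(iris)
typeof(m)                 # "character"
m[5, "Sepal.Length"]      # "5.0", a string rather than the number 5

# Option 1: compare the numeric column directly (vectorised, no coercion)
out1 <- ifelse(iris$Sepal.Length > 5, "greater than 5", "less than 5")

# Option 2: keep apply(), but pass only the numeric columns so the
# intermediate matrix stays numeric
functionq3 <- function(x) {
  if (x[["Sepal.Length"]] > 5) {
    return("greater than 5")
  } else {
    return("less than 5")
  }
}
out2 <- apply(iris[, 1:4], 1, functionq3)

out1[5]                   # "less than 5" (Sepal.Length is 5.0 in row 5)
all(out1 == out2)         # both approaches agree
```

Option 1 is usually preferable: it avoids the row-by-row loop entirely and never leaves numeric types.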
I just started learning R, and in my first assignment I need to compare a number of variables. The comparison should return FALSE not only when the two values differ but also when their types differ.
For example:
7 == "7"
returns TRUE, but it should return FALSE.
Currently, I do it as follows:
var1 = 8 == "8"
var2 = typeof(8) == typeof("8")
var1 & var2
I suspect there is a much simpler approach.
It seems that R implicitly converts 7 to "7", just as it does when a number is combined with a character vector.
Is there a way to get the same result in one line?
From the ?Comparison help page:
If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
On the same help page, the authors warn against using == and != for tests in if expressions, and recommend identical() instead:
7 == "7"
# TRUE
identical(7, "7")
# FALSE
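A short sketch of how strict identical() is, which answers the one-line question but may be stricter than wanted: it also distinguishes double from integer. The isTRUE(all.equal()) line is shown as one possible looser alternative, not something the help page prescribes for this case.

```r
identical(8, "8")               # FALSE: double vs character, in one call
identical(8, 8)                 # TRUE: same value, same type
identical(8, 8L)                # FALSE: double vs integer also counts as a type mismatch

# Looser numeric check: tolerates double vs integer, still rejects character
isTRUE(all.equal(8, 8L))        # TRUE
isTRUE(all.equal(8, "8"))       # FALSE
```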
I had a bug in my code caused by an inadvertent comparison between a character variable and a numeric variable (both were supposed to be numeric). The bug would have been much easier to find if R issued a warning for this kind of comparison. For example, why does this not throw a warning:
> 'two' < 5
[1] FALSE
but this does throw a warning
> as.numeric('two') < 5
[1] NA
Warning message:
NAs introduced by coercion
What is going on behind the scenes in the first comparison?
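In your example, 5 is converted to a character string, so the test is the same as 'two' < as.character(5), i.e. 'two' < "5", a string comparison in the locale's collating order (in typical locales digits sort before letters, hence FALSE). No warning is issued because converting a number to a string always succeeds, whereas as.numeric('two') cannot produce a number and returns NA with a warning.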
From ?Comparison:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
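Base R has no option to warn on this coercion, so one workaround is a guard of your own. A minimal sketch, assuming you can route the risky comparisons through a wrapper; checked_less() is a hypothetical helper, not a base R function.

```r
# Base R silently compares 'two' < 5 as the strings 'two' < "5"
identical('two' < 5, 'two' < "5")   # TRUE

# Hypothetical helper: stop when exactly one side is a character vector,
# otherwise compare as usual
checked_less <- function(a, b) {
  if (xor(is.character(a), is.character(b))) {
    stop("comparing ", typeof(a), " with ", typeof(b), "; convert explicitly first")
  }
  a < b
}

checked_less(3, 5)             # TRUE
try(checked_less('two', 5))    # error instead of a silent string comparison
```

The check is deliberately narrow (character vs non-character) so that ordinary double/integer comparisons still pass through.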
I am trying to execute this code but get NA as the result.
> node1<- paste0("train$", rule, collapse=" & ")
> node1
[1] "train$feat_11< 5.477 & train$feat_60< 4.687"
> x <- ifelse(node1, 1, 0)
> x
[1] NA
How can I use a character vector in the ifelse() function?
Logical vectors and character vectors are two very different things in R.
class(node1)
#>[1] "character"
You must first parse and evaluate the string.
lNode1 = eval(parse(text=node1))
class(lNode1)
#>[1] "logical"
x<-ifelse(lNode1,1,0)
#> a vector of 1s and 0s
That said, your ifelse() call is redundant. A logical vector is coerced to integer (TRUE to 1, FALSE to 0) whenever it is used in a way that requires it. For example, sum(lNode1) gives the number of rows that pass both rules.
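A self-contained sketch of the parse-and-evaluate approach. The train data frame and the rule vector are made-up stand-ins chosen so that paste0() reproduces the node1 string from the question; the envir = train variant is one alternative that avoids hard-coding the "train$" prefix.

```r
# Hypothetical stand-ins for the asker's `train` data and `rule` vector
train <- data.frame(feat_11 = c(4.0, 6.0, 5.0),
                    feat_60 = c(4.0, 4.0, 5.0))
rule <- c("feat_11< 5.477", "feat_60< 4.687")

node1  <- paste0("train$", rule, collapse = " & ")
lNode1 <- eval(parse(text = node1))   # logical vector, one value per row

as.integer(lNode1)   # 1 0 0, like ifelse(lNode1, 1, 0) but integer-typed
sum(lNode1)          # number of rows passing both rules

# Without the "train$" prefix, the data frame itself can serve as the
# environment in which the column names are looked up
lNode2 <- eval(parse(text = paste(rule, collapse = " & ")), envir = train)
identical(lNode1, lNode2)
```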
Try this in the console:
> sum(sapply(1:99999, function(x) { x != as.character(x) }))
[1] 0
For every value from 1 through 99999, "1" == 1, "2" == 2, ..., 99999 == "99999" are all TRUE. However,
> 100000 == "100000"
[1] FALSE
Why does R have this quirky behavior, and is it a bug? What would be a workaround to, e.g., check whether every element of a character vector is in fact numeric? Currently I check whether x == as.numeric(x) for each x, but that fails on certain datasets because of the problem above!
Have a look at as.character(100000): its value is not equal to "100000", and R is essentially just telling you so.
as.character(100000)
# [1] "1e+05"
Here, from ?Comparison, are R's rules for applying relational operators to values of different types:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
Those rules mean that when you test whether 1 == "1", say, R first converts the numeric value on the LHS to a character string and then tests the character strings on the LHS and RHS for equality. In some cases those strings will be equal, and in other cases they will not. Which cases produce inequality depends on the current settings of options("scipen") and options("digits").
So when you type 100000 == "100000", it is as if you were performing the following test (note that internally R may well use something other than as.character() to perform the conversion):
as.character(100000)=="100000"
# [1] FALSE
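For the workaround asked about in the question, a minimal sketch of one common approach: convert the character side to numeric instead of the number to character, and treat a failed conversion (NA) as "not numeric". The vector x is invented for illustration.

```r
x <- c("1", "99999", "100000", "two")

# Parse the strings as numbers; unparseable ones become NA (warning silenced)
num <- suppressWarnings(as.numeric(x))
is_numeric_string <- !is.na(num)

is_numeric_string        # TRUE TRUE TRUE FALSE
all(is_numeric_string)   # FALSE: at least one element is not numeric

# Comparing as numbers sidesteps the "1e+05" formatting issue
num[3] == 100000         # TRUE

# If a string is really needed, format() can suppress scientific notation
format(100000, scientific = FALSE)   # "100000"
```

Edge cases to keep in mind: an "NA" string or "NaN" is reported as not numeric by this check, while strings such as "Inf" may be accepted as numeric.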
Try it:
"hello" > 0
I tried as.numeric("hello"), but it just returns NA. What gives?
Because 0 is coerced to "0", so the test is really the string comparison "hello" > "0". See help(">"):
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
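A quick check of that explanation: the mixed-type test gives the same answer as an explicit string comparison, while converting the string to a number yields NA rather than TRUE or FALSE. The warning from as.numeric() is suppressed here only to keep the output short.

```r
identical("hello" > 0, "hello" > "0")        # TRUE: 0 was coerced to "0"

# Forcing a numeric comparison instead: "hello" is not a number, so NA
suppressWarnings(as.numeric("hello")) > 0   # NA
```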