Why does "hello" > 0 return TRUE?

Try it:
"hello" > 0
I tried using as.numeric("hello") but it just gave me back NA. What gives?

Because 0 is coerced to "0", so the test becomes the string comparison "hello" > "0". See help(">"):
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
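A minimal sketch of what that rule implies for this case. The variable names are illustrative only, and no result is claimed for the string comparison itself, since the ordering of strings depends on the locale's collating sequence.

```r
# The numeric 0 is coerced to the string "0" before the comparison,
# so both expressions below perform the same string comparison.
mixed    <- "hello" > 0
explicit <- "hello" > as.character(0)
identical(mixed, explicit)   # TRUE

# To compare numerically instead, convert the string first.
# "hello" is not a number, so the result is NA (the coercion
# warning is suppressed here).
num <- suppressWarnings(as.numeric("hello"))
num > 0                      # NA
```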

Related

What are character vectors made of?

"Alice" is a character vector of length 1. "Bob" is also a character vector of length 1, but it's clearly shorter. At face value, it appears that R's character vectors are made out of something smaller than characters, but if you try to subset them, say "Alice"[1], you'll just get the original vector back. How does R internally make sense of this? What are character vectors actually made of?
You're mistaking vector length for string length.
In R, ordinary values are all vectors, so "Alice" and "Bob" are each a vector containing one string, whether or not you assign them to a name.
If you want to check the number of characters in each string, use the nchar function:
nchar("Alice")
[1] 5
nchar("Bob")
[1] 3
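To make the distinction concrete, here is a short sketch using only base R functions; the variable names x and chars are illustrative.

```r
x <- c("Alice", "Bob")
length(x)                    # 2: number of strings in the vector
nchar(x)                     # 5 3: number of characters in each string

"Alice"[1]                   # "Alice": [ selects whole elements, not characters
substr("Alice", 1, 1)        # "A": extract characters by position

# Split a string into a vector of one-character strings.
chars <- strsplit("Alice", "")[[1]]
chars                        # "A" "l" "i" "c" "e"
```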

R if function still executed even if condition evaluated to false

I'm using the iris dataset and ran the following code:
functionq3 <- function(x) {
  if (x[['Sepal.Length']] > 5) {
    return("greater than 5")
  }
  else {
    return("less than 5")
  }
}
outputq3 <- apply(iris,1,functionq3)
print(outputq3)
It returns "greater than 5" even if the value is 5. I'm expecting "less than 5". What's going wrong?
apply coerces all the elements in the iris data frame to character, because the data frame is first converted to a matrix and the Species column is not numeric. Then, in your function, the comparison operator > coerces the numeric 5 on the RHS of x[['Sepal.Length']] > 5 to the character string "5".
So the real comparison of "5.0" (in iris[['Sepal.Length']]) and 5 is "5.0" > "5". This comparison is lexicographic: "5" is a prefix of "5.0", so "5.0" sorts after it and the result is TRUE.
See ?Comparison
Comparison of strings in character vectors is lexicographic within the
strings using the collating sequence of the locale in use ...
... If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
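One way to avoid the coercion is to work on the numeric column directly instead of passing whole rows through apply. The following is a sketch rather than the only fix; it reuses the labels and the outputq3 name from the question, and m is an illustrative name.

```r
# apply() converts the data frame with as.matrix(); because Species is
# a factor, the whole matrix becomes character.
m <- as.matrix(iris)
is.character(m)              # TRUE

# Vectorised alternative that compares the numeric column itself.
outputq3 <- ifelse(iris$Sepal.Length > 5, "greater than 5", "less than 5")

# Rows where Sepal.Length is exactly 5 now get the expected label.
unique(outputq3[iris$Sepal.Length == 5])   # "less than 5"
```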

Why doesn't comparison between numeric and character variables give a warning?

I had a bug in my code resulting from an inadvertent comparison between a character variable and a numeric variable (they were both supposed to be numeric). This bug would have been much easier to find if R had a warning when doing this type of comparison. For example, why does this not throw a warning
> 'two' < 5
[1] FALSE
but this does throw a warning
> as.numeric('two') < 5
[1] NA
Warning message:
NAs introduced by coercion
It is not clear to me what is going on behind the scenes in the first comparison.
In your example 5 is converted to a character string, so the test is the same as 'two' < as.character(5).
From ?Comparison:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
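Since R does not warn here, one option is to check the types yourself before comparing. The helper below, strict_lt, is a hypothetical name and not part of base R; it is only a sketch of the idea.

```r
# Hypothetical helper: compare only when both sides are numeric,
# and fail loudly otherwise instead of silently coercing.
strict_lt <- function(a, b) {
  if (!is.numeric(a) || !is.numeric(b)) {
    stop("both arguments must be numeric; got ",
         class(a)[1], " and ", class(b)[1])
  }
  a < b
}

strict_lt(2, 5)              # TRUE

# A character argument now raises an error; capture its message.
res <- tryCatch(strict_lt("two", 5), error = function(e) conditionMessage(e))
res   # "both arguments must be numeric; got character and numeric"
```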

Why does 1..99,999 == "1".."99,999" in R, but 100,000 != "100,000"?

In the console, go ahead and try
> sum(sapply(1:99999, function(x) { x != as.character(x) }))
[1] 0
For all values 1 through 99999, "1" == 1, "2" == 2, ..., 99999 == "99999" are TRUE. However,
> 100000 == "100000"
[1] FALSE
Why does R have this quirky behavior, and is this a bug? What would be a workaround to, e.g., check if every element in an atomic character vector is in fact numeric? Right now I was trying to check whether x == as.numeric(x) for each x, but that fails on certain datasets due to the above problem!
Have a look at as.character(100000). Its value is not equal to "100000" (have a look for yourself), and R is essentially just telling you so.
as.character(100000)
# [1] "1e+05"
Here, from ?Comparison, are R's rules for applying relational operators to values of different types:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
Those rules mean that when you test whether 1=="1", say, R first converts the numeric value on the LHS to a character string, and then tests for equality of the character strings on the LHS and RHS. In some cases those will be equal, but in other cases they will not. Which cases produce inequality will depend on the current settings of options("scipen") and options("digits").
So, when you type 100000=="100000", it is as if you were actually performing the following test. (Note that internally R may well use something other than as.character() to perform the conversion.)
as.character(100000)=="100000"
# [1] FALSE
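For the workaround asked about in the question, one approach is to convert the strings to numbers and compare on the numeric side, which avoids the number-to-string formatting altogether. The helper is_numeric_string is a hypothetical name, sketched here under the assumption that any string as.numeric can parse (including forms like "1e5") should count as numeric.

```r
# Hypothetical helper: TRUE where a string parses as a number.
# Conversion failures give NA with a warning, which is suppressed here.
is_numeric_string <- function(x) {
  !is.na(suppressWarnings(as.numeric(x)))
}

is_numeric_string(c("1", "100000", "two"))   # TRUE TRUE FALSE

# Compare as numbers rather than as strings.
as.numeric("100000") == 100000               # TRUE
```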
