Comparing integers with characters in R [duplicate] - r

This question already has answers here:
Why TRUE == "TRUE" is TRUE in R?
(3 answers)
Why does "one" < 2 equal FALSE in R?
(2 answers)
Closed last year.
It appears that as.character() of a number is still a number, which I find counter intuitive. Consider this example:
1 > "2"
[1] FALSE
2 > "1"
[1] TRUE
Even if I try to use as.character() or paste()
as.character(2)
[1] "2"
as.character(2) > 1
[1] TRUE
as.character(2) < 1
[1] FALSE
Why is that? Can't I have R return an error when I am comparing numbers with strings?

As explained in the comments the problem is that the numeric 1 is coerced to character.
The operation < still works for characters. A character is smaller than another if it comes first in alphabetical order.
> "a" < "b"
[1] TRUE
> "z" < "b"
[1] FALSE
So in your case as.character(2) > 1 is transformed to as.character(2) > as.character(1) and because of the "alphabetical" order of numbers TRUEis returned.
To prevent this you would have to check for the class of an object manually.

The documentation of ?Comparison states that
If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
So in your case the number is automatically coerced to string and the comparison is made based on the respective collation.
In order to prevent it, the only option I know of is to manually compare the class first.

Related

#R curious about how R compares character with number? [duplicate]

This question already has answers here:
Comparing integers with characters in R [duplicate]
(2 answers)
Why does "one" < 2 equal FALSE in R?
(2 answers)
Closed last year.
So here is what I found in R:
> "20" < 3
[1] TRUE
> "20" < 2
[1] FALSE
and I tried to find out what exactly number the "20" is. So basically, "20" is between 2.99999999999999489 and 2.99999999999999490, so why is it?
Also, "200" and "2000" (and so on...) have the same range as "20", but the wired thing is R gives me this:
> "20" == "200"
[1] FALSE
> "200" == "2000"
[1] FALSE
I believe they are not same, but R stores them in different number with that range "20" has. So beside why is it, I'm more curious about how R stores characters in numbers, I mean if "200" and "2000" are different numbers between 2.99999999999999489 and 2.99999999999999490, then how many numbers between them? (I know real numbers are infinite, but there must be a digit that R decides to end with)
The help for < tells:
If the two arguments are atomic vectors of different types,
one is coerced to the type of the other,
the (decreasing) order of precedence being
character, complex, numeric, integer, logical and raw.
So when you do "20" < 3, R internally converts 3 to "3". So this becomes "20" < "3".
The comparison between strings is lexicographic (same as the order of words in dictionaries).

How does R compare version strings with the inequality operators?

Could someone explain this behavior in R?
> '3.0.1' < '3.0.2'
[1] TRUE
> '3.0.1' > '3.0.2'
[1] FALSE
What process is R doing to make the comparison?
It's making a lexicographic comparison in this case, as opposed to converting to numeric, as calling as.numeric('3.0.1') returns NA.
The logic here would be something like, "the strings '3.0.1' and '3.0.2' are equivalent until their final characters, and since 1 precedes 2 in an alphanumeric alphabet, '3.0.1' is less than '3.0.2'." You can test this with some toy examples:
'a' < 'b' # TRUE
'ab' < 'ac' # TRUE
'ab0' < 'ab1' # TRUE
Per the note in the manual in the post that #rawr linked in the comments, this will get hairy in different locales, where the alphanumeric alphabet may be sorted differently.

How come as.character(1) == as.numeric(1) is TRUE? [duplicate]

This question already has answers here:
Why does "one" < 2 equal FALSE in R?
(2 answers)
Why is the expression "1"==1 evaluating to TRUE? [duplicate]
(1 answer)
Closed 3 years ago.
Just like the title says, why does "1" == 1 is TRUE? What is the real reason behind this? Is R trying to be kind or is this something else? I was thinking since "1" (or any numbers it really doesn't matter) where read by R as a character it would automatically return FALSE if compare with as.numeric(1) or as.integer(1).
> as.character(1) == as.numeric(1)
[1] TRUE
or
> "1" == 1
[1] TRUE
I guess it is a simple question but I'd like to get an answer. Thank you.
According to ?==
For numerical and complex values, remember == and != do not allow for the finite representation of fractions, nor for rounding error. Using all.equal with identical is almost always preferable. S
In another paragraph, it is also written
x, y
atomic vectors, symbols, calls, or other objects for which methods have been written. If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
identical(as.character(1), as.numeric(1))
#[1] FALSE

Why can't R handle inequalities between negative numbers in quotes

This is a weird problem, with an easy workaround, but I'm just so curious why R is behaving this way.
> "-1"<"-2"
[1] TRUE
> -1<"-2"
[1] TRUE
> "-1"< -2
[1] TRUE
> -1< -2
[1] FALSE
> as.numeric("-1")<"-2"
[1] TRUE
> "-1"<as.numeric("-2")
[1] TRUE
> as.numeric("-1")<as.numeric("-2")
[1] FALSE
What is happening? Please, for my own sanity...
A "number in quotes" is not a number at all, it is a string of characters. Those characters happen to be displayed with the same drawing on your screen as the corresponding number, but they are fundamentally not the same object.
The behavior you are seeing is consistent with the following:
A pair of numbers (numeric in R) is compared in the way that you should expect, numerically with the natural ordering. So, -1 < -2 is indeed FALSE.
A pair of strings (character in R) are compared in lexicographic order, meaning roughly that it is compared alphabetically, character by character, from left to right. Since "-1" and "-2" start with the same character, we move to the second, and "2" comes after "1", so "-2" comes after "-1" and therefore "-1" < "-2" is TRUE.
When comparing objects of mismatched types, you have two basic choices: either you give an error, or you convert one of the types to the other and then fall back on the two facts above. R takes the 2nd route, and chooses to convert numeric to character, which explains the result you got above (all your mismatched examples give TRUE).
Note that it makes more sense to convert numeric to character, rather than the other way around, because most character can't be automatically converted to numeric in a meaningful way.
I've always thought this is because the default behavior is to treat the values in quotes as character, and the values without quotes as double. Without expressly declaring the data types, you get this:
> typeof(-1)
[1] "double"
> typeof("-1")
[1] "character"
> typeof(as.numeric("-1"))
[1] "double"
It's only when the negative numbers are put in quotes that it orders them alphabetically, because they are characters.

why is "<integer>" == <integer> true in R

I just started learning R and in my first assignment, I face a problem where I need to compare a bunch of variables and while doing that I am supposed to get false when comparing two variables not only when they are not equal but also when their type is not same.
For example :
7 == "7"
gives true which should be false.
Currently, I am doing the same as follows:
var1 = 8 == "8"
var2 = typeof(8) == typeof("8")
var1 & var2
I guess there should be some much simpler approach for the same.
It seems like it implicitly converting 7 to "7" as it does when we add numeric to a character vector.
So is there a way to get the same result in 1 line?
From the ?Comparison help page:
If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
On the same help page, the authors warn for using == and != for tests in if-expressions. They recommend using identical() instead:
7 == "7"
# TRUE
identical(7, "7")
# FALSE

Resources