I just started learning R, and in my first assignment I need to compare a number of variables. The comparison should return FALSE not only when the two values differ but also when their types differ.
For example:
7 == "7"
gives TRUE, but I need it to be FALSE.
Currently, I work around this as follows:
var1 = 8 == "8"
var2 = typeof(8) == typeof("8")
var1 & var2
I guess there should be a much simpler approach.
It seems that R implicitly converts 7 to "7", as it does when we add a numeric to a character vector.
So is there a way to get the same result in one line?
From the ?Comparison help page:
If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
On the same help page, the authors warn against using == and != for tests in if-expressions. They recommend using identical() instead:
7 == "7"
# TRUE
identical(7, "7")
# FALSE
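One caveat, shown here as a small sketch: identical() is stricter than just "number versus string". It also distinguishes double from integer, so 7 and 7L are not identical. The variable names below are illustrative only.

```r
# identical() compares type as well as value
num_vs_chr <- identical(7, "7")    # FALSE: double vs character
dbl_vs_int <- identical(7, 7L)     # FALSE: double vs integer
int_vs_int <- identical(7L, 7L)    # TRUE: same type, same value

typeof(7)    # "double"
typeof(7L)   # "integer"
```

If integer and double should count as the same type for your assignment, you would need to decide that explicitly, for example by comparing class() or by converting both sides first.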
Could someone explain this behavior in R?
> '3.0.1' < '3.0.2'
[1] TRUE
> '3.0.1' > '3.0.2'
[1] FALSE
What process is R doing to make the comparison?
Both operands are character strings, so R makes a lexicographic comparison here. It does not convert them to numeric (and could not: as.numeric('3.0.1') returns NA).
The logic here would be something like, "the strings '3.0.1' and '3.0.2' are equivalent until their final characters, and since 1 precedes 2 in an alphanumeric alphabet, '3.0.1' is less than '3.0.2'." You can test this with some toy examples:
'a' < 'b' # TRUE
'ab' < 'ac' # TRUE
'ab0' < 'ab1' # TRUE
Per the note in the manual that @rawr linked in the comments, this can get hairy across locales, where the collation order of characters may differ.
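If the goal is to compare version numbers rather than arbitrary strings, the lexicographic rule breaks down once a component has more than one digit. A minimal sketch using base R's numeric_version() and utils::compareVersion(); the variable names are only for illustration:

```r
# Lexicographic comparison goes character by character,
# so "3.0.10" sorts before "3.0.9" because "1" < "9".
lexi <- '3.0.10' < '3.0.9'                                    # TRUE

# numeric_version() compares component by component as integers.
vers <- numeric_version('3.0.10') < numeric_version('3.0.9')  # FALSE

# compareVersion() returns -1, 0 or 1.
cmp <- utils::compareVersion('3.0.1', '3.0.2')                # -1
```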
It appears that as.character() of a number is still a number, which I find counterintuitive. Consider this example:
1 > "2"
[1] FALSE
2 > "1"
[1] TRUE
The same happens even if I try to use as.character() or paste():
as.character(2)
[1] "2"
as.character(2) > 1
[1] TRUE
as.character(2) < 1
[1] FALSE
Why is that? Can't I have R return an error when I am comparing numbers with strings?
As explained in the comments, the problem is that the numeric 1 is coerced to character.
The operator < still works for character strings. A string is smaller than another if it comes first in alphabetical (collation) order.
> "a" < "b"
[1] TRUE
> "z" < "b"
[1] FALSE
So in your case as.character(2) > 1 is transformed to as.character(2) > as.character(1), and because of the "alphabetical" order of digits, TRUE is returned.
To prevent this you would have to check for the class of an object manually.
The documentation of ?Comparison states that
If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
So in your case the number is automatically coerced to string and the comparison is made based on the respective collation.
In order to prevent it, the only option I know of is to manually compare the class first.
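A minimal sketch of such a manual check. strict_compare is a hypothetical helper written for this illustration, not a base R function; it raises an error whenever the two classes differ:

```r
# Hypothetical helper, not part of base R: compare only if classes match.
strict_compare <- function(x, y, op = `<`) {
  if (!identical(class(x), class(y))) {
    stop("refusing to compare '", class(x)[1], "' with '", class(y)[1], "'")
  }
  op(x, y)
}

ok <- strict_compare(2, 1, `>`)   # TRUE

# Mixed types now raise an error instead of being coerced silently.
msg <- tryCatch(strict_compare(as.character(2), 1, `>`),
                error = function(e) conditionMessage(e))
msg   # "refusing to compare 'character' with 'numeric'"
```

Note that this version is deliberately strict: class(1L) is "integer" and class(1) is "numeric", so comparing those would also be rejected. Relax the check if that is not what you want.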
Just like the title says, why is "1" == 1 TRUE? What is the real reason behind this? Is R trying to be kind, or is this something else? I was thinking that since "1" (or any number, it really doesn't matter) is read by R as a character, the comparison would automatically return FALSE when compared with as.numeric(1) or as.integer(1).
> as.character(1) == as.numeric(1)
[1] TRUE
or
> "1" == 1
[1] TRUE
I guess it is a simple question but I'd like to get an answer. Thank you.
According to ?"==":
For numerical and complex values, remember == and != do not allow for the finite representation of fractions, nor for rounding error. Using all.equal with identical is almost always preferable.
The description of the arguments on the same page also says:
x, y
atomic vectors, symbols, calls, or other objects for which methods have been written. If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
identical(as.character(1), as.numeric(1))
#[1] FALSE
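To illustrate the rounding-error remark from the help page, here is a small sketch; the variable names are just for the example:

```r
x <- 0.1 + 0.2

eq  <- x == 0.3                     # FALSE: binary rounding error
idt <- identical(x, 0.3)            # FALSE for the same reason
ae  <- isTRUE(all.equal(x, 0.3))    # TRUE: equal within a small tolerance

# all.equal() still reports a mismatch when the types differ.
mixed <- isTRUE(all.equal("1", 1))  # FALSE
```

all.equal() returns a character description of the differences rather than FALSE when the objects differ, which is why it is wrapped in isTRUE() here.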
vector1 = c(1,2,3,NA)
condition1 = (vector1 == 2)
vector1[condition1]
vector1[condition1==TRUE]
In the above code, condition1 is "FALSE TRUE FALSE NA",
and the 3rd and the 4th lines both give me the result "2 NA",
which is not what I expected.
I wanted only the elements whose value really is 2, not including NA.
Could anybody explain why R is designed to work this way,
and how I can get the result I want with a simple command?
The subset vector[NA] will always be NA because the NA value is unknown and therefore the result of the subset is also unknown. %in% returns FALSE for NA, so it can be useful here.
vector1 = c(1,2,3,NA)
condition1 = (vector1 %in% 2)
vector1[condition1]
# [1] 2
If you are in RStudio and enter
?`[`
You will get the following explanation:
NAs in indexing
When extracting, a numerical, logical or character NA index picks an
unknown element and so returns NA in the corresponding element of a
logical, integer, numeric, complex or character result, and NULL for a
list. (It returns 00 for a raw result.)
When replacing (that is using indexing on the lhs of an assignment) NA
does not select any element to be replaced. As there is ambiguity as
to whether an element of the rhs should be used or not, this is only
allowed if the rhs value is of length one (so the two interpretations
would have the same outcome). (The documented behaviour of S was that
an NA replacement index ‘goes nowhere’ but uses up an element of
value: Becker et al p. 359. However, that has not been true of other
implementations.)
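The two cases described in that help text can be tried out directly. This is a small sketch with made-up variable names; which() is used as one common way to drop the NA positions before extracting:

```r
x   <- c(1, 2, 3, NA)
idx <- c(FALSE, TRUE, FALSE, NA)

# Extracting: an NA index yields NA in the result.
extracted <- x[idx]         # 2 NA

# which() keeps only the positions that are TRUE, so NA is dropped.
kept <- x[which(idx)]       # 2

# Replacing: an NA index selects nothing (allowed because the rhs has length one).
y <- x
y[idx] <- 0
y                           # 1 0 3 NA
```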
Try combining the condition with a logical operator in that case:
vector1 = c(1,2,3,NA)
condition1<-(vector1==2 & !is.na(vector1) )
condition1
# FALSE TRUE FALSE FALSE
vector1[condition1]
# 2
The & operator returns TRUE only when both of its operands are TRUE, so the NA position becomes FALSE.
identical is "The safe and reliable way to test two objects for being exactly equal. It returns TRUE in this case, FALSE in every other case." (see ?identical)
As it does not compare elementwise, you can use it in sapply to compare each element of vector1 to 2, i.e.:
condition1 = sapply(vector1, identical, y = 2)
which will give:
vector1[condition1]
[1] 2
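One thing to watch with this approach, sketched below under the assumption that your data might be stored as integers: identical() also checks the type, so an integer 2L is not identical to the double 2, while %in% matches by value. vapply() is used here instead of sapply() only to make the return type explicit.

```r
ints <- c(1L, 2L, 3L, NA)   # integer vector

# identical(2L, 2) is FALSE, so nothing matches here.
strict <- vapply(ints, identical, logical(1), y = 2)   # FALSE FALSE FALSE FALSE

# %in% coerces before matching and treats NA as a non-match.
loose <- ints %in% 2                                   # FALSE TRUE FALSE FALSE
```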
I'm reading Hadley Wickham's Advanced R section on coercion, and I can't understand the result of this comparison:
"one" < 2
# [1] FALSE
I'm assuming that R coerces 2 to a character, but I don't understand why R returns FALSE instead of returning an error. This is especially puzzling to me since
-1 < "one"
# TRUE
So my question is two-fold: first, why this answer, and second, is there a way of seeing how R converts the individual elements within a logical vector like these examples?
From help("<"):
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
So in this case, the numeric is of lower precedence than the character. So 2 is coerced to the character "2". Comparison of strings in character vectors is lexicographic which, as I understand it, is alphabetic but locale-dependent.
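As for seeing the conversion: the comparison operators do not expose the coerced operands, but you can reproduce the coercion by hand. A rough sketch follows; c() applies a similar type hierarchy, so it is a convenient way to peek at the result. The variable names are illustrative.

```r
# Combining a string and a number shows what the number is coerced to.
mixed <- c("one", 2)
mixed                  # "one" "2"
typeof(mixed)          # "character"

# These are the string comparisons R effectively performs.
r1 <- "one" < as.character(2)    # same result as "one" < 2
r2 <- as.character(-1) < "one"   # same result as -1 < "one"
```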
It coerces 2 into a character, then it does an alphabetical comparison, and numeric characters are assumed to come before alphabetical ones.
To get a general idea of the behavior, try:
'a'<'1'
'1'<'.'
'b'<'B'
'a'<'B'
'A'<'B'
'C'<'B'
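The results of comparisons like these depend on the collation locale. As a sketch, you can inspect the locale in effect and compare the locale-dependent sort order with the C-locale order, which sort() uses when method = "radix" is requested:

```r
chars <- c('a', 'B', '1', '.', 'A', 'b')

Sys.getlocale("LC_COLLATE")   # the collation locale currently in effect
sort(chars)                   # order depends on that locale

# Radix sort ignores the locale and uses C-locale (byte) order.
c_order <- sort(chars, method = "radix")
c_order                       # "." "1" "A" "B" "a" "b"
```

In the C locale all uppercase letters sort before all lowercase ones, which is why 'a' < 'B' can differ between machines.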