Try it:
"hello" > 0
I tried using as.numeric("hello") but it just gave me back NA. What gives?
Because 0 is coerced to "0", so the test actually performed is the string comparison "hello" > "0". See help(">"):
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
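The coercion rule quoted above can be checked directly. This is a minimal sketch; the variable name value is only for illustration.

```r
# The numeric 0 is coerced to the string "0" before comparing,
# so these two expressions perform the same test.
identical("hello" > 0, "hello" > as.character(0))

# Converting the other way makes the problem visible: "hello" is not a
# number, so as.numeric() yields NA (with a warning) and the comparison is NA.
value <- suppressWarnings(as.numeric("hello"))
is.na(value)
is.na(value > 0)
```

The string comparison itself follows the collating sequence of the current locale, so the sketch only checks that the two forms agree, not which way the comparison goes.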
"Alice" is a character vector of length 1. "Bob" is also a character vector of length 1, but it's clearly shorter. At face value, it appears that R's character vectors are made out of something smaller than characters, but if you try to subset them, say "Alice"[1], you'll just get the original vector back. How does R internally make sense of this? What are character vectors actually made of?
You're mistaking vector length for string length.
In R, ordinary values are vectors, so "Alice" and "Bob" are each a character vector containing one string, whether or not you assign them to a name.
If you want to check the size of each string, use the nchar function:
nchar("Alice")
[1] 5
nchar("Bob")
[1] 3
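To make the distinction between vector length and string length concrete, here is a small sketch using only base R functions. The names x and chars are just for illustration.

```r
x <- "Alice"

length(x)         # number of elements in the vector
nchar(x)          # number of characters in the single string
x[1]              # subsetting the vector returns the whole string
substr(x, 1, 1)   # extracts the first character of the string

# Split the string into a vector of single-character strings.
chars <- strsplit(x, "")[[1]]
length(chars)
```

After strsplit, each character is its own vector element, so length() and nchar() of the original string agree.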
I'm using iris dataset.
Ran the following code:
functionq3 <- function(x) {
  if(x[['Sepal.Length']] > 5) {
    return("greater than 5")
  }
  else {
    return("less than 5")
  }
}
outputq3 <- apply(iris,1,functionq3)
print(outputq3)
It returns "greater than 5" even if the value is 5. I'm expecting "less than 5". What's going wrong?
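apply first converts the iris data frame to a matrix, and because the Species column is not numeric, every element becomes a character string. Then, in your function, the comparison operator > coerces the numeric 5 on the RHS of x[['Sepal.Length']] > 5 to the character "5".
So for a Sepal.Length of 5, the comparison actually performed is "5.0" > "5". That is a string comparison, so the result depends on the collating sequence of the locale rather than on numeric value: "5.0" begins with "5" and is longer, so it sorts after "5" and the result is TRUE.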
See ?Comparison
Comparison of strings in character vectors is lexicographic within the
strings using the collating sequence of the locale in use ...
... If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
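One way to avoid the coercion is to keep the comparison numeric. The sketch below shows two options; label_row and outputq3_apply are hypothetical names, and the label "not greater than 5" is my wording for the case that includes exactly 5.

```r
# apply() works on as.matrix(iris); the Species column forces character.
is.character(as.matrix(iris))

# Option 1: compare the numeric column directly (vectorised).
outputq3 <- ifelse(iris$Sepal.Length > 5,
                   "greater than 5", "not greater than 5")

# Option 2: keep apply(), but pass only the four numeric columns
# so the intermediate matrix stays numeric.
label_row <- function(x) {
  if (x[["Sepal.Length"]] > 5) "greater than 5" else "not greater than 5"
}
outputq3_apply <- unname(apply(iris[, 1:4], 1, label_row))

# Both approaches should agree.
identical(outputq3, outputq3_apply)
```

Option 1 is usually simpler, since the test only involves one column and never needs a row-wise loop.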
I had a bug in my code resulting from an inadvertent comparison between a character variable and a numeric variable (they were both supposed to be numeric). This bug would have been much easier to find if R had a warning when doing this type of comparison. For example, why does this not throw a warning
> 'two' < 5
[1] FALSE
but this does throw a warning
> as.numeric('two') < 5
[1] NA
Warning message:
NAs introduced by coercion
It is not clear to me what is going on behind the scenes in the first comparison?
In your example 5 is converted to a character, so the test is the same as 'two' < as.character(5).
From ?Comparison:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
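Base R does not warn here, but a small wrapper can. The following is a sketch of a hypothetical helper, strict_less, which is not part of R; it raises an error instead of a warning when exactly one side is character.

```r
# Hypothetical helper: refuse to compare a character vector with a
# non-character one instead of letting R coerce silently.
strict_less <- function(a, b) {
  if (is.character(a) != is.character(b)) {
    stop("comparing character with non-character: ",
         class(a)[1], " vs ", class(b)[1])
  }
  a < b
}

strict_less(2, 5)

# The mixed-type case now fails loudly; capture the message for inspection.
result <- tryCatch(strict_less("two", 5),
                   error = function(e) conditionMessage(e))
result
```

Replacing stop() with warning() would give a warning instead of an error while still returning the coerced comparison.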
In the console, go ahead and try
> sum(sapply(1:99999, function(x) { x != as.character(x) }))
0
For all values 1 through 99999, "1" == 1, "2" == 2, ..., 99999 == "99999" are TRUE. However,
> 100000 == "100000"
FALSE
Why does R have this quirky behavior, and is this a bug? What would be a workaround to, e.g., check if every element in an atomic character vector is in fact numeric? Right now I was trying to check whether x == as.numeric(x) for each x, but that fails on certain datasets due to the above problem!
Have a look at as.character(100000). Its value is not equal to "100000" (have a look for yourself), and R is essentially just telling you so.
as.character(100000)
# [1] "1e+05"
Here, from ?Comparison, are R's rules for applying relational operators to values of different types:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
Those rules mean that when you test whether 1=="1", say, R first converts the numeric value on the LHS to a character string, and then tests the character strings on the LHS and RHS for equality. In some cases those will be equal, but in other cases they will not. Which cases produce inequality depends on the current settings of options("scipen") and options("digits").
So, when you type 100000=="100000", it is as if you were actually performing the following test. (Note that internally, R may well use something other than as.character() to perform the conversion.)
as.character(100000)=="100000"
# [1] FALSE
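For the workaround the question asks about, one approach is to convert the strings to numbers rather than the numbers to strings. This is a sketch with made-up sample data; the names x and is_numeric_string are only for illustration.

```r
x <- c("1", "100000", "3.14", "abc")

# An element counts as numeric if as.numeric() can parse it;
# unparseable strings become NA (the warning is suppressed here).
is_numeric_string <- !is.na(suppressWarnings(as.numeric(x)))
is_numeric_string

# Compare as numbers rather than as strings.
as.numeric("100000") == 100000

# If a string is really needed, request fixed notation explicitly.
format(100000, scientific = FALSE)
```

Note that a genuine NA in the input would also be reported as non-numeric by this check, so it may need separate handling.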