R: surprising behaviour of `identical`

In R, I stumbled upon this surprising behaviour of the function identical().
With simple ==:
(ncol(dpx)-1) == length(test)
TRUE
But with identical:
identical((ncol(dpx)-1) , length(test))
FALSE
They are both of type integer (81 each).
What is happening?

identical is "the safe and reliable way to test two objects for being exactly equal." ncol(dpx) - 1 returns a double, because the literal 1 is a double, while length returns an integer. == coerces both sides to a common type before comparing, whereas identical also compares the type.
As pointed out by @amatsuo_net, we could change the code slightly and make the 1 an integer by writing 1L.
identical((ncol(iris) + 1L - 1L), length(iris))
# [1] TRUE
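
A minimal sketch of the type mismatch, using the built-in iris data frame as a stand-in for dpx (the asker's objects are not shown, so that substitution is an assumption). typeof() reveals the difference that == hides:

```r
# iris stands in for the asker's data; any data frame would do
n_minus_one <- ncol(iris) - 1    # integer minus double 1 gives a double
n_cols_int  <- ncol(iris) - 1L   # integer minus integer 1L stays integer

typeof(n_minus_one)              # "double"
typeof(n_cols_int)               # "integer"
typeof(length(iris))             # "integer"

# == coerces both sides to a common type before comparing values
n_minus_one == 4L                # TRUE

# identical() also compares the storage type, so double 4 is not integer 4L
identical(n_minus_one, 4L)       # FALSE
identical(n_cols_int, 4L)        # TRUE
```

If only the value matters and not the type, isTRUE(all.equal(x, y)) is another option.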

Related

How come as.character(1) == as.numeric(1) is TRUE? [duplicate]

Just like the title says, why is "1" == 1 TRUE? What is the real reason behind this? Is R trying to be kind or is this something else? I was thinking that since "1" (or any number, it really doesn't matter) was read by R as a character, it would automatically return FALSE when compared with as.numeric(1) or as.integer(1).
> as.character(1) == as.numeric(1)
[1] TRUE
or
> "1" == 1
[1] TRUE
I guess it is a simple question but I'd like to get an answer. Thank you.

According to ?==
For numerical and complex values, remember == and != do not allow for the finite representation of fractions, nor for rounding error. Using all.equal with identical is almost always preferable.
In another paragraph, it is also written
x, y
atomic vectors, symbols, calls, or other objects for which methods have been written. If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
identical(as.character(1), as.numeric(1))
#[1] FALSE
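
To make the coercion rule concrete, here is a small sketch (my own illustration, not part of the help page). The number is converted to a string before the comparison, which is why the result depends on how the string is written:

```r
# The number is converted to a string first, then the strings are compared
as.character(1)          # "1"
"1" == 1                 # TRUE, because "1" == "1"
"1.0" == 1               # FALSE, because "1.0" != "1"

# identical() does no coercion, so a character is never identical to a double
identical("1", 1)        # FALSE

# Converting explicitly makes the intended comparison visible
as.numeric("1.0") == 1   # TRUE
```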

How to test if an object is a vector in R

I want to test if an object is a vector in R. I'm confused as to why
is.vector(c(0.1))
returns TRUE and so does
is.vector(0.1)
I would like it to return false when it is just a number and true when it is a vector. Can anyone offer any help on this please?
Many thanks in advance.

In R, a single number or string never exists on its own. It is always a vector of length 1, or an element of some more complex structure.
is.vector(c(0.1)) and is.vector(0.1) are therefore exactly the same call in R.
That is also why length("this is a string/character") returns 1: length() counts the number of elements in the vector, not the characters.
You can see this if you type "this is a string/character" into the R console:
It returns [1] "this is a string/character", where the [1] is the index of the first element printed on that line; here the vector has just one element.
So you have to call nchar("this is a string/character") to get the length of the first element, the character string, which returns 26.
nchar(c("this is a string/character", "and this another string"))
## [1] 26 23
## nchar is vectorized as you see ...
This is an important difference from Python, where strings and numbers can stand alone.
So len("this") returns 4 in Python, whereas len(["this"]) returns 1 (one element in the list, so the length of the list is 1).

As already mentioned by @RHertel, R considers c(0.1) a vector of length 1. You may want to test for length as well. E.g.
> x <- 1
> y <- 1:2
> is.vector(x) & length(x) > 1
[1] FALSE
> is.vector(y) & length(y) > 1
[1] TRUE
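
If you need this check in several places, it could be wrapped in a small function. This is only a sketch: has_multiple_elements is a name I made up, not something from base R, and it uses && because each side is a single TRUE/FALSE value.

```r
# has_multiple_elements is a hypothetical helper, not a base R function
has_multiple_elements <- function(x) {
  # && short-circuits and is meant for single TRUE/FALSE conditions
  is.vector(x) && length(x) > 1
}

has_multiple_elements(0.1)           # FALSE: a vector of length 1
has_multiple_elements(c(0.1, 0.2))   # TRUE
has_multiple_elements(list(1, "a"))  # TRUE: is.vector() also accepts lists
```

Note that is.vector() returns TRUE for lists as well; swap in is.atomic() if lists should be excluded.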

Subsetting a vector with a condition (excluding NA)

vector1 = c(1,2,3,NA)
condition1 = (vector1 == 2)
vector1[condition1]
vector1[condition1==TRUE]
In the above code, condition1 is "FALSE TRUE FALSE NA",
and the 3rd and 4th lines both give me the result "2 NA",
which is not what I expected.
I wanted only the elements whose value is really 2, not including NA.
Could anybody explain why R is designed to work this way,
and how I can get the result I want with a simple command?

The subset vector[NA] will always be NA because the NA value is unknown and therefore the result of the subset is also unknown. %in% returns FALSE for NA, so it can be useful here.
vector1 = c(1,2,3,NA)
condition1 = (vector1 %in% 2)
vector1[condition1]
# [1] 2

If you are in RStudio and enter
?`[`
You will get the following explanation:
NAs in indexing
When extracting, a numerical, logical or character NA index picks an
unknown element and so returns NA in the corresponding element of a
logical, integer, numeric, complex or character result, and NULL for a
list. (It returns 00 for a raw result.)
When replacing (that is using indexing on the lhs of an assignment) NA
does not select any element to be replaced. As there is ambiguity as
to whether an element of the rhs should be used or not, this is only
allowed if the rhs value is of length one (so the two interpretations
would have the same outcome). (The documented behaviour of S was that
an NA replacement index ‘goes nowhere’ but uses up an element of
value: Becker et al p. 359. However, that has not been true of other
implementations.)

Try combining the comparison with !is.na() using the logical operator &:
vector1 = c(1,2,3,NA)
condition1<-(vector1==2 & !is.na(vector1) )
condition1
# FALSE TRUE FALSE FALSE
vector1[condition1]
# 2
The & operation returns TRUE only when both operands are TRUE, so the NA position becomes FALSE.

identical is "The safe and reliable way to test two objects for being exactly equal. It returns TRUE in this case, FALSE in every other case." (see ?identical)
Because it compares whole objects rather than working elementwise, you can use it in sapply to compare each element of vector1 to 2, i.e.:
condition1 = sapply(vector1, identical, y = 2)
which will give:
vector1[condition1]
[1] 2
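
Another option, not mentioned in the answers above, is which(). A minimal sketch: which() keeps only the positions where the condition is TRUE, so NA never reaches the index. The last line shows a caveat of the identical() approach, which also compares types.

```r
vector1 <- c(1, 2, 3, NA)

# A logical index containing NA yields an NA element in the result
vector1[vector1 == 2]          # 2 NA

# which() returns the positions where the condition is TRUE,
# dropping both FALSE and NA
which(vector1 == 2)            # 2
vector1[which(vector1 == 2)]   # 2

# Caveat for identical(): an integer 2L is not identical to the double 2
sapply(c(1L, 2L, 3L), identical, y = 2)   # FALSE FALSE FALSE
```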

Why does "one" < 2 equal FALSE in R?

I'm reading Hadley Wickham's Advanced R section on coercion, and I can't understand the result of this comparison:
"one" < 2
# [1] FALSE
I'm assuming that R coerces 2 to a character, but I don't understand why R returns FALSE instead of returning an error. This is especially puzzling to me since
-1 < "one"
# TRUE
So my question is two-fold: first, why this answer, and second, is there a way of seeing how R converts the individual elements within a logical vector like these examples?

From help("<"):
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
So in this case, the numeric is of lower precedence than the character. So 2 is coerced to the character "2". Comparison of strings in character vectors is lexicographic which, as I understand it, is alphabetic but locale-dependent.

It coerces 2 into a character, then does an alphabetical comparison. Numeric characters are assumed to come before alphabetical ones.
To get a general idea of the behaviour, try:
'a'<'1'
'1'<'.'
'b'<'B'
'a'<'B'
'A'<'B'
'C'<'B'
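
As for seeing what R does with the individual elements, a rough sketch is to do the coercion yourself with as.character() and compare the results. The exact ordering of strings depends on your locale, so the comparisons below only check that the explicit and implicit forms agree. The last line uses sort() with method = "radix", which sorts character vectors in the C locale and so gives the same order on every machine.

```r
# The number is turned into a string before the comparison
as.character(2)                        # "2"
as.character(-1)                       # "-1"

# So these pairs are the same comparison, whatever the locale
identical("one" < 2, "one" < "2")      # TRUE
identical(-1 < "one", "-1" < "one")    # TRUE

# Radix sorting of strings always uses the C locale (plain byte order)
sort(c("b", "B", "a", "1", "."), method = "radix")   # "." "1" "B" "a" "b"
```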

How do you cast a double to an integer in R?

My question is: suppose you have written an algorithm that computes a number of iterations, and you would like to print that number. But the output always shows many decimal places, like the following:
64.00000000
Is it possible to get an integer by type casting in R? How would you do it?

There are some gotchas in coercing to integer mode. Presumably you have a variety of numbers in some structure. If you are working with a matrix, then the print routine will display all the numbers at the same precision. However, you can change that level. If you calculated this result with an arithmetic process, it may actually be slightly less than 64 but display as that value.
> 64.00000000-.00000099999
[1] 64
> 64.00000000-.0000099999
[1] 63.99999
So assuming you want all the values in whatever structure this is part of, to be displayed as integers, the safest would be:
round(64.000000, 0)
... since otherwise this could happen:
> as.integer(64.00000000-.00000000009)
[1] 63
The other gotcha is that the range of value for integers is considerably less than the range of floating point numbers.
The function is.integer can be used to test for integer mode.
is.integer(3)
[1] FALSE
is.integer(3L)
[1] TRUE
Neither round nor trunc will return a vector in integer mode:
is.integer(trunc(3.4))
[1] FALSE
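
Putting the two points together, one way to get both the nearest whole number and integer mode is to round first and convert afterwards. A minimal sketch (x is just an example value):

```r
x <- 64 - 0.00000000009   # prints as 64 but is slightly smaller

as.integer(x)             # 63: truncation toward zero
round(x)                  # 64, but still a double
is.integer(round(x))      # FALSE

# Round first, then convert, to get the nearest whole number as an integer
n <- as.integer(round(x))
n                         # 64
is.integer(n)             # TRUE

# Integers are 32-bit in R, so this is the largest value they can hold
.Machine$integer.max      # 2147483647
```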

Instead of trying to convert the output into an integer, find out why it is not an integer in the first place, and fix it there.
Did you initialize it as an integer, e.g. num.iterations <- 0L or num.iterations <- integer(1) or did you make the mistake of setting it to 0 (a numeric)?
When you incremented it, did you add 1 (a numeric) or 1L (an integer)?
If you are not sure, go through your code and check your variable's type using the class function.
Fixing the problem at the root could save you a lot of trouble down the line. It could also make your code more efficient as numerous operations are faster on integers than on numerics.

The function as.integer() truncates toward zero, so for non-negative values you can add 0.5 before converting to round to the nearest whole number:
dd<-64.00000000
as.integer(dd+0.5)
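
A quick check of where the add-0.5 trick holds and where it does not (example values are mine). Because as.integer() truncates toward zero, the trick breaks for negative numbers, and it also treats exact halves differently from round():

```r
as.integer(2.7 + 0.5)     # 3: matches rounding for a positive value
as.integer(-2.7 + 0.5)    # -2: truncation toward zero gives the wrong answer
as.integer(round(-2.7))   # -3: round() handles the sign correctly

# Exact halves: round() goes to the even number, the 0.5 trick goes up
round(2.5)                # 2
as.integer(2.5 + 0.5)     # 3
```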

If you have a numeric matrix you wish to coerce to an integer matrix (e.g., you are creating a set of dummy variables from a factor), as.integer(matrix_object) will coerce the matrix to a vector, which is not what you want. Instead, you can use storage.mode(matrix_object) <- "integer" to maintain the matrix form.
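
A short sketch of the difference, with a small made-up matrix m standing in for matrix_object:

```r
# m is a small made-up dummy-variable matrix stored as doubles
m <- matrix(c(1, 0, 0, 1, 1, 0), nrow = 3)
typeof(m)                   # "double"

dim(as.integer(m))          # NULL: the matrix structure is lost

storage.mode(m) <- "integer"
typeof(m)                   # "integer"
dim(m)                      # 3 2: still a 3 x 2 matrix
```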
