%in% on numeric values returns wrong result

6969277959563657216 is not one of the following numbers, yet %in% returns TRUE.
6969277959563657216 %in% c(6972646901044805634,
6935914801507893250,
6930019021496532993,
6969277959563657217,
7005257783989764866)
[1] TRUE
Why is this?

The reason is that R (like most software) stores these numbers as double-precision floating-point values, which carry only about 16 significant decimal digits; see R FAQ 7.31, "Why doesn't R think these numbers are equal?", and the link therein. The question here is of course the opposite: why does R think these numbers are equal?
Example:
6969277959563657216 == 6969277959563657217
[1] TRUE
More about this in the Wikipedia article on double-precision floating point, among other places.
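You can watch the rounding happen by asking R to print the exact integer that the double actually stores (sprintf's %.0f format shows every digit). Near 7e18, adjacent doubles are 1024 apart, so both literals round to the same stored value:
sprintf("%.0f", 6969277959563657217)
[1] "6969277959563657216"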
Edit: here is an SO posting about big integers: long/bigint/decimal equivalent datatype in R.
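If you need exact comparisons at this magnitude, one sketch of a workaround is to keep the values as character strings (e.g. as read straight from a file), where every digit is preserved; the bit64 package discussed in that posting is another option:
x   <- "6969277959563657216"
tbl <- c("6972646901044805634", "6935914801507893250",
         "6930019021496532993", "6969277959563657217",
         "7005257783989764866")
x %in% tbl
[1] FALSE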

Related

How does R compare version strings with the inequality operators?

Could someone explain this behavior in R?
> '3.0.1' < '3.0.2'
[1] TRUE
> '3.0.1' > '3.0.2'
[1] FALSE
What process is R doing to make the comparison?
It's making a lexicographic comparison in this case, as opposed to converting to numeric, as calling as.numeric('3.0.1') returns NA.
The logic here would be something like, "the strings '3.0.1' and '3.0.2' are equivalent until their final characters, and since 1 precedes 2 in an alphanumeric alphabet, '3.0.1' is less than '3.0.2'." You can test this with some toy examples:
'a' < 'b' # TRUE
'ab' < 'ac' # TRUE
'ab0' < 'ab1' # TRUE
Per the note from the manual quoted in the post that @rawr linked in the comments, this will get hairy in different locales, where the alphanumeric alphabet may be sorted differently.
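For genuine version numbers, base R also has a dedicated class that compares version components as integers rather than character by character, which sidesteps both the locale issue and the '10'-before-'2' trap:
numeric_version("3.0.1") < numeric_version("3.0.2")
[1] TRUE
numeric_version("3.0.10") > numeric_version("3.0.2")  # plain string comparison would say FALSE here
[1] TRUE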

Logic regarding summation of decimals [duplicate]

This question already has answers here: Why are these numbers not equal? (6 answers). Closed 8 years ago.
Does the last statement in this series make logical sense to anybody else? R seems to give similar results for a small subset of possible sums of decimals under 1. I cannot recall any basic mathematical principle that would make this true, but it seems unlikely to be an error.
> 0.4+0.6
[1] 1
> 0.4+0.6==1.0
[1] TRUE
> 0.3+0.6
[1] 0.9
> 0.3+0.6==0.9
[1] FALSE
Try typing 0.3+0.6-0.9; on my system the result is -1.110223e-16. This is because the computer doesn't actually sum them as decimal numbers: it stores binary approximations and sums those. None of these numbers can be represented exactly in binary, so each calculation carries a small amount of error; apparently the error is small enough not to matter in the first case, but not in the second.
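You can see those binary approximations by printing more digits than R's default of seven:
print(0.3 + 0.6, digits = 17)
[1] 0.89999999999999991
print(0.9, digits = 17)
[1] 0.90000000000000002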
Floating point arithmetic is not exact, but the == operator is. Use all.equal to compare two floating point values in R.
isTRUE(all.equal(0.3+0.6, 0.9))
You can also define a tolerance when calling all.equal.
isTRUE(all.equal(0.3+0.6, 0.9, tolerance = 0.001))

exponents and negative numbers

I do not know if other R users have found the following problem.
Within R I do the following operation:
> (3/-2)^(1/3)
[1] NaN
I obtain a NaN result.
In a similar way, if I set:
> w<-(3/-2)
> g<-1/3
> w^g
[1] NaN
However, if I do:
> 3/-2
[1] -1.5
> -1.5^(1/3)
[1] -1.144714
Is there anybody that can explain this contradiction?
Where do you see a problem? -1.5^(1/3) is not the same as (-1.5)^(1/3): the former is parsed as -(1.5^(1/3)), so the minus sign is applied after the exponentiation.
Read help("Syntax") to learn that ^ has higher precedence than - in R.
This is due to the mathematical definition of exponentiation: for the continuous real exponentiation operator x^y with non-integer y, a negative base is undefined.
Compute (3/2)^(1/3) first and then negate the result.
You can't calculate a fractional power of a negative number with ^: even though the real cube root of a negative number does exist, 1/3 is not represented exactly in floating point, so R cannot tell that you mean an odd root and returns NaN.
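A sketch of the negate-afterwards workaround from the answer above, wrapped in a small helper (cbrt is a made-up name, not a base R function):
cbrt <- function(x) sign(x) * abs(x)^(1/3)  # real cube root, works for negative x
cbrt(3/-2)
[1] -1.144714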
If you really want the answer you can do the computation over the complex numbers, i.e. get the cube root of -1.5+0i:
complex(real = -1.5, imaginary = 0)^(1/3)
## [1] 0.5723571+0.9913516i
This is actually only one of three complex roots of x^3+1.5==0:
polyroot(c(1.5,0,0,1))
[1] 0.5723571+0.9913516i -1.1447142+0.0000000i 0.5723571-0.9913516i
but the other answers probably come closer to addressing your real question.

Why is this easy comparison false? [duplicate]

This question already has answers here: Why are these numbers not equal? (6 answers). Closed 8 years ago.
Why does this simple statement evaluate to FALSE in R?
mean(c(7.18, 7.13)) == 7.155
Furthermore, what do I have to do in order to make this a TRUE statement? Thanks!
Floating point arithmetic is not exact. The answer to this question has more information.
You can actually see this:
> print(mean(c(7.18,7.13)), digits=16)
[1] 7.154999999999999
> print(7.155, digits=16)
[1] 7.155
In general, do not compare floating point numbers for equality (this applies to pretty much every programming language, not just R).
You can use all.equal to do an inexact comparison:
> all.equal(mean(c(7.18,7.13)), 7.155)
[1] TRUE
It's due to small floating-point rounding error. Rounding to the third decimal place shows that they are equal:
round(mean(c(7.18, 7.13)), 3) == 7.155
Generally, don't rely on exact comparisons of floating-point numbers to give the logical outputs you expect :)
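If you prefer an explicit tolerance check over all.equal, here is a minimal sketch (the helper name near is made up here; dplyr ships a similar function):
near <- function(x, y, tol = .Machine$double.eps^0.5) abs(x - y) < tol
near(mean(c(7.18, 7.13)), 7.155)
[1] TRUE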

Is it okay to use floating-point numbers as indices or when creating factors in R?

I don't mean numbers with decimal parts; that would clearly be odd, but instead numbers which really are integers (to the user, that is), but are being stored as floating point numbers.
For example, I've often used constructs like (1:3)*3 or seq(3,9,by=3) as indices, but you'll notice that they're actually being represented as floating point numbers, not integers, even though to me, they're really integers.
Another time this could come up is when reading data from a file; if the file represents the integers as 1.0, 2.0, 3.0, etc, R will store them as floating-point numbers.
(I posted an answer below with an example of why one should be careful, but it doesn't really address whether simple constructs like the above can cause trouble.)
(This question was inspired by this question, where the OP created integers to use as coding levels of a factor, but they were being stored as floating point numbers.)
It's always better to use integer representation when you can. For instance, with (1L:3L)*3L or seq(3L,9L,by=3L).
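You can check which representation you actually have with class(); mixing a double constant into integer arithmetic silently promotes the result:
class((1:3) * 3)        # 3 is a double literal, so the result is double
[1] "numeric"
class((1L:3L) * 3L)     # all-integer arithmetic stays integer
[1] "integer"
class(seq(3L, 9L, by = 3L))
[1] "integer"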
I can come up with an example where floating representation gives an unexpected answer, but it depends on actually doing floating point arithmetic (that is, on the decimal part of a number). I don't know if storing an integer directly in floating point and possibly then doing multiplication, as in the two examples in the original post, could ever cause a problem.
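(For what it's worth on that open question: whole numbers stored in doubles are represented exactly up to 2^53, so purely integer-valued arithmetic like the constructs in the original post cannot drift; it is fractional arithmetic that introduces error. A quick check:)
(1:3) * 3 == c(3L, 6L, 9L)   # integer-valued doubles compare exactly
[1] TRUE TRUE TRUE
2^53 == 2^53 + 1             # beyond 2^53 adjacent integers collapse
[1] TRUE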
Here's my somewhat forced example to show that floating points can give funny answers. I make two 3's that are different in floating point representation; the first element isn't quite exactly equal to three (on my system with R 2.13.0, anyway).
> (a <- c((0.3*3+0.1)*3,3L))
[1] 3 3
> a[1] == a[2]
[1] FALSE
Creating a factor directly works as expected because factor calls as.character on the values, which gives the same result for both elements.
> as.character(a)
[1] "3" "3"
> factor(a, levels=1:3, labels=LETTERS[1:3])
[1] C C
Levels: A B C
But using it as an index doesn't work as expected because when they're forced to an integer, they are truncated, so they become 2 and 3.
> trunc(a)
[1] 2 3
> LETTERS[a]
[1] "B" "C"
Constructs such as 1:3 are really integers:
> class(1:3)
[1] "integer"
Using a float as an index apparently involves truncation:
> foo <- 1:3
> foo
[1] 1 2 3
> foo[1.0]
[1] 1
> foo[1.5]
[1] 1
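A defensive sketch if an index may carry floating-point error: round() it to the nearest integer before subsetting, rather than letting subsetting truncate it:
a <- (0.3 * 3 + 0.1) * 3   # prints as 3 but is slightly less than 3
LETTERS[a]                 # truncation gives index 2
[1] "B"
LETTERS[round(a)]          # rounding recovers index 3
[1] "C"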
