Equal numbers shown as false when compared in R [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
In R, what is the difference between these two?
floating point issue in R?
This is part of some code I wrote. I spent days looking for the problem before finally realizing that a comparison that should be TRUE was being evaluated by R as FALSE. I'm using R 2.14.2 64-bit on Windows. This is the code to reproduce the problem:
concList <- c(1.15, 1.15, 1.15, 1.15, 1.15, 1.15)
concList <- concList - 0.4
a <- sum(concList)
b <- length(concList) * 0.75
str(a)  # num 4.5
str(b)  # num 4.5
print(a == b)  # FALSE
The last print results in FALSE even though the two values are shown as exactly the same number. I thought this could be a problem with R's floating-point numerical representation, so I added the code below, which works around it.
a <- round(a, 1)
b <- round(b, 1)
print(a == b)  # TRUE
My question is: is there a more elegant solution? Is this a bug that should be reported?
Thanks for your time.

Because they aren't exactly the same number. They differ by a very small amount due to the computer's binary representation of numbers, also known as floating-point error:
> a - b
[1] -8.881784e-16
Jon Skeet has an excellent blog post on this issue, which pops up on Stack Overflow with some regularity.
As @mrdwab suggests in the comments, you should use all.equal(a, b) to test for near equality.
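For example (a minimal sketch: all.equal() compares with a default tolerance of about 1.5e-8, whereas == demands bit-for-bit equality):
a <- sum(c(1.15, 1.15, 1.15, 1.15, 1.15, 1.15) - 0.4)
b <- 6 * 0.75
a == b                   # FALSE: the two doubles differ by about 8.9e-16
all.equal(a, b)          # TRUE: equal within the default tolerance
isTRUE(all.equal(a, b))  # use this form inside if(), because all.equal()
                         # returns a descriptive string, not FALSE, when
                         # the values are NOT nearly equal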

Related

Managing floating point accuracy

I'm struggling with floating-point accuracy issues and could not find a solution.
Here is a short example:
> aa <- c(99.93029, 0.0697122)
> aa
[1] 99.9302900 0.0697122
> aa[1]
[1] 99.93029
> print(aa[1], digits = 20)
[1] 99.930289999999999
It would appear that, upon storing the vector, R converted the numbers to something with a slightly different internal representation (yes, I have read circle 1 of the "R inferno" and similar material).
How can I force R to store the input values exactly "as is", with no modification?
In my case, the problem is that the values are processed in such a way that the small errors grow very quickly:
> aa[2] / (100 - aa[1]) * 100
[1] 100.0032  ## should be 100, of course!
> print(aa[2] / (100 - aa[1]) * 100, digits = 20)
[1] 100.00315593171625
So I need to find a way to get my normalization right.
Thanks
PS: There are many questions on this site and elsewhere discussing the issue of apparent loss of precision, i.e. numbers displayed incorrectly (but stored right). For instance:
How to stop read.table from rounding numbers with different degrees of precision in R?
This is a distinct issue, as the number is stored incorrectly (but displayed right).
(R version 3.2.1 (2015-06-18), win 7 x64)
Floating-point precision has always generated lots of confusion. The crucial idea to remember is: when you work with doubles, there is no way to store every real number "as is" or "exactly right"; the best you can store is the closest available approximation. So when you type (in R or any other modern language) something like x = 99.93029, you get the nearest representable double, which prints as 99.930289999999999.
Now when you expect a + b to be "exactly 100", you're speaking imprecisely. The best you can get is "100 up to N digits after the decimal point", and you have to hope that N is big enough. In your case it would be correct to say that 99.9302900 + 0.0697122 is 100 to 5 decimal places. Naturally, multiplying that equality by 10^k loses a further k digits of accuracy.
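Both effects are easy to check. The inputs themselves already fail to sum to 100 in exact decimal arithmetic, and sprintf exposes the stored double:
> sprintf("%.7f", 99.93029 + 0.0697122)
[1] "100.0000022"
> sprintf("%.18f", 99.93029)
[1] "99.930289999999999395"
The first line shows the inputs sum to 100 only to 5 decimal places; the second shows the closest double to 99.93029.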
So, there are two solutions here:
a. To get more precision in the output, provide more precision in the input; here bb[2] is chosen so that bb[1] + bb[2] is exactly 100 in decimal:
> bb <- c(99.93029, 0.06971)
> print(bb[2] / (100 - bb[1]) * 100, digits = 20)
[1] 99.999999999999119
b. If double precision is not enough (which can happen in complex algorithms), use packages that provide extended-precision arithmetic, for instance the gmp package.
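As a sketch of what that looks like, here is the computation with the Rmpfr package (my choice, not the answer's; it is a common companion to gmp for arbitrary-precision floats, and passing the numbers as strings lets mpfr parse the decimal digits directly rather than going through a double first):
library(Rmpfr)
x <- mpfr("99.93029", precBits = 200)   # 200 bits, roughly 60 decimal digits
y <- mpfr("0.0697122", precBits = 200)
y / (100 - x) * 100
# still about 100.00316: the inputs are only consistent to 5 decimals,
# so no amount of extra precision can turn this into an exact 100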
I think you have misunderstood here. This is the same situation: R stores the value, but what is displayed depends on the digits option chosen when printing it.
For example:
> print(99.930289999999999, digits = 20)
[1] 99.930289999999999395
But:
> print(1, digits = 20)
[1] 1
And:
> print(1.1, digits = 20)
[1] 1.1000000000000000888
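The pattern is simple: values whose fractional part is a finite sum of powers of two (such as 0.5) are stored exactly; everything else is stored as the nearest representable double. A quick check:
> sprintf("%.20f", 0.5)   # 0.5 = 2^-1, exact in binary
[1] "0.50000000000000000000"
> sprintf("%.20f", 1.1)   # 1.1 has no finite binary expansion
[1] "1.10000000000000008882"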
In addition to the previous answers, a good read on the subject is The R Inferno, by P. Burns:
http://www.burns-stat.com/documents/books/the-r-inferno/

R expression results in NaN for no obvious reason [duplicate]

This question already has answers here:
How to calculate any negative number to the power of some fraction in R?
(2 answers)
Closed 6 years ago.
How can it be that the expression
> (exp(17.118708 + 4.491715 * -2)/-67.421587)^(-67.421587)
results in
[1] NaN
while
> -50.61828^(-67.421587)
which should basically have the same outcome, gives me
[1] -1.238487e-115
This is driving me crazy; I spent hours searching for the error. The -2 in this case is a parameter of the function. I really can't think of a solution. Thanks for your help!
EDIT:
I see that when I add brackets
> (-50.61828)^(-67.421587)
it also results in
[1] NaN
...but that does not solve my problem.
It is because of the implementation of pow under the C99 standard.
Setting the OP's example of (-50.61828)^(-67.421587) aside, even the mathematically justified (-8)^(1/3) = -2 does not work in R:
(-8)^(1/3)
# [1] NaN
Quoted from ?"^":
Users are sometimes surprised by the value returned, for example
why ‘(-8)^(1/3)’ is ‘NaN’. For double inputs, R makes use of IEC
60559 arithmetic on all platforms, together with the C system
function ‘pow’ for the ‘^’ operator. The relevant standards
define the result in many corner cases. In particular, the result
in the example above is mandated by the C99 standard. On many
Unix-alike systems the command ‘man pow’ gives details of the
values in a large number of corner cases.
I am on Ubuntu Linux, so I can quote the relevant part of man pow here:
If x is a finite value less than 0, and y is a finite noninteger, a
domain error occurs, and a NaN is returned.
From what I can tell, -50.61828^(-67.421587) evaluates as -(50.61828^(-67.421587)), because ^ binds more tightly than unary minus. (-50.61828)^(-67.421587) also results in NaN.
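If you actually need the real odd root of a negative number, a common workaround (a sketch, not taken from the answers above) is to take the root of the absolute value and restore the sign:
cbrt <- function(x) sign(x) * abs(x)^(1/3)  # real cube root for any sign of x
cbrt(-8)    # -2
(-8)^(1/3)  # NaN, as mandated by C99 pow() for a negative base and fractional exponent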

How to get the "power of ten" of an arbitrary double number [duplicate]

This question already has answers here:
how to get nearest tenth of a double
(2 answers)
Closed 7 years ago.
Given a number of type double, say d:
How can d be rounded/extracted to its "most suitable" power of 10?
Example:
0.123 => 0.1
1.234 => 1
12.34 => 10
[At this point, I have not decided which behaviour I want for, for example, 0.99 (i.e. if it should be 0.1 or 0.01); any solution will do for now.]
I am using this in Java programming, so either a standard library function or a simple mathematical solution (for any language) will do. (I can think of naive solutions like repeatedly dividing d by ten and looking for the first non-zero digit, but that feels too ugly.)
I am sorry if I am not using the correct terminology in the question; please edit if you can formulate it better.
Compute the base-10 logarithm, round it down (or up, according to personal taste) to an integer, and raise 10 to that power.
In Java, you can use Math.log10.
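A minimal sketch of that approach in R, the language used elsewhere on this page (the helper name power_of_ten is made up for illustration); in Java the equivalent is Math.pow(10, Math.floor(Math.log10(Math.abs(d)))):
power_of_ten <- function(d) 10^floor(log10(abs(d)))
power_of_ten(0.123)  # 0.1
power_of_ten(1.234)  # 1
power_of_ten(12.34)  # 10
power_of_ten(0.99)   # 0.1, since floor() always rounds the exponent down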

Why does the number 1e9999... (31 9s) cause problems in R?

When entering 1e9999999999999999999999999999999 into R, R hangs and will not respond, requiring it to be terminated.
This happens across three different computers and OSes (Windows 7 and Ubuntu), and in RStudio, RGui and Rscript.
Here's some code to generate the number more easily:
boom <- paste(c("1e", rep(9, 31)), collapse="")
eval(parse(text=boom))
Now clearly this isn't a practical problem. I have no need to use numbers of this magnitude. It's just a question of curiosity.
Curiously, if you try 1e9999999999999999999999999999998 or 1e10000000000000000000000000000000 (subtract or add one in the exponent), you get Inf and 0 respectively. This number is clearly some kind of boundary, but a boundary between what, and why?
I considered that it might be:
A floating-point problem, but doubles max out at about 1.7977e308, long before the number in question.
An issue with 32-bit integers, but 2^32 is 4294967296, much smaller than the number in question.
Really weird. This is my dominant theory.
EDIT: As of 2015-09-15 at the latest, this no longer causes R to hang. They must have patched it.
This looks like an extreme case in the parser. The XeY format is described in Section 10.3.1: Literal Constants of the R Language Definition and points to ?NumericConstants for "up-to-date information on the currently accepted formats".
The problem seems to be how the parser handles the exponent. The numeric constant is handled by NumericValue (line 4361 of main/gram.c), which calls mkFloat (line 4124 of main/gram.c), which calls R_atof (line 1584 of main/util.c), which calls R_strtod4 (line 1461 of main/util.c). (All as of revision 60052.)
Line 1464 of main/util.c shows expn declared as int, and it will overflow at line 1551 if the exponent is too large. Signed integer overflow causes undefined behavior.
For example, the code below produces values for exponents < 308 or so and Inf for exponents > 308.
const <- paste0("1e",2^(1:31)-2)
for(n in const) print(eval(parse(text=n)))
You can see the undefined behavior for exponents > 2^31 (R hangs for an exponent = 2^31):
const <- paste0("1e",2^(31:61)+1)
for(n in const) print(eval(parse(text=n)))
I doubt this will get any attention from R-core because R can only store numeric values between about 2e-308 to 2e+308 (see ?double) and this number is way beyond that.
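You can query those limits directly:
> .Machine$double.xmax   # largest finite double
[1] 1.797693e+308
> .Machine$double.xmin   # smallest positive normalised double
[1] 2.225074e-308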
This is interesting, but I think R has systemic problems with parsing numbers that have very large exponents:
> 1e10000000000000000000000000000000
[1] 0
> 1e1000000000000000000000000000000
[1] Inf
> 1e100000000000000000000
[1] Inf
> 1e10000000000000000000
[1] 0
> 1e1000
[1] Inf
> 1e100
[1] 1e+100
There we go, finally something reasonable. According to this output and Joshua Ulrich's comment below, R appears to be able to represent numbers up to about 2e308, and to parse exponents up to about 2*10^9 even though it cannot represent the results. Beyond that, the behavior is undefined, apparently due to overflow.
R might sometimes use bignums. Perhaps 1e9999999999999999999999999999999 is some threshold, or perhaps the parsing routines have a limited buffer for reading the exponent. Your observation would be consistent with a 32-character (null-terminated) buffer for the exponent.
I would rather ask that question on a forum or mailing list specific to R; they are rumored to be friendly.
Alternatively, since R is free software, you could investigate its source code.

Why does sqrt(4) - 2 equal -8.1648465955514287168521180122928e-39 when using the Windows calculator? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How is floating point stored? When does it matter?
Using the built-in calculator on my Win7 x64 machine, I get the number -8.1648465955514287168521180122928e-39 when calculating sqrt(4) - 2.
I would expect the result to be 0.
Floating-point values carry small representation errors, and subtracting two nearly equal values can expose them. On occasion you may get a result that is 0, or merely very close to 0 (and 10^-39 is pretty close).
For more information, check out Fractions in Binary on Wikipedia.
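Interestingly, in IEEE-754 double arithmetic this particular case is exact, because the standard requires sqrt to be correctly rounded and 4 has an exact integer square root; the Windows calculator presumably uses its own extended-precision square-root routine, which leaves a tiny residue. Compare, in R:
> sqrt(4) - 2
[1] 0
> sqrt(2)^2 - 2   # a non-exact root does leave residue
[1] 4.440892e-16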
