From "http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-the" 7.31
We already know that large numbers (over 2^53) can produce errors in modular arithmetic.
However, I cannot understand why every large number is treated as even (I have never seen an "odd" result for an integer over 2^53), even allowing for some approximation error:
(2^53+1)%%2
(2^100-1)%%2
(the warning message "probable complete loss of accuracy in modulus" can be ignored)
etc.
are all 0, not 1.
Why is that? (I know there is some approximation involved, but I need to know the concrete reason.)
> print(2^54,22)
[1] 18014398509481984.00000
> print(2^54+1,22)
[1] 18014398509481984.00000
> print(2^54+2,22)
[1] 18014398509481984.00000
> print(2^54+3,22)
[1] 18014398509481988.00000
An IEEE double precision value has a 53-bit mantissa. Any number requiring more than 53 binary digits of precision will be rounded, i.e. the digits from 54 onwards will be implicitly set to zero. Thus any number with magnitude greater than 2^53 will necessarily be even (since the least-significant bit of its integer representation is beyond the floating-point precision, and is therefore zero).
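As a quick sketch (not part of the original answer), you can see the lost bit directly in the console:
2^53 + 1 == 2^53           # TRUE: the +1 is rounded away
sprintf("%.0f", 2^53 + 1)  # "9007199254740992", i.e. 2^53 itself
(2^53 + 1) %% 2            # 0, because the stored value is the even number 2^53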
There is no such thing as an "integer" in versions of R at or earlier than v2.15.3 whose magnitude is greater than 2^31-1. You are working with "numeric" or "double" values, and you are probably "rounding down" or truncating them.
?`%%`
The soon-to-be but as yet unreleased version 3.0 of R will have 8-byte integers, and this problem will then not arise until you go out beyond 2^((8*8)-1)-1. At the moment coercion to integer fails at that level:
> as.integer(2^((8*4)-1)-1)
[1] 2147483647
> as.integer(2^((8*8)-1)-1)
[1] NA
Warning message:
NAs introduced by coercion
So your first example may return the proper result, but your second example may still fail.
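If you need the second example to work today, one option (a sketch assuming the gmp package is installed) is to use arbitrary-precision integers, which never pass through a double:
library(gmp)
(as.bigz(2)^100 - 1) %% 2
# Big Integer ('bigz') :
# [1] 1
(as.bigz(2)^53 + 1) %% 2
# Big Integer ('bigz') :
# [1] 1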
I was trying to develop a program in RStudio that needs the user to input any number they wish, which can be very large.
While experimenting with random numbers I ran into a problem:
every time I entered a huge number, R displayed it incorrectly.
I restarted the R session and the problem still persists. Please help. Here is a snapshot of the problem I am encountering.
You've exceeded the range you can represent with R's integers (32-bit signed): −2,147,483,648 to 2,147,483,647. There's an option to extend to 64 bits using the bit64 package (as per Ben Bolker's comment below). This would extend your range from −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
If you need more precision than that, try the gmp library. Note that the integer is supplied as a character string, to avoid rounding the number before it's processed.
options(scipen=99)
a1 <- 123456789123456789123456789
a1
[1] 123456789123456791346468826
a1 - (a1 -1)
[1] 0
# now using arbitrary-precision (big) numbers
library(gmp)
b1 <- as.bigz("123456789123456789123456789")
b1
Big Integer ('bigz') :
[1] 123456789123456789123456789
b1 - (b1 -1)
Big Integer ('bigz') :
[1] 1
When I save a large number in R as an object, the wrong number gets stored. Why is that?
options("scipen"=100, "digits"=4)
num <- 201912030032451613
num
#> [1] 201912030032451616
Created on 2019-12-12 by the reprex package (v0.2.1.9000)
As @Roland says, this is a floating-point issue (the Wikipedia page on floating-point numbers is as good a reference as any). Unpacking it a bit though, R has a specific integer format, but it is limited to 32-bit integers:
> str(-2147483647L)
int -2147483647
> str(2147483647L)
int 2147483647
> str(21474836470L)
num 21474836470
Warning message:
non-integer value 21474836470L qualified with L; using numeric value
So, when R gets your number it stores it as a floating-point number, not an integer. Floating-point numbers are limited in how much precision they can hold, typically about 15-17 significant decimal digits. Because your number has more significant digits than that, there is a loss of precision. Losing precision in the smallest digits usually doesn't matter for computer arithmetic, but if your big number is a key of some kind (or a date stamp) then you are in more trouble. The bit64 package is designed with this kind of use case in mind, or you could import the value as a string, depending on what you want to do.
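For example, a sketch assuming the bit64 package is installed; the value is read in as a string, so it is never rounded through a double on the way in:
library(bit64)
num <- as.integer64("201912030032451613")
num
#> integer64
#> [1] 201912030032451613
num + 1L
#> integer64
#> [1] 201912030032451614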
I know floating-point numbers are strange, but I haven't come across this exact issue before. I have a vector of numbers in R. I check how many are bigger than zero and take the mean of that to get the proportion above zero. I assign the number to an object after rounding it, but when I go to paste it, somehow the extra digits come back. I would dput the vector, but it is too long, so here are the head and str:
> head(x)
[1] 0.1616631 0.2117250 0.1782197 0.1791657 0.2067048 0.2042075
> str(x)
num [1:4000] 0.162 0.212 0.178 0.179 0.207 ...
Now here's where I run into issues:
> y <- round(mean(x > 0) * 100, 1)
> y
[1] 99.7
> str(y)
num 99.7
> paste(100 - y, "is the inverse")
[1] "0.299999999999997 is the inverse"
But it doesn't behave the same if I don't subtract from 100:
> paste(y, "is it pasted")
[1] "99.7 is it pasted"
I know I could put round right into the paste command or use sprintf, and I know how floats are represented in R, but I'm specifically wondering why this occurs in the former situation and not the latter. I cannot produce a reproducible example either, because I cannot get a randomly generated vector to behave the same way.
There's rounding error, but in this case R is not handling it nicely.
Floating-point numbers in R are stored as doubles, which means 53 bits of precision, approximately 16 decimal digits. That applies to 99.7 as well, and you can see where it breaks down:
print(99.7, digits=16) # works fine
print(99.7, digits=17) # Adds a 3 at the end on my platform
That will always be a limit; the documentation for print warns you about asking for more digits than can be represented.
But when you do calculations, any rounding error stays the same size in absolute terms, so your expected value of 0.3 carries an absolute error as big as the one in 99.7, which is relatively about 300 times larger. Therefore it "fails" at fewer significant digits:
print(100-99.7, digits=14) # works fine
print(100-99.7, digits=15) # Already a rounding error at digits=15
Now paste passes any number to the function as.character, which (in this case unfortunately) does not look at any options you've set; it always uses a default of 15 significant digits.
To solve it, you can use format to specify the desired number of digits:
paste(format(100 - y, digits=14), "is the inverse")
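For comparison, here is a small sketch of what each conversion produces for this value:
as.character(100 - y)           # "0.299999999999997" -- what paste() sees
format(100 - y)                 # "0.3" (default 7 significant digits)
format(100 - y, digits = 14)    # "0.3"
sprintf("%.1f", 100 - y)        # "0.3"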
options(scipen=999)
625075741017804800
625075741017804806
When I type the two numbers above in the R console, I get the same output for both: 625075741017804800
How do I avoid that?
Numbers greater than 2^53 are not going to be stored unambiguously in R's numeric-classed vectors. There was a recent change that allows integer values to be stored exactly in numerics up to that limit, but your number is larger than that increased capacity for precision:
625075741017804806 > 2^53
[1] TRUE
Prior to that change, integers could only be stored up to .Machine$integer.max == 2147483647. Numbers larger than that value get silently coerced to the 'numeric' class. You will either need to work with them as character values or install a package capable of arbitrary precision. Rmpfr and gmp are two that come to mind.
You can use package Rmpfr for arbitrary precision
dig <- mpfr("625075741017804806")
print(dig, 18)
# 1 'mpfr' number of precision 60 bits
# [1] 6.25075741017804806e17
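A sketch with gmp (also mentioned above) works too; parsing the values from character strings keeps the last digits intact:
library(gmp)
as.bigz("625075741017804806") - as.bigz("625075741017804800")
# Big Integer ('bigz') :
# [1] 6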
I am working in R with very small numbers that represent probabilities in a Maximum Likelihood Estimation algorithm. Some of these numbers are as small as 1e-155 (or smaller). However, when something as simple as a summation takes place, the precision appears to be truncated to that of the least precise term, which ruins my calculations and produces meaningless results.
Example:
> sum(c(7.831908e-70,6.002923e-26,6.372573e-36,5.025015e-38,5.603268e-38,1.118121e-14, 4.512098e-07,4.400717e-05,2.300423e-26,1.317602e-58))
[1] 4.445838e-05
As can be seen from the example, the result is reported on the scale of 1e-05, which appears to crudely round away the more sensitive terms.
Is there a way around this? Why is R choosing such strange automatic behavior? Or perhaps it is not really doing this and I am just seeing the result in truncated form? In that case, is the actual number stored in the variable with the correct precision?
There is no precision loss in your sum. But if you're worried about it, you should use a multiple-precision library:
library("Rmpfr")
x <- c(7.831908e-70,6.002923e-26,6.372573e-36,5.025015e-38,5.603268e-38,1.118121e-14, 4.512098e-07,4.400717e-05,2.300423e-26,1.317602e-58)
sum(mpfr(x, 1024))
# 1 'mpfr' number of precision 1024 bits
# [1] 4.445837981118120898327314579322617633703674840117902103769961398533293289165193843930280422747754618577451267010103975610356319174778512980120125435961577770470993217990999166176083700886405875414277348471907198346293122011042229843450802884152750493740313686430454254150390625000000000000000000000000000000000e-5
Your results are only truncated in the display.
Try:
x <- sum(c(7.831908e-70,6.002923e-26,6.372573e-36,5.025015e-38,5.603268e-38,1.118121e-14, 4.512098e-07,4.400717e-05,2.300423e-26,1.317602e-58))
print(x, digits=22)
[1] 4.445837981118121081878e-05
You can read more about the behaviour of print at ?print.default
You can also set an option - this will affect all calls to print:
options(digits=22)
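As a sketch (not from the original answer), the option can be raised temporarily and then restored, since options() returns the previous settings:
old <- options(digits=22)
x
[1] 4.445837981118121081878e-05
options(old)   # back to the default of 7 significant digits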
Have you ever heard of floating-point numbers?
There is no loss of precision (significant figures) in multiplication or division as long as the result stays
between about 4.9·10^−324 and 1.7976931348623157·10^308 (see the link below for details).
So if you do 1.0e-30 * 1.0e-10 the result will be 1.0e-40,
but if you do 1.0e-30 + 1.0e-10 the result will be 1.0e-10.
Why?
Because only a finite set of numbers can be represented in a computer word: with 64 bits there are at most 2^64 distinct representations.
Instead of a direct mapping as used for integers (a 64-bit integer represents every whole number from roughly −9.2·10^18 to +9.2·10^18),
floating point uses a cleverer scheme: it covers magnitudes from about 4.9·10^−324 up to 1.7976931348623157·10^308 and represents or approximates the rational numbers in that range.
So in floating point, to achieve that wider range, precision in sums is sacrificed. There is a loss of precision during sums or subtractions because the significant figures representable by the 52-bit fraction part of a 64-bit floating-point number amount to about log10(2^52) ≈ 16 decimal digits.
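A small console sketch of that trade-off (not from the original answer):
.Machine$double.eps       # 2.220446e-16, the relative precision of a double
1e-30 * 1e-10             # 1e-40: multiplication just combines the exponents
1e-10 + 1e-30 == 1e-10    # TRUE: 1e-30 lies far below the last stored bit of 1e-10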
If you want a basic everyday example, look at summary(lm()): when a parameter's p-value is near zero, summary() prints < 2.2e-16 (no coincidence: that is the machine epsilon, .Machine$double.eps).
Why limit it to 64 bits? CPUs have execution units built specifically for 64-bit floating-point arithmetic (the 64-bit IEEE 754 standard). If you use higher precision such as 128-bit floating point, performance drops by a factor of ten or more, because the CPU has to split the data and operations into multiple 64-bit pieces.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format