R stores huge numbers wrongly

I was writing a program in RStudio that needs the user to input a number, which may be very large.
While experimenting with random inputs, I ran into a problem: every time I entered a huge number, R displayed it incorrectly.
I restarted the R session, but the problem persists. Please help. Here is a snapshot of the problem I am encountering.

You've exceeded the range of R's integers (32-bit signed): -2,147,483,648 to 2,147,483,647. One option is to extend to 64 bits using the bit64 package (as per Ben Bolker's comment below). This would extend your range to -9,223,372,036,854,775,808 through 9,223,372,036,854,775,807.
If you need more range than that, try the gmp library. Note that the integer is supplied as a character string, to avoid rounding the number before gmp processes it.
options(scipen=99)
a1 <- 123456789123456789123456789
a1
[1] 123456789123456791346468826
a1 - (a1 -1)
[1] 0
# now using arbitrary-precision (big) numbers
library(gmp)
b1 <- as.bigz("123456789123456789123456789")
b1
Big Integer ('bigz') :
[1] 123456789123456789123456789
b1 - (b1 -1)
Big Integer ('bigz') :
[1] 1
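For the 64-bit route mentioned above, here is a minimal sketch using the bit64 package (my addition, not part of the original answer; note that the value must be supplied as a string, because a numeric literal would already be rounded to double precision before bit64 sees it):

```r
library(bit64)  # provides the 64-bit signed integer64 class

# Pass the value as a string: a numeric literal this large would be
# rounded by R's double-precision parser before bit64 ever saw it.
x <- as.integer64("123456789123456789")
x - (x - 1L)
# integer64
# [1] 1
```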

Related

How to calculate plus two large number correctly in R

I have a stupid question.
I have a problem with a very simple addition in R.
When I try the following, it shows a strange result.
Could you let me know why?
> format(14429588100216510 + 8918402888002604, scientific = F)
[1] "23347990988219112"
It is a big integer (more than 4 bytes, and beyond the 2^53 limit up to which doubles represent integers exactly). We could use a package that supports such integers:
library(gmp)
add.bigz(14429588100216510, 8918402888002604)
#Big Integer ('bigz') :
#[1] 23347990988219114
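One caveat worth adding (not part of the original answer): the numeric literals above pass through R's doubles before gmp sees them, and the first operand exceeds 2^53, so it survives exactly only because it happens to be even. It is safer to hand gmp the digits as strings:

```r
library(gmp)

# Supplying the operands as character strings avoids any
# double-precision rounding before gmp parses them.
as.bigz("14429588100216510") + as.bigz("8918402888002604")
# Big Integer ('bigz') :
# [1] 23347990988219114
```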

R addition of large and small numbers is ignoring the smaller values

I'm encountering a problem when adding larger numbers in R. The smaller values are getting ignored and it's producing an incorrect result.
For example, I've been using a binary to decimal converter found here: Convert binary string to binary or decimal value. The penultimate step looks like this:
2^(which(rev(unlist(strsplit(as.character(MyData$Index[1]), "")) == 1))-1)
[1] 1 2 32 64 256 2048 ...
I didn't include all the numbers for brevity, but when these values are summed they yield the integer value of the binary number. The correct result should be 4,919,768,674,277,575,011, but R gives me 4,919,768,674,277,574,656. Notice that this is off by 355, which is the sum of the first five listed numbers.
I thought it might have to do with an integer limit, but I tested it and R can display numbers larger than what I need. Here's an example of something I tried, which again yields an incorrect result:
2^64
[1] 18446744073709551616 #Correct Value
2^65
[1] 36893488147419103232 #Correct Value
2^64 + 2^65
[1] 55340232221128654858 #Correct Value
2^64 + 2^65 + 1
[1] 55340232221128654858 #Incorrect Value
It seems like there's some sort of problem with precision of large number addition, but I don't know how I can fix this so that I can get the desired result.
Any help would be greatly appreciated. And I apologize if anything is formatted improperly, this is my first post on the site.
For large integers, we could use as.bigz from gmp
library(gmp)
as.bigz(2^64) + as.bigz(2^65) + 1
# Big Integer ('bigz') :
#[1] 55340232221128654849
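The cutoff at work here is 2^53, the largest value up to which a double can represent every integer exactly. A quick sketch of the boundary, plus the exact gmp alternative with string input (my addition):

```r
# Above 2^53, consecutive doubles are more than 1 apart,
# so adding 1 can be a no-op.
2^53 == 2^53 + 1
# [1] TRUE
(2^53 - 1) == 2^53   # below the cutoff, still distinct
# [1] FALSE

# gmp keeps every digit when the operand arrives as a string.
library(gmp)
as.bigz("55340232221128654848") + 1
# Big Integer ('bigz') :
# [1] 55340232221128654849
```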

Do not want large numbers to be rounded off in R

options(scipen=999)
625075741017804800
625075741017804806
When I type the above into the R console, I get the same output for both numbers: 625075741017804800.
How do I avoid that?
Numbers greater than 2^53 cannot be stored unambiguously in R's numeric (double) vectors. A double's 53-bit mantissa represents integers exactly only up to 2^53, and your number is larger than that threshold:
625075741017804806 > 2^53
[1] TRUE
Integers proper can only be stored up to `.Machine$integer.max` == 2147483647. Numbers larger than that value get silently coerced to 'numeric' class. You will either need to work with them as character values or install a package that is capable of arbitrary precision. Rmpfr and gmp are two that come to mind.
You can use package Rmpfr for arbitrary precision
dig <- mpfr("625075741017804806")
print(dig, 18)
# 1 'mpfr' number of precision 60 bits
# [1] 6.25075741017804806e17
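To see the original symptom and its fix side by side (a sketch I am adding, using gmp rather than Rmpfr): as plain doubles the two literals collapse to the same value, while as big integers built from strings they stay distinct.

```r
# As doubles, both literals round to the same representable value...
625075741017804800 == 625075741017804806
# [1] TRUE

# ...but as exact big integers they remain different.
library(gmp)
as.bigz("625075741017804800") == as.bigz("625075741017804806")
# [1] FALSE
```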

Why does R regard large numbers as EVEN

From "http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-the" 7.31
We already know that large numbers (over 2^53) can produce errors in modular arithmetic.
However, I cannot understand why every large number is regarded as even. I have never seen an odd result for an integer over 2^53, even allowing for some approximation error:
(2^53+1)%%2
(2^100-1)%%2
(The warning "probable complete loss of accuracy in modulus" can be ignored.)
These, and others like them, all return 0, not 1.
Why is that? (I know there is some approximation involved, but I need to know the reason concretely.)
> print(2^54,22)
[1] 18014398509481984.00000
> print(2^54+1,22)
[1] 18014398509481984.00000
> print(2^54+2,22)
[1] 18014398509481984.00000
> print(2^54+3,22)
[1] 18014398509481988.00000
An IEEE double precision value has a 53-bit mantissa. Any number requiring more than 53 binary digits of precision will be rounded, i.e. the digits from 54 onwards will be implicitly set to zero. Thus any number with magnitude greater than 2^53 will necessarily be even (since the least-significant bit of its integer representation is beyond the floating-point precision, and is therefore zero).
There is no such thing as an "integer" in versions of R at or before v2.15.3 whose magnitude is greater than 2^31-1. You are working with "numeric" ("double") values, which are being silently rounded.
?`%%`
The soon-to-be but as yet unreleased version 3.0 of R is slated to have 8-byte integers, after which this problem would not arise until you go beyond 2^((8*8)-1)-1. At the moment, coercion to integer fails at that level:
> as.integer(2^((8*4)-1)-1)
[1] 2147483647
> as.integer(2^((8*8)-1)-1)
[1] NA
Warning message:
NAs introduced by coercion
So your first example may return the proper result, but your second example may still fail.
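If the parity of huge values is what actually matters, gmp computes the modulus exactly. A sketch (my addition): the power is built inside bigz arithmetic, so no double-precision rounding ever occurs.

```r
library(gmp)

# Build the large powers as exact big integers, then take the modulus.
(as.bigz(2)^53 + 1) %% 2
# Big Integer ('bigz') :
# [1] 1
(as.bigz(2)^100 - 1) %% 2
# Big Integer ('bigz') :
# [1] 1
```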

Weird error in R when importing (64-bit) integer with many digits

I am importing a csv that has a single column which contains very long integers (for example: 2121020101132507598)
a<-read.csv('temp.csv',as.is=T)
When I import these integers as strings they come through correctly, but when imported as integers the last few digits are changed. I have no idea what is going on...
   read as character       read as numeric
1 "4031320121153001444" 4031320121153001472
2 "4113020071082679601" 4113020071082679808
3 "4073020091116779570" 4073020091116779520
4 "2081720101128577687" 2081720101128577792
5 "4041720081087539887" 4041720081087539712
6 "4011120071074301496" 4011120071074301440
7 "4021520051054304372" 4021520051054304256
8 "4082520061068996911" 4082520061068997120
9 "4082620101129165548" 4082620101129165312
As others have noted, you can't represent integers that large. But R isn't reading those values into integers, it's reading them into double precision numerics.
Double precision can only represent numbers to ~16 places accurately, which is why you see your numbers rounded after 16 places. See the gmp, Rmpfr, and int64 packages for potential solutions. Though I don't see a function to read from a file in any of them, maybe you could cook something up by looking at their sources.
UPDATE:
Here's how you can get your file into an int64 object:
# This assumes your numbers are the only column in the file
# Read them in however, just ensure they're read in as character
a <- scan("temp.csv", what="")
ia <- as.int64(a)
R's maximum integer value is about 2E9. As @Joshua mentions in another answer, one potential solution is the int64 package.
Import the values as character instead. Then convert to type int64.
require(int64)
a <- read.csv('temp.csv', colClasses = 'character', header=FALSE)[[1]]
a <- as.int64(a)
print(a)
[1] 4031320121153001444 4113020071082679601 4073020091116779570
[4] 2081720101128577687 4041720081087539887 4011120071074301496
[7] 4021520051054304372 4082520061068996911 4082620101129165548
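As a side note, the int64 package has since been archived on CRAN; the same read-as-character-then-convert pattern works with the bit64 package. A sketch (my addition), writing a small stand-in for temp.csv first:

```r
library(bit64)

# Hypothetical one-column file, as described in the question.
writeLines(c("4031320121153001444", "4113020071082679601"), "temp.csv")

# Read the column as character so no digits are lost on the way in,
# then convert to 64-bit integers.
a   <- read.csv("temp.csv", colClasses = "character", header = FALSE)[[1]]
a64 <- as.integer64(a)
a64
# integer64
# [1] 4031320121153001444 4113020071082679601
```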
You simply cannot represent integers that big. See
.Machine
which on my box has
$integer.max
[1] 2147483647
The maximum value of a 32-bit signed integer is 2,147,483,647. Your numbers are much larger.
Try importing them as floating point values instead.
There are a few caveats to be aware of when dealing with floating-point arithmetic in R or any other language:
http://blog.revolutionanalytics.com/2009/11/floatingpoint-errors-explained.html
http://blog.revolutionanalytics.com/2009/03/when-is-a-zero-not-a-zero.html
http://floating-point-gui.de/basic/