R addition of large and small numbers is ignoring the smaller values

I'm encountering a problem when adding larger numbers in R. The smaller values are getting ignored and it's producing an incorrect result.
For example, I've been using a binary to decimal converter found here: Convert binary string to binary or decimal value. The penultimate step looks like this:
2^(which(rev(unlist(strsplit(as.character(MyData$Index[1]), "")) == 1))-1)
[1] 1 2 32 64 256 2048 ...
I didn't include all the numbers for brevity, but when these numbers are summed, they will yield the integer value of the binary number. The correct result should be 4,919,768,674,277,575,011, but R is giving me a result of 4,919,768,674,277,574,656. Notice that this number is off by 355, which is the sum of the first 5 listed numbers.
I had thought it might have to do with an integer limit, but I tested it and R can handle larger numbers than what I need. Here's an example of something I tried, which again yielded an incorrect result:
2^64
[1] 18446744073709551616 #Correct Value
2^65
[1] 36893488147419103232 #Correct Value
2^64 + 2^65
[1] 55340232221128654848 #Correct Value
2^64 + 2^65 + 1
[1] 55340232221128654848 #Incorrect Value
It seems like there's some sort of problem with precision of large number addition, but I don't know how I can fix this so that I can get the desired result.
Any help would be greatly appreciated. And I apologize if anything is formatted improperly, this is my first post on the site.

For large integers, we could use as.bigz from gmp
library(gmp)
as.bigz(2^64) + as.bigz(2^65) + 1
# Big Integer ('bigz') :
#[1] 55340232221128654849
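The same idea applies to the binary conversion in the question: raise an exact bigz 2 to each set-bit position so that no power of two gets rounded away during the sum. A minimal sketch, with a short literal string standing in for as.character(MyData$Index[1]):

```r
library(gmp)

bin  <- "1101"                            # stand-in for as.character(MyData$Index[1])
bits <- rev(unlist(strsplit(bin, "")))
# Exact powers of two summed as big integers, mirroring the question's code
sum(as.bigz(2)^(which(bits == "1") - 1))  # 1 + 4 + 8 = 13
```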

Large exponent values in R with 'Rmpfr'

I am trying to calculate the value of exp(1000) or more in R. I looked at the solution provided in Calculate the exponentials of big negative value and followed it to replicate the code for negative values first, but I have the following output.
a <- mpfr(exp(-1000),precBits = 64)
a
1 'mpfr' number of precision 64 bits
[1] 0
I do not understand why my output would be different from the provided solution. I understand this particular solution is for negative values and that I am looking for positive exponents. Regardless, it should work both ways.
You need to convert to extended precision before you exponentiate, otherwise R will try to compute exp(-1000) with its usual precision and underflow will occur (as you found out).
> a <- exp(mpfr(-1000, precBits = 64))
> a
1 'mpfr' number of precision 64 bits
[1] 5.07595889754945676548e-435
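The positive exponent the question actually asks about works the same way: convert the argument first, then exponentiate. A sketch at the question's 64-bit precision:

```r
library(Rmpfr)

b <- exp(mpfr(1000, precBits = 64))  # argument held exactly, exp at extended precision
b  # ~1.97e434 (the reciprocal of the value printed above)
```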

How to make R display 19 digit number as it is stored?

I have a dataset with a key column which is basically a 19 digit integer.
I'm using tibbles so I use options(pillar.sigfig = 22) to display larger numbers and not scientific notation.
The problem is, the number stored in the column and the one that is displayed are slightly different; to be specific, the last 3 digits differ.
E.g.
options(pillar.sigfig = 22)
x <- 1099324498500011011
But when I try to return the number I get 1099324498500011008.
I'm not sure why R would change the last 3 digits and since it is a key, it makes my data unusable for analysis.
I have tried the usual options(scipen = 999) for suppressing scientific notation but it does not seem to work on tibbles.
How do I get the same 19 digit number as I intend to store it?
Sorry to be the bearer of bad news, but R only has
a numeric type (double) using 64 bits and approximately sixteen decimals precision
an integer type (int) using 32 bits
There is nothing else. You may force the print function to show you nineteen digits but that just means ... you are looking at three digits of randomness.
19 digits for (countable) items are common, and often provided by (signed or unsigned) int64_t types, which R does not have natively but approximates via the integer64 type in the bit64 package.
So the following may be your only workaround:
> suppressMessages(library(bit64))
> x <- as.integer64("123456790123456789")
> x
integer64
[1] 123456790123456789
> x - 1
integer64
[1] 123456790123456788
>
The good news is that integer64 is reasonably well supported by data.table and a number of other packages.
PS It really is 19 digits where it bites:
> as.integer64(1.2e18) + 1
integer64
[1] 1200000000000000001
> as.integer64(1.2e19) + 1
integer64
[1] <NA>
Warning message:
In as.integer64.double(1.2e+19) : NAs produced by integer64 overflow
>
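One practical point worth checking in your own code: the key must be parsed from a character string, because a 19-digit numeric literal has already been rounded before as.integer64 ever sees it. A sketch using the question's key:

```r
library(bit64)

x <- as.integer64("1099324498500011011")  # exact: parsed from character
y <- as.integer64(1099324498500011011)    # already rounded: went through double first
x == y  # FALSE -- the literal lost its last digits before conversion
```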

Do not want large numbers to be rounded off in R

options(scipen=999)
625075741017804800
625075741017804806
When I type the above in the R console, I get the same output for the two numbers listed above. The output being: 625075741017804800
How do I avoid that?
Numbers greater than 2^53 are not going to be unambiguously stored in R's numeric-classed vectors. There was a recent change to allow larger integer-valued numbers to be stored in the numeric type, but your number is larger than that increased capacity for precision:
625075741017804806 > 2^53
[1] TRUE
Prior to that change integers could only be stored up to .Machine$integer.max == 2147483647. Numbers larger than that value get silently coerced to 'numeric' class. You will either need to work with them as character values or install a package that is capable of arbitrary precision. Rmpfr and gmp are two that come to mind.
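The 2^53 cutoff is easy to demonstrate directly in base R:

```r
# Every integer up to 2^53 is exactly representable in a double;
# above that, consecutive integers start to collide
2^53 == 2^53 + 1  # TRUE: the + 1 is absorbed
2^53 - 1 == 2^53  # FALSE: still exact just below the cutoff
```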
You can use package Rmpfr for arbitrary precision
dig <- mpfr("625075741017804806")
print(dig, 18)
# 1 'mpfr' number of precision 60 bits
# [1] 6.25075741017804806e17

How do you cast a double to an integer in R?

My question is: suppose you have an algorithm that computes the number of iterations, and you would like to print that number out. But the output always has many decimal places, like the following:
64.00000000
Is it possible to get an integer by type casting in R? How would you do it?
There are some gotchas in coercing to integer mode. Presumably you have a variety of numbers in some structure. If you are working with a matrix, the print routine will display all the numbers at the same precision. However, you can change that level. If you have calculated this result with an arithmetic process, it may actually be slightly less than 64 but display as that value.
> 64.00000000-.00000099999
[1] 64
> 64.00000000-.0000099999
[1] 63.99999
So assuming you want all the values in whatever structure this is part of, to be displayed as integers, the safest would be:
round(64.000000, 0)
... since this could happen, otherwise.
> as.integer(64.00000000-.00000000009)
[1] 63
The other gotcha is that the range of value for integers is considerably less than the range of floating point numbers.
The function is.integer can be used to test for integer mode.
is.integer(3)
[1] FALSE
is.integer(3L)
[1] TRUE
Neither round nor trunc will return a vector in integer mode:
is.integer(trunc(3.4))
[1] FALSE
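A sketch combining the two points above: round first to absorb floating-point fuzz, then coerce to get a result that is genuinely integer mode:

```r
x <- 64 - 9e-11       # "almost" 64, as in the subtraction examples above
as.integer(x)         # 63: plain truncation bites
as.integer(round(x))  # 64, and is.integer() of the result is TRUE
```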
Instead of trying to convert the output into an integer, find out why it is not an integer in the first place, and fix it there.
Did you initialize it as an integer, e.g. num.iterations <- 0L or num.iterations <- integer(1) or did you make the mistake of setting it to 0 (a numeric)?
When you incremented it, did you add 1 (a numeric) or 1L (an integer)?
If you are not sure, go through your code and check your variable's type using the class function.
Fixing the problem at the root could save you a lot of trouble down the line. It could also make your code more efficient as numerous operations are faster on integers than numerics (an example).
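A minimal sketch of the integer-from-the-start approach described above:

```r
num.iterations <- 0L                      # initialize as an integer, not 0
repeat {
  num.iterations <- num.iterations + 1L   # increment with an integer literal
  if (num.iterations >= 64L) break
}
num.iterations              # 64 -- prints without decimal places
is.integer(num.iterations)  # TRUE
```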
The function as.integer() truncates toward zero, so for positive values you must add 0.5 to round properly:
dd<-64.00000000
as.integer(dd+0.5)
If you have a numeric matrix you wish to coerce to an integer matrix (e.g., you are creating a set of dummy variables from a factor), as.integer(matrix_object) will coerce the matrix to a vector, which is not what you want. Instead, you can use storage.mode(matrix_object) <- "integer" to maintain the matrix form.
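A sketch of the matrix case, keeping the dimensions that as.integer() would drop:

```r
m <- matrix(c(1, 0, 0, 1), nrow = 2)  # numeric (double) matrix
storage.mode(m) <- "integer"          # coerce in place, shape preserved
is.integer(m)  # TRUE
dim(m)         # 2 2 -- still a matrix
```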

Weird error in R when importing (64-bit) integer with many digits

I am importing a csv that has a single column which contains very long integers (for example: 2121020101132507598)
a<-read.csv('temp.csv',as.is=T)
When I import these integers as strings they come through correctly, but when imported as integers the last few digits are changed. I have no idea what is going on...
1 "4031320121153001444" 4031320121153001472
2 "4113020071082679601" 4113020071082679808
3 "4073020091116779570" 4073020091116779520
4 "2081720101128577687" 2081720101128577792
5 "4041720081087539887" 4041720081087539712
6 "4011120071074301496" 4011120071074301440
7 "4021520051054304372" 4021520051054304256
8 "4082520061068996911" 4082520061068997120
9 "4082620101129165548" 4082620101129165312
As others have noted, you can't represent integers that large. But R isn't reading those values into integers, it's reading them into double precision numerics.
Double precision can only represent numbers to ~16 places accurately, which is why you see your numbers rounded after 16 places. See the gmp, Rmpfr, and int64 packages for potential solutions. Though I don't see a function to read from a file in any of them, maybe you could cook something up by looking at their sources.
UPDATE:
Here's how you can get your file into an int64 object:
# This assumes your numbers are the only column in the file
# Read them in however, just ensure they're read in as character
a <- scan("temp.csv", what="")
ia <- as.int64(a)
R's maximum integer value is about 2E9. As @Joshua mentions in another answer, one of the potential solutions is the int64 package.
Import the values as character instead. Then convert to type int64.
require(int64)
a <- read.csv('temp.csv', colClasses = 'character', header=FALSE)[[1]]
a <- as.int64(a)
print(a)
[1] 4031320121153001444 4113020071082679601 4073020091116779570
[4] 2081720101128577687 4041720081087539887 4011120071074301496
[7] 4021520051054304372 4082520061068996911 4082620101129165548
You simply cannot represent integers that big. See
.Machine
which on my box has
$integer.max
[1] 2147483647
The maximum value of a 32-bit signed integer is 2,147,483,647. Your numbers are much larger.
Try importing them as floating point values instead.
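If floating point is not acceptable (and for ID-like keys it usually isn't), reading the column as character preserves every digit with no extra packages. A sketch using an inline file via the text argument:

```r
# colClasses = "character" stops read.csv from routing the values through double
a <- read.csv(text = "2121020101132507598", header = FALSE,
              colClasses = "character")[[1]]
a  # "2121020101132507598" -- all 19 digits intact
```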
There are a few caveats to be aware of when dealing with floating point arithmetic in R or any other language:
http://blog.revolutionanalytics.com/2009/11/floatingpoint-errors-explained.html
http://blog.revolutionanalytics.com/2009/03/when-is-a-zero-not-a-zero.html
http://floating-point-gui.de/basic/
