How to make R display a 19-digit number as it is stored?

I have a dataset with a key column which is basically a 19-digit integer.
I'm using tibbles, so I set options(pillar.sigfig = 22) to display larger numbers instead of scientific notation.
The problem is that the number stored in the column and the one displayed are slightly different; specifically, the last 3 digits differ.
E.g.
options(pillar.sigfig = 22)
x <- 1099324498500011011
But when I try to print the number I get 1099324498500011008.
I'm not sure why R would change the last 3 digits, and since this is a key column, it makes my data unusable for analysis.
I have tried the usual options(scipen = 999) for suppressing scientific notation, but it does not seem to work on tibbles.
How do I store and display the same 19-digit number I intended?
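A minimal base-R check (no tibbles involved) suggests the rounding happens when the literal is parsed as a double, since the value exceeds the exact integer range of 64-bit floating point:
x <- 1099324498500011011
sprintf("%.0f", x)
[1] "1099324498500011008"
x > 2^53
[1] TRUE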

Sorry to be the bearer of bad news, but R only has
a numeric type (double) using 64 bits and approximately sixteen decimal digits of precision
an integer type (int) using 32 bits
There is nothing else. You may force the print function to show you nineteen digits, but that just means ... you are looking at three digits of randomness (a double's 53-bit mantissa gives 53 * log10(2), or roughly 15.95, reliable decimal digits).
19 digits for (countable) items are common, and often provided by (signed or unsigned) int64_t types. R does not have these natively but approximates them via the integer64 type in the bit64 package.
So the following may be your only workaround:
> suppressMessages(library(bit64))
> x <- as.integer64("123456790123456789")
> x
integer64
[1] 123456790123456789
> x - 1
integer64
[1] 123456790123456788
>
The good news is that integer64 is reasonably well supported by data.table and a number of other packages.
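If the key arrives from a file, one option is to let data.table read it as integer64 directly (a sketch; the file name mydata.csv and column name key are placeholders for your data):
suppressMessages(library(bit64))
library(data.table)
dt <- fread("mydata.csv", integer64 = "integer64")
class(dt$key)   # "integer64" -- all 19 digits preserved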
PS It really is 19 digits where it bites:
> as.integer64(1.2e18) + 1
integer64
[1] 1200000000000000001
> as.integer64(1.2e19) + 1
integer64
[1] <NA>
Warning message:
In as.integer64.double(1.2e+19) : NAs produced by integer64 overflow
>
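The exact boundary is the int64 range, which bit64 can report itself (the lower bound is one above -2^63 because that bit pattern is reserved for NA):
> lim.integer64()
integer64
[1] -9223372036854775807  9223372036854775807
>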

Related

Why doesn't R show digits after the decimal point in numeric data?

I have a data frame read from a file into R. The data contains some numeric columns, some with digits after the decimal point and some without. The columns whose values have no digits after the decimal point have the following issue:
If I do some manipulation on such a column that results in values with digits after the decimal point, and the final data has numbers such as 22.5, R does not show the value in this format; it only shows 22. But if I check it in an if condition, it confirms the value is actually 22.5. This does not happen when the original data contains some decimal points.
Could anyone let me know how to resolve this issue?
This is a FAQ. Presentation may differ from content, as display is generally optimised for meaningful output. A really simple example follows:
> df <- data.frame(a=c(10000.12, 10000.13), b=c(42L, 43L))
> df
a b
1 10000.1 42
2 10000.1 43
> all.equal(df[1,"a"], 10000.12)
[1] TRUE
>
So the last digit did not "disappear", as the test confirms; it is simply beyond the (six, in this default) digits displayed.
Similarly, you can always explicitly display more decimals than the compact default display does:
> cat(sprintf("%14.8f", df[1,"a"]), "\n")
10000.12000000
>
Edit You can also increase the default display size by one or more digits:
options(digits=7) is the minimal change here, though not all columns need seven digits:
> options(digits=7)
> df
a b
1 10000.12 42
2 10000.13 43
>
Needless to say, if the value had three decimals such as .123, only the first two would be shown, etc.
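You can also pass digits to a single print call, leaving the global option untouched (a small sketch using the df from above):
> print(df, digits = 8)
         a  b
1 10000.12 42
2 10000.13 43
>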

Can as.numeric(as.character(x)), where x is originally a numeric, ever change x?

I am wondering if converting numerics to characters and then back again in R can ever change the number? For example, does as.character round off numerics after a certain amount of decimal places (if so, how many)?
@jogo thanks for the suggestion :)
Here is the comment as an answer:
From ?as.character():
as.character represents real and complex numbers to 15 significant
digits (technically the compiler's setting of the ISO C constant
DBL_DIG, which will be 15 on machines supporting IEC60559 arithmetic
according to the C99 standard). This ensures that all the digits in
the result will be reliable (and not the result of representation
error), but does mean that conversion to character and back to numeric
may change the number. If you want to convert numbers to character
with the maximum possible precision, use format.
So yes, it does change the number if you have more than 15 significant digits. See:
> as.character(1.000000000000001) # more than 15 significant digits
[1] "1"
> as.character(1.00000000000001) # exactly 15 significant digits
[1] "1.00000000000001"
Here are some other examples:
y <- as.numeric(as.character(pi))
identical(y, pi) ### gives FALSE
or
x <- 1/7
y <- as.numeric(as.character(x))
x-y
or
as.numeric(as.character(.Machine$double.xmax)) ## result: Inf
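As the documentation suggests, format avoids the problem; 17 significant digits are enough to round-trip any double:
y <- as.numeric(format(pi, digits = 17))
identical(y, pi) ### gives TRUE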

R addition of large and small numbers is ignoring the smaller values

I'm encountering a problem when adding larger numbers in R. The smaller values are getting ignored and it's producing an incorrect result.
For example, I've been using a binary to decimal converter found here: Convert binary string to binary or decimal value. The penultimate step looks like this:
2^(which(rev(unlist(strsplit(as.character(MyData$Index[1]), "")) == 1))-1)
[1] 1 2 32 64 256 2048 ...
I didn't include all the numbers for brevity, but when they are summed they should yield the integer value of the binary number. The correct result should be 4,919,768,674,277,575,011, but R is giving me a result of 4,919,768,674,277,574,656. Notice that this number is off by 355, which is the sum of the first 5 listed numbers.
I had thought it might have to do with an integer limit, but I tested it and R can handle larger numbers than what I need. Here's an example of something I tried, which again yielded an incorrect result:
2^64
[1] 18446744073709551616 #Correct Value
2^65
[1] 36893488147419103232 #Correct Value
2^64 + 2^65
[1] 55340232221128654848 #Correct Value
2^64 + 2^65 + 1
[1] 55340232221128654848 #Incorrect Value
It seems like there's some sort of problem with precision of large number addition, but I don't know how I can fix this so that I can get the desired result.
Any help would be greatly appreciated. And I apologize if anything is formatted improperly, this is my first post on the site.
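The threshold is 2^53: a double has a 53-bit mantissa, so above that point not every integer is representable and adding 1 can be a no-op. A quick illustration:
(2^53 + 1) == 2^53
[1] TRUE  #the +1 is lost above 2^53
(2^52 + 1) == 2^52
[1] FALSE #still exact below 2^53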
For large integers, we could use as.bigz from gmp
library(gmp)
as.bigz(2^64) + as.bigz(2^65) + 1
# Big Integer ('bigz') :
#[1] 55340232221128654849
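Note that as.bigz(2^64) only works here because 2^64 happens to be exactly representable as a double; for arbitrary large values it is safer to hand gmp a string, e.g. with the target value from the question:
as.bigz("4919768674277575011")
# Big Integer ('bigz') :
#[1] 4919768674277575011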

Do not want large numbers to be rounded off in R

options(scipen=999)
625075741017804800
625075741017804806
When I type the above into the R console, I get the same output for both numbers: 625075741017804800.
How do I avoid that?
Numbers greater than 2^53 are not going to be unambiguously stored in R's numeric-classed vectors. There was a recent change to allow larger integers to be stored in numerics, but your number is larger than that increased capacity for precision:
625075741017804806 > 2^53
[1] TRUE
Prior to that change, integers could only be stored up to .Machine$integer.max == 2147483647. Numbers larger than that value get silently coerced to the 'numeric' class. You will either need to work with them as character values or install a package capable of arbitrary precision. Rmpfr and gmp are two that come to mind.
You can use package Rmpfr for arbitrary precision
dig <- mpfr("625075741017804806")
print(dig, 18)
# 1 'mpfr' number of precision 60 bits
# [1] 6.25075741017804806e17
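The precision is inferred from the number of digits in the string; you can also request it explicitly via precBits (64 bits comfortably covers 19 decimal digits). A small sketch:
dig <- mpfr("625075741017804806", precBits = 64)
print(dig + 1, 18)
# 1 'mpfr' number of precision 64 bits
# [1] 6.25075741017804807e17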

Issue with character data type length in R & decimal precision

Trying to create a function to get the precision of numeric data (the number of digits to the right of the decimal point):
decimalplaces <- function(x) {
  if (x %% 1 != 0) {
    pattern <- "^([0-9]+)[.]([0-9]+)$"
    dec_part <- gsub(pattern, "\\2", x)
    nchar(dec_part)
  } else {
    return(0)
  }
}
The issue occurs with values with more than 16 digits -- nchar coerces "dec_part" to a string which can only store 16 digits.
Is there a way to overcome this limitation in R?
Are there alternatives to nchar for numeric data?
(R version 3.1.1 64 bit)
The 'problem' is not in nchar but in gsub, which applies as.character to a non-character x. The documentation for as.character says:
as.character represents real and complex numbers to 15 significant
digits (technically the compiler's setting of the ISO C constant
DBL_DIG, which will be 15 on machines supporting IEC60559 arithmetic
according to the C99 standard). This ensures that all the digits in
the result will be reliable (and not the result of representation
error), but does mean that conversion to character and back to numeric
may change the number. If you want to convert numbers to character
with the maximum possible precision, use format.
So, you can use
dec_part <- gsub(pattern,"\\2", format(x,digits=22))
instead of
dec_part <- gsub(pattern,"\\2", x)
in your code, but be careful: the 15-significant-digit limit was set for a good reason, so there is a good chance of finding just noise in the trailing digits. For example,
> format(1/3,digits=22)
[1] "0.3333333333333333148296"
