R: How to convert long number to string to save precision - r

I have a problem converting a long number to a string in R. How can I easily convert a number to a string while preserving precision? I have a simple example below.
a = -8664354335142704128
toString(a)
[1] "-8664354335142704128"
b = -8664354335142703762
toString(b)
[1] "-8664354335142704128"
a == b
[1] TRUE
I expected toString(a) and toString(b) to differ, but I got identical values. I suppose toString() converts the number to float or something like that before converting to string.
Thank you for your help.
Edit:
> -8664354335142704128 == -8664354335142703762
[1] TRUE
> along = bit64::as.integer64(-8664354335142704128)
> blong = bit64::as.integer64(-8664354335142703762)
> along == blong
[1] TRUE
> blong
integer64
[1] -8664354335142704128
I also tried:
> as.character(blong)
[1] "-8664354335142704128"
> sprintf("%f", -8664354335142703762)
[1] "-8664354335142704128.000000"
> sprintf("%f", blong)
[1] "-0.000000"
Edit 2:
My question at first was whether I can convert a long number to a string without loss. Then I realized that in R it is impossible to get the real value of a long number passed into a function, because R automatically reads the value with the loss.
For example, I have the function:
> my_function <- function(long_number){
+ string_number <- toString(long_number)
+ print(string_number)
+ }
If someone uses it and passes a long number, I am not able to find out exactly which number was passed.
> my_function(-8664354335142703762)
[1] "-8664354335142704128"
For example, if I read some numbers from a file, it is easy. But that is not my case. I just need to use something that a user passed.
I am not an R expert, so I was just curious why it works in another language and not in R. For example in Python:
>>> def my_function(long_number):
... string_number = str(long_number)
... print(string_number)
...
>>> my_function(-8664354335142703762)
-8664354335142703762
Now I know the problem is how R reads and stores numbers. Every language can do it differently. I have to change how I pass numbers to the R function, and that solves my problem.
So the correct answer to my question is:
""I suppose toString() converts the number to float", nope, you did it yourself (even if unintentionally)." - Nope, R did it itself; that is how R reads numbers.
So I marked r2evans's answer as the best answer because this user helped me find the right solution. Thank you!
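One way to apply that conclusion is to have the function demand a character argument and reject anything else, so the digits never pass through R's numeric parser. A minimal sketch in base R; the name my_function_chr and the validation regex are illustrative assumptions, not part of the original question:

```r
# Hypothetical variant of my_function: accept the number only as a string,
# so R never parses the digits into a double.
my_function_chr <- function(long_number) {
  if (!is.character(long_number) || length(long_number) != 1L) {
    stop("pass the number as a single string, e.g. \"-8664354335142703762\"")
  }
  # Optional sign followed by digits only (assumed input format).
  if (!grepl("^-?[0-9]+$", long_number)) {
    stop("not an integer literal: ", long_number)
  }
  long_number
}

print(my_function_chr("-8664354335142703762"))
# [1] "-8664354335142703762"
```

Passing a bare number now fails loudly instead of silently returning rounded digits.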

Bottom line up front, you must (in this case) read in your large numbers as string before converting to 64-bit integers:
bit64::as.integer64("-8664354335142704128") == bit64::as.integer64("-8664354335142703762")
# [1] FALSE
Some points about what you've tried:
"I suppose toString() converts the number to float", nope, you did it yourself (even if unintentionally). In R, when creating a number, 5 is a float and 5L is an integer. Even if you had tried to create it as an integer, it would have complained and lost precision anyway:
class(5)
# [1] "numeric"
class(5L)
# [1] "integer"
class(-8664354335142703762)
# [1] "numeric"
class(-8664354335142703762L)
# Warning: non-integer value 8664354335142703762L qualified with L; using numeric value
# [1] "numeric"
More appropriately, when you type it in as a number and then try to convert it, R processes the inside of the parentheses first. That is, with
bit64::as.integer64(-8664354335142704128)
R first has to parse and "understand" everything inside the parentheses before it can be passed to the function. (This is typically a compiler/language-parsing thing, not just an R thing.) In this case, it sees that it appears to be a (large) negative float, so it creates a class numeric (float). Only then does it send this numeric to the function, but by this point the precision has already been lost. Ergo the otherwise-illogical
bit64::as.integer64(-8664354335142704128) == bit64::as.integer64(-8664354335142703762)
# [1] TRUE
In this case, it just happens that the 64-bit version of that number is equal to what you intended.
bit64::as.integer64(-8664254335142704128) # ends in 4128
# integer64
# [1] -8664254335142704128 # ends in 4128, yay! (coincidence?)
If you subtract one, it results in the same effective integer64:
bit64::as.integer64(-8664354335142704127) # ends in 4127
# integer64
# [1] -8664354335142704128 # ends in 4128 ?
This continues for quite a while, until it finally shifts to the next rounding point
bit64::as.integer64(-8664254335142703617)
# integer64
# [1] -8664254335142704128
bit64::as.integer64(-8664254335142703616)
# integer64
# [1] -8664254335142703104
It is no coincidence that the difference is 1024, or 2^10. A double has a 53-bit significand, so for magnitudes between 2^62 and 2^63 adjacent representable values are 2^10 apart, and every number you type in that range is rounded to the nearest of them.
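The size of that gap can be computed directly from the width of a double's significand. A small base R sketch (the variable names are mine):

```r
# Number of significand bits in a double (53 on IEEE 754 platforms).
bits <- .Machine$double.digits

x <- -8664354335142704128
# Gap between adjacent doubles at the magnitude of x:
# 2^(binary exponent of x), scaled down by the (bits - 1) fraction bits.
spacing <- 2^(floor(log2(abs(x))) - (bits - 1))
spacing
# [1] 1024

# Anything much smaller than the gap is absorbed when added to x.
x + 1 == x      # TRUE
x + 1024 == x   # FALSE
```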
Fortunately, bit64::as.integer64 has several S3 methods, useful for converting different formats/classes to an integer64:
library(bit64)
methods(as.integer64)
# [1] as.integer64.character as.integer64.double as.integer64.factor
# [4] as.integer64.integer as.integer64.integer64 as.integer64.logical
# [7] as.integer64.NULL
So, bit64::as.integer64.character can be useful, since precision is not lost when you type it or read it in as a string:
bit64::as.integer64("-8664354335142704128")
# integer64
# [1] -8664354335142704128
bit64::as.integer64("-8664354335142704128") == bit64::as.integer64("-8664354335142703762")
# [1] FALSE
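Going back from integer64 to a string is also lossless, which covers the original question as long as the value entered R as a string. A short sketch, assuming the bit64 package is installed:

```r
library(bit64)

a <- as.integer64("-8664354335142704128")
b <- as.integer64("-8664354335142703762")

# The string round trip preserves every digit.
as.character(b)
# [1] "-8664354335142703762"

# Arithmetic stays exact as well: the two values differ by 366.
as.character(b - a)
# [1] "366"
```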
FYI, your number is already near the 64-bit boundary:
-.Machine$integer.max
# [1] -2147483647
-(2^31-1)
# [1] -2147483647
log(8664354335142704128, 2)
# [1] 62.9098
-2^63 # the approximate +/- range of 64-bit integers
# [1] -9.223372e+18
-8664354335142704128
# [1] -8.664354e+18

Related

How to make R display 19 digit number as it is stored?

I have a dataset with a key column which is basically a 19 digit integer.
I'm using tibbles so I use options(pillar.sigfig = 22) to display larger numbers and not scientific notation.
Problem is, I notice that the number stored in the column and the one that is displayed are slightly different, to be specific last 3 digits are different.
E.g
options(pillar.sigfig = 22)
x <- 1099324498500011011
But when I try to return the number I get 1099324498500011008.
I'm not sure why R would change the last 3 digits and since it is a key, it makes my data unusable for analysis.
I have tried the usual options(scipen = 999) for suppressing scientific notation but it does not seem to work on tibbles.
How do I get the same 19 digit number as I intend to store it?
Sorry to be bearer of bad news but R only has
a numeric type (double) using 64 bits and approximately sixteen decimals precision
an integer type (int) using 32 bits
There is nothing else. You may force the print function to show you nineteen digits but that just means ... you are looking at three digits of randomness.
19 digits for (countable) items are common, and often provided by (signed or unsigned) int64_t types. Which R does not have natively but approximates via the integer64 call in the bit64 package.
So the following may be your only workaround:
> suppressMessages(library(bit64))
> x <- as.integer64("123456790123456789")
> x
integer64
[1] 123456790123456789
> x - 1
integer64
[1] 123456790123456788
>
The good news is that integer64 is reasonably well supported by data.table and a number of other packages.
PS It really is 19 digits where it bites:
> as.integer64(1.2e18) + 1
integer64
[1] 1200000000000000001
> as.integer64(1.2e19) + 1
integer64
[1] <NA>
Warning message:
In as.integer64.double(1.2e+19) : NAs produced by integer64 overflow
>
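If the key is only an identifier and never used in arithmetic, another option is to keep it as character from the moment it is read. A base R sketch; the inline CSV text and column names are made up for illustration:

```r
# Read every column as character so the 19-digit key is never parsed as a double.
csv_text <- "key,value\n1099324498500011011,a\n1099324498500011012,b"
df <- read.csv(text = csv_text, colClasses = "character")

df$key
# [1] "1099324498500011011" "1099324498500011012"

# Converting to numeric is where the last digits get lost.
sprintf("%.0f", as.numeric(df$key[1])) == df$key[1]
# [1] FALSE
```

Character keys still work for joins and lookups; they only rule out arithmetic on the key itself.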

How to test if an object is a vector in R

I want to test if an object is a vector in R. I'm confused as to why
is.vector(c(0.1))
returns TRUE and so does
is.vector(0.1)
I would like it to return false when it is just a number and true when it is a vector. Can anyone offer any help on this please?
Many thanks in advance.
In R, a single number or string does not exist on its own. Each is a vector of length 1, or is embedded in some more complex structure.
is.vector(c(0.1)) and is.vector(0.1) are absolutely identical in R.
That is also why length("this is a string/character") returns 1: length() in this case measures the number of elements in the vector.
You can see it if you type "this is a string/character" into the R console:
It returns [1] "this is a string/character" - the [1] indicates a vector of length 1.
So you have to use nchar("this is a string/character") to get the length of the first element, the character string, which returns 26.
nchar(c("this is a string/character", "and this another string"))
## [1] 26 23
## nchar is vectorized as you see ...
This is an important difference from Python, where strings and numbers can stand alone.
So len("this") returns 4 in Python, whereas len(["this"]) returns 1 (one element in the list, thus the length of the list is 1).
As already mentioned by @RHertel, R considers c(0.1) a vector of length 1. You may want to test for length as well. E.g.
> x <- 1
> y <- 1:2
> is.vector(x) & length(x) > 1
[1] FALSE
> is.vector(y) & length(y) > 1
[1] TRUE
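That check can be wrapped in a small helper. The name is_multi_vector is hypothetical; it uses && rather than & because both sides are single logical values:

```r
# Hypothetical helper: TRUE only for vectors with more than one element.
is_multi_vector <- function(x) {
  is.vector(x) && length(x) > 1
}

is_multi_vector(0.1)          # FALSE
is_multi_vector(c(0.1))       # FALSE
is_multi_vector(c(0.1, 0.2))  # TRUE
```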

Why can't R handle inequalities between negative numbers in quotes

This is a weird problem, with an easy workaround, but I'm just so curious why R is behaving this way.
> "-1"<"-2"
[1] TRUE
> -1<"-2"
[1] TRUE
> "-1"< -2
[1] TRUE
> -1< -2
[1] FALSE
> as.numeric("-1")<"-2"
[1] TRUE
> "-1"<as.numeric("-2")
[1] TRUE
> as.numeric("-1")<as.numeric("-2")
[1] FALSE
What is happening? Please, for my own sanity...
A "number in quotes" is not a number at all, it is a string of characters. Those characters happen to be displayed with the same drawing on your screen as the corresponding number, but they are fundamentally not the same object.
The behavior you are seeing is consistent with the following:
A pair of numbers (numeric in R) is compared in the way that you should expect, numerically with the natural ordering. So, -1 < -2 is indeed FALSE.
A pair of strings (character in R) are compared in lexicographic order, meaning roughly that it is compared alphabetically, character by character, from left to right. Since "-1" and "-2" start with the same character, we move to the second, and "2" comes after "1", so "-2" comes after "-1" and therefore "-1" < "-2" is TRUE.
When comparing objects of mismatched types, you have two basic choices: either you give an error, or you convert one of the types to the other and then fall back on the two facts above. R takes the 2nd route, and chooses to convert numeric to character, which explains the result you got above (all your mismatched examples give TRUE).
Note that it makes more sense to convert numeric to character, rather than the other way around, because most character can't be automatically converted to numeric in a meaningful way.
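The coercion rule can be seen directly, and converting both sides with as.numeric() before comparing avoids it. A brief sketch:

```r
# In a mixed comparison the number is turned into a string first.
as.character(-1)
# [1] "-1"

# Lexicographic order compares character by character, so "10" sorts before "9".
"10" < "9"
# [1] TRUE

# Convert explicitly to compare numerically.
as.numeric("-1") < as.numeric("-2")
# [1] FALSE
as.numeric("10") < as.numeric("9")
# [1] FALSE
```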
I've always thought this is because the default behavior is to treat the values in quotes as character, and the values without quotes as double. Without expressly declaring the data types, you get this:
> typeof(-1)
[1] "double"
> typeof("-1")
[1] "character"
> typeof(as.numeric("-1"))
[1] "double"
It's only when the negative numbers are put in quotes that it orders them alphabetically, because they are characters.

Using as.hexmode with R

I have some R code:
writePoint <- page * 2^12 + offset
localCount<-0
instructions <- 0
while(localCount < lengthI$length) {
cat("<instruction address=\"")
cat(as.hexmode(writePoint))
However writePoint is always written as a decimal number. What am I doing wrong?
That's kind of interesting. Here's a bit more compact demonstration and a start toward an explanation:
> cat(as.hexmode(10))
10
> cat(as.hexmode(20))
20
> as.hexmode(20)
[1] "14"
> str(as.hexmode(20))
Class 'hexmode' int 20
So a hexmode number has a print method (seen by typing methods(print) at the console) that coerces it to character when it is printed, but this doesn't change its internal representation as a number, so cat gives you back a decimal number. The help page for cat says:
cat converts numeric/complex elements in the same way as print (and not in the same way as as.character which is used by the S equivalent), so options "digits" and "scipen" are relevant.
I will admit this behavior was not really implied by that text, and I would have thought it meant that cat would give 14 or 0x14. A 0x literal, for comparison, is just another way of typing the number:
> 0x14
[1] 20
Might want to use the as.character coercion to get what you want:
> as.character(as.hexmode(20))
[1] "14"
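Applied to the code in the question, that means turning the value into a hex string before handing it to cat. A sketch with made-up values for page and offset; format() on a hexmode object and sprintf("%x") are alternatives to as.character:

```r
page <- 3       # illustrative values, not from the question
offset <- 20
writePoint <- page * 2^12 + offset   # 12308

# Convert to a hex string first; cat() then prints the string unchanged.
cat(format(as.hexmode(writePoint)), "\n")
# 3014

# sprintf() gives control over padding; %x needs an integer-valued number.
sprintf("%08x", as.integer(writePoint))
# [1] "00003014"
```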

Why does not R round function round big numbers

I need an R function that always returns the same number of digits after the decimal point regardless of how big the argument is. I tried round() but it does not work this way. Here is my example:
Rweb:> round(111234.678912,4) # expect 111234.6789
[1] 111234.7
Rweb:> round(111234.678912/10,4) # expect 11123.4679
[1] 11123.47
Rweb:> round(111234.678912/100,4) # expect 1112.3468
[1] 1112.347
Rweb:> round(111234.678912/1000,4)
[1] 111.2347
Rweb:> round(111234.678912/10000,4)
[1] 11.1235
It does work if the argument is in exponential format but I need to work with numbers in floating format.
It does round the number to the correct number of digits. However, R limits the number of significant digits it displays (7 by default). That is, those digits are there, they just aren't shown.
You can see this like so:
> round(111234.678912,4)
[1] 111234.7
> round(111234.678912,4) - 111234
[1] 0.6789
You can use formatC to display it with any desired number of digits:
> n = round(111234.678912,4)
> formatC(n, format="f")
[1] "111234.6789"
> formatC(n, format="f", digits=2)
[1] "111234.68"
As @mnel helpfully points out, you can also set the number of digits shown (including those to the left of the decimal point) using options:
> options(digits=6)
> round(111234.678912,4)
[1] 111235
> options(digits=10)
> round(111234.678912,4)
[1] 111234.6789
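If the goal is a fixed number of decimals in the output regardless of magnitude, formatting to a string is independent of options(digits). A sketch using sprintf and formatC on the values from the question:

```r
x <- 111234.678912 / 10^(0:2)

# Always four digits after the decimal point, whatever options(digits) is.
sprintf("%.4f", x)
# [1] "111234.6789" "11123.4679"  "1112.3468"

formatC(x, format = "f", digits = 4)
# [1] "111234.6789" "11123.4679"  "1112.3468"
```

Both return character vectors, so they are for display or export, not for further arithmetic.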
For anyone else who, like me, thought the question was going to be about bignums :-), there's this to ponder :-)
Rgames> bfoo<-mpfr("1.234545678909887665453421")
Rgames> bfoo
1 'mpfr' number of precision 84 bits
[1] 1.234545678909887665453421
Rgames> round(bfoo,10)
1 'mpfr' number of precision 84 bits
[1] 1.23454567889999999999999999
Let x be a number with many decimal places:
x<-1111111234.6547389758965789345
You can format the decimal places as you wish.
Suppose we want to keep 8 decimal places of this number:
x<-c(1111111234.6547389758965789345)
y<-formatC(x,digits=8,format="f")
y
[1] "1111111234.65473890"
Here format="f" gives the number in the usual fixed-point notation, say xxx.xxx.
But if you want an integer from this object x, use format="d".
About "bignums", @Carl Witthoft:
Thanks, Carl. ... I did think about bignums when I read it. Are you sure there's a problem with the rounding?
See this:
> mpfr("1.2345456789", prec=84)
1 'mpfr' number of precision 84 bits
[1] 1.23454567889999999999999999
and note that Rmpfr (I'm the maintainer) does stay close to the underlying MPFR library. For round(), I've applied the logic/principle of f(x) returning a result with the same formal precision as x. If you want rounding with decreased formal precision, you can conveniently use roundMpfr():
> roundMpfr(bfoo, 32)
1 'mpfr' number of precision 32 bits
[1] 1.2345456788

Resources