Does R treat numbers as double internally?

Does R treat numbers mainly as double?
The following code suggests that R treats numbers as double. Even if I make a value an integer, it easily becomes double after some calculation. (Code 1)
Also, even if the result looks like an integer, internally it is treated as double. (Code 2)
Is my understanding right?
Code 1:
> typeof(5)
[1] "double"
> typeof( 5 / 1 )
[1] "double"
> typeof( as.integer(c(1,2,3)) )
[1] "integer"
> typeof( as.integer(c(1,2,3)) + 1 )
[1] "double"
> typeof( as.integer(c(1,2,3)) / 1 )
[1] "double"
Code 2:
> 1 + 2
[1] 3
> typeof( 1 + 2)
[1] "double"

R handles numbers in several ways. Integers are stored as 32-bit values, while doubles use the standard 64-bit IEEE 754 representation.
As pointed out by Andrey, there are two different types of numbers in R:
integer literals 1L, 2L, 3L, ..., where 1L is equivalent to as.integer(1)
regular numbers (1, 2, 3.4, any number really)
as well as their complex counterparts.
The type of a literal is well defined:
typeof(1) #double
class(1) #numeric
typeof(1L) #integer
class(1L) #integer
However, in a calculation, if any operand has a type higher than integer in R's coercion hierarchy (logical < integer < double < complex), the result is automatically promoted; in particular, mixing an integer with a double yields a double:
typeof(1L + 1L) #integer
typeof(1L + 1) #double
typeof(1L + TRUE) #integer
typeof(1L * 3) #double
typeof(1L * 3L) #integer
One should note, however, that R's integers are 32-bit and therefore have a limited range (up to 2147483647), unlike the arbitrary-precision integers of Python 3.x. One can get around this limit in most cases with the package bit64, which provides 64-bit integers, or with Rmpfr, which (according to its documentation) provides an interface for arbitrary-precision floating point.
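For instance, here is a minimal sketch of getting past the 32-bit limit with bit64 (assuming the package is installed):
library(bit64)
# 2e9 + 2e9 overflows a 32-bit integer, but fits easily in 64 bits
x <- as.integer64(2000000000)
x + x
# [1] 4000000000
# the 32-bit ceiling, for comparison
.Machine$integer.max
# [1] 2147483647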

To make a number an integer from the start, add L to it:
typeof(1L)
# [1] "integer"
There are dangers in working with 32-bit integers, though:
2e9L
# [1] 2000000000
2e9L + 2e9L
# [1] NA
# Warning message:
# In 2000000000L + 2000000000L : NAs produced by integer overflow
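If 64-bit integers are not needed, a simpler workaround (sketched below) is to do the arithmetic in doubles, which represent this range exactly:
2e9 + 2e9               # double literals: no overflow
# [1] 4e+09
as.numeric(2e9L) + 2e9L # converting one operand promotes the result to double
# [1] 4e+09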

Related

Create a sequence from big power numbers

I'm trying to create a sequence of integers from big numbers and couldn't find a way to succeed. Is there a way to do this?
I tried:
(2^128):(2^128+3000)
which returns: [1] 3.402824e+38
So I tried to use the gmp library:
library(gmp)
as.bigz(2^128):as.bigz(2^128+3000)
and got warning messages:
1: In as.bigz(2^128):as.bigz(2^128 + 3000) :
  numerical expression has 32 elements: only the first used
2: In as.bigz(2^128):as.bigz(2^128 + 3000) :
  numerical expression has 32 elements: only the first used
Add your sequence to your "big number":
library(gmp)
as.bigz(2^128) + 0:3000
Big Integer ('bigz') object of length 3001:
[1] 340282366920938463463374607431768211456 340282366920938463463374607431768211457
[3] 340282366920938463463374607431768211458 340282366920938463463374607431768211459
[5] 340282366920938463463374607431768211460 340282366920938463463374607431768211461
# ...
We can use seq:
library(gmp)
seq(as.bigz(2^128), length.out = 3001)
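A side note: 2^128 is a power of two and thus happens to be exactly representable as a double, so as.bigz(2^128) is exact here. For arbitrary large constants it is safer to hand gmp a string, as in this sketch:
library(gmp)
# construct the big integer from a string to avoid double rounding
as.bigz("340282366920938463463374607431768211456") + 0:3000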

R: number of significant digits leads to unexpected results of an inequality using eval and parse of text

I am working on boolean rules for terminal node assignment in CART-like trees, related to my work (http://web.ccs.miami.edu/~hishwaran/ishwaran.html).
I have noticed problematic behavior in evaluating inequalities of character strings using eval and parse of text. The issue has to do with how R evaluates the internal representation of a number.
Here's an example involving the number pi. I want to check if a vector (which I call x) is less than or equal to pi.
> pi
[1] 3.141593
> rule = paste0("x <= ", pi)
> rule
[1] "x <= 3.14159265358979"
This rule checks whether the object x is less than or equal to pi, where pi is represented to 14 decimal digits. Now I will assign x the values 1, 2, 3, and pi:
> x = c(1,2,3,pi)
Here's what x is, to 15 digits:
> print(x, digits=15)
[1] 1.00000000000000 2.00000000000000 3.00000000000000 3.14159265358979
Now let's evaluate this:
> eval(parse(text = rule))
[1] TRUE TRUE TRUE FALSE
Whooaaaaa, it looks like pi is not less than or equal to pi. Right?
But now if I hard-code x to pi to 14 digits, it works:
> x = c(1,2,3,3.14159265358979)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE
Obviously, in the first case the internal representation of pi carries more digits than the 14-digit string, so when R evaluates the expression, the full-precision pi is greater than its truncated float representation and the comparison returns FALSE. In the second case it compares two identical floats, so the result is TRUE.
However, how can I avoid this happening? I really need the first evaluation to come back TRUE, because I am automating this process for rule-based inference and cannot hard-code a value (here pi) each time.
One solution I use is to add a small tolerance value.
> tol = sqrt(.Machine$double.eps)
> rule = paste0("x <= ", pi + tol)
> x = c(1,2,3,pi)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE
However, this seems like an ugly solution.
Any comments and suggestions are greatly appreciated!
You could just go via the pi name, or via a function instead, to prevent pi from getting stringified (which is your first problem here):
rule <- "x <= pi"
x <- c(1,2,3,pi)
eval(parse(text = rule)) ## All TRUE
## another way might be to throw stuff you need uneval'ed into a function or a block:
my_pi <- function() {
  pi
}
rule <- "x <= my_pi()"
eval(parse(text = rule)) ## All TRUE
You still will suffer from the usual floating point issues, but imprecise stringification won't be your problem anymore.
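For instance, a one-line sketch of the kind of floating-point issue that remains even without stringification:
0.1 + 0.2 <= 0.3 # FALSE, because 0.1 + 0.2 is slightly greater than 0.3 in doubles
# [1] FALSE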
Here's why your approach didn't work:
> print( pi, digits=20 )
[1] 3.141592653589793116
> print( eval(parse(text=pi)), digits=20 )
[1] 3.1415926535897900074
The stringified pi is less than R's pi by a good margin.
The paste() manual says it uses as.character() to convert numbers to strings, which in turn uses 15 significant digits; that is exactly what you are observing.
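If the rule really must be built as a string, one workaround (a sketch) is to print enough digits that the double round-trips exactly; 17 significant digits always suffice for a 64-bit double:
rule <- paste0("x <= ", sprintf("%.17g", pi)) # 17 digits round-trip exactly
rule
# [1] "x <= 3.1415926535897931"
x <- c(1, 2, 3, pi)
eval(parse(text = rule))
# [1] TRUE TRUE TRUE TRUE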

Internal representation of a number in R

> trunc(26015)
[1] 26015
> 260.15*100
[1] 26015
> trunc(260.15*100)
[1] 26014
> floor(260.15*100)
[1] 26014
> as.integer(260.15*100)
[1] 26014
For this code in R, is there an issue with the internal representation of the number?
When I do 260.15*100, the number being printed is still 26015, but when I use a function like trunc() or as.integer(), it becomes 26014.
Usually, my value containing the decimal comes from another variable. So how do I overcome this issue?
The print method for a numeric is not the same as its internal representation. 260.15 * 100 is never actually 26015; it is just printed as such, because printing rounds to 7 significant digits by default. The underlying numeric data is floating point. You can see this by changing your print options:
# set options(digits) to display up to 22 digits, the maximum possible
> options(digits = 22)
> 260.15 * 100
[1] 26014.99999999999636202
> 26015
[1] 26015
In lieu of trunc() or as.integer(), does round() meet your needs?
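For example, a quick sketch of the difference between the two:
trunc(260.15 * 100) # truncates the slightly-too-small double toward zero
# [1] 26014
round(260.15 * 100) # rounds to the nearest whole number
# [1] 26015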

Fixed and decimal places of a positive number

I want to write a function that counts the number of fixed digits and the number of decimal digits of a positive number. For whole numbers: for example, for the number 459 I would like to see
fixed=3 and decimal=0
as the output of the function.
For decimal numbers: for example, for number 12.657 I would like to see
fixed=2 and decimal=3
(because 12 has two digits and 657 has three digits). For numbers less than 1 I would like to see
0
as the fixed count; for example, for the number 0.4056 I would like to see
fixed=0 and decimal=4
or for number 0.13 I would like to see
fixed=0 and decimal=2
I have a general idea as follows:
digits <- function(x){
  length(strsplit(as.character(x), split = "")[[1]])
}
I want to extend this code into a new one that works as I explained above.
It seems you are kind of stuck hoping that this can be done. Okay, here is a crude way:
fun <- function(x){
  stopifnot(is.numeric(as.numeric(x)))
  s <- nchar(unlist(strsplit(as.character(x), ".", fixed = TRUE)))
  ## don't count the minus sign or a lone leading zero as fixed digits
  if(startsWith(as.character(x), "-")) s[1] <- s[1] - 1
  if(abs(as.numeric(x)) < 1) s[1] <- s[1] - 1
  setNames(as.list(s), c("fixed", "decimal"))
}
CORRECT:
fun(10.234)
$fixed
[1] 2
$decimal
[1] 3
fun(-10.234)
$fixed
[1] 2
$decimal
[1] 3
fun(0.2346)
$fixed
[1] 0
$decimal
[1] 4
fun(-0.2346)
$fixed
[1] 0
$decimal
[1] 4
INCORRECT: note that fixed + decimal must be <= 15 when x is passed as a numeric!
fun(-10000567.2346678901876)
$fixed
[1] 8
$decimal
[1] 7 ## This is incorrect
The correct value is:
fun("-10000567.2346678901876") # Note that the input x is a string
$fixed
[1] 8
$decimal
[1] 13
I do not think this can be done reliably. We cannot assume that a simple numeric value is accurately represented in the computer; most floating-point values certainly are not.
Enter 0.3 at the R console:
> 0.3
[1] 0.3
Looks alright, doesn't it? But now, let us do this:
> print(0.3, digits=22)
[1] 0.2999999999999999888978
In essence, if you convert a floating-point number to a string you define how precise you want it. The computer cannot give you that precision because it stores all numbers in bits and, thus, never gives you absolute precision. Even if you see the number as 0.3 and you think it has 0 fixed digits and 1 decimal because of that, this is because R chose to print it that way, not because that is the number represented in the computer's memory.
The other answers prove that a function can handle simple cases. I must admit that I learned that R does an incredible job interpreting numbers. But we have to use caution! As soon as we start transforming numbers, such a function cannot guarantee meaningful results.
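As a small sketch of the pitfall: the string produced by as.character() reflects the printed form, not the stored value:
x <- 0.1 + 0.2
as.character(x)       # looks like one decimal digit
# [1] "0.3"
print(x, digits = 22) # but the stored double is not exactly 0.3
# [1] 0.3000000000000000444089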
EDITED (based on comments):
The function below will work for numbers with 15 or fewer significant figures:
digits <- function(x){
  ## make sure x is a number; if not, stop
  tryCatch(
    {
      x / 1
    },
    error = function(cond) {
      stop('x must be a number')
    }
  )
  ## split into the parts before and after the decimal point
  parts <- strsplit(as.character(x), "\\.")[[1]]
  ## use nchar() to count the digits before the decimal point
  fixed <- nchar(parts[1])
  ## check if negative
  if(substr(parts[1], 1, 1) == "-"){fixed <- fixed - 1}
  ## check if -1 < x < 1
  if(as.numeric(parts[1]) == 0){fixed <- fixed - 1}
  ## count the digits after the decimal point; for whole numbers there is
  ## no second part, so replace the missing value with 0
  decimal <- if(is.na(parts[2])) 0 else nchar(parts[2])
  ## return results
  print(paste0("fixed: ", fixed, " and decimal: ", decimal))
}
If you want to count the negative sign (-) as a digit, you'll need to remove:
## check if negative
if(substr(parts[1], 1, 1) == "-"){fixed <- fixed - 1}
Note: the restriction to 15 or fewer significant figures is based on the 8-byte (64-bit) floating-point representation.
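The 15-digit limit follows from the 53-bit mantissa of a 64-bit double, as this short check shows:
.Machine$double.digits # mantissa bits of a 64-bit double
# [1] 53
53 * log10(2)          # about 15.95 decimal digits of precision
# [1] 15.95459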
The only way I've been able to overcome this is by using the code below.
digits <- function(x){
  tryCatch(
    {
      as.numeric(x) / 1
    },
    error = function(cond) {
      stop('x must be a number')
    }
  )
  j <- 0
  num <- x
  fixed <- 0
  decimal <- 0
  ## walk the characters, counting digits before and after the "."
  for(i in 1:nchar(x)){
    if(substr(num, i, i) == "."){
      j <- 1
    } else if(j == 0){
      fixed <- fixed + 1
    } else{
      decimal <- decimal + 1
    }
  }
  ## adjust for a leading minus sign and/or a lone leading zero
  if(substr(x, 1, 1) == "-" && substr(as.numeric(x), 2, 2) == 0){
    fixed <- fixed - 2
  } else if(substr(x, 1, 1) == "-"){
    fixed <- fixed - 1
  } else if(substr(as.numeric(x), 1, 1) == 0){
    fixed <- fixed - 1
  }
  print(paste0("fixed: ", fixed, " and decimal: ", decimal))
}
However, this code requires the number to be passed into the function as a character string in order to count past 15 significant figures, as shown below:
x<-"111111111111111111111111111111111111.22222222222222222222222222222222222222222222222"
digits(x)
Which returns:
[1] "fixed: 36 and decimal: 47"
This limits the function's applications, as it cannot be used in a dplyr pipe, and you will not receive accurate results for numbers that have already been rounded. For example, using the same number as above, but storing it as a number instead of a character string, yields the results below:
x<-111111111111111111111111111111111111.22222222222222222222222222222222222222222222222
digits(x)
[1] "fixed: 1 and decimal: 18"
I hope this at least somewhat helps!

A problem with the "identical()" function in R? How does "identical()" work for different types of objects?

(reproducible example added)
I cannot quite grasp why the following is FALSE (I am aware they are double and integer, respectively):
identical(1, as.integer(1)) # FALSE
?identical reveals:
num.eq: logical indicating if (double and complex non-NA) numbers should be compared using == ('equal'), or by bitwise comparison. The latter (non-default) differentiates between -0 and +0.
sprintf("%.8190f", as.integer(1)) and sprintf("%.8190f", 1) return exactly the same string of digits. So I think that at least one of the following must return TRUE, but I get FALSE in each case:
identical(1, as.integer(1), num.eq=TRUE) # FALSE
identical(1, as.integer(1), num.eq=FALSE) # FALSE
My thinking is now: if sprintf indicates notation rather than storage, then identical() must compare based on storage, i.e. identical(bitpattern1, bitpattern2) returns FALSE. I could not find any other logical explanation for the FALSE/FALSE situation above.
I do know that in both the 32-bit and 64-bit builds of R, integers are stored as 32-bit values.
They are not identical precisely because they have different types. If you look at the documentation for identical you'll find the example identical(1, as.integer(1)) with the comment ## FALSE, stored as different types. That's one clue. The R language definition reminds us that:
Single numbers, such as 4.2, and strings, such as "four point two" are still vectors, of length 1; there are no more basic types (emphasis mine).
So, basically everything is a vector with a type (that's also why [1] shows up every time R returns something). You can check this by explicitly creating a vector with length 1 by using vector, and then comparing it to 0:
x <- vector("double", 1)
identical(x, 0)
# [1] TRUE
That is to say, both vector("double", 1) and 0 output vectors of type "double" and length == 1.
typeof and storage.mode point to the same thing, so you're kind of right when you say "this means identical() compares based on storage". I don't think this necessarily means that "bit patterns" are being compared, although I suppose it's possible. See what happens when you change the storage mode using storage.mode:
## Assign integer to x. This is really a vector length == 1.
x <- 1L
typeof(x)
# [1] "integer"
identical(x, 1L)
# [1] TRUE
## Now change the storage mode and compare again.
storage.mode(x) <- "double"
typeof(x)
# [1] "double"
identical(x, 1L) # This is no longer TRUE.
# [1] FALSE
identical(x, 1.0) # But this is.
# [1] TRUE
One last note: The documentation for identical states that num.eq is a…
logical indicating if (double and complex non-NA) numbers should be compared using == (‘equal’), or by bitwise comparison.
So, changing num.eq doesn't affect any comparison involving integers. Try the following:
# Comparing integers with integers.
identical(+0L, -0L, num.eq = T) # TRUE
identical(+0L, -0L, num.eq = F) # TRUE
# Comparing integers with doubles.
identical(+0, -0L, num.eq = T) # FALSE
identical(+0, -0L, num.eq = F) # FALSE
# Comparing doubles with doubles.
identical(+0.0, -0.0, num.eq = T) # TRUE
identical(+0.0, -0.0, num.eq = F) # FALSE
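To round this off, here is a short sketch contrasting identical() with ==, which coerces its operands to a common type before comparing:
1 == 1L          # == coerces the integer to double first
# [1] TRUE
identical(1, 1L) # identical() also compares the underlying types
# [1] FALSE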
