Internal representation of a number in R

> trunc(26015)
[1] 26015
> 260.15*100
[1] 26015
> trunc(260.15*100)
[1] 26014
> floor(260.15*100)
[1] 26014
> as.integer(260.15*100)
[1] 26014
For this code in R, is there an issue with the internal representation of the number?
When I do 260.15*100, the number being printed is still 26015, but when I use a function like trunc() or as.integer(), it becomes 26014.
Usually, my value containing the decimal comes from another variable. So how do I overcome this issue?

What R prints for a numeric is not its internal representation. 260.15 * 100 is never exactly 26015; it is only printed that way, because printing rounds to 7 significant digits by default. The underlying value is a double-precision floating-point number. You can see this by changing your print options:
# show up to 22 significant digits, the maximum that options(digits) accepts
> options(digits = 22)
> 260.15 * 100
[1] 26014.99999999999636202
> 26015
[1] 26015
In lieu of trunc() or as.integer(), does round() meet your needs?
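A minimal sketch of the round() suggestion, assuming the value is meant to be a whole number after scaling. The names x and cents are made up for illustration; rounding first snaps the product to the nearest whole number, so the conversion no longer depends on which side of 26015 the stored value falls.

```r
x <- 260.15                          # value that arrives from another variable

# round() moves the product to the nearest whole number before
# as.integer() drops the (now zero) fractional part
cents <- as.integer(round(x * 100))
cents

# for comparison: truncation works on the stored value, not the printed one
trunc(x * 100)
```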

Related

R the number of significant digits leads to unexpected results of inequality using eval and parse text

I am working on boolean rules related to terminal node assignment for CART-like trees related to my work (http://web.ccs.miami.edu/~hishwaran/ishwaran.html)
I have noticed problematic behavior in evaluating inequalities of character strings using eval and parse of text. The issue has to do with how R evaluates the internal representation of a number.
Here's an example involving the number pi. I want to check if a vector (which I call x) is less than or equal to pi.
> pi
[1] 3.141593
> rule = paste0("x <= ", pi)
> rule
[1] "x <= 3.14159265358979"
This rule checks whether each element of x is less than or equal to pi, where pi has been written out to 15 significant digits (14 decimal places). Now I will assign x the values 1, 2, 3 and pi:
> x = c(1,2,3,pi)
Here's what x is up to 15 digits
> print(x, digits=15)
[1] 1.00000000000000 2.00000000000000 3.00000000000000 3.14159265358979
Now let's evaluate this
> eval(parse(text = rule))
[1] TRUE TRUE TRUE FALSE
Whooaaaaa, it looks like pi is not less than or equal to pi. Right?
But now if I hard-code x to pi to 14 digits, it works:
> x = c(1,2,3,3.14159265358979)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE
Obviously in the first case, the internal representation for pi has many digits and so when R evaluates the expression, it is greater than the float representation and it returns FALSE. In the second case it compares two floats, so the result is true.
How can I avoid this? I really need the first evaluation to come back TRUE because I am automating this process for rule-based inference and I cannot hard-code a value (here, pi) each time.
One solution I use is to add a small tolerance value.
> tol = sqrt(.Machine$double.eps)
> rule = paste0("x <= ", pi + tol)
> x = c(1,2,3,pi)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE
However, this seems like an ugly solution.
Any comments and suggestions are greatly appreciated!
You could just go via the pi name or via a function instead, to prevent pi from getting stringified (which is your first problem here)
rule <- "x <= pi"
x <- c(1,2,3,pi)
eval(parse(text = rule)) ## All TRUE
## another way might be to throw stuff you need uneval'ed into a function or a block:
my_pi <- function() {
pi
}
rule <- "x <= my_pi()"
eval(parse(text = rule)) ## All TRUE
You still will suffer from the usual floating point issues, but imprecise stringification won't be your problem anymore.
Here's why your approach didn't work:
> print( pi, digits=20 )
[1] 3.141592653589793116
> print( eval(parse(text=pi)), digits=20 )
[1] 3.1415926535897900074
The stringified pi is less than R's pi by a good margin.
The paste manual says it uses as.character to convert numbers to strings, and the as.character manual in turn says that numbers are converted using 15 significant digits, which is what you are observing.
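When the threshold is only known at run time, one way to keep it out of a string altogether is to build the rule as a language object rather than as text. The sketch below uses base R's bquote(); the name thr is a placeholder for whatever split value the tree produces. This is an alternative to the eval(parse(text = ...)) approach in the question, not a drop-in for code that must store rules as character strings.

```r
x   <- c(1, 2, 3, pi)
thr <- pi                      # threshold computed elsewhere (placeholder)

# .(thr) splices the double itself into the call, so no digits are
# dropped by a conversion to character
rule <- bquote(x <= .(thr))

res <- eval(rule)
res

# deparse() gives a readable label; use it for display only,
# not for re-parsing
deparse(rule)
```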

Fixed and decimal places of a positive number

I want to write a function for counting the number of fixed digits and the number of decimal digits of each positive number. For whole numbers: for example, for number 459 I would like to see
fixed=3 and decimal=0
as the output of the function.
For decimal numbers: for example, for number 12.657 I would like to see
fixed=2 and decimal=3
(because 12 has two digits and 657 has three digits). For numbers less than 1 I would like to see
0
as the fixed, for example for number 0.4056 I would like to see
fixed=0 and decimal=4
or for number 0.13 I would like to see
fixed=0 and decimal=2
I have a general idea as follows:
digits<-function(x){
length(strsplit(as.character(x),split="")[[1]])
}
I want to extend my code as a new one to work as I explained above.
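It seems you are still hoping that this can be done. Okay, here is a crude way:
fun <- function(x){
  num <- as.numeric(x)
  stopifnot(!is.na(num))
  # drop the sign, then split into the parts before and after the point
  parts <- strsplit(sub("^-", "", as.character(x)), ".", fixed = TRUE)[[1]]
  s <- nchar(parts)
  if(length(s) == 1) s[2] <- 0   # whole number: no decimal digits
  if(abs(num) < 1) s[1] <- 0     # a lone leading zero is not a fixed digit
  setNames(as.list(s), c("fixed","decimal"))
}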
CORRECT:
fun(10.234)
$fixed
[1] 2
$decimal
[1] 3
fun(-10.234)
$fixed
[1] 2
$decimal
[1] 3
fun(0.2346)
$fixed
[1] 0
$decimal
[1] 4
> fun(-0.2346)
$fixed
[1] 0
$decimal
[1] 4
INCORRECT: a numeric is converted to a string with at most 15 significant digits, so fixed + decimal <= 15!
fun(-10000567.2346678901876)
$fixed
[1] 8
$decimal
[1] 7 ## This is incorrect
The correct value is:
fun("-10000567.2346678901876") # Note that the input x is a string
$fixed
[1] 8
$decimal
[1] 13
I do not think this can be done. We cannot assume that a simple numeric value is accurately represented in the computer. Most floating-point values certainly can't.
Enter 0.3 at the R console:
> 0.3
[1] 0.3
Looks alright, doesn't it? But now, let us do this:
> print(0.3, digits=22)
[1] 0.29999999999999999
In essence, when you convert a floating-point number to a string, you decide how many digits you want to see. The computer cannot always honour that precision, because it stores numbers in a fixed number of binary digits and most decimal fractions have no exact binary representation. Even if you see the number as 0.3 and conclude that it has 0 fixed digits and 1 decimal, that is because R chose to print it that way, not because that is the number held in the computer's memory.
The other answers prove that a function can handle simple cases. I must admit that I learned that R does an incredible job interpreting the numbers. But we have to use caution! As soon as we start transforming numbers, such a function cannot guarantee meaningful results.
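A short illustration of that caution, using only base R. The variable a is made up for the example. It shows that a value produced by arithmetic can print like 0.3 while being a different double, which is why any digit count taken from the printed form depends on how the number was produced.

```r
a <- 0.1 + 0.2

a == 0.3                    # exact comparison of the two stored doubles
isTRUE(all.equal(a, 0.3))   # comparison within a small tolerance

# 17 significant digits are enough to tell any two doubles apart
sprintf("%.17g", 0.3)
sprintf("%.17g", a)
```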
EDITED (based on comments):
The function below will work for numbers with 15 or fewer significant figures:
digits<-function(x){
## make sure x is a number, if not stop
tryCatch(
{
x/1
},
error=function(cond) {
stop('x must be a number')
}
)
## use nchar() and strsplit() to determine number of digits before decimal point
fixed<-nchar(strsplit(as.character(x),"\\.")[[1]][1])
## check if negative
if(substr(strsplit(as.character(x),"\\.")[[1]][1],1,1)=="-"){fixed<-fixed-1}
## check if -1<x<1
if(as.numeric(strsplit(as.character(x),"\\.")[[1]][1])==0){fixed<-fixed-1}
## use nchar() and strsplit() to determine number of digits after decimal point
decimal<-nchar(strsplit(as.character(x),"\\.")[[1]][2])
## for integers, replace NA decimal result with 0
if(is.na(nchar(strsplit(as.character(x),"\\.")[[1]][2]))){decimal<-0}
## return results
print(paste0("fixed: ",fixed," and decimal: ", decimal))
}
If you want to count negative signs (-) as a digit you'll need to remove:
## check if negative
if(substr(strsplit(as.character(x),"\\.")[[1]][1],1,1)=="-"){fixed<-fixed-1}
Note: the restriction to 15 or fewer significant figures comes from the 8-byte floating-point representation, which as.character() writes out with at most 15 significant digits.
The only way I've been able to overcome this is by using the below code.
digits<-function(x){
tryCatch(
{
as.numeric(x)/1
},
error=function(cond) {
stop('x must be a number')
}
)
j <- 0
num <- x
fixed <- 0
decimal <- 0
for(i in 1:nchar(x)){
if(substr(num, i, i) == "."){
j <- 1
} else if(j==0){
fixed <- fixed + 1
} else{
decimal <- decimal + 1
}
}
if(substr(x,1,1)=="-" & substr(as.numeric(x),2,2)==0){
fixed<-fixed-2
}else if(substr(x,1,1)=="-"){
fixed<-fixed-1
}else if(substr(as.numeric(x),1,1)==0){
fixed<-fixed-1
}else{}
print(paste0("fixed: ",fixed," and decimal: ", decimal))
}
However, this code requires the number to be passed into the function as a character string in order to count past 15 significant figures, as shown below:
x<-"111111111111111111111111111111111111.22222222222222222222222222222222222222222222222"
digits(x)
Which returns:
[1] "fixed: 36 and decimal: 47"
This then limits this function's applications as it cannot be used in a dplyr pipe, and you will not receive accurate results for numbers that have been previously rounded. For example, using the same number as above, but storing it as a number instead of character string yields the below results:
x<-111111111111111111111111111111111111.22222222222222222222222222222222222222222222222
digits(x)
[1] "fixed: 1 and decimal: 18"
I hope this at least somewhat helps!
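For data that is already stored as text, a vectorised variant avoids the loop and returns one row per input, which makes it easier to use on a whole column. This is only a sketch: count_digits is a hypothetical name, it assumes plain decimal notation (no exponents, no thousands separators), and it still cannot recover digits from values that were converted to numeric beforehand.

```r
# Hypothetical helper: count digits before and after the decimal point.
# The input must be character so that no digit is lost to floating point.
count_digits <- function(s) {
  stopifnot(is.character(s))
  s <- sub("^-", "", s)                          # ignore the sign
  has_point <- grepl(".", s, fixed = TRUE)
  int_part  <- sub("\\..*$", "", s)              # text before the point
  frac_part <- ifelse(has_point, sub("^[^.]*\\.", "", s), "")
  int_part  <- sub("^0+", "", int_part)          # leading zeros do not count
  data.frame(fixed = nchar(int_part), decimal = nchar(frac_part))
}

res <- count_digits(c("459", "12.657", "0.4056", "-10000567.2346678901876"))
res
```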

why does as.integer in R decrement the value?

I am doing a simple operation of multiplying a decimal number and converting it to an integer, but the result is different from what I expected. Apologies if this is discussed elsewhere; I was not able to find any straightforward answers to this.
> as.integer(1190.60 * 100)
[1] 119059
EDIT:
So, I have to convert that to character and then do as.integer to get what is expected
> temp <- 1190.60
> temp2 <- 1190.60 * 100
> class(temp)
[1] "numeric"
> class(temp2)
[1] "numeric"
> as.character(temp2)
[1] "119060"
> as.integer(temp2)
[1] 119059
> as.integer(as.character(temp2))
[1] 119060
EDIT2: According to the comments, thanks #andrey-shabalin
> temp2
[1] 119060
> as.integer(temp2)
[1] 119059
> as.integer(round(temp2))
[1] 119060
EDIT3: As mentioned in the comments the question is related to behaviour of as.integer and not about floating calculations
The answer to this is "floating point error". You can see this easily by checking the following:
> temp <- 1190.60
> temp2 <- 1190.60 * 100
> temp2 - 119060
[1] -1.455192e-11
Due to floating point errors, temp2 isn't exactly 119060 but:
> sprintf("%.20f", temp2)
[1] "119059.99999999998544808477"
If you use as.integer on a float, it works the same way as trunc, i.e. it discards the fractional part, rounding toward 0. So in this case the result is 119059.
If you convert to character using as.character(), R uses at most 15 significant digits. In this example that would be "119059.999999999". The next digit is another 9, so R rounds this to 119060 before conversion. I avoid this in the code above by using sprintf() instead of as.character().

Rounding Error when converting from character to numeric

I have a data.table of numbers stored as character strings that I am trying to convert to numeric. The issue is that the numbers are very long and I want to retain every digit without any rounding by R. For example, the first 5 elements of the data.table:
> TimeO[1]
[1] "20110630224701281482"
> TimeO[2]
[1] "20110630224701281523"
> TimeO[3]
[1] "20110630224701281533"
> TimeO[4]
[1] "20110630224701281548"
> TimeO[5]
[1] "20110630224701281762"
I wrote a function to convert from a character into numeric:
convert_time_fast <- function(tim){
b <- tim - tim%/%10^12*10^12
# hhmmssffffff
ms <- b%%10^6; b <-(b-ms)/10^6
ss <- b%%10^2; b <-(b-ss)/10^2
mm <- b%%10^2; hh <-(b-mm)/10^2
# if hours>=22, subtract 24 (previous day)
hh <- hh - (hh>=22)*24
return(hh+mm/60+ss/3600+ms/(3600*10^6))
}
However, rounding occurs in R, so the data points now have the same time. See the first 5 elements after converting:
TimeOC <- -convert_time_fast(as.numeric(TimeO))
> TimeOC[1]
[1] 1.216311
> TimeOC[2]
[1] 1.216311
> TimeOC[3]
[1] 1.216311
> TimeOC[4]
[1] 1.216311
> TimeOC[5]
[1] 1.216311
Any help figuring this out would be greatly appreciated!
You should test whether they are really equal (all.equal()).
By default R prints only 7 significant digits, but the remaining digits are still stored.
See also this example:
> as.numeric("1.21631114")
[1] 1.216311
> as.numeric("1.21631118")
[1] 1.216311
> all.equal(as.numeric("1.21631114"), as.numeric("1.21631118"))
[1] "Mean relative difference: 3.288632e-08" # which indicates they're not the same
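Printing is only part of the story for 20-digit inputs: a double carries roughly 15 to 17 significant decimal digits, so as.numeric() on such a string cannot in general keep the trailing digits. One way around that, sketched below, is to slice the fields out of the text first and convert each short piece separately. convert_time_chr is a hypothetical name, and the fixed layout yyyymmddhhmmssffffff is an assumption inferred from the hhmmssffffff comment in the question's function.

```r
# Hypothetical string-based variant of convert_time_fast():
# every field is at most 6 digits long when it is converted,
# so each conversion is exact.
convert_time_chr <- function(tim) {
  stopifnot(is.character(tim), all(nchar(tim) == 20))
  hh <- as.numeric(substr(tim,  9, 10))
  mm <- as.numeric(substr(tim, 11, 12))
  ss <- as.numeric(substr(tim, 13, 14))
  us <- as.numeric(substr(tim, 15, 20))   # fractional seconds, 6 digits
  # if hours >= 22, subtract 24 (previous day), as in the original
  hh <- hh - (hh >= 22) * 24
  hh + mm / 60 + ss / 3600 + us / (3600 * 10^6)
}

TimeO <- c("20110630224701281482", "20110630224701281523",
           "20110630224701281533", "20110630224701281548",
           "20110630224701281762")
res <- convert_time_chr(TimeO)
print(res, digits = 15)
```

The sign follows the function body in the question (hours of 22 or more become negative); negate the result if the positive convention used in the question's output is wanted.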

R floating point number precision being lost on conversion from character

I have a large floating point number as a character like so
x<-"5374761693.91823";
On doing
as.numeric(x);
I get the following output
5374761694
I would like to preserve the floating point nature of the number while casting.
Use the digits argument of print to see the actual number:
> print(as.numeric(x), digits=15)
[1] 5374761693.91823
options is another alternative:
> options(digits=16)
> as.numeric(x)
[1] 5374761693.91823
> # assignments
> options(digits=16)
> y <- as.numeric(x)
> y
[1] 5374761693.91823
z <- print(as.numeric(x), digits=15)
z
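If the goal is a string with all the decimals rather than a different console display, sprintf() or format() can produce it without touching the global options. A small sketch, assuming five decimal places are wanted as in the input above; y and s are illustrative names.

```r
x <- "5374761693.91823"
y <- as.numeric(x)          # the stored value keeps the fractional part

# fixed notation with five decimal places, returned as a character string
s <- sprintf("%.5f", y)
s

# format() with an explicit number of significant digits also works
format(y, digits = 15)
```

Note that print() returns its argument invisibly, so z in the snippet above holds the number itself, not the text that was displayed.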
