precision of double in R


I have to calculate the difference between two long variables in R.
Initially, they were stored as text. But when I convert them to numeric (double) to calculate the difference, R fails to recognize that the difference is 1.
testVariable1 = as.numeric("233203300000000001")
testVariable2 = as.numeric("233203300000000002")
testVariable2 - testVariable1
Result:
[1] 0
What can I do to solve this issue?
Thanks in advance!

library(gmp)
as.bigz("233203300000000002")-as.bigz("233203300000000001")
Big Integer ('bigz') :
[1] 1

You could try using the bit64 package:
library(bit64)
##
testVariable1 <- as.integer64("233203300000000001")
testVariable2 <- as.integer64("233203300000000002")
##
R> testVariable2 - testVariable1
#integer64
#[1] 1
R> as.numeric(testVariable2 - testVariable1)
#[1] 1
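As background for why the plain subtraction returns 0: a double has a 53-bit significand, so not every integer above 2^53 can be stored exactly, and both 18-digit inputs end up as the same double. A minimal base-R check (a sketch added for illustration, not part of the original answers):

```r
# Doubles carry a 53-bit significand.
.Machine$double.digits
# Above 2^53, neighbouring integers can collapse to the same double.
2^53 == 2^53 + 1

x <- as.numeric("233203300000000001")
y <- as.numeric("233203300000000002")
# Both strings are converted to the same double, hence a difference of 0.
identical(x, y)
y - x
```

This is why the answers above switch to an exact integer type (bigz or integer64) before subtracting.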

Related

why does as.integer in R decrement the value?

I am doing a simple operation: multiplying a decimal number and converting the result to integer, but the result is different from what I expected. Apologies if this is discussed elsewhere; I was not able to find a straightforward answer.
> as.integer(1190.60 * 100)
[1] 119059
EDIT:
So, I have to convert that to character and then do as.integer to get what is expected
> temp <- 1190.60
> temp2 <- 1190.60 * 100
> class(temp)
[1] "numeric"
> class(temp2)
[1] "numeric"
> as.character(temp2)
[1] "119060"
> as.integer(temp2)
[1] 119059
> as.integer(as.character(temp2))
[1] 119060
EDIT2: Following the comments (thanks @andrey-shabalin):
> temp2
[1] 119060
> as.integer(temp2)
[1] 119059
> as.integer(round(temp2))
[1] 119060
EDIT3: As mentioned in the comments, the question is about the behaviour of as.integer and not about floating-point calculations

The answer to this is "floating point error". You can see this easily by checking the following:
> temp <- 1190.60
> temp2 <- 1190.60 * 100
> temp2 - 119060
[1] -1.455192e-11
Due to floating point error, temp2 isn't exactly 119060 but slightly less:
> sprintf("%.20f", temp2)
[1] "119059.99999999998544808477"
If you use as.integer on a double, it works the same way as trunc, i.e. it rounds toward 0. So in this case the result is 119059.
If you convert to character using as.character(), R uses at most 15 significant digits. In this example that would be "119059.999999999". The next digit is another 9, so R rounds this to 119060 before conversion. I avoid this in the code above by using sprintf() instead of as.character().
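The difference between truncating and rounding can be checked side by side. This is a small sketch using only the values from the question:

```r
temp2 <- 1190.60 * 100

# as.integer() drops the fractional part (toward 0), like trunc().
truncated <- as.integer(temp2)
same_as_trunc <- truncated == trunc(temp2)

# Rounding first snaps the value to the nearest whole number.
rounded <- as.integer(round(temp2))

# all.equal() compares with a tolerance, so it treats temp2 as 119060.
close_enough <- isTRUE(all.equal(temp2, 119060))

truncated
rounded
```

Rounding before converting is usually the safer choice when the value is known to be a whole number up to floating point error.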

Rounding Error when converting from character to numeric

I have a data.table of numbers in character format that I am trying to convert to numeric. However, the numbers are very long and I want to retain all of the digits without any rounding from R. For example, the first 5 elements of the data.table:
> TimeO[1]
[1] "20110630224701281482"
> TimeO[2]
[1] "20110630224701281523"
> TimeO[3]
[1] "20110630224701281533"
> TimeO[4]
[1] "20110630224701281548"
> TimeO[5]
[1] "20110630224701281762"
I wrote a function to convert from a character into numeric:
convert_time_fast <- function(tim){
b <- tim - tim%/%10^12*10^12
# hhmmssffffff
ms <- b%%10^6; b <-(b-ms)/10^6
ss <- b%%10^2; b <-(b-ss)/10^2
mm <- b%%10^2; hh <-(b-mm)/10^2
# if hours>=22, subtract 24 (previous day)
hh <- hh - (hh>=22)*24
return(hh+mm/60+ss/3600+ms/(3600*10^6))
}
However the rounding occurs in R so datapoints now have the same time. See first 5 elements after converting:
TimeOC <- -convert_time_fast(as.numeric(TimeO))
> TimeOC[1]
[1] 1.216311
> TimeOC[2]
[1] 1.216311
> TimeOC[3]
[1] 1.216311
> TimeOC[4]
[1] 1.216311
> TimeOC[5]
[1] 1.216311
Any help figuring this out would be greatly appreciated!

You should test whether they are really equal (all.equal()).
By default R prints only 7 significant digits (see getOption("digits")), but the remaining digits are still stored.
See also this example:
> as.numeric("1.21631114")
[1] 1.216311
> as.numeric("1.21631118")
[1] 1.216311
> all.equal(as.numeric("1.21631114"), as.numeric("1.21631118"))
[1] "Mean relative difference: 3.288632e-08" # which indicates they're not the same
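Separately from print width, a double holds only about 15 to 16 significant digits, so a 20-digit timestamp cannot be stored exactly by as.numeric(). One way to sidestep that is to cut the fields out of the string before converting. The sketch below assumes the layout yyyymmddhhmmssffffff implied by the question; the positions passed to substr() are that assumption, and the sign flip applied in the question's assignment is left out:

```r
TimeO <- c("20110630224701281482", "20110630224701281523")

# Assumed layout: yyyymmdd (1-8), hh (9-10), mm (11-12), ss (13-14), microseconds (15-20).
hh <- as.numeric(substr(TimeO, 9, 10))
mm <- as.numeric(substr(TimeO, 11, 12))
ss <- as.numeric(substr(TimeO, 13, 14))
us <- as.numeric(substr(TimeO, 15, 20))

# Same convention as the question's function: hours >= 22 belong to the previous day.
hh <- hh - (hh >= 22) * 24

hours <- hh + mm / 60 + ss / 3600 + us / (3600 * 1e6)

# Ask for more digits than the default 7 to see the difference.
print(hours, digits = 15)
```

Each field is short enough to be represented exactly, so the two timestamps stay distinct.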

Does R use 0-indexing for dates?

R uses the date "1970-01-01" as an origin. Does it make an exception from its typical 1-indexing to index dates with 0-indexing?
> x <- as.Date("1970-01-01")
> y <- as.Date("1970-01-02")
> unclass(x)
[1] 0
> unclass(y)
[1] 1

No. This is not an indexing thing. "Dates are represented as the number of days since 1970-01-01" (From the ?Date help page). Also note
unclass(as.Date("1969-12-31")) == -1
So it's not an index, it's an offset in days from an origin date. There's no underlying vector here.
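A short base-R sketch of the same point, showing that the stored number is a day count that can be negative and can be turned back into a date:

```r
origin <- as.Date("1970-01-01")

# The underlying value is the number of days since the origin.
days_origin <- as.numeric(origin)
days_before <- as.numeric(as.Date("1969-12-31"))

# Going the other way: a day count plus an origin gives a Date.
next_day <- as.Date(1, origin = "1970-01-01")

# Subtracting two dates gives a time difference in days.
gap <- as.numeric(as.Date("1970-01-02") - origin)
```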

Vectorizing a function that uses strsplit

I am trying to make a function that converts time (in character form) to decimal format such that 1 corresponds to 1 am and 23 corresponds to 11 pm and 24 means the end of the day.
Here are the two functions that do this. One of them is vectorized while the other is not.
time2dec <- function(time0)
{
time.dec <-as.numeric(substr(time0,1,2))+as.numeric(substr(time0,4,5))/60+(as.numeric(substr(time0,7,8)))/3600
return(time.dec)
}
time2dec1 <- function(time0)
{
time.dec <-as.numeric(strsplit(time0,':')[[1]][1])+as.numeric(strsplit(time0,':')[[1]][2])/60+as.numeric(strsplit(time0,':')[[1]][3])/3600
return(time.dec)
}
This is what I get...
times <- c('12:23:12','10:23:45','9:08:10')
#>time2dec(times)
[1] 12.38667 10.39583 NA
Warning messages:
1: In time2dec(times) : NAs introduced by coercion
2: In time2dec(times) : NAs introduced by coercion
#>time2dec1(times)
[1] 12.38667
I know that time2dec, which is vectorized, gives NA for the last element because it extracts 9: instead of 9 as the hour. That is why I created time2dec1, but I do not know why it is not vectorized.
I would also be interested in a better function for what I am trying to do.
I saw this, which explains a part of my question but does not give a clue about how to do what I am trying.

Don't try to reinvent the wheel:
times1 <- difftime(as.POSIXct(times, "%H:%M:%S", tz="GMT"),
as.POSIXct("0:0:0", "%H:%M:%S", tz="GMT"),
units="hours")
#Time differences in hours
#[1] 12.386667 10.395833 9.136111
as.numeric(times1)
#[1] 12.386667 10.395833 9.136111

In the following we shall use this test vector:
ch <- c('12:23:12','10:23:45','9:08:10')
1) To fix up the solution in the question we prepend a 0 and then replace any string of 3 digits with the last two:
num.substr <- function(...) as.numeric(substr(...))
time2dec <- function(time0) {
t0 <- sub("\\d(\\d\\d)", "\\1", paste0(0, time0))
num.substr(t0, 1, 2) + num.substr(t0, 4, 5) / 60 + num.substr(t0, 7, 8) / 3600
}
time2dec(ch)
## [1] 12.386667 10.395833 9.136111
2) Parsing the string is slightly easier with strapply in the gsubfn package:
library(gsubfn)
strapply(ch, "^(.?.):(..):(..)",
~ as.numeric(h) + as.numeric(m)/60 + as.numeric(s)/3600,
simplify = c)
## [1] 12.386667 10.395833 9.136111
3) We can reduce the string manipulation to just removing the colons and then convert the resulting character string to numeric so we can manipulate it numerically:
num <- as.numeric(gsub(":", "", ch))
num %/% 10000 + num %% 10000 %/% 100 / 60 + num %% 100 / 3600
## [1] 12.386667 10.395833 9.136111
4) The chron package has a "times" class that internally represents times as fractions of a day. Converting that to hours gives an easy solution:
library(chron)
24 * as.numeric(times(ch))
## [1] 12.386667 10.395833 9.136111
ADDED Added more solutions.

as.numeric( strptime(times, "%H:%M:%S")-strptime(Sys.Date(), "%Y-%m-%d" ))
[1] 12.386667 10.395833 9.136111
Basically the same as Roland's but bypassing some steps, and I try to avoid using difftime if I can. Had too many bugs arise because I don't really understand the function or the class ... or something. And when I timed it versus Roland's his was faster. Oh, well.
Emulating @G.Grothendieck's efforts (and essentially working similarly to his elegant strapply solution):
num <- apply( matrix(scan(text=gsub(":", " ", ch), what=numeric(0)),nrow=3), 2,
function(x) x[1]+x[2]/60 +x[3]/3600 )
#Read 9 items
num
#[1] 12.386667 10.395833 9.136111
And this actually answers the original question:
num <- sapply( strsplit(ch, ":"), function(x){ x2 <- as.numeric(x);
x2[1]+x2[2]/60 +x2[3]/3600})
num
#[1] 12.386667 10.395833 9.136111

The following does what you want
sapply(strsplit(times, ":"), function(d) {
sum(as.numeric(d)*c(1,1/60,1/3600))
})
Step by step:
strsplit(times, ":")
returns a list of character vectors. Each character vector contains the three parts of the time (hours, minutes, seconds). We now want to convert each element of the list to a numeric value. For this we need to apply a function to each element and put the results back into a vector, which is what sapply does.
sapply(strsplit(times, ":"), function(d) {
})
As for the function: we first need to convert the character values to numeric values using as.numeric. Then we multiply the first element by 1, the second by 1/60 and the third by 1/3600 and add the results (for which we use sum). Resulting in
sapply(strsplit(times, ":"), function(d) {
sum(as.numeric(d)*c(1,1/60,1/3600))
})
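As for why time2dec1 in the question returns a single value: strsplit() returns one list element per input string, and `[[1]]` keeps only the first of them. A small sketch that makes this visible and uses vapply() as a type-checked alternative to sapply() (the name time2dec2 is made up for this illustration):

```r
times <- c('12:23:12', '10:23:45', '9:08:10')

parts <- strsplit(times, ":", fixed = TRUE)
length(parts)   # one list element per input string
parts[[1]]      # only the pieces of the first string

# Hypothetical helper: apply the conversion to every list element.
time2dec2 <- function(time0) {
  vapply(strsplit(time0, ":", fixed = TRUE),
         function(p) sum(as.numeric(p) * c(1, 1/60, 1/3600)),
         numeric(1))
}

time2dec2(times)
```

vapply() behaves like sapply() here but fails loudly if an element does not yield exactly one number.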

Convert hex to decimal in R

I found out that there is function called .hex.to.dec in the fBasics package.
When I do .hex.to.dec(a), it works.
I have a data frame with a column samp_column consisting of such values:
a373, 115c6, a373, 115c6, 176b3
When I do .hex.to.dec(samp_column), I get this error:
"Error in nchar(b) : 'nchar()' requires a character vector"
When I do .hex.to.dec(as.character(samp_column)), I get this error:
"Error in rep(base.out, 1 + ceiling(log(max(number), base =
base.out))) : invalid 'times' argument"
What would be the best way of doing this?

Use base::strtoi to convert hexadecimal character vectors to integer:
strtoi(c("0xff", "077", "123"))
#[1] 255 63 123
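Applied to the values from the question, which carry no 0x prefix, the base has to be given explicitly. This sketch assumes samp_column is a factor, which would be consistent with the nchar() error quoted in the question:

```r
# Assumption: the data frame column is a factor holding hex strings.
samp_column <- factor(c("a373", "115c6", "a373", "115c6", "176b3"))

# Convert to character first, then parse as base 16.
dec <- strtoi(as.character(samp_column), base = 16L)
dec
#[1] 41843 71110 41843 71110 95923
```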

There is a simple and generic way to convert hex <-> other formats using "C/C++ way":
V <- c(0xa373, 0x115c6, 0xa373, 0x115c6, 0x176b3)
sprintf("%d", V)
#[1] "41843" "71110" "41843" "71110" "95923"
sprintf("%.2f", V)
#[1] "41843.00" "71110.00" "41843.00" "71110.00" "95923.00"
sprintf("%x", V)
#[1] "a373" "115c6" "a373" "115c6" "176b3"

As mentioned in @user4221472's answer, strtoi() cannot handle values above 2^31 - 1, the largest R integer.
The simplest way around that is to use as.numeric(), which returns doubles.
V <- c(0xa373, 0x115c6, 0x176b3, 0x25cf40000)
as.numeric(V)
#[1] 41843 71110 95923 10149429248
As @MS Berends noted in the comments, "[a]lso notice that just printing V in the console will already print in decimal."

strtoi() has a limitation of 31 bits. Hex numbers with the high order bit set return NA:
> strtoi('0x7f8cff8b')
[1] 2139946891
> strtoi('0x8f8cff8b')
[1] NA
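For hex strings that exceed this range, one option is to go through as.numeric(), which accepts strings with a 0x prefix and returns a double. A sketch using the two values above (paste0() adds the prefix for strings stored without it):

```r
hex <- c("7f8cff8b", "8f8cff8b")

# strtoi() returns NA once the value no longer fits in an R integer.
as_int <- strtoi(hex, base = 16L)

# as.numeric() parses "0x..." strings as doubles, which have room for these values.
as_dbl <- as.numeric(paste0("0x", hex))
as_dbl
```

Doubles are exact for integers up to 2^53, so this is only safe for hex values of up to 13 digits.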

To get a signed 16-bit value (assuming two's complement):
temp <- strtoi(value, base=16L)
if (temp > 32767) { temp <- temp - 65536 }
In a general form:
max_unsigned <- 65535 #0xFFFF
max_signed <- 32767 #0x7FFF
temp <- strtoi(value, base=16L)
if (temp > max_signed) { temp <- temp - (max_unsigned + 1) }
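Because if() only looks at a single value, a vectorized variant is handy for whole columns. The sketch below assumes a two's complement interpretation, where 0xFFFF maps to -1; the helper name to_int16 is made up for this illustration:

```r
# Hypothetical helper: interpret 16-bit hex strings as signed two's complement values.
to_int16 <- function(hex) {
  v <- strtoi(hex, base = 16L)
  # Values above 0x7FFF wrap around to the negative range.
  ifelse(v > 32767L, v - 65536L, v)
}

signed <- to_int16(c("0001", "7fff", "8000", "ffff"))
signed
#[1]      1  32767 -32768     -1
```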