I have a data.table column of timestamps stored as character strings that I am trying to convert to numbers. The issue is that the numbers are very long, and I want to keep all of the digits without R rounding anything. For example, the first 5 elements:
> TimeO[1]
[1] "20110630224701281482"
> TimeO[2]
[1] "20110630224701281523"
> TimeO[3]
[1] "20110630224701281533"
> TimeO[4]
[1] "20110630224701281548"
> TimeO[5]
[1] "20110630224701281762"
I wrote a function that converts the (numeric) timestamp into hours:
convert_time_fast <- function(tim){
  # keep only the last 12 digits: hhmmssffffff
  b <- tim - tim %/% 10^12 * 10^12
  ms <- b %% 10^6; b <- (b - ms) / 10^6   # fractional seconds (ffffff, microseconds)
  ss <- b %% 10^2; b <- (b - ss) / 10^2   # seconds (ss)
  mm <- b %% 10^2; hh <- (b - mm) / 10^2  # minutes (mm) and hours (hh)
  # if hours >= 22, subtract 24 (previous day)
  hh <- hh - (hh >= 22) * 24
  return(hh + mm/60 + ss/3600 + ms/(3600 * 10^6))
}
However, the results appear to be rounded, so the data points now all have the same time. Here are the first 5 elements after converting:
> TimeOC <- -convert_time_fast(as.numeric(TimeO))
> TimeOC[1]
[1] 1.216311
> TimeOC[2]
[1] 1.216311
> TimeOC[3]
[1] 1.216311
> TimeOC[4]
[1] 1.216311
> TimeOC[5]
[1] 1.216311
Any help figuring this out would be greatly appreciated!
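Test whether they are really equal, e.g. with all.equal() or ==.
By default R prints only 7 significant digits (see options("digits")), but the remaining digits are still stored. Be aware, though, that the original strings have 20 significant digits, more than a double can hold (about 15 to 16), so as.numeric(TimeO) may already discard the last digits before convert_time_fast() even runs.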
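Whether digits are merely hidden or actually gone can be checked directly. A double stores a 53-bit significand, roughly 15 to 16 significant decimal digits, while the strings in the question have 20. A minimal sketch using the first value from the question:

```r
# A double has a 53-bit significand (about 15-16 significant decimal digits).
.Machine$double.digits    # 53

s <- "20110630224701281482"   # 20 significant digits, from the question
x <- as.numeric(s)

# "%.0f" prints the integer value actually held in the double
stored <- sprintf("%.0f", x)
stored
stored == s               # FALSE: the trailing digits were lost at conversion

# More print digits cannot recover digits that were never stored
print(x, digits = 22)
```

Because the loss happens inside as.numeric(), extra print digits or all.equal() can only reveal it, not undo it.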
See also this example:
> as.numeric("1.21631114")
[1] 1.216311
> as.numeric("1.21631118")
[1] 1.216311
> all.equal(as.numeric("1.21631114"), as.numeric("1.21631118"))
[1] "Mean relative difference: 3.288632e-08" # which indicates they're not the same
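One way around the precision limit, assuming every string has the fixed layout YYYYMMDDhhmmssffffff implied by the function's `# hhmmssffffff` comment, is to cut the time-of-day fields out of the character string with substr() before converting, so no 20-digit double is ever created. The name `convert_time_chr` is hypothetical; it keeps the original rule of subtracting 24 when hours >= 22 and the same negation used in the question.

```r
# Hypothetical helper: parse hh, mm, ss and microseconds straight from the
# strings, assuming the fixed layout YYYYMMDDhhmmssffffff.
convert_time_chr <- function(tim) {
  hh <- as.numeric(substr(tim, 9, 10))
  mm <- as.numeric(substr(tim, 11, 12))
  ss <- as.numeric(substr(tim, 13, 14))
  us <- as.numeric(substr(tim, 15, 20))   # microseconds (ffffff)
  hh <- hh - (hh >= 22) * 24              # previous-day rule from the original
  hh + mm / 60 + ss / 3600 + us / (3600 * 10^6)
}

TimeO <- c("20110630224701281482", "20110630224701281523")
TimeOC <- -convert_time_chr(TimeO)

print(TimeOC, digits = 15)
TimeOC[1] == TimeOC[2]    # FALSE: the microsecond digits survive
```

Each field stays small (at most 6 digits), so every intermediate value is exactly representable as a double.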
> trunc(26015)
[1] 26015
> 260.15*100
[1] 26015
> trunc(260.15*100)
[1] 26014
> floor(260.15*100)
[1] 26014
> as.integer(260.15*100)
[1] 26014
In this R code, is there an issue with the internal representation of the number?
When I compute 260.15*100, the printed result is still 26015, but when I apply a function such as trunc() or as.integer(), it becomes 26014.
In practice, the decimal value comes from another variable, so how do I work around this?
What print() shows for a numeric is not its internal representation. 260.15 * 100 is never actually 26015; it is only printed that way, because print() rounds to options("digits") significant digits (7 by default). The underlying value is a binary floating-point number. You can see this by changing your print options:
# make print() display up to 22 significant digits, the maximum allowed
> options(digits = 22)
> 260.15 * 100
[1] 26014.99999999999636202
> 26015
[1] 26015
In lieu of trunc() or as.integer(), does round() meet your needs?
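A short sketch of the difference, using the value from the question: trunc() and as.integer() cut toward zero, so a value stored just below 26015 becomes 26014, whereas round() moves to the nearest integer first.

```r
x <- 260.15 * 100

# Show more digits than the default print() does
sprintf("%.20f", x)

trunc(x)                 # 26014: truncation toward zero
as.integer(x)            # 26014L: same truncation
round(x)                 # 26015: nearest whole number
as.integer(round(x))     # 26015L: round first, then convert

# Compare with a tolerance instead of exact equality
isTRUE(all.equal(x, 26015))   # TRUE
x == 26015                    # FALSE
```

Rounding before converting works whenever the intended result is a whole number and the floating-point error is far smaller than 0.5.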
I am multiplying a decimal number by 100 and converting the result to an integer, but the result differs from what I expect. Apologies if this has been discussed elsewhere; I could not find a straightforward answer.
> as.integer(1190.60 * 100)
[1] 119059
EDIT:
It seems I have to convert the value to character first and then call as.integer() to get the expected result:
> temp <- 1190.60
> temp2 <- 1190.60 * 100
> class(temp)
[1] "numeric"
> class(temp2)
[1] "numeric"
> as.character(temp2)
[1] "119060"
> as.integer(temp2)
[1] 119059
> as.integer(as.character(temp2))
[1] 119060
EDIT2: Following the comments (thanks @andrey-shabalin), rounding before converting also works:
> temp2
[1] 119060
> as.integer(temp2)
[1] 119059
> as.integer(round(temp2))
[1] 119060
EDIT3: As mentioned in the comments, the question is about the behaviour of as.integer() rather than about floating-point calculations.
The answer is floating-point error. You can see this easily by checking the following:
> temp <- 1190.60
> temp2 <- 1190.60 * 100
> temp2 - 119060
[1] -1.455192e-11
Because of floating-point error, temp2 isn't exactly 119060 but:
> sprintf("%.20f", temp2)
[1] "119059.99999999998544808477"
If you use as.integer() on a double, it behaves like trunc(), i.e. it rounds toward 0. So in this case the result is 119059.
If you convert to character using as.character(), R uses at most 15 significant digits. In this example that would be "119059.999999999", but the next digit is another 9, so R rounds the value to 119060 before the conversion. I avoid this in the code above by using sprintf() instead of as.character().
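The same explanation as a compact sketch: as.integer() truncates, round() fixes the value before truncation, and formatting to 15 significant digits rounds the hidden 9s up. Here sprintf("%.15g", ...) is used as an explicit stand-in for the 15-digit formatting the answer attributes to as.character().

```r
temp2 <- 1190.60 * 100

temp2 - 119060            # tiny negative number: temp2 is just below 119060
as.integer(temp2)         # 119059L: truncated toward zero
as.integer(round(temp2))  # 119060L: rounded before converting

# Formatting to 15 significant digits rounds the trailing 9s up...
sprintf("%.15g", temp2)   # "119060"
# ...while more digits reveal the stored value
sprintf("%.17g", temp2)
```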
R uses the date "1970-01-01" as an origin. Does R make an exception to its usual 1-based indexing and index dates from 0?
> x <- as.Date("1970-01-01")
> y <- as.Date("1970-01-02")
> unclass(x)
[1] 0
> unclass(y)
[1] 1
No, this has nothing to do with indexing. "Dates are represented as the number of days since 1970-01-01" (from the ?Date help page). Also note that
unclass(as.Date("1969-12-31")) == -1
So the value is not an index but an offset from a reference date (the origin). There is no underlying vector here.
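A minimal sketch contrasting the stored day offsets with ordinary vector indexing; the three dates are chosen only to straddle the origin:

```r
d <- as.Date(c("1969-12-31", "1970-01-01", "1970-01-02"))

# The stored numbers are day counts relative to the origin, so they can be negative
unclass(d)                          # -1 0 1

# Adding a number of days moves along the calendar
as.Date("1970-01-01") + 1           # "1970-01-02"
as.Date(1, origin = "1970-01-01")   # "1970-01-02"

# Indexing a vector of dates is still 1-based
d[1]                                # "1969-12-31"
```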
I have a large floating-point number stored as a character string:
x <- "5374761693.91823"
When I run
as.numeric(x)
I get the following output:
5374761694
I would like to keep the decimal part when converting.
Use the digits argument of print() to see the full number:
> print(as.numeric(x), digits=15)
[1] 5374761693.91823
Setting options(digits) is another alternative:
> options(digits=16)
> as.numeric(x)
[1] 5374761693.91823
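> # the assigned value keeps all of its digits
> options(digits=16)
> y <- as.numeric(x)
> y
[1] 5374761693.91823
> # print() returns its argument invisibly, so z holds the full numeric value
> z <- print(as.numeric(x), digits=15)
> z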
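A minimal sketch of the point behind both answers: the stored double keeps the fractional part, and only the printed form is rounded. format(..., nsmall = 5) is an extra option not mentioned above; it forces at least five decimals when building a display string, leaving the numeric value untouched.

```r
x <- "5374761693.91823"
y <- as.numeric(x)

print(y)                  # with the default options(digits = 7) the decimals are hidden
print(y, digits = 15)     # the fractional part is still there

# Arithmetic uses the full stored value, not the printed one
y - 5374761693            # roughly 0.91823

# Build a display string without changing y itself
format(y, nsmall = 5)
is.numeric(y)             # TRUE
```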
I would like to format numerical values, but formatting makes them lose their "numeric" class. Is there a better option?
> values
[1] 5 10 20 30
> class(values[1])
[1] "numeric"
> class(values)
[1] "numeric"
> out<-sprintf("%6.2f",values)
> out
[1] " 5.00" " 10.00" " 20.00" " 30.00"
> class(out)
[1] "character"
> class(out[1])
[1] "character"
out is no longer numeric.
You can use the digits argument of print() to change the number of digits printed:
R> print(3.141592, digits=3)
[1] 3.14
You can also set options(digits) to make this more or less permanent for your session:
R> options(digits=3)
R> print(3.141592)
[1] 3.14
Note that this only changes how values are printed, and it will not necessarily apply to plots and other output.
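If the goal is fewer decimals in the stored values rather than in the printout, round() is an option the answer above does not mention: it returns a numeric vector, whereas sprintf() and format() always return character. A small sketch; the fourth value 30.456 is added here only to make the rounding visible.

```r
values <- c(5, 10, 20, 30.456)

# round() changes the stored values but keeps the numeric class
rounded <- round(values, 2)
class(rounded)            # "numeric"

# sprintf() builds strings for display, so the result is character
out <- sprintf("%6.2f", values)
class(out)                # "character"

# Strings can be parsed back if needed (surrounding spaces are ignored)
as.numeric(out)

# Keep computing with the numeric vector; format only at display time
print(format(rounded, nsmall = 2), quote = FALSE)
```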