Converting a chr to numeric [duplicate] - r

This question already has answers here:
Why are these numbers not equal?
(6 answers)
Closed 5 years ago.
I am trying to convert a chr into a number. The number I am trying to convert is "20171023063155.557". When I use the as.numeric function, it gives me 20171023063155.559. I have tried a few different methods but cannot get it to convert correctly.
Any help would be much appreciated.
as.POSIXct("20171023063155.557", format = "%Y%m%d%H%M%OS")
[1] "2017-10-23 06:31:55 PDT"
as.POSIXct("20171023063155.557", format = "%Y%m%d%H%M%S")
[1] "2017-10-23 06:31:55 PDT"

Your string actually appears to be a timestamp. I would therefore suggest that you treat it as such. One option here would be to convert it to a date using as.POSIXct:
x <- "20171023063155.557"
y <- as.POSIXct(x, format = "%Y%m%d%H%M%OS")
With a POSIXct object in hand, you can now easily extract information about your timestamp, e.g.
weekdays(y, FALSE)
[1] "Monday"
months(y, FALSE)
[1] "October"
To verify that millisecond precision information has in fact been stored in the POSIXct object, we can call format to check:
format(y, "%Y-%m-%d %H:%M:%OS6")
[1] "2017-10-23 06:31:55.556999"

The problem is the "rounding" imposed by storing the value as a 64-bit double-precision floating point number (R's numeric class), which holds only about 15 to 17 significant decimal digits, combined with the default number of significant digits R is configured to print:
x <- as.numeric("20171023063155.557")
x
# [1] 2.017102e+13
getOption("digits")
# 7
options(digits=22)
x
# [1] 20171023063155.55859375
So just change the digits option and you can see your number is converted (almost ;-) correctly...
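The "almost" can be made concrete. As a minimal sketch (the names e, ulp and frac are only illustrative), the gap between neighbouring doubles near this value follows from the 53-bit significand of a double, and the stored fractional part has to be a whole multiple of that gap:

```r
x <- as.numeric("20171023063155.557")

# A double has a 53-bit significand, so neighbouring doubles near x
# are 2^(e - 52) apart, where e is the binary exponent of x.
e <- floor(log2(x))
ulp <- 2^(e - 52)
ulp
# [1] 0.00390625

# The stored fractional part must be a whole multiple of ulp,
# which is why .557 cannot be stored exactly at this magnitude.
frac <- x - floor(x)
frac / ulp == round(frac / ulp)
# [1] TRUE
```

In other words, no setting of the digits option can recover .557 here; the closest a double can get is a multiple of 1/256.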

Related

as.POSIXct behaving inconsistently

This might sound like a duplicate issue, but I have gone through many POSIXct-related questions and did not come across this one. If you do find one, I would really appreciate being pointed in that direction. as.POSIXct is behaving very awkwardly in my case. See the example below:
options(digits.secs = 3)
test_time <- "2017-01-26 23:00:00.010"
test_time <- as.POSIXct(test_time, format = "%Y-%m-%d %H:%M:%OS")
This returns:
"2017-01-26 23:00:00.00"
Now I try the following option and it returns NA. I have no idea why it behaves like that when all I need is for it to convert to "2017-01-26 23:00:00.010".
test_time <- "2017-01-26 23:00:00.010"
test_time <- as.POSIXct(test_time, format = "%Y-%m-%d %H:%M:%OS3")
Now it works fine when I do this:
as.POSIXlt(strptime(test_time,format = "%Y-%m-%d %H:%M:%OS"), format = "%Y-%m-%d %H:%M:%OS")
But for my purpose I need to have this as a POSIXct object because some libraries I am working with only take POSIXct objects. Converting POSIXlt to POSIXct again results in the same problem as before.
Is there an issue with my system settings? The date is also not one of those daylight savings times one to throw an error. Why would it work with one format and not others? Any leads/suggestions are welcome!
Running on Windows 10 64-bit
The issue here has to do with the maximum precision that POSIXct can handle. It is backed by a double under the hood, representing the number of seconds since the epoch, midnight on 1970-01-01 UTC. Fractional seconds are represented as fractional parts of that double, i.e. 63.02 represents 1970-01-01 00:01:03.02 UTC.
options(digits = 22, digits.secs = 3)
.POSIXct(63.02, tz = "UTC")
#> [1] "1970-01-01 00:01:03.02 UTC"
63.02
#> [1] 63.02000000000000312639
Now, when working with doubles there are limits to the precision that they can represent exactly. You can see this with the above example; typing in 63.02 in the console doesn't return exactly the same number, and instead returns something close, but with some extra bits at the end.
So now let's take a look at your example. If we start as "low level" as possible, the first thing as.POSIXct() does is call strptime(), which returns a POSIXlt object. That keeps each "field" of the date-time as a separate element (i.e. year is kept separate from month, day, second, etc). We can see that it parsed correctly and our sec field holds 0.01.
# `digits.secs` to print up to 3 fractional-second digits (has no effect on parsing)
# `digits` to print up to 22 significant digits for double values
options(digits.secs = 3, digits = 22)
x <- "2017-01-26 23:00:00.010"
# looks good
lt <- strptime(x, format = "%Y-%m-%d %H:%M:%OS", tz = "America/New_York")
lt
#> [1] "2017-01-26 23:00:00.01 EST"
# This is a POSIXlt, which is a list holding fields like year,month,day,...
class(lt)
#> [1] "POSIXlt" "POSIXt"
# sure enough...
lt$sec
#> [1] 0.01000000000000000020817
But now convert that to POSIXct. At this point, the individual fields are collapsed into a single double, which might have precision issues.
# now convert to POSIXct (i.e. a single double holding all the info)
# looks like we lost the fractional seconds?
ct <- as.POSIXct(lt)
ct
#> [1] "2017-01-26 23:00:00.00 EST"
# no, they are still there, but the precision in the `double` data type
# isn't enough to be able to represent this exactly as `1485489600.010`
unclass(ct)
#> [1] 1485489600.009999990463
#> attr(,"tzone")
#> [1] "America/New_York"
So the fractional part of the double backing ct is close to .010 but cannot represent it exactly; the stored value is slightly less than .010. When the POSIXct is printed, fractional seconds are truncated rather than rounded, which makes it look like you lost the fractional seconds.
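If you have to stay with POSIXct, one common display-only workaround is to nudge the value by half of the smallest unit you care about before formatting. The following is a minimal sketch, assuming millisecond precision is what you want to show; the names gap and shown are illustrative, and the stored value is still only an approximation:

```r
x <- "2017-01-26 23:00:00.010"
ct <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%OS", tz = "America/New_York")

# Spacing between neighbouring doubles at this magnitude (about 2.4e-07 seconds)
secs <- as.numeric(ct)
gap <- 2^(floor(log2(secs)) - 52)

# Add half a millisecond so that truncating to 3 decimals
# lands on the intended value when the time is formatted
shown <- format(ct + 0.0005, "%Y-%m-%d %H:%M:%OS3")
shown
# [1] "2017-01-26 23:00:00.010"
```

This only changes what is printed. Comparisons and arithmetic on ct still use the inexact double underneath.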
Because these issues are so troublesome, I recommend using the low level API of the clock package (note that I wrote this package). It has support for fractional seconds up to nanoseconds without loss of precision (by using a different data structure than POSIXct).
https://clock.r-lib.org/
library(clock)
x <- "2017-01-26 23:00:00.010"
nt <- naive_time_parse(x, format = "%Y-%m-%d %H:%M:%S", precision = "millisecond")
nt
#> <time_point<naive><millisecond>[1]>
#> [1] "2017-01-26 23:00:00.010"
# If you need it in a time zone
as_zoned_time(nt, zone = "America/New_York")
#> <zoned_time<millisecond><America/New_York>[1]>
#> [1] "2017-01-26 23:00:00.010-05:00"

R global date issue: all dates are converted to 1975 [duplicate]

This question already has answers here:
Convert four digit year values to class Date
(5 answers)
Closed 2 years ago.
I've encountered a strange issue in R recently:
I have noticed two related errors that I believe have to do with a global date setting in R, but I don't know what the issue is. First, when I use lubridate's "year" function, I get an origin error:
library(lubridate)
library(tidyverse)
year(2008)
Error in as.POSIXlt.numeric(x, tz = tz(x)) : 'origin' must be supplied
When I check origin
lubridate::origin
It says:
[1] "1970-01-01 UTC"
Which is what I was under the impression it is supposed to say. When I try to use as.Date, it says that everything is the year 1975:
as.Date(2008)
[1] "1975-07-02"
as.Date(1910)
[1] "1975-03-26"
However, if I use
ymd("2008-01-01") #it works fine:
[1] "2008-01-01"
I'm at a loss as to what to do - any advice? Thanks!
Well, using as.Date(2008) you pass not a string, but a number. Therefore it takes 2008 days from 1970-01-01, which is... 1975-07-02! :-) Check that with:
as.Date("1970-01-01") + 2008
#> [1] "1975-07-02"
Similarly 1975-03-26 is 1910 days after the origin, 1970-01-01:
as.numeric(as.Date("1975-03-26"))
#> [1] 1910
Note that passing only a year as a string takes today's day and month:
as.Date("2008", format = "%Y")
#> [1] "2008-07-14"
As for lubridate, function year() is for extracting a year from a date object, not to create a date from a string:
year(as.Date("1970-01-01"))
#> [1] 1970
If you want to convert a year as a string to a date, use e.g. parse_date_time():
library(lubridate)
parse_date_time("2008", orders = "%Y")
#> [1] "2008-01-01 UTC"
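If you would rather stay in base R, a minimal sketch is to build a full date string from the year first and only then convert. The choice of January 1st is an assumption here; use whatever day and month make sense for your data:

```r
yr <- 2008

# Build an unambiguous ISO date string, then convert it
d <- as.Date(paste0(yr, "-01-01"))
d
# [1] "2008-01-01"

# Going back from the Date to the numeric year
as.integer(format(d, "%Y"))
# [1] 2008
```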

Formatting Unconventional Date

I'm having trouble formatting a list of dates in R. The conventional methods of formatting in R such as as.Date or as.POSIXct don't seem to be working.
I have dates in the format: 1012015
using
as.POSIXct(as.character(data$Start_Date), format = "%m%d%Y")
does not give me an error, but my date returns
"0015-10-12" because the month is not a two digit number.
Is there a way to change this into the correct date format?
The lubridate package can help with this:
lubridate::mdy(1012015)
[1] "2015-01-01"
The format looks ambiguous but the OP gave two hints:
He is using format = "%m%d%Y" in his own attempt, and
he argues the issue is because the month is not a two digit number
This uses only base R. The %08d specifies a number to be formatted into 8 characters with 0 fill giving in this case "01012015".
as.POSIXct(sprintf("%08d", 1012015), format = "%m%d%Y")
## [1] "2015-01-01 EST"
Note that if you don't have any hours/minutes/seconds it would be less error prone to use "Date" class since then the possibility of subtle time zone errors is eliminated.
as.Date(sprintf("%08d", 1012015), format = "%m%d%Y")
## [1] "2015-01-01"
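The same idea works on a whole column at once. This is a small sketch with made-up values standing in for data$Start_Date, assuming every value is month-day-year with at most one leading zero dropped:

```r
# Hypothetical values in mdY form, stored as numbers
start_date <- c(1012015, 12312015, 7042016)

# Pad to 8 characters with leading zeros, then parse
padded <- sprintf("%08d", as.integer(start_date))
dates <- as.Date(padded, format = "%m%d%Y")
dates
# [1] "2015-01-01" "2015-12-31" "2016-07-04"
```

The as.integer() call is there because %08d expects whole numbers; if the column was read in as character, convert it the same way first.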

Number of seconds to date conversion [duplicate]

This question already has answers here:
Convert UNIX epoch to Date object
(2 answers)
Closed 5 years ago.
I have a data set where one of the columns is the sales date. I don't know why, but R converts it to numeric while performing any operation. I would like to convert it back to POSIXct date format in R. To do so, I am using the code below, but I am getting an unexpected result:
x= as.Date(1448208000, origin = "1970-01-01")
[1] "3967028-10-31"
x= as.POSIXct(x,"%Y-%m-%d")
I am not good with dates format in R and would appreciate any kind of help in this regard.
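1448208000 is the number of seconds since the Unix epoch, which is the numeric representation of a POSIXct object. as.Date() instead interprets a number as a count of days, which is why you end up in the year 3967028. To convert it back to POSIXct you want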
as.POSIXct(1448208000, origin = "1970-01-01")
You'll also probably want to ensure the timezone is correct too; see the difference between these two commands
as.POSIXct(1448208000, origin = "1970-01-01", tz = "UTC")
# [1] "2015-11-22 16:00:00 UTC"
as.POSIXct(1448208000, origin = "1970-01-01", tz = "Australia/Melbourne")
# [1] "2015-11-23 03:00:00 AEDT"
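If you only need the calendar date, two possible routes are sketched below. Note that the time zone decides which day you get, so the second call passes tz explicitly (the zone is just the one from the example above):

```r
secs <- 1448208000

# Whole days since the epoch give the UTC calendar date
d_utc <- as.Date(secs %/% 86400, origin = "1970-01-01")
d_utc
# [1] "2015-11-22"

# Going through POSIXct lets you pick the time zone for the date
d_mel <- as.Date(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"),
                 tz = "Australia/Melbourne")
d_mel
# [1] "2015-11-23"
```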

rounding times to the nearest hour in R [duplicate]

This question already has an answer here:
Round a POSIX date and time (posixct) to a date relative to a timezone
(1 answer)
Closed 9 years ago.
I have data in the format
time <- c("16:53", "10:57", "11:58")
etc
I would like to create a new column where each of these times is rounded to the nearest hour. I cannot seem to get the POSIX command to work for me.
as.character(format(data2$time, "%H:%M"))
Error in format.default(structure(as.character(x), names = names(x), dim = dim(x), :
invalid 'trim' argument
Let alone use the round command. Can anyone advise?
## Example times
x <- c("16:53", "10:57", "11:58")
## POSIX*t objects need both date and time specified
## Here, the particular date doesn't matter -- just that there is one.
tt <- strptime(paste("2001-01-01", x), format="%Y-%m-%d %H:%M")
## Use round() (dispatches to round.POSIXt) to round, then format() to format
format(round(tt, units="hours"), format="%H:%M")
# [1] "17:00" "11:00" "12:00"
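The error in the question most likely comes from calling format() on a character column: format.default() is used, and "%H:%M" ends up being matched to its trim argument. A minimal sketch of adding the rounded times as a new column, assuming a data frame called data2 with a character column time as in the question (the column name rounded is made up):

```r
data2 <- data.frame(time = c("16:53", "10:57", "11:58"),
                    stringsAsFactors = FALSE)

# Attach a dummy date so the strings can be parsed as date-times;
# a fixed time zone avoids any daylight saving surprises
tt <- strptime(paste("2001-01-01", data2$time),
               format = "%Y-%m-%d %H:%M", tz = "UTC")

# Round to the nearest hour and keep only the clock time
data2$rounded <- format(round(tt, units = "hours"), format = "%H:%M")
data2$rounded
# [1] "17:00" "11:00" "12:00"
```

Times from 23:30 onwards round to "00:00" of the following day, which is invisible once the date part is dropped.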

Resources