POSIXct not accepting custom origin - r

I feel like I'm missing something obvious here. I'm importing some data which is stored as HH:MM:SS. I'm trying to convert this to POSIXct and manually specify the origin as the date the data was collected.
datIn$TimeComplete <- as.POSIXct(datIn$Time, format="%H:%M:%S", origin="2000-01-01", tz="CET")
The output of this registers the HH:MM:SS correctly but says the day is 2019-03-05 (today) and I can't seem to convince it to do anything different.

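You are misunderstanding the concept of origin. The origin is only used when converting numbers to date-times: the number is taken as a count of seconds, and those seconds are added to the origin to give the datetime object. For character input, as in your case, origin is ignored. The date comes from the string itself, and if the format has no date part, strptime fills in the current date. For example,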
as.POSIXct(60, tz = "GMT", origin = '2015-03-05')
#[1] "2015-03-05 00:01:00 GMT"
as.POSIXct(3600, tz = "GMT", origin = '2015-07-05')
#[1] "2015-07-05 01:00:00 GMT"
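The same mechanism can be applied to HH:MM:SS data if the strings are first turned into seconds since midnight. Below is a minimal sketch with a hypothetical times vector. Note that as.POSIXct() interprets the origin in GMT, so the sketch uses tz = "GMT". With tz = "CET" the printed clock times would be shifted by the CET offset.

```r
# Hypothetical HH:MM:SS strings standing in for datIn$Time
times <- c("11:03:15", "00:00:00", "23:59:59")

# Split "HH:MM:SS" into a 3-column numeric matrix (hours, minutes, seconds)
hms <- matrix(as.numeric(unlist(strsplit(times, ":", fixed = TRUE))),
              ncol = 3, byrow = TRUE)

# Seconds since midnight
secs <- hms[, 1] * 3600 + hms[, 2] * 60 + hms[, 3]

# Numeric input, so origin IS used: each value is added to 2000-01-01 00:00:00 GMT
stamps <- as.POSIXct(secs, origin = "2000-01-01", tz = "GMT")
format(stamps, "%Y-%m-%d %H:%M:%S")
# [1] "2000-01-01 11:03:15" "2000-01-01 00:00:00" "2000-01-01 23:59:59"
```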
What you are trying to do can easily be achieved by pasting the desired date onto your times and converting the result to a datetime (no origin is needed), i.e.
as.POSIXct(paste0('2000-01-01 ', '11:03:15'), format = "%Y-%m-%d %H:%M:%S", tz = "CET")
#[1] "2000-01-01 11:03:15 CET"
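For a whole column, paste() is vectorised, so the same call handles every row at once. The sketch below assumes a data frame shaped like datIn, with a character Time column and a single collection date. Both the data and the collection_date name are hypothetical.

```r
# Hypothetical stand-in for the imported data
datIn <- data.frame(Time = c("11:03:15", "14:30:00", "00:00:00"),
                    stringsAsFactors = FALSE)
collection_date <- "2000-01-01"  # assumed date the data were collected

# Glue the date onto every time, then parse date and time together
datIn$TimeComplete <- as.POSIXct(paste(collection_date, datIn$Time),
                                 format = "%Y-%m-%d %H:%M:%S", tz = "CET")
format(datIn$TimeComplete, "%Y-%m-%d %H:%M:%S")
# [1] "2000-01-01 11:03:15" "2000-01-01 14:30:00" "2000-01-01 00:00:00"
```

If the recording runs past midnight, a single collection date is not enough and the date has to be supplied per row.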


Including seconds when using strptime with examples such as 10-10-2010 00:00:00 [duplicate]

This question already has answers here: How can I keep midnight (00:00h) using strptime() in R?
I have had a good hunt around and I'm sure this must have been answered before, but I can't seem to find any help!
I have a series of times in a data frame, some of which have a timestamp like this:
Date <- '2018-10-10'
Time <- '00:00:00'
When I use the strptime function, it returns only the date and drops the 00:00:00; see below:
datetime <- strptime(paste(Date,Time),
format = "%Y-%m-%d %H:%M:%S",
tz = 'GMT')
> datetime
[1] "2018-10-10 GMT"
If, for example, it was Time <- '00:00:01', it would return
> datetime
[1] "2018-10-10 00:00:01 GMT"
Does anyone know a way of ensuring that the 00:00:00 instances are displayed? Desired output:
"2018-10-10 00:00:00 GMT"
Many thanks!!
Jim
When you type datetime and hit <Enter>, R uses the appropriate print method to display datetime. Just because datetime prints as "2018-10-10 GMT" doesn't mean that it has forgotten about the time: the default format simply omits the time part when every element is exactly midnight.
To ensure a consistent format for your POSIXlt object, you could use format:
format(datetime, "%Y-%m-%d %H:%M:%S", usetz = TRUE)
#[1] "2018-10-10 00:00:00 GMT"
Similarly for the second case:
Date <- '2018-10-10'
Time <- '00:00:01'
datetime <- strptime(paste(Date,Time), format = "%Y-%m-%d %H:%M:%S", tz = 'GMT')
format(datetime, "%Y-%m-%d %H:%M:%S", usetz = TRUE)
#[1] "2018-10-10 00:00:01 GMT"
Sample data
Date <- '2018-10-10'
Time <- '00:00:00'
datetime <- strptime(paste(Date,Time), format = "%Y-%m-%d %H:%M:%S", tz = 'GMT')
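The print rule can be checked directly. When every element of a date-time vector is exactly midnight, the default format drops the time. If any element has a non-midnight time, all elements are printed with %H:%M:%S. A small sketch, where the vector x is illustrative:

```r
x <- as.POSIXct(c("2018-10-10 00:00:00", "2018-10-10 00:00:01"), tz = "GMT")

format(x[1])  # only midnight: time part dropped
# [1] "2018-10-10"
format(x)     # one element is not midnight: full time shown for both
# [1] "2018-10-10 00:00:00" "2018-10-10 00:00:01"

# The time is still stored; only the display changes
lt <- as.POSIXlt(x[1])
c(lt$hour, lt$min, lt$sec)
# [1] 0 0 0
```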

Is it possible to use something like `tz=NULL`?... `as.POSIXct` defaults to locale-dependent timezone (unlike `as.Date`), which causes issues

I know this is a long-standing, deeply embedded issue, but it's something I come up against so regularly, and that I see beginners to R struggle with so regularly, that I'd love to have a satisfactory solution. My Google and SO searches have come up empty so far, but please point me in the right direction if this is duplicated elsewhere.
TL;DR: Is there a way to use something like the POSIXct class without a timezone? I generally use tz="UTC" regardless of the actual timezone of the dataset, but it's a messy hack IMO, and I don't particularly like it. What I want is something like tz=NULL, which would behave the same way as UTC, but without actually adding "UTC" as a tzone attribute.
The problem
I'll start with an example (there are plenty) of typical timezone issues. Creating an object with POSIXct values:
df <- data.frame( timestamp = as.POSIXct( c( "2018-01-01 03:00:00",
"2018-01-01 12:00:00" ) ),
a = 1:2 )
df
# timestamp a
# 1 2018-01-01 03:00:00 1
# 2 2018-01-01 12:00:00 2
That's all fine, but then I try to convert the timestamps to dates:
df$date <- as.Date( df$timestamp )
df
# timestamp a date
# 1 2018-01-01 03:00:00 1 2017-12-31
# 2 2018-01-01 12:00:00 2 2018-01-01
The dates have converted incorrectly, because my computer locale is in Australian Eastern Time, meaning that the numeric values of the timestamps have been shifted by the offset relevant to my locale (in this case -11hrs). We can see this by forcing the timezone to UTC, then comparing the values before and after:
df$timestamp[1]
# [1] "2018-01-01 03:00:00 AEDT"
x <- lubridate::force_tz( df$timestamp[1], "UTC" ); x
# [1] "2018-01-01 03:00:00 UTC"
difftime( df$timestamp[1], x )
# Time difference of -11 hours
That's just one example of the issues caused by timezones. There are others, but I won't go into them here.
My hack-y solution
I don't want that behaviour, so I need to convince as.POSIXct not to mess with my timestamps. I generally do this by using tz="UTC", which works fine, except that I'm adding information to the data that isn't real. These times are NOT in UTC, I'm just saying that to avoid time-shift issues. It's a hack, and any time I give my data to someone else, they could be forgiven for thinking that the timestamps are in UTC when they're not. To avoid this, I generally add the actual timezone to the object/column name, and hope that anyone I pass my data on to will understand why someone would label an object with a timezone different to the one in the object itself:
df <- data.frame( timestamp.AET = as.POSIXct( c( "2018-01-01 03:00:00",
"2018-01-01 12:00:00" ),
tz = "UTC" ),
a = 1:2 )
df$date <- as.Date( df$timestamp.AET )
df
# timestamp.AET a date
# 1 2018-01-01 03:00:00 1 2018-01-01
# 2 2018-01-01 12:00:00 2 2018-01-01
What I'm hoping for
What I really want is a way to use POSIXct without having to specify a timezone. I don't want the times messed with in any way. Do everything as though the values were in UTC, and leave any timezone details like offsets, daylight savings, etc to the user. Just don't pretend they actually ARE in UTC. Here's my ideal:
x <- as.POSIXct( "2018-01-01 03:00:00" ); x
# [1] "2018-01-01 03:00:00"
attr( x, "tzone" )
# [1] NULL
shifted <- lubridate::force_tz( x, "UTC" )
shifted == x
# [1] TRUE
as.numeric( shifted ) == as.numeric( x )
# [1] TRUE
as.Date( x )
# [1] "2018-01-01"
So there's no timezone attribute on the object at all. The date conversion works as one would expect from the printed value. If there are daylight savings time-shifts, or any other locale-specific issues, the user (me or someone else) needs to deal with that themselves.
I believe something similar to this is possible in POSIXlt, but I really don't want to shift to that. chron or another timeseries-oriented package might be another solution, but I think POSIXct is more widely used and accepted, and this seems like something that should be possible within base::. A POSIXct object with tz="UTC" is exactly what I need, I just don't want to have to lie about timezones in order to get it to behave the way I want (and I believe most beginners to R expect).
So what do others do here? Is there an easy way to use POSIXct without a timezone that I've missed? Is there a better work-around than tz="UTC"? Is that what others are doing?
Having (re-)read your post and the ensuing comments, I see your point.
To summarise:
as.POSIXct takes tz from your system, whereas as.Date has a default of tz = "UTC" for class POSIXct. So unless your system is in UTC, dates may change. The solution is to pass tz to as.Date, or to change the behaviour of as.Date.POSIXct (see update below).
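The effect can be reproduced independently of the machine's own time zone by fixing the zone explicitly. The sketch below uses Australia/Sydney, which was UTC+11 (daylight saving) on 1 January 2018:

```r
# 03:00 in Sydney on 2018-01-01 is 16:00 UTC on 2017-12-31
x <- as.POSIXct("2018-01-01 03:00:00", tz = "Australia/Sydney")

as.Date(x, tz = "UTC")             # calendar date as seen in UTC
# [1] "2017-12-31"
as.Date(x, tz = attr(x, "tzone"))  # calendar date in the object's own zone
# [1] "2018-01-01"
```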
Case 1
If you don't specify an explicit tz with as.POSIXct, you can simply specify tz = "" with as.Date to use the system timezone:
df <- data.frame(
timestamp = as.POSIXct(c("2018-01-01 03:00:00", "2018-01-01 12:00:00")),
a = 1:2)
df$date <- as.Date(df$timestamp, tz = "")
df
# timestamp a date
#1 2018-01-01 03:00:00 1 2018-01-01
#2 2018-01-01 12:00:00 2 2018-01-01
Case 2
If you do set an explicit tz with as.POSIXct, you can extract tz from the POSIXct object, and pass it on to as.Date
df <- data.frame(
timestamp = as.POSIXct(c("2018-01-01 03:00:00", "2018-01-01 12:00:00"), tz = "UTC"),
a = 1:2)
tz <- attr(df$timestamp, "tzone")
tz
#[1] "UTC"
df$date <- as.Date(df$timestamp, tz = tz)
df
# timestamp a date
#1 2018-01-01 03:00:00 1 2018-01-01
#2 2018-01-01 12:00:00 2 2018-01-01
Update
There is a related discussion on the GitHub site of Dirk Eddelbuettel's anytime project. The discussion turns out to be somewhat circular, so I'm afraid it does not offer much in terms of understanding why as.Date.POSIXct does not inherit tz from POSIXct. I would probably call this a base R idiosyncrasy (or as Dirk calls it: "[T]hese are known quirks in Base R").
As for a solution: I would change the behaviour of as.Date.POSIXct rather than the default behaviour of as.POSIXct.
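We could simply redefine as.Date.POSIXct to inherit tz from the POSIXct object, falling back to the system timezone when there is no tzone attribute (e.g. for Sys.time()), and still accepting an explicit tz:
as.Date.POSIXct <- function(x, tz = attr(x, "tzone"), ...) {
  # default to the object's own timezone; "" means the system timezone
  if (is.null(tz)) tz <- ""
  as.Date(as.POSIXlt(x, tz = tz[1L]), ...)
}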
Then you get consistent results for your sample case:
df <- data.frame(
timestamp = as.POSIXct(c("2018-01-01 03:00:00", "2018-01-01 12:00:00")),
a = 1:2)
df$date <- as.Date(df$timestamp)
df
#timestamp a date
#1 2018-01-01 03:00:00 1 2018-01-01
#2 2018-01-01 12:00:00 2 2018-01-01
You basically want a different default for as.POSIXct than the one provided. You don't really want to modify anything except as.POSIXct.default, which is the function that eventually handles character values. It wouldn't make much sense to modify as.POSIXct.numeric, since a number is always an offset from UTC; there, the tz argument only determines how format.POSIXct displays the value. So you can modify the formals of the existing as.POSIXct.default. Put this in your .Rprofile:
formals(as.POSIXct.default) <- alist(x=, ...=, tz="UTC")
Then it passes your tests, except that the tzone attribute is "UTC" rather than NULL:
> x <- as.POSIXct( "2018-01-01 03:00:00" ); x
[1] "2018-01-01 03:00:00 UTC"
> attr( x, "tzone" )
[1] "UTC"
> shifted <- lubridate::force_tz( x, "UTC" )
> shifted == x
[1] TRUE
> as.numeric( shifted ) == as.numeric( x )
[1] TRUE
> as.Date( x )
[1] "2018-01-01"
The alternative would be to define an entirely new class, but that would require much more extensive efforts.
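If you would rather not mask as.POSIXct.default for the whole session, the same formals trick can be applied to a copy stored under a new name. This is a minimal sketch; as_posixct_utc is a hypothetical name and not part of base R.

```r
# Copy base R's default method and change only its tz default
# (as_posixct_utc is a hypothetical helper name)
as_posixct_utc <- as.POSIXct.default
formals(as_posixct_utc)$tz <- "UTC"

x <- as_posixct_utc("2018-01-01 03:00:00")
x
# [1] "2018-01-01 03:00:00 UTC"
as.Date(x)
# [1] "2018-01-01"

# as.POSIXct() itself is untouched and keeps its system-timezone default
```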
A further point concerns the specification of time zones. Because of daylight saving time, it can be less ambiguous to use the %z format (a numeric offset from UTC such as -0700) for output, and for input when possible:
dtm <- format( Sys.time(), format="%Y-%m-%d %H:%M:%S %z")
#output
format( Sys.time(), format="%Y-%m-%d %H:%M:%S %z")
[1] "2018-07-06 17:18:27 -0700"
#input and output without the formals change
as.POSIXct(dtm, format="%Y-%m-%d %H:%M:%S %z")
[1] "2018-07-06 17:21:41 PDT"
# after the formals change
as.POSIXct(dtm, format="%Y-%m-%d %H:%M:%S %z")
[1] "2018-07-07 00:21:41 UTC"
So when tz information is present as an offset, it can be handled correctly.
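Because %z fixes the offset inside the string itself, the parsed instant does not depend on the tz argument; tz only controls how the result is displayed. The sketch below uses the offset string from the output above. The America/Los_Angeles conversion assumes that zone is in your system's time-zone database.

```r
stamp <- "2018-07-06 17:21:41 -0700"
fmt <- "%Y-%m-%d %H:%M:%S %z"

in_utc <- as.POSIXct(stamp, format = fmt, tz = "UTC")
in_la  <- as.POSIXct(stamp, format = fmt, tz = "America/Los_Angeles")

in_utc == in_la   # same instant, different display zones
# [1] TRUE
format(in_utc, "%Y-%m-%d %H:%M:%S")
# [1] "2018-07-07 00:21:41"
format(in_la, "%Y-%m-%d %H:%M:%S")
# [1] "2018-07-06 17:21:41"
```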

R: converting UTC to ISO 8601. Why is an hour added?

So I'm looking to convert some UTC times to ISO 8601 format in R. For example:
library("lubridate")
x <- "2010-04-14-01-00-00-UTC"
datetime <- lubridate::ymd_hms(x)
datetime
[1] "2010-04-14 01:00:00 UTC"
strftime(datetime, "%Y-%m-%dT%H:%M:%SZ")
[1] "2010-04-14T02:00:00Z"
However, in ISO 8601 "Z" indicates UTC, so I would have expected "2010-04-14T01:00:00Z", but an hour has been added to the datetime. Why? Am I misunderstanding something?
What is the correct way in R to convert between the two? And to convert backwards?
From the documentation:
strftime(x, format = "", tz = "", usetz = FALSE, ...)
[...]
tz A character string specifying the time zone to be used for the
conversion. System-specific (see as.POSIXlt), but "" is the current
time zone, and "GMT" is UTC. Invalid values are most commonly treated
as UTC, on some platforms with a warning.
So, you need to specify the correct timezone:
strftime(datetime, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
#[1] "2010-04-14T01:00:00Z"
Otherwise it takes the timezone from your system's locale settings:
strftime(datetime, "%Y-%m-%dT%H:%M:%S", usetz = TRUE)
#[1] "2010-04-14T03:00:00 CEST"
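The question also asks how to convert back. The same format string works for parsing, as long as tz = "UTC" is given. strptime() treats the trailing Z as a literal character rather than a zone designator, so the time zone must be supplied explicitly. A round-trip sketch in base R:

```r
x <- as.POSIXct("2010-04-14 01:00:00", tz = "UTC")
iso_fmt <- "%Y-%m-%dT%H:%M:%SZ"

# POSIXct -> ISO 8601 string; tz = "UTC" makes the literal "Z" truthful
iso <- format(x, iso_fmt, tz = "UTC")
iso
# [1] "2010-04-14T01:00:00Z"

# ISO 8601 string -> POSIXct; "Z" is only matched literally, so tz must be "UTC"
back <- as.POSIXct(iso, format = iso_fmt, tz = "UTC")
back == x
# [1] TRUE
```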

Convert DD/MM/YYYY HH:MM into date

I need to take a date-time read in as a character string from a CSV and convert it to POSIXct.
I can do this successfully with dates alone, but have been unable to do it for a character string that combines a date and a time.
time <-('01/08/2014 16:25')
as.POSIXct(time, origin = "03/01/1950 00:00", tz = "GMT")
[1] "0001-08-20 GMT"
class(time2)
[1] "POSIXlt" "POSIXt"
Any pointers would be appreciated!
time <- "01/08/2014 16:25:00"
time2 <- strptime(time, "%d/%m/%Y %H:%M:%S", tz = "GMT")
time2
[1] "2014-08-01 16:25:00 GMT"
I was unaware that %Y has to be upper case for a four-digit year!
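If a POSIXct is wanted rather than the POSIXlt returned by strptime(), as.POSIXct() accepts the same kind of format string directly. Origin plays no part here because the input is character, not numeric. The sketch uses the question's original string, which has no seconds, so the format ends at %M:

```r
time <- "01/08/2014 16:25"

# %d/%m/%Y: day first, four-digit year; %H:%M: hours and minutes
time_ct <- as.POSIXct(time, format = "%d/%m/%Y %H:%M", tz = "GMT")
time_ct
# [1] "2014-08-01 16:25:00 GMT"
class(time_ct)
# [1] "POSIXct" "POSIXt"
```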

How to convert UTC timestamp to Australian time

I have a large amount of data with timestamps in the following format: 2013-11-14T23:52:29Z.
My research indicates that the timezone is UTC (denoted by a "Z" suffix).
I need to convert it to +1100 UTC (which is Australia/Sydney time), also known as "EDT" (or Eastern Daylight Time).
I have tried the following:
test_timestamp <- "2013-11-14T23:52:29Z"
as.POSIXct(test_timestamp,"Australia/Sydney")
This produces the output "2013-11-14 EST"
This does not pass a sanity test as it should roll the date over into the next calendar day (i.e. 2013-11-15 EST).
I have wasted many hours on this seemingly trivial task, so any help is greatly appreciated.
Try this, with a full format specified (see ?strptime):
format(
as.POSIXct(test_timestamp,format="%Y-%m-%dT%H:%M:%SZ",tz="UTC"),
tz="Australia/Sydney"
)
#[1] "2013-11-15 10:52:29"
Compare your attempt (essentially):
format(as.POSIXct(test_timestamp,tz="Australia/Sydney"),tz="Australia/Sydney")
#[1] "2013-11-14"
Alternatively, you can leave the data untouched and change only how it is displayed, by setting the tzone attribute:
result <- as.POSIXct(test_timestamp,format="%Y-%m-%dT%H:%M:%SZ",tz="UTC")
result
#[1] "2013-11-14 23:52:29 UTC"
#dput(result)
#structure(1384473149, class = c("POSIXct","POSIXt"), tzone = "UTC")
attr(result,"tzone") <- "Australia/Sydney"
#dput(result)
#structure(1384473149, class = c("POSIXct","POSIXt"), tzone = "Australia/Sydney")
result
#[1] "2013-11-15 10:52:29 EST"
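Using the named zone Australia/Sydney instead of a fixed +1100 offset also takes care of daylight saving: Sydney is UTC+11 in November but UTC+10 in June. The sketch below adds a second, hypothetical June timestamp for contrast.

```r
stamps <- c("2013-11-14T23:52:29Z", "2013-06-14T23:52:29Z")

# Parse as UTC (the "Z" is matched literally, so tz = "UTC" is required)
utc <- as.POSIXct(stamps, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Local clock time and calendar date in Sydney
format(utc, "%Y-%m-%d %H:%M:%S", tz = "Australia/Sydney")
# [1] "2013-11-15 10:52:29" "2013-06-15 09:52:29"
as.Date(utc, tz = "Australia/Sydney")
# [1] "2013-11-15" "2013-06-15"
```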
