I have a large amount of data with timestamps in the following format: 2013-11-14T23:52:29Z.
My research indicates that the timezone is UTC (denoted by a "Z" suffix).
I need to convert it to UTC+11:00 (which is Australia/Sydney time during daylight saving), also known as "EDT" (Eastern Daylight Time).
I have tried the following:
test_timestamp <- "2013-11-14T23:52:29Z"
as.POSIXct(test_timestamp,"Australia/Sydney")
This produces the output "2013-11-14 EST"
This does not pass a sanity test as it should roll the date over into the next calendar day (i.e. 2013-11-15 EST).
I have wasted many hours on this seemingly trivial task, so any help is greatly appreciated.
Try this, with a full format specified (see ?strptime):
format(
as.POSIXct(test_timestamp,format="%Y-%m-%dT%H:%M:%SZ",tz="UTC"),
tz="Australia/Sydney"
)
#[1] "2013-11-15 10:52:29"
Compare your attempt (essentially):
format(as.POSIXct(test_timestamp,tz="Australia/Sydney"),tz="Australia/Sydney")
#[1] "2013-11-14"
Also, this will work to non-destructively edit the data, only altering the output:
result <- as.POSIXct(test_timestamp,format="%Y-%m-%dT%H:%M:%SZ",tz="UTC")
result
#[1] "2013-11-14 23:52:29 UTC"
#dput(result)
#structure(1384473149, class = c("POSIXct","POSIXt"), tzone = "UTC")
attr(result,"tzone") <- "Australia/Sydney"
#dput(result)
#structure(1384473149, class = c("POSIXct","POSIXt"), tzone = "Australia/Sydney")
result
#[1] "2013-11-15 10:52:29 EST"
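For a whole column of such timestamps, the same two steps can be wrapped in a small function. This is only a sketch: utc_to_zone is a made-up helper name, and the second timestamp is an invented example added to show that the offset follows Sydney's daylight saving rules.

```r
# Hypothetical helper (not part of base R): parse ISO 8601 "Z" timestamps
# as UTC, then change only the display time zone.
utc_to_zone <- function(x, zone) {
  parsed <- as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  attr(parsed, "tzone") <- zone  # underlying seconds are unchanged
  parsed
}

stamps <- c("2013-11-14T23:52:29Z", "2013-06-14T23:52:29Z")
sydney <- utc_to_zone(stamps, "Australia/Sydney")
format(sydney, "%Y-%m-%d %H:%M:%S %z")
#[1] "2013-11-15 10:52:29 +1100" "2013-06-15 09:52:29 +1000"
```

Printing with %z instead of the zone abbreviation avoids the ambiguity of "EST", which Australian systems have used for both standard and daylight time.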
I have the following array of POSIXct dates
>x
[1] "2003-12-01 UTC" "2003-12-02 UTC" "2003-12-03 UTC" "2003-12-04 UTC" "2003-12-05 UTC" "2003-12-08 UTC"
[7] "2003-12-09 UTC" "2003-12-10 UTC" "2003-12-11 UTC" "2003-12-12 UTC"
whose structure is:
str(x)
POSIXct[1:10], format: "2003-12-01" "2003-12-02" "2003-12-03" "2003-12-04" "2003-12-05" "2003-12-08" "2003-12-09 ..."
Anyway, when I use dput, I obtain:
structure(c(1070236800, 1070323200, 1070409600, 1070496000, 1070582400,
1070841600, 1070928000, 1071014400, 1071100800, 1071187200), class = c("POSIXct",
"POSIXt"), tzone = "UTC")
POSIXct is stored as a numeric value representing the number of seconds since midnight on 1st January 1970 UTC. Note that if we write the same structure manually with the numeric value set to 0, we get:
structure(0, class = c("POSIXct", "POSIXt"), tzone = "UTC")
#> [1] "1970-01-01 UTC"
We can confirm that POSIXct is stored as a double precision floating point number
x <- Sys.time()
x
#> [1] "2022-11-08 11:33:36 GMT"
class(x)
#> [1] "POSIXct" "POSIXt"
typeof(x)
#> [1] "double"
The reason why it is stored as a number is because we need to be able to work on date-times arithmetically. If we subtract numbers from a POSIXct object we are subtracting seconds:
x - 3600
#> [1] "2022-11-08 10:33:36 GMT"
If it were stored as a character string, then any time we wanted to perform calculations on date-times or plot them, we would have to parse the character strings into a numerical value, do the calculations, then rewrite the character strings. This is obviously much less efficient than having an underlying numerical representation that uses a special print method to represent the number as a date-time.
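As a rough illustration of that arithmetic, the sketch below uses a fixed timestamp in UTC (an arbitrary value chosen so the results are reproducible) and shows differences and sequences being computed directly on the numeric representation:

```r
start <- as.POSIXct("2022-11-08 11:33:36", tz = "UTC")
end   <- start + 2 * 3600                 # adding seconds: two hours later

# difftime() subtracts the underlying numbers and attaches units
elapsed <- difftime(end, start, units = "mins")
as.numeric(elapsed)
#> [1] 120

# seq() can step through date-times without any string parsing
hourly <- seq(start, end, by = "hour")
length(hourly)
#> [1] 3
```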
POSIXct stores a date-time as a number, together with an associated time zone. The number you are seeing, i.e. 1070236800, is the number of seconds since 1 January 1970 (UTC). You'll notice that a date before this is negative. For example,
date <- as.POSIXct("1969-12-31",tz="UTC",format="%Y-%m-%d")
dput(date)
Gives
structure(-86400, class = c("POSIXct", "POSIXt"), tzone = "UTC")
Since 1969 is before 1970, the number of seconds is negative. The value is 86400 because I selected 1 day before 1 January 1970, and there are 86400 seconds in a day.
So you'll notice that if I take the seconds from your first element and convert them, I get the date you initially had:
as.POSIXct(1070236800, origin = "1970-01-01", tz = "UTC")
Yields
[1] "2003-12-01 UTC"
Storing it this way speeds up computation, processing, and conversion to other formats.
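To make the round trip concrete, here is a small sketch using four of the values from the dput output in the question. The names secs and dates are just illustrative:

```r
# Four of the stored values from the question's dput() output
secs  <- c(1070236800, 1070323200, 1070582400, 1070841600)
dates <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(dates, "%Y-%m-%d")
#> [1] "2003-12-01" "2003-12-02" "2003-12-05" "2003-12-08"

# Stripping the class gives back exactly the numbers we started with
identical(as.numeric(dates), secs)
#> [1] TRUE

# Gaps between consecutive elements, in days (86400 seconds per day)
diff(secs) / 86400
#> [1] 1 3 3
```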
I'm trying to get the current date as a POSIXct object. I have tried the following:
as.POSIXct(Sys.Date(), format = "%m/%d/%y", tz = "EST")
and got
[1] "2021-02-12 19:00:00 EST"
and I wish to only get the date without the time but in POSIXct class. For instance:
[1] "2021-02-12"
Convert the Date class object to character first:
as.POSIXct(format(Sys.Date()))
## [1] "2021-02-13 EST"
Even shorter is:
trunc(Sys.time(), "day")
## [1] "2021-02-13 EST"
Note:
POSIXct objects are stored internally as seconds since the Epoch and not as separate date and time, so they always have times; however, if the time is midnight, as it is here, then it is not displayed when printed using the default formatting.
If you only need the date, it is normally better to use the Date class, since using the POSIXct class can result in subtle time zone errors if you are not careful, and there is typically no reason to expose yourself to that risk if you don't need to.
If you change the session's time zone, the time will be displayed, because midnight in one time zone is not midnight in other time zones.
x <- as.POSIXct(format(Sys.Date()))
x
## [1] "2021-02-13 EST"
# change time zone (the environment variable is TZ, in upper case)
Sys.setenv(TZ = "GMT")
x
## [1] "2021-02-13 05:00:00 GMT"
# change back
Sys.setenv(TZ = "")
x
## [1] "2021-02-13 EST"
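The 19:00:00 in the question comes from converting a Date directly: the result is midnight UTC, which is 19:00 the previous evening in EST. A minimal sketch with a fixed date and an explicit tz (so it does not depend on the session's time zone) shows the difference between the two conversions. The variable names are illustrative only:

```r
d <- as.Date("2021-02-13")

# Converting a Date directly gives midnight UTC on that date
from_date <- as.POSIXct(d)
format(from_date, "%Y-%m-%d %H:%M:%S", tz = "EST")
## [1] "2021-02-12 19:00:00"

# Going through character gives midnight in the requested time zone
from_text <- as.POSIXct(format(d), tz = "EST")
format(from_text, "%Y-%m-%d %H:%M:%S")
## [1] "2021-02-13 00:00:00"

# The two instants are 5 hours (18000 seconds) apart
as.numeric(from_text) - as.numeric(from_date)
## [1] 18000
```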
I feel like I'm missing something obvious here. I'm importing some data which is stored as HH:MM:SS. I'm trying to convert this to POSIXct and manually specify the origin as the date the data was collected.
datIn$TimeComplete <- as.POSIXct(datIn$Time, format="%H:%M:%S", origin="2000-01-01", tz="CET")
The output of this registers the HH:MM:SS correctly but says the day is 2019-03-05 (today) and I can't seem to convince it to do anything different.
You are misunderstanding the concept of origin. Origin is there to help convert numbers to dates. Those numbers represent seconds so you need the origin in order to add those seconds to the origin and get the datetime object. For example,
as.POSIXct(60, tz = "GMT", origin = '2015-03-05')
#[1] "2015-03-05 00:01:00 GMT"
as.POSIXct(3600, tz = "GMT", origin = '2015-07-05')
#[1] "2015-07-05 01:00:00 GMT"
What you are trying to do can be easily achieved by pasting the desired date to your times and converting to datetime, i.e.
as.POSIXct(paste0('2000-01-01 ', '11:03:15'), format = "%Y-%m-%d %H:%M:%S", tz = "CET")
#[1] "2000-01-01 11:03:15 CET"
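The same idea applies to a whole column. The sketch below assumes the times are a character vector and that every row shares one collection date; times and collected_on are made-up names standing in for datIn$Time and the real date:

```r
times        <- c("11:03:15", "23:59:59")   # stand-in for datIn$Time
collected_on <- "2000-01-01"                # the date the data was collected

# paste() is vectorised, so the date is prepended to every time
stamps <- as.POSIXct(paste(collected_on, times),
                     format = "%Y-%m-%d %H:%M:%S", tz = "CET")
format(stamps, "%Y-%m-%d %H:%M:%S")
#[1] "2000-01-01 11:03:15" "2000-01-01 23:59:59"
```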
I know this is a long-standing, deeply embedded issue, but it's something I come up against so regularly, and that I see beginners to R struggle with so regularly, that I'd love to have a satisfactory solution. My google and SO searches have come up empty so far, but please point me in the right direction if this is duplicated elsewhere.
TL;DR: Is there a way to use something like the POSIXct class without a timezone? I generally use tz="UTC" regardless of the actual timezone of the dataset, but it's a messy hack IMO, and I don't particularly like it. What I want is something like tz=NULL, which would behave the same way as UTC, but without actually adding "UTC" as a tzone attribute.
The problem
I'll start with an example (there are plenty) of typical timezone issues. Creating an object with POSIXct values:
df <- data.frame( timestamp = as.POSIXct( c( "2018-01-01 03:00:00",
"2018-01-01 12:00:00" ) ),
a = 1:2 )
df
# timestamp a
# 1 2018-01-01 03:00:00 1
# 2 2018-01-01 12:00:00 2
That's all fine, but then I try to convert the timestamps to dates:
df$date <- as.Date( df$timestamp )
df
# timestamp a date
# 1 2018-01-01 03:00:00 1 2017-12-31
# 2 2018-01-01 12:00:00 2 2018-01-01
The dates have converted incorrectly, because my computer locale is in Australian Eastern Time, meaning that the numeric values of the timestamps have been shifted by the offset relevant to my locale (in this case -11hrs). We can see this by forcing the timezone to UTC, then comparing the values before and after:
df$timestamp[1]
# [1] "2018-01-01 03:00:00 AEDT"
x <- lubridate::force_tz( df$timestamp[1], "UTC" ); x
# [1] "2018-01-01 03:00:00 UTC"
difftime( df$timestamp[1], x )
# Time difference of -11 hours
That's just one example of the issues caused by timezones. There are others, but I won't go into them here.
My hack-y solution
I don't want that behaviour, so I need to convince as.POSIXct not to mess with my timestamps. I generally do this by using tz="UTC", which works fine, except that I'm adding information to the data that isn't real. These times are NOT in UTC, I'm just saying that to avoid time-shift issues. It's a hack, and any time I give my data to someone else, they could be forgiven for thinking that the timestamps are in UTC when they're not. To avoid this, I generally add the actual timezone to the object/column name, and hope that anyone I pass my data on to will understand why someone would label an object with a timezone different to the one in the object itself:
df <- data.frame( timestamp.AET = as.POSIXct( c( "2018-01-01 03:00:00",
"2018-01-01 12:00:00" ),
tz = "UTC" ),
a = 1:2 )
df$date <- as.Date( df$timestamp.AET )
df
# timestamp.AET a date
# 1 2018-01-01 03:00:00 1 2018-01-01
# 2 2018-01-01 12:00:00 2 2018-01-01
What I'm hoping for
What I really want is a way to use POSIXct without having to specify a timezone. I don't want the times messed with in any way. Do everything as though the values were in UTC, and leave any timezone details like offsets, daylight savings, etc to the user. Just don't pretend they actually ARE in UTC. Here's my ideal:
x <- as.POSIXct( "2018-01-01 03:00:00" ); x
# [1] "2018-01-01 03:00:00"
attr( x, "tzone" )
# [1] NULL
shifted <- lubridate::force_tz( x, "UTC" )
shifted == x
# [1] TRUE
as.numeric( shifted ) == as.numeric( x )
# [1] TRUE
as.Date( x )
# [1] "2018-01-01"
So there's no timezone attribute on the object at all. The date conversion works as one would expect from the printed value. If there are daylight savings time-shifts, or any other locale-specific issues, the user (me or someone else) needs to deal with that themselves.
I believe something similar to this is possible in POSIXlt, but I really don't want to shift to that. chron or another timeseries-oriented package might be another solution, but I think POSIXct is more widely used and accepted, and this seems like something that should be possible within base::. A POSIXct object with tz="UTC" is exactly what I need, I just don't want to have to lie about timezones in order to get it to behave the way I want (and I believe most beginners to R expect).
So what do others do here? Is there an easy way to use POSIXct without a timezone that I've missed? Is there a better work-around than tz="UTC"? Is that what others are doing?
Having (re-)read your post and the ensuing comments, I see your point.
To summarise:
as.POSIXct determines tz from your system. as.Date has default tz = "UTC" for class POSIXct. So unless you're in tz = "UTC", dates may change; the solution is to use tz with Date, or to change the behaviour of as.Date.POSIXct (see update below).
Case 1
If you don't specify an explicit tz with as.POSIXct, you can simply specify tz = "" with as.Date to enforce a system-specific timezone.
df <- data.frame(
timestamp = as.POSIXct(c("2018-01-01 03:00:00", "2018-01-01 12:00:00")),
a = 1:2)
df$date <- as.Date(df$timestamp, tz = "")
df
# timestamp a date
#1 2018-01-01 03:00:00 1 2018-01-01
#2 2018-01-01 12:00:00 2 2018-01-01
Case 2
If you do set an explicit tz with as.POSIXct, you can extract tz from the POSIXct object, and pass it on to as.Date
df <- data.frame(
timestamp = as.POSIXct(c("2018-01-01 03:00:00", "2018-01-01 12:00:00"), tz = "UTC"),
a = 1:2)
tz <- attr(df$timestamp, "tzone")
tz
#[1] "UTC"
df$date <- as.Date(df$timestamp, tz = tz)
df
# timestamp a date
#1 2018-01-01 03:00:00 1 2018-01-01
#2 2018-01-01 12:00:00 2 2018-01-01
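Both cases depend on the machine's time zone, so here is a sketch that reproduces the asker's off-by-one date on any system by setting the time zone explicitly (Australia/Sydney is assumed here, matching the question):

```r
ts <- as.POSIXct("2018-01-01 03:00:00", tz = "Australia/Sydney")

# as.Date.POSIXct defaults to tz = "UTC": 03:00 AEDT is 16:00 UTC the day before
as.Date(ts)
#[1] "2017-12-31"

# Passing the object's own time zone gives the date as printed
as.Date(ts, tz = attr(ts, "tzone"))
#[1] "2018-01-01"
```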
Update
There exists a related discussion on Dirk Eddelbuettel's anytime GitHub project site. The discussion turns out somewhat circular, so I'm afraid it does not offer too much in terms of understanding why as.Date.POSIXct does not inherit tz from POSIXct. I would probably call this a base R idiosyncrasy (or as Dirk calls it: "[T]hese are known quirks in Base R").
As for a solution: I would change the behaviour of as.Date.POSIXct rather than the default behaviour of as.POSIXct.
We could simply redefine as.Date.POSIXct to inherit tz from the POSIXct object.
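as.Date.POSIXct <- function(x, ...) {
  # Use the object's own time zone; fall back to the session time zone if unset
  tz <- attr(x, "tzone")
  if (is.null(tz)) tz <- ""
  as.Date(as.POSIXlt(x, tz = tz))
}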
Then you get consistent results for your sample case:
df <- data.frame(
timestamp = as.POSIXct(c("2018-01-01 03:00:00", "2018-01-01 12:00:00")),
a = 1:2)
df$date <- as.Date(df$timestamp)
df
#timestamp a date
#1 2018-01-01 03:00:00 1 2018-01-01
#2 2018-01-01 12:00:00 2 2018-01-01
You basically want a different default for as.POSIXct than what is provided. You don't really want to modify anything except as.POSIXct.default, which is the function that will eventually handle character values. It wouldn't make much sense to modify as.POSIXct.numeric since that will always be an offset to UTC. The tz argument only determines what format.POSIXct will display. So you can modify the formals list of the one you've been given. Put this in your .Rprofile:
formals(as.POSIXct.default) <- alist(x=, ...=, tz="UTC")
Then it passes your tests:
> x <- as.POSIXct( "2018-01-01 03:00:00" ); x
[1] "2018-01-01 03:00:00 UTC"
> attr( x, "tzone" )
[1] "UTC"
> shifted <- lubridate::force_tz( x, "UTC" )
> shifted == x
[1] TRUE
> as.numeric( shifted ) == as.numeric( x )
[1] TRUE
> as.Date( x )
[1] "2018-01-01"
The alternative would be to define an entirely new class, but that would require much more extensive efforts.
A further point concerns the specification of time zones. With the prevalence of "daylight savings times", it may be less ambiguous to use the %z format, both on input (when possible) and on output:
dtm <- format( Sys.time(), format="%Y-%m-%d %H:%M:%S %z")
#output
format( Sys.time(), format="%Y-%m-%d %H:%M:%S %z")
[1] "2018-07-06 17:18:27 -0700"
#input and output without the formals change
as.POSIXct(dtm, format="%Y-%m-%d %H:%M:%S %z")
[1] "2018-07-06 17:21:41 PDT"
# after the formals change
as.POSIXct(dtm, format="%Y-%m-%d %H:%M:%S %z")
[1] "2018-07-07 00:21:41 UTC"
So when tz information is present as an offset, it can be handled correctly.
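Because the outputs above depend on Sys.time() and the local time zone, here is a reproducible sketch of the same idea with a fixed string and an explicit tz = "UTC" (both chosen only for illustration):

```r
# A timestamp carrying its own numeric offset
stamp <- "2018-07-06 17:18:27 -0700"

# %z on input applies the offset, so the instant is unambiguous
x <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S %z", tz = "UTC")
format(x, "%Y-%m-%d %H:%M:%S %z")
# [1] "2018-07-07 00:18:27 +0000"
```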
I have a DateTime object in R.
tempDateTime<-as.POSIXct("2017-07-13 01:40:00 MDT")
class(tempDateTime)
[1] "POSIXct" "POSIXt"
I would like to drop the minutes from the DateTime object. ie have "2017-07-13 01:00:00 MDT"
Is there a simple way to do this?
In Base R
trunc(tempDateTime, units = "hours")
# "2017-07-13 01:00:00 AEST"
This works because the trunc function in base R (like round) has a method to handle POSIXt objects.
From ?round.POSIXt
Round or truncate date-time objects.
As @Thelatemail points out, this returns a POSIXlt object, so you may want to wrap the result in as.POSIXct() again.
Another note, POSIXct is an object that stores the number of seconds since "1970-01-01 00:00:00" (the Unix epoch).
as.numeric(tempDateTime)
# 1499874000
So the manual way to round-down the hours would be
as.POSIXct(floor(as.numeric(tempDateTime) / 3600) * 3600, origin = "1970-01-01")
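One caveat with the manual approach: it floors to the hour in UTC, so it only matches trunc() when the time zone's offset is a whole number of hours. The sketch below assumes America/Denver for the MDT timestamp and uses Asia/Kolkata (UTC+05:30) purely as a contrasting example:

```r
t_mdt <- as.POSIXct("2017-07-13 01:40:00", tz = "America/Denver")

# trunc() returns POSIXlt, so convert back to POSIXct
h <- as.POSIXct(trunc(t_mdt, units = "hours"))
format(h, "%Y-%m-%d %H:%M:%S %Z")
# [1] "2017-07-13 01:00:00 MDT"

# Whole-hour offset: the numeric approach agrees with trunc()
floor(as.numeric(t_mdt) / 3600) * 3600 == as.numeric(h)
# [1] TRUE

# Half-hour offset: flooring in UTC lands on :30 local time instead of :00
t_ist  <- as.POSIXct("2017-07-13 01:40:00", tz = "Asia/Kolkata")
manual <- as.POSIXct(floor(as.numeric(t_ist) / 3600) * 3600,
                     origin = "1970-01-01", tz = "Asia/Kolkata")
format(manual, "%H:%M", tz = "Asia/Kolkata")
# [1] "01:30"
format(trunc(t_ist, units = "hours"), "%H:%M")
# [1] "01:00"
```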
Try this:
library(lubridate)
> floor_date(tempDateTime, "hour")
[1] "2017-07-13 01:00:00 PDT"