I'm working with the POSIXct data type in R. In my work, I incorporate a function that returns two POSIXct dates in a vector. However, I am discovering some unexpected behavior. I wrote some example code to illustrate my problem:
# POSIXct returning issue:
returnTime <- function(date) {
oneDay <- 60 * 60 * 24
nextDay <- date + oneDay
print(date)
print(nextDay)
return(c(date, nextDay))
}
myTime <- as.POSIXct("2015-01-01", tz = "UTC")
bothDays <- returnTime(myTime)
print(bothDays)
The print statements in the function give:
[1] "2015-01-01 UTC"
[1] "2015-01-02 UTC"
While the print statement at the end of the code gives:
[1] "2014-12-31 19:00:00 EST" "2015-01-01 19:00:00 EST"
I understand what is happening, but I don't see as to why. It could be a simple mistake that is eluding me, but I really am quite confused. I don't understand why the time zone is changing on the return. The class is still POSIXct as well, just the time zone has changed.
Additionally, I did the same as above, but just returned one of the dates and the date's timezone did not change. I can work around this for now, but wanted to see if anyone had any insight to my problem. Thank you in advance!
Thanks for the help below. I instead did:
return(list(date, nextDay))
and this solved my issue of the time zone being dropped.
From ?c.POSIXct:
Using c on "POSIXlt" objects converts them to the current time zone,
and on "POSIXct" objects drops any "tzone" attributes (even if they
are all marked with the same time zone).
See also here.
The problem is that the function c removes the timezone attribute:
attributes(myTime)
#$class
#[1] "POSIXct" "POSIXt"
#
#$tzone
#[1] "UTC"
attributes(c(myTime))
#$class
#[1] "POSIXct" "POSIXt"
To fix, you can e.g. use the setattr function from data.table, to modify the attribute in place:
(setattr(c(myTime), 'tzone', attributes(myTime)$tzone))
#[1] "2015-01-01 UTC"
Related
Format of original input data is like: "1975M01", the variable name is Month.
The way we put the input into POSIXct date-time object is: parse_date_time(Month, "%Y%m").
I wonder why we ignored the 'M'(1975M01), like how can the function recognize the year and month and ignore the M in the middle automatically?
You can try as.POSIXct(). We have no days, though. But probably we want the first day of the month and just paste 01 at the end of the string(s). Finally define the "M" literally in format= string, and of course add %day,
x <- as.POSIXct(paste0('1975M01', '01'), format='%YM%m%d')
x
# [1] "1975-01-01 CET"
where
class(x)
# [1] "POSIXct" "POSIXt"
.
This should also work with lubridate but can't install it.
I want to convert a numeric variable to POSIXct using anytime. My issue is that anytime(<numeric>) converts the input variable as well - I want to keep it.
Simple example:
library(anytime)
t_num <- 1529734500
anytime(t_num)
# [1] "2018-06-23 08:15:00 CEST"
t_num
# [1] "2018-06-23 08:15:00 CEST"
This differs from the 'non-update by reference' behaviour of as.POSIXct in base R:
t_num <- 1529734500
as.POSIXct(t_num, origin = "1970-01-01")
# [1] "2018-06-23 08:15:00 CEST"
t_num
# 1529734500
Similarly, anydate(<numeric>) also updates by reference:
d_num <- 17707
anydate(d_num)
# [1] "2018-06-25"
d_num
# [1] "2018-06-25"
I can't find an explicit description of this behaviour in ?anytime. I could use as.POSIXct as above, but does anyone know how to handle this within anytime?
anytime author here: this is standard R and Rcpp and passing-by-SEXP behaviour: you cannot protect a SEXP being passed from being changed.
The view that anytime takes is that you are asking for an input to be converted to a POSIXct as that is what anytime does: from char, from int, from factor, from anything. As a POSIXct really is a numeric value (plus a S3 class attribute) this is what you are getting.
If you do not want this (counter to the design of anytime) you can do what #Moody_Mudskipper and #PKumar showed: used a temporary expression (or variable).
(I also think the data.table example is a little unfair as data.table -- just like Rcpp -- is very explicit about taking references where it can. So of course it refers back to the original variable. There are idioms for deep copy if you need them.)
Lastly, an obvious trick is to use format if you just want different display:
R> d <- data.frame(t_num=1529734500)
R> d[1, "posixct"] <- format(anytime::anytime(d[1, "t_num"]))
R> d
t_num posixct
1 1529734500 2018-06-23 01:15:00
R>
That would work the same way in data.table, of course, as the string representation is a type change. Ditto for IDate / ITime.
Edit: And the development version in the Github repo has had functionality to preserve the incoming argument since June 2017. So the next CRAN version, whenever I will push it, will have it too.
You could hack it like this:
library(anytime)
t_num <- 1529734500
anytime(t_num+0)
# POSIXct[1:1], format: "2018-06-23 08:15:00"
t_num
# [1] 1529734500
Note that an integer input will be treated differently:
t_int <- 1529734500L
anytime(t_int)
# POSIXct[1:1], format: "2018-06-23 08:15:00"
t_int
# [1] 1529734500
If you do this, it will work :
t_num <- 1529734500
anytime(t_num*1)
#> anytime(t_num*1)
#[1] "2018-06-23 06:15:00 UTC"
#> t_num
#[1] 1529734500
Any reason to be married to anytime?
.POSIXct(t_num, tz = 'Europe/Berlin')
# [1] "2018-06-23 08:15:00 CEST"
.POSIXct(x, tz) is a wrapper for structure(x, class = c('POSIXct', 'POSIXt'), tzone = tz) (i.e. you can ignore declaring the origin), and is essentially as.POSIXct.numeric (except the latter is flexible in allowing non-UTC origin dates), look at print(as.POSIXct.numeric).
When I did my homework before posting the question, I checked the open anytime issues. I have now browsed the closed ones as well, where I found exactly the same issue as mine:
anytime is overwriting inputs
There the package author writes:
I presume because as.POSIXct() leaves its input alone, we should too?
So from anytime version 0.3.1 (unreleased):
Numeric input is now preserved rather than silently cast to the return object type
Thus, one answer to my question is: "wait for 0.3.1"*.
When 0.3.1 is released, the behaviour of anytime(<numeric>) will agree with anytime(<non-numeric>) and as.POSIXct(<numeric>), and work-arounds not needed.
*Didn't have to wait too long: 0.3.1 is now released: "Numeric input is now preserved rather than silently cast to the return object type"
Why does the ymd_hms function from R's lubridate package return "2018-01-09 15:43:44.843 UTC" for ymd_hms('2018-01-09T15:43:44.844Z')?
I naively would have expected "2018-01-09 15:43:44.844 UTC".
ymd_hms('2018-01-09T15:43:44.822Z') returns "2018-01-09 15:43:44.822 UTC".
Since this is GMT/UTC, I don't believe daylight savings would be a factor, and different values for the truncated = option don't seem to make a difference.
From the ymd_hms documention:
NOTE: The ymd family of functions are based on strptime()
As described here, the issue seems to lie in how strptime truncates rather than rounds fractions of a second. Play around with options(digits.secs = n) to see how it handles various numbers of decimal places.
I would rather use:
format(strptime("2018-01-09T15:43:44.844Z", "%Y-%m-%dT%H:%M:%OS", tz = "EST"), format="%Y-%m-%d %H:%M:%OS3 %Z", tz = "EST")
[1] "2018-01-09 15:43:44.844 EST"
Note that the milliseconds are passed in the format using the argument %OS
In ymd from lubridate, the default value of tz was UTC. I don't know exactly when the change was made but I know that in 1.5 the default was UTC but in 1.5.8 the default is now NULL.
This changes the output of ymd from POSIXct objects to Date objects which breaks a lot of my code where I rely on having a POSIXct object but now have a Date. Is there a convenient way to make this backwards compatible or do I need to add the tz='UTC' to all of my old code that relied on this?
Write a wrapper to replace ymd with ymd_hms for which the default is still tz = "UTC"
library(lubridate)
ymd2 = function(x){
ymd_hms(paste(x, "00:00:00"))
}
ymd2("2017/3/4")
#[1] "2017-03-04 UTC"
class(ymd2("2017/3/4"))
#[1] "POSIXct" "POSIXt"
I have a dataframe where one column lists a bunch of datetimes. Oddly, the data type for that column is "integer." I need to coerce the column to a proper datetime data type such as POSIXct so that I can subtract these timestamps from those in another field. However, when I try to coerce these datetime values into POSIXct, they lose the time component. When I try to do math on the datetimes without first coercing into another datatype, R acts as if the time component of the timestamp isn't there (it assumes each date has a time of midnight). What's going on and how do I fix it so that R recognizes the timestamp?
> dates[1]
[1] 2016-05-05T16:46:21-04:00
48 Levels: 2016-05-03T06:45:42-04:00 2016-05-03T06:45:43-04:00 ... 2016-05-05T16:50:00-04:00
> typeof(dates)
[1] "integer"
> as.POSIXct(dates[1])
[1] "2016-05-05 EDT"
> as.character(dates[1])
[1] "2016-05-05T16:46:21-04:00"
> as.POSIXct(as.character(dates[1]))
[1] "2016-05-05 EDT"
You can use as.POSIXct with the tz argument to convert the timestamps with the right level of control.
If the timezones are all UTC-04:00 and that is your local timezone, you can use:
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz=Sys.timezone())
If they are all UTC-04:00 and that is not your local timezone, but you know the exact location, then you can specify the appropriate timezone from the tz database:
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz="America/Port_of_Spain")
Alternatively, you can use a generic GMT-4 timezone:
dates = as.POSIXct(dates, format="%Y-%m-%dT%H:%M:%S", tz="Etc/GMT-4")
[EDIT: With thanks to Roland for his comment below. I originally used strptime, which uses the same syntax, but returns a POSIXlt object.]