lubridate parses dates one day off - r

When I parse a single date, the result is correct:
> ymd("20011001")
[1] "2001-10-01 UTC"
But when I try to create a vector of dates they all come out one day off:
> b=c(ymd("20111001"),ymd("20101001"),ymd("20091001"),ymd("20081001"),ymd("20071001"),ymd("20061001"),ymd("20051001"),ymd("20041001"),ymd("20031001"),ymd("20021001"),ymd("20011001"))
> b
[1] "2011-09-30 19:00:00 CDT" "2010-09-30 19:00:00 CDT" "2009-09-30 19:00:00 CDT"
[4] "2008-09-30 19:00:00 CDT" "2007-09-30 19:00:00 CDT" "2006-09-30 19:00:00 CDT"
[7] "2005-09-30 19:00:00 CDT" "2004-09-30 19:00:00 CDT" "2003-09-30 19:00:00 CDT"
[10] "2002-09-30 19:00:00 CDT" "2001-09-30 19:00:00 CDT"
How can I fix this? Many thanks.

I don't claim to understand exactly what's going on here, but the proximal problem is that c() strips attributes: using c() on a POSIX[c?]t vector drops the tzone attribute, so the result is shown in the time zone specified by your locale rather than in UTC (this happens even if you set the time zone to agree with the one specified by your locale). On my system:
library(lubridate)
(y1 <- ymd("20011001"))
## [1] "2001-10-01 UTC"
(y2 <- ymd("20011002"))
c(y1,y2)
## now in EDT (and a day earlier/4 hours before UTC):
## [1] "2001-09-30 20:00:00 EDT" "2001-10-01 20:00:00 EDT"
(y12 <- ymd(c("20011001","20011002")))
## [1] "2001-10-01 UTC" "2001-10-02 UTC"
c(y12)
## back in EDT
## [1] "2001-09-30 20:00:00 EDT" "2001-10-01 20:00:00 EDT"
You can set the time zone explicitly ...
(y3 <- ymd("20011001",tz="EDT"))
## [1] "2001-10-01 EDT"
But c() is still problematic.
(y3c <- c(y3))
## [1] "2001-09-30 20:00:00 EDT"
So two solutions are: (1) convert a character vector rather than combining the objects after converting them one by one, or (2) restore the tzone attribute after combining.
For example:
attr(y3c,"tzone") <- attr(y3,"tzone")
Joran points out that this is almost certainly a general property of applying c() to POSIX[c?]t objects, not specifically lubridate-related. I hope someone will chime in and explain whether this is a well-known design decision/infelicity/misfeature.
Update: there is some discussion of this on R-help in 2012, and Brian Ripley comments:
But in any case, the documentation (?c.POSIXct) is clear:
Using ‘c’ on ‘"POSIXlt"’ objects converts them to the current time
zone, and on ‘"POSIXct"’ objects drops any ‘"tzone"’ attributes
(even if they are all marked with the same time zone).
So the recommended way is to add a "tzone" attribute if you know what
you want it to be. POSIXct objects are absolute times: the timezone
merely affects how they are converted (including to character for
printing).
It might be nice if lubridate added a method to do this ...
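The "restore the tzone attribute" advice can be wrapped in a small helper. This is only a sketch: c_keep_tz is a hypothetical name, not part of base R or lubridate, and the "UTC" default is an assumption. As far as I know, R 4.1.0 and later keep the tzone when all inputs agree, so the shift may not reproduce there; check ?c.POSIXct on your version.

```r
# Two POSIXct values in UTC (base R stand-ins for the ymd() results above).
y1 <- as.POSIXct("2001-10-01", tz = "UTC")
y2 <- as.POSIXct("2001-10-02", tz = "UTC")

# Hypothetical helper: combine, then put the "tzone" attribute back.
# The underlying instants are unchanged; only the display zone is set.
c_keep_tz <- function(..., tz = "UTC") {
  out <- c(...)
  attr(out, "tzone") <- tz
  out
}

b <- c_keep_tz(y1, y2)
format(b, "%Y-%m-%d %H:%M:%S", usetz = TRUE)
```

Setting the attribute explicitly is harmless on versions where c() already preserves it, so the helper behaves the same either way.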

Related

Generate a uniformly sampled time series object in R

Hi, I am looking to generate a uniformly sampled time series at 30-minute intervals from a particular start date to some end date. However, the constraint is that on each day the 30-minute intervals begin at 7:00 and end at 18:30, i.e. I need the time series object to be something like
c('2016-08-19 07:00:00',
'2016-08-19 07:30:00',
...,
'2016-08-19 18:30:00',
'2016-08-20 07:00:00',
...,
'2016-08-20 18:30:00',
...
'2016-08-31 18:30:00')
Without the constraints it can be done with something like
seq(as.POSIXct('2016-08-19 07:00:00'), as.POSIXct('2016-08-21 18:30:00'), by="30 min")
But I don't want the times between '2016-08-20 18:30:00' and '2016-08-21 07:30:00' in this case. Any help will be appreciated. Thanks!
Using the example series you created:
ts <- seq(as.POSIXct('2016-08-19 07:00:00'),
as.POSIXct('2016-08-21 18:30:00'), by="30 min")
Pull out the hours from your series using strftime:
hours <- strftime(ts, format="%H:%M:%S")
> head(hours)
[1] "07:00:00" "07:30:00"
[3] "08:00:00" "08:30:00"
[5] "09:00:00" "09:30:00"
You can then convert it back to POSIXct:
hours <- as.POSIXct(hours, format="%H:%M:%S")
This will retain the times of the day but it will make the date today's date:
> head(hours)
[1] "2016-09-11 07:00:00 EDT"
[2] "2016-09-11 07:30:00 EDT"
[3] "2016-09-11 08:00:00 EDT"
[4] "2016-09-11 08:30:00 EDT"
[5] "2016-09-11 09:00:00 EDT"
[6] "2016-09-11 09:30:00 EDT"
> tail(hours)
[1] "2016-09-11 16:00:00 EDT"
[2] "2016-09-11 16:30:00 EDT"
[3] "2016-09-11 17:00:00 EDT"
[4] "2016-09-11 17:30:00 EDT"
[5] "2016-09-11 18:00:00 EDT"
[6] "2016-09-11 18:30:00 EDT"
You can then create a TRUE/FALSE vector based on the condition you want:
condition <- hours >= as.POSIXct("07:00:00", format="%H:%M:%S") &
  hours <= as.POSIXct("18:30:00", format="%H:%M:%S")
Then filter your original series based on this condition:
ts[condition]
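The same filter can be written without the round trip through today's date, by comparing the time-of-day strings directly. This is a sketch under assumptions: the series is built in UTC so that no daylight saving change can disturb the 30-minute grid, and the names stamps, hhmm, keep and in_hours are made up for illustration.

```r
# Full 30-minute grid, built in UTC to avoid DST surprises (assumption).
stamps <- seq(as.POSIXct("2016-08-19 07:00:00", tz = "UTC"),
              as.POSIXct("2016-08-21 18:30:00", tz = "UTC"),
              by = "30 min")

# Zero-padded "HH:MM" strings sort the same way as the times they represent,
# so plain string comparison is enough for the daily window.
hhmm <- format(stamps, "%H:%M")
keep <- hhmm >= "07:00" & hhmm <= "18:30"

in_hours <- stamps[keep]
head(in_hours)
```

Both bounds are inclusive here, so the 07:00 and 18:30 slots are kept.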
Here is my short and handy solution with package lubridate
library("lubridate")
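slots <- lapply(0:2, function(x) {
  temp <- ymd_hms('2016-08-19 07:00:00') + days(x)
  result <- temp + minutes(seq(0, 690, 30))
  # format() uses the object's own time zone (UTC here)
  format(result, "%Y-%m-%d %H:%M:%S")
})
do.call("c", slots)
I convert each result to character before combining so that the times come out right: c() on the POSIXct objects themselves can drop the time zone. Note that format() is used rather than strftime(result), because strftime() converts to the local time zone first unless tz is given.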
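The series can also be built directly in base R, without filtering and without converting to character. A minimal sketch, assuming UTC is acceptable and using made-up names (day_start, offsets, slots):

```r
# Midnight of each day in the range, as POSIXct in UTC (assumption).
days_seq  <- seq(as.Date("2016-08-19"), as.Date("2016-08-21"), by = "day")
day_start <- as.POSIXct(format(days_seq), tz = "UTC")

# Seconds after midnight for 07:00, 07:30, ..., 18:30.
offsets <- seq(7 * 3600, 18.5 * 3600, by = 1800)

# Every day paired with every offset; adding seconds keeps the POSIXct class.
slots <- rep(day_start, each = length(offsets)) +
  rep(offsets, times = length(day_start))

head(slots)
```

Because the result stays POSIXct, it can be used as a time index directly.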

How to handle an ambiguous century in datetime objects?

I am playing around with datetime stuff in R and cannot figure out how to alter the time origin to accept older dates. For example:
vals <- as.character(60:70)
as.POSIXct(vals, origin="1900-01-01", format = "%y")
# [1] "2060-07-25 EDT" "2061-07-25 EDT" "2062-07-25 EDT" "2063-07-25 EDT"
# [5] "2064-07-25 EDT" "2065-07-25 EDT" "2066-07-25 EDT" "2067-07-25 EDT"
# [9] "2068-07-25 EDT" "1969-07-25 EDT" "1970-07-25 EDT"
Is it possible to adjust the origin such that as.POSIXct will return 1960 for an input of "60"? What is the best way to handle an ambiguous century?
You can't make as.POSIXct return 1960 for an input of "60". See ?strptime:
‘%y’ Year without century (00-99). On input, values 00 to 68 are
prefixed by 20 and 69 to 99 by 19 - that is the behaviour
specified by the 2004 and 2008 POSIX standards, but they do
also say ‘it is expected that in a future version the default
century inferred from a 2-digit year will change’.
You need to prepend the century to the string and use the "%Y" format if you want different behavior with as.POSIXct.
vals <- as.character(60:70)
as.POSIXct(paste0("19",vals), format = "%Y")
If some of the two-digit dates are after 2000, you can use ifelse or something similar to prepend a different century.
newvals <- paste0(ifelse(vals < "20", "20", "19"), vals)
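The prepend-a-century idea can be generalised with an explicit pivot year. This is a sketch only: expand_year and its pivot argument are hypothetical names, and the pivot of 20 simply mirrors the ifelse example above.

```r
# Hypothetical helper: two-digit years below `pivot` become 20xx,
# all others become 19xx.
expand_year <- function(yy, pivot = 20L) {
  yy <- as.integer(yy)
  ifelse(yy < pivot, 2000L + yy, 1900L + yy)
}

full_years <- expand_year(c("05", "60", "99"))
full_years

# Turn the four-digit years into dates (1 January of each year).
as.Date(sprintf("%d-01-01", full_years))
```

Working with integers avoids relying on string comparison, which only behaves as intended when every input is zero-padded to two digits.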
Assuming that you might want some years greater than 2000, prepending 19 to the vector might not be desirable.
In this case subtracting 100 years might be better.
library(lubridate)
vals <- as.character(60:70)
vals <- as.POSIXct(vals, origin="1900-01-01", format = "%y")
vals[year(vals)>2059] <- vals[year(vals)>2059] - years(100)
vals
[1] "1960-07-25 CDT" "1961-07-25 CDT" "1962-07-25 CDT"
[4] "1963-07-25 CDT" "1964-07-25 CDT" "1965-07-25 CDT"
[7] "1966-07-25 CDT" "1967-07-25 CDT" "1968-07-25 CDT"
[10] "1969-07-25 CDT" "1970-07-25 CDT"

Converting character to POSIXct in R loses time zone

I am trying to convert a character string into a POSIXct date format and running into a problem with the time zone information.
The original character data looks like this:
SD$BGN_DTTM
[1] "1956-05-25 14:30:00 CST" "1956-06-05 16:30:00 CST" "1956-07-04 15:30:00 CST"
[4] "1956-07-08 08:00:00 CST" "1956-08-19 12:00:00 CST" "1956-12-23 00:50:00 CST"
but when I attempt to convert using as.POSIXct , this happens:
SD$BGN_DTTM <- as.POSIXct(SD$BGN_DTTM)
[1] "1956-05-25 14:30:00 PDT" "1956-06-05 16:30:00 PDT" "1956-07-04 15:30:00 PDT"
[4] "1956-07-08 08:00:00 PDT" "1956-08-19 12:00:00 PDT" "1956-12-23 00:50:00 PST"
It looks like the function isn't reading the time zone I've specified. Since my computer is on PDT, it looks like it has used that instead. Note also that it has appended PST to the last date (seems odd). Can anyone tell me what is going on here, and whether there is a method to get R to read the time zone information as shown?
Specifying the time zone explicitly would still have the problem you noticed with daylight/standard times (here test holds the character vector from the question):
> strptime(test, format="%Y-%m-%d %H:%M:%S", tz="America/Chicago")
[1] "1956-05-25 14:30:00 CDT" "1956-06-05 16:30:00 CDT"
[3] "1956-07-04 15:30:00 CDT" "1956-07-08 08:00:00 CDT"
[5] "1956-08-19 12:00:00 CDT" "1956-12-23 00:50:00 CST"
The strptime function refuses to honor the "%Z" format for input (which in its defense is documented.) Many people have lost great gobs of hair and probably some keyboards into monitors in efforts to get R timezones working to their (dis?)satisfaction.
As we all know, time is a relative thing. Storing time as UTC/GMT or relative to UTC/GMT will make sure that daylight savings etc only come into play when you want them to, as per: Does UTC observe daylight saving time?
So, if:
x <- c("1956-05-25 14:30:00 CST","1956-06-05 16:30:00 CST", "1956-07-04 15:30:00 CST",
"1956-07-08 08:00:00 CST", "1956-08-19 12:00:00 CST","1956-12-23 00:50:00 CST")
You can find out that CST is 6 hours behind UTC/GMT (as opposed to CDT, which is daylight saving time and is 5 hours behind)
Therefore:
out <- as.POSIXct(x,tz="Etc/GMT+6")
will represent CST without any daylight savings shift to CDT.
That way when or if you convert to local central timezones, the proper CST time will be returned without changing the actual data for daylight savings. (i.e. - when R prints CDT, it is only shifting the display of the time forward an hour, but the underlying numerical data is not changed. The last case displays as expected when standard time kicks back in):
attr(out,"tzone") <- "America/Chicago"
out
#[1] "1956-05-25 15:30:00 CDT" "1956-06-05 17:30:00 CDT" "1956-07-04 16:30:00 CDT"
#[4] "1956-07-08 09:00:00 CDT" "1956-08-19 13:00:00 CDT" "1956-12-23 00:50:00 CST"
I.e. - for case 1, 15:30 CDT == 14:30 CST - as originally specified, and when daylight savings stops, for case 6, 00:50 CST == 00:50 CST as originally specified.
Comparing this final out to the other answer, you can see there is an actual numerical time difference of one hour for all the daylight savings cases:
out - strptime(x, format="%Y-%m-%d %H:%M:%S", tz="America/Chicago")
#Time differences in secs
#[1] 3600 3600 3600 3600 3600 0
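Since the abbreviation in the string is not read on input, one workaround is to map it to a fixed-offset zone yourself. The sketch below is an illustration, not a general solution: abbr_to_zone is a hypothetical lookup that covers only CST and CDT, and the second input value is invented to show a mixed vector. Note the POSIX sign convention: "Etc/GMT+6" means six hours behind UTC.

```r
x <- c("1956-05-25 14:30:00 CST", "1956-07-04 15:30:00 CDT")

# Hypothetical lookup from abbreviation to a fixed-offset zone name.
abbr_to_zone <- c(CST = "Etc/GMT+6", CDT = "Etc/GMT+5")

abbr  <- sub("^.* ", "", x)        # last token, e.g. "CST"
stamp <- sub(" [A-Z]+$", "", x)    # timestamp without the abbreviation

# Parse each element in its own zone and keep the result as seconds,
# because one POSIXct vector can carry only one time zone.
secs <- vapply(seq_along(x), function(i) {
  as.numeric(as.POSIXct(stamp[i], format = "%Y-%m-%d %H:%M:%S",
                        tz = abbr_to_zone[[abbr[i]]]))
}, numeric(1))

utc <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(utc, "%Y-%m-%d %H:%M")
```

Both inputs describe the same UTC clock time on their respective days, which is a quick way to check that the offsets were applied with the right sign.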

Changing dates in different time zones by adding to POSIXlt

I am running into an error when I try to localize times for "date" (a variable of class=POSIXlt) in my dataset. Example code is as follows:
# All dates are coded by survey software in EST(not local time)
date <- c("2011-07-26 07:23", "2011-07-29 07:34", "2011-07-29 07:40")
region <-c("USA-EST", "UK", "Singapore")
#Change the times based on time-zone differences
start_time<-strptime(date,"%Y-%m-%d %h:%m")
localtime=as.POSIXlt(start_time)
localtime<-ifelse(region=="UK",start_time+6,start_time)
localtime<-ifelse(region=="Singapore",start_time+12,start_time)
#Then, I need to extract the hour and weekday
weekday<-weekdays(localtime)
hour<-factor(localtime)
There must be something wrong with my "ifelse" statement, because I get the error: number of items to replace is not a multiple of replacement length. Please help!
How about using R's native time code? The trick is that you can't have more than one time-zone in a POSIX vector, so use a list instead:
region <- c("EST","Europe/London","Asia/Singapore")
(localtime <- lapply(seq_along(date),function(x) as.POSIXlt(date[x],tz=region[x])))
[[1]]
[1] "2011-07-26 07:23:00 EST"
[[2]]
[1] "2011-07-29 07:34:00 Europe/London"
[[3]]
[1] "2011-07-29 07:40:00 Asia/Singapore"
And to convert to a vector in a single timezone:
Reduce("c",localtime)
[1] "2011-07-26 13:23:00 BST" "2011-07-29 07:34:00 BST"
[3] "2011-07-29 00:40:00 BST"
Note that my system timezone is BST, but if yours is EST it will convert to that.
You can use the time zone handling built into POSIXct:
> start_time <- as.POSIXct(date, format = "%Y-%m-%d %H:%M", tz = "America/New_York")
> start_time
[1] "2011-07-26 07:23:00 EDT" "2011-07-29 07:34:00 EDT" "2011-07-29 07:40:00 EDT"
> format(start_time, tz="Europe/London", usetz=TRUE)
[1] "2011-07-26 12:23:00 BST" "2011-07-29 12:34:00 BST" "2011-07-29 12:40:00 BST"
> format(start_time, tz="Asia/Singapore", usetz=TRUE)
[1] "2011-07-26 19:23:00 SGT" "2011-07-29 19:34:00 SGT" "2011-07-29 19:40:00 SGT"
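To get a different local time for each row, as the question asks, format() can be applied element by element with that row's zone. A sketch, assuming the survey timestamps are New York wall-clock times and that each region maps to the zone name shown (the zones vector and local_hour are illustrative names):

```r
date  <- c("2011-07-26 07:23", "2011-07-29 07:34", "2011-07-29 07:40")
zones <- c("America/New_York", "Europe/London", "Asia/Singapore")

# One vector of instants, all parsed in the zone the software recorded.
start_time <- as.POSIXct(date, format = "%Y-%m-%d %H:%M",
                         tz = "America/New_York")

# Local hour of day for each row, rendered in that row's own zone.
local_hour <- vapply(seq_along(start_time), function(i) {
  as.integer(format(start_time[i], "%H", tz = zones[i]))
}, integer(1))

local_hour
```

The same pattern with "%A" instead of "%H" (and character(1) as the template) would give the local weekday name, which depends on your locale.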

Round a POSIX date (POSIXct) with base R functionality

I'm currently playing around a lot with dates and times for a package I'm building.
Stumbling across this post reminded me again that it's generally not a bad idea to check out if something can be done with basic R features before turning to contrib packages.
Thus, is it possible to round a date of class POSIXct with base R functionality?
I checked
methods(round)
which "only" gave me
[1] round.Date round.timeDate*
Non-visible functions are asterisked
This is what I'd like to do (Pseudo Code)
x <- as.POSIXct(Sys.time())
[1] "2012-07-04 10:33:55 CEST"
round(x, atom="minute")
[1] "2012-07-04 10:34:00 CEST"
round(x, atom="hour")
[1] "2012-07-04 11:00:00 CEST"
round(x, atom="day")
[1] "2012-07-04 CEST"
I know this can be done with timeDate, lubridate etc., but I'd like to keep package dependencies down. So before going ahead and checking out the source code of the respective packages, I thought I'd ask if someone has already done something like this.
base has round.POSIXt to do this. Not sure why it doesn't come up with methods.
x <- as.POSIXct(Sys.time())
x
[1] "2012-07-04 10:01:08 BST"
round(x,"mins")
[1] "2012-07-04 10:01:00 BST"
round(x,"hours")
[1] "2012-07-04 10:00:00 BST"
round(x,"days")
[1] "2012-07-04"
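A reproducible version of the same calls, with a fixed timestamp instead of Sys.time(). One detail worth knowing: round() and trunc() on a POSIXct return POSIXlt, so the sketch wraps them in as.POSIXct() to stay in the original class. The UTC zone and the variable names are assumptions for illustration.

```r
x <- as.POSIXct("2012-07-04 10:33:55", tz = "UTC")

# round() goes to the nearest unit; trunc() always goes down.
rounded_min  <- as.POSIXct(round(x, "mins"))
rounded_hour <- as.POSIXct(round(x, "hours"))
rounded_day  <- as.POSIXct(round(x, "days"))
floored_hour <- as.POSIXct(trunc(x, "hours"))

format(rounded_min,  "%Y-%m-%d %H:%M:%S")
format(rounded_hour, "%Y-%m-%d %H:%M:%S")
format(rounded_day,  "%Y-%m-%d %H:%M:%S")
format(floored_hour, "%Y-%m-%d %H:%M:%S")
```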
On this theme with lubridate, also look into the ceiling_date() and floor_date() functions:
library(lubridate)
x <- as.POSIXct("2009-08-03 12:01:59.23")
ceiling_date(x, "second")
# "2009-08-03 12:02:00 CDT"
ceiling_date(x, "hour")
# "2009-08-03 13:00:00 CDT"
ceiling_date(x, "day")
# "2009-08-04 CDT"
ceiling_date(x, "week")
# "2009-08-09 CDT"
ceiling_date(x, "month")
# "2009-09-01 CDT"
If you don't want to call external libraries and want to keep POSIXct, as I do, this is one idea (inspired by this question): use strptime and paste a fake month and day. It should be possible to do it in a more straightforward way, as said in this comment:
"For strptime the input string need not specify the date completely:
it is assumed that unspecified seconds, minutes or hours are zero, and
an unspecified year, month or day is the current one."
Thus it seems that you have to use strftime to output a truncated string, paste the missing part and convert again in POSIXct.
This is how an updated answer could look:
x <- as.POSIXct(Sys.time())
x
[1] "2018-12-27 10:58:51 CET"
round(x,"mins")
[1] "2018-12-27 10:59:00 CET"
round(x,"hours")
[1] "2018-12-27 11:00:00 CET"
round(x,"days")
[1] "2018-12-27 CET"
as.POSIXct(paste0(strftime(x,format="%Y-%m"),"-01")) #trunc by month
[1] "2018-12-01 CET"
as.POSIXct(paste0(strftime(x,format="%Y"),"-01-01")) #trunc by year
[1] "2018-01-01 CET"
