Having weird problems converting character strings to POSIX objects [duplicate] - r

I would like to use R for time series analysis. I want to make a time-series model and use functions from the packages timeDate and forecast.
I have intraday data in the CET time zone (15-minute data, 4 data points per hour). On March 31st daylight saving time begins and I am missing 4 of the 96 data points that I usually have. On October 28th I have 4 data points too many because the clocks are switched back.
For my time series model I always need 96 data points, as otherwise the intraday seasonality gets messed up.
Do you have any experience with this? Do you know an R function or a package that would help to automate such data handling - something elegant?
Thank you!

I had a similar problem with hydrological data from a sensor. My timestamps were in UTC+1 (CET) and did not switch to daylight saving time (UTC+2, CEST). As I didn't want my data to be one hour off (which would be the case if UTC were used), I used the %z conversion specification of strptime. In ?strptime you'll find:
%z Signed offset in hours and minutes from UTC, so -0800 is 8 hours
behind UTC.
For example: In 2012, the switch from Standard Time to DST occurred on 2012-03-25, so there is no 02:00 on this day. If you try to convert "2012-03-25 02:00:00" to a POSIXct object,
> as.POSIXct("2012-03-25 02:00:00", tz="Europe/Vienna")
[1] "2012-03-25 CET"
you don't get an error or a warning, you just get the date without the time (this behavior is documented).
Appending the offset to the string and adding %z to the format gives the desired result:
> as.POSIXct("2012-03-25 02:00:00 +0100", format="%F %T %z", tz="Europe/Vienna")
[1] "2012-03-25 03:00:00 CEST"
In order to facilitate this import, I wrote a small function with appropriate default values:
as.POSIXct.no.dst <- function (x, tz = "", format = "%Y-%m-%d %H:%M", offset = "+0100", ...)
{
  x <- paste(x, offset)           # append the fixed UTC offset to each timestamp
  format <- paste(format, "%z")   # extend the format so the offset is parsed
  as.POSIXct(x, tz, format = format, ...)
}
> as.POSIXct.no.dst(c("2012-03-25 00:00", "2012-03-25 01:00", "2012-03-25 02:00", "2012-03-25 03:00"))
[1] "2012-03-25 00:00:00 CET" "2012-03-25 01:00:00 CET" "2012-03-25 03:00:00 CEST"
[4] "2012-03-25 04:00:00 CEST"
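An alternative to pasting an offset onto every string might look like the sketch below. It assumes the logger ran on a fixed UTC+1 clock and parses the strings in the fixed-offset zone "Etc/GMT-1" (the sign in the Etc/GMT names is inverted, so this is UTC+1), then switches only the display zone. The variable names are illustrative and not part of the function above.

```r
# Timestamps recorded on a fixed UTC+1 clock (no DST), as in the example above
x <- c("2012-03-25 01:00", "2012-03-25 02:00", "2012-03-25 03:00")

# Parse in a fixed-offset zone; "Etc/GMT-1" means UTC+1 (POSIX-style sign)
fixed <- as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "Etc/GMT-1")

# The same instants obtained via the %z route used above
via_z <- as.POSIXct(paste(x, "+0100"), format = "%Y-%m-%d %H:%M %z",
                    tz = "Europe/Vienna")

# Changing the tzone attribute only changes how the instants are printed
attr(fixed, "tzone") <- "Europe/Vienna"
local_clock <- format(fixed, "%H:%M")
print(local_clock)
```

Both routes should describe the same instants; which one is more convenient probably depends on whether the offset is constant for the whole file.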

If you don't want daylight saving time, convert to a timezone that doesn't have it (e.g. GMT, UTC).
times <- .POSIXct(times, tz="GMT")
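To illustrate why a DST-free zone keeps 96 points per day, here is a small sketch. It assumes a 15-minute grid on a fixed UTC+1 clock and uses 2012-03-25 (the switch date from the answer above) with "Europe/Vienna" as the DST-observing zone; the object names are made up for the example.

```r
# One calendar day of 15-minute stamps on a fixed UTC+1 clock (no DST)
grid_fixed <- seq(as.POSIXct("2012-03-25 00:00", tz = "Etc/GMT-1"),
                  by = "15 min", length.out = 96)

# Count how many of those instants fall on each local date in a DST zone
local_date <- format(grid_fixed, "%Y-%m-%d", tz = "Europe/Vienna")
per_day <- table(local_date)
print(per_day)
```

On the fixed clock every day has 96 stamps, while the local spring-forward day only receives 92 of them, which is the gap described in the question.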

Here is one way to get the UTC offset including daylight saving time,
e.g. for Central Daylight Time:
> Sys.time()
"2015-08-20 07:10:38 CDT" # I am at America/Chicago daylight time
> as.POSIXct(as.character(Sys.time()), tz="America/Chicago")
"2015-08-20 07:13:12 CDT"
> as.POSIXct(as.character(Sys.time()), tz="UTC") - as.POSIXct(as.character(Sys.time()), tz="America/Chicago")
Time difference of -5 hours
> as.integer(as.POSIXct(as.character(Sys.time()), tz="UTC") - as.POSIXct(as.character(Sys.time()), tz="America/Chicago"))
-5
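Because Sys.time() is evaluated twice in the calls above, the two conversions refer to slightly different instants. A minimal sketch that works on one fixed instant instead is shown below; utc_offset_hours is a hypothetical helper name, and the two example timestamps are arbitrary.

```r
# UTC offset (in hours) in effect at instant x, in x's own time zone
utc_offset_hours <- function(x) {
  wall <- format(x, "%Y-%m-%d %H:%M:%S")   # local wall-clock text of x
  # Re-read the wall clock as UTC and compare with the true instant
  as.numeric(difftime(as.POSIXct(wall, tz = "UTC"), x, units = "hours"))
}

summer <- as.POSIXct("2015-08-20 07:10:38", tz = "America/Chicago")
winter <- as.POSIXct("2015-01-20 07:10:38", tz = "America/Chicago")

print(utc_offset_hours(summer))   # offset while CDT is in effect
print(utc_offset_hours(winter))   # offset while CST is in effect
print(as.POSIXlt(summer)$isdst)   # positive when DST is in effect
```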
Some inspiration was from
Converting time zones in R: tips, tricks and pitfalls

Related

Different parsing behaviour for the first day of April in R as.POSIXct and as.POSIXlt, is R april fooling me?

I am inexperienced in working with date formats in R, and I am struggling to understand the different behaviour on the first of April... is it an April fool? :)
All the strings have the same format, but it seems that the first day can't be parsed using as.POSIXct (while other dates show no issues), and as.POSIXlt does not return the time zone for it?
(as.POSIXct("1/04/2012 02:58", format = "%d/%m/%Y %H:%M")) # this doesn't work
(as.POSIXct("2/04/2012 02:58", format = "%d/%m/%Y %H:%M")) # this works
(as.POSIXct("01/04/2012 02:58", format = "%d/%m/%Y %H:%M")) # this doesn't
(as.POSIXct("02/04/2012 02:58", format = "%d/%m/%Y %H:%M")) # this does...
(as.POSIXlt("1/04/2012 02:58", format = "%d/%m/%Y %H:%M")) # This works, but does not return a time zone
(as.POSIXlt("2/04/2012 02:58", format = "%d/%m/%Y %H:%M")) # This works, and returns a time zone
(as.POSIXlt("01/04/2012 02:58", format = "%d/%m/%Y %H:%M")) # This works, but does not return a time zone
(as.POSIXlt("02/04/2012 02:58", format = "%d/%m/%Y %H:%M")) # This works, and returns a time zone
Any direction as to why? Thanks!
This is almost certainly a daylight savings time issue. Not sure why POSIXct and POSIXlt behave differently though. From your profile, it looks like you're in Mexico.
From here:
most of Mexico, including capital Mexico City, will set the clocks 1 hour forward 3 weeks later, on Sunday, April 1, 2012.
So the problem is that 2:58 AM on 1 April 2012 did not exist in the time zone that is currently active in your locale.
Unless there is something specific having to do with the POSIXct/POSIXlt difference, this should probably be closed as a duplicate of e.g.:
What is wrong with this date and time?
R POSIXct returns NA with "03/12/2017 02:17:13"
PosixCT conversion in R fails
Weird as.POSIXct behavior depending on daylight savings time
Strange strptime behavior in R
as.POSIX error, can not convert a particular date
Weird POSIX behaviour for two closely time strings with and without specifying the format
And this r help question
If you want to deal with this e.g. by setting all times to UTC (i.e. ignoring your local time zone settings), I believe there are lots of suggestions on Stack Overflow (now that you know to search for "daylight savings time" it should be easy to find them).
obligatory xkcd
@Ben Bolker is correct that this is a daylight saving time issue. Specifically, this is what I call a nonexistent time issue. In Mexico City, on April 1st 2012, there was a DST gap of 1 hour where the clocks jumped from 01:59:59 AM straight to 03:00:00 AM, skipping the two o'clock hour entirely. So 02:58:00 AM is a nonexistent time on that day.
These problems can be really frustrating, so in the clock package I've made parsing issues like this an error by default, with many ways to get around them according to your needs.
For future visitors to this post, here is a reprex with the full output from as.POSIXct/lt() vs clock. The relevant clock function is date_time_parse().
library(clock)
x <- c("1/04/2012 02:58", "2/04/2012 02:58")
zone <- "America/Mexico_City"
format <- "%d/%m/%Y %H:%M"
# Nonexistent time - returns NA
as.POSIXct(x, tz = zone, format = format)
#> [1] NA "2012-04-02 02:58:00 CDT"
# Nonexistent time - can't determine zone
as.POSIXlt(x, tz = zone, format = format)
#> [1] "2012-04-01 02:58:00" "2012-04-02 02:58:00 CDT"
# Errors on nonexistent time so you don't have surprising results
date_time_parse(x, zone = zone, format = format)
#> Error: Nonexistent time due to daylight saving time at location 1.
#> ℹ Resolve nonexistent time issues by specifying the `nonexistent` argument.
# Next valid time
date_time_parse(x, zone = zone, format = format, nonexistent = "roll-forward")
#> [1] "2012-04-01 03:00:00 CDT" "2012-04-02 02:58:00 CDT"
# Previous valid time
date_time_parse(x, zone = zone, format = format, nonexistent = "roll-backward")
#> [1] "2012-04-01 01:59:59 CST" "2012-04-02 02:58:00 CDT"
# Shift forward by the size of the gap (1 hour)
date_time_parse(x, zone = zone, format = format, nonexistent = "shift-forward")
#> [1] "2012-04-01 03:58:00 CDT" "2012-04-02 02:58:00 CDT"
# NA on nonexistent times
date_time_parse(x, zone = zone, format = format, nonexistent = "NA")
#> [1] NA "2012-04-02 02:58:00 CDT"
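For readers who cannot use clock, a rough base R check for nonexistent wall-clock times could look like the sketch below. The helpers zone_offset and is_nonexistent are hypothetical names, not base R functions, and the approach assumes the zone changes its offset at most once within a day either side of the timestamp.

```r
# UTC offset (seconds) that `zone` applies at the instants in `t`
zone_offset <- function(t, zone) {
  wall <- format(t, "%Y-%m-%d %H:%M:%S", tz = zone)
  as.numeric(as.POSIXct(wall, tz = "UTC")) - as.numeric(t)
}

# TRUE where a wall-clock string falls into a DST gap of `zone`
is_nonexistent <- function(x, zone, fmt) {
  full <- "%Y-%m-%d %H:%M:%S"
  u <- as.POSIXct(x, format = fmt, tz = "UTC")   # wall clock read as if UTC
  wall <- format(u, full)
  day <- 24 * 3600
  # Candidate instants, using the offsets valid a day before and a day after
  before <- u - zone_offset(u - day, zone)
  after  <- u - zone_offset(u + day, zone)
  # The time exists if either candidate prints back as the original wall clock
  !(format(before, full, tz = zone) == wall |
    format(after,  full, tz = zone) == wall)
}

x <- c("1/04/2012 02:58", "2/04/2012 02:58")
gap <- is_nonexistent(x, zone = "America/Mexico_City", fmt = "%d/%m/%Y %H:%M")
print(gap)
```

Unlike relying on as.POSIXct() returning NA, this check does not depend on how the platform handles times inside the gap.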

R posixct dates and times not centering on midnight

I have dates and times stored in two columns. The first has the date as "20180831." The time is stored as the number of seconds from midnight; 3am would be stored as 10,800.
I need a combined date time column and am having a hard time with something that should be simple.
I can get the dates in no problem but lubridate "hms" interprets the time field as a period, not a 'time' per se.
I tried converting the date to posix.ct format and then using that as the origin for the time field but posix.ct does not set the time for midnight, instead it sets it for either 1800 or 1900 hours depending on the date. I need it set to midnight for all rows, I don't want any daylight savings time adjustment.
Here's the code:
First I made a function because there are several date and time fields I have to do this for.
mkdate<-function(x){
a<-as.Date(as.character(x),format='%Y%m%d')
a<-as.POSIXct(a)
return(a)
}
df$date<-mkdate(df$date) #applies date making function to date field
df$datetime<-as.POSIXct(df$time,origin=df$date)
I'm sure this has to do with time zones. I'm in Central time zone and I have experimented with adding the "tz" specification into these commands in both the mkdate function and in the time code creating "datetime" column.
I've tried:
tz="America/Chicago"
tz="CST"
tz="UTC"
Help would be much appreciated!
Edited with example:
x<-c(20180831,20180710,20160511,20170105,20180101) #these are dates.
as.POSIXct(as.Date(as.character(x),format="%Y%m%d"))
The code above converts dates to seconds since Jan 1 1970. I could convert this to numeric and add my 'seconds' value to this field BUT it is not correct. This is what I see instead as the output:
[1] "2018-08-30 19:00:00 CDT" "2018-07-09 19:00:00 CDT" "2016-05-10 19:00:00 CDT" "2017-01-04 18:00:00 CST" "2017-12-31 18:00:00 CST"
Look at the first date - it should be 8/31 but instead it is 8/30. Somewhere in there there is a timezone adjustment taking place. It's moving the clock back 5 or 6 hours because I am on central time. The first entry should be 2018-08-31 00:00:00. I would then convert it to numeric and add the seconds field on and convert back to POSIXct format. I've tried including tz specification all over the place with no luck.
Sys.getlocale("LC_TIME")
returns "English_United States.1252"
I believe the following does what you want.
My locale is the following, so the results are different from yours.
Sys.getlocale("LC_TIME")
#[1] "Portuguese_Portugal.1252"
The difference is due to daylight saving time (summer time).
As for your problem, all you have to do is remember that objects of class "POSIXct" are coded as the number of seconds since an origin, and that origin is usually midnight of 1970-01-01. So you have to add your seconds since midnight to the seconds of the converted date.
x <- "20180831"
xd <- mkdate(x)
y <- 10800
as.POSIXct(as.integer(xd) + y, origin = "1970-01-01")
#[1] "2018-08-31 04:00:00 BST"
as.POSIXct(as.integer(xd) + y, origin = "1970-01-01", tz = "America/Chicago")
#[1] "2018-08-30 22:00:00 CDT"
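The 19:00 and 18:00 values in the question arise because as.POSIXct() applied to a Date gives midnight UTC, which is then printed in the local zone. A minimal sketch that avoids the detour through Date is shown below. It assumes that "no daylight saving adjustment" can be met by treating the recorded clock as UTC, and the secs values are invented for illustration.

```r
dates <- c(20180831, 20180710, 20160511, 20170105, 20180101)
secs  <- c(10800, 0, 43200, 86399, 3600)   # hypothetical seconds since midnight

# Parse the date directly as midnight in a zone without DST
midnight <- as.POSIXct(as.character(dates), format = "%Y%m%d", tz = "UTC")

# POSIXct arithmetic is in seconds, so the offset can simply be added
datetime <- midnight + secs
print(format(datetime, "%Y-%m-%d %H:%M:%S"))
```

If the data has to be compared with local-time data later, the tzone attribute of the result can be changed afterwards without altering the instants.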
There are very many ways to do this:
mktime = function(a, b)modifyList(strptime(a, '%Y%m%d'), list(sec = as.numeric(gsub(',', '', b))))
mktime("20180831",'10,800')
[1] "2018-08-31 03:00:00 PDT"
mktime('20180301','10800')
[1] "2018-03-01 03:00:00 PST"
mktime('20180321','10800')
[1] "2018-03-21 03:00:00 PDT"
Note that the code above does not adjust for daylight saving time. Irrespective of the date, the result still shows 3 AM, including on dates after the switch from standard time to daylight time. It also uses your LOCAL time zone.

POSIXct in R strange behavior

I would like to ask the R gurus to comment on the following:
as.POSIXct("05/11/1998 09:35", "%m/%d/%Y %H:%M",tz="EST") - as.POSIXct("1998-05-11 09:35:00 EST")
Time difference of 1 hours
Shouldn't it be zero since dates are the same?
Thanks.
According to ?strptime (which ?as.POSIXct points to) the format= argument should be
A character string. The default for the ‘format’ methods is
‘"%Y-%m-%d %H:%M:%S"’ if any element has a time component
which is not midnight, and ‘"%Y-%m-%d"’ otherwise. If
‘options("digits.secs")’ is set, up to the specified number
of digits will be printed for seconds.
The time "1998-05-11 09:35:00 EST" has a format of "%Y-%m-%d %H:%M:%S %Z". However %Z can only be used for output (see ?strptime)
If you provide the tz= argument to the second call, it will work as expected
> as.POSIXct("05/11/1998 09:35", "%m/%d/%Y %H:%M",tz="EST") - as.POSIXct("1998-05-11 09:35:00 EST", tz="EST")
Time difference of 0 secs
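The point that the trailing "EST" in the string is ignored on input can be checked with a short sketch. It parses the same string twice and only varies the tz argument; the object names are illustrative.

```r
stamp <- "1998-05-11 09:35:00 EST"

# The text "EST" at the end is not parsed; only the tz argument sets the zone
in_est <- as.POSIXct(stamp, tz = "EST")
in_utc <- as.POSIXct(stamp, tz = "UTC")

# Same text, different instants: the gap equals the EST offset from UTC
gap_hours <- as.numeric(difftime(in_utc, in_est, units = "hours"))
print(gap_hours)
```

The one-hour difference in the question is consistent with this: without tz, the second call presumably used the session's local zone, which differed from EST by an hour on that date.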
It is worth noting that
'EST' is a time zone used in Canada _without_ daylight saving time, and not
‘EST5EDT’ nor (Australian) Eastern Standard Time.
(see ?timezone)

Import date-time at a specified timezone, disregard Daylight Savings Time

I have time series data obtained from a data logger that was set to one time zone without daylight savings (NZST or UTC+12:00), and the data spans a few years. Data loggers don't consider DST changes, and are synchronized to local time with/without DST (depending who deployed it).
However, when I get the data into R, I'm unable to properly use as.POSIXct to ignore DST. I'm using R 2.14.0 on a Windows computer with these settings:
> Sys.timezone()
[1] "NZDT"
> Sys.getlocale("LC_TIME")
[1] "English_New Zealand.1252"
Here are three timestamps across the spring DST change, each spaced 1 hour apart:
> ts_str <- c("28/09/2008 01:00", "28/09/2008 02:00", "28/09/2008 03:00")
> as.POSIXct(ts_str, format="%d/%m/%Y %H:%M", tz="")
[1] "2008-09-28 01:00:00 NZST" NA
[3] "2008-09-28 03:00:00 NZDT"
> as.POSIXct(ts_str, format="%d/%m/%Y %H:%M", tz="UTC")
[1] "2008-09-28 01:00:00 UTC" "2008-09-28 02:00:00 UTC"
[3] "2008-09-28 03:00:00 UTC"
As you can see, the clocks jumped forward at 1:59 to 3:00, so 2:00 is invalid, thus NA. Furthermore, I can use tz="UTC" to get it to ignore DST changes. However, I'd rather keep the correct time zone since I have other data series recorded with DST (NZDT or UTC+13:00) that I'd like to blend in (via merge) for my analysis.
How do I configure the tz parameter on a MS Windows computer? I've tried many things, such as "NZST", "New Zealand Standard Time", "UTC+12:00", "+1200", etc., but no luck. Or do I modify some other setting?
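You can use a fixed-offset zone such as tz="Etc/GMT+12". Be aware that the sign in the Etc/GMT names is inverted relative to the usual notation: "Etc/GMT+12" is 12 hours behind UTC, so for NZST (UTC+12:00) the matching zone is "Etc/GMT-12". Neither has DST, so no NA is produced: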
as.POSIXct(ts_str, format="%d/%m/%Y %H:%M", tz="Etc/GMT+12")
[1] "2008-09-28 01:00:00 GMT+12" "2008-09-28 02:00:00 GMT+12"
[3] "2008-09-28 03:00:00 GMT+12"
For a full list of available timezones use,
dir(file.path(R.home("share"),"zoneinfo"), recursive=TRUE)
There are a couple of .tab files in there which aren't timezones but hold some information, but my regex-fu isn't good enough to be able to exclude them with the pattern argument to dir.
If you just add 12*60*60 to that UTC-derived vector, you will have local "standard" time.
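As a sketch of the fixed-offset approach for a logger on NZST, the example below parses the three timestamps in "Etc/GMT-12" (UTC+12, because of the inverted sign convention in these names) and then shows the same instants on the DST-observing "Pacific/Auckland" clock. It also lists the fixed-offset zones with OlsonNames(), which is available in recent R versions; the object names are illustrative.

```r
ts_str <- c("28/09/2008 01:00", "28/09/2008 02:00", "28/09/2008 03:00")

# Fixed UTC+12 clock; note the inverted sign in the zone name
nzst <- as.POSIXct(ts_str, format = "%d/%m/%Y %H:%M", tz = "Etc/GMT-12")

# The same instants on the DST-observing New Zealand clock
nz_local <- format(nzst, "%Y-%m-%d %H:%M", tz = "Pacific/Auckland")
print(nz_local)

# Fixed-offset zones known to this R installation
etc_zones <- grep("^Etc/GMT", OlsonNames(), value = TRUE)
print(etc_zones)
```

Since the parsed values are true instants, they should line up in a merge with series that were recorded on the DST-observing clock.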

Why doesn't R recognize 'CST' as a valid timezone?

This code works:
ISOdatetime(2011,4,7,12,0,0, tz = "EST")
This code does not:
ISOdatetime(2011,4,7,12,0,0, tz = "CST")
I want the central time zone, with no adjustment for daylight savings. What am I doing wrong? Where can I find a table of timezones recognized by R?
edit: Thanks for the info Josh, but ISOdatetime(2011,3,13,2,0,0, tz = "America/Chicago") yields NA, and is unfortunately a value in my dataset. Any ideas how to deal with this? It seems like my dataset is on Chicago time, but does not observe daylight savings time.
See ?timezone and the file, R_HOME/share/zoneinfo/zone.tab.
There's no such thing as "the central time zone, with no adjustment for daylight savings". The US central time zone has DST rules and they have changed over the years. You could always read in your dates as GMT, add 6 hours, then convert to CST6CDT.
> .POSIXct(ISOdatetime(2011,3,13,2,0,0, tz="GMT")+3600*6, tz="CST6CDT")
[1] "2011-03-13 03:00:00 CDT"
> .POSIXct(ISOdatetime(2011,3,13,2,0,0, tz="GMT")+3600*6, tz="America/Chicago")
[1] "2011-03-13 03:00:00 CDT"
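The read-as-GMT-and-add-6-hours step can also be written with a fixed-offset zone, sketched below under the assumption that the dataset really sits on a constant UTC-6 clock. "Etc/GMT+6" is 6 hours behind UTC (the sign in these names is inverted); the object names are illustrative.

```r
# 02:00 on a fixed UTC-6 clock; this wall time does not exist in America/Chicago
fixed_cst <- ISOdatetime(2011, 3, 13, 2, 0, 0, tz = "Etc/GMT+6")

# The same instant shown on the DST-observing Chicago clock
chicago_clock <- format(fixed_cst, "%Y-%m-%d %H:%M:%S", tz = "America/Chicago")
print(chicago_clock)
```

This gives the same instant as the GMT-plus-6-hours construction above, without the manual arithmetic.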
