Converting time zones within time series - r

I read a time series into R with the parse_date_time function from the lubridate package, setting the time zone to EST:
streamflowDateTime<-parse_date_time(streamflowDateTime,"%m%d%Y %H%M",tz="EST")
However, the data switches to DST on 04-03-2005 (April 3): the last time stamp before the change is 01:45 and the next one is 03:00. I want to convert that time stamp and all those that follow back to EST by subtracting an hour, so that the series is continuous. Ideally this would be automated, with the program working out where DST starts and shifting the times back an hour itself, since DST does not take effect on the same day every year.
Here's a sample of the data:
structure(c(1112475600, 1112476500, 1112477400, 1112478300, 1112479200,
1112480100, 1112481000, 1112481900, 1112482800, 1112483700, 1112484600,
1112485500, 1112486400, 1112487300, 1112488200, 1112489100, 1112490000,
1112490900, 1112491800, 1112492700, 1112493600, 1112494500, 1112495400,
1112496300, 1112497200, 1112498100, 1112499000, 1112499900, 1112500800,
1112501700, 1112502600, 1112503500, 1112504400, 1112505300, 1112506200,
1112507100, 1112508000, 1112508900, 1112509800, 1112510700, 1112515200,
1112516100, 1112517000, 1112517900, 1112518800, 1112519700, 1112520600,
1112521500, 1112522400, 1112523300, 1112524200, 1112525100, 1112526000,
1112526900, 1112527800, 1112528700, 1112529600, 1112530500, 1112531400,
1112532300, 1112533200, 1112534100, 1112535000, 1112535900, 1112536800,
1112537700, 1112538600, 1112539500, 1112540400, 1112541300, 1112542200,
1112543100, 1112544000, 1112544900, 1112545800, 1112546700, 1112547600,
1112548500, 1112549400, 1112550300, 1112551200, 1112552100, 1112553000,
1112553900, 1112554800, 1112555700, 1112556600, 1112557500, 1112558400,
1112559300, 1112560200, 1112561100, 1112562000, 1112562900, 1112563800,
1112564700, 1112565600, 1112566500, 1112567400, 1112568300, 1112569200
), class = c("POSIXct", "POSIXt"), tzone = "EST")
Edits:
streamflowDateTime[8840:length(streamflowDateTime)] <- streamflowDateTime[8840:length(streamflowDateTime)]-hours(1)
In the full data set the jump occurs at position 8840, which I found manually. I want the code to find automatically the position where the difference between two consecutive time stamps is not 15 minutes, and to use that value in place of the hard-coded 8840 in the line above. for loops are too slow.
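The gap search described above can be sketched as a single pass over consecutive differences. It is written here in Python rather than R, since the sample is plain epoch seconds; the helper names `first_irregular_step` and `shift_after_gap` are hypothetical, and the four-value sample is cut from the data above around the jump.

```python
def first_irregular_step(epochs, step=900):
    """Return the 0-based index of the first timestamp whose gap from the
    previous one differs from `step` seconds, or None if the series is regular."""
    for i, (prev, cur) in enumerate(zip(epochs, epochs[1:]), start=1):
        if cur - prev != step:
            return i
    return None


def shift_after_gap(epochs, step=900, shift=3600):
    """Subtract `shift` seconds from the first irregular timestamp onwards."""
    i = first_irregular_step(epochs, step)
    if i is None:
        return list(epochs)
    return list(epochs[:i]) + [t - shift for t in epochs[i:]]


# 01:30 and 01:45, then 03:00 and 03:15 (epoch seconds from the sample).
sample = [1112509800, 1112510700, 1112515200, 1112516100]
print(first_irregular_step(sample))  # 2
print(shift_after_gap(sample))       # [1112509800, 1112510700, 1112511600, 1112512500]
```

Note that the index is 0-based, whereas the position 8840 in the R code is 1-based.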

You can probably just supply the full IANA time zone ID America/New_York instead of the time zone abbreviation.
parse_date_time(streamflowDateTime,"%m%d%Y %H%M",tz="America/New_York")
Using America/New_York will properly account for both EST and EDT, including the correct transitions between them.
This seems to be supported, as seen in this blog post - at least on systems that provide IANA/Olson time zones, such as Linux or Mac.
According to the docs:
... R does not come with a predefined list zone names, but relies on the user's OS to interpret time zone names. As a result, some names will be recognized on some computers but not others. Most computers, however, will recognize names in the timezone data base originally compiled by Arthur Olson. These names normally take the form "Country/City." ...
Since Windows uses its own set of time zones, you will probably not be able to use IANA/Olson identifiers. However:
The equivalent Windows time zone id would be "Eastern Standard Time". (Despite the name, this covers both EST and EDT). I am uncertain if R supports these or not.
The fully qualified POSIX time zone for the current rule would be "EST5EDT,M3.2.0,M11.1.0". This should work on all operating systems; however, it only represents the US Eastern time zone since the 2007 change.
From 1987-2006 the rule would have been "EST5EDT,M4.1.0,M10.5.0". Use the appropriate rule for the values you're working in. If you have dates that span these periods, you'll need to split them up and process them separately, or if possible, write a function to use the correct rule for the data.
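The idea of picking the rule per period can be sketched as a small lookup, shown in Python for brevity. `eastern_posix_rule` is a hypothetical helper name, and only the two periods quoted above are covered; anything earlier is rejected rather than guessed.

```python
def eastern_posix_rule(year):
    """Return the POSIX TZ string for US Eastern Time valid in `year`.

    Covers only the two rule periods quoted in the answer above.
    """
    if year >= 2007:
        return "EST5EDT,M3.2.0,M11.1.0"
    if 1987 <= year <= 2006:
        return "EST5EDT,M4.1.0,M10.5.0"
    raise ValueError(f"no rule recorded here for {year}")


print(eastern_posix_rule(2005))  # EST5EDT,M4.1.0,M10.5.0
```

Data spanning both periods would be split by year and each part processed with its own rule.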
See also, the timezone tag wiki.

Related

BlueSky Statistics - String to date [time] issues

I am trying to convert a time stored as a string to a time variable.
I use Date/Dates/Convert String to Date, with the format %H:%M:%S.
Here is the syntax from the GUI:
[Convert String Variables to Date]
BSkystrptime (varNames = c('Time'),dateFormat = "%H:%M:%S",prefixOrSuffix = "prefix",prefixOrSuffixValue = "Con_",data = "Dataset2")
BSkyLoadRefreshDataframe(dframe=Dataset2,load.dataframe=TRUE)
A screenshot of the result is attached.
Compare the variables Time [string] and Con_Time [date/time].
The hours are off by 2 [wrong!]; the minutes and seconds are correct.
What am I doing wrong here?
I believe you are running into a known issue with a prior release of BlueSky Statistics. This issue is fixed with the current stable release available on the download page.
The cause was that, although the time is converted correctly in the local time zone, BlueSky Statistics was then reading that local time and converting it to UTC.
You are probably 2 hours ahead of UTC, so you are seeing the time move 2 hours back. Give us a couple of days to post a patch.
You can also confirm this by writing and executing the following syntax in the syntax window:
Dataset2$Con_Time

Why is timezone converted when putting date-time into vector in R? [duplicate]

In R, I have a bunch of datetime values that I measure in GMT. I keep running into accidents where some function or another loses the timezone on my values, or even loses the class name. Even on functions so basic as c() and unlist():
> dput(x)
structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT")
> dput(c(x))
structure(1317830532, class = c("POSIXct", "POSIXt"))
> dput(list(x))
list(structure(1317830532, class = c("POSIXct", "POSIXt"), tzone = "GMT"))
> dput(unlist(list(x)))
1317830532
I feel like I'm a hair's breadth away from having a real Mars Climate Orbiter moment if this happens when I least expect it. Anyone have any strategies for making sure their dates "stay put"?
This behaviour is documented in ?c, ?DateTimeClasses and ?unlist:
From ?DateTimeClasses:
Using c on "POSIXlt" objects converts them to the current time zone, and on "POSIXct" objects drops any "tzone" attributes (even if they are all marked with the same time zone).
From ?c:
c is sometimes used for its side effect of removing attributes except names.
That said, my testing indicates that the integrity of your data remains intact, despite using c or unlist. For example:
x <- structure(1317830532, class = c("POSIXct", "POSIXt"),
tzone = "GMT")
y <- structure(1317830532+3600, class = c("POSIXct", "POSIXt"),
tzone = "PST8PDT")
x
[1] "2011-10-05 16:02:12 GMT"
y
[1] "2011-10-05 10:02:12 PDT"
strftime(c(x, y), format="%Y/%m/%d %H:%M:%S", tz="GMT")
[1] "2011/10/05 16:02:12" "2011/10/05 17:02:12"
strftime(c(x, y), format="%Y/%m/%d %H:%M:%S", tz="PST8PDT")
[1] "2011/10/05 09:02:12" "2011/10/05 10:02:12"
strftime(unlist(y), format="%Y/%m/%d %H:%M:%S", tz="PST8PDT")
[1] "2011/10/05 10:02:12"
Your Mars Rover should be OK if you use R to keep track of dates.
Why not set your timezone to GMT for your R sessions, then? If something gets converted to the "current" timezone, it is still right.
Given that this is documented behavior, you should either avoid such functions or code defensively around them, and you need mechanisms to support either approach. For things like this, I would recommend writing a "poor man's lint"; with such a lint detector, you can go about restoring sanity. In addition to lint detection, there are several approaches to avoiding Mars Climate Orbiter crashes; some are independent of each other, others dependent:
1. Set a policy & build alternatives. First, for all of the functions that you know are causing you problems, either decide that you won't use them, or write a new wrapper function that will behave as intended, and that will set the timezone parameter you desire. Then, ensure that you use that special wrapper rather than the underlying function.
2. Static analysis. Write a search function using your favorite editor (e.g. as a macro), using a shell script & the GNU find and grep functions, or in some other manner (e.g. grep in R), to find those particular functions that are causing you problems. When found, either remove them or use a defensive coding method (e.g. the wrapper in #1).
3. Testing. Using unit tests, e.g. RUnit or testthat, develop tests that ensure that timezone properties are maintained when using your functions or package. Every time there's a new bug, create a new test to ensure that bug doesn't appear again in released versions.
4. Weak type checking. You can also include tests throughout your code that test whether a timezone is specified. It's best to have your own function for this test, rather than write a block of code that is reproduced throughout. In this way, you can eventually extend the checking to include other types of checks, such as persistence of the timezone and tests for whether operations on two or more objects are mindful of differences in timezones (maybe they allow it, maybe they don't).
5. Map everything to one TZ. Also known as Indiana-be-damned. Retaining a variety of policies about the timezones is hard work, and is essentially friction in working with temporal data. Just map to one TZ (UTC) and then let anything local work from that. If you happen to have local regularity that is invariant of DST, then address that after converting back from UTC.
I do all of #s 1-4 for other issues, but, just as they're easily adapted to timezone checking, they're fairly reusable for lots of Mars Orbiter-avoiding objectives. I do this kind of thing precisely to avoid coding the next such Mars Orbiter. (That was an expensive lesson for all of us that work with numerical data. :))

Converting a Go Time from UnixDate to RFC3339 Fails to Preserve TimeZone

I am converting a UnixDate formatted time string to an RFC3339 formatted time using Go's time package. This seems to be easy and works well on my local machine, but when run on a remote host, the timezone info seems to get lost.
The input time is Eastern Australian Standard Time (EST) and seems to be interpreted as UTC by time.Parse().
Code snippet available here:
package main

import (
	"fmt"
	"time"
)

func main() {
	t, _ := time.Parse(time.UnixDate, "Mon Jan 14 21:50:45 EST 2013")
	fmt.Println(t.Format(time.RFC3339)) // prints time as Z
	t2, _ := time.Parse(time.RFC3339, t.Format(time.RFC3339))
	fmt.Println(t2.Format(time.UnixDate)) // prints time as UTC
}
Do I need to specifically set locales or anything?
Timezone parsing in Go doesn't always work as expected, largely because timezone abbreviations are ambiguous. For example, does EST in your scenario mean Eastern Australian Standard Time (GMT+11) or Eastern Standard Time (GMT-5)?
If your local time is "Eastern Australian Standard Time", Go will assume you mean local time. That is why it worked on your local computer. But since the server is not using that as local time, there is no reason to assume you mean Sydney time. Instead, Go chooses neither of the EST timezones and creates a fake time.Location with the name "EST" but the effect of UTC. The only way to tell it was originally meant to be EST would be to call t.Location().String().
The author of the time package wrote a bug report explaining how the timezone parsing works here.

Python 3: timestamp to datetime: where does this additional hour come from?

I'm using the following functions:
import datetime

# The epoch used in the datetime API.
EPOCH = datetime.datetime.fromtimestamp(0)

def timedelta_to_seconds(delta):
    # Microseconds are divided (not multiplied) by 1e6 to convert them to seconds.
    seconds = (delta.microseconds / 1e6) + delta.seconds + (delta.days * 86400)
    seconds = abs(seconds)
    return seconds

def datetime_to_timestamp(date, epoch=EPOCH):
    # Ensure we deal with `datetime`s.
    date = datetime.datetime.fromordinal(date.toordinal())
    epoch = datetime.datetime.fromordinal(epoch.toordinal())
    timedelta = date - epoch
    timestamp = timedelta_to_seconds(timedelta)
    return timestamp

def timestamp_to_datetime(timestamp, epoch=EPOCH):
    # Ensure we deal with a `datetime`.
    epoch = datetime.datetime.fromordinal(epoch.toordinal())
    epoch_difference = timedelta_to_seconds(epoch - EPOCH)
    adjusted_timestamp = timestamp - epoch_difference
    date = datetime.datetime.fromtimestamp(adjusted_timestamp)
    return date
And calling them with the following code:
twenty = datetime.datetime(2010, 4, 4)
print(twenty)
print(datetime_to_timestamp(twenty))
print(timestamp_to_datetime(datetime_to_timestamp(twenty)))
And getting the following results:
2010-04-04 00:00:00
1270339200.0
2010-04-04 01:00:00
For some reason, I'm getting an additional hour added in the last call, despite my code having, as far as I can see, no flaws.
Where is this additional hour coming from?
# Ensure we deal with `datetime`s.
date = datetime.datetime.fromordinal(date.toordinal())
(That's chopping off the time-of-day completely, as ‘ordinal’ is only a day number. Is that what you meant to do? I suspect not.)
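A quick illustration of that truncation, using an arbitrary time of day chosen for this sketch:

```python
import datetime

noon = datetime.datetime(2010, 4, 4, 12, 30, 15)

# toordinal() returns a day count, so the round trip can only restore the date.
round_tripped = datetime.datetime.fromordinal(noon.toordinal())

print(noon)           # 2010-04-04 12:30:15
print(round_tripped)  # 2010-04-04 00:00:00
```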
Anyway, as Michael said, datetime.fromtimestamp gives you a naïve datetime corresponding to what local time for that POSIX (UTC) timestamp would be for you. So when you call —
date = datetime.datetime.fromtimestamp(adjusted_timestamp)
you're getting the local time for the POSIX timestamp representing 2010-04-04T00:00:00, which of course in BST is an hour ahead. This doesn't happen in the return direction because your epoch is in January, when BST is not in force. (However your EPOCH would also be completely off if you weren't in the UK.)
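The asymmetry can be reproduced without depending on the machine's zone. This is a sketch under one assumption: UK summer time is modelled as a fixed +1 hour offset named `BST`, which is not how the question's code obtains it.

```python
import datetime

# Fixed +1 h offset standing in for UK summer time (an assumption of this sketch).
BST = datetime.timezone(datetime.timedelta(hours=1), "BST")

# After the fromordinal() round trip, the question's epoch is naive midnight.
epoch = datetime.datetime(1970, 1, 1)
twenty = datetime.datetime(2010, 4, 4)

# Naive subtraction ignores any offset, so this equals the UTC-based timestamp.
timestamp = (twenty - epoch).total_seconds()
print(timestamp)  # 1270339200.0

# Converting back with the April offset adds the hour.
back = datetime.datetime.fromtimestamp(timestamp, tz=BST).replace(tzinfo=None)
print(back)  # 2010-04-04 01:00:00
```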
You should replace both your uses of datetime.fromtimestamp with datetime.utcfromtimestamp.
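A minimal sketch of a UTC-only round trip along those lines. It uses `fromtimestamp(..., tz=datetime.timezone.utc)` instead of `utcfromtimestamp`, which is deprecated as of Python 3.12; the names `to_timestamp_utc` and `from_timestamp_utc` are illustrative, not the asker's functions.

```python
import calendar
import datetime


def to_timestamp_utc(dt):
    """Treat a naive datetime as UTC and return POSIX seconds as a float."""
    return calendar.timegm(dt.timetuple()) + dt.microsecond / 1e6


def from_timestamp_utc(ts):
    """Return a timezone-aware UTC datetime for POSIX seconds."""
    return datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)


twenty = datetime.datetime(2010, 4, 4)
ts = to_timestamp_utc(twenty)
print(ts)                      # 1270339200.0
print(from_timestamp_utc(ts))  # 2010-04-04 00:00:00+00:00
```

Because neither direction consults the local zone, the result is the same on any machine and at any time of year.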
It's sad that datetime continues the awful time tradition of keeping times in local time. Calling them ‘naïve’ and taking away the DST flag just makes them even worse. Personally I can't stand to use datetime, preferring integer UTC timestamps for everything (converting to local timezones for formatting only).
Judging by your profile, you're in the UK. That means you're currently running on UTC+1 due to DST.
If I take your timestamp and run it through datetime.fromtimestamp on Python 2.6 (I know you use Python 3, but this is what I have), that shows me that it believes it refers to 2010-04-04 02:00:00 - and I'm in CEST, so that's UTC+2.
Running datetime.fromtimestamp(0), I get that the epoch is 1970-01-01 01:00:00. This then shows me that it is correctly adding only a single hour (since January 1st is outside of DST, and the epoch is midnight UTC on that date, it would be 01:00 here).
In other words, your problem is that you're sending in a time which has DST applied, but datetime_to_timestamp treats it as if DST didn't exist. timestamp_to_datetime, however, applies the DST.
Unfortunately, I don't know enough Python to know how you would solve this, but this should at least give you something to go on.
