Setting time zone in weatherData queries - r

I am using the weatherData package, specifically its getDetailedWeather function. It returns a data frame, one component of which is Time, of class POSIXct. My problem is that every Time value comes labelled with the local timezone of the machine I am using. I am pretty sure this is incorrect: I believe the data reflects the station's local time, and the only thing the API does is attach my timezone to the data without changing the clock values. Am I correct? How can I tell the API to stop using my timezone as the default?
E.g.:
library(weatherData)
dat <- getDetailedWeather("NRT", "2014-04-29")
dat$Time
# [1] "2014-04-29 00:00:00 EST" ## local timezone, not of the weather station

Looking at the results of the example in ?getDetailedWeather:
library(weatherData)
dat <- getDetailedWeather("NRT", "2014-04-29")
dat$Time
# [1] "2014-04-29 00:00:00 EST" "2014-04-29 00:30:00 EST" "2014-04-29 01:00:00 EST" etc
The returned times seem to be 'correct', in that they run from 00:00 to 23:30. The timezone attached to the data is not that of the weather station, though, but that of the host computer system. You may be best off just changing the output once you have it, since R prints POSIXct objects in the local timezone unless they carry an explicit one, e.g.:
as.POSIXct(as.character(dat$Time),tz="UTC")
# [1] "2014-04-29 00:00:00 UTC" "2014-04-29 00:30:00 UTC" "2014-04-29 01:00:00 UTC" etc
The above changes the timezone to a new timezone (in this case "UTC", but you could use one appropriate for the weather station location) without affecting the time of day data. See here: Valid time zones in lubridate for identifying local timezone codes.
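A minimal sketch of that re-labelling step, using made-up timestamps in place of dat$Time so that it runs without the weatherData package. The host zone (America/New_York) and the station zone (Asia/Tokyo, assumed here for NRT) are illustrative choices, not something the package reports.

```r
# Stand-in for dat$Time: station clock readings labelled with the host zone.
# Both zone names below are assumptions for illustration.
host_times <- as.POSIXct(c("2014-04-29 00:00:00", "2014-04-29 00:30:00"),
                         tz = "America/New_York")

# Keep the clock reading, replace the zone: format to text, then re-parse.
station_times <- as.POSIXct(format(host_times, "%Y-%m-%d %H:%M:%S"),
                            tz = "Asia/Tokyo")

print(station_times)

# The clock reading is unchanged, but the underlying instant has moved.
shift_hours <- as.numeric(difftime(host_times[1], station_times[1],
                                   units = "hours"))
print(shift_hours)
```

Passing an explicit format string avoids the time part being dropped when every value happens to fall on midnight.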

Related

Problem with converting unix time zone in lubridate

I import a dataset from treasuredata in JSON format and then transform a Unix time column to standard time using the lubridate function "as_datetime".
I set tz (timezone) to GMT+7, as I am in Bangkok, Thailand, and used the converted datetime in my analysis. Later, I found that it is wrong, and to get the correct datetime I have to input tz as "GMT-7". I don't understand why.
as_datetime(1565923100)
#[1] "2019-08-16 02:38:20 UTC"
as_datetime(1565923100, tz = "gmt+7")
#[1] "2019-08-15 19:38:20 gmt"
as_datetime(1565923100, tz = "gmt-7")
#[1] "2019-08-16 09:38:20 gmt"
now()
#[1] "2019-08-16 10:16:17 +07"
From the code, gmt-7 gives the correct time, while I think I should use gmt+7 because the Bangkok timezone is GMT+7. I checked using the 'now' function and got the correct time too. I don't understand the logic behind the code. Thank you for any explanation.
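A likely explanation, offered as a sketch rather than a definitive answer: offset-style zone names follow the POSIX convention, in which the sign is inverted, so "Etc/GMT-7" means 7 hours ahead of UTC. Using the region name avoids the sign question altogether. The example uses base R rather than lubridate so that it has no dependencies; the zone names are standard tz database identifiers.

```r
secs <- 1565923100  # the Unix time from the question

# Region name: unambiguous, follows Thailand's rules (UTC+7).
bangkok <- as.POSIXct(secs, origin = "1970-01-01", tz = "Asia/Bangkok")

# POSIX-style names invert the sign:
# "Etc/GMT-7" is UTC+7, "Etc/GMT+7" is UTC-7.
ahead  <- as.POSIXct(secs, origin = "1970-01-01", tz = "Etc/GMT-7")
behind <- as.POSIXct(secs, origin = "1970-01-01", tz = "Etc/GMT+7")

print(format(bangkok, "%Y-%m-%d %H:%M:%S"))
print(format(ahead,   "%Y-%m-%d %H:%M:%S"))
print(format(behind,  "%Y-%m-%d %H:%M:%S"))
```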

POSIXct origin base type causes time zone differences

I've run into a problem with managing time zones with POSIXct in R. I have set the TZ option globally to "Europe/London", but since we switched back to GMT, as.POSIXct no longer converts the numeric vector back to the right time.
Digging into why, I found that differences in time zone can be caused by the object type used to set the origin date.
For example:
# Date time is set as 1 second after 1970-01-01
as.POSIXct(1, origin = "1970-01-01")
# [1] "1970-01-01 01:00:01 BST"
# Same numeric value, but one hour less now that the origin is set using a POSIXct
as.POSIXct(1, origin = as.POSIXct("1970-01-01"))
# [1] "1970-01-01 00:00:01 BST"
The first value doesn't really make sense, given that the query was run outside British Summer Time, while GMT was in effect (see results below):
Sys.timezone()
# [1] "Europe/London"
Sys.time()
# [1] "2018-10-31 11:05:36 GMT"
Even when you explicitly state the time zone at each stage, the hour difference still persists:
as.POSIXct(1, origin = "1970-01-01", tz = "Europe/London")
# [1] "1970-01-01 01:00:01 BST"
as.POSIXct(1, origin = as.POSIXct("1970-01-01", tz = "Europe/London"), "Europe/London")
# [1] "1970-01-01 00:00:01 BST"
To make matters worse the documentation resulting from ?as.POSIXct is pretty vague about the management of time zones, specifically:
If a time zone is needed and that specified is invalid on your system,
what happens is system-specific but attempts to set it will probably
be ignored.
Given this, I have a series of questions:
1) Why does as.POSIXct(1, origin = "1970-01-01", tz = "Europe/London") add an hour, even though the origin date would be parsed as a GMT time and the time zone has been set explicitly?
2) What is the best method of ensuring that your time zone is consistent when converting from numeric in R?
3) What is the best practice for managing time zones in R? Is there a good reference, especially for POSIXct date types.
You have run into a bit of history with question 1. See below for the outcomes with Europe/London, GMT and UTC. UTC and GMT should be (and are) the same.
Now, why do you get BST with the first line of code?
That is because in 1970 the UK was on BST for the whole year. In fact, the UK stayed on BST (British Standard Time, UTC+1) continuously from 1968-02-18 to 1971-10-31. This means R is correct in returning "1970-01-01 01:00:01 BST" when you supply "Europe/London" as the time zone. See this Wikipedia page for more info.
Times:
as.POSIXct(1, origin = "1970-01-01", tz = "Europe/London")
[1] "1970-01-01 01:00:01 BST"
as.POSIXct(1, origin = "1970-01-01", tz = "GMT")
[1] "1970-01-01 00:00:01 GMT"
as.POSIXct(1, origin = "1970-01-01", tz = "UTC")
[1] "1970-01-01 00:00:01 UTC"
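To see where the hour comes from, it may help to look at the underlying numbers. In this sketch (assuming the system tz database carries the 1968 to 1971 rules for Europe/London), a character origin is read as midnight GMT, while a POSIXct origin built in Europe/London is midnight local time, which was one hour earlier.

```r
tz <- "Europe/London"

# Character origin: interpreted as 1970-01-01 00:00:00 GMT.
a <- as.POSIXct(1, origin = "1970-01-01", tz = tz)

# POSIXct origin: midnight London time, i.e. 23:00 GMT on 1969-12-31.
b <- as.POSIXct(1, origin = as.POSIXct("1970-01-01", tz = tz), tz = tz)

print(as.numeric(a))  # seconds since the Unix epoch
print(as.numeric(b))
print(format(a, "%Y-%m-%d %H:%M:%S"))
print(format(b, "%Y-%m-%d %H:%M:%S"))
```

The two results differ by exactly 3600 seconds, which is the offset of Europe/London from GMT on that date.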
Q2: First you need to know which time zone the dates come from. Then either keep working in that time zone or convert to your local time zone. Or strip the time zone from the date-time object, which would force everything to UTC.
I would suggest lubridate's force_tz and with_tz functions for changing time zones. But since you don't want lubridate, set your local time zone to whatever you need. I tend to use Sys.setenv(TZ = "UTC") if I'm working with stock data so xts objects don't complain when I have a different local time.
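If lubridate is not an option, the two operations can be approximated in base R. The helper names below (with_tz_base, force_tz_base) are made up for this sketch and are not part of any package: the first changes only how an instant is displayed, the second keeps the clock reading and changes the instant.

```r
# Same instant, different display zone (roughly what lubridate::with_tz does).
with_tz_base <- function(x, tz) {
  attr(x, "tzone") <- tz
  x
}

# Same clock reading, different instant (roughly what lubridate::force_tz does).
force_tz_base <- function(x, tz) {
  as.POSIXct(format(x, "%Y-%m-%d %H:%M:%S"), tz = tz)
}

x <- as.POSIXct("2018-07-01 12:00:00", tz = "Europe/London")  # BST, UTC+1

shown_in_utc  <- with_tz_base(x, "UTC")
forced_to_utc <- force_tz_base(x, "UTC")

print(shown_in_utc)
print(forced_to_utc)
```

Note that the second helper goes through text, so it drops fractional seconds and returns NA for clock readings that do not exist in the target zone.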
Q3: here is a bit from R for Data Science
here is an SO post on time zones

R posixct dates and times not centering on midnight

I have dates and times stored in two columns. The first has the date as "20180831." The time is stored as the number of seconds from midnight; 3am would be stored as 10,800.
I need a combined date time column and am having a hard time with something that should be simple.
I can get the dates in with no problem, but lubridate's "hms" interprets the time field as a period, not a 'time' per se.
I tried converting the date to POSIXct and then using that as the origin for the time field, but POSIXct does not set the time to midnight; instead it sets it to either 18:00 or 19:00, depending on the date. I need it set to midnight for all rows; I don't want any daylight saving time adjustment.
Here's the code:
First I made a function because there are several date and time fields I have to do this for.
mkdate <- function(x) {
  a <- as.Date(as.character(x), format = '%Y%m%d')
  a <- as.POSIXct(a)
  return(a)
}
df$date<-mkdate(df$date) #applies date making function to date field
df$datetime<-as.POSIXct(df$time,origin=df$date)
I'm sure this has to do with time zones. I'm in Central time zone and I have experimented with adding the "tz" specification into these commands in both the mkdate function and in the time code creating "datetime" column.
I've tried:
tz="America/Chicago"
tz="CST"
tz="UTC"
Help would be much appreciated!
Edited with example:
x<-c(20180831,20180710,20160511,20170105,20180101) #these are dates.
as.POSIXct(as.Date(as.character(x),format="%Y%m%d"))
The code above converts the dates to seconds since Jan 1 1970. I could convert this to numeric and add my 'seconds' value to it, BUT the result is not correct. This is what I see as the output:
[1] "2018-08-30 19:00:00 CDT" "2018-07-09 19:00:00 CDT" "2016-05-10 19:00:00 CDT" "2017-01-04 18:00:00 CST" "2017-12-31 18:00:00 CST"
Look at the first date - it should be 8/31 but instead it is 8/30. Somewhere in there there is a timezone adjustment taking place. It's moving the clock back 5 or 6 hours because I am on central time. The first entry should be 2018-08-31 00:00:00. I would then convert it to numeric and add the seconds field on and convert back to POSIXct format. I've tried including tz specification all over the place with no luck.
Sys.getlocale("LC_TIME")
returns "English_United States.1252"
I believe the following does what you want.
My locale is the following, so the results are different from yours.
Sys.getlocale("LC_TIME")
#[1] "Portuguese_Portugal.1252"
The difference will be due to daylight saving time (summer time).
As for your problem, all you have to do is remember that objects of class "POSIXct" are coded as the number of seconds since an origin, and that origin is usually midnight of 1970-01-01. So you have to add your seconds since midnight to the seconds of as.Date.
x <- "20180831"
xd <- mkdate(x)
y <- 10800
as.POSIXct(as.integer(xd) + y, origin = "1970-01-01")
#[1] "2018-08-31 04:00:00 BST"
as.POSIXct(as.integer(xd) + y, origin = "1970-01-01", tz = "America/Chicago")
#[1] "2018-08-30 22:00:00 CDT"
There are very many ways to do this:
mktime = function(a, b)modifyList(strptime(a, '%Y%m%d'), list(sec = as.numeric(gsub(',', '', b))))
mktime("20180831",'10,800')
[1] "2018-08-31 03:00:00 PDT"
mktime('20180301','10800')
[1] "2018-03-01 03:00:00 PST"
mktime('20180321','10800')
[1] "2018-03-21 03:00:00 PDT"
Looking at the above code, it does not adjust for daylight saving time. Irrespective of the date, the result still shows 3 AM, including on dates after the switch from standard time to daylight time. It also uses your LOCAL timezone.
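One way to sidestep the shift entirely, sketched under the assumption that the recorded values are plain clock readings with no DST: parse the date directly as midnight in a fixed zone such as UTC and add the seconds. The helper name make_datetime is hypothetical.

```r
# Combine a yyyymmdd date and seconds-since-midnight into one POSIXct value.
# Parsing in UTC means midnight is always 00:00:00 and no DST rule applies.
make_datetime <- function(date, secs, tz = "UTC") {
  midnight <- as.POSIXct(as.character(date), format = "%Y%m%d", tz = tz)
  midnight + secs
}

dates <- c(20180831, 20180710, 20170105)
secs  <- c(10800, 0, 86399)

result <- make_datetime(dates, secs)
print(format(result, "%Y-%m-%d %H:%M:%S"))
```

The original attempt lands on 18:00 or 19:00 because as.POSIXct() applied to a Date yields midnight UTC, which is then printed in the local (Central) zone.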

Import date-time at a specified timezone, disregard Daylight Savings Time

I have time series data obtained from a data logger that was set to one time zone without daylight savings (NZST or UTC+12:00), and the data spans a few years. Data loggers don't consider DST changes, and are synchronized to local time with/without DST (depending who deployed it).
However, when I get the data into R, I'm unable to properly use as.POSIXct to ignore DST. I'm using R 2.14.0 on a Windows computer with these settings:
> Sys.timezone()
[1] "NZDT"
> Sys.getlocale("LC_TIME")
[1] "English_New Zealand.1252"
Here are three timestamps across the spring DST change, spaced 1 hour apart:
> ts_str <- c("28/09/2008 01:00", "28/09/2008 02:00", "28/09/2008 03:00")
> as.POSIXct(ts_str, format="%d/%m/%Y %H:%M", tz="")
[1] "2008-09-28 01:00:00 NZST" NA
[3] "2008-09-28 03:00:00 NZDT"
> as.POSIXct(ts_str, format="%d/%m/%Y %H:%M", tz="UTC")
[1] "2008-09-28 01:00:00 UTC" "2008-09-28 02:00:00 UTC"
[3] "2008-09-28 03:00:00 UTC"
As you can see, the clocks jumped forward at 1:59 to 3:00, so 2:00 is invalid, thus NA. Furthermore, I can use tz="UTC" to get it to ignore DST changes. However, I'd rather keep the correct time zone since I have other data series recorded with DST (NZDT or UTC+13:00) that I'd like to blend in (via merge) for my analysis.
How do I configure the tz parameter on a MS Windows computer? I've tried many things, such as "NZST", "New Zealand Standard Time", "UTC+12:00", "+1200", etc., but no luck. Or do I modify some other setting?
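You can use a fixed-offset zone. Note that the Etc/GMT names use an inverted sign, so NZST (UTC+12) is tz="Etc/GMT-12", not "Etc/GMT+12":
as.POSIXct(ts_str, format="%d/%m/%Y %H:%M", tz="Etc/GMT-12")
All three timestamps parse, including 02:00, because this zone has no DST transitions.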
For a full list of available timezones use,
dir(file.path(R.home("share"),"zoneinfo"), recursive=TRUE)
There are a couple of .tab files in there which aren't timezones but hold some information, but my regex-fu isn't good enough to exclude them with the pattern argument to dir.
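Alternatively, parse with tz="UTC" and subtract 12*60*60 from the result; since NZST is 12 hours ahead of UTC, the values then represent the true instants of the local "standard" time readings.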
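A sketch of the fixed-offset approach end to end. One caveat worth stating: names of the form Etc/GMT+N and Etc/GMT-N use the POSIX sign convention, so NZST (UTC+12) corresponds to "Etc/GMT-12". Once parsed, the values can be displayed in "Pacific/Auckland" for merging with DST-aware series. Both names are standard tz database identifiers, assumed to be available on the system.

```r
ts_str <- c("28/09/2008 01:00", "28/09/2008 02:00", "28/09/2008 03:00")

# Logger clock: fixed UTC+12, no DST. Note the inverted sign in the zone name.
logger <- as.POSIXct(ts_str, format = "%d/%m/%Y %H:%M", tz = "Etc/GMT-12")

print(anyNA(logger))             # 02:00 exists in a fixed-offset zone
print(diff(as.numeric(logger)))  # spacing between readings, in seconds

# The same instants on the New Zealand civil clock (with DST).
civil <- format(logger, "%Y-%m-%d %H:%M", tz = "Pacific/Auckland")
print(civil)
```

In current versions of R, OlsonNames() returns the list of zone names directly, without listing the zoneinfo directory.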

Why doesn't R recognize 'CST' as a valid timezone?

This code works:
ISOdatetime(2011,4,7,12,0,0, tz = "EST")
This code does not:
ISOdatetime(2011,4,7,12,0,0, tz = "CST")
I want the central time zone, with no adjustment for daylight savings. What am I doing wrong? Where can I find a table of timezones recognized by R?
edit: Thanks for the info Josh, but ISOdatetime(2011,3,13,2,0,0, tz = "America/Chicago") yields NA, and is unfortunately a value in my dataset. Any ideas how to deal with this? It seems like my dataset is on Chicago time, but does not observe daylight savings time.
See ?timezone and the file, R_HOME/share/zoneinfo/zone.tab.
There's no such thing as "the central time zone, with no adjustment for daylight savings". The US central time zone has DST rules and they have changed over the years. You could always read in your dates as GMT, add 6 hours, then convert to CST6CDT.
> .POSIXct(ISOdatetime(2011,3,13,2,0,0, tz="GMT")+3600*6, tz="CST6CDT")
[1] "2011-03-13 03:00:00 CDT"
> .POSIXct(ISOdatetime(2011,3,13,2,0,0, tz="GMT")+3600*6, tz="America/Chicago")
[1] "2011-03-13 03:00:00 CDT"
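An alternative sketch that avoids the manual six-hour shift: parse in a fixed-offset zone and only then display in America/Chicago. This assumes the data really are on a constant UTC-6 clock; "Etc/GMT+6" is the tz database name for that offset (the sign is inverted by convention).

```r
# 02:00 on 2011-03-13 does not exist in America/Chicago, but it does on a
# fixed UTC-6 clock.
fixed <- ISOdatetime(2011, 3, 13, 2, 0, 0, tz = "Etc/GMT+6")
print(is.na(fixed))

# Same instant as "read as GMT, add 6 hours".
shifted <- ISOdatetime(2011, 3, 13, 2, 0, 0, tz = "GMT") + 3600 * 6
print(as.numeric(fixed) == as.numeric(shifted))

# Displayed on the Chicago civil clock.
chicago <- format(fixed, "%Y-%m-%d %H:%M:%S %Z", tz = "America/Chicago")
print(chicago)
```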
