melt.data.frame() changes behavior how POSIXct columns are printed - r

Melting the dataframe t.wide changes how the column "time" (class POSIXct) is printed.
t.wide <- data.frame(product=letters[1:5],
result=c(2, 4, 0, 0, 1),
t1=as.POSIXct("2014-05-26") + seq(0, 10800, length.out=5),
t2=as.POSIXct("2014-05-27") + seq(0, 10800, length.out=5),
t3=as.POSIXct("2014-05-28") + seq(0, 10800, length.out=5))
library(reshape2)
t.long <- melt(t.wide, measure.vars=c("t1", "t2", "t3"), value.name="time")
t.long$time
[1] 1401055200 1401057900 1401060600 1401063300 1401066000 1401141600 1401144300
[8] 1401147000 1401149700 1401152400 1401228000 1401230700 1401233400 1401236100
[15] 1401238800
attr(,"class")
[1] "POSIXct" "POSIXt"
Strangely, if print() is called explicitly, the object is printed as expected (timestamps, not their numeric representation).
print(t.long$time)
[1] "2014-05-26 00:00:00 CEST" "2014-05-26 00:45:00 CEST" "2014-05-26 01:30:00 CEST"
[4] "2014-05-26 02:15:00 CEST" "2014-05-26 03:00:00 CEST" "2014-05-27 00:00:00 CEST"
[7] "2014-05-27 00:45:00 CEST" "2014-05-27 01:30:00 CEST" "2014-05-27 02:15:00 CEST"
[10] "2014-05-27 03:00:00 CEST" "2014-05-28 00:00:00 CEST" "2014-05-28 00:45:00 CEST"
[13] "2014-05-28 01:30:00 CEST" "2014-05-28 02:15:00 CEST" "2014-05-28 03:00:00 CEST"
Setting the attributes to the same value as before magically changes how the object is printed.
attributes(t.long$time) <- attributes(t.long$time)
t.long$time
[1] "2014-05-26 00:00:00 CEST" "2014-05-26 00:45:00 CEST" "2014-05-26 01:30:00 CEST"
[4] "2014-05-26 02:15:00 CEST" "2014-05-26 03:00:00 CEST" "2014-05-27 00:00:00 CEST"
[7] "2014-05-27 00:45:00 CEST" "2014-05-27 01:30:00 CEST" "2014-05-27 02:15:00 CEST"
[10] "2014-05-27 03:00:00 CEST" "2014-05-28 00:00:00 CEST" "2014-05-28 00:45:00 CEST"
[13] "2014-05-28 01:30:00 CEST" "2014-05-28 02:15:00 CEST" "2014-05-28 03:00:00 CEST"
Can anyone explain this behavior?

UPDATE:
I opened this as Issue #50 on the git repo hadley/reshape2.
UPDATE: FIXED
This issue has been fixed in the development version of reshape2.
Thanks #kevin-ushey!
I believe the reason is because after the reshaping for whatever reason R does not think that t.long$time has attributes. For some reason the OBJECT flag (which indicates the vector has attributes) in the SEXP header for your vector is not being set. When you copy the attributes back to it, the OBJECT flag gets set and the correct print method is dispatched...
# No "OBJ" in SEXP header (the '[NAM(2),ATT]' part below)
.Internal(inspect( t.long$time ) )
##10359e548 14 REALSXP g0c6 [NAM(2),ATT] (len=15, tl=0) 1.40106e+09,...
# Now we have "OBJ" in the SEXP header indicating attributes
# So the print method for POSIXct get dispatched...
attributes(t.long$time) <- attributes(t.long$time)
.Internal(inspect( t.long$time ) )
##1118d7f50 14 REALSXP g0c6 [OBJ,NAM(2),ATT] (len=15, tl=0) 1.40106e+09,...
From the R Internals document...
The actual autoprinting is done by PrintValueEnv in file print.c. If the object to be printed has the S4 bit set and S4 methods dispatch is on, show is called to print the object. Otherwise, if the object bit is set (so the object has a "class" attribute), print is called to dispatch methods: for objects without a class the internal code of print.default is called.
Check the difference between..
print.default(t.long$time)
# [1] 1401058800 1401061500 1401064200 1401066900 1401069600 1401145200 1401147900 1401150600 1401153300 1401156000 1401231600 1401234300
#[13] 1401237000 1401239700 1401242400
#attr(,"class")
#[1] "POSIXct" "POSIXt"
print.POSIXct(t.long$time)
# [1] "2014-05-26 00:00:00 BST" "2014-05-26 00:45:00 BST" "2014-05-26 01:30:00 BST" "2014-05-26 02:15:00 BST" "2014-05-26 03:00:00 BST"
# [6] "2014-05-27 00:00:00 BST" "2014-05-27 00:45:00 BST" "2014-05-27 01:30:00 BST" "2014-05-27 02:15:00 BST" "2014-05-27 03:00:00 BST"
#[11] "2014-05-28 00:00:00 BST" "2014-05-28 00:45:00 BST" "2014-05-28 01:30:00 BST" "2014-05-28 02:15:00 BST" "2014-05-28 03:00:00 BST"
Now I can only speculate, but perhaps this is due to some internal code in reshape2 and is related to this warning..
One thing to watch is that if you copy attributes from one object to another you may (un)set the "class" attribute and so need to copy the object and S4 bits as well. There is a macro/function DUPLICATE_ATTRIB to automate this.

Related

Read character datetimes without timezones

I am trying to import in R a text file including datetimes. Times are stored in character format, without timezone information, but we know it is French time (Europe/Paris).
An issue arise for the days of timezone change: e.g. there is a time change from 2018-10-28 03:00:00 CEST to 2018-10-28 02:00:00 CET, thus we have duplicates in our character format, and R cannot tell wether it is CEST or CET.
Consider the following example:
data_in <- "date,val
2018-10-28 01:30:00,25
2018-10-28 02:00:00,26
2018-10-28 02:30:00,27
2018-10-28 02:00:00,28
2018-10-28 02:30:00,29
2018-10-28 03:00:00,30"
library(readr)
data <- read_delim(data_in, ",", locale = locale(tz = "Europe/Paris"))
We end up having duplicates in our dates:
data$date
[1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CET" "2018-10-28 02:00:00 CEST"
[5] "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
Expected output would be:
data$date
[1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CEST" "2018-10-28 02:00:00 CET"
[5] "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
Any idea how to solve the issue (besides telling people to use UTC or ISO formats). I guess the only way is to suppose the dates are sorted, so we can tell the first ones are CEST.
If you are certain that your time is always-increasing, then you can look for an apparent decrease (of time-of-day) and manually insert the TZ offset to the string, then parse as usual. I added some logic to look for this decrease only around 2-3am so that if you have multiple days of data spanning midnight, you would not get a false-alarm.
data <- read.csv(text = data_in)
fakedate <- as.POSIXct(gsub("^[-0-9]+ ", "2000-01-01 ", data$date))
decreases <- cumany(grepl(" 0[23]:", data$date) & c(FALSE, diff(fakedate) < 0))
data$date <- paste(data$date, ifelse(decreases, "+0100", "+0200"))
data
# date val
# 1 2018-10-28 01:30:00 +0200 25
# 2 2018-10-28 02:00:00 +0200 26
# 3 2018-10-28 02:30:00 +0200 27
# 4 2018-10-28 02:00:00 +0100 28
# 5 2018-10-28 02:30:00 +0100 29
# 6 2018-10-28 03:00:00 +0100 30
as.POSIXct(data$date, format="%Y-%m-%d %H:%M:%S %z", tz="Europe/Paris")
# [1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CEST"
# [4] "2018-10-28 02:00:00 CET" "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
My use of "2000-01-01" was just some non-DST day so that we can parse the timestamp into POSIXt and calculate a diff on it. (If we didn't insert a date, we could still use as.POSIXct with a format, but if you ever ran this on one of the two DST days, you might get different results since as.POSIXct("01:02:03", format="%H:%M:%S") always assumes "today".
This is obviously a bit fragile with its assumptions, but perhaps it'll be good enough for what you need.

Coercing a date object into a POSIXct object

When converting a date object to a POSIXct object, I expected the hours to be zero.
Turns out the hours are either 1 or 2, depending on summer/winter time.
eg:
oct.days <- (as.Date("2018-10-26")+0:5)
as.POSIXct(oct.days)
[1] "2018-10-26 02:00:00 CEST" "2018-10-27 02:00:00 CEST" "2018-10-28 02:00:00 CEST"
[4] "2018-10-29 01:00:00 CET" "2018-10-30 01:00:00 CET" "2018-10-31 01:00:00 CET"
(I'm in Germany, winter time was implemented on Oct 28th at 3 am.)
Rounding it down fixed the issue
round(as.POSIXct(oct.days),"days")
but I wonder for what reason the date object contains extra hours?
tks!

Find where a timestamp vector changes timezones

How can I find the element in a timestamp vector where it switches to a different time zone due to the time changeover?
Sample data:
ts <- structure(c(1521921600, 1521925200, 1521928800, 1521932400, 1521936000,
1521939600, 1521943200, 1521946800, 1521950400, 1521954000, 1521957600
), class = c("POSIXct", "POSIXt"))
Output:
"2018-03-24 21:00:00 CET" "2018-03-24 22:00:00 CET" "2018-03-24 23:00:00 CET" "2018-03-25 00:00:00 CET" "2018-03-25 01:00:00 CET" "2018-03-25 03:00:00 CEST" "2018-03-25 04:00:00 CEST" "2018-03-25 05:00:00 CEST" "2018-03-25 06:00:00 CEST" "2018-03-25 07:00:00 CEST" "2018-03-25 08:00:00 CEST"
The first 5 elements are in CET and then it switches to CEST. So the answer here would be 5 or 6. Both answers would be fine.
In the sample data the difference is always 1 hour, but I need it aswell for different time intervalls, for example 15 or 30 minutes.
seq(min(ts), to = max(ts), by = 15*60)
seq(min(ts), to = max(ts), by = 30*60)
The expected answer for 15 min would be 20/21.
The expected answer for 30 min would be 10/11.
You can use lubridate's dst:
which(!duplicated(dst(ts)))[2]
This will give you the point where the time zone changes to DST.

timezone issue in R, i want to use UTC+0100, even in summer, although CET switches to CEST automatically

I've got some data with POSIXct timestamps in "CET" (Central European Time = Winter time = UTC+0100) and "CEST" (Central European Summer Time = UTC+0200). Since I've had some trouble with plots and calculations because of that daylight savings time, I want all of the timestamps to be in UTC+0100 time.
Here is an example for my timestamps on switch-back-to-winter-time-day:
> tdf$time_posix_vec[1:20]
[1] "2015-10-25 00:00:00 CEST" "2015-10-25 00:15:00 CEST" "2015-10-25 00:30:00 CEST" "2015-10-25 00:45:00 CEST" "2015-10-25 01:00:00 CEST"
[6] "2015-10-25 01:15:00 CEST" "2015-10-25 01:30:00 CEST" "2015-10-25 01:45:00 CEST" "2015-10-25 02:00:00 CEST" "2015-10-25 02:15:00 CEST"
[11] "2015-10-25 02:30:00 CEST" "2015-10-25 02:45:00 CEST" "2015-10-25 02:00:00 CET" "2015-10-25 02:15:00 CET" "2015-10-25 02:30:00 CET"
[16] "2015-10-25 02:45:00 CET" "2015-10-25 03:00:00 CET" "2015-10-25 03:15:00 CET" "2015-10-25 03:30:00 CET" "2015-10-25 03:45:00 CET"
To demonstrate the issue i picked an example timestamp:
> tx <- tdf$time_posix_vec[7]
> tx
[1] "2015-10-25 01:30:00 CEST"
I already tried lubridate's with_tz function, but if I use it with "CET", this is what happens:
> with_tz(tx, tzone = "CET")
[1] "2015-10-25 01:30:00 CEST"
I assume, the timezone handler knows that in my location CET becomes CEST between last week of march and last week of october.
To solve the issue I could use Algeria's timezone, since Algeria uses CET without daylight savings time (as wikipedia told me). However, this could change in the future, and
I wonder if this solution would be a bit unsafe because of that?
> with_tz(tx, tzone = "Africa/Algiers")
[1] "2015-10-25 00:30:00 CET"
The best way, I thought, would be to use "UTC+1", but the behaviour of with_tz is exactly the opposite of what I expected:
> with_tz(tx, tzone = "UTC+1")
[1] "2015-10-24 22:30:00 UTC"
to get 00:30:00 I would have to use:
> with_tz(tx, tzone = "UTC-1")
[1] "2015-10-25 00:30:00 UTC"
but then also the label "UTC" is wrong, because in UTC it would be
> with_tz(tx, tzone = "UTC")
[1] "2015-10-24 23:30:00 UTC"
Why is "UTC+1" switching the timestamp to UTC-0100 instead of UTC+0100?
And is there a function that forces the timestamp to UTC+0100 and also gives puts the correct timezone label to the timestamp, so the result would be "2015-10-25 00:30:00 UTC+1"?
Thanks in advance,
greetings, Peter
I think I found the solution: now I use
t1 <- as.POSIXct("2016-07-12 17:43","Etc/GMT-1")
for example. It confused me that GMT-1 is the same as UTC+0100, they seem to turn around the sign at bsd style timezones.

switch to DST: round_date() returns NAs

In 2013, the switch from Central European Time (CET) to Central European Summer Time (CEST) took place on Sunday 2013-03-31. Clocks are advanced by one hour from 2am to 3pm, so basically there is no 2am.
start <- strptime("2013-03-31 01:00:00", format="%F %T", tz="CET")
times <- start + (0:5) * 60*15
times
[1] "2013-03-31 01:00:00 CET" "2013-03-31 01:15:00 CET"
[3] "2013-03-31 01:30:00 CET" "2013-03-31 01:45:00 CET"
[5] "2013-03-31 03:00:00 CEST" "2013-03-31 03:15:00 CEST"
Rounding the vector times to hours gives NAs. Even for times before 01:30, which aren't affected by the transition at all.
library(lubridate)
round_date(times, unit = "hour")
[1] "2013-03-31 01:00:00 CET" NA
[3] NA NA
[5] NA "2013-03-31 03:00:00 CEST"
This seems to be a bug, or am I missing something? I am running:
sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=German_Austria.1252 LC_CTYPE=German_Austria.1252
[3] LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
[5] LC_TIME=German_Austria.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.3.3
loaded via a namespace (and not attached):
[1] digest_0.6.4 memoise_0.2.1 plyr_1.8.1 Rcpp_0.11.2 stringr_0.6.2
It looks like the culprit is ceiling_date which is called by round_date:
ceiling_date(times,"hour")
[1] "2013-03-31 01:00:00 CET" NA
[3] NA NA
[5] NA "2013-03-31 04:00:00 CEST"
Looking at the code it works by adding 1 to the hour, thereby creating a non-existant time. It is definitely a bug.
base::round has support for times to do what you want though:
round(times,"hour")
[1] "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET"
[3] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
[5] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
It's an edge case and you could consider the behavior a bug. round_date uses ceiling_date and there this happens:
y <- floor_date(times - eseconds(1), "hour")
#[1] "2013-03-31 00:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 03:00:00 CEST"
hour(y) <- hour(y) + 1
#[1] "2013-03-31 01:00:00 CET" NA NA NA NA "2013-03-31 04:00:00 CEST"
As you see it tries to increment 2013-03-31 01:00:00 CET by one hour and doesn't deal correctly with the time zones.
The root issue is probably in the "hour<-" POSIXct S4 method.
This has been fixed in master:
> times <- ymd_hms("2013-03-31 01:00:00 CET", "2013-03-31 01:15:00 CEST",
+ "2013-03-31 01:30:00 CEST", "2013-03-31 01:45:00 CEST",
+ "2013-03-31 03:00:00 CEST", "2013-03-31 03:15:00 CEST",
+ tz = "Europe/Amsterdam")
> round_date(times, unit = "hour")
[1] "2013-03-31 01:00:00 CET" "2013-03-31 01:00:00 CET" "2013-03-31 03:00:00 CEST"
[4] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
> ceiling_date(times, unit = "hour")
[1] "2013-03-31 01:00:00 CET" "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST"
[4] "2013-03-31 03:00:00 CEST" "2013-03-31 03:00:00 CEST" "2013-03-31 04:00:00 CEST"

Resources