Below is my R code, which is behaving oddly. I expect a time of 22:00 as entered, but I get 23:00.
as.POSIXct(chron(dates="01/04/06",times="22:00:00"),tz="CET")
[1] "2006-01-04 23:00:00 CET"
In the next line of my code I use the result to select a window from an xts/zoo object, so simply working around the error by entering 21:00 above was not useful: it returns the wrong data. Windowing with the result of the code above returns the correct values.
head(qs<-as.zoo(window(Q,start=as.POSIXct(chron(dates="01/04/06",times="22:00:00"),tz="CET"),end=as.POSIXct(chron(dates="01/05/06",times="21:00:00"),tz="CET"))))
Here is a sample set of the data (Q):
Stage.Qm Flow.Qm Stage.QmDB Flow.QmDB Stage.Q1000 Flow.Q1000 Stage.Q1000DB Flow.Q1000DB
2006-01-04 23:00:00 541.1589 5.636957 541.1592 5.646017 541.5708 20.44692 541.5708 20.44692
2006-01-04 23:01:00 541.1589 5.637268 541.1592 5.645087 541.5701 20.41321 541.5701 20.41321
2006-01-04 23:02:00 541.1589 5.638604 541.1588 5.635806 541.5701 20.40946 541.5701 20.40946
2006-01-04 23:03:00 541.1589 5.638979 541.1588 5.635694 541.5704 20.42712 541.5704 20.42712
2006-01-04 23:04:00 541.1589 5.639619 541.1590 5.640691 541.5710 20.45848 541.5710 20.45848
2006-01-04 23:05:00 541.1590 5.640662 541.1591 5.641682 541.5715 20.47893 541.5715 20.4789
In the documentation you can read: "The current implementation of chron objects does not handle time zones nor daylight savings time." Thus, a solution would be to not use chron here.
Just call as.POSIXct directly on the character string, naming the format argument:
as.POSIXct("2006-01-04 22:00:00", format="%Y-%m-%d %H:%M:%S", tz="CET")
[1] "2006-01-04 22:00:00 CET"
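The one-hour shift is consistent with the chron value being treated as a UTC clock time and then displayed in CET. Here is a minimal base R sketch of that arithmetic, assuming this interpretation is right. chron itself is not called, and the variable names are only illustrative.

```r
# Assumption: the chron value behaves like a UTC clock time.
as_utc <- as.POSIXct("2006-01-04 22:00:00", tz = "UTC")

# Displaying the same instant in CET (UTC+1 in January) shows 23:00.
shown_in_cet <- format(as_utc, "%Y-%m-%d %H:%M:%S", tz = "CET")

# Parsing the string directly in CET keeps the wall-clock time as entered.
as_cet <- as.POSIXct("2006-01-04 22:00:00", tz = "CET")

# The two instants differ by exactly one hour.
gap_hours <- as.numeric(difftime(as_utc, as_cet, units = "hours"))
```

If the data in Q are indexed in CET wall-clock time, the directly parsed value (as_cet) is the one that matches the time as typed.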
I have a vector representing time such as:
1. 04.05.2003 23:00:00.000 GMT+0200
2. 04.05.2003 23:00:00.000 GMT+0200
3. 04.05.2003 23:30:00.000 GMT+0200
4. 04.05.2003 23:45:00.000 GMT+0200
I want to have a date variable representing the day and within-day variation, and I tried to convert it to a date variable in R using:
as.POSIXct(variable,format="%d.%b.%Y %H:%M:%OS %Z")
But I've got an empty cell. I can't figure out how to convert it.
Use lubridate::dmy_hms :
variable <- c('4.05.2003 23:00:00.000 GMT+0200','04.05.2003 23:15:00.000 GMT+0200', '04.05.2003 23:30:00.000 GMT+0200')
variable <- lubridate::dmy_hms(variable)
variable
#[1] "2003-05-04 21:00:00 UTC" "2003-05-04 21:15:00 UTC" "2003-05-04 21:30:00 UTC"
Not sure about the time zone but you are using %b where you should be using %m:
as.POSIXct(variable,format="%d.%m.%Y %H:%M:%OS")
# "2003-05-04 23:00:00 GMT"
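If the +0200 offset should be honoured rather than ignored, base R can probably do it too. This sketch assumes that strptime's %z input specifier is available on your platform and that every string carries a literal "GMT" before the offset. The names parsed and parsed_text are illustrative.

```r
variable <- c("04.05.2003 23:00:00.000 GMT+0200",
              "04.05.2003 23:30:00.000 GMT+0200")

# "GMT" is matched literally; %z consumes the signed "+0200" offset.
parsed <- as.POSIXct(variable,
                     format = "%d.%m.%Y %H:%M:%OS GMT%z",
                     tz = "UTC")

# Render the instants in UTC for inspection.
parsed_text <- format(parsed, "%Y-%m-%d %H:%M:%S")
```

This agrees with the lubridate result above: 23:00 at UTC+2 is 21:00 UTC.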
My actual data looks like:
8/8/2013 15:10
7/26/2013 10:30
7/11/2013 14:20
3/28/2013 16:15
3/18/2013 15:50
When I read this from the excel file, R reads it as:
41494.63
41481.44
41466.60
41361.68
41351.66
So I used as.POSIXct(as.numeric(x[1:5])*86400, origin="1899-12-30",tz="GMT") and I got:
2013-08-08 15:07:12 GMT
2013-07-26 10:33:36 GMT
2013-07-11 14:24:00 GMT
2013-03-28 16:19:12 GMT
2013-03-18 15:50:24 GMT
Why is there a difference in time? How can I overcome it?
The problem is that either R or Excel is rounding the number to two decimals. When you convert, for example, the cell containing 8/8/2013 15:10 to text formatting (in Excel on Mac OS X), you get the number 41494.63194.
When you use:
as.POSIXct(41494.63194*86400, origin="1899-12-30",tz="GMT")
it will give you:
[1] "2013-08-08 15:09:59 GMT"
This is 1 second off from the original date (which is also an indication that 41494.63194 is rounded to five decimals).
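To put numbers on the rounding: one hundredth of a day is 864 seconds, so a serial rounded to two decimals can be off by up to 432 seconds, about 7 minutes. The sketch below assumes the original times fall on whole minutes, and to_time_minute is a hypothetical helper. It shows the size of the error and a way to snap a five-decimal serial back to the nearest minute.

```r
serial_5dp <- 41494.63194           # value Excel shows as text
serial_2dp <- round(serial_5dp, 2)  # what R appears to have received

# Error introduced by two-decimal rounding, in seconds (under 432).
error_secs <- abs(serial_2dp - serial_5dp) * 86400

# Hypothetical helper: round to whole minutes before converting.
to_time_minute <- function(serial) {
  as.POSIXct(round(serial * 1440) * 60, origin = "1899-12-30", tz = "GMT")
}

snapped <- format(to_time_minute(serial_5dp), "%Y-%m-%d %H:%M:%S")
```

Snapping only helps when enough decimals survive. With two decimals the information is already lost.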
Probably the best solution is to export your Excel file to a .csv or a tab-separated .txt file and then read it into R. This at least gives me the correct dates:
> df
datum
1 8/8/2013 15:10
2 7/26/2013 10:30
3 7/11/2013 14:20
4 3/28/2013 16:15
5 3/18/2013 15:50
Given
x <- c("8/8/2013 15:10","7/26/2013 10:30","7/11/2013 14:20","3/28/2013 16:15","3/18/2013 15:50")
(which is read as a character vector),
try
x <- as.POSIXct(x, format = "%m/%d/%Y %H:%M", tz = "GMT")
It reads correctly as a POSIXct vector to me.
Maybe it is a matter of how R reads the data. Just an example here with lubridate seems to work well.
x <- "8/8/2013 15:10"
library(lubridate)
mdy_hm(x, tz = "GMT")
[1] "2013-08-08 15:10:00 GMT"
This is how it works over here on a Windows system. This is what a source Excel 2010 file looks like:
date              num         secs         constant    Rtime
(mm/dd/yyyy)      (in Excel)  (num*86400)  (Windows)   (secs-constant)
08/08/2013 15:10  41494.63    3585136200   2209161600  1375974600
07/26/2013 10:30  41481.44    3583996200   2209161600  1374834600
11/07/2013 14:20  41585.60    3592995600   2209161600  1383834000
03/28/2013 16:15  41361.68    3573648900   2209161600  1364487300
03/18/2013 15:50  41351.66    3572783400   2209161600  1363621800
Rtime <- c(1375974600,1374834600,1383834000,1364487300,1363621800)
as.POSIXct(Rtime,origin="1970-01-01",tz="GMT")
#[1] "2013-08-08 15:10:00 GMT" "2013-07-26 10:30:00 GMT"
#[3] "2013-11-07 14:20:00 GMT" "2013-03-28 16:15:00 GMT"
#[5] "2013-03-18 15:50:00 GMT"
Why this constant? Firstly, because Excel and Office generally is a mess when dealing with dates. Seriously, look over here: Why is 1899-12-30 the zero date in Access / SQL Server instead of 12/31?
2209161600 is the difference in seconds between the POSIXct start of 1970-01-01 and 1899-12-30, which is the 0 point in Excel on Windows.
dput(as.POSIXct(2209161600,origin="1899-12-30",tz="GMT"))
#structure(0, tzone = "GMT", class = c("POSIXct", "POSIXt"))
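The constant can be derived instead of memorised. This is a small sketch, in which excel_serial is an illustrative full-precision serial built from the 15:10 time rather than the two-decimal value shown in the table.

```r
# Days between the Excel (Windows) zero date and the POSIXct epoch.
offset_days <- as.numeric(as.Date("1970-01-01") - as.Date("1899-12-30"))
offset_secs <- offset_days * 86400

# Illustrative serial for 2013-08-08 15:10: day 41494 plus 910 minutes.
excel_serial <- 41494 + (15 * 60 + 10) / 1440

# Seconds since 1970-01-01, rounded to remove floating-point noise.
rtime <- round(excel_serial * 86400) - offset_secs

result <- format(as.POSIXct(rtime, origin = "1970-01-01", tz = "GMT"),
                 "%Y-%m-%d %H:%M:%S")
```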
I am trying to convert a character string into a POSIXct date format and running into a problem with the time zone information.
The original character data looks like this:
SD$BGN_DTTM
[1] "1956-05-25 14:30:00 CST" "1956-06-05 16:30:00 CST" "1956-07-04 15:30:00 CST"
[4] "1956-07-08 08:00:00 CST" "1956-08-19 12:00:00 CST" "1956-12-23 00:50:00 CST"
but when I attempt to convert using as.POSIXct , this happens:
SD$BGN_DTTM <- as.POSIXct(SD$BGN_DTTM)
[1] "1956-05-25 14:30:00 PDT" "1956-06-05 16:30:00 PDT" "1956-07-04 15:30:00 PDT"
[4] "1956-07-08 08:00:00 PDT" "1956-08-19 12:00:00 PDT" "1956-12-23 00:50:00 PST"
It looks like the function isn't reading the time zone I've specified. Since my computer is on PDT, it looks like it has used that instead. Note also that it has appended PST to the last date (seems odd). Can anyone tell me what is going on here, and whether there is a method to get R to read the time zone information as shown?
This would still have the problem you noticed with daylight/standard times (here test is assumed to hold the character vector from the question):
> strptime(test, format="%Y-%m-%d %H:%M:%S", tz="America/Chicago")
[1] "1956-05-25 14:30:00 CDT" "1956-06-05 16:30:00 CDT"
[3] "1956-07-04 15:30:00 CDT" "1956-07-08 08:00:00 CDT"
[5] "1956-08-19 12:00:00 CDT" "1956-12-23 00:50:00 CST"
The strptime function refuses to honor the "%Z" format for input (which in its defense is documented.) Many people have lost great gobs of hair and probably some keyboards into monitors in efforts to get R timezones working to their (dis?)satisfaction.
As we all know, time is a relative thing. Storing time as UTC/GMT or relative to UTC/GMT will make sure that daylight savings etc only come into play when you want them to, as per: Does UTC observe daylight saving time?
So, if:
x <- c("1956-05-25 14:30:00 CST","1956-06-05 16:30:00 CST", "1956-07-04 15:30:00 CST",
"1956-07-08 08:00:00 CST", "1956-08-19 12:00:00 CST","1956-12-23 00:50:00 CST")
You can find out that CST is 6 hours behind UTC/GMT (as opposed to CDT, which is daylight saving time and is 5 hours behind).
Therefore:
out <- as.POSIXct(x,tz="Etc/GMT+6")
will represent CST without any daylight savings shift to CDT.
That way when or if you convert to local central timezones, the proper CST time will be returned without changing the actual data for daylight savings. (i.e. - when R prints CDT, it is only shifting the display of the time forward an hour, but the underlying numerical data is not changed. The last case displays as expected when standard time kicks back in):
attr(out,"tzone") <- "America/Chicago"
out
#[1] "1956-05-25 15:30:00 CDT" "1956-06-05 17:30:00 CDT" "1956-07-04 16:30:00 CDT"
#[4] "1956-07-08 09:00:00 CDT" "1956-08-19 13:00:00 CDT" "1956-12-23 00:50:00 CST"
I.e. - for case 1, 15:30 CDT == 14:30 CST - as originally specified, and when daylight savings stops, for case 6, 00:50 CST == 00:50 CST as originally specified.
Comparing this final out to the other answer, you can see there is an actual numerical time difference of one hour for all the daylight savings cases:
out - strptime(x, format="%Y-%m-%d %H:%M:%S", tz="America/Chicago")
#Time differences in secs
#[1] 3600 3600 3600 3600 3600 0
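The sign in the "Etc/GMT+6" zone name follows the POSIX convention, which is the reverse of what one might expect: +6 means six hours behind UTC. This is a minimal check, assuming the system time zone database provides that zone. Stripping the " CST" suffix is optional and is done here only to make the parsing explicit.

```r
x <- c("1956-05-25 14:30:00 CST", "1956-12-23 00:50:00 CST")

# Drop the trailing abbreviation, which is not parsed on input anyway.
stamp <- sub(" CST$", "", x)

# Fixed offset of six hours behind UTC, with no daylight saving rules.
out <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "Etc/GMT+6")

# The same instants expressed in UTC are six hours later on the clock.
in_utc <- format(out, "%Y-%m-%d %H:%M", tz = "UTC")
```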
I'm still trying to better understand how mlply works. Here is a simplified version of my dataset:
days <- list(c(as.POSIXct("2010-08-29 00:00:00 EDT"), as.POSIXct("2010-08-30 00:00:00 EDT")))
day2 <- list(c(as.POSIXct("2010-07-22 00:00:00 EDT"), as.POSIXct("2010-07-23 00:00:00 EDT"), as.POSIXct("2010-07-24 00:00:00 EDT")))
days <- append(day2, days)
arrivals <- data.frame(date=as.POSIXct("2010-08-29 21:00:00 EDT"), size=72)
arrivals <- rbind(arrivals, c("2010-07-22 17:30:00 EDT",84))
Using mapply with pmax to pick the maximum between days and arrivals, I get the following:
starting <- mapply(function(x,y){pmax(x,y)},days,arrivals$date)
starting[[1]]
"2010-08-29 21:00:00 EDT" "2010-08-29 21:00:00 EDT" "2010-08-29 21:00:00 EDT"
I'm sure the next version using mlply is not the equivalent and is obviously my error but, I'm not quite sure why the output differs.
starts <- mlply( cbind(arrivals$date,days), function(date,days){pmax(date,days)})
as.POSIXct(starts[[1]], origin='1970-1-1')
[1] "2010-08-30 02:00:00 EDT" "2010-08-30 02:00:00 EDT" "2010-08-30 02:00:00 EDT"
Ideally, I'm looking how to rewrite the mapply statement using mlply.
Thanks in advance,
--JT
Compare
> starts[[1]]
[1] 1283112000 1283112000 1283112000
> as.numeric(starting[[1]])
[1] 1283112000 1283112000 1283112000
>
POSIXct times are stored relative to UTC/GMT. The output in your example appears to be 5 hours ahead. This is an output issue: internally they appear to be the same times. Further comment is difficult, since it depends on which OS you are running, and it could also be affected by your locale settings.
Also, the question "as.POSIXct gives an unexpected timezone" suggests there may be an issue with as.POSIXct.date, but I'm not sure whether this is still the case.
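One way to sidestep the display shift is to state the time zone explicitly when turning the bare numbers back into date-times. This is a base R sketch, not an mlply rewrite. It assumes that "America/New_York" is the intended zone behind "EDT" and that the numeric values are seconds since 1970-01-01 UTC, as the matching numbers above suggest.

```r
tz_name <- "America/New_York"  # assumed zone for the "EDT" labels
arrival <- as.POSIXct("2010-08-29 21:00:00", tz = tz_name)
days <- as.POSIXct(c("2010-08-29 00:00:00", "2010-08-30 00:00:00"),
                   tz = tz_name)

# Compare the underlying seconds, which carry no class or time zone.
secs <- pmax(as.numeric(days), as.numeric(arrival))

# Restore the class against a UTC origin, then choose the display zone.
restored <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
attr(restored, "tzone") <- tz_name

restored_text <- format(restored, "%Y-%m-%d %H:%M:%S")
```

Fixing both the origin's zone and the display zone means the result no longer depends on the machine's local settings.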
I have an irregular time series and am using xts's endpoints to get hourly indices of my time series.
endpoints(data, on="hours")
I am using this to calculate hourly means, as follows:
period.apply(data, INDEX=endpoints(data, on="hours"), FUN=mean)
The problem, however, is that function endpoints returns two consecutive indices (thus for the same hour).
> endpoints(data, on="hours")[7201:7220]
[1] 87077 87078 87089 87101 87113 87125 87137 87149 87162 87175 87187 87199 87211 87223 87235 87247 87259 87271 87283 87295
If we take a look which datetimes they represent:
data[endpoints(data, on="hours")[7201:7220]]
we notice that these are
> data[endpoints(data, on="hours")[7201:7220]]
jstimestamp X61757 X61754 X61760 X61753 X61758 X61762 X61756 X61759 X61761 X61755 X61752
2007-10-28 01:55:00 1.193529e+12 938.7 1339.6 450.8 799.4 850.0 1653.6 622.3 159.6 4415.4 681.6 1421.0
2007-10-28 02:00:00 1.193530e+12 946.0 1326.3 437.8 799.9 829.3 1644.1 629.0 182.8 4413.7 688.5 1397.2
2007-10-28 02:55:00 1.193533e+12 916.4 1337.0 432.3 778.0 838.6 1581.5 616.8 166.0 4282.8 670.9 1361.8
2007-10-28 03:55:00 1.193540e+12 909.1 1273.8 446.9 765.4 836.2 1559.7 599.5 163.8 4191.2 667.9 1373.3
2007-10-28 04:55:00 1.193544e+12 930.8 1320.3 426.4 758.3 834.8 1567.5 594.0 152.7 4130.2 688.4 1377.3
2007-10-28 05:55:00 1.193547e+12 943.5 1355.1 447.7 784.6 856.9 1592.4 629.0 163.8 4150.3 686.2 1391.5
2007-10-28 06:55:00 1.193551e+12 1018.3 1443.2 463.7 841.0 877.1 1677.3 670.8 161.8 4310.8 708.9 1441.3
2007-10-28 07:55:00 1.193554e+12 1052.2 1525.7 472.5 887.7 903.6 1734.9 716.1 199.5 4390.9 722.7 1504.3
2007-10-28 08:52:34 1.193558e+12 1167.1 1570.3 519.2 933.0 957.5 1919.9 795.0 225.5 4706.4 733.0 1561.1
2007-10-28 09:55:00 1.193562e+12 1224.1 1653.4 547.2 992.1 1039.9 2053.5 797.2 217.9 4952.1 739.4 1610.6
2007-10-28 10:55:00 1.193565e+12 1233.4 1745.9 569.8 1038.1 1060.8 2145.3 778.0 231.6 5182.4 759.1 1621.5
2007-10-28 11:55:00 1.193569e+12 1217.8 1751.6 581.3 1056.6 1084.2 2177.6 791.0 246.6 5296.4 758.6 1642.0
2007-10-28 12:55:00 1.193572e+12 1212.5 1786.3 589.2 1034.4 1069.2 2191.2 784.4 242.2 5357.5 728.5 1670.8
2007-10-28 13:55:00 1.193576e+12 1200.1 1694.8 586.1 1059.3 1063.2 2174.2 773.3 248.6 5336.7 747.8 1650.6
2007-10-28 14:55:00 1.193580e+12 1188.1 1736.7 577.7 1049.9 1041.1 2168.4 771.5 233.6 5332.9 746.9 1651.5
2007-10-28 15:55:00 1.193583e+12 1187.9 1696.8 574.1 1056.9 1060.4 2152.6 790.4 255.8 5326.9 740.6 1653.0
2007-10-28 16:55:00 1.193587e+12 1250.3 1793.2 580.8 1048.1 1116.8 2232.6 810.2 257.1 5360.4 765.6 1688.4
2007-10-28 17:55:00 1.193590e+12 1325.9 1796.0 614.7 1148.7 1134.2 2368.6 816.9 301.6 5530.3 772.7 1673.1
2007-10-28 18:55:00 1.193594e+12 1433.2 1966.3 697.8 1183.6 1276.2 2615.3 928.2 324.1 5805.4 762.7 1853.9
2007-10-28 19:55:00 1.193598e+12 1436.2 1906.2 678.5 1196.9 1217.9 2575.4 882.2 337.1 5809.5 789.7 1852.5
The problem is that hour 2007-10-28 02 is represented twice. My understanding of hourly endpoints is that this should not happen. Am I doing something wrong here?
Edit: This was indeed daylight saving time in effect, as per Dirk Eddelbuettel's answer below. To resolve the problem, I needed to:
1. Convert to time zone UTC when parsing the data (the default time zone applied was my machine's, CET).
2. After calculating hourly means as
data.hourly = period.apply(data, INDEX=endpoints(data, on="hours"), FUN=mean)
manually override the time zone of data.hourly as well (again, the computer's default time zone was used):
indexTZ(data.hourly) <- "UTC"
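The effect of choosing UTC can be seen without xts. The following base R sketch uses synthetic 5-minute data (stamps and values are made up for illustration) and groups by an hour label with tapply. It stands in for the period.apply call and does not reproduce it.

```r
# Four hours of synthetic 5-minute readings spanning the 2007 fall-back.
stamps <- as.POSIXct("2007-10-27 23:00:00", tz = "UTC") +
  seq(0, by = 300, length.out = 48)
values <- seq_along(stamps)

# Hour labels in UTC are unique per real hour...
hour_utc <- format(stamps, "%Y-%m-%d %H", tz = "UTC")
# ...whereas local labels repeat "02" for two different real hours.
hour_local <- format(stamps, "%Y-%m-%d %H", tz = "Europe/Berlin")

hourly_mean_utc <- tapply(values, hour_utc, mean)
hourly_mean_local <- tapply(values, hour_local, mean)
```

Grouping by the UTC label yields four hourly means. Grouping by the local label merges two real hours into one.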
Just a hunch, but this may be yet another manifestation of TZ and a switch between daylight saving time and standard time. Try to convert to UTC and see what happens.
Alternatively, move your data by a week (or month or ...) and see if the same issue arises.
And if that's the issue, then this is not really a bug, as the 'fall back' night does indeed have two hours starting at 02:00am.
Edit: And while you didn't say what your timezone was, and while it doesn't work in mine (US Central), we can exhibit the issue for the European continent:
R> ISOdate(2007, 10, 28, 0, 30, 0, tz="Europe/Berlin") + seq(0,4)*60*60
[1] "2007-10-28 00:30:00 CEST" "2007-10-28 01:30:00 CEST"
[3] "2007-10-28 02:30:00 CEST" "2007-10-28 02:30:00 CET"
[5] "2007-10-28 03:30:00 CET"
R>
See how the TZ attribute switches from CEST to CET, and how there are in fact two 02:30:00 times, one for each of the two time zones.
So no bug in xts but a feature in the data.
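A quick way to check whether a given set of timestamps is affected is to look for repeated local hour labels. This is a small base R sketch using the same Berlin example. The zone name is an assumption about the data and should be replaced with your own.

```r
# Five hourly instants, defined in UTC so that each one is unambiguous.
stamps <- as.POSIXct("2007-10-27 22:30:00", tz = "UTC") + (0:4) * 3600

# Wall-clock hour labels as seen in Berlin and in UTC.
local_hours <- format(stamps, "%Y-%m-%d %H", tz = "Europe/Berlin")
utc_hours <- format(stamps, "%Y-%m-%d %H", tz = "UTC")

# Number of repeated labels: one in local time, none in UTC.
n_dup_local <- sum(duplicated(local_hours))
n_dup_utc <- sum(duplicated(utc_hours))
```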