Create hourly intervals without regard to day-month-year in R

I have a list of dates like this:
"2014-01-20 18:47:09 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:59 GMT"
"2014-01-20 18:46:41 GMT"
I used this code to split the dates into four-hour intervals:
data.frame(table(cut(datenormord, breaks = "4 hour")))
Results are these:
2013-07-22 06:00:00 144
2013-07-22 11:00:00 268
2013-07-22 16:00:00 331
2013-07-22 21:00:00 332
What I want is the number of observations in each four-hour interval, ignoring days, months, and years. For example, I would like to see how many observations fall between 00:00 and 04:00, summed over every day of every year in my dataset.
The output should look something like this:
01:00:00 1230
06:00:00 2430
11:00:00 3230

You can strip the dates from your timestamps using strftime and then convert the result back to a date-time, which simply attaches the current year, month, and day to every data point. You can then cut and count as in your post.
datenormord<-c("2014-01-20 01:47:09 GMT", "2014-01-20 07:46:59 GMT","2014-01-20 13:46:59 GMT" ,"2014-01-20 18:46:59 GMT" ,"2014-01-20 18:46:41 GMT")
datenormord<-strftime(as.POSIXlt(datenormord), format="%H:%M:%S")
datenormord<-as.POSIXlt(datenormord, format="%H:%M:%S")
result<-data.frame(table(cut(datenormord, breaks = "4 hour")))
You can remove the date in the final data frame as well:
result$Var1<-with(result,format(strftime(Var1,format="%H:%M")))
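Note that cut() with breaks = "4 hour" starts its first bin at the hour of the earliest observation, so the bins are not necessarily aligned to midnight. If fixed bins starting at 00:00 are wanted, one possible sketch is to bin the hour of day directly. The sample timestamps, the GMT time zone, and the bin labels below are assumptions made up for illustration:

```r
# Hypothetical sample data spread over several days and years
datenormord <- as.POSIXct(c("2014-01-20 01:47:09", "2014-01-21 07:46:59",
                            "2014-02-03 13:46:59", "2015-06-20 18:46:59",
                            "2014-01-20 18:46:41"), tz = "GMT")

# Hour of day (0..23), which ignores day, month and year
hours <- as.POSIXlt(datenormord)$hour

# Fixed four-hour bins: [0,4), [4,8), ..., [20,24)
bins <- cut(hours, breaks = seq(0, 24, by = 4), right = FALSE,
            labels = sprintf("%02d:00", seq(0, 20, by = 4)))

# Count observations per bin; empty bins are kept with a count of 0
result <- as.data.frame(table(bins))
result
```

Each label names the start of its interval, so "16:00" covers 16:00 up to (but not including) 20:00.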

Related

Create a time series with a row every 15 minutes

I'm having trouble creating a time series (POSIXct or dttm column) with a row every 15 minutes.
Something that will look like this for every 15 minutes between Jan 1st 2015 and Dec 31st 2016 (here as month/day/year hour:minutes):
1/15/2015 0:00
1/15/2015 0:15
1/15/2015 0:30
1/15/2015 0:45
1/15/2015 1:00
Should I use a loop that starts at 01/01/2015 0:00 and keeps adding 15 minutes until 12/31/2016 23:45?
Does anyone have an idea of how this can be done easily?
This is a little easier to read:
library(lubridate)
seq(ymd_hm('2015-01-01 00:00'),ymd_hm('2016-12-31 23:45'), by = '15 mins')
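# Base R alternative. Note: 366 days from 2015-01-01 covers 2015 only;
# raise the day count and the cutoff to extend the range through 2016.
intervals.15.min <- 0:(366 * 24 * 60 * 60 / 15 / 60)
res <- as.POSIXct("2015-01-01", tz = "GMT") + intervals.15.min * 15 * 60
res <- res[res < as.POSIXct("2016-01-01", tz = "GMT")]  # cutoff in the same time zone as res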
head(res)
# "2015-01-01 00:00:00 GMT" "2015-01-01 00:15:00 GMT" "2015-01-01 00:30:00 GMT"
tail(res)
# "2015-12-31 23:15:00 GMT" "2015-12-31 23:30:00 GMT" "2015-12-31 23:45:00 GMT"
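For the full range asked about in the question, a base R call to seq() may be enough, without lubridate. This is a minimal sketch that assumes UTC is acceptable (a time zone with daylight saving would gain or lose an hour at the transitions); the variable name ts15 is made up for illustration:

```r
# Every 15 minutes from the start of 2015 to the end of 2016, in UTC
from <- as.POSIXct("2015-01-01 00:00", tz = "UTC")
to   <- as.POSIXct("2016-12-31 23:45", tz = "UTC")
ts15 <- seq(from, to, by = "15 min")

# 731 days (365 + 366) times 96 quarter-hours per day
length(ts15)
# 70176
```

The result can be placed directly in a data frame column, e.g. data.frame(time = ts15).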

seq.Date giving problems

I have recently updated R to version 3.2.3 and now I found a problem using seq with dates:
date1<-as.POSIXct("2014-01-30 02:00:00")
date2<-as.POSIXct("2014-12-24 11:00:00")
seq(date1,date2,by="month")
#[1] "2014-01-30 02:00:00 CET" "2014-03-02 02:00:00 CET"
#[3] NA "2014-04-30 02:00:00 CEST"
#[5] "2014-05-30 02:00:00 CEST" "2014-06-30 02:00:00 CEST"
#[7] "2014-07-30 02:00:00 CEST" "2014-08-30 02:00:00 CEST"
#[9] "2014-09-30 02:00:00 CEST" "2014-10-30 02:00:00 CET"
#[11] "2014-11-30 02:00:00 CET"
I don't understand where the NA comes from. I have tried on different machines with both the same R version as mine or a previous one and in the place of that NA they correctly give "2014-03-30". Furthermore, if I change the year in the dates from 2014 to 2015, no NAs are returned!
I guess that during the installation something in my locale was modified but I cannot understand how to fix the problem.
Sys.getlocale() returns:
"en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
and my system is a MacBook Pro running OS X Mavericks.
Thanks for any help!
I guessed that you are in Germany; here is what happened in the CET time zone on Mar 30, 2014 (a Sunday):
http://www.timetemperature.com/utc-converter/utc-20140330-germany-12.html
UTC or GMT Time                    Germany
Sunday 30th March 2014 00:00:00    Sun 01:00 AM
Sunday 30th March 2014 01:00:00    Sun 03:00 AM*
Changing the setting to Italy, I get the same result:
UTC or GMT Time                    Italy
Sunday 30th March 2014 00:00:00    Sun 01:00 AM
Sunday 30th March 2014 01:00:00    Sun 03:00 AM*
The key here is to be suspicious of weirdness when the time falls in the early morning hours of a spring or fall date, or when interval calculations cross such dates. The rules change from year to year, and since countries often switch on a Saturday or Sunday morning, the exact dates jump around.
The changes vary by country, and in the US they may vary by state or even by "sub-state" boundaries. In Washington State in 2014 the change happened on the second Sunday of March:
http://www.timetemperature.com/utc-converter/utc-20140309-us-washington+state-12.html
UTC or GMT Time                    US-Washington State
(several rows snipped)
Sunday 9th March 2014 07:00:00     Sat 11:00 PM
Sunday 9th March 2014 08:00:00     Sun 12:00 AM
Sunday 9th March 2014 09:00:00     Sun 01:00 AM
Sunday 9th March 2014 10:00:00     Sun 03:00 AM*
Sunday 9th March 2014 11:00:00     Sun 04:00 AM*
I'm in the same time zone as Washington State. With the system time zone (Sys.timezone) set accordingly, one can reproduce the NA, at least on a Mac. The implementation of times and time zones is OS-specific, so these oddities may surface differently on other systems:
> Sys.timezone(location = TRUE)
[1] "America/Los_Angeles"
> date1<-as.POSIXct("2014-01-09 02:00:00")
> date2<-as.POSIXct("2014-12-09 11:00:00")
> seq(date1,date2,by="month")
[1] "2014-01-09 02:00:00 PST" "2014-02-09 02:00:00 PST"
[3] NA "2014-04-09 02:00:00 PDT"
[5] "2014-05-09 02:00:00 PDT" "2014-06-09 02:00:00 PDT"
[7] "2014-07-09 02:00:00 PDT" "2014-08-09 02:00:00 PDT"
[9] "2014-09-09 02:00:00 PDT" "2014-10-09 02:00:00 PDT"
[11] "2014-11-09 02:00:00 PST" "2014-12-09 02:00:00 PST"
Inspecting the relevant code in seq.POSIXt, it appears that a call to seq with by="month" works as follows:
1. [some manipulation of the data]
2. date1 and date2 are converted to POSIXlt
3. a sequence of month numbers spanning the interval from date1 to date2 is created (in this case 0,...,11)
4. date1$mon is manually set to this sequence of months (up to this point the dates are all handled properly)
5. finally, the resulting dates are converted to POSIXct, and this is where the NA shows up
The resulting NA is technically correct, since the conversion is attempting to turn an invalid date ("2014-03-30 02:00:00 CET", which does not exist because the clocks jump from 02:00 to 03:00 that night) into POSIXct. Could the issue be worked around by going through difftimes? [*]
I am not sure it is worth it, though...
[*] By difftimes I mean adding the correct number of seconds to the dates instead of just incrementing the months.
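The steps described for seq.POSIXt can be imitated by hand. The sketch below is only an approximation of what the function does internally, and it assumes the timestamps can be handled in UTC, which has no daylight-saving gap, so the 02:00 time on March 30 exists and no NA is produced:

```r
# Imitate the by = "month" logic by hand, in UTC (assumption: UTC is acceptable)
r1 <- as.POSIXlt("2014-01-30 02:00:00", tz = "UTC")
r1$mon <- 0:11         # one entry per month, January to December; the day stays at 30
r1$isdst <- -1L        # let the conversion determine DST (there is none in UTC)
res <- as.POSIXct(r1)  # out-of-range dates such as "February 30" are normalised

format(res[2:3], "%Y-%m-%d %H:%M:%S")
# "2014-03-02 02:00:00" "2014-03-30 02:00:00"
any(is.na(res))
# FALSE
```

If local times are needed afterwards, the UTC result can be displayed in another zone with format(res, tz = ...), which cannot yield a nonexistent local time.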

as.Date converts wrong date from POSIXct data

I have 3848 rows of POSIXct data - stop times of bike trips in the month of April. As you can see, all of the data is in POSIXct format and is within the range of the month of April.
length(output2_stoptime)
[1] 3848
head(output2_stoptime)
[1] "2015-04-01 17:19:27 EST" "2015-04-02 07:26:06 EST" "2015-04-08 10:09:37 EST"
[4] "2015-04-12 20:08:00 EST" "2015-04-13 17:53:11 EST" "2015-04-14 07:17:34 EST"
class(output2_stoptime)
[1] "POSIXct" "POSIXt"
range(output2_stoptime)
[1] "2015-04-01 00:34:29 EST" "2015-04-30 20:49:22 EST"
Sys.timezone()
[1] "EST"
However, when I try converting this into a table of stop times per day, I get 4 dates that are converted as the 1st of May. I thought this might be occurring due to the different system timezone as I am located in Europe at the moment, but even after setting the timezone to EST, the problem persists. For example:
by_day_output2 = as.data.frame(as.Date(output2_stoptime), tz = "EST")
colnames(by_day_output2)[1] = "SUM"
movements_Apr = as.data.frame(table(by_day_output2$SUM))
colnames(movements_Apr)[1] = "DATE"
tail(movements_Apr)
DATE Freq
26 2015-04-26 96
27 2015-04-27 125
28 2015-04-28 145
29 2015-04-29 151
30 2015-04-30 99
31 2015-05-01 4
Why are the four dates converting improperly when the time zones of the data and the system match? None of the data falls within May.
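One likely cause, judging from the code shown: tz = "EST" is passed to as.data.frame rather than to as.Date, so as.Date falls back to its default of UTC for POSIXct input, and stop times after 19:00 EST land on the next UTC day. A minimal sketch with a single made-up timestamp (the variable name stop is hypothetical):

```r
# One late-evening stop time in EST (UTC-5)
stop <- as.POSIXct("2015-04-30 20:49:22", tz = "EST")

# Default conversion uses UTC, where it is already 01:49 on May 1
as.Date(stop)
# "2015-05-01"

# Passing the time zone to as.Date itself keeps the local calendar day
as.Date(stop, tz = "EST")
# "2015-04-30"
```

Applied to the question's code, this would mean writing as.data.frame(as.Date(output2_stoptime, tz = "EST")).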

How to subtract a number in a file name in as.POSIXct in R?

I have several files:
dir<- list.files("/data/test", "*.img$", full.names = TRUE)
dir:
/data/test/data.df_df_fg.20141231.jh.ds.0930.edfr.img
/data/test/data.df_df_fg.20141231.jh.ds.1030.edfr.img
/data/test/data.df_df_fg.20141231.jh.ds.1130.edfr.img
I want to extract the date from the file names:
dt <- as.POSIXct(strptime(basename(dir),"data.df_df_fg.%Y%m%d.jh.ds.%H%M.edfr", tz = "GMT"))
dt:
[1] "2014-12-31 09:30:00 GMT"
[2] "2014-12-31 10:30:00 GMT"
[3] "2014-12-31 11:30:00 GMT"
What I need is to subtract 1 hour from dt so that I get:
[1] "2014-12-31 08:30:00 GMT"
[2] "2014-12-31 09:30:00 GMT"
[3] "2014-12-31 10:30:00 GMT"
If the time is 2014-12-31 24:30:00 GMT, it should become 23:30:00 GMT and the date should also roll back to 2014-12-30, because we are then in the previous day.
Try:
dt-as.difftime(1,units="hours")
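Because POSIXct values are stored as seconds since the epoch, subtracting a difftime rolls the date back automatically when the result crosses midnight. A small sketch, where the 0030 file name is a made-up example added to show the rollover:

```r
# Two file names in the question's pattern; the 0030 one is hypothetical
files <- c("/data/test/data.df_df_fg.20141231.jh.ds.0030.edfr.img",
           "/data/test/data.df_df_fg.20141231.jh.ds.0930.edfr.img")

# Parse date and time out of the base names
dt <- as.POSIXct(strptime(basename(files),
                          "data.df_df_fg.%Y%m%d.jh.ds.%H%M.edfr", tz = "GMT"))

# Subtract one hour; 00:30 on Dec 31 becomes 23:30 on Dec 30
shifted <- dt - as.difftime(1, units = "hours")
format(shifted, "%Y-%m-%d %H:%M:%S")
# "2014-12-30 23:30:00" "2014-12-31 08:30:00"
```

Writing dt - 3600 is equivalent, since plain numbers are interpreted as seconds.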

R: Subsetting a time series by a specific hour of the day

I have a long time series (zoo), 'obs', with a one-hour time step and three years of data
> head(obs)
time obs
2009-12-22 01:00:00 23.708
2009-12-22 02:00:00 23.708
2009-12-22 03:00:00 23.708
2009-12-22 04:00:00 23.708
2009-12-22 06:00:00 23.708
2009-12-22 07:00:00 23.708
I am only interested in the 01:00:00 readings of each day and want to subset the series to just those. Is there any way to do it? I am already using the 'xts' package but couldn't find a way.
Try:
subset(obs, as.numeric(format(obs$time, "%H"))==1)
This extracts the hours from obs$time as a number from 00 to 23, and keeps only the times where it is equal to 1.
xts is the right package. What you are interested in is the function
[.xts (Extract subsets of xts Objects)
For example:
obs["T01:00/T01:59"]
will return all the observations where the "T" time is between 01:00 and 01:59.
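The hour-matching idea can be tried without any packages. The sketch below uses a plain data frame with made-up hourly data in UTC instead of a zoo or xts object; for a zoo or xts series, the timestamps would presumably come from index(obs) rather than a time column:

```r
# Hypothetical hourly series covering three days
obs <- data.frame(
  time = seq(as.POSIXct("2009-12-22 00:00:00", tz = "UTC"),
             by = "hour", length.out = 72),
  obs  = 23.708
)

# Keep only the rows whose hour of day is 01
at_one <- obs[format(obs$time, "%H") == "01", ]
format(at_one$time, "%Y-%m-%d %H:%M")
# "2009-12-22 01:00" "2009-12-23 01:00" "2009-12-24 01:00"
```

Comparing the formatted hour as a string ("01") avoids the numeric conversion used in the answer above; both give the same rows.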
