Time series in R for every couple of hours?

Time   Customer Count
11:00  13
13:00  25
15:00  22
17:00  21
19:00  15
21:00  10
I have the above data frame for the number of customers coming into a small shop from 11:00 (11am) to 21:00 (9pm).
I need to make a time series of this data but I'm having trouble declaring the ts function for this data. There is only data for 10 hours from 11 to 9 and it's only taken every two hours. I can't decide how to declare the frequency.
If anyone could help, I would be really grateful. Thanks in advance.

One option is to create a sequence at 2-hour intervals and then filter it to the required time range:
library(lubridate)
v <- seq(from = as.POSIXct("2017-01-01 11:00"),
to = as.POSIXct("2017-01-05 21:00"), by = "2 hour")
v[hour(v)>=9 & hour(v)<=21]
# [1] "2017-01-01 11:00:00 GMT" "2017-01-01 13:00:00 GMT" "2017-01-01 15:00:00 GMT"
# [4] "2017-01-01 17:00:00 GMT" "2017-01-01 19:00:00 GMT" "2017-01-01 21:00:00 GMT"
# [7] "2017-01-02 09:00:00 GMT" "2017-01-02 11:00:00 GMT" "2017-01-02 13:00:00 GMT"
# [10] "2017-01-02 15:00:00 GMT" "2017-01-02 17:00:00 GMT" "2017-01-02 19:00:00 GMT"
# [13] "2017-01-02 21:00:00 GMT" "2017-01-03 09:00:00 GMT" "2017-01-03 11:00:00 GMT"
# [16] "2017-01-03 13:00:00 GMT" "2017-01-03 15:00:00 GMT" "2017-01-03 17:00:00 GMT"
# so on few more rows.
The seq call above generates a series of time stamps from 1 January to 5 January at 2-hour intervals. The filter is applied once the series has been generated and keeps only the date-times for which hour >= 9 and hour <= 21. This gives the desired series of time stamps.
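To tie the time stamps back to the counts in the question, here is a minimal sketch. It assumes the six readings belong to a single day and that one trading day is treated as one seasonal cycle, hence frequency = 6. The date 2017-01-01 and the names counts, times, shop and shop_ts are illustrative and do not come from the original post.

```r
# Counts from the question: one observation every two hours, 11:00 to 21:00
counts <- c(13, 25, 22, 21, 15, 10)

# Time stamps for a single (assumed) day; the date itself is arbitrary
times <- seq(from = as.POSIXct("2017-01-01 11:00", tz = "UTC"),
             to   = as.POSIXct("2017-01-01 21:00", tz = "UTC"),
             by   = "2 hours")

# Keep the time stamps next to the counts
shop <- data.frame(time = times, customers = counts)

# Assumption: one cycle is one trading day, i.e. 6 observations per cycle
shop_ts <- ts(shop$customers, frequency = 6)
```

With several days of data, the same frequency would let each day count as one period. Whether that is the right seasonal period depends on the analysis.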

Related

Generate an ordered series of datetime

I am working in R.
I have to generate a series of dates and times. In particular, I would like to have two data points per day, hence to assign twice each date with a different time, for instance:
"2001-05-13 00:00:00"
"2001-05-13 12:00:00"
"2001-05-14 00:00:00"
"2001-05-14 12:00:00"
I found the following code to produce a series of dates:
seq(as.Date("2000/1/1"), as.Date("2003/1/1"), by = 0.5)
Nevertheless, even if I set by = 0.5, the code returns only a date, not a datetime.
Any idea how to produce a series of datetimes?
as.Date will produce only dates; use as.POSIXct to produce date-times.
seq(as.POSIXct("2000-01-01 00:00:00", tz = 'UTC'),
as.POSIXct("2003-01-01 00:00:00", tz = 'UTC'), by = '12 hours')
# [1] "2000-01-01 00:00:00 UTC" "2000-01-01 12:00:00 UTC"
# [3] "2000-01-02 00:00:00 UTC" "2000-01-02 12:00:00 UTC"
# [5] "2000-01-03 00:00:00 UTC" "2000-01-03 12:00:00 UTC"
# [7] "2000-01-04 00:00:00 UTC" "2000-01-04 12:00:00 UTC"
# [9] "2000-01-05 00:00:00 UTC" "2000-01-05 12:00:00 UTC"
#[11] "2000-01-06 00:00:00 UTC" "2000-01-06 12:00:00 UTC"
#[13] "2000-01-07 00:00:00 UTC" "2000-01-07 12:00:00 UTC"
#...
#...
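If the number of points is known rather than the end date, seq also accepts length.out in place of the second time stamp. The sketch below reproduces the four values from the question; the variable name x and the choice of UTC are assumptions made for the example.

```r
# Two stamps per day starting at 2001-05-13 00:00, four values in total
x <- seq(as.POSIXct("2001-05-13 00:00:00", tz = "UTC"),
         by = "12 hours", length.out = 4)

# An explicit format always prints the time part, even for midnight values
format(x, "%Y-%m-%d %H:%M:%S")
```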

Read character datetimes without timezones

I am trying to import in R a text file including datetimes. Times are stored in character format, without timezone information, but we know it is French time (Europe/Paris).
An issue arises on the days of a timezone change: e.g. the clocks go back from 2018-10-28 03:00:00 CEST to 2018-10-28 02:00:00 CET, so we have duplicates in our character format, and R cannot tell whether a given time is CEST or CET.
Consider the following example:
data_in <- "date,val
2018-10-28 01:30:00,25
2018-10-28 02:00:00,26
2018-10-28 02:30:00,27
2018-10-28 02:00:00,28
2018-10-28 02:30:00,29
2018-10-28 03:00:00,30"
library(readr)
data <- read_delim(data_in, ",", locale = locale(tz = "Europe/Paris"))
We end up having duplicates in our dates:
data$date
[1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CET" "2018-10-28 02:00:00 CEST"
[5] "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
Expected output would be:
data$date
[1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CEST" "2018-10-28 02:00:00 CET"
[5] "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
Any idea how to solve the issue (besides telling people to use UTC or ISO formats)? I guess the only way is to assume the dates are sorted, so we can tell that the first ones are CEST.
If you are certain that your time is always increasing, then you can look for an apparent decrease (of time-of-day), manually append the UTC offset to the string, and then parse as usual. I added some logic to look for this decrease only around 2-3am so that if you have multiple days of data spanning midnight, you would not get a false alarm.
library(dplyr) # for cumany()
data <- read.csv(text = data_in)
fakedate <- as.POSIXct(gsub("^[-0-9]+ ", "2000-01-01 ", data$date))
decreases <- cumany(grepl(" 0[23]:", data$date) & c(FALSE, diff(fakedate) < 0))
data$date <- paste(data$date, ifelse(decreases, "+0100", "+0200"))
data
# date val
# 1 2018-10-28 01:30:00 +0200 25
# 2 2018-10-28 02:00:00 +0200 26
# 3 2018-10-28 02:30:00 +0200 27
# 4 2018-10-28 02:00:00 +0100 28
# 5 2018-10-28 02:30:00 +0100 29
# 6 2018-10-28 03:00:00 +0100 30
as.POSIXct(data$date, format="%Y-%m-%d %H:%M:%S %z", tz="Europe/Paris")
# [1] "2018-10-28 01:30:00 CEST" "2018-10-28 02:00:00 CEST" "2018-10-28 02:30:00 CEST"
# [4] "2018-10-28 02:00:00 CET" "2018-10-28 02:30:00 CET" "2018-10-28 03:00:00 CET"
My use of "2000-01-01" was just some non-DST day so that we can parse the timestamp into POSIXt and calculate a diff on it. (If we didn't insert a date, we could still use as.POSIXct with a format, but if you ever ran this on one of the two DST days, you might get different results, since as.POSIXct("01:02:03", format="%H:%M:%S") always assumes "today".)
This is obviously a bit fragile with its assumptions, but perhaps it'll be good enough for what you need.
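The same idea can be sketched in base R only, with cumsum(...) > 0 standing in for dplyr's cumany. This is a variation under the same assumptions as above (sorted input, a single autumn change in the data), not a more robust method. The names stamps, tod, step_back, after_change and parsed are illustrative.

```r
stamps <- c("2018-10-28 01:30:00", "2018-10-28 02:00:00", "2018-10-28 02:30:00",
            "2018-10-28 02:00:00", "2018-10-28 02:30:00", "2018-10-28 03:00:00")

# Time of day only (characters 12 to 19), pinned to an arbitrary non-DST date in UTC
tod <- as.POSIXct(paste("2000-01-01", substr(stamps, 12, 19)), tz = "UTC")

# TRUE where the clock appears to step backwards during the 02:xx or 03:xx hour
step_back <- grepl(" 0[23]:", stamps) & c(FALSE, diff(as.numeric(tod)) < 0)

# Everything from the first backward step onward is taken to be CET (+0100)
after_change <- cumsum(step_back) > 0

# Append the assumed offset and parse it with %z
parsed <- as.POSIXct(paste(stamps, ifelse(after_change, "+0100", "+0200")),
                     format = "%Y-%m-%d %H:%M:%S %z", tz = "Europe/Paris")
```

A quick sanity check is that diff(parsed) is a constant 30 minutes, which would not hold if the repeated 02:00 and 02:30 values were given the wrong offset.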

Create a time series by 30 minute intervals

I am trying to create a time series with 30 min intervals. I used the following command with the output also shown:
ts = seq(as.POSIXct("2009-01-01 00:00"), as.POSIXct("2014-12-31 23:30"),by = "hour")
"2010-02-21 12:00:00 EST" "2010-02-21 13:00:00 EST" "2010-02-21 14:00:00 EST"
When I change it to by ="min" it changes to be every minute.
How do I create a time series with every 30 minute intervals?
You can specify minutes in the by argument, and pass the time zone "UTC" as Adrian pointed out. Check ?seq.POSIXt for more details about the by argument specified as a character string:
A character string, containing one of "sec", "min", "hour", "day",
"DSTday", "week", "month", "quarter" or "year". This can optionally be
preceded by a (positive or negative) integer and a space, or followed
by "s".
ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
as.POSIXct("2017-01-02", tz = "UTC"),
by = "30 min")
head(ts)
Output
[1] "2017-01-01 00:00:00 UTC"
[2] "2017-01-01 00:30:00 UTC"
[3] "2017-01-01 01:00:00 UTC"
[4] "2017-01-01 01:30:00 UTC"
[5] "2017-01-01 02:00:00 UTC"
[6] "2017-01-01 02:30:00 UTC"
Default units are seconds. So just do 1800 seconds to get 30 minutes.
ts = seq(as.POSIXct("2009-01-01 00:00"), as.POSIXct("2014-12-31 23:30"),by = 1800)
ts[1:20]
[1] "2009-01-01 00:00:00 EST" "2009-01-01 00:30:00 EST" "2009-01-01 01:00:00 EST" "2009-01-01 01:30:00 EST" "2009-01-01 02:00:00 EST"
[6] "2009-01-01 02:30:00 EST" "2009-01-01 03:00:00 EST" "2009-01-01 03:30:00 EST" "2009-01-01 04:00:00 EST" "2009-01-01 04:30:00 EST"
[11] "2009-01-01 05:00:00 EST" "2009-01-01 05:30:00 EST" "2009-01-01 06:00:00 EST" "2009-01-01 06:30:00 EST" "2009-01-01 07:00:00 EST"
[16] "2009-01-01 07:30:00 EST" "2009-01-01 08:00:00 EST" "2009-01-01 08:30:00 EST" "2009-01-01 09:00:00 EST" "2009-01-01 09:30:00 EST"
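The two answers describe the same step in different units. A small check, assuming UTC and an arbitrary one-day range (the names by_string and by_seconds are made up for the example):

```r
start <- as.POSIXct("2017-01-01", tz = "UTC")
end   <- as.POSIXct("2017-01-02", tz = "UTC")

# Step given as a character string
by_string  <- seq(start, end, by = "30 min")

# Step given as a number of seconds
by_seconds <- seq(start, end, by = 1800)

# Both calls should produce the same time stamps
all(by_string == by_seconds)
```

The string form is easier to read, while the numeric form is handy when the step is computed elsewhere in the code.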

Create sequence of dates and times in R without time zones

I need to create a sequence of dates and times in R, increasing in 15 minute periods.
Currently, I am doing this:
datestimes=seq(as.POSIXlt("2011-01-01 00:00:00"), as.POSIXlt("2015-09-30 23:45:00"), by="15 min")
I should have one reading for each time in the year. The problem is that since it is adjusting for BST, I get two values for certain dates in October.
anm=aggregate(datestimes, by=list(datestimes$datestimes), FUN=length)
anm[which(anm$datestimes>1),]
Group.1 datestimes X.Date.
28993 2011-10-30 01:00:00 2 2
28994 2011-10-30 01:15:00 2 2
28995 2011-10-30 01:30:00 2 2
28996 2011-10-30 01:45:00 2 2
63933 2012-10-28 01:00:00 2 2
63934 2012-10-28 01:15:00 2 2
63935 2012-10-28 01:30:00 2 2
63936 2012-10-28 01:45:00 2 2
98873 2013-10-27 01:00:00 2 2
98874 2013-10-27 01:15:00 2 2
98875 2013-10-27 01:30:00 2 2
98876 2013-10-27 01:45:00 2 2
133813 2014-10-26 01:00:00 2 2
133814 2014-10-26 01:15:00 2 2
133815 2014-10-26 01:30:00 2 2
133816 2014-10-26 01:45:00 2 2
I tried using the as.chron command since this does not use timezones, but it will not allow increments of 15 minutes which is what I need.
The problem is that since it is adjusting for BST, I get two values for certain dates in October.
That's because the 'fall back' (mnemonic for the daylight saving time adjustment adding an hour in the fall) happens under human time, and that is what you get by default unless you override it.
R> seq(as.POSIXlt("2012-10-28 00:00:00", tz="UTC"),
+ as.POSIXlt("2012-10-28 03:00:00", tz="UTC"), by="15 min")
[1] "2012-10-28 00:00:00 UTC" "2012-10-28 00:15:00 UTC"
[3] "2012-10-28 00:30:00 UTC" "2012-10-28 00:45:00 UTC"
[5] "2012-10-28 01:00:00 UTC" "2012-10-28 01:15:00 UTC"
[7] "2012-10-28 01:30:00 UTC" "2012-10-28 01:45:00 UTC"
[9] "2012-10-28 02:00:00 UTC" "2012-10-28 02:15:00 UTC"
[11] "2012-10-28 02:30:00 UTC" "2012-10-28 02:45:00 UTC"
[13] "2012-10-28 03:00:00 UTC"
R>
The example I show here covers the same subset as above but without the fall back, as we now impose UTC as the time zone. And UTC has, by construction, no daylight saving adjustment.
Maybe try this (the UTC time zone should not produce any duplicates):
datestimes=seq(as.POSIXlt("2015-09-01 00:00:00", tz="UTC"),
as.POSIXlt("2015-10-30 23:45:00", tz="UTC"),
by="15 min")
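One way to see the difference is to count repeated wall-clock labels for the same range in both zones. This is only a sketch: it assumes the original data was in Europe/London (the question mentions BST), and the helper name label is hypothetical.

```r
utc <- seq(as.POSIXct("2012-10-28 00:00", tz = "UTC"),
           as.POSIXct("2012-10-28 03:00", tz = "UTC"), by = "15 min")

london <- seq(as.POSIXct("2012-10-28 00:00", tz = "Europe/London"),
              as.POSIXct("2012-10-28 03:00", tz = "Europe/London"), by = "15 min")

# Wall-clock label without any zone information, as it would appear in a plain text column
label <- function(x) format(x, "%Y-%m-%d %H:%M")

# Number of labels that repeat an earlier one
sum(duplicated(label(utc)))
sum(duplicated(label(london)))
```

The London sequence is longer because it really does cover an extra hour of elapsed time; the labels for that hour simply repeat.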

Generate a working day sequence in R

I want to generate a working week / working day sequence (Monday-Friday; 8am - 5pm) in R. However I only figured out how to extract a working week (Monday-Friday) with 24 hours.
library(timeDate)
start <- as.POSIXct("2010-01-01")
interval <- 60
seq_1 <- as.timeDate(seq(from=start, by=interval*60, length.out = 200))
seq_2 <- seq_1[isWeekday(seq_1)]; seq_2
dayOfWeek(seq_2)
Is there a similar function which can extract only working hours? Thanks
You can use function format to obtain hours
seq_2[as.numeric(format(seq_2,'%H')) %in% 8:17 ] # hours 8 to 17 keep the 08:00-17:00 stamps
Select weekdays and then repeat with frequency equal to the desired hours. I'm afraid I missed your 8 o'clock start and used the phrase "9 to 5" as my guide:
twoyears <- seq.Date(as.Date("2010-01-01"), by='day', length.out=365*2)
twoworkyrs <- twoyears[isWeekday(twoyears, wday = 1:5)]
twoworkyrs[ 1:10]
# [1] "2010-01-01" "2010-01-04" "2010-01-05" "2010-01-06" "2010-01-07" "2010-01-08"
# [7] "2010-01-11" "2010-01-12" "2010-01-13" "2010-01-14"
workhours <- as.POSIXct( as.numeric(rep(twoworkyrs, each=9))*24*3600 + # weekdays
(9:17)*3600 , # working hours
origin="1970-01-01", tz="UTC") # UTC matches the output shown below
#----- First two weeks ----------------
> workhours[1:90]
[1] "2010-01-01 09:00:00 UTC" "2010-01-01 10:00:00 UTC" "2010-01-01 11:00:00 UTC"
[4] "2010-01-01 12:00:00 UTC" "2010-01-01 13:00:00 UTC" "2010-01-01 14:00:00 UTC"
[7] "2010-01-01 15:00:00 UTC" "2010-01-01 16:00:00 UTC" "2010-01-01 17:00:00 UTC"
[10] "2010-01-04 09:00:00 UTC" "2010-01-04 10:00:00 UTC" "2010-01-04 11:00:00 UTC"
[13] "2010-01-04 12:00:00 UTC" "2010-01-04 13:00:00 UTC" "2010-01-04 14:00:00 UTC"
[16] "2010-01-04 15:00:00 UTC" "2010-01-04 16:00:00 UTC" "2010-01-04 17:00:00 UTC"
[19] "2010-01-05 09:00:00 UTC" "2010-01-05 10:00:00 UTC" "2010-01-05 11:00:00 UTC"
[22] "2010-01-05 12:00:00 UTC" "2010-01-05 13:00:00 UTC" "2010-01-05 14:00:00 UTC"
[25] "2010-01-05 15:00:00 UTC" "2010-01-05 16:00:00 UTC" "2010-01-05 17:00:00 UTC"
[snipped]
I must admit that timezone conversions are one of my weakest suits.
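For comparison, here is a base R sketch of the Monday-Friday, 8am-5pm filter that does not need timeDate. It works in UTC to sidestep daylight saving effects, covers an arbitrary two-week span, and uses illustrative names (hours_all, lt, work_hours).

```r
# Hourly stamps for 14 days starting Friday 2010-01-01, in UTC
hours_all <- seq(as.POSIXct("2010-01-01 00:00", tz = "UTC"),
                 by = "hour", length.out = 24 * 14)

# POSIXlt exposes the weekday and hour as plain numbers
lt <- as.POSIXlt(hours_all)

# wday: 0 = Sunday ... 6 = Saturday; keep Monday-Friday and the stamps 08:00 to 17:00
work_hours <- hours_all[lt$wday %in% 1:5 & lt$hour %in% 8:17]

head(format(work_hours, "%Y-%m-%d %H:%M"))
```

Using wday rather than weekday names keeps the filter independent of the session locale. Public holidays are not handled here.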

Resources