Divide time-series data into weekday and weekend datasets using R - r

I have a dataset consisting of two columns (timestamp and power):
str(df2)
'data.frame': 720 obs. of 2 variables:
$ timestamp: POSIXct, format: "2015-08-01 00:00:00" "2015-08-01 01:00:00" " ...
$ power : num 124 149 118 167 130 ..
This dataset covers one entire month. I want to create two subsets of it: one containing the weekend data (Saturday and Sunday), and the other containing the weekday (Monday - Friday) data. Both subsets should retain both columns. How can I do this in R?
I tried to use aggregate and split, but I am not clear how to specify the division of the dataset in the function parameter (FUN) of aggregate.

You can do this with base R functions. First use strptime to parse the date from the first column, then apply the function weekdays to the result.
Example:
df1<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00"),power=1:2)
df1$day<-strptime(df1[,1], "%Y-%m-%d")
df1$weekday<-weekdays(df1$day)
df1
timestamp power day weekday
2015-08-01 00:00:00 1 2015-08-01 Saturday
2015-10-13 00:00:00 2 2015-10-13 Tuesday

Building on top of @ShruS's example:
df<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00", "2015-10-11 00:00:00", "2015-10-14 00:00:00"))
df$day<-strptime(df[,1], "%Y-%m-%d")
df$weekday<-weekdays(df$day)
df1 = subset(df,df$weekday == "Saturday" | df$weekday == "Sunday")
df2 = subset(df,df$weekday != "Saturday" & df$weekday != "Sunday")
> df
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
3 2015-10-11 00:00:00 2015-10-11 Sunday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
> df1
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
3 2015-10-11 00:00:00 2015-10-11 Sunday
> df2
timestamp day weekday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
4 2015-10-14 00:00:00 2015-10-14 Wednesday

Initially I tried complex approaches using extra libraries, but in the end I came up with a basic approach using base R.
#adding day column to existing set
df2$day <- weekdays(as.POSIXct(df2$timestamp))
# creating two data_subsets, i.e., week_data and weekend_data
week_data<- data.frame(timestamp=factor(), power= numeric(),day= character())
weekend_data<- data.frame(timestamp=factor(),power=numeric(),day= character())
#Specifying weekend days in vector, weekend
weekend <- c("Saturday","Sunday")
for(i in 1:nrow(df2)){
if(is.element(df2[i,3], weekend)){
weekend_data <- rbind(weekend_data, df2[i,])
} else{
week_data <- rbind(week_data, df2[i,])
}
}
The resulting datasets, weekend_data and week_data, are the two subsets I needed.
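The loop above grows the result with rbind on every iteration. The same split can be sketched with a single logical index. This is only a minimal sketch, not the poster's method: the df2 below is a hypothetical stand-in with made-up power values, and it uses the %u format code (1 = Monday ... 7 = Sunday) instead of weekday names, so it does not depend on the session locale.

```r
# Hypothetical stand-in for df2: 720 hourly readings starting 2015-08-01
set.seed(1)
df2 <- data.frame(
  timestamp = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                  by = "hour", length.out = 720),
  power = round(runif(720, 100, 200))
)

# %u is the weekday number as a string: "1" = Monday, ..., "7" = Sunday
is_weekend <- format(df2$timestamp, "%u") %in% c("6", "7")

# Logical indexing keeps both columns in each subset
weekend_data <- df2[is_weekend, ]
week_data    <- df2[!is_weekend, ]

# Alternative: split() returns a named list holding both data frames
parts <- split(df2, ifelse(is_weekend, "weekend", "weekday"))
```

Here parts$weekend and parts$weekday hold the same rows as weekend_data and week_data.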

Related

How do I recycle a character vector in R?

I have a list of every day from 2018-01-01 to 2018-06-01. It is a vector and it looks like this:
dates <- c("2018-01-01", "2018-01-02", "2018-01-03", ... , "2018-05-30", "2018-06-01")
I want to make a data frame where the first column has each of those dates and the second column has their day of the week. I am assuming that 2018-01-01 is a Monday.
date day
2018-01-01 Monday
2018-01-02 Tuesday
2018-01-03 Wednesday
... ...
2018-06-01 Monday
I'm working on a data frame towards that end, but I was curious for a better way to recycle through the days of the week than the solution I put together.
day <- NULL
for (i in 1:length(dates)) {
x <- i
while (x > 7) {
x <- x - 7  # subtract from x, not i, so the loop terminates once i exceeds 14
}
day <- c(day, days[x])
}
cbind(dates,day)
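We can use weekdays to get the day of the week and put it in a data frame. Note that weekdays needs dates of class Date (see the data section at the end), not a character vector.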
data.frame(dates, day = weekdays(dates))
# dates day
#1 2018-01-01 Monday
#2 2018-01-02 Tuesday
#3 2018-01-03 Wednesday
#4 2018-05-30 Wednesday
#5 2018-06-01 Friday
EDIT
If we don't want to use any built-in function, we can create a vector of days and look the day up from there. Given that the first day is "Monday", we can use the modulo operator on the number of days elapsed since the first date to find the day for the rest of the dates:
days <- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
day <- days[(as.numeric(dates - dates[1]) %% 7) + 1]
day
#[1] "Monday" "Tuesday" "Wednesday" "Wednesday" "Friday"
and then put them in a data frame:
data.frame(dates, day)
# dates day
#1 2018-01-01 Monday
#2 2018-01-02 Tuesday
#3 2018-01-03 Wednesday
#4 2018-05-30 Wednesday
#5 2018-06-01 Friday
data
dates<-as.Date(c("2018-01-01","2018-01-02","2018-01-03","2018-05-30","2018-06-01"))
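If the dates are consecutive with no gaps, the literal recycling asked about in the question can be sketched with rep_len. This is a minimal sketch under that assumption: the full date sequence below is built with seq as a stand-in for the elided vector in the question, and the first date is taken to be a Monday, as the question states.

```r
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# Assumed input: every day from 2018-01-01 to 2018-06-01, no gaps
dates <- seq(as.Date("2018-01-01"), as.Date("2018-06-01"), by = "day")

# rep_len() recycles the 7 names until the result is as long as dates
day <- rep_len(days, length(dates))

res <- data.frame(dates, day)
head(res, 3)
```

With gaps in the dates, recycling by position drifts out of step, which is why the modulo lookup on elapsed days is the safer choice.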

Filtering time series conditional on list of dates by week number (R)

I have a data frame containing a time sequence sampled every 30 minutes (for 2016). I need to make a subset containing every Wednesday 10:30:00 if the week contains no holiday falling on Sunday to Wednesday, and every Thursday 11:00:00 if the week contains a holiday falling on Sunday to Wednesday. This would create a schedule of EIA petroleum weekly report releases. I do not want to use xts.
I know how to subset by day of week and time of day. But I do not know how to subset conditional on the week containing a date present in a list of dates. How could I do that?
The code below creates a subset by day of week and time of day without filtering by holidays. It also includes the list of holiday dates to use as filter.
#Make time sequence every 30mins with Time & DayWk columns
Calendar30mn <- as.data.frame(seq(as.POSIXlt("2016/1/1 00:00:00"), as.POSIXlt("2016/12/31 23:59:59"), by="30 mins"))
colnames(Calendar30mn) <- "DateTime"
Calendar30mn$Time <- strftime(Calendar30mn$DateTime, format="%H:%M:%S")
Calendar30mn$DayWk <- weekdays(Calendar30mn$DateTime)
#List of US Federal holidays falling on Sunday/Monday/Tuesday/Wednesday
FedHolidaysSuntoWed <- structure(c(16818, 16846, 16951, 16986, 17049, 17161, 17084), class = "Date")
-----
#Subset for Wednesday 10:30:00
EIAOildates1 <- subset (Calendar30mn, Time == "10:30:00" & DayWk == "Wednesday")
#Subset for Thursday 11:00:00
EIAOildates2 <- subset (Calendar30mn, Time == "11:00:00" & DayWk == "Thursday")
#Bind subsets and set reverse order (most recent at the top)
EIAOildates <- rbind(EIAOildates1, EIAOildates2)
The above code generates EIAOildates1 containing a subset for Wednesday 10:30:00. I would like that subset to contain a Wednesday 10:30:00 only if no day of that week is present in FedHolidaysSuntoWed, and vice versa for EIAOildates2.
This is the answer:
library(lubridate)
#Make time sequence every 30mins with Time & DayWk & WkNumber columns
Calendar30mn <- as.data.frame(seq(as.POSIXlt("2016/1/1 00:00:00"), as.POSIXlt("2016/12/31 23:59:59"), by="30 mins"))
colnames(Calendar30mn) <- "DateTime"
Calendar30mn$Time <- strftime(Calendar30mn$DateTime, format="%H:%M:%S")
Calendar30mn$DayWk <- weekdays(Calendar30mn$DateTime)
Calendar30mn$WkNumber <- week(Calendar30mn$DateTime)
#List of US Federal holidays falling on Sunday/Monday/Tuesday/Wednesday & Corresponding WkNumber
FedHolidaysSuntoWed <- structure(c(16818, 16846, 16951, 16986, 17049, 17161, 17084), class = "Date")
FedHolidaysSuntoWedWkNumber <- week(FedHolidaysSuntoWed)
#Subset for Wednesday 10:30:00
EIAOildates1 <- subset (Calendar30mn, Time == "10:30:00" & DayWk == "Wednesday"
& !(Calendar30mn$WkNumber %in% FedHolidaysSuntoWedWkNumber))
#Subset for Thursday 11:00:00
EIAOildates2 <- subset (Calendar30mn, Time == "11:00:00" & DayWk == "Thursday"
& (Calendar30mn$WkNumber %in% FedHolidaysSuntoWedWkNumber))
#Bind and sort subsets
EIAOildates <- rbind(EIAOildates1, EIAOildates2)
EIAOildates <- EIAOildates[(order(as.Date(EIAOildates$DateTime))),]
This is a sample of the output of EIAOildates:
DateTime Time DayWk WkNumber
262 2016-01-06 10:30:00 10:30:00 Wednesday 1
598 2016-01-13 10:30:00 10:30:00 Wednesday 2
983 2016-01-21 11:00:00 11:00:00 Thursday 3
1270 2016-01-27 10:30:00 10:30:00 Wednesday 4
1606 2016-02-03 10:30:00 10:30:00 Wednesday 5
1942 2016-02-10 10:30:00 10:30:00 Wednesday 6
2327 2016-02-18 11:00:00 11:00:00 Thursday 7
16726 2016-12-14 10:30:00 10:30:00 Wednesday 50
17062 2016-12-21 10:30:00 10:30:00 Wednesday 51
17447 2016-12-29 11:00:00 11:00:00 Thursday 52
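One caveat about the answer above: lubridate's week() counts seven-day blocks starting from 1 January rather than calendar weeks. In 2016 those blocks run Friday to Thursday, so they happen to keep Sunday to Thursday together, but that alignment changes from year to year. The following is a hedged base-R sketch that keys each timestamp on the Sunday that starts its week instead. The helper name week_start_sunday and the WkStart column are hypothetical, the holiday dates are the same seven dates written out as strings, and the calendar is built in UTC for reproducibility.

```r
# Hypothetical helper: map a Date or POSIXct vector to the Sunday starting its week
week_start_sunday <- function(x) {
  d <- as.Date(format(x, "%Y-%m-%d"))  # calendar date as printed in x's time zone
  d - as.POSIXlt(d)$wday               # wday: 0 = Sunday, ..., 6 = Saturday
}

Calendar30mn <- data.frame(
  DateTime = seq(as.POSIXct("2016-01-01 00:00:00", tz = "UTC"),
                 as.POSIXct("2016-12-31 23:30:00", tz = "UTC"),
                 by = "30 mins")
)

# Same holidays as in the question, as readable dates
FedHolidaysSuntoWed <- as.Date(c("2016-01-18", "2016-02-15", "2016-05-30",
                                 "2016-07-04", "2016-09-05", "2016-10-10",
                                 "2016-12-26"))

Calendar30mn$WkStart <- week_start_sunday(Calendar30mn$DateTime)
holiday_week <- Calendar30mn$WkStart %in% week_start_sunday(FedHolidaysSuntoWed)

wday <- as.POSIXlt(Calendar30mn$DateTime)$wday
time <- format(Calendar30mn$DateTime, "%H:%M")

# Wednesday 10:30 in normal weeks, Thursday 11:00 in holiday weeks
keep <- (!holiday_week & wday == 3 & time == "10:30") |
        ( holiday_week & wday == 4 & time == "11:00")
EIAOildates <- Calendar30mn[keep, ]
```

Because the rows are selected with one logical index, the result is already in chronological order and needs no rbind or sort.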

as.POSIXct does not get format correctly [duplicate]

I have a data frame object, and among its fields there is a column of dates:
df$dates
I need to add a column which is 'Week Starting', i.e.
df[,'WeekStart']= manipulation
The week start is the date of the Monday of that week. For example, today, Thursday 24/09/15, would have the entry '21-Sept'; next Thursday, 01/10/15, would be '28-Sept'.
I see that there is a weekdays() function which converts a date into its weekday name, but how can I get the most recent Monday?
A base R approach with the function strftime.
df$Week.Start <- df$dates-abs(1-as.numeric(strftime(df$dates, "%u")))
This can be a one-liner, but we'll create a few variables to see what's happening. The %u format returns the day of the week as a single digit from 1 (Monday) to 7 (Sunday), as a character string. Converting it to numeric and taking its distance from 1 gives the number of days elapsed since Monday, which we then subtract from the date column.
day_of_week <- as.numeric(strftime(df$dates, "%u"))
day_diff <- abs(1-day_of_week)
df$Week.Start <- df$dates-day_diff
# dates Week.Start
# 1 2042-10-22 2042-10-20
# 2 2026-08-14 2026-08-10
# 3 2018-11-23 2018-11-19
# 4 2017-08-21 2017-08-21
# 5 2022-05-26 2022-05-23
# 6 2037-05-27 2037-05-25
Data
set.seed(7)
all_dates <- seq(Sys.Date(), Sys.Date()+10000, by="days")
dates <- sample(all_dates, 20)
df <- data.frame(dates)
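Since the sample data above depends on Sys.Date(), its output changes from day to day. Here is a small reproducible sketch of the same %u idea on fixed dates taken from the question. The helper name week_start_monday is hypothetical, and the '%d-%b' label is only one possible way to get close to the '21-Sept' style asked for (the month abbreviation depends on the locale).

```r
# Hypothetical helper: the Monday on or before each date
week_start_monday <- function(x) {
  # %u: 1 = Monday ... 7 = Sunday, so (%u - 1) is the number of days since Monday
  x - (as.integer(format(x, "%u")) - 1L)
}

df <- data.frame(dates = as.Date(c("2015-09-20", "2015-09-24",
                                   "2015-09-28", "2015-10-01")))
df$WeekStart <- week_start_monday(df$dates)

# Optional display label; the month abbreviation is locale-dependent
df$Label <- format(df$WeekStart, "%d-%b")
df
```

A Sunday maps to the Monday six days earlier, and a Monday maps to itself.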
Simples:
dates <-(Sys.Date()+1:30)
week.starts <- as.Date(sapply (dates, function(d) { return (d + (-6 - as.POSIXlt(d)$wday %% -7 ))}), origin = "1970-01-01")
and running as
d <- data.frame(dates=dates, monday=week.starts)
gives
dates monday
1 2015-09-25 2015-09-21
2 2015-09-26 2015-09-21
3 2015-09-27 2015-09-21
4 2015-09-28 2015-09-28
5 2015-09-29 2015-09-28
6 2015-09-30 2015-09-28
7 2015-10-01 2015-09-28
8 2015-10-02 2015-09-28
9 2015-10-03 2015-09-28
10 2015-10-04 2015-09-28
11 2015-10-05 2015-10-05
12 2015-10-06 2015-10-05
13 2015-10-07 2015-10-05
14 2015-10-08 2015-10-05
15 2015-10-09 2015-10-05
16 2015-10-10 2015-10-05
17 2015-10-11 2015-10-05
18 2015-10-12 2015-10-12
19 2015-10-13 2015-10-12
20 2015-10-14 2015-10-12
21 2015-10-15 2015-10-12
22 2015-10-16 2015-10-12
23 2015-10-17 2015-10-12
24 2015-10-18 2015-10-12
25 2015-10-19 2015-10-19
26 2015-10-20 2015-10-19
27 2015-10-21 2015-10-19
28 2015-10-22 2015-10-19
29 2015-10-23 2015-10-19
30 2015-10-24 2015-10-19
Similar approach, example:
# data
d <- data.frame(date = as.Date( c("20/09/2015","24/09/2015","28/09/2015","01/10/2015"), "%d/%m/%Y"))
# get monday
d$WeekStart <- d$date - 6 - (as.POSIXlt(d$date)$wday %% -7)
d
# result
# date WeekStart
# 1 2015-09-20 2015-09-14
# 2 2015-09-24 2015-09-21
# 3 2015-09-28 2015-09-28
# 4 2015-10-01 2015-09-28
How about just subtracting from the dates the number of days required to get to the previous Monday? For example, if your data is
dates <- as.Date(c("2000-07-12", "2005-02-19", "2010-09-01"))
weekdays(dates)
# [1] "Wednesday" "Saturday" "Wednesday"
then you can compare this to a vector
wdays <- setNames(0:6, c("Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday", "Sunday"))
and subtract the required number of days from each date, i.e.
dates - wdays[weekdays(dates)]
# Wednesday Saturday Wednesday
#"2000-07-10" "2005-02-14" "2010-08-30"
will give the dates of the Monday preceding each date in dates. To test:
weekdays(dates - wdays[weekdays(dates)])
#Wednesday Saturday Wednesday
# "Monday" "Monday" "Monday"
Everything can be written also in one line as
dates - match(weekdays(dates), c("Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday", "Sunday")) + 1
#"2000-07-10" "2005-02-14" "2010-08-30"
a <- as.Date("2016-08-20")
Finding the next day (here "Monday"):
a[1] + match("Monday", weekdays(seq(a[1]+1, a[1]+6, "days")))
# [1] "2016-08-22"
Finding the last day (here "Friday"):
a[1] + (match("Friday", weekdays(seq(a[1]+1, a[1]+6, "days"))) - 7)
# [1] "2016-08-19"
A simple base-R way, if your dates are properly coded as class Date in R: dates - (unclass(dates) + 3) %% 7. Unclassing a date gives the number of days since 1970-01-01. Because 1970-01-01 was a Thursday, adding 3 before taking the remainder of division by 7 makes the remainder 0 on Mondays and 6 on Sundays, i.e. the number of days since Monday, which is then subtracted from the date.
You can also group your data by week and then create a column holding the earliest date of each week. Here is how to do it with the data.table package:
library(data.table)
df <- data.table(df)
df[, lastMonday := min(dates), by = .(week(dates))]
This only works if your dates have no gaps, since it takes the earliest date present in each group rather than computing the Monday.
Also, in some locales the week starts on Sunday, so be careful.
You will also need an additional grouping variable if your dates span more than one year.
If you want the nearest occurrence of a given day and hour relative to the current date, use this function:
dayhour <- function(day,hour){
k <- as.Date(Sys.time())+day-as.numeric(format(strptime(Sys.time(),format="%Y-%m-%d %H:%M:%S"), format ='%u'))
dh <- format(strptime(paste(k,hour), format="%Y-%m-%d %H"), format="%A %H")
return(dh)
}
For the day argument, use 0 to 6 for Sunday to Saturday respectively:
> dayhour(0,17)
[1] "Sunday 17"

Create a Dataframe of All Dates in the year 2015

I am interested in creating a dataframe of Date values for the year 2015. There would be one row per date. Also, these would have to correspond to their accurate weekday. For example weekdays() applied to 2015-01-01 would have a value of Thursday. This is because I ultimately want to extract the dates that correspond to Saturdays and Sundays.
try this:
dates <- seq(as.Date("2015-01-01"),as.Date("2015-12-31"),1)
weekdays <- weekdays(dates)
res <- data.frame(dates,weekdays)
res[res$weekdays=="Sunday" | res$weekdays=="Saturday",]
## EDIT thanks to @Jaap
res[res$weekdays %in% c("Sunday","Saturday"),]
dates weekdays
3 2015-01-03 Saturday
4 2015-01-04 Sunday
10 2015-01-10 Saturday
11 2015-01-11 Sunday
17 2015-01-17 Saturday
18 2015-01-18 Sunday
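The filter above compares against English weekday names, which weekdays() only returns in an English locale. As a hedged alternative sketch, the numeric weekday from as.POSIXlt (0 = Sunday ... 6 = Saturday) gives the same weekend rows regardless of locale. The wday column name is an assumption made for this illustration.

```r
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")

# Numeric weekday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
res <- data.frame(dates, wday = as.POSIXlt(dates)$wday)

# Keep Saturdays (6) and Sundays (0)
weekends <- res[res$wday %in% c(0, 6), ]
head(weekends)
```

The first weekend row is 2015-01-03, matching the output shown above.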

How to change to Year Month Week format?

I have dates in year month day format that I want to convert to year month week format like so:
date dateweek
2015-02-18 -> 2015-02-8
2015-02-19 -> 2015-02-8
2015-02-20 -> ....
2015-02-21
2015-02-22
2015-02-23
2015-02-24 ...
2015-02-25 -> 2015-02-9
2015-02-26 -> 2015-02-9
2015-02-27 -> 2015-02-9
I tried
data$dateweek <- week(as.POSIXlt(data$date))
but that returns only weeks without the corresponding year and month.
I also tried:
data$dateweek <- as.POSIXct('2015-02-18')
data$dateweek <- format(data$dateweek, '%Y-%m-%U')
# data$dateweek <- format(as.POSIXct(data$date), '%Y-%m-%U')
but the corresponding columns look strange:
date datetime
2015-01-01 2015-01-00
2015-01-02 2015-01-00
2015-01-03 2015-01-00
2015-01-04 2015-01-01
2015-01-05 2015-01-01
2015-01-06 2015-01-01
2015-01-07 2015-01-01
2015-01-08 2015-01-01
2015-01-09 2015-01-01
2015-01-10 2015-01-01
2015-01-11 2015-01-02
You need to use the '%Y-%m-%V' format to change it:
mydate <- as.POSIXct('2015-02-18')
> format(mydate, '%Y-%m-%V')
[1] "2015-02-08"
From the documentation strptime:
%V
Week of the year as decimal number (01–53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1. (Accepted but ignored on input.)
There is also %U, which follows the US convention:
%U
Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
It really depends on which one you want to use for your case.
mydate <- as.POSIXct('2015-02-18')
> format(mydate, '%Y-%m-%U')
[1] "2015-02-07"
In your case, where 2015-02-18 should fall in week 8, that corresponds to %V:
data$dateweek <- format(as.POSIXct(data$date), '%Y-%m-%V')
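To make the difference between the two conventions concrete, here is a small sketch that formats a few fixed dates both ways. The column names iso_week and us_week are made up for this illustration, and the dates are chosen to include the ones from the question plus two dates near the year boundary.

```r
d <- as.Date(c("2014-12-29", "2015-01-01", "2015-02-18", "2015-02-25"))

cmp <- data.frame(
  date     = d,
  iso_week = format(d, "%Y-%m-%V"),  # ISO 8601 week, weeks start on Monday
  us_week  = format(d, "%Y-%m-%U")   # US convention, weeks start on Sunday
)
cmp
#         date   iso_week    us_week
# 1 2014-12-29 2014-12-01 2014-12-52
# 2 2015-01-01 2015-01-01 2015-01-00
# 3 2015-02-18 2015-02-08 2015-02-07
# 4 2015-02-25 2015-02-09 2015-02-08
```

The first row shows a side effect of pairing %Y and %m with %V: 2014-12-29 already belongs to ISO week 1 of 2015, so the calendar year and the week number can look inconsistent around New Year.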
