Create a Dataframe of All Dates in the year 2015 - r

I am interested in creating a dataframe of Date values for the year 2015. There would be one row per date. Also, these would have to correspond to their accurate weekday. For example weekdays() applied to 2015-01-01 would have a value of Thursday. This is because I ultimately want to extract the dates that correspond to Saturdays and Sundays.

Try this:
dates <- seq(as.Date("2015-01-01"),as.Date("2015-12-31"),1)
weekdays <- weekdays(dates)
res <- data.frame(dates,weekdays)
res[res$weekdays=="Sunday" | res$weekdays=="Saturday",]
## EDIT, thanks to @Jaap
res[res$weekdays %in% c("Sunday","Saturday"),]
dates weekdays
3 2015-01-03 Saturday
4 2015-01-04 Sunday
10 2015-01-10 Saturday
11 2015-01-11 Sunday
17 2015-01-17 Saturday
18 2015-01-18 Sunday
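One caveat: weekdays() returns day names in the current locale, so comparing against "Saturday" and "Sunday" only works in an English locale. Below is a minimal locale-independent sketch, assuming the ISO weekday number from format(x, "%u") (1 = Monday to 7 = Sunday) is acceptable. The names is_weekend and weekends are introduced here for illustration only.

```r
# All dates in 2015, one row per date
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
res <- data.frame(dates = dates, weekdays = weekdays(dates))

# "%u" gives the ISO weekday number as text: "1" (Monday) to "7" (Sunday).
# Unlike the names from weekdays(), it does not depend on the locale.
is_weekend <- format(res$dates, "%u") %in% c("6", "7")
weekends <- res[is_weekend, ]
head(weekends)
```

The weekdays column is kept only for display; the filtering itself never looks at the day names.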

Related

How do I recycle a character vector in R?

I have a list of every day from 2018-01-01 to 2018-06-01. It is a vector and it looks like this:
dates <- c("2018-01-01", "2018-01-02", "2018-01-03", ... , "2018-05-30", "2018-06-01")
I want to make a data frame where the first column has each of those dates and the second column has their day of the week. I am assuming that 2018-01-01 is a Monday.
date day
2018-01-01 Monday
2018-01-02 Tuesday
2018-01-03 Wednesday
... ...
2018-06-01 Monday
I'm working on a data frame towards that end, but I was curious for a better way to recycle through the days of the week than the solution I put together.
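days <- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
day <- NULL
for (i in seq_along(dates)) {
x <- i
# wrap the index back into the range 1..7
while (x > 7) {
x <- x - 7
}
day <- c(day, days[x])
}
cbind(dates,day)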
We can use weekdays to get day of the week and put it in a dataframe.
data.frame(dates, day = weekdays(dates))
# dates day
#1 2018-01-01 Monday
#2 2018-01-02 Tuesday
#3 2018-01-03 Wednesday
#4 2018-05-30 Wednesday
#5 2018-06-01 Friday
EDIT
If we don't want to use any in-built function we can create a vector of days and lookup from there. Considering the first day is "Monday" we can use the modulo operator to find the relevant day for rest of the dates
days <- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
day <- days[(as.numeric(dates - dates[1]) %% 7) + 1]
day
#[1] "Monday" "Tuesday" "Wednesday" "Wednesday" "Friday"
and then put them in dataframe
data.frame(dates, day)
# dates day
#1 2018-01-01 Monday
#2 2018-01-02 Tuesday
#3 2018-01-03 Wednesday
#4 2018-05-30 Wednesday
#5 2018-06-01 Friday
data
dates<-as.Date(c("2018-01-01","2018-01-02","2018-01-03","2018-05-30","2018-06-01"))
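To answer the recycling question literally, base R's rep_len() repeats a vector until it reaches a given length. A minimal sketch, assuming the dates are consecutive with no gaps and that the first one is a Monday (both assumptions come from the question, not from the data above):

```r
# Every day from 2018-01-01 to 2018-06-01, with no gaps
dates <- seq(as.Date("2018-01-01"), as.Date("2018-06-01"), by = "day")
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# rep_len() recycles `days` until it is as long as `dates`
day <- rep_len(days, length(dates))
res <- data.frame(dates, day)
head(res, 8)
tail(res, 1)
```

If the dates have gaps, plain recycling drifts out of step with the calendar, and the modulo lookup on the date differences shown above is the safer choice.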

How do I merge two time series to result in an object with only the dates from the smaller one (R)?

I have two daily time series ranging from 1 January 2016 to 1 August 2016. However, one of my series only includes data from business days (i.e., weekends and bank holidays are omitted), while the other has data for every day. My question is: how do I merge the two series so that only the business-day data is left for both (deleting the extra days from the second time series)?
The question was tagged also with data.table so I guess that the two time series are stored as data.frames or data.tables.
By default, joins in data.table are right joins. So, if you know in advance which one the "shorter" time series is you can write:
library(data.table)
dt_long[dt_short, on = "date"]
# date weekday i.weekday
#1: 2017-03-30 4 4
#2: 2017-03-31 5 5
#3: 2017-04-03 1 1
#4: 2017-04-04 2 2
#5: 2017-04-05 3 3
#6: 2017-04-06 4 4
If you are not sure which the "shorter" time series is you can use an inner join:
dt_short[dt_long, on = "date", nomatch = 0]
nomatch = 0 specifies the inner join.
If your time series are not already data.tables as the sample data here but are stored as data.frames, you need to coerce them to data.table class beforehand by:
setDT(dt_long)
setDT(dt_short)
Data
As the OP hasn't provided any reproducible data, we need to prepare sample data on our own (similar to this answer but as data.table):
library(data.table)
dt_long <- data.table(date = as.Date("2017-03-30") + 0:7)
# add payload: integer weekday according ISO (week starts on Monday == 1L)
dt_long[, weekday := as.integer(format(date, "%u"))]
# remove weekends
dt_short <- dt_long[weekday < 6L]
Suppose we have two data.frames: df_long, which contains weekends, and df_short, which does not.
Date <- seq(as.Date("2003-03-03"), as.Date("2003-03-17"), by = 1)
weekday <- weekdays(Date)
df_long <- data.frame(Date, weekday)
df_short<- df_long[ c(1:5, 8:12, 15), ]
You can join them using dplyr::inner_join to delete the weekends and holidays from df_long and keep just the business days.
library(dplyr)
df_join <- df_long %>% inner_join(., df_short, by ="Date")
> df_join
Date weekday.x weekday.y
1 2003-03-03 Monday Monday
2 2003-03-04 Tuesday Tuesday
3 2003-03-05 Wednesday Wednesday
4 2003-03-06 Thursday Thursday
5 2003-03-07 Friday Friday
6 2003-03-10 Monday Monday
7 2003-03-11 Tuesday Tuesday
8 2003-03-12 Wednesday Wednesday
9 2003-03-13 Thursday Thursday
10 2003-03-14 Friday Friday
11 2003-03-17 Monday Monday
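The same inner join can be sketched in base R with merge(), which keeps only the keys present in both inputs by default (all = FALSE). The payload columns value_long and value_short are hypothetical and stand in for the two series' data:

```r
Date <- seq(as.Date("2003-03-03"), as.Date("2003-03-17"), by = "day")

# Long series: one value per calendar day
df_long <- data.frame(Date, value_long = seq_along(Date))

# Short series: business days only (the two weekends are dropped)
df_short <- data.frame(Date = Date[c(1:5, 8:12, 15)], value_short = 101:111)

# Inner join on Date: rows without a match in df_short disappear
df_join <- merge(df_long, df_short, by = "Date")
df_join
```

Giving the payload columns distinct names avoids the .x/.y suffixes that appear when both inputs share a non-key column.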

Divide time-series data into weekday and weekend datasets using R

I have dataset consisting of two columns (timestamp and power) as:
str(df2)
'data.frame': 720 obs. of 2 variables:
$ timestamp: POSIXct, format: "2015-08-01 00:00:00" "2015-08-01 01:00:00" " ...
$ power : num 124 149 118 167 130 ..
This dataset is of entire one month duration. I want to create two subsets of it - one containing the weekend data, and other one containing weekday (Monday - Friday) data. In other words, one dataset should contain data corresponding to saturday and sunday and the other one should contain data of other days. Both of the subsets should retain both of the columns. How can I do this in R?
I tried to use aggregate and split, but I am not clear on how to specify a division of the dataset in the function parameter (FUN) of aggregate.
You can do this with base R functions: first use strptime to extract the date from the first column, then apply weekdays to it.
Example:
df1<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00"),power=1:2)
df1$day<-strptime(df1[,1], "%Y-%m-%d")
df1$weekday<-weekdays(df1$day)
df1
timestamp power day weekday
2015-08-01 00:00:00 1 2015-08-01 Saturday
2015-10-13 00:00:00 2 2015-10-13 Tuesday
Building on @ShruS's example:
df<-data.frame(timestamp=c("2015-08-01 00:00:00","2015-10-13 00:00:00", "2015-10-11 00:00:00", "2015-10-14 00:00:00"))
df$day<-strptime(df[,1], "%Y-%m-%d")
df$weekday<-weekdays(df$day)
df1 = subset(df,df$weekday == "Saturday" | df$weekday == "Sunday")
df2 = subset(df,df$weekday != "Saturday" & df$weekday != "Sunday")
> df
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
3 2015-10-11 00:00:00 2015-10-11 Sunday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
> df1
timestamp day weekday
1 2015-08-01 00:00:00 2015-08-01 Saturday
3 2015-10-11 00:00:00 2015-10-11 Sunday
> df2
timestamp day weekday
2 2015-10-13 00:00:00 2015-10-13 Tuesday
4 2015-10-14 00:00:00 2015-10-14 Wednesday
Initially, I tried complex approaches using extra libraries, but in the end I settled on a basic approach using base R.
#adding day column to existing set
df2$day <- weekdays(as.POSIXct(df2$timestamp))
# creating two data_subsets, i.e., week_data and weekend_data
week_data<- data.frame(timestamp=factor(), power= numeric(),day= character())
weekend_data<- data.frame(timestamp=factor(),power=numeric(),day= character())
#Specifying weekend days in vector, weekend
weekend <- c("Saturday","Sunday")
for(i in 1:nrow(df2)){
if(is.element(df2[i,3], weekend)){
weekend_data <- rbind(weekend_data, df2[i,])
} else{
week_data <- rbind(week_data, df2[i,])
}
}
The datasets created, i.e., weekend_data and week_data are my required sub datasets.
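The row-by-row loop can be replaced by a single logical index. A minimal sketch, using a hypothetical stand-in for df2 (30 days of hourly readings with made-up power values) and the locale-independent "%u" weekday number instead of day names:

```r
# Hypothetical stand-in for df2: 720 hourly readings starting 2015-08-01
set.seed(1)
df2 <- data.frame(
  timestamp = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                  by = "hour", length.out = 720),
  power = round(runif(720, 100, 200))
)

# ISO weekday number as text: "6" = Saturday, "7" = Sunday
is_weekend <- format(df2$timestamp, "%u") %in% c("6", "7")

# Both subsets keep both original columns
weekend_data <- df2[is_weekend, ]
week_data    <- df2[!is_weekend, ]
```

split(df2, is_weekend) would return the same two subsets as a list, which can be convenient when the next step is an lapply over them.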

How to change to Year Month Week format?

I have dates in year month day format that I want to convert to year month week format like so:
date dateweek
2015-02-18 -> 2015-02-8
2015-02-19 -> 2015-02-8
2015-02-20 -> ....
2015-02-21
2015-02-22
2015-02-23
2015-02-24 ...
2015-02-25 -> 2015-02-9
2015-02-26 -> 2015-02-9
2015-02-27 -> 2015-02-9
I tried
data$dateweek <- week(as.POSIXlt(data$date))
but that returns only weeks without the corresponding year and month.
I also tried:
data$dateweek <- as.POSIXct('2015-02-18')
data$dateweek <- format(data$dateweek, '%Y-%m-%U')
# data$dateweek <- format(as.POSIXct(data$date), '%Y-%m-%U')
but the corresponding columns look strange:
date datetime
2015-01-01 2015-01-00
2015-01-02 2015-01-00
2015-01-03 2015-01-00
2015-01-04 2015-01-01
2015-01-05 2015-01-01
2015-01-06 2015-01-01
2015-01-07 2015-01-01
2015-01-08 2015-01-01
2015-01-09 2015-01-01
2015-01-10 2015-01-01
2015-01-11 2015-01-02
You need to use the '%Y-%m-%V' format to change it:
mydate <- as.POSIXct('2015-02-18')
> format(mydate, '%Y-%m-%V')
[1] "2015-02-08"
From the documentation strptime:
%V
Week of the year as decimal number (01–53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1. (Accepted but ignored on input.)
There is also %U, which follows the US convention:
%U
Week of the year as decimal number (00–53) using Sunday as the first day 1 of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
It really depends on which one you want to use for your case.
mydate <- as.POSIXct('2015-02-18')
> format(mydate, '%Y-%m-%U')
[1] "2015-02-07"
To reproduce the week numbers shown in your question (week 8 for 2015-02-18), use %V:
data$dateweek <- format(as.POSIXct(data$date), '%Y-%m-%V')
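If the goal is to match the question's example exactly (2015-02-8 rather than 2015-02-08), the leading zero has to be stripped separately. A minimal sketch using Date objects and the ISO week from %V; the variable names are illustrative:

```r
dates <- as.Date(c("2015-02-18", "2015-02-25"))

# ISO 8601 week number, zero-padded
padded <- format(dates, "%Y-%m-%V")
padded
# "2015-02-08" "2015-02-09"

# Same, but without the leading zero on the week number
dateweek <- paste0(format(dates, "%Y-%m-"), as.integer(format(dates, "%V")))
dateweek
# "2015-02-8" "2015-02-9"
```

Be aware that combining the calendar month with an ISO week can look odd around New Year, because the first days of January may belong to week 52 or 53 of the previous ISO year.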

How to Parse Year + Week Number in R?

Is there a good way to get a year + week number converted to a date in R? I have tried the following:
> as.POSIXct("2008 41", format="%Y %U")
[1] "2008-02-21 EST"
> as.POSIXct("2008 42", format="%Y %U")
[1] "2008-02-21 EST"
According to ?strftime:
%Y Year with century. Note that whereas there was no zero in the
original Gregorian calendar, ISO 8601:2004 defines it to be valid
(interpreted as 1BC): see http://en.wikipedia.org/wiki/0_(year). Note
that the standard also says that years before 1582 in its calendar
should only be used with agreement of the parties involved.
%U Week of the year as decimal number (00–53) using Sunday as the
first day 1 of the week (and typically with the first Sunday of the
year as day 1 of week 1). The US convention.
This is kinda like another question you may have seen before. :)
The key issue is: what day should a week number specify? Is it the first day of the week? The last? That's ambiguous. I don't know if week one is the first day of the year or the 7th day of the year, or possibly the first Sunday or Monday of the year (which is a frequent interpretation). (And it's worse than that: these generally appear to be 0-indexed, rather than 1-indexed.) So, an enumerated day of the week needs to be specified.
For instance, try this:
as.POSIXlt("2008 42 1", format = "%Y %U %u")
The %u indicator specifies the day of the week.
Additional note: See ?strptime for the various options for format conversion. It's important to be careful about the enumeration of weeks, as these can be split across the end of the year, and day 1 is ambiguous: is it specified based on a Sunday or Monday, or from the first day of the year? This should all be specified and tested on the different systems where the R code will run. I'm not certain that Windows and POSIX systems sing the same tune on some of these conversions, hence I'd test and test again.
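As a quick way to test this on a given system, the parsed date can be formatted back with the same format string and compared with the input. A minimal sketch, assuming the %U week convention and the %u weekday (1 = Monday) are the ones intended:

```r
# Year, %U week number and weekday together identify a single day
input <- "2008 42 1"
d <- as.Date(input, format = "%Y %U %u")
d

# Round trip: formatting the parsed date should give back the input
roundtrip <- format(d, "%Y %U %u")
roundtrip
```

If the round trip does not reproduce the input, the platform interprets one of the conversion specifications differently from what was intended.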
Day-of-week == zero in the POSIXlt DateTimeClasses system is Sunday. Not exactly Biblical, and not in agreement with R's convention of indexing from 1 either, but that's what it is. Week zero is the first (partial) week in the year. Week one (but day of week zero) starts with the first Sunday. Most of the other fields in POSIXlt (mon, yday and so on) also start at 0; mday is the exception and runs from 1 to 31. It is kind of interesting to see what coercing the list elements of POSIXlt objects does. The only way you can actually change a POSIXlt date is to alter the $year, the $mon or the $mday elements. The others seem to be epiphenomena.
today <- as.POSIXlt(Sys.Date())
today # Tuesday
#[1] "2012-02-21 UTC"
today$wday <- 0 # attempt to make it Sunday
today
# [1] "2012-02-21 UTC" The attempt fails
today$mday <- 19
today
#[1] "2012-02-19 UTC" Success
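The zero-based fields can be inspected directly. A minimal sketch on a fixed date rather than Sys.Date(), so the values are reproducible; converting through as.POSIXct() is used here as one way to have the derived fields recomputed:

```r
lt <- as.POSIXlt("2012-02-21", tz = "UTC")  # a Tuesday

lt$wday         # day of week, 0 = Sunday
lt$mon          # month, 0 = January
lt$yday         # day of year, 0 = 1 January
lt$year + 1900  # years are stored as an offset from 1900

# Changing mday moves the date; the round trip through POSIXct
# recomputes derived fields such as wday
lt$mday <- 19
sunday <- as.POSIXlt(as.POSIXct(lt))
sunday$wday
```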
I did not come up with this myself (it's taken from a blog post by Forester), but nevertheless I thought I'd add this to the answer list because it's the first implementation of the ISO 8601 week number convention that I've seen in R.
No doubt, week numbers are a very ambiguous topic, but I prefer an ISO standard over the current implementation of week numbers via format(..., "%U") because it seems that this is what most people agreed on, at least in Germany (calendars etc.).
I've put the actual function def at the bottom to facilitate focusing on the output first. Also, I just stumbled across package ISOweek, maybe worth a try.
Approach Comparison
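# Assumed inputs, not shown in the original: the dates in the result below
# and their %U week numbers (these values match the week.r column)
posix <- seq(as.Date("2012-01-01"), as.Date("2012-01-31"), by = "day")
weeknum <- as.integer(format(posix, "%U"))
x.days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
x.names <- sapply(seq_along(posix), function(x) {
x.day <- as.POSIXlt(posix[x], tz="Europe/Berlin")$wday
# POSIXlt uses 0 for Sunday; map it to 7 so Monday == 1
if (x.day == 0) {
x.day <- 7
}
x.days[x.day]
})
data.frame(
posix,
name=x.names,
week.r=weeknum,
week.iso=ISOweek(as.character(posix), tzone="Europe/Berlin")$weeknum
)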
# Result
posix name week.r week.iso
1 2012-01-01 Sun 1 4480458
2 2012-01-02 Mon 1 1
3 2012-01-03 Tue 1 1
4 2012-01-04 Wed 1 1
5 2012-01-05 Thu 1 1
6 2012-01-06 Fri 1 1
7 2012-01-07 Sat 1 1
8 2012-01-08 Sun 2 1
9 2012-01-09 Mon 2 2
10 2012-01-10 Tue 2 2
11 2012-01-11 Wed 2 2
12 2012-01-12 Thu 2 2
13 2012-01-13 Fri 2 2
14 2012-01-14 Sat 2 2
15 2012-01-15 Sun 3 2
16 2012-01-16 Mon 3 3
17 2012-01-17 Tue 3 3
18 2012-01-18 Wed 3 3
19 2012-01-19 Thu 3 3
20 2012-01-20 Fri 3 3
21 2012-01-21 Sat 3 3
22 2012-01-22 Sun 4 3
23 2012-01-23 Mon 4 4
24 2012-01-24 Tue 4 4
25 2012-01-25 Wed 4 4
26 2012-01-26 Thu 4 4
27 2012-01-27 Fri 4 4
28 2012-01-28 Sat 4 4
29 2012-01-29 Sun 5 4
30 2012-01-30 Mon 5 5
31 2012-01-31 Tue 5 5
Function Def
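It's taken directly from the blog post; I've just changed a couple of minor things. The function is still kind of sketchy (e.g. the week number of the first date is far off), but I find it to be a nice start! The value 4480458 in the output is what a day count taken in seconds would produce: 363 days is 31,363,200 seconds, and dividing that by 7 and rounding up gives 4480458. So the time difference behind weeknum.shift appears to have been measured in seconds rather than days for that row.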
ISOweek <- function(
date,
format="%Y-%m-%d",
tzone="UTC",
return.val="weekofyear"
){
##converts dates into "dayofyear" or "weekofyear", the latter providing the ISO-8601 week
##date should be a vector of class Date or a vector of formatted character strings
##format refers to the date form used if a vector of
## character strings is supplied
##convert date to POSIXt format
if(class(date)[1]%in%c("Date","character")){
date=as.POSIXlt(date,format=format, tz=tzone)
}
if (!inherits(date, "POSIXt")) {
##stop() rather than break: break is only valid inside a loop
stop("Date is of wrong format.")
}else if(inherits(date, "POSIXct")){
date=as.POSIXlt(date, tz=tzone)
}
print(date)
if(return.val=="dayofyear"){
##add 1 because POSIXt is base zero
return(date$yday+1)
}else if(return.val=="weekofyear"){
##Based on the ISO8601 weekdate system,
## Monday is the first day of the week
## W01 is the week with 4 Jan in it.
year=1900+date$year
jan4=strptime(paste(year,1,4,sep="-"),format="%Y-%m-%d")
wday=jan4$wday
wday[wday==0]=7 ##convert to base 1, where Monday == 1, Sunday==7
##calculate the date of the first week of the year
weekstart=jan4-(wday-1)*86400
weeknum=ceiling(as.numeric((difftime(date,weekstart,units="days")+0.1)/7))
#########################################################################
##calculate week for days of the year occurring in the next year's week 1.
#########################################################################
mday=date$mday
wday=date$wday
wday[wday==0]=7
year=ifelse(weeknum==53 & mday-wday>=28,year+1,year)
weeknum=ifelse(weeknum==53 & mday-wday>=28,1,weeknum)
################################################################
##calculate week for days of the year occurring prior to week 1.
################################################################
##first calculate the number of weeks in the previous year
year.shift=year-1
jan4.shift=strptime(paste(year.shift,1,4,sep="-"),format="%Y-%m-%d")
wday=jan4.shift$wday
wday[wday==0]=7 ##convert to base 1, where Monday == 1, Sunday==7
weekstart=jan4.shift-(wday-1)*86400
weeknum.shift=ceiling(as.numeric((difftime(date,weekstart,units="days")+0.1)/7))
##update year and week
year=ifelse(weeknum==0,year.shift,year)
weeknum=ifelse(weeknum==0,weeknum.shift,weeknum)
return(list("year"=year,"weeknum"=weeknum))
}else{
stop("Unknown return.val")
}
}
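For comparison, base R's format() can produce ISO 8601 week numbers on its own. A minimal sketch, assuming the platform supports %G (week-based year) and %V (ISO week) on output, as the strptime help page documents; the data frame cmp and its column names are illustrative:

```r
posix <- seq(as.Date("2011-12-31"), as.Date("2012-01-03"), by = "day")

cmp <- data.frame(
  date     = posix,
  week.us  = format(posix, "%U"),      # Sunday-based week, US convention
  week.iso = format(posix, "%G-W%V")   # ISO 8601 week-based year and week
)
cmp
```

Pairing %V with %G rather than %Y matters at the turn of the year: 2012-01-01 belongs to ISO week 52 of 2011, so "%Y-W%V" would label it with the wrong year.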

Resources