I am working on a project and would be happy about your help.
I am working with stocks and the effect of weekdays on performance. Is there a way to take all the values (for instance, of the S&P 500) in a data frame (df) that fall on a specific weekday (e.g. Tuesday) and put them in a new column of a different data frame (df2)?
Thank you very much,
Ferdinand
df <- read.csv("AAPL.csv") # from Yahoo! Finance
> head(df)
Date Open High Low Close Adj.Close Volume
1 2019-07-10 201.85 203.73 201.56 203.23 200.8332 17897100
2 2019-07-11 203.31 204.39 201.71 201.75 199.3706 20191800
3 2019-07-12 202.45 204.00 202.20 203.30 200.9023 17595200
4 2019-07-15 204.09 205.87 204.00 205.21 202.7898 16947400
5 2019-07-16 204.59 206.11 203.50 204.50 202.0882 16866800
6 2019-07-17 204.05 205.09 203.27 203.35 200.9517 14107500
df$Day <- format(as.Date(df$Date), "%A") # Get the day
idx <- df$Day == "Tuesday" # Where are the Tuesdays ?
df2 <- df[idx, ] # Logical indexing
> head(df2)
Date Open High Low Close Adj.Close Volume Day
5 2019-07-16 204.59 206.11 203.50 204.50 202.0882 16866800 Tuesday
10 2019-07-23 208.46 208.91 207.29 208.84 206.3770 18355200 Tuesday
15 2019-07-30 208.76 210.16 207.31 208.78 206.3177 33935700 Tuesday
20 2019-08-06 196.31 198.07 194.04 197.00 194.6766 35824800 Tuesday
25 2019-08-13 201.02 212.14 200.48 208.97 207.2901 47218500 Tuesday
30 2019-08-20 210.88 213.35 210.32 210.36 208.6689 26884300 Tuesday
Your function:
myfunction <- function(mydf) {
  # Work on the argument, not on the global df
  mydf$Day <- format(as.Date(mydf$Date), "%A")
  idx <- mydf$Day == "Tuesday"
  mydf[idx, ] # Return the Tuesday rows visibly
}
Testing myfunction :
> out = myfunction(df)
> head(out)
Date Open High Low Close Adj.Close Volume Day
5 2019-07-16 204.59 206.11 203.50 204.50 202.0882 16866800 Tuesday
10 2019-07-23 208.46 208.91 207.29 208.84 206.3770 18355200 Tuesday
15 2019-07-30 208.76 210.16 207.31 208.78 206.3177 33935700 Tuesday
20 2019-08-06 196.31 198.07 194.04 197.00 194.6766 35824800 Tuesday
25 2019-08-13 201.02 212.14 200.48 208.97 207.2901 47218500 Tuesday
30 2019-08-20 210.88 213.35 210.32 210.36 208.6689 26884300 Tuesday
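The answer above compares against the English day name, which depends on the session locale. Here is a minimal sketch of a locale-independent variant, assuming the same Date and Close columns as the Yahoo! Finance file. The function name values_on_weekday, the column name TuesdayClose and the small inline data frame are hypothetical stand-ins for illustration:

```r
# Hypothetical helper: keep the rows of `data` that fall on one ISO weekday
# (1 = Monday, ..., 7 = Sunday), independent of the session locale.
values_on_weekday <- function(data, iso_day, date_col = "Date") {
  wd <- as.integer(format(as.Date(data[[date_col]]), "%u"))
  data[wd == iso_day, , drop = FALSE]
}

# Small stand-in for the CSV, using the first rows shown above.
df <- data.frame(
  Date  = c("2019-07-10", "2019-07-11", "2019-07-12",
            "2019-07-15", "2019-07-16", "2019-07-17"),
  Close = c(203.23, 201.75, 203.30, 205.21, 204.50, 203.35)
)

tuesdays <- values_on_weekday(df, 2)
# Put the Tuesday closes into a new data frame, as the question asks.
df2 <- data.frame(Date = tuesdays$Date, TuesdayClose = tuesdays$Close)
```

Passing the weekday as a number keeps the filter working when the session is not set to an English locale.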
Related
In R, how can I produce a list of dates of all 2nd to last Wednesdays of the month in a specified date range? I've tried a few things but have gotten inconsistent results for months with five Wednesdays.
To generate a regular sequence of dates you can use seq with dates for the from and to parameters. See the seq.Date documentation for more options.
Create a data frame with the date, the month and the weekday, then obtain the second-to-last Wednesday of each month with the help of aggregate.
day_sequence = seq(as.Date("2020/1/1"), as.Date("2020/12/31"), "day")
df = data.frame(day = day_sequence,
month = months(day_sequence),
weekday = weekdays(day_sequence))
#Filter only wednesdays
df = df[df$weekday == "Wednesday",]
result = aggregate(day ~ month, df, function(x){head(tail(x,2),1)})
tail(x, 2) returns the last two Wednesdays of each month, then head(.., 1) keeps the first of these two.
Result:
month day
1 April 2020-04-22
2 August 2020-08-19
3 December 2020-12-23
4 February 2020-02-19
5 January 2020-01-22
6 July 2020-07-22
7 June 2020-06-17
8 March 2020-03-18
9 May 2020-05-20
10 November 2020-11-18
11 October 2020-10-21
12 September 2020-09-23
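Grouping by month name alone merges the same month of different years, so the approach above is limited to ranges within a single year. Here is a sketch that groups by year and month instead, assuming a range that spans two years (the column names ym and wd are illustrative):

```r
day_sequence <- seq(as.Date("2019-11-01"), as.Date("2020-02-29"), by = "day")
df <- data.frame(day = day_sequence,
                 ym  = format(day_sequence, "%Y-%m"),          # year-month key
                 wd  = as.integer(format(day_sequence, "%u"))) # ISO weekday, 3 = Wednesday

# Keep only Wednesdays
wednesdays <- df[df$wd == 3, ]

# For each year-month, take the first of the last two Wednesdays.
result <- aggregate(day ~ ym, wednesdays, function(x) head(tail(x, 2), 1))
```

Using the numeric weekday from "%u" also avoids comparing against a locale-dependent day name.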
There are probably simpler ways of doing this, but the function below does what the question asks for. It returns a named vector of days such that
They are between from and to.
They fall on weekday day, where 1 is Monday.
They are the n to last such weekday of the month.
By n to last I mean the nth counting from the end of the month.
whichWeekday <- function(from, to, day, n, format = "%Y-%m-%d"){
from <- as.Date(from, format = format)
to <- as.Date(to, format = format)
day <- as.character(day)
d <- seq(from, to, by = "days")
m <- format(d, "%Y-%m")
f <- c(TRUE, m[-1] != m[-length(m)])
f <- cumsum(f)
wed <- tapply(d, f, function(x){
i <- which(format(x, "%u") == day)
x[ tail(i, n)[1] ]
})
y <- as.Date(wed, origin = "1970-01-01")
setNames(y, format(y, "%Y-%m"))
}
whichWeekday("2019-01-01", "2020-03-31", 3, 2) # 3 = Wednesday, since "%u" counts from Monday = 1
# 2019-01 2019-02 2019-03 2019-04 2019-05
#"2019-01-23" "2019-02-20" "2019-03-20" "2019-04-17" "2019-05-22"
# 2019-06 2019-07 2019-08 2019-09 2019-10
#"2019-06-19" "2019-07-24" "2019-08-21" "2019-09-18" "2019-10-23"
# 2019-11 2019-12 2020-01 2020-02 2020-03
#"2019-11-20" "2019-12-18" "2020-01-22" "2020-02-19" "2020-03-18"
I have a time series which spans multiple years and want to divide it into four categories based on date (ie, 15 April - 10 May, 11 May - 10 July, and so on). My first thought was to use lubridate to define each time period with interval() and then use %within% to determine whether an event occurs within it or not.
df
id datetime
1 HAR10 2019-06-26 04:35:06
2 HAR05 2019-08-05 19:15:00
3 HAR07 2018-07-26 01:01:00
4 HAR07 2018-07-24 23:36:00
5 HAR05 2019-08-27 18:59:43
6 HAR05 2019-07-12 03:33:00
7 HAR07 2018-08-09 16:21:00
8 HAR07 2019-05-01 00:04:28
9 HAR04 2019-07-01 05:25:00
10 HAR07 2018-07-18 15:17:00
perA <- interval(ymd(20190511), ymd(20190710))
df$datetime %within% perA
I immediately ran into a problem with year, since I want to get all events from, say, April - May, regardless of what year they occurred, but interval is year-specific so the interval defined above works for my 2019 data but not my 2018 data. I could define a new set of intervals for each year, but that seems very messy.
Another problem is that a vector of TRUE and FALSE, which %within% returns, is not what I need. I need to assign each event to a category based on which time range it falls within.
My second thought was to use filter(), but I don't think that solves either of my problems. Any help is appreciated!
You can easily extract the month, day or even hour and set to the same year across dates. I made up some groups. This is a dplyr solution, but you should be able to easily convert to base if you prefer.
library(dplyr)
library(lubridate)
df %>%
mutate(noyeardate = as.Date(paste(2000, month(datetime), day(datetime), sep = "-")),
interval = case_when(noyeardate %within% interval(ymd(20000101), ymd(20000331)) ~ "Group 1",
noyeardate %within% interval(ymd(20000401), ymd(20000630)) ~ "Group 2",
noyeardate %within% interval(ymd(20000701), ymd(20000930)) ~ "Group 3",
noyeardate %within% interval(ymd(20001001), ymd(20001231)) ~ "Group 4"))
id datetime noyeardate interval
1 HAR10 2018-07-18 15:17:00 2000-07-18 Group 3
2 HAR05 2018-07-24 23:36:00 2000-07-24 Group 3
3 HAR07 2018-07-26 01:01:00 2000-07-26 Group 3
4 HAR07 2018-08-09 16:21:00 2000-08-09 Group 3
5 HAR05 2019-05-01 00:04:28 2000-05-01 Group 2
6 HAR05 2019-06-26 04:35:06 2000-06-26 Group 2
7 HAR07 2019-07-01 05:25:00 2000-07-01 Group 3
8 HAR07 2019-07-12 03:33:00 2000-07-12 Group 3
9 HAR04 2019-08-05 19:15:00 2000-08-05 Group 3
10 HAR07 2019-08-27 18:59:43 2000-08-27 Group 3
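The case_when call above repeats one interval per group. As an alternative sketch in base R, each date can be reduced to a month-day number (for example 511 for 11 May) and binned with findInterval. The break points below follow the example ranges from the question (15 April, 11 May, 11 July); the labels, the UTC time zone and the handling of dates before 15 April are assumptions made for illustration:

```r
datetime <- as.POSIXct(c("2018-07-18 15:17:00", "2019-05-01 00:04:28",
                         "2019-06-26 04:35:06", "2019-08-27 18:59:43"),
                       tz = "UTC")

# Month-day as an integer, ignoring the year: 1 May -> 501, 26 June -> 626.
mmdd <- as.integer(format(datetime, "%m%d"))

# Lower bounds of each period; anything before the first break gets index 0.
breaks <- c(415, 511, 711)
labels <- c("15 Apr - 10 May", "11 May - 10 Jul", "11 Jul onwards")

idx <- findInterval(mmdd, breaks)
# Dates before the first break are left unassigned (NA).
period <- ifelse(idx == 0, NA_character_, labels[pmax(idx, 1)])
```

Because the year never enters the comparison, the same breaks apply to the 2018 and the 2019 events.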
Data:
df <- data.frame(id = c("HAR10", "HAR05", "HAR07", "HAR07", "HAR05", "HAR05", "HAR07", "HAR07", "HAR04", "HAR07"),
datetime = as.POSIXct(c("2018-07-18 15:17:00", "2018-07-24 23:36:00",
"2018-07-26 01:01:00", "2018-08-09 16:21:00", "2019-05-01 00:04:28",
"2019-06-26 04:35:06", "2019-07-01 05:25:00", "2019-07-12 03:33:00",
"2019-08-05 19:15:00", "2019-08-27 18:59:43")))
I have a list of every day from 2018-01-01 to 2018-06-01. It is a vector and it looks like this:
dates <- c("2018-01-01", "2018-01-02", "2018-01-03", ... , "2018-05-30", "2018-06-01")
I want to make a data frame where the first column has each of those dates and the second column has their day of the week. I am assuming that 2018-01-01 is a Monday.
date day
2018-01-01 Monday
2018-01-02 Tuesday
2018-01-03 Wednesday
... ...
2018-06-01 Monday
I'm working on a data frame towards that end, but I was curious for a better way to recycle through the days of the week than the solution I put together.
day <- NULL
for (i in 1:length(dates)) {
x <- i
while (x > 7) {
x <- x - 7 # step back one week at a time so the loop terminates
}
day <- c(day, days[x])
}
cbind(dates,day)
We can use weekdays to get day of the week and put it in a dataframe.
data.frame(dates, day = weekdays(dates))
# dates day
#1 2018-01-01 Monday
#2 2018-01-02 Tuesday
#3 2018-01-03 Wednesday
#4 2018-05-30 Wednesday
#5 2018-06-01 Friday
EDIT
If we don't want to use any built-in function, we can create a vector of days and look the day up from there. Given that the first date is a Monday, we can use the modulo operator to find the day for the rest of the dates:
days <- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
day <- days[(as.numeric(dates - dates[1]) %% 7) + 1]
day
#[1] "Monday" "Tuesday" "Wednesday" "Wednesday" "Friday"
and then put them in dataframe
data.frame(dates, day)
# dates day
#1 2018-01-01 Monday
#2 2018-01-02 Tuesday
#3 2018-01-03 Wednesday
#4 2018-05-30 Wednesday
#5 2018-06-01 Friday
data
dates<-as.Date(c("2018-01-01","2018-01-02","2018-01-03","2018-05-30","2018-06-01"))
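The modulo lookup relies on the first date being a Monday. Here is a sketch that drops that assumption by indexing the same days vector with the ISO weekday number from format(x, "%u") (1 = Monday), which also avoids locale-dependent day names:

```r
dates <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-03",
                   "2018-05-30", "2018-06-01"))
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")

# "%u" gives the ISO weekday as "1" (Monday) to "7" (Sunday).
day <- days[as.integer(format(dates, "%u"))]
result <- data.frame(dates, day)
```

This works for any starting date, since each day is derived from its own date rather than from its offset to the first element.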
I have two daily time series ranging from 1st of Jan 2016 to 1st of Aug 2016. However, one of my series only includes data from business days (i.e. weekends and bank holidays omitted), while the other has data for every day. My question is: how do I merge the two series so that for both time series I have only the business day data left over (deleting those extra days from the second time series)?
The question was tagged also with data.table so I guess that the two time series are stored as data.frames or data.tables.
By default, joins in data.table are right joins. So, if you know in advance which one the "shorter" time series is you can write:
library(data.table)
dt_long[dt_short, on = "date"]
# date weekday i.weekday
#1: 2017-03-30 4 4
#2: 2017-03-31 5 5
#3: 2017-04-03 1 1
#4: 2017-04-04 2 2
#5: 2017-04-05 3 3
#6: 2017-04-06 4 4
If you are not sure which the "shorter" time series is you can use an inner join:
dt_short[dt_long, on = "date", nomatch = 0]
nomatch = 0 specifies the inner join.
If your time series are not already data.tables as the sample data here but are stored as data.frames, you need to coerce them to data.table class beforehand by:
setDT(dt_long)
setDT(dt_short)
Data
As the OP hasn't provided any reproducible data, we need to prepare sample data on our own (similar to this answer but as data.table):
library(data.table)
dt_long <- data.table(date = as.Date("2017-03-30") + 0:7)
# add payload: integer weekday according ISO (week starts on Monday == 1L)
dt_long[, weekday := as.integer(format(date, "%u"))]
# remove weekends
dt_short <- dt_long[weekday < 6L]
We have two data.frames: df_long, which contains weekends, and df_short, which doesn't include weekends.
Date <- as.Date(seq(as.Date("2003-03-03"), as.Date("2003-03-17"), by = 1), format="%Y-%m-%d")
weekday <- weekdays(as.Date(Date))
df_long <- data.frame(Date, weekday)
df_short<- df_long[ c(1:5, 8:12, 15), ]
You can join them using dplyr::inner_join to delete the weekends and holidays from df_long and keep just the business days.
library(dplyr)
df_join <- df_long %>% inner_join(., df_short, by ="Date")
> df_join
Date weekday.x weekday.y
1 2003-03-03 Monday Monday
2 2003-03-04 Tuesday Tuesday
3 2003-03-05 Wednesday Wednesday
4 2003-03-06 Thursday Thursday
5 2003-03-07 Friday Friday
6 2003-03-10 Monday Monday
7 2003-03-11 Tuesday Tuesday
8 2003-03-12 Wednesday Wednesday
9 2003-03-13 Thursday Thursday
10 2003-03-14 Friday Friday
11 2003-03-17 Monday Monday
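Because both inputs carry a weekday column, the join above returns it twice (weekday.x and weekday.y). If only the rows of the longer series are needed, a filtering sketch in base R keeps a single copy. It rebuilds the same df_long and df_short as above; the name df_business is illustrative:

```r
Date <- seq(as.Date("2003-03-03"), as.Date("2003-03-17"), by = 1)
df_long  <- data.frame(Date, weekday = weekdays(Date))
df_short <- df_long[c(1:5, 8:12, 15), ]

# Keep only the rows of df_long whose Date also appears in df_short.
df_business <- df_long[df_long$Date %in% df_short$Date, ]
```

This behaves like a semi-join: it filters df_long by the dates present in df_short without adding any columns.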
I have dates formatted
as.Date(variable, format="%Y%m%d")
I extracted the weekday from that using
weekdays(as.Date(variable))
I now need to be able to say which occurrence of the day of the week the date was. For example, this was the second Tuesday of February, or this is the 4th Friday of March.
The occurrence is simply the ceiling of (day of month / 7) and day of month can be extracted using as.POSIXlt so put all together:
d <- as.Date(variable, format="%Y%m%d")
occ <- c("1st", "2nd", "3rd", "4th", "5th")
paste(occ[ceiling(as.POSIXlt(d)$mday / 7L)], weekdays(d), "of", months(d))
You can find the nth weekday of the month with (as.integer(format(x, "%d")) - 1) %/% 7 + 1:
days <- as.Date("2017-03-01") + 0:9
wdays <- weekdays(days)
nth <- (as.integer(format(days, "%d")) - 1) %/% 7 + 1
(Put in a data.frame for easy alignment:)
cbind.data.frame(days, wdays, nth)
# days wdays nth
# 1 2017-03-01 Wednesday 1
# 2 2017-03-02 Thursday 1
# 3 2017-03-03 Friday 1
# 4 2017-03-04 Saturday 1
# 5 2017-03-05 Sunday 1
# 6 2017-03-06 Monday 1
# 7 2017-03-07 Tuesday 1
# 8 2017-03-08 Wednesday 2
# 9 2017-03-09 Thursday 2
# 10 2017-03-10 Friday 2
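The two answers use different expressions for the occurrence, ceiling(mday / 7) and (mday - 1) %/% 7 + 1. Here is a quick check that they agree for every possible day of the month, plus one worked date (14 February 2017, chosen here purely as an illustration):

```r
mday <- 1:31
occ_ceiling <- ceiling(mday / 7)
occ_intdiv  <- (mday - 1) %/% 7 + 1
all(occ_ceiling == occ_intdiv)

d <- as.Date("20170214", format = "%Y%m%d")
nth <- (as.integer(format(d, "%d")) - 1) %/% 7 + 1
# nth is 2: the 14th is always the second occurrence of its weekday in the month.
```

Either expression can therefore be used; the integer-division form avoids floating-point division.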