Finding the first day of specific months in R

I currently have a column "Month" and a column "DayWeek" with the month and day of the week written out. Using the code below I can get a column with a 1 for each Wednesday in Feb, May, Aug and Nov. I'm struggling to find a way to get a column with 1s just for the first Wednesday of each of the 4 months I just mentioned. Any ideas, or do I have to create a loop for it?
testPrices$Rebalance <- ifelse((testPrices$Month=="February" & testPrices$DayWeek == "Wednesday"),1,ifelse((testPrices$Month=="May" & testPrices$DayWeek == "Wednesday"),1,ifelse((testPrices$Month=="August" & testPrices$DayWeek == "Wednesday"),1,ifelse((testPrices$Month=="November" & testPrices$DayWeek == "Wednesday"),1,0))))

Well, without a reproducible example, I couldn't come up with a complete solution, but here is a way to generate the first Wednesday date of each month. In this example, I start at 1 JAN 2013 and go out 36 months, but you can figure out what's appropriate for you. Then, you can check against the first Wednesday vector produced here to see if your dates are members of the first Wednesday of the month group and assign a 1, if so.
# I chose this as an origin
orig <- "2013-01-01"
# generate vector of 1st date of the month for 36 months
d <- seq(as.Date(orig), length=36, by="1 month")
# Use that to make a list of the first 7 dates of each month
d <- lapply(d, function(x) as.Date(seq(1:7),origin=x)-1)
# Look through the list for Wednesdays only,
# and concatenate them into a vector
do.call('c', lapply(d, function(x) x[strftime(x,"%A")=="Wednesday"]))
Output:
[1] "2013-01-02" "2013-02-06" "2013-03-06" "2013-04-03" "2013-05-01" "2013-06-05" "2013-07-03"
[8] "2013-08-07" "2013-09-04" "2013-10-02" "2013-11-06" "2013-12-04" "2014-01-01" "2014-02-05"
[15] "2014-03-05" "2014-04-02" "2014-05-07" "2014-06-04" "2014-07-02" "2014-08-06" "2014-09-03"
[22] "2014-10-01" "2014-11-05" "2014-12-03" "2015-01-07" "2015-02-04" "2015-03-04" "2015-04-01"
[29] "2015-05-06" "2015-06-03" "2015-07-01" "2015-08-05" "2015-09-02" "2015-10-07" "2015-11-04"
[36] "2015-12-02"
Note: I adapted this code from answers found here and here.
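The membership check described above could be sketched as follows. This is only an illustration under assumptions: the data frame `prices` and its `date` column are hypothetical stand-ins for the asker's data, and weekdays are tested with `format(x, "%u")` (1 = Monday, so Wednesday is "3") instead of the weekday name, to avoid depending on the locale.

```r
# First day of each month for 36 months, as in the answer above
first_days <- seq(as.Date("2013-01-01"), length.out = 36, by = "1 month")
# The first 7 dates of each month
first_week <- lapply(first_days, function(x) x + 0:6)
# Keep the Wednesday of each first week ("%u" gives 1 = Monday ... 7 = Sunday)
first_wed <- do.call("c", lapply(first_week, function(x) x[format(x, "%u") == "3"]))
# Restrict to February, May, August and November
keep <- first_wed[as.integer(format(first_wed, "%m")) %in% c(2, 5, 8, 11)]

# Hypothetical price table with one row per calendar day of 2013
prices <- data.frame(date = seq(as.Date("2013-01-01"), as.Date("2013-12-31"), by = "day"))
# 1 if the row's date is one of the selected first Wednesdays, else 0
prices$Rebalance <- as.integer(prices$date %in% keep)
prices$date[prices$Rebalance == 1]
```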

I created a sample dataset to work with like this (Thanks @Frank!):
orig <- "2013-01-01"
d <- data.frame(date=seq(as.Date(orig), length=1000, by='1 day'))
d$Month <- months(d$date)
d$DayWeek <- weekdays(d$date)
d$DayMonth <- as.numeric(format(d$date, '%d'))
From a data frame like this, you can extract the first Wednesday of specific months using subset, like this:
subset(d, Month %in% c('January', 'February') & DayWeek == 'Wednesday' & DayMonth < 8)
This takes advantage of the fact that the day number of the first Wednesday of a month is always between 1 and 7, and there is exactly one such day in each month. You could do the same for the 2nd, 3rd or 4th Wednesday by changing the condition accordingly, for example DayMonth > 7 & DayMonth < 15 for the 2nd Wednesday.
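To get the 0/1 column the question asks for, the same condition can be turned into an indicator without a loop. The sketch below is an assumption-laden variant: it uses numeric month and weekday columns (`MonthNum` and `WeekdayNum`, names made up here) so that it does not depend on the locale's month and weekday names.

```r
# Sample data covering 1000 days, similar to the data frame above
d <- data.frame(date = seq(as.Date("2013-01-01"), length.out = 1000, by = "1 day"))
d$MonthNum   <- as.integer(format(d$date, "%m"))  # 1..12
d$WeekdayNum <- as.integer(format(d$date, "%u"))  # 1 = Monday ... 7 = Sunday
d$DayMonth   <- as.integer(format(d$date, "%d"))  # 1..31

# 1 for the first Wednesday of Feb, May, Aug and Nov; 0 otherwise
d$Rebalance <- as.integer(d$MonthNum %in% c(2, 5, 8, 11) &
                          d$WeekdayNum == 3 &
                          d$DayMonth <= 7)
d$date[d$Rebalance == 1]
```

With the original columns, the equivalent condition would be Month %in% c("February", "May", "August", "November") & DayWeek == "Wednesday" & DayMonth <= 7.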

Related

How can I create a sequence of year-week string values based on existing dates?

I am plotting weekly figures that cross over from 2018 into 2019 and the tick marks on my X-axis represent the year then week.
For example:
2018-50, 2018-51, 2018-52, 2018-53, 2019-01, 2019-02, 2019-03
I have two data frames and the dates in either aren't always going to be the same. As such, one solution I have thought of that might work is to find the lowest yearWeek value in either data frame, and the maximum yearWeek value in either data frame, and to then create a sequence using those two values. Note that both values could either exist within a single data frame or one data frame could have the lowest/earliest value and the other the highest/latest value.
Both data frames look like this:
week yearWeek month day date
1 31 2018-31 2018-08-01 Wed 2018-08-01
2 31 2018-31 2018-08-01 Thu 2018-08-02
3 31 2018-31 2018-08-01 Fri 2018-08-03
4 31 2018-31 2018-08-01 Sat 2018-08-04
5 32 2018-32 2018-08-01 Sun 2018-08-05
6 32 2018-32 2018-08-01 Mon 2018-08-06
I have looked for a solution and this answer is almost there, but not quite.
The problems with this solution are:
The single-figure week numbers don't have a 0 before them; and
Despite specifying seq(31:53), for example, the output starts from 1 (I know why this happens); and
There doesn't seem to be a way to stop the count at 53 using this method (2018 had a (short) 53rd week which I would like to include) and resume from 2019-01 onwards.
I want to be able to set the X-axis range from 2018-31 (31st week of 2018) to 2019-13 (13th week of 2019).
In short, how can I create a sequence of year-week values ranging from the minimum date value to the maximum date value (in this case from 2018-31 to 2019-13)?
I think this would work for you
x1 <- c(31:53)
x2 <- sprintf("%02d", c(1:13))
paste(c(rep(2018, length(x1)), rep(2019, length(x2))), c(x1, x2), sep = "-")
# [1] "2018-31" "2018-32" "2018-33" "2018-34" "2018-35" "2018-36" "2018-37"
# "2018-38" "2018-39" "2018-40" "2018-41" "2018-42" "2018-43" "2018-44"
# "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51"
# "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05"
# "2019-06" "2019-07" "2019-08" "2019-09" "2019-10" "2019-11" "2019-12" "2019-13"
For the updated question we can do
#rbind both the dataset
df <- rbind(df1, df2)
#convert them to date
df$Date <- as.Date(df$date)
#Generate a sequence from min date to maximum date, format them
# to year-week combination and select only the unique ones
unique(format(seq(min(df$Date), max(df$Date), by = "day"), "%Y-%W"))
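One thing to check with this approach is which week definition you want. The sketch below compares "%Y-%W" with the ISO 8601 pair "%G-%V" over an assumed date range (the two endpoint dates are chosen here for illustration, not taken from the question's data). With "%W", the days of a year before its first Monday are numbered week 00, so the sequence contains both "2018-53" and "2019-00".

```r
# Assumed range for illustration
days <- seq(as.Date("2018-08-01"), as.Date("2019-03-31"), by = "day")

# %W: weeks start on Monday; days before the year's first Monday are week 00
by_W <- unique(format(days, "%Y-%W"))
# %G-%V: ISO 8601 week-based year and week number (01..53)
by_iso <- unique(format(days, "%G-%V"))

# Labels that appear only under the %W convention
setdiff(by_W, by_iso)
# [1] "2018-53" "2019-00"
range(by_iso)
# [1] "2018-31" "2019-13"
```

Neither convention is wrong; the axis labels just need to use the same definition as the yearWeek column in the data.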
Define two sequences, and then restrict to the range you want:
years <- c("2018", "2019")
# week numbers 01..53, zero-padded (53 is included because 2018 had a 53rd week)
weeks <- sprintf("%02d", 1:53)
result <- apply(expand.grid(years, weeks), 1, function(x) paste(x, collapse = "-"))
# keep only the requested range, then sort it chronologically
result <- sort(result[result >= "2018-31" & result <= "2019-13"])
result
 [1] "2018-31" "2018-32" "2018-33" "2018-34" "2018-35" "2018-36" "2018-37"
 [8] "2018-38" "2018-39" "2018-40" "2018-41" "2018-42" "2018-43" "2018-44"
[15] "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51"
[22] "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05"
[29] "2019-06" "2019-07" "2019-08" "2019-09" "2019-10" "2019-11" "2019-12"
[36] "2019-13"
Note that pruning off the values we don't want works here even with text strings, because all of the year-week strings have the same width and are zero-padded on the left where necessary. Comparing and sorting them as text therefore gives the same order as it would for the underlying numbers.
Here is a possibility using the str_pad function from the stringr package:
library(stringr)
weeks <- str_pad(41:65 %% 53 + 1, 2, "left", "0")
years <- ifelse(41:65 <= 52, "2018", "2019")
paste(years, weeks, sep = "-")
[1] "2018-42" "2018-43" "2018-44" "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51" "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05" "2019-06" "2019-07" "2019-08" "2019-09"
[22] "2019-10" "2019-11" "2019-12" "2019-13"
As I just learned from the other two answers, sprintf provides a base R alternative to str_pad, so you can also use
weeks <- sprintf("%02d", 41:65 %% 53 + 1)
Here is a possibility using strftime:
weeks <- seq(from = ISOdate(2018,12,10), to = ISOdate(2019,4,1), by="week")
strftime(weeks,format="%Y-%W")

Get the month from the week of the year

Let's say we have this:
ex <- c('2012-41')
This represents week 41 of the year 2012. How would I get the month from this?
Since a week can span two months, I am interested in the month in which that week started (here October).
This is not a duplicate of How to extract Month from date in R, because the input is not in a standard date format like %Y-%m-%d.
You could try:
ex <- c('2019-10')
splitDate <- strsplit(ex, "-")
dateNew <- as.Date(paste(splitDate[[1]][1], splitDate[[1]][2], 1, sep="-"), "%Y-%U-%u")
monthSelected <- lubridate::month(dateNew)
monthSelected
## [1] 3
I hope this helps!
This depends on the definition of week. See the discussion of %V and %W in ?strptime for two possible definitions of week. We use %V below, but the function allows one to specify the other if desired. The function performs a sapply over the elements of x; for each element it extracts the year into yr and forms a sequence of all dates for that year in sq. It then converts those dates to year-week strings, finds the first occurrence of the current element of x in that sequence, and finally extracts the month of the matching date.
yw2m <- function(x, fmt = "%Y-%V") {
  sapply(x, function(x) {
    yr <- as.numeric(substr(x, 1, 4))
    sq <- seq(as.Date(paste0(yr, "-01-01")), as.Date(paste0(yr, "-12-31")), "day")
    as.numeric(format(sq[which.max(format(sq, fmt) == x)], "%m"))
  })
}
yw2m('2012-41')
## [1] 10
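If ISO 8601 weeks are what you want, the Monday that starts a given week can also be computed directly instead of searching a year of dates. The sketch below relies on the ISO rule that January 4 always falls in week 1; the helper name `iso_week_start` is made up for this example, and it assumes input strings of the form "YYYY-WW".

```r
# Monday on which an ISO 8601 week starts, for strings like "2012-41"
iso_week_start <- function(x) {
  yr <- as.integer(substr(x, 1, 4))
  wk <- as.integer(substr(x, 6, 7))
  # January 4 is always in ISO week 1
  jan4 <- as.Date(sprintf("%d-01-04", yr))
  # Step back to the Monday of that week ("%u": 1 = Monday ... 7 = Sunday)
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1)
  week1_monday + 7 * (wk - 1)
}

starts <- iso_week_start(c("2012-41", "2019-01"))
starts
# [1] "2012-10-08" "2018-12-31"
as.integer(format(starts, "%m"))
# [1] 10 12
```

Note that under this definition week 1 of a year can start in the previous December, so the month in which the week started may belong to the preceding calendar year.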
The following converts year-week strings to dates by adding the given number of weeks to January 1 of the given year, and returns the dates as a character vector. Adding lubridate's weeks() this way gives the date at the end of the relevant week. Note that I've added extra cases to your 'ex' variable, including the 52nd week, which returns December 31.
library(lubridate)
ex <- c('2012-41','2016-4','2018-52')
dates <- strsplit(ex,"-")
dates <- sapply(dates, function(x) {
  year_week <- unlist(x)
  year <- year_week[1]
  # convert the week to a number before passing it to weeks()
  week <- as.numeric(year_week[2])
  start_date <- as.Date(paste0(year, '-01-01'))
  date <- start_date + weeks(week)
  # Note: the OP asked for the beginning of the week.
  # There's some ambiguity here; the line above gives the end of the week.
  # Uncomment the line below for the beginning of the week (6 days earlier).
  # I think this might yield inconsistent results, especially at year boundaries,
  # hence the suggestion to use the end of the week.
  # date <- start_date + weeks(week) - days(6)
  return(as.character(date))
})
Yields:
> dates
[1] "2012-10-14" "2016-01-29" "2018-12-31"
And to simply get the month from these full dates:
month(dates)
Yields:
> month(dates)
[1] 10 1 12

How do I find the first and last day of next month?

If I have a given date, how do I find the first and last days of the next month?
For example,
today <- as.Date("2009-04-04")
I want to find
# first date in next month
"2009-05-01"
# last date in next month
"2009-05-31"
You can do this with base R:
today <- as.Date("2009-04-04")
first <- function(x) {
x <- as.POSIXlt(x)
x$mon[] <- x$mon + 1
x$mday[] <- 1
x$isdst[] <- -1L
as.Date(x)
}
first(today)
#[1] "2009-05-01"
first(first(today)) - 1
#[1] "2009-05-31"
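A variant of the same idea uses seq() with by = "month", which handles the year rollover in December on its own. This is a minimal sketch for a single date; the function name `next_month_bounds` is made up for the example.

```r
# First and last day of the month after the one containing x (a single date)
next_month_bounds <- function(x) {
  x <- as.Date(x)
  # First day of the month containing x
  start <- as.Date(format(x, "%Y-%m-01"))
  # First days of this month, next month, and the month after that
  s <- seq(start, by = "month", length.out = 3)
  list(first = s[2], last = s[3] - 1)
}

b <- next_month_bounds("2009-04-04")
b$first
# [1] "2009-05-01"
b$last
# [1] "2009-05-31"
```

Starting the sequence from the first of the month avoids the problems that stepping by month from a day such as the 31st can cause.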
lubridate has some useful tools for this purpose.
library(lubridate)
today <- ymd("2009-04-12")
# First day of next month
first <- ceiling_date(today, unit = "month")
# Last day of next month
last <- ceiling_date(first, unit= "month") -1
first
#"2009-05-01"
last
#"2009-05-31"
Here are some solutions. We use today from the question to test. In both cases the input may be a Date class vector.
1) Base R. Define the function fom to give the first of the month of its Date argument. Using that, we can get the first and last dates of the next month as follows. We use the fact that 31 and 62 days after the first of a month necessarily fall in the next month and in the month after that, respectively.
fom <- function(x) as.Date(cut(x, "month"))
fom(fom(today) + 31)
## [1] "2009-05-01"
fom(fom(today) + 62) - 1
## [1] "2009-05-31"
2) yearmon. yearmon class objects internally represent a year and month as the year plus 0 for January, 1/12 for February, 2/12 for March and so on. In as.Date.yearmon, the frac argument specifies the fraction of the way through the month to output. The default, frac = 0, gives the first of the month, and frac = 1 gives the end of the month.
library(zoo)
as.Date(as.yearmon(today) + 1/12)
## [1] "2009-05-01"
as.Date(as.yearmon(today) + 1/12, frac = 1)
## [1] "2009-05-31"

How to Get the Same Weekday Last Year Given any Given Year?

I would like to get the same weekday last year, given a date in any year. How can I best do this in R? For example, given Sunday 2010/01/03, I would like to obtain the Sunday of the same week the year before.
# "Sunday"
weekdays(as.Date("2010/01/03", format="%Y/%m/%d"))
# "Saturday"
weekdays(as.Date("2009/01/03", format="%Y/%m/%d"))
To find the same weekday one year ago, simply subtract 52 weeks or 364 days from the given date:
d <- as.Date("2010-01-03")
weekdays(d)
#[1] "Sunday"
d - 52L * 7L
#[1] "2009-01-04"
weekdays(d - 52L * 7L)
#[1] "Sunday"
Please note that a calendar year has 365 days (or 366 days in a leap year), which is one or two days more than 52 weeks. So the calendar date of the same weekday one year ago shifts by one or two days. (This also explains why New Year's Eve falls on a different weekday from one year to the next.)
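The following formula will give you the corresponding weekday in the same week of the previous year (it only uses base R functions, so lubridate is not actually required):
as.Date(dDate - 364 - ifelse(weekdays( dDate - 363) == weekdays( dDate ), 1, 0))
Here dDate is some date, e.g. dDate <- as.Date("2016-02-29"). The ifelse was meant to account for leap years, but its condition can never be true, because 363 is not a multiple of 7. The expression therefore always reduces to dDate - 364, and no separate leap-year adjustment is needed, since 364 days is exactly 52 weeks.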
Here's a simple algorithm. Subtract 365 days from the day of interest, then adjust that day to the closest matching day of the week using the Tableau code below (easily translatable into other languages), with weekdays numbered 1 = Monday to 7 = Sunday. Basically, you move day - 365 to the correct day of the week: within the same week if that moves it by at most 3 days, otherwise to the matching weekday in the previous or next week. This picks whichever candidate is closest in number of days.
[day prior year raw] = [day] - 365
[matching day prior year] =
if abs(datepart('weekday',[day]) - datepart('weekday',[day prior year raw])) <= 3
then [day prior year raw] + datepart('weekday',[day]) - datepart('weekday',[day prior year raw])
else [day prior year raw] + (
    if datepart('weekday',[day]) > datepart('weekday',[day prior year raw])
    then -7 + (datepart('weekday',[day]) - datepart('weekday',[day prior year raw]))
    else 7 + (datepart('weekday',[day]) - datepart('weekday',[day prior year raw]))
    end
)
end
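A rough R translation of the Tableau logic above might look like the following. It is a sketch, not the answerer's code: the function name `same_weekday_prior_year` is made up, and weekdays are numbered with format(x, "%u") (1 = Monday to 7 = Sunday).

```r
# Closest date to (day - 365) that falls on the same weekday as day
same_weekday_prior_year <- function(day) {
  day <- as.Date(day)
  raw <- day - 365
  wd <- function(x) as.integer(format(x, "%u"))  # 1 = Monday ... 7 = Sunday
  diff <- wd(day) - wd(raw)
  # Shift within the same week if that moves at most 3 days,
  # otherwise use the matching weekday in the previous or next week
  shift <- ifelse(abs(diff) <= 3, diff, ifelse(diff > 0, diff - 7, diff + 7))
  raw + shift
}

same_weekday_prior_year("2010-01-03")
# [1] "2009-01-04"
```

Because 365 days is 52 weeks plus one day, the weekday of day - 365 is always one day behind, so the shift works out to +1 and the result equals day - 364, which agrees with the "subtract 52 weeks" answer.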
Look at ?years in package lubridate. This creates a period object which correctly spans a period, across leap years.
> library(lubridate)
> # set the reference date
> d1 = as.Date("2017/01/03", format="%Y/%m/%d")
>
> # verify across years and leap years
> d1 - years(1)
[1] "2016-01-03"
> d1 - years(2)
[1] "2015-01-03"
> d1 - years(3)
[1] "2014-01-03"
> d1 - years(4)
[1] "2013-01-03"
> d1 - years(5)
[1] "2012-01-03"
>
> weekdays(d1 - years(1))
[1] "Sunday"
> weekdays(d1 - years(2))
[1] "Saturday"
>
> # Feb 29 minus a one-year period yields NA
> ymd("2016/02/29") - years(1)
[1] NA
>
> # Feb 29 in a non-leap year fails to parse
> ymd("2015/02/29") - years(1)
[1] NA
Warning message:
All formats failed to parse. No formats found.
>
> # feb 29, leap year with 4 year period works.
> ymd("2016/02/29") - years(4)
[1] "2012-02-29"
>

Format date strings comprising weeks and quarters as Date objects

I have dates in an R dataframe column formatted as character strings as WK01Q32014.
I want to turn each date into a Date() object.
So I altered the format to make it look like 01-3-2014. I want to try to do something like as.Date("01-3-2014","%W-%Q-%Y") for example, but there is no format code for quarters that I know of.
Is there any way to do this using the lubridate, zoo, or any other libraries?
I don't know of any specific function, but here's a basic one:
convert_WQ_to_Date <- function(D) {
weeks <- as.integer(substr(D, 3, 4))
quarter <- as.integer(substr(D, 6, 6))
year <- substr(D, 7, 10)
days <- 7 * ((quarter - 1) * 13 + (weeks-1))
as.Date(sprintf("%s-01-01", year)) + days
}
Example
D <- c("WK01Q32014", "WK01Q12014", "WK05Q42014", "WK01Q22014", "WK02Q32014")
convert_WQ_to_Date(D)
[1] "2014-07-02" "2014-01-01" "2014-10-29" "2014-04-02" "2014-07-09"
The week, quarter and year do not uniquely define a date, so we have to add an assumption. Here we assume that the first week starts on the first day of the quarter, the second week starts 7 days later, and so on.
Below, we extract the qtr-year part and use as.yearqtr in the zoo package to convert that to a yearqtr object and then use as.Date to convert that to a date which is the first of the quarter. We then extract the week, subtract 1 and multiply by 7 to get the days offset. Adding the first of the quarter to the offset gives the result:
library(zoo)
xx <- "01-3-2014" # week-quarter-year
qtr.start <- as.Date(as.yearqtr(sub("...", "", xx), "%q-%Y"))
days <- 7 * (as.numeric(sub("-.*", "", xx)) - 1)
qtr.start + days
## [1] "2014-07-01"
Assuming the traditional notion of the quarters starting on 1st January, 1st April, 1st July and 1st October respectively (in line with the quarters function), just start at these dates and add 7 days for each week:
x <- c("01-3-2014","01-1-2014","05-4-2014","01-2-2014","02-3-2014")
y <- as.numeric(substr(x,6,9))
m <- as.numeric(substr(x,4,4))
d <- as.numeric(substr(x,1,2))
as.Date(paste(y,(m-1)*3+1,"01",sep="-")) + (7*(d-1))
#[1] "2014-07-01" "2014-01-01" "2014-10-29" "2014-04-01" "2014-07-08"
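The same quarter-start assumption can be applied directly to the original WK01Q32014 strings, without reformatting them first. The sketch below is one way to do it in base R; the function name `wq_to_date` is made up, and strings that do not match the expected pattern are returned as NA.

```r
# Convert strings like "WK01Q32014" (week within quarter, quarter, year) to Date,
# assuming week 1 starts on the first day of the quarter
wq_to_date <- function(x) {
  ok <- grepl("^WK[0-9]{2}Q[1-4][0-9]{4}$", x)
  out <- rep(as.Date(NA), length(x))
  v <- x[ok]
  week    <- as.integer(substr(v, 3, 4))
  quarter <- as.integer(substr(v, 6, 6))
  year    <- substr(v, 7, 10)
  # First month of the quarter: 1, 4, 7 or 10
  qtr_start <- as.Date(sprintf("%s-%02d-01", year, (quarter - 1L) * 3L + 1L))
  out[ok] <- qtr_start + 7 * (week - 1L)
  out
}

wq_to_date(c("WK01Q32014", "WK01Q12014", "WK05Q42014", "WK01Q22014", "WK02Q32014"))
# [1] "2014-07-01" "2014-01-01" "2014-10-29" "2014-04-01" "2014-07-08"
```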

Resources