I have data from several years and each record has a date value (YYYY-MM-DD). I want to label each record with the season that it fell into. For example, I want to take all the records from December 15 to March 15, across all years, and put "Winter" in a season column. Is there a way in R to specify a sequence of dates using just the month and date, regardless of year?
Lubridate quarter command doesn't work because I have custom dates to define the seasons and the seasons are not all of equal length, and I can't just do month(datevalue) %in% c(12,1,2,3) because I need to split the months in half (i.e. March 15 is winter and March 16 is spring).
I could manually enter in the date range for each year in my dataset (e.g. Dec 15 2015 to March 15 2015 or Dec 15 2016 to Mar 15 2016, etc...), but is there a better way?
You can extract the month and date out of the date column and use case_when to assign Season based on those two dates.
library(dplyr)
library(lubridate)
df %>%
mutate(day = day(Date),
month = month(Date),
Season = case_when(#15 December to 15 March as Winter
month == 12 & day >= 15 |
month %in% 1:2 | month == 3 & day <= 15 ~ "Winter",
#Add conditions for other season
)
)
We assume that when the question says that winter is "Dec 15 2015 to March 15 201 or Dec 15 2016 to Mar 15 2016" what is really meant is that winter is Dec 16, 2015 to Mar 15, 2016 or Dec 16, 2016 to Mar 15, 2017.
Also it is not clear what the precise output is supposed to be but in each case below we provide a second argument which takes a vector giving the season names or numbers. The default is that winter is reported as 1, spring is 2, summer is 3 and fall is 4 but you could pass a second argument of c("Winter", "Spring", "Summer", "Fall") instead or use other names if you wish.
1) yearmon/yearqtr Convert to Date class and subtract 15. Then convert that to yearmon class which represents dates internally as year + fraction where fraction = 0 for January, 1/12 for February, ..., 11/12 for December. Add 1/12 to get to the next month. Convert that to yearqtr class which represents dates as year + fraction where fraction is 0, 1/4, 2/4 or 3/4 for the 4 quarters and take cycle of that which gives the quarter number (1, 2, 3 or 4).
If we knew that the input x was a Date vector as opposed to a character vector then we could simplify this by replacing as.Date(x) in season.
library(zoo)
season <- function(x, s = 1:4)
s[cycle(as.yearqtr(as.yearmon(as.Date(x) - 15) + 1/12))]
# test
d <- c(as.Date("2020-12-15") + 0:1, as.Date("2021-03-15") + 0:1)
season(d)
## [1] 4 1 1 2
season(d, c("Winter", "Spring", "Summer", "Fall"))
## [1] "Fall" "Winter" "Winter" "Spring"
2) base The above could be translated to base R using POSIXlt. Subtract 15 as before and then add 1 to the month to get to the next month. Finally extract the month and ensure that is is less than or equal to the third month.
season.lt <- function(x, s = 1:4) {
lt <- as.POSIXlt(as.Date(d) - 15)
lt$mon <- lt$mon + 1
s[as.POSIXlt(format(lt))$mon %/% 3 + 1]
}
# test - d defined in (1)
is.season.lt(d)
## [1] 4 1 1 2
3) lubridate We can follow the same logic in lubridate like this:
season.lub <- function(x, s = 1:4)
s[(month((as.Date(x) - 15) %m+% months(1)) - 1) %/% 3 + 1]
# test - d defined in (1)
season.lub(d)
## [1] 4 1 1 2
I am working with a data set that has the following structure for its dates:
Week DateStart DateEnd Day
1 5-Aug-16 11-Aug-16 Monday
2 12-Aug-16 18-Aug-16 Thursday
Where "Week" corresponds to a study week number, "DateStart" and "DateEnd" are the first and last days of that week, and "Day" represents the specific day from within that week. I would like to use the "DateStart", "DateEnd", and "Day" fields to create a new field, "Date", that assigns a specific date to each "Day" that falls within the "DateStart" and "DateEnd" interval.
I've used the %--% operator to turn DateStart and DateEnd into an interval:
Week_Interval <- DateStart %--% DateEnd
but then I haven't had much luck on figuring out how to match the Day field to a date within the resulting interval. I've tried reading through the lubridate documentation, but it didn't seem like there was anything in there that could specifically solve my problem. I'm hoping someone here might have some experience with this and could help point me in the right direction.
My ideal output would be something like:
Week DateStart DateEnd Day Date
1 5-Aug-16 11-Aug-16 Monday 08-08-2016
2 12-Aug-16 18-Aug-16 Thursday 18-08-2016
Where the date follows the standard dd-mm-yyyy format.
Take the difference between the day of the week of Day and DateStart modulo 7 and add that to the DateStart.
No packages are used.
dow <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
transform(DF, Date =
DateStart + (match(Day, dow) - 1 - as.POSIXlt(DateStart)$wday) %% 7)
giving:
Week DateStart DateEnd Day Date
1 1 2016-08-05 2016-08-11 Monday 2016-08-08
2 2 2016-08-12 2016-08-18 Thursday 2016-08-18
Note 1
An alternative to writing out the days of the week, provided you are in an English locale, is:
dow <- weekdays(as.Date("1950-01-01") + 0:6)
Note 2
In the example the Start Date is Friday on both rows. If it were known that that is always the case we could shorten the code by hard coding it as 5:
transform(DF, Date = DateStart + (match(Day, dow) - 1 - 5) %% 7)
Note 3
The input, in reproducible form, is:
Lines <- "Week DateStart DateEnd Day
1 5-Aug-16 11-Aug-16 Monday
2 12-Aug-16 18-Aug-16 Thursday"
DF <- read.table(text = Lines, header = TRUE)
fmt <- "%d-%b-%y"
DF <- transform(DF, DateStart = as.Date(DateStart, fmt),
DateEnd = as.Date(DateEnd, fmt))
# example data
df = read.table(text = "
Week DateStart DateEnd Day
1 5-Aug-16 11-Aug-16 Monday
2 12-Aug-16 18-Aug-16 Thursday
", header=T, stringsAsFactors=F)
library(tidyverse)
library(lubridate)
df %>%
group_by(Week, Day) %>% # for each week and day
mutate(Date = list(seq(dmy(DateStart), dmy(DateEnd), "1 day")), # get sequence of dates between start and end
Day2 = map(Date, weekdays)) %>% # get name of days for each date in the sequence
unnest() %>% # unnest dates
ungroup() %>% # forget the grouping
filter(Day == Day2) %>% # keep days that match
select(-Day2) # remove unnecessary column
# # A tibble: 2 x 5
# Week DateStart DateEnd Day Date
# <int> <chr> <chr> <chr> <date>
# 1 1 5-Aug-16 11-Aug-16 Monday 2016-08-08
# 2 2 12-Aug-16 18-Aug-16 Thursday 2016-08-18
I'm running into trouble converting a year + month string containing week number 53 using the as.Date() function in R.
The code works for the top example for week number 52 but returns NA for the bottom example for week number 53.
a <- "2017521"
as.Date(a, '%Y%W%u')
"2017-12-25"
b <- "2017531"
as.Date(b, '%Y%W%u')
NA
You're getting NA for b <- "2017531" because you're trying to reference a date that did not exist.
This has to do with the way you formatted your date, and the way the calendar is initiated.
%W refers to the numerical week 00-53
%u refers to the day of the week 1-7 Monday is 1
b <- "2017531"
as.Date(b, '%Y%W%u')
# [1] NA
Week 53 day 1 would refer to Monday of the 53rd week. But the only day of the week that occurred on the 53rd week of 2017 was Sunday.
c <- "2017537"
as.Date(a, '%Y%W%u')
# [1] "2017-12-31"
You can further confirm this by checking the date Saturday of week 52:
d <- "2017526"
as.Date(a, '%Y%W%u')
# [1] "2017-12-30"
I am new in R.
I want the week number of the month, which the date belongs to.
By using the following code:
>CurrentDate<-Sys.Date()
>Week Number <- format(CurrentDate, format="%U")
>Week Number
"31"
%U will return the Week number of the year .
But i want the week number of the month.
If the date is 2014-08-01 then i want to get 1.( The Date belongs to the 1st week of the month).
Eg:
2014-09-04 -> 1 (The Date belongs to the 1st week of the month).
2014-09-10 -> 2 (The Date belongs to the 2nd week of the month).
and so on...
How can i get this?
Reference:
http://astrostatistics.psu.edu/su07/R/html/base/html/strptime.html
By analogy of the weekdays function:
monthweeks <- function(x) {
UseMethod("monthweeks")
}
monthweeks.Date <- function(x) {
ceiling(as.numeric(format(x, "%d")) / 7)
}
monthweeks.POSIXlt <- function(x) {
ceiling(as.numeric(format(x, "%d")) / 7)
}
monthweeks.character <- function(x) {
ceiling(as.numeric(format(as.Date(x), "%d")) / 7)
}
dates <- sample(seq(as.Date("2000-01-01"), as.Date("2015-01-01"), "days"), 7)
dates
#> [1] "2004-09-24" "2002-11-21" "2011-08-13" "2008-09-23" "2000-08-10" "2007-09-10" "2013-04-16"
monthweeks(dates)
#> [1] 4 3 2 4 2 2 3
Another solution to use stri_datetime_fields() from the stringi package:
stringi::stri_datetime_fields(dates)$WeekOfMonth
#> [1] 4 4 2 4 2 3 3
You can use day from the lubridate package. I'm not sure if there's a week-of-month type function in the package, but we can do the math.
library(lubridate)
curr <- Sys.Date()
# [1] "2014-08-08"
day(curr) ## 8th day of the current month
# [1] 8
day(curr) / 7 ## Technically, it's the 1.14th week
# [1] 1.142857
ceiling(day(curr) / 7) ## but ceiling() will take it up to the 2nd week.
# [1] 2
Issue Overview
It was difficult to tell which answers worked, so I built my own function nth_week and tested it against the others.
The issue that's leading to most of the answers being incorrect is this:
The first week of a month is often a short-week
Same with the last week of the month
For example, October 1st 2019 is a Tuesday, so 6 days into October (which is a Sunday) is already the second week. Also, contiguous months often share the same week in their respective counts, meaning that the last week of the prior month is commonly also the first week of the current month. So, we should expect a week count higher than 52 per year and some months that contain a span of 6 weeks.
Results Comparison
Here's a table showing examples where some of the above suggested algorithms go awry:
DATE Tori user206 Scri Klev Stringi Grot Frei Vale epi iso coni
Fri-2016-01-01 1 1 1 1 5 1 1 1 1 1 1
Sat-2016-01-02 1 1 1 1 1 1 1 1 1 1 1
Sun-2016-01-03 2 1 1 1 1 2 2 1 -50 1 2
Mon-2016-01-04 2 1 1 1 2 2 2 1 -50 -51 2
----
Sat-2018-12-29 5 5 5 5 5 5 5 4 5 5 5
Sun-2018-12-30 6 5 5 5 5 6 6 4 -46 5 6
Mon-2018-12-31 6 5 5 5 6 6 6 4 -46 -46 6
Tue-2019-01-01 1 1 1 1 6 1 1 1 1 1 1
You can see that only Grothendieck, conighion, Freitas, and Tori are correct due to their treatment of partial week periods. I compared all days from year 100 to year 3000; there are no differences among those 4. (Stringi is probably correct for noting weekends as separate, incremented periods, but I didn't check to be sure; epiweek() and isoweek(), because of their intended uses, show some odd behavior near year-ends when using them for week incrementation.)
Speed Comparison
Below are the tests for efficiency between the implementations of: Tori, Grothendieck, Conighion, and Freitas
# prep
library(lubridate)
library(tictoc)
kepler<- ymd(15711227) # Kepler's birthday since it's a nice day and gives a long vector of dates
some_dates<- seq(kepler, today(), by='day')
# test speed of Tori algorithm
tic(msg = 'Tori')
Tori<- (5 + day(some_dates) + wday(floor_date(some_dates, 'month'))) %/% 7
toc()
Tori: 0.19 sec elapsed
# test speed of Grothendieck algorithm
wk <- function(x) as.numeric(format(x, "%U"))
tic(msg = 'Grothendieck')
Grothendieck<- (wk(some_dates) - wk(as.Date(cut(some_dates, "month"))) + 1)
toc()
Grothendieck: 1.99 sec elapsed
# test speed of conighion algorithm
tic(msg = 'conighion')
weeknum <- as.integer( format(some_dates, format="%U") )
mindatemonth <- as.Date( paste0(format(some_dates, "%Y-%m"), "-01") )
weeknummin <- as.integer( format(mindatemonth, format="%U") ) # the number of the week of the first week within the month
conighion <- weeknum - (weeknummin - 1) # this is as an integer
toc()
conighion: 2.42 sec elapsed
# test speed of Freitas algorithm
first_day_of_month_wday <- function(dx) {
day(dx) <- 1
wday(dx)
}
tic(msg = 'Freitas')
Freitas<- ceiling((day(some_dates) + first_day_of_month_wday(some_dates) - 1) / 7)
toc()
Freitas: 0.97 sec elapsed
Fastest correct algorithm by about at least 5X
require(lubridate)
(5 + day(some_dates) + wday(floor_date(some_dates, 'month'))) %/% 7
# some_dates above is any vector of dates, like:
some_dates<- seq(ymd(20190101), today(), 'day')
Function Implementation
I also wrote a generalized function for it that performs either month or year week counts, begins on a day you choose (i.e. say you want to start your week on Monday), labels output for easy checking, and is still extremely fast thanks to lubridate.
nth_week<- function(dates = NULL,
count_weeks_in = c("month","year"),
begin_week_on = "Sunday"){
require(lubridate)
count_weeks_in<- tolower(count_weeks_in[1])
# day_names and day_index are for beginning the week on a day other than Sunday
# (this vector ordering matters, so careful about changing it)
day_names<- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
# index integer of first match
day_index<- pmatch(tolower(begin_week_on),
tolower(day_names))[1]
### Calculate week index of each day
if (!is.na(pmatch(count_weeks_in, "year"))) {
# For year:
# sum the day of year, index for day of week at start of year, and constant 5
# then integer divide quantity by 7
# (explicit on package so lubridate and data.table don't fight)
n_week<- (5 +
lubridate::yday(dates) +
lubridate::wday(floor_date(dates, 'year'),
week_start = day_index)
) %/% 7
} else {
# For month:
# same algorithm as above, but for month rather than year
n_week<- (5 +
lubridate::day(dates) +
lubridate::wday(floor_date(dates, 'month'),
week_start = day_index)
) %/% 7
}
# naming very helpful for review
names(n_week)<- paste0(lubridate::wday(dates,T), '-', dates)
n_week
}
Function Output
# Example raw vector output:
some_dates<- seq(ymd(20190930), today(), by='day')
nth_week(some_dates)
Mon-2019-09-30 Tue-2019-10-01 Wed-2019-10-02
5 1 1
Thu-2019-10-03 Fri-2019-10-04 Sat-2019-10-05
1 1 1
Sun-2019-10-06 Mon-2019-10-07 Tue-2019-10-08
2 2 2
Wed-2019-10-09 Thu-2019-10-10 Fri-2019-10-11
2 2 2
Sat-2019-10-12 Sun-2019-10-13
2 3
# Example tabled output:
library(tidyverse)
nth_week(some_dates) %>%
enframe('DATE','nth_week_default') %>%
cbind(some_year_day_options = as.vector(nth_week(some_dates, count_weeks_in = 'year', begin_week_on = 'Mon')))
DATE nth_week_default some_year_day_options
1 Mon-2019-09-30 5 40
2 Tue-2019-10-01 1 40
3 Wed-2019-10-02 1 40
4 Thu-2019-10-03 1 40
5 Fri-2019-10-04 1 40
6 Sat-2019-10-05 1 40
7 Sun-2019-10-06 2 40
8 Mon-2019-10-07 2 41
9 Tue-2019-10-08 2 41
10 Wed-2019-10-09 2 41
11 Thu-2019-10-10 2 41
12 Fri-2019-10-11 2 41
13 Sat-2019-10-12 2 41
14 Sun-2019-10-13 3 41
Hope this work saves people the time of having to weed through all the responses to figure out which are correct.
I don't know R but if you take the week of the first day in the month you could use it to get the week in the month
2014-09-18
First day of month = 2014-09-01
Week of first day on month = 36
Week of 2014-09-18 = 38
Week in the month = 1 + (38 - 36) = 3
Using lubridate you can do
ceiling((day(date) + first_day_of_month_wday(date) - 1) / 7)
Where the function first_day_of_month_wday returns the weekday of the first day of month.
first_day_of_month_wday <- function(dx) {
day(dx) <- 1
wday(dx)
}
This adjustment must be done in order to get the correct week number otherwise if you have the 7th day of month on a Monday you will get 1 instead of 2, for example.
This is only a shift in the day of month.
The minus 1 is necessary because when the first day of month is sunday the adjustment is not needed, and the others weekdays follow this rule.
I came across the same issue and I solved it with mday from data.table package. Also, I realized that when using the ceiling() function, one also needs to account for the '5th week' situation. For example ceiling of the 30th day of a month ceiling(30/7) will give 5 ! Therefore, the ifelse statement below.
# Create a sample data table with days from year 0 until present
DT <- data.table(days = seq(as.Date("0-01-01"), Sys.Date(), "days"))
# compute the week of the month and account for the '5th week' case
DT[, week := ifelse( ceiling(mday(days)/7)==5, 4, ceiling(mday(days)/7) )]
> DT
days week
1: 0000-01-01 1
2: 0000-01-02 1
3: 0000-01-03 1
4: 0000-01-04 1
5: 0000-01-05 1
---
736617: 2016-10-14 2
736618: 2016-10-15 3
736619: 2016-10-16 3
736620: 2016-10-17 3
736621: 2016-10-18 3
To have an idea about the speed, then run:
system.time( DT[, week := ifelse( ceiling(mday(days)/7)==5, 4, ceiling(mday(days)/7) )] )
# user system elapsed
# 3.23 0.05 3.27
It took approx. 3 seconds to compute the weeks for more than 700 000 days.
However, the ceiling way above will always create the last week longer than all the other weeks (the four weeks have 7,7,7, and 9 or 10 days). Another way would be to use something like
ceiling(1:31/31*4)
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4
where you get 7, 8 , 8 and 8 days per respective week in a 31 days month.
DT[, week2 := ceiling(mday(days)/31*4)]
There is a simple way to do it with lubridate package:
isoweek() returns the week as it would appear in the ISO 8601 system, which uses a reoccurring leap week.
epiweek() is the US CDC version of epidemiological week. It follows same rules as
isoweek() but starts on Sunday. In other parts of the world the convention is to start epidemiological weeks on Monday, which is the same as isoweek().
Reference here
I am late to the party and maybe noone is gonna read this answer...
Anyway, why not stay simple and do it like this:
library(lubridate)
x <- ymd(20200311, 20200308)
week(x) - week(floor_date(x, unit = "months")) + 1
[1] 3 2
I don't know any build in functions but a work around would be
CurrentDate <- Sys.Date()
# The number of the week relative to the year
weeknum <- as.integer( format(CurrentDate, format="%U") )
# Find the minimum week of the month relative to the year
mindatemonth <- as.Date( paste0(format(CurrentDate, "%Y-%m"), "-01") )
weeknummin <- as.integer( format(mindatemonth, format="%U") ) # the number of the week of the first week within the month
# Calculate the number of the week relative to the month
weeknum <- weeknum - (weeknummin - 1) # this is as an integer
# With the following you can convert the integer to the same format of
# format(CurrentDate, format="%U")
formatC(weeknum, width = 2, flag = "0")
Simply do this:
library(lubridate)
ds1$Week <- week(ds1$Sale_Date)
This is high performance! It instantly works on my 12 milion rows dataset.
On example above, ds1 is the dataset, Sale_Date is a date column (like "2015-11-23")
The other approach, using "as.integer( format..." might work on small datasets, but on 12 million rows it would keep running forever...
I have following data set:
>d
x date
1 1 1-3-2013
2 2 2-4-2010
3 3 2-5-2011
4 4 1-6-2012
I want:
> d
x date
1 1 31-12-2013
2 2 31-12-2010
3 3 31-12-2011
4 4 31-12-2012
i.e. Last day, last month and the year of the date object.
Please Help!
You can also just use the ceiling_date function in LUBRIDATE package.
You can do something like -
library(lubridate)
last_date <- ceiling_date(date,"year") - days(1)
ceiling_date(date,"year") gives you the first date of the next year and to get the last date of the current year, you subtract this by 1 or days(1).
Hope this helps.
Another option using lubridate package:
## using d from Roland answer
transform(d,last =dmy(paste0('3112',year(dmy(date)))))
x date last
1 1 1-3-2013 2013-12-31
2 2 2-4-2010 2010-12-31
3 3 2-5-2011 2011-12-31
4 4 1-6-2012 2012-12-31
d <- read.table(text="x date
1 1 1-3-2013
2 2 2-4-2010
3 3 2-5-2011
4 4 1-6-2012", header=TRUE)
d$date <- as.Date(d$date, "%d-%m-%Y")
d$date <- as.POSIXlt(d$date)
d$date$mon <- 11
d$date$mday <- 31
d$date <- as.Date(d$date)
# x date
#1 1 2013-12-31
#2 2 2010-12-31
#3 3 2011-12-31
#4 4 2012-12-31
1) cut.Date Define cut_year to give the first day of the year. Adding 366 gets us to the next year and then applying cut_year again gets us to the first day of the next year. Finally subtract 1 to get the last day of the year. The code uses base functionality only.
cut_year <- function(x) as.Date(cut(as.Date(x), "year"))
transform(d, date = cut_year(cut_year(date) + 366) - 1)
2) format
transform(d, date = as.Date(format(as.Date(date), "%Y-12-31")))
3) zoo A "yearmon" class variable stores the date as a year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec. Thus taking its floor and adding 11/12 gets one to Dec and as.Date.yearmon(..., frac = 1) uses the last of the month instead of the first.
library(zoo)
transform(d, date = as.Date(floor(as.yearmon(as.Date(date))) + 11 / 12, frac = 1))
Note: The inner as.Date in cut_year and in the other two solutions can be omitted if it is known that date is already of "Date" class.
ADDED additional solutions.