This probably seems straightforward, but I am pretty stumped.
I have a set of dates ~ August 1 of each year and need to sum sales by week number. The earliest date is 2008-12-08 (YYYY-MM-DD). I need to create a "week_id" field where week #1 begins on 2008-12-08. And the date 2011-09-03 is week 142. Note that this is different since the calculation of week number does not reset every year.
I am putting up a small example dataset here:
data <- data.frame(
  dates = c("2008-12-08", "2009-08-10", "2010-03-31", "2011-10-16",
            "2008-06-03", "2009-11-14", "2010-05-05", "2011-09-03"))
data$date <- as.Date(data$dates)  # parse the character column into a Date column
Any help is appreciated
data$week_id = as.numeric(data$date - as.Date("2008-12-08")) %/% 7 + 1
This takes the difference in days between each date and the start date, then uses integer division by 7 to count how many complete weeks have elapsed. I add one because dates with zero complete weeks elapsed since the start should be week 1 instead of week 0. Dates before the start date get week numbers of 0 or below.
dates date week_id
1 2008-12-07 2008-12-07 0 # added for testing
2 2008-12-08 2008-12-08 1
3 2008-12-09 2008-12-09 1 # added for testing
4 2008-12-14 2008-12-14 1 # added for testing
5 2008-12-15 2008-12-15 2 # added for testing
6 2009-08-10 2009-08-10 36
7 2010-03-31 2010-03-31 69
8 2011-10-16 2011-10-16 149
9 2008-06-03 2008-06-03 -26
10 2009-11-14 2009-11-14 49
11 2010-05-05 2010-05-05 74
12 2011-09-03 2011-09-03 143
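Since the goal in the question is to sum sales by week, here is a minimal sketch of how the running week index could feed an aggregation. The helper name `week_id()` and the sales figures are made up for illustration; only the start date comes from the question.

```r
# Hypothetical helper: running week index counted from a fixed start date.
week_id <- function(dates, start = as.Date("2008-12-08")) {
  as.integer(as.Date(dates) - as.Date(start)) %/% 7L + 1L
}

# Made-up sales figures, for illustration only.
sales <- data.frame(
  date   = as.Date(c("2008-12-08", "2008-12-09", "2008-12-15", "2011-09-03")),
  amount = c(10, 5, 7, 3)
)
sales$week_id <- week_id(sales$date)

# Sum the sales within each running week.
weekly <- aggregate(amount ~ week_id, data = sales, FUN = sum)
weekly
```

The first two rows fall in week 1 and are summed together; the last row lands in week 143, matching the answer's output.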
Given a date and its day of the week, I want to know whether there is code that tells me which occurrence of that weekday within the month it is. For example, given 2/12/2020 and "Wednesday", I want the output "2", because it is the second Wednesday of the month.
You can do that in base R in essentially one operation. You also do not need the second input column.
Here is a slower walkthrough:
Code
dates <- c("2/12/2020","2/11/2020","2/10/2020","2/7/2020","2/6/2020", "2/5/2020")
Dates <- anytime::anydate(dates) ## one of several parsers
dow <- weekdays(Dates) ## for illustration, base R function
cnt <- (as.integer(format(Dates, "%d")) - 1) %/% 7 + 1
res <- data.frame(dt=Dates, dow=dow, cnt=cnt)
res
(Final) Output
R> res
dt dow cnt
1 2020-02-12 Wednesday 2
2 2020-02-11 Tuesday 2
3 2020-02-10 Monday 2
4 2020-02-07 Friday 1
5 2020-02-06 Thursday 1
6 2020-02-05 Wednesday 1
R>
Functionality like this is often found in dedicated date/time libraries. I wrapped some code from the (C++) Boost date_time library in the package RcppBDT, which makes it easy to find 'the third Wednesday in the last month of each quarter' and the like.
(lubridate::day(your_date) - 1) %/% 7 + 1
The idea here is that the first 7 days of the month are all the first for their weekday. Next 7 are 2nd, etc.
> (1:30 - 1) %/% 7 + 1
# [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5
Just to offer an alternative calculation for the nth-weekday of the month, you can just divide the day by 7 and always round up:
date <- lubridate::mdy("02/12/2020")
ceiling(lubridate::day(date) / 7)
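The integer-division form and the round-up form give the same answer for every day of the month. A small base-R check, where `nth_weekday()` is a hypothetical helper name and not part of any package:

```r
# Every possible day of the month.
d <- 1:31
by_floor   <- (d - 1) %/% 7 + 1   # integer-division form
by_ceiling <- ceiling(d / 7)      # round-up form
all(by_floor == by_ceiling)

# Hypothetical helper using base R only: which occurrence of its
# weekday within the month a date is.
nth_weekday <- function(x) {
  x <- as.Date(x)
  (as.integer(format(x, "%d")) - 1L) %/% 7L + 1L
}
nth_weekday("2020-02-12")   # the second Wednesday of February 2020
```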
Hello, I am trying to find the week number for a series of dates spanning three years, but R is not giving the week number I expect. I am generating a sequence of dates from 2016-04-01 to 2019-03-30 and then trying to calculate a week number that keeps counting across the three years, so that I get week numbers 54, 55, 56 and so on.
However, when I check 2016-04-03, R shows week number 14, whereas Excel shows week number 15. R also seems to simply count blocks of 7 days rather than follow the actual calendar weeks, and the week number restarts from 1 at the start of every year.
The code looks like this
library(lubridate)  # provides month() and week()
days <- seq(as.Date("2016-04-01"), as.Date("2019-03-30"), by = "days")
weekdays <- data.frame(days = days, Month = month(days), week = week(days), nweek = rep(1, length(days)))
This is what the results look like:
days week
2016-04-01 14
2016-04-02 14
2016-04-03 14
2016-04-04 14
2016-04-05 14
2016-04-06 14
2016-04-07 14
2016-04-08 15
2016-04-09 15
2016-04-10 15
2016-04-11 15
2016-04-12 15
However, this is what I get when I check in Excel:
days week
2016-04-01 14
2016-04-02 14
2016-04-03 15
2016-04-04 15
2016-04-05 15
2016-04-06 15
2016-04-07 15
2016-04-08 15
2016-04-09 15
2016-04-10 16
2016-04-11 16
2016-04-12 16
Can someone please help me identify where I am going wrong?
Thanks a lot in advance!!
Not anything that you're doing wrong per se, there is just a difference in how R (I presume you're using the lubridate package) and Excel calculate week numbers.
R will calculate week numbers based on the seven day block from 1 January that year; but
Excel calculates week numbers based on a week starting from Sunday.
Take the first few days of January 2016 as an example. On Friday, 1 January 2016, both R and Excel will say this is week 1.
On Sunday, 3 January 2016:
this is within the first seven days of the start of the year so R will return week number 1; but
it is a Sunday, so Excel ticks over to week number 2.
Try this:
ifelse(test = weekdays.Date(days[1]) == "Sunday", yes = epiweek(days[1]), no = epiweek(days[1]) + 1) + cumsum(weekdays.Date(days) == "Sunday")
This tests whether the first day is a Sunday or not and returns an appropriate week number starting point, then adds on one more week number each Sunday. Gives the same week number if there's overlap between years.
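A base-R sketch of the same idea, assuming the Sunday-based numbering described above is what Excel produces by default. `excel_like_weeknum()` and `running_week` are hypothetical names, not lubridate or Excel functions.

```r
# Hypothetical helper: week of year where week 1 contains 1 January
# and a new week starts every Sunday.
excel_like_weeknum <- function(x) {
  x <- as.Date(x)
  yday <- as.integer(format(x, "%j"))              # day of year, 1-based
  jan1 <- as.Date(paste0(format(x, "%Y"), "-01-01"))
  jan1_wday <- as.integer(format(jan1, "%w"))      # 0 = Sunday ... 6 = Saturday
  (yday + jan1_wday - 1L) %/% 7L + 1L
}

days <- seq(as.Date("2016-04-01"), as.Date("2019-03-30"), by = "day")
is_sunday <- format(days, "%w") == "0"

# Running week number that keeps counting across year boundaries:
# start from the first day's week and add one at every later Sunday.
running_week <- excel_like_weeknum(days[1]) + cumsum(is_sunday) - is_sunday[1]

head(data.frame(days, running_week), 12)
```

Using `format(x, "%w")` instead of `weekdays()` keeps the Sunday test independent of the session locale.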
I'm trying to define a custom week for a dataframe.
I have a dataframe with timestamps.
I've read the questions on here regarding isocalendar. While it does the job, it's not what I want.
I'm trying to define the weeks as running from Friday to Thursday.
For example:
Friday 2nd Jan 2015 would be the first day of the week.
Thursday 8th Jan 2015 would be the last day of the week.
And this would be week 1.
Is there a way to set a custom first day of the week, so that when I access the datetime library I get the result that I expect?
df['Week_Number'] = df['Date'].dt.week
Here's one solution - convert your dates to a Period representing weeks that end on Thursday.
In [39]: df = pd.DataFrame({'Date':pd.date_range('2015-1-1', '2015-12-31')})
In [40]: df['Period'] = df['Date'].dt.to_period('W-THU')
In [41]: df['Week_Number'] = df['Period'].dt.week
In [44]: df.head()
Out[44]:
Date Period Week_Number
0 2015-01-01 2014-12-26/2015-01-01 1
1 2015-01-02 2015-01-02/2015-01-08 2
2 2015-01-03 2015-01-02/2015-01-08 2
3 2015-01-04 2015-01-02/2015-01-08 2
4 2015-01-05 2015-01-02/2015-01-08 2
Note that it follows the same convention as datetimes, where week 1 can be incomplete, so you may have to do a little extra munging if you want 1 to be the first complete week.
I am new to R.
I want the week number within the month that a date belongs to.
Using the following code:
> CurrentDate <- Sys.Date()
> WeekNumber <- format(CurrentDate, format = "%U")
> WeekNumber
[1] "31"
%U returns the week number of the year,
but I want the week number of the month.
If the date is 2014-08-01, I want to get 1 (the date belongs to the 1st week of the month).
E.g.:
2014-09-04 -> 1 (the date belongs to the 1st week of the month).
2014-09-10 -> 2 (the date belongs to the 2nd week of the month).
and so on...
How can I get this?
Reference:
http://astrostatistics.psu.edu/su07/R/html/base/html/strptime.html
By analogy of the weekdays function:
monthweeks <- function(x) {
UseMethod("monthweeks")
}
monthweeks.Date <- function(x) {
ceiling(as.numeric(format(x, "%d")) / 7)
}
monthweeks.POSIXlt <- function(x) {
ceiling(as.numeric(format(x, "%d")) / 7)
}
monthweeks.character <- function(x) {
ceiling(as.numeric(format(as.Date(x), "%d")) / 7)
}
dates <- sample(seq(as.Date("2000-01-01"), as.Date("2015-01-01"), "days"), 7)
dates
#> [1] "2004-09-24" "2002-11-21" "2011-08-13" "2008-09-23" "2000-08-10" "2007-09-10" "2013-04-16"
monthweeks(dates)
#> [1] 4 3 2 4 2 2 3
Another solution is to use stri_datetime_fields() from the stringi package:
stringi::stri_datetime_fields(dates)$WeekOfMonth
#> [1] 4 4 2 4 2 3 3
You can use day from the lubridate package. I'm not sure if there's a week-of-month type function in the package, but we can do the math.
library(lubridate)
curr <- Sys.Date()
# [1] "2014-08-08"
day(curr) ## 8th day of the current month
# [1] 8
day(curr) / 7 ## Technically, it's the 1.14th week
# [1] 1.142857
ceiling(day(curr) / 7) ## but ceiling() will take it up to the 2nd week.
# [1] 2
Issue Overview
It was difficult to tell which answers worked, so I built my own function nth_week and tested it against the others.
The issue that's leading to most of the answers being incorrect is this:
The first week of a month is often a short-week
Same with the last week of the month
For example, October 1st 2019 is a Tuesday, so 6 days into October (which is a Sunday) is already the second week. Also, contiguous months often share the same week in their respective counts, meaning that the last week of the prior month is commonly also the first week of the current month. So, we should expect a week count higher than 52 per year and some months that contain a span of 6 weeks.
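To make the short-week effect concrete, here is a base-R sketch that counts calendar weeks within a month, assuming weeks start on Sunday and the first (possibly short) week counts as week 1. `calendar_week_of_month()` is a hypothetical name used only for this illustration; it is not the nth_week function defined further down.

```r
# Hypothetical helper: calendar week of the month, weeks starting on Sunday.
calendar_week_of_month <- function(x) {
  x <- as.Date(x)
  dom <- as.integer(format(x, "%d"))                    # day of month
  first <- as.Date(paste0(format(x, "%Y-%m"), "-01"))   # first day of that month
  first_wday <- as.integer(format(first, "%w"))         # 0 = Sunday ... 6 = Saturday
  (dom + first_wday - 1L) %/% 7L + 1L
}

x <- as.Date(c("2019-10-01", "2019-10-05", "2019-10-06", "2018-12-30"))

# Calendar-aligned count: Sunday 2019-10-06 is already in week 2,
# and 2018-12-30 falls in a sixth week.
calendar_week_of_month(x)

# Plain 7-day blocks ignore the short first week and never exceed 5.
ceiling(as.integer(format(x, "%d")) / 7)
```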
Results Comparison
Here's a table showing examples where some of the above suggested algorithms go awry:
DATE            Tori  user206  Scri  Klev  Stringi  Grot  Frei  Vale  epi  iso  coni
Fri-2016-01-01     1        1     1     1        5     1     1     1    1    1     1
Sat-2016-01-02     1        1     1     1        1     1     1     1    1    1     1
Sun-2016-01-03     2        1     1     1        1     2     2     1  -50    1     2
Mon-2016-01-04     2        1     1     1        2     2     2     1  -50  -51     2
----
Sat-2018-12-29     5        5     5     5        5     5     5     4    5    5     5
Sun-2018-12-30     6        5     5     5        5     6     6     4  -46    5     6
Mon-2018-12-31     6        5     5     5        6     6     6     4  -46  -46     6
Tue-2019-01-01     1        1     1     1        6     1     1     1    1    1     1
You can see that only Grothendieck, conighion, Freitas, and Tori are correct due to their treatment of partial week periods. I compared all days from year 100 to year 3000; there are no differences among those 4. (Stringi is probably correct for noting weekends as separate, incremented periods, but I didn't check to be sure; epiweek() and isoweek(), because of their intended uses, show some odd behavior near year-ends when using them for week incrementation.)
Speed Comparison
Below are the tests for efficiency between the implementations of: Tori, Grothendieck, Conighion, and Freitas
# prep
library(lubridate)
library(tictoc)
kepler<- ymd(15711227) # Kepler's birthday since it's a nice day and gives a long vector of dates
some_dates<- seq(kepler, today(), by='day')
# test speed of Tori algorithm
tic(msg = 'Tori')
Tori<- (5 + day(some_dates) + wday(floor_date(some_dates, 'month'))) %/% 7
toc()
Tori: 0.19 sec elapsed
# test speed of Grothendieck algorithm
wk <- function(x) as.numeric(format(x, "%U"))
tic(msg = 'Grothendieck')
Grothendieck<- (wk(some_dates) - wk(as.Date(cut(some_dates, "month"))) + 1)
toc()
Grothendieck: 1.99 sec elapsed
# test speed of conighion algorithm
tic(msg = 'conighion')
weeknum <- as.integer( format(some_dates, format="%U") )
mindatemonth <- as.Date( paste0(format(some_dates, "%Y-%m"), "-01") )
weeknummin <- as.integer( format(mindatemonth, format="%U") ) # the number of the week of the first week within the month
conighion <- weeknum - (weeknummin - 1) # this is as an integer
toc()
conighion: 2.42 sec elapsed
# test speed of Freitas algorithm
first_day_of_month_wday <- function(dx) {
day(dx) <- 1
wday(dx)
}
tic(msg = 'Freitas')
Freitas<- ceiling((day(some_dates) + first_day_of_month_wday(some_dates) - 1) / 7)
toc()
Freitas: 0.97 sec elapsed
Tori is the fastest correct algorithm, by at least about 5X:
library(lubridate)
# some_dates is any vector of dates, like:
some_dates <- seq(ymd(20190101), today(), 'day')
(5 + day(some_dates) + wday(floor_date(some_dates, 'month'))) %/% 7
Function Implementation
I also wrote a generalized function for it that performs either month or year week counts, begins the week on a day you choose (e.g. if you want to start your week on Monday), labels the output for easy checking, and is still extremely fast thanks to lubridate.
nth_week<- function(dates = NULL,
count_weeks_in = c("month","year"),
begin_week_on = "Sunday"){
require(lubridate)
count_weeks_in<- tolower(count_weeks_in[1])
# day_names and day_index are for beginning the week on a day other than Sunday
# (this vector ordering matters, so careful about changing it)
day_names<- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
# index integer of first match
day_index<- pmatch(tolower(begin_week_on),
tolower(day_names))[1]
### Calculate week index of each day
if (!is.na(pmatch(count_weeks_in, "year"))) {
# For year:
# sum the day of year, index for day of week at start of year, and constant 5
# then integer divide quantity by 7
# (explicit on package so lubridate and data.table don't fight)
n_week<- (5 +
lubridate::yday(dates) +
lubridate::wday(floor_date(dates, 'year'),
week_start = day_index)
) %/% 7
} else {
# For month:
# same algorithm as above, but for month rather than year
n_week<- (5 +
lubridate::day(dates) +
lubridate::wday(floor_date(dates, 'month'),
week_start = day_index)
) %/% 7
}
# naming very helpful for review
names(n_week)<- paste0(lubridate::wday(dates, label = TRUE), '-', dates)
n_week
}
Function Output
# Example raw vector output:
some_dates<- seq(ymd(20190930), today(), by='day')
nth_week(some_dates)
Mon-2019-09-30 Tue-2019-10-01 Wed-2019-10-02
5 1 1
Thu-2019-10-03 Fri-2019-10-04 Sat-2019-10-05
1 1 1
Sun-2019-10-06 Mon-2019-10-07 Tue-2019-10-08
2 2 2
Wed-2019-10-09 Thu-2019-10-10 Fri-2019-10-11
2 2 2
Sat-2019-10-12 Sun-2019-10-13
2 3
# Example tabled output:
library(tidyverse)
nth_week(some_dates) %>%
enframe('DATE','nth_week_default') %>%
cbind(some_year_day_options = as.vector(nth_week(some_dates, count_weeks_in = 'year', begin_week_on = 'Mon')))
DATE nth_week_default some_year_day_options
1 Mon-2019-09-30 5 40
2 Tue-2019-10-01 1 40
3 Wed-2019-10-02 1 40
4 Thu-2019-10-03 1 40
5 Fri-2019-10-04 1 40
6 Sat-2019-10-05 1 40
7 Sun-2019-10-06 2 40
8 Mon-2019-10-07 2 41
9 Tue-2019-10-08 2 41
10 Wed-2019-10-09 2 41
11 Thu-2019-10-10 2 41
12 Fri-2019-10-11 2 41
13 Sat-2019-10-12 2 41
14 Sun-2019-10-13 3 41
Hope this work saves people the time of having to weed through all the responses to figure out which are correct.
I don't know R, but if you take the week of the year of the first day of the month, you can use it to get the week within the month. For example:
Date = 2014-09-18
First day of month = 2014-09-01
Week of first day of month = 36
Week of 2014-09-18 = 38
Week in the month = 1 + (38 - 36) = 3
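The steps above can be sketched in base R as follows. `week_in_month()` is a hypothetical name, and the week of the year is taken from the Sunday-based "%U" format, which is an assumption: the individual week-of-year values depend on the convention used, but the difference between the two is what matters.

```r
# Hypothetical helper following the steps above.
week_in_month <- function(x) {
  x <- as.Date(x)
  first_of_month <- as.Date(paste0(format(x, "%Y-%m"), "-01"))
  week_of_date  <- as.integer(format(x, "%U"))               # week of year of the date
  week_of_first <- as.integer(format(first_of_month, "%U"))  # week of year of the 1st
  1L + (week_of_date - week_of_first)
}

week_in_month("2014-09-18")
week_in_month(c("2014-09-04", "2014-09-10"))
```

For 2014-09-18 this returns 3, as in the worked example, and the two dates from the question come out as weeks 1 and 2.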
Using lubridate you can do
ceiling((day(date) + first_day_of_month_wday(date) - 1) / 7)
Where the function first_day_of_month_wday returns the weekday of the first day of month.
first_day_of_month_wday <- function(dx) {
day(dx) <- 1
wday(dx)
}
This adjustment is needed to get the correct week number; otherwise, if the 7th day of the month falls on a Monday, for example, you will get 1 instead of 2.
It is only a shift in the day of the month.
The minus 1 is necessary because no adjustment is needed when the first day of the month is a Sunday (wday() returns 1 for Sunday by default), and the other weekdays follow the same rule.
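As a quick arithmetic check, this ceiling form agrees with the integer-division form (5 + day + wday) %/% 7 used elsewhere in this thread for every combination of day of month and weekday of the first day. A sketch in base R, with `grid` as an illustrative name:

```r
# All combinations of day of month (1..31) and weekday of the 1st (1 = Sunday .. 7 = Saturday).
grid <- expand.grid(day = 1:31, first_wday = 1:7)

with_ceiling  <- ceiling((grid$day + grid$first_wday - 1) / 7)
with_int_div  <- (5 + grid$day + grid$first_wday) %/% 7

all(with_ceiling == with_int_div)
```

This works because ceiling(n / 7) equals (n + 6) %/% 7 for any positive integer n.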
I came across the same issue and I solved it with mday from the data.table package. I also realized that when using the ceiling() function, one needs to account for the '5th week' situation. For example, for the 30th day of a month, ceiling(30/7) gives 5. Hence the ifelse statement below.
library(data.table)
# Create a sample data table with days from year 0 until present
DT <- data.table(days = seq(as.Date("0-01-01"), Sys.Date(), "days"))
# compute the week of the month and account for the '5th week' case
DT[, week := ifelse( ceiling(mday(days)/7)==5, 4, ceiling(mday(days)/7) )]
> DT
days week
1: 0000-01-01 1
2: 0000-01-02 1
3: 0000-01-03 1
4: 0000-01-04 1
5: 0000-01-05 1
---
736617: 2016-10-14 2
736618: 2016-10-15 3
736619: 2016-10-16 3
736620: 2016-10-17 3
736621: 2016-10-18 3
To get an idea of the speed, run:
system.time( DT[, week := ifelse( ceiling(mday(days)/7)==5, 4, ceiling(mday(days)/7) )] )
# user system elapsed
# 3.23 0.05 3.27
It took approx. 3 seconds to compute the weeks for more than 700 000 days.
However, the ceiling approach above makes the last week longer than the other weeks in any month with more than 28 days (the four weeks have 7, 7, 7, and up to 10 days). Another way would be to use something like
ceiling(1:31/31*4)
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4
where you get 7, 8 , 8 and 8 days per respective week in a 31 days month.
DT[, week2 := ceiling(mday(days)/31*4)]
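A short base-R comparison of how the two schemes distribute the days of a 31-day month; `capped` and `spread` are illustrative names:

```r
d <- 1:31

capped <- pmin(ceiling(d / 7), 4)   # ceiling with the '5th week' folded into week 4
spread <- ceiling(d / 31 * 4)       # days spread evenly over four weeks

table(capped)   # days per week: 7, 7, 7, 10
table(spread)   # days per week: 7, 8, 8, 8
```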
There is a simple way to do it with the lubridate package:
isoweek() returns the week as it would appear in the ISO 8601 system, which uses a reoccurring leap week.
epiweek() is the US CDC version of the epidemiological week. It follows the same rules as isoweek() but starts on Sunday. In other parts of the world the convention is to start epidemiological weeks on Monday, which is the same as isoweek().
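Note that ISO weeks are weeks of the year, so using them for a week-of-month count needs care near year boundaries. A base-R sketch using the "%V" output format for the ISO 8601 week number (this assumes the platform's date formatting supports "%V", which may not hold everywhere):

```r
x <- as.Date(c("2016-01-01", "2016-01-03", "2016-01-04"))

# ISO week of year: 1 to 3 January 2016 still belong to week 53 of 2015.
iso_week <- as.integer(format(x, "%V"))
iso_week

# Naive week-of-month by subtracting the ISO week of the 1st of the month:
first_of_month <- as.Date(paste0(format(x, "%Y-%m"), "-01"))
iso_week - as.integer(format(first_of_month, "%V")) + 1L
```

The last value comes out as -51, which is the kind of year-end behavior that any subtraction-based week-of-month count on ISO weeks has to handle.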
I am late to the party and maybe no one is gonna read this answer...
Anyway, why not stay simple and do it like this:
library(lubridate)
x <- ymd(20200311, 20200308)
week(x) - week(floor_date(x, unit = "months")) + 1
[1] 3 2
I don't know of any built-in function, but a workaround would be:
CurrentDate <- Sys.Date()
# The number of the week relative to the year
weeknum <- as.integer( format(CurrentDate, format="%U") )
# Find the minimum week of the month relative to the year
mindatemonth <- as.Date( paste0(format(CurrentDate, "%Y-%m"), "-01") )
weeknummin <- as.integer( format(mindatemonth, format="%U") ) # the number of the week of the first week within the month
# Calculate the number of the week relative to the month
weeknum <- weeknum - (weeknummin - 1) # this is as an integer
# With the following you can convert the integer to the same format of
# format(CurrentDate, format="%U")
formatC(weeknum, width = 2, flag = "0")
Simply do this:
library(lubridate)
ds1$Week <- week(ds1$Sale_Date)
This is high performance! It works instantly on my 12 million row dataset.
In the example above, ds1 is the dataset and Sale_Date is a date column (like "2015-11-23").
The other approach, using "as.integer( format..." might work on small datasets, but on 12 million rows it would keep running forever...
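Note that week() returns the week of the year, not the week of the month. If avoiding string formatting is the concern, both counts can be computed from the integer fields of a POSIXlt object. A base-R sketch, assuming week-of-year means 7-day blocks counted from 1 January and week-of-month means Sunday-based calendar weeks; the function names and sample dates are made up for illustration:

```r
# Hypothetical helpers working on POSIXlt integer fields (no string formatting).
week_of_year <- function(x) {
  as.POSIXlt(x)$yday %/% 7L + 1L                    # yday is 0-based
}

week_of_month <- function(x) {
  lt <- as.POSIXlt(x)
  first_wday <- (lt$wday - (lt$mday - 1L)) %% 7L    # weekday of the 1st, 0 = Sunday
  (lt$mday + first_wday - 1L) %/% 7L + 1L
}

sale_dates <- as.Date(c("2016-04-03", "2016-04-08", "2019-10-06", "2018-12-30"))
week_of_year(sale_dates)
week_of_month(sale_dates)
```

No timing claims are made here; whether this is faster than format() on a given dataset would need to be measured.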