How do I find the week number from an arbitrary start date in R?

How do I find the week number from an arbitrary start date in R? Let's say I want my start date to be August 1st.

Using lubridate, you can do:
interval(today(), dmy("21-08-2020"))/weeks(1)
[1] 30.42857
Or from the date of interest to another date:
interval(dmy("21-08-2020"), dmy("21-09-2020"))/weeks(1)
[1] 4.428571

You can use difftime for this:
difftime("2020-08-21", Sys.Date(), units = "weeks")
# Time difference of 30.45238 weeks
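If the start date recurs every year (August 1st in the question), one possible reading of "week number" is to count 7-day blocks from the most recent occurrence of that anchor date. The following is a minimal base-R sketch under that assumption; the function name week_from_anchor and its month/day arguments are made up for illustration and are not part of lubridate or base R.

```r
# Hypothetical helper: week number counted from the most recent anchor date
# (default August 1st) on or before each date. Week 1 starts on the anchor.
week_from_anchor <- function(dates, month = 8L, day = 1L) {
  dates <- as.Date(dates)
  yr <- as.integer(format(dates, "%Y"))
  # Anchor date in the same calendar year as each date
  this_year <- as.Date(sprintf("%04d-%02d-%02d", yr, month, day))
  # Dates before this year's anchor belong to the previous year's cycle
  anchor_year <- yr - as.integer(dates < this_year)
  anchor <- as.Date(sprintf("%04d-%02d-%02d", anchor_year, month, day))
  # Whole 7-day blocks since the anchor, 1-based
  (as.integer(dates) - as.integer(anchor)) %/% 7L + 1L
}

week_from_anchor(c("2020-08-01", "2020-08-21", "2020-07-31", "2021-01-01"))
# 1 3 53 22
```

Weeks here are plain 7-day blocks starting on the anchor date itself, not calendar weeks starting on a Sunday or Monday.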

Related

Assigning Day Ranks With Missing Days [duplicate]

I'm looking to find the day of year for a POSIXct class object with lubridate. For example, 12-9-2015 is day 343.
It's easy to find the day of the week or month with lubridate:
> lubridate::wday("2015-12-09 04:27:56 EST", label = TRUE)
Wed
> lubridate::day("2015-12-09 04:27:56 EST")
9
Is there an easy way to do so for the day of the year? I've searched the documentation and other questions but have not (yet) found an answer.
The correct function is yday, as in
lubridate::yday(Sys.time())
Figured out a more complicated way to do this before stumbling on this answer from u/blindjesse:
# compute the time interval from the first of the year until now
YTD = interval(floor_date(now(), unit='year'), now())
# compute the length of the interval in days, discard the fractional part,
# and add 1 because January 1st is day 1, not day 0
as.integer(time_length(YTD, "day")) + 1
By the way, an even more compact version of u/blindjesse's answer would be:
lubridate::yday(now())
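For comparison, the day of the year is also available without lubridate. A small sketch using the date from the question; the variable names are only for illustration, and the time zone is set to UTC here as an assumption to keep the result independent of the local machine:

```r
d <- as.POSIXct("2015-12-09 04:27:56", tz = "UTC")

# %j formats the day of the year as a zero-padded string ("001" to "366")
doy_fmt <- as.integer(format(d, "%j"))

# POSIXlt stores a 0-based day of the year, so add 1
doy_lt <- as.POSIXlt(d)$yday + 1L

doy_fmt  # 343
doy_lt   # 343
```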

Sequence of monthly dates keeping the same day, or the last day of the month when that day is invalid

Given an initial date, I want to generate a sequence of dates with monthly intervals, ensuring every element has the same day as the initial date or the last day of the month in case the same day would yield an invalid date.
Sounds pretty standard, right?
Using difftime is not possible. Here's what the help file of difftime says:
Units such as "months" are not possible as they are not of constant
length. To create intervals of months, quarters or years use seq.Date
or seq.POSIXt.
But then looking at the help file of seq.POSIXt I find that:
Using "month" first advances the month without changing the day: if
this results in an invalid day of the month, it is counted forward
into the next month: see the examples.
This is the example in the help file.
> seq(ISOdate(2000,1,31), by = "month", length.out = 4)
[1] "2000-01-31 12:00:00 GMT" "2000-03-02 12:00:00 GMT"
[3] "2000-03-31 12:00:00 GMT" "2000-05-01 12:00:00 GMT"
So, given that the initial date falls on day 31, this would yield invalid dates in February, April, etc. The sequence therefore ends up skipping those months because it "counts forward", giving March 2 instead of February 29.
If I start on 2000-01-31, I would like the sequence as follows:
2000-01-31
2000-02-29
2000-03-31
2000-04-30
...
And it should properly handle leap-years, so if the initial date is 2015-01-31 the sequence should be:
2015-01-31
2015-02-28
2015-03-31
2015-04-30
...
These are just examples to illustrate the problem and I do not know the initial date in advance, nor can I assume anything about it. The initial date may well be in the middle of the month (2015-01-15) in which case seq works fine. But it can also be, as in the examples, towards the end of the month on dates that using seq alone would be problematic (days 29, 30 and 31). I cannot assume either that the initial date is the last day of the month.
I have looked around for a solution. Some questions here on SO describe a "trick" to get the last day of a month: take the first day of the next month and subtract 1. Finding the first day is "easy" because it is just day 1.
So my solution so far is:
# Given an initial date for my sequence
initial_date <- as.Date("2015-01-31")
# Find the first day of the month
library(magrittr) # to use pipes and make the code more readable
first_day_of_month <- initial_date %>%
  format("%Y-%m") %>%
  paste0("-01") %>%
  as.Date()
# Generate a sequence from the initial date, using seq.
# This is the sequence that will have incorrect values in months that would
# have invalid dates
given_date_seq <- seq(initial_date, by = "month", length.out = 4)
# Then generate an auxiliary sequence for the last day of each month:
# start on the first day of the initial date's month, go one month further
# (length 5 instead of 4), and subtract 1 from all the elements
last_day_seq <- seq(first_day_of_month, by = "month", length.out = 5) - 1
# Finally, for each pair of elements, take the earlier of the two dates
pmin(given_date_seq, last_day_seq[2:5])
It works, but it is, at the same time, kinda dumb, hacky and convoluted. So I do not like it. And most importantly, I cannot believe there is no easier way to do this in R.
Can someone please point me to a simpler solution? (I guess it should have been as simple as seq(initial_date, "month", 4), but apparently it is not). I've googled it and looked here in SO and R mailing lists, but apart from the tricks I mentioned above, I couldn't find a solution.
The simplest solution is %m+% from lubridate, which solves this exact problem. So:
library(lubridate)
seq_monthly <- function(from, length.out) {
  # %m+% adds months and rolls back to the last valid day of the month
  from %m+% months(seq_len(length.out) - 1)
}
Output:
> seq_monthly(as.Date("2015-01-31"),length.out=4)
[1] "2015-01-31" "2015-02-28" "2015-03-31" "2015-04-30"
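Similar to the lubridate answer, here is one using RcppBDT (which wraps the Boost Date.Time library from C++). Note that the loop adds i months to an already advanced date, so the steps grow (1, 2, 3, ... months) rather than forming a monthly sequence; the point is that end-of-month dates are handled correctly: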
R> dt <- new(bdtDt, 2010, 1, 31); for (i in 1:5) { dt$addMonths(i); print(dt) }
[1] "2010-02-28"
[1] "2010-04-30"
[1] "2010-07-31"
[1] "2010-11-30"
[1] "2011-04-30"
R> dt <- new(bdtDt, 2000, 1, 31); for (i in 1:5) { dt$addMonths(i); print(dt) }
[1] "2000-02-29"
[1] "2000-04-30"
[1] "2000-07-31"
[1] "2000-11-30"
[1] "2001-04-30"
R>
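The same end-of-month clamping can also be sketched in base R, without lubridate or RcppBDT. This is only an illustration of the idea from the question (cap the day at the last day of each target month); seq_monthly_base is a made-up name, not an existing function.

```r
# Hypothetical helper: monthly sequence that keeps the day of `from`,
# falling back to the last day of the month when that day does not exist.
seq_monthly_base <- function(from, length.out) {
  from <- as.Date(from)
  # First day of the starting month; stepping by month from day 1 is always valid
  first_of_month <- as.Date(format(from, "%Y-%m-01"))
  firsts <- seq(first_of_month, by = "month", length.out = length.out + 1)
  # Last day-of-month number for each target month (next month's first minus 1)
  last_dom <- as.integer(format(firsts[-1] - 1, "%d"))
  # Use the original day, capped at the month's last day
  day <- pmin(as.integer(format(from, "%d")), last_dom)
  firsts[seq_len(length.out)] + (day - 1L)
}

seq_monthly_base("2015-01-31", 4)
# "2015-01-31" "2015-02-28" "2015-03-31" "2015-04-30"
seq_monthly_base("2000-01-31", 4)
# "2000-01-31" "2000-02-29" "2000-03-31" "2000-04-30"
```

For a start date in the middle of the month, such as 2015-01-15, the cap never applies and the result matches plain seq().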

How to have a difference in week units between two days (even if they're close but belong to different weeks)

Here is an example to make this clear:
Take 2006/2007: the last day of 2006 was a Sunday and the first day of 2007 was a Monday.
In Italy (and in other countries), these two days belong to different weeks.
How can I obtain this information in R?
If I do:
difftime(as.Date("2007-01-01"),as.Date("2006-12-31"),units="weeks")
I get: 0.1428571
...but I would like some way to get 1 (as they differ by 1 week)
Your problem is that Monday should be the first day of the week. R packages usually consider Sunday to be the first day of the week.
My solution uses the lubridate package and reduces the day of the week by one. With this Mondays become Sundays, the first day of the week for lubridate. I then use floor_date to get the first day of the week and difftime the result.
library(lubridate)
dates <-c(as.Date("2007-01-01"),as.Date("2006-12-31"))
weekdays(dates)
#[1] "Monday" "Sunday"
tempdates <-update(dates,wdays=wday(dates)-1)
weekdays(tempdates)
#[1] "Sunday" "Saturday"
floor1 <-floor_date(tempdates, "week")
difftime(floor1[1],floor1[2], units = "weeks")
#Time difference of 1 weeks
January 1st and 2nd end up in the same week with this solution:
dates <-c(as.Date("2007-01-02"),as.Date("2007-01-01"))
tempdates <-update(dates,wdays=wday(dates)-1)
floor1 <-floor_date(tempdates, "week")
difftime(floor1[1],floor1[2], units = "weeks")
#Time difference of 0 weeks
You could start with strftime(as.Date("2007-01-01"),"%U"), which identifies the week number in the year (look up strftime) and add the special case for the last week of a year maybe.
The difference in "weeks" depends on whether you are using "week" as 7 days or as a sequence number. This gives you an R method for working with the second definition of "week":
diff( c(as.numeric(format( as.Date("2007-01-01"), "%W")),
as.numeric(format(as.Date("2006-12-31"), "%W")) ))
[1] 51
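The "floor to the start of the week, then take the difference" idea can also be written in base R with Monday as the first day of the week. A minimal sketch; monday_of and week_diff are illustrative names, not existing functions.

```r
# Hypothetical helper: the Monday on or before each date
monday_of <- function(d) {
  d <- as.Date(d)
  # POSIXlt wday is 0 for Sunday ... 6 for Saturday;
  # (wday + 6) %% 7 is the number of days since the last Monday
  d - (as.POSIXlt(d)$wday + 6) %% 7
}

# Number of Monday-based calendar weeks between two dates
week_diff <- function(a, b) {
  (as.integer(monday_of(a)) - as.integer(monday_of(b))) %/% 7L
}

week_diff("2007-01-01", "2006-12-31")  # 1
week_diff("2007-01-02", "2007-01-01")  # 0
```

Because both dates are first moved to their Monday, the result counts week boundaries crossed rather than elapsed 7-day periods, and it is not affected by the year changing in between.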

Lubridate week() to find consecutive week number for multi-year periods

Within R, say I have a vector of some Lubridate dates:
> Date
"2012-01-01 UTC"
"2013-01-01 UTC"
Next, suppose I want to see what week number these days fall in:
> week(Date)
1
1
Lubridate is fantastic!
But wait... I'm dealing with a time series with 10,000 rows of data... and the data spans 3 years.
I've been struggling with finding some way to make this happen:
> result of awesome R code here
1
54
The question: is there a succinct way to coax out a list of week numbers over multiyear periods within Lubridate? More directly, I would like the first week of the second year to be represented as the 54th week, and the first week of the third year to be represented as the 107th week, ad nauseam.
So far, I've attempted a number of hacky schemes but cannot seem to create something that isn't fastened together with Scotch tape. Any advice would be greatly appreciated. Thanks in advance.
To get the interval from a particular date to another date, you can just subtract...
If tda is your vector of dates, then
tda - min(tda)
will be the time difference (a difftime object) between each date and the earliest one.
To get the units out in weeks:
(tda - min(tda))/eweeks(1)
To do it from a particular date:
tda - ymd(19960101)
This gives the number of days from 1996 to each value.
From there, you can divide by days per week, or seconds per week.
(tda - ymd(19960101))/eweeks(1)
To get only the integer part, and starting from January 2012:
trunc((tda - ymd(20111225))/eweeks(1))
Test data:
tda = ymd(c(20120101, 20120106, 20130101, 20130108))
Output:
1 1 53 54
Since eweeks() is now deprecated, I thought I'd add to @beroe's answer.
If tda is your date vector, you can get the week numbers with:
weeknos <- (interval(min(tda), tda) %/% weeks(1)) + 1
where %/% performs integer division (5 / 3 = 1.667; 5 %/% 3 = 1).
You can do something like this :
week(dat) +53*(year(dat)-min(year(dat)))
Given you like lubridate (as do I)
year_week <- function(x,base) week(x) - week(base) + 52*(year(x) - year(base))
test <- ymd(c(20120101, 20120106, 20130101, 20130108))
year_week(test, "2012-01-01")
Giving
[1] 0 0 52 53
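For reference, a running week index that never resets can also be computed in base R by counting 7-day blocks from a fixed origin. A minimal sketch; running_week is a made-up name, and using the earliest date as the default origin is an assumption.

```r
# Hypothetical helper: 1-based count of 7-day blocks since `origin`
running_week <- function(dates, origin = min(dates)) {
  dates <- as.Date(dates)
  origin <- as.Date(origin)  # default is evaluated here, after conversion
  (as.integer(dates) - as.integer(origin)) %/% 7L + 1L
}

tda <- as.Date(c("2012-01-01", "2012-01-06", "2013-01-01", "2013-01-08"))
running_week(tda)
# 1 1 53 54
```

Since the blocks are exactly 7 days long, a year spans a little over 52 of them, so the numbers drift relative to formulas that add a fixed 52 or 53 per calendar year.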

Bucketing data into weekly, bi-weekly, monthly and quarterly data in R

I have a data frame with two columns. Date, Gender
I want to change the Date column to the start of the week for that observation. For example if Jun-28-2011 is a Tuesday, I'd like to change it to Jun-27-2011. Basically I want to re-label Date fields such that two data points that are in the same week have the same Date.
I also want to be able to do it bi-weekly, monthly, and especially quarterly.
Update:
Let's use this as a dataset.
datset <- data.frame(date = as.Date("2011-06-28")+c(1:100))
One slick way to do this that I just learned recently is to use the lubridate package:
library(lubridate)
datset <- data.frame(date = as.Date("2011-06-28")+c(1:100))
#Add 1, since floor_date appears to round down to Sundays
floor_date(datset$date,"week") + 1
I'm not sure about how to do bi-weekly binning, but monthly and quarterly are easily handled with the respective base functions:
quarters(datset$date)
months(datset$date)
EDIT: Interestingly, floor_date from lubridate does not appear to be able to round down to the nearest quarter, but the function of the same name in ggplot2 does.
Look at ?strftime. In particular, the following formats:
%b: Abbreviated month name in the current locale. (Also matches full name on input.)
%B: Full month name in the current locale. (Also matches abbreviated name on input.)
%m: Month as decimal number (01–12).
%W: Week of the year as decimal number (00–53) using Monday as the first day of week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
eg:
> strftime("2011-07-28","Month: %B, Week: %W")
[1] "Month: July, Week: 30"
> paste("Quarter:",ceiling(as.integer(strftime("2011-07-28","%m"))/3))
[1] "Quarter: 3"
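The relabelling asked for in the question (replace each date by the start of its week, two-week block, month, or quarter) can be sketched in base R as follows. The function name bucket_start is invented for illustration. The bi-weekly buckets need a reference point, so the sketch assumes 14-day blocks counted from Monday 1970-01-05; any other Monday could be passed as origin.

```r
# Hypothetical helper: start date of the bucket each date falls into
bucket_start <- function(dates,
                         unit = c("week", "biweek", "month", "quarter"),
                         origin = as.Date("1970-01-05")) {  # assumed: a Monday
  unit <- match.arg(unit)
  d <- as.Date(dates)
  switch(unit,
    # Monday on or before the date
    week = d - (as.POSIXlt(d)$wday + 6) %% 7,
    # 14-day blocks counted from `origin`
    biweek = origin + ((as.integer(d) - as.integer(origin)) %/% 14L) * 14L,
    # First day of the month
    month = as.Date(format(d, "%Y-%m-01")),
    # First day of the quarter (months 1, 4, 7, 10)
    quarter = {
      m <- as.integer(format(d, "%m"))
      as.Date(sprintf("%s-%02d-01", format(d, "%Y"), ((m - 1L) %/% 3L) * 3L + 1L))
    }
  )
}

bucket_start("2011-06-28", "week")     # "2011-06-27"
bucket_start("2011-06-28", "month")    # "2011-06-01"
bucket_start("2011-06-28", "quarter")  # "2011-04-01"
```

The result can be assigned back to the date column (for example datset$date <- bucket_start(datset$date, "week")) so that observations in the same bucket share the same date.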

Resources