Extracting last date of the year from a date object - r

I have the following data set:
>d
x date
1 1 1-3-2013
2 2 2-4-2010
3 3 2-5-2011
4 4 1-6-2012
I want:
> d
x date
1 1 31-12-2013
2 2 31-12-2010
3 3 31-12-2011
4 4 31-12-2012
i.e. Last day, last month and the year of the date object.
Please Help!

You can also just use the ceiling_date function from the lubridate package.
You can do something like this:
library(lubridate)
last_date <- ceiling_date(date, "year") - days(1)
ceiling_date(date, "year") gives you the first day of the next year, so to get the last day of the current year you subtract 1 or days(1) from it.
Hope this helps.
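As a minimal sketch of this approach applied to the question's data: it assumes the date column holds day-month-year strings, so they are parsed with dmy() first. The column name last is made up for illustration.

```r
library(lubridate)

# Sample data from the question; dates are day-month-year strings
d <- data.frame(x = 1:4,
                date = c("1-3-2013", "2-4-2010", "2-5-2011", "1-6-2012"))

# Parse to Date, round up to the first day of the next year,
# then step back one day to land on 31 December
parsed <- dmy(d$date)
d$last <- ceiling_date(parsed, "year") - days(1)

format(d$last)
# [1] "2013-12-31" "2010-12-31" "2011-12-31" "2012-12-31"
```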

Another option using lubridate package:
## using d from Roland answer
transform(d,last =dmy(paste0('3112',year(dmy(date)))))
x date last
1 1 1-3-2013 2013-12-31
2 2 2-4-2010 2010-12-31
3 3 2-5-2011 2011-12-31
4 4 1-6-2012 2012-12-31

d <- read.table(text="x date
1 1 1-3-2013
2 2 2-4-2010
3 3 2-5-2011
4 4 1-6-2012", header=TRUE)
d$date <- as.Date(d$date, "%d-%m-%Y")
d$date <- as.POSIXlt(d$date)
d$date$mon <- 11
d$date$mday <- 31
d$date <- as.Date(d$date)
# x date
#1 1 2013-12-31
#2 2 2010-12-31
#3 3 2011-12-31
#4 4 2012-12-31

1) cut.Date Define cut_year to give the first day of the year. Adding 366 gets us to the next year and then applying cut_year again gets us to the first day of the next year. Finally subtract 1 to get the last day of the year. The code uses base functionality only.
cut_year <- function(x) as.Date(cut(as.Date(x), "year"))
transform(d, date = cut_year(cut_year(date) + 366) - 1)
2) format
transform(d, date = as.Date(format(as.Date(date), "%Y-12-31")))
3) zoo A "yearmon" class variable stores the date as a year plus 0 for Jan, 1/12 for Feb, ..., 11/12 for Dec. Thus taking its floor and adding 11/12 gets one to Dec and as.Date.yearmon(..., frac = 1) uses the last of the month instead of the first.
library(zoo)
transform(d, date = as.Date(floor(as.yearmon(as.Date(date))) + 11 / 12, frac = 1))
Note: The inner as.Date in cut_year and in the other two solutions can be omitted if it is known that date is already of "Date" class.
ADDED additional solutions.
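To see why adding 366 in solution (1) is safe in both leap and non-leap years, here is a small base R sketch. The helper names last_day_cut and last_day_fmt are made up for illustration, and the input is assumed to already be of "Date" class.

```r
cut_year <- function(x) as.Date(cut(as.Date(x), "year"))

# Solution (1): jump into the next year, snap back to its 1 January, subtract 1
last_day_cut <- function(x) cut_year(cut_year(x) + 366) - 1
# Solution (2): rebuild the date with month and day fixed to 12-31
last_day_fmt <- function(x) as.Date(format(as.Date(x), "%Y-12-31"))

x <- as.Date(c("2012-06-01", "2013-03-01"))   # leap year, non-leap year

format(cut_year(x) + 366)
# [1] "2013-01-01" "2014-01-02"   (1 or 2 January of the next year)
format(last_day_cut(x))
# [1] "2012-12-31" "2013-12-31"
all(last_day_cut(x) == last_day_fmt(x))
# [1] TRUE
```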

Related

R create week numbers with specified start date

This seems like it should be straightforward but I cannot find a way to do this.
I have a sales cycle that begins ~ August 1 of each year and need to sum sales by week number. I need to create a "week number" field where week #1 begins on a date that I specify. Thus far I have looked at lubridate, baseR, and strftime, and I cannot find a way to change the "start" date from 01/01/YYYY to something else.
Solution needs to let me specify the start date and iterate week numbers as 7 days from the start date. The actual start date doesn't always occur on a Sunday or Monday.
EG Data Frame
eg_data <- data.frame(
cycle = c("cycle2019", "cycle2019", "cycle2018", "cycle2018", "cycle2017", "cycle2017", "cycle2016", "cycle2016"),
dates = as.POSIXct(c("2019-08-01" , "2019-08-10" ,"2018-07-31" , "2018-08-16", "2017-08-03" , "2017-08-14" , "2016-08-05", "2016-08-29")),
week_n = c("1", "2","1","3","1","2","1","4"))
I'd like the result to look like what is above - it would take the min date for each cycle and use that as a starting point, then iterate up week numbers based on a given date's distance from the cycle starting date.
This almost works. (Doing date arithmetic gives us durations in seconds: there may be a smoother way to convert with lubridate tools?)
library(dplyr)
secs_per_week <- 60*60*24*7
(eg_data
%>% group_by(cycle)
%>% mutate(nw=1+as.numeric(round((dates-min(dates))/secs_per_week)))
)
The results don't match for 2017, because there is an 11-day gap between the first and second observation ...
cycle dates week_n nw
<chr> <dttm> <chr> <dbl>
5 cycle2017 2017-08-03 00:00:00 1 1
6 cycle2017 2017-08-14 00:00:00 2 3
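The mismatch comes from round(): an 11-day gap is about 1.57 weeks, which rounds up to 2 before the 1 is added. A quick arithmetic sketch (plain numbers, no date classes) comparing it with floor(), which only counts completed weeks:

```r
gap_days <- 11               # days from 2017-08-03 to 2017-08-14

gap_days / 7                 # about 1.57 weeks
1 + round(gap_days / 7)      # 3: rounds past the half-week mark
1 + floor(gap_days / 7)      # 2: matches the expected week_n
```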
If someone has a better answer please post it, but this works.
Take the data frame in the example, eg_data:
library(dplyr)
eg_data %>%
group_by(cycle) %>%
mutate(
cycle_start = as.Date(min(dates)),
days_diff = as.numeric(as.Date(dates) - cycle_start),
week_n = days_diff / 7,
# floor() + 1 so that the cycle start itself (0 days) falls in week 1
week_n_whole = floor(days_diff / 7) + 1 ) -> eg_data_check
(First time I've answered my own question)
library("lubridate")
library("dplyr")
eg_data %>%
as_tibble() %>%
group_by(cycle) %>%
mutate(new_week = week(dates)-31)
This doesn't quite work the same as your example, but perhaps with some fiddling based on your domain experience you could adapt it:
library(lubridate)
library(dplyr)
library(stringr)
eg_data %>%
mutate(aug1 = ymd_h(paste(str_sub(cycle, start = -4), "080100")),
week_n2 = ceiling((dates - aug1)/ddays(7)))
EDIT: If you have specific known dates for the start of each cycle, it might be helpful to join those dates to your data for the calc:
library(lubridate)
library(dplyr)
cycle_starts <- data.frame(
cycle = c("cycle2019", "cycle2018", "cycle2017", "cycle2016"),
start_date = ymd_h(c(2019080100, 2018072500, 2017080500, 2016071300))
)
eg_data %>%
left_join(cycle_starts) %>%
mutate(week_n2 = ceiling((dates - start_date)/ddays(7)))
#Joining, by = "cycle"
# cycle dates week_n start_date week_n2
#1 cycle2019 2019-08-01 1 2019-08-01 1
#2 cycle2019 2019-08-10 2 2019-08-01 2
#3 cycle2018 2018-07-31 1 2018-07-25 1
#4 cycle2018 2018-08-16 3 2018-07-25 4
#5 cycle2017 2017-08-03 1 2017-08-05 0
#6 cycle2017 2017-08-14 2 2017-08-05 2
#7 cycle2016 2016-08-05 1 2016-07-13 4
#8 cycle2016 2016-08-29 4 2016-07-13 7
This is a concise solution using lubridate
library(lubridate)
library(dplyr)
eg_data %>%
group_by(cycle) %>%
mutate(new_week = floor(as.period(ymd(dates) - ymd(min(dates))) / weeks()) + 1)
# A tibble: 8 x 4
# Groups: cycle [4]
cycle dates week_n new_week
<chr> <dttm> <chr> <dbl>
1 cycle2019 2019-08-01 00:00:00 1 1
2 cycle2019 2019-08-10 00:00:00 2 2
3 cycle2018 2018-07-31 00:00:00 1 1
4 cycle2018 2018-08-16 00:00:00 3 3
5 cycle2017 2017-08-03 00:00:00 1 1
6 cycle2017 2017-08-14 00:00:00 2 2
7 cycle2016 2016-08-05 00:00:00 1 1
8 cycle2016 2016-08-29 00:00:00 4 4
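For comparison, a base R sketch of the same idea without extra packages. It assumes the dates are stored as "Date" rather than POSIXct, and the column name week_base is made up for illustration. ave() computes the per-cycle minimum, and integer division by 7 counts completed weeks.

```r
eg_data <- data.frame(
  cycle = c("cycle2019", "cycle2019", "cycle2018", "cycle2018",
            "cycle2017", "cycle2017", "cycle2016", "cycle2016"),
  dates = as.Date(c("2019-08-01", "2019-08-10", "2018-07-31", "2018-08-16",
                    "2017-08-03", "2017-08-14", "2016-08-05", "2016-08-29"))
)

# Dates as day counts, and the earliest day count within each cycle
day_num     <- as.numeric(eg_data$dates)
cycle_start <- ave(day_num, eg_data$cycle, FUN = min)

# Completed weeks since the cycle start, plus 1 so the start is week 1
eg_data$week_base <- (day_num - cycle_start) %/% 7 + 1
eg_data$week_base
# [1] 1 2 1 3 1 2 1 4
```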

Adding column with Yes/No values based on date of another column

ID Date
1 2020-06-03
2 2018-05-04
3 2019-08-09
I want to add a column to this data frame that indicates Yes/No based on whether or not the Date falls within the last year based on the date the code is being run.
ID Date YN
1 2019-06-03 Yes
2 2018-05-04 No
3 2019-06-02 No
You could do:
library(lubridate)
library(dplyr)
nw <- ymd("2020-06-03")
df %>%
mutate(Date = ymd(Date),
yn = if_else(nw > Date & Date >= nw - years(1), "Yes", "No"))
ID Date YN yn
1 1 2019-06-03 Yes Yes
2 2 2018-05-04 No No
3 3 2019-06-02 No No
You can use base R for this; no additional packages are needed. A year has approximately 365.25 days, but you need to add 1 day to cover the time elapsing within a year. Take the difference between today, from Sys.Date(), and the values in d1[["Date"]]. difftime() can be applied to vectors. You'll need to get creative with leap years, though.
I also realized that you don't specify whether the column Date is of Date format or just a character vector. If the latter, then you need to convert the column Date to Date format using as.Date(). inherits(x, 'Date') checks whether a vector x inherits the class Date. Assume d1 is the name assigned to your data.frame object:
# in case 'Date' is a string, convert it to date:
if(!inherits(d1[["Date"]], "Date")) d1[["Date"]] <- as.Date(d1[["Date"]])
d1[["YN"]] <- ifelse(difftime(Sys.Date(), d1[["Date"]], units="days") <= 366.25, "Yes", "No")
Result:
> d1
ID Date YN
1 1 2019-06-03 Yes
2 2 2018-05-04 No
3 3 2019-06-02 No
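One possible way to sidestep the leap-year arithmetic in base R is to let seq() step back exactly one calendar year. This is only a sketch: the function name flag_last_year and its today argument are made up, and it treats both the cutoff date and today as inside the window.

```r
flag_last_year <- function(dates, today = Sys.Date()) {
  # seq() with by = "-1 year" returns today and the same calendar day last year
  cutoff <- seq(today, by = "-1 year", length.out = 2)[2]
  ifelse(dates >= cutoff & dates <= today, "Yes", "No")
}

dates <- as.Date(c("2019-06-03", "2018-05-04", "2019-06-02"))
flag_last_year(dates, today = as.Date("2020-06-03"))
# [1] "Yes" "No"  "No"
```

Passing today explicitly keeps the result reproducible; leaving it at the default uses the date on which the code is run.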

Identify if a day of the week is 2nd/3rd etc Mon/Tues/etc day of the month in R

Given a date and the day of the week it is, I want to know if there is a code that tells me which of those days of the month it is. For example in the picture below, given 2/12/2020 and "Wednesday" I want to be given the output "2" for it being the second Wednesday of the month.
You can do that in base R in essentially one operation. You also do not need the second input column.
Here is a slower walkthrough:
Code
dates <- c("2/12/2020","2/11/2020","2/10/2020","2/7/2020","2/6/2020", "2/5/2020")
Dates <- anytime::anydate(dates) ## one of several parsers
dow <- weekdays(Dates) ## for illustration, base R function
cnt <- (as.integer(format(Dates, "%d")) - 1) %/% 7 + 1
res <- data.frame(dt=Dates, dow=dow, cnt=cnt)
res
(Final) Output
R> res
dt dow cnt
1 2020-02-12 Wednesday 2
2 2020-02-11 Tuesday 2
3 2020-02-10 Monday 2
4 2020-02-07 Friday 1
5 2020-02-06 Thursday 1
6 2020-02-05 Wednesday 1
R>
Functionality like this is often in dedicated date/time libraries. I wrapped some code from the (C++) Boost date_time library in package RcppBDT, which made it easy to find 'the third Wednesday in the last month of each quarter' and the like.
(lubridate::day(your_date) - 1) %/% 7 + 1
The idea here is that the first 7 days of the month are all the first for their weekday. Next 7 are 2nd, etc.
> (1:30 - 1) %/% 7 + 1
# [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4 5 5
Just to offer an alternative calculation for the nth weekday of the month, you can divide the day by 7 and always round up:
date <- lubridate::mdy("02/12/2020")
ceiling(lubridate::day(date)/7)
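The two formulas in these answers are equivalent for every possible day of the month. A short base R check, with a made-up helper nth_in_month that reads the day of the month from as.POSIXlt():

```r
# Which occurrence of its weekday a date is within its month (1 to 5)
nth_in_month <- function(x) {
  (as.POSIXlt(x)$mday - 1) %/% 7 + 1
}

days <- 1:31
all((days - 1) %/% 7 + 1 == ceiling(days / 7))
# [1] TRUE

nth_in_month(as.Date(c("2020-02-12", "2020-02-05")))
# [1] 2 1
```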

Nested for loops for date differences

I am new to R and I am trying to calculate date differences from a baseline for every subject. I know how to calculate the day differences using difftime but I am having trouble doing it in a loop for every subject. Any help would be greatly appreciated.
Basically I want to go from:
ID DATE
1 1.1.2015
1 1.1.2016
2 1.1.2017
3 1.1.2015
3 1.1.2016
3 1.1.2017
to:
ID DATE DATEDIFF
1 1.1.2015 0
1 1.1.2016 365
2 1.1.2017 0
3 1.1.2015 0
3 1.1.2016 365
3 1.1.2017 731
Use lubridate to parse the dates and dplyr to calculate the new column:
library(lubridate)
library(dplyr)
df <- data.frame(
id = c(1,1,2,3,3,3),
date = c('1.1.2015','1.1.2016','1.1.2017','1.1.2015','1.1.2016','1.1.2017'))
# parse dates as DayMonthYear
df$date <- dmy(df$date)
# calculate the difference to the oldest date in each group
# mutate is called once for each group, so you could use an
# arbitrary expression to calculate your new column only with
# the data for this group
df %>% group_by(id) %>% mutate(datediff = date-min(date))
Result:
id date datediff
1 1 2015-01-01 0 days
2 1 2016-01-01 365 days
3 2 2017-01-01 0 days
4 3 2015-01-01 0 days
5 3 2016-01-01 365 days
6 3 2017-01-01 731 days
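No loop is needed in base R either. As a sketch under the same sample data (the column name datediff is reused from the answer above), ave() returns each subject's earliest date aligned with the rows, so the difference is a single vectorized subtraction:

```r
df <- data.frame(
  id   = c(1, 1, 2, 3, 3, 3),
  date = as.Date(c("1.1.2015", "1.1.2016", "1.1.2017",
                   "1.1.2015", "1.1.2016", "1.1.2017"),
                 format = "%d.%m.%Y")
)

# Days since each subject's baseline (earliest) date
day_num     <- as.numeric(df$date)
df$datediff <- day_num - ave(day_num, df$id, FUN = min)
df$datediff
# [1]   0 365   0   0 365 731
```

The last value is 731 rather than 730 because 2016 is a leap year.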

R: How to get the Week number of the month

I am new to R.
I want the week number of the month that a date belongs to.
Using the following code:
> CurrentDate <- Sys.Date()
> WeekNumber <- format(CurrentDate, format="%U")
> WeekNumber
"31"
%U returns the week number of the year.
But I want the week number of the month.
If the date is 2014-08-01 then I want to get 1 (the date belongs to the 1st week of the month).
Eg:
2014-09-04 -> 1 (the date belongs to the 1st week of the month).
2014-09-10 -> 2 (the date belongs to the 2nd week of the month).
and so on...
How can I get this?
Reference:
http://astrostatistics.psu.edu/su07/R/html/base/html/strptime.html
By analogy of the weekdays function:
monthweeks <- function(x) {
UseMethod("monthweeks")
}
monthweeks.Date <- function(x) {
ceiling(as.numeric(format(x, "%d")) / 7)
}
monthweeks.POSIXlt <- function(x) {
ceiling(as.numeric(format(x, "%d")) / 7)
}
monthweeks.character <- function(x) {
ceiling(as.numeric(format(as.Date(x), "%d")) / 7)
}
dates <- sample(seq(as.Date("2000-01-01"), as.Date("2015-01-01"), "days"), 7)
dates
#> [1] "2004-09-24" "2002-11-21" "2011-08-13" "2008-09-23" "2000-08-10" "2007-09-10" "2013-04-16"
monthweeks(dates)
#> [1] 4 3 2 4 2 2 3
Another solution is to use stri_datetime_fields() from the stringi package:
stringi::stri_datetime_fields(dates)$WeekOfMonth
#> [1] 4 4 2 4 2 3 3
You can use day from the lubridate package. I'm not sure if there's a week-of-month type function in the package, but we can do the math.
library(lubridate)
curr <- Sys.Date()
# [1] "2014-08-08"
day(curr) ## 8th day of the current month
# [1] 8
day(curr) / 7 ## Technically, it's the 1.14th week
# [1] 1.142857
ceiling(day(curr) / 7) ## but ceiling() will take it up to the 2nd week.
# [1] 2
Issue Overview
It was difficult to tell which answers worked, so I built my own function nth_week and tested it against the others.
The issue that's leading to most of the answers being incorrect is this:
The first week of a month is often a short-week
Same with the last week of the month
For example, October 1st 2019 is a Tuesday, so 6 days into October (which is a Sunday) is already the second week. Also, contiguous months often share the same week in their respective counts, meaning that the last week of the prior month is commonly also the first week of the current month. So, we should expect a week count higher than 52 per year and some months that contain a span of 6 weeks.
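To make the short-first-week effect concrete, here is a base R sketch for October 2019. The helper week_of_month is made up for illustration and assumes weeks start on Sunday. It shifts the day of the month by the weekday of the 1st before dividing by 7, and is compared with the plain ceiling(day / 7) approach.

```r
week_of_month <- function(x) {
  x          <- as.Date(x)
  first      <- as.Date(format(x, "%Y-%m-01"))
  first_wday <- as.POSIXlt(first)$wday       # 0 = Sunday, ..., 6 = Saturday
  day        <- as.integer(format(x, "%d"))
  (day + first_wday + 6) %/% 7
}

oct <- as.Date(c("2019-10-01", "2019-10-05", "2019-10-06", "2019-10-13"))

week_of_month(oct)
# [1] 1 1 2 3
ceiling(as.integer(format(oct, "%d")) / 7)
# [1] 1 1 1 2
```

Sunday the 6th already starts calendar week 2, while ceiling(day / 7) still reports 1.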
Results Comparison
Here's a table showing examples where some of the above suggested algorithms go awry:
DATE            Tori  user206  Scri  Klev  Stringi  Grot  Frei  Vale  epi  iso  coni
Fri-2016-01-01     1        1     1     1        5     1     1     1    1    1     1
Sat-2016-01-02     1        1     1     1        1     1     1     1    1    1     1
Sun-2016-01-03     2        1     1     1        1     2     2     1  -50    1     2
Mon-2016-01-04     2        1     1     1        2     2     2     1  -50  -51     2
----
Sat-2018-12-29     5        5     5     5        5     5     5     4    5    5     5
Sun-2018-12-30     6        5     5     5        5     6     6     4  -46    5     6
Mon-2018-12-31     6        5     5     5        6     6     6     4  -46  -46     6
Tue-2019-01-01     1        1     1     1        6     1     1     1    1    1     1
You can see that only Grothendieck, conighion, Freitas, and Tori are correct due to their treatment of partial week periods. I compared all days from year 100 to year 3000; there are no differences among those 4. (Stringi is probably correct for noting weekends as separate, incremented periods, but I didn't check to be sure; epiweek() and isoweek(), because of their intended uses, show some odd behavior near year-ends when using them for week incrementation.)
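The agreement between the Tori and Freitas formulas is not a coincidence: for whole numbers, ceiling((d + w - 1) / 7) equals (5 + d + w) %/% 7. A brute-force base R check over every day of the month d and every weekday index w of the 1st (1 to 7, as returned by lubridate's wday()):

```r
grid <- expand.grid(day = 1:31, first_wday = 1:7)

tori    <- (5 + grid$day + grid$first_wday) %/% 7
freitas <- ceiling((grid$day + grid$first_wday - 1) / 7)

all(tori == freitas)
# [1] TRUE
range(tori)
# [1] 1 6
```

The maximum of 6 corresponds to the months that span six calendar weeks.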
Speed Comparison
Below are the tests for efficiency between the implementations of: Tori, Grothendieck, Conighion, and Freitas
# prep
library(lubridate)
library(tictoc)
kepler<- ymd(15711227) # Kepler's birthday since it's a nice day and gives a long vector of dates
some_dates<- seq(kepler, today(), by='day')
# test speed of Tori algorithm
tic(msg = 'Tori')
Tori<- (5 + day(some_dates) + wday(floor_date(some_dates, 'month'))) %/% 7
toc()
Tori: 0.19 sec elapsed
# test speed of Grothendieck algorithm
wk <- function(x) as.numeric(format(x, "%U"))
tic(msg = 'Grothendieck')
Grothendieck<- (wk(some_dates) - wk(as.Date(cut(some_dates, "month"))) + 1)
toc()
Grothendieck: 1.99 sec elapsed
# test speed of conighion algorithm
tic(msg = 'conighion')
weeknum <- as.integer( format(some_dates, format="%U") )
mindatemonth <- as.Date( paste0(format(some_dates, "%Y-%m"), "-01") )
weeknummin <- as.integer( format(mindatemonth, format="%U") ) # the number of the week of the first week within the month
conighion <- weeknum - (weeknummin - 1) # this is as an integer
toc()
conighion: 2.42 sec elapsed
# test speed of Freitas algorithm
first_day_of_month_wday <- function(dx) {
day(dx) <- 1
wday(dx)
}
tic(msg = 'Freitas')
Freitas<- ceiling((day(some_dates) + first_day_of_month_wday(some_dates) - 1) / 7)
toc()
Freitas: 0.97 sec elapsed
The fastest correct algorithm, by a factor of about 5:
require(lubridate)
(5 + day(some_dates) + wday(floor_date(some_dates, 'month'))) %/% 7
# some_dates above is any vector of dates, like:
some_dates<- seq(ymd(20190101), today(), 'day')
Function Implementation
I also wrote a generalized function for it that performs either month or year week counts, begins on a day you choose (i.e. say you want to start your week on Monday), labels output for easy checking, and is still extremely fast thanks to lubridate.
nth_week<- function(dates = NULL,
count_weeks_in = c("month","year"),
begin_week_on = "Sunday"){
require(lubridate)
count_weeks_in<- tolower(count_weeks_in[1])
# day_names and day_index are for beginning the week on a day other than Sunday
# (this vector ordering matters, so careful about changing it)
day_names<- c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
# index integer of first match
day_index<- pmatch(tolower(begin_week_on),
tolower(day_names))[1]
### Calculate week index of each day
if (!is.na(pmatch(count_weeks_in, "year"))) {
# For year:
# sum the day of year, index for day of week at start of year, and constant 5
# then integer divide quantity by 7
# (explicit on package so lubridate and data.table don't fight)
n_week<- (5 +
lubridate::yday(dates) +
lubridate::wday(floor_date(dates, 'year'),
week_start = day_index)
) %/% 7
} else {
# For month:
# same algorithm as above, but for month rather than year
n_week<- (5 +
lubridate::day(dates) +
lubridate::wday(floor_date(dates, 'month'),
week_start = day_index)
) %/% 7
}
# naming very helpful for review
names(n_week)<- paste0(lubridate::wday(dates,T), '-', dates)
n_week
}
Function Output
# Example raw vector output:
some_dates<- seq(ymd(20190930), today(), by='day')
nth_week(some_dates)
Mon-2019-09-30 Tue-2019-10-01 Wed-2019-10-02
5 1 1
Thu-2019-10-03 Fri-2019-10-04 Sat-2019-10-05
1 1 1
Sun-2019-10-06 Mon-2019-10-07 Tue-2019-10-08
2 2 2
Wed-2019-10-09 Thu-2019-10-10 Fri-2019-10-11
2 2 2
Sat-2019-10-12 Sun-2019-10-13
2 3
# Example tabled output:
library(tidyverse)
nth_week(some_dates) %>%
enframe('DATE','nth_week_default') %>%
cbind(some_year_day_options = as.vector(nth_week(some_dates, count_weeks_in = 'year', begin_week_on = 'Mon')))
DATE nth_week_default some_year_day_options
1 Mon-2019-09-30 5 40
2 Tue-2019-10-01 1 40
3 Wed-2019-10-02 1 40
4 Thu-2019-10-03 1 40
5 Fri-2019-10-04 1 40
6 Sat-2019-10-05 1 40
7 Sun-2019-10-06 2 40
8 Mon-2019-10-07 2 41
9 Tue-2019-10-08 2 41
10 Wed-2019-10-09 2 41
11 Thu-2019-10-10 2 41
12 Fri-2019-10-11 2 41
13 Sat-2019-10-12 2 41
14 Sun-2019-10-13 3 41
Hope this work saves people the time of having to weed through all the responses to figure out which are correct.
I don't know R but if you take the week of the first day in the month you could use it to get the week in the month
2014-09-18
First day of month = 2014-09-01
Week of first day on month = 36
Week of 2014-09-18 = 38
Week in the month = 1 + (38 - 36) = 3
Using lubridate you can do
ceiling((day(date) + first_day_of_month_wday(date) - 1) / 7)
Where the function first_day_of_month_wday returns the weekday of the first day of month.
first_day_of_month_wday <- function(dx) {
day(dx) <- 1
wday(dx)
}
This adjustment is needed to get the correct week number; otherwise, if the 7th day of the month falls on a Monday, you will get 1 instead of 2, for example.
This is only a shift in the day of the month.
The minus 1 is necessary because when the first day of the month is a Sunday no adjustment is needed, and the other weekdays follow from this rule.
I came across the same issue and solved it with mday from the data.table package. I also realized that when using the ceiling() function, one needs to account for the '5th week' situation. For example, for the 30th day of a month, ceiling(30/7) gives 5! Hence the ifelse statement below.
library(data.table)
# Create a sample data table with days from year 0 until present
DT <- data.table(days = seq(as.Date("0-01-01"), Sys.Date(), "days"))
# compute the week of the month and account for the '5th week' case
DT[, week := ifelse( ceiling(mday(days)/7)==5, 4, ceiling(mday(days)/7) )]
> DT
days week
1: 0000-01-01 1
2: 0000-01-02 1
3: 0000-01-03 1
4: 0000-01-04 1
5: 0000-01-05 1
---
736617: 2016-10-14 2
736618: 2016-10-15 3
736619: 2016-10-16 3
736620: 2016-10-17 3
736621: 2016-10-18 3
To have an idea about the speed, then run:
system.time( DT[, week := ifelse( ceiling(mday(days)/7)==5, 4, ceiling(mday(days)/7) )] )
# user system elapsed
# 3.23 0.05 3.27
It took approx. 3 seconds to compute the weeks for more than 700 000 days.
However, the ceiling way above will always create the last week longer than all the other weeks (the four weeks have 7,7,7, and 9 or 10 days). Another way would be to use something like
ceiling(1:31/31*4)
[1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4
where you get 7, 8 , 8 and 8 days per respective week in a 31 days month.
DT[, week2 := ceiling(mday(days)/31*4)]
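The difference between the two schemes is easy to see by counting days per week in a 31-day month. A base R sketch, where pmin() is used as a shorthand that gives the same result as the ifelse() cap above:

```r
days <- 1:31

capped <- pmin(ceiling(days / 7), 4)   # fold the '5th week' into week 4
scaled <- ceiling(days / 31 * 4)       # spread the days over 4 weeks

as.vector(table(capped))
# [1]  7  7  7 10
as.vector(table(scaled))
# [1] 7 8 8 8
```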
There is a simple way to do it with lubridate package:
isoweek() returns the week as it would appear in the ISO 8601 system, which uses a reoccurring leap week.
epiweek() is the US CDC version of the epidemiological week. It follows the same rules as isoweek() but starts on Sunday. In other parts of the world the convention is to start epidemiological weeks on Monday, which is the same as isoweek().
Reference here
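Both functions number weeks within the year, and the last days of December can already belong to week 1 of the following year, so they need care if used to derive a week of the month. A small sketch for the end of 2018:

```r
library(lubridate)

x <- ymd(c("2018-12-29", "2018-12-30", "2018-12-31"))   # Sat, Sun, Mon

isoweek(x)   # weeks start on Monday
# [1] 52 52  1
epiweek(x)   # weeks start on Sunday
# [1] 52  1  1
```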
I am late to the party and maybe no one is going to read this answer...
Anyway, why not stay simple and do it like this:
library(lubridate)
x <- ymd(20200311, 20200308)
week(x) - week(floor_date(x, unit = "months")) + 1
[1] 3 2
I don't know of any built-in function, but a workaround would be:
CurrentDate <- Sys.Date()
# The number of the week relative to the year
weeknum <- as.integer( format(CurrentDate, format="%U") )
# Find the minimum week of the month relative to the year
mindatemonth <- as.Date( paste0(format(CurrentDate, "%Y-%m"), "-01") )
weeknummin <- as.integer( format(mindatemonth, format="%U") ) # the number of the week of the first week within the month
# Calculate the number of the week relative to the month
weeknum <- weeknum - (weeknummin - 1) # this is as an integer
# With the following you can convert the integer to the same format of
# format(CurrentDate, format="%U")
formatC(weeknum, width = 2, flag = "0")
Simply do this:
library(lubridate)
ds1$Week <- week(ds1$Sale_Date)
This is high performance! It works instantly on my 12 million row dataset.
In the example above, ds1 is the dataset and Sale_Date is a date column (like "2015-11-23").
The other approach, using "as.integer( format..." might work on small datasets, but on 12 million rows it would keep running forever...
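Note that lubridate's week() counts seven-day periods from January 1st, so on its own it returns a week of the year. A hedged sketch (the variable sale_date is made up) showing that, and combining it with floor_date() as in an earlier answer to get a count within the month:

```r
library(lubridate)

sale_date <- ymd("2015-11-23")

week(sale_date)                                           # week of the year
# [1] 47
week(sale_date) - week(floor_date(sale_date, "month")) + 1   # week within the month
# [1] 4
```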

Resources