Difference in months between two dates in R? [duplicate] - r

I have two dates, let's say 14.01.2013 and 26.03.2014.
I would like to get the difference between those two dates in terms of weeks (?), months (14 in the example), quarters (4) and years (1).
Do you know the best way to get this?

What about this:
library(zoo)        # as.yearmon(), as.yearqtr()
library(lubridate)  # year()
# get the difference between the dates "14.01.2013" and "26.03.2014"
# weeks
difftime(strptime("26.03.2014", format = "%d.%m.%Y"),
strptime("14.01.2013", format = "%d.%m.%Y"),units="weeks")
Time difference of 62.28571 weeks
# months
(as.yearmon(strptime("26.03.2014", format = "%d.%m.%Y"))-
as.yearmon(strptime("14.01.2013", format = "%d.%m.%Y")))*12
[1] 14
# quarters
(as.yearqtr(strptime("26.03.2014", format = "%d.%m.%Y"))-
as.yearqtr(strptime("14.01.2013", format = "%d.%m.%Y")))*4
[1] 4
# years
year(strptime("26.03.2014", format = "%d.%m.%Y"))-
year(strptime("14.01.2013", format = "%d.%m.%Y"))
[1] 1
as.yearmon() and as.yearqtr() are in the zoo package; year() is in the lubridate package.
What do you think?

All the existing answers are imperfect (IMO) and either make assumptions about the desired output or don't provide flexibility for the desired output.
Based on the examples from the OP, and the OP's stated expected answers, I think these are the answers you are looking for (plus some additional examples that make it easy to extrapolate).
(This only requires base R and doesn't require zoo or lubridate)
Convert to Datetime Objects
date_strings = c("14.01.2013", "26.03.2014")
datetimes = strptime(date_strings, format = "%d.%m.%Y") # convert to datetime objects
Difference in Days
You can use the difference in days to derive some of the later answers
diff_in_days = difftime(datetimes[2], datetimes[1], units = "days") # days
diff_in_days
#Time difference of 435.9583 days
Difference in Weeks
The difference in weeks comes directly from difftime() with units = "weeks"
diff_in_weeks = difftime(datetimes[2], datetimes[1], units = "weeks") # weeks
diff_in_weeks
#Time difference of 62.27976 weeks
Note that this is the same as dividing our diff_in_days by 7 (7 days in a week)
as.double(diff_in_days)/7
#[1] 62.27976
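A side note on the fractional result above: 435.9583 days is one hour short of 436, which is what you get when strptime() builds local date-times and a daylight-saving change falls between the two dates. A minimal sketch, assuming you only care about calendar days, is to parse the strings as Date objects instead (the variable names here are just for illustration):

```r
# Parse as Date (no time of day, no time zone), so DST cannot shift the result
dates <- as.Date(c("14.01.2013", "26.03.2014"), format = "%d.%m.%Y")

diff_days <- as.numeric(difftime(dates[2], dates[1], units = "days"))
diff_days       # 436
diff_days / 7   # 62.28571
```

With Date inputs the day count is a whole number regardless of the time zone of the machine.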
Difference in Years
With similar logic, we can derive years from diff_in_days
diff_in_years = as.double(diff_in_days)/365 # absolute years
diff_in_years
#[1] 1.194406
You seem to be expecting the diff in years to be "1", so I assume you just want to count absolute calendar years or something, which you can easily do by using floor()
# get desired output, given your definition of 'years'
floor(diff_in_years)
#[1] 1
Difference in Quarters
# get desired output for quarters, given your definition of 'quarters'
floor(diff_in_years * 4)
#[1] 4
Difference in Months
This can be calculated as a conversion from diff_in_years
# months, defined as absolute calendar months (this might be what you want, given your question details)
months_diff = diff_in_years*12
floor(months_diff)
#[1] 14
I know this question is old, but given that I still had to solve this problem just now, I thought I would add my answers. Hope it helps.

For weeks, you can use function difftime:
date1 <- strptime("14.01.2013", format="%d.%m.%Y")
date2 <- strptime("26.03.2014", format="%d.%m.%Y")
difftime(date2,date1,units="weeks")
Time difference of 62.28571 weeks
But difftime doesn't support units larger than weeks.
The following is a rather suboptimal workaround for those durations, using cut.POSIXt:
seq1 <- seq(date1,date2, by="days")
nlevels(cut(seq1,"months"))
15
nlevels(cut(seq1,"quarters"))
5
nlevels(cut(seq1,"years"))
2
This is, however, the number of months, quarters or years spanned by your time interval, and not the duration of your time interval expressed in months, quarters or years (since those do not have a constant duration). Considering the comment you made on @SvenHohenstein's answer, I would think you can use nlevels(cut(seq1,"months")) - 1 for what you're trying to achieve.
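The "spanned periods minus one" idea can also be written without building the whole day sequence. This is only a sketch under the assumption that you want to count calendar boundaries crossed (which matches the 14 / 4 / 1 expected in the question); periods_crossed is a made-up helper name, not an existing function:

```r
# Count calendar month, quarter and year boundaries crossed between two dates
periods_crossed <- function(start, end) {
  s <- as.POSIXlt(start)
  e <- as.POSIXlt(end)
  months   <- 12L * (e$year - s$year) + (e$mon - s$mon)
  quarters <- 4L * (e$year - s$year) + (e$mon %/% 3L - s$mon %/% 3L)
  years    <- e$year - s$year
  c(months = months, quarters = quarters, years = years)
}

d1 <- as.Date("14.01.2013", format = "%d.%m.%Y")
d2 <- as.Date("26.03.2014", format = "%d.%m.%Y")
res <- periods_crossed(d1, d2)
res
#   months quarters    years
#       14        4        1
```

Note this ignores the day of the month entirely, so 31.01 to 01.02 counts as one month.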

I just wrote this for another question, then stumbled here.
library(lubridate)
#' Calculate age
#'
#' By default, calculates the typical "age in years", with a
#' \code{floor} applied so that you are, e.g., 5 years old from
#' 5th birthday through the day before your 6th birthday. Set
#' \code{floor = FALSE} to return decimal ages, and change \code{units}
#' for units other than years.
#' @param dob date-of-birth, the day to start calculating age.
#' @param age.day the date on which age is to be calculated.
#' @param units unit to measure age in. Defaults to \code{"years"}. Passed to \code{\link{duration}}.
#' @param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}.
#' @return Age in \code{units}. Will be an integer if \code{floor = TRUE}.
#' @examples
#' my.dob <- as.Date('1983-10-20')
#' age(my.dob)
#' age(my.dob, units = "minutes")
#' age(my.dob, floor = FALSE)
age <- function(dob, age.day = today(), units = "years", floor = TRUE) {
calc.age = interval(dob, age.day) / duration(num = 1, units = units)
if (floor) return(as.integer(floor(calc.age)))
return(calc.age)
}
Usage examples:
my.dob <- as.Date('1983-10-20')
age(my.dob)
# [1] 31
age(my.dob, floor = FALSE)
# [1] 31.15616
age(my.dob, units = "minutes")
# [1] 16375680
age(seq(my.dob, length.out = 6, by = "years"))
# [1] 31 30 29 28 27 26

Here is the still-missing lubridate answer (although Gregor's function is built on this package).
The lubridate timespan documentation is very helpful for understanding the difference between periods and duration. I also like the lubridate cheatsheet and this very useful thread
library(lubridate)
dates <- c(dmy('14.01.2013'), dmy('26.03.2014'))
span <- dates[1] %--% dates[2] #creating an interval object
#creating period objects
as.period(span, unit = 'year')
#> [1] "1y 2m 12d 0H 0M 0S"
as.period(span, unit = 'month')
#> [1] "14m 12d 0H 0M 0S"
as.period(span, unit = 'day')
#> [1] "436d 0H 0M 0S"
Periods do not accept weeks as units. But you can convert durations to weeks:
as.duration(span)/ dweeks(1)
#makes duration object (in seconds) and divides by duration of a week (in seconds)
#> [1] 62.28571
Created on 2019-11-04 by the reprex package (v0.3.0)
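If you only need the number of complete periods, integer division of the interval by a period is another option. This is a sketch assuming a reasonably recent lubridate, where %/% is defined for an interval divided by a period; the variable names are mine:

```r
library(lubridate)

span <- interval(dmy("14.01.2013"), dmy("26.03.2014"))

whole_months   <- span %/% months(1)  # complete months:   14
whole_quarters <- span %/% months(3)  # complete quarters:  4
whole_years    <- span %/% years(1)   # complete years:     1
```

Unlike dividing by a duration, this counts calendar periods, so month lengths and leap years are respected.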

Here's a solution:
dates <- c("14.01.2013", "26.03.2014")
# Date format:
dates2 <- strptime(dates, format = "%d.%m.%Y")
dif <- diff(as.numeric(dates2)) # difference in seconds
dif/(60 * 60 * 24 * 7) # weeks
[1] 62.28571
dif/(60 * 60 * 24 * 30) # months
[1] 14.53333
dif/(60 * 60 * 24 * 30 * 3) # quarters
[1] 4.844444
dif/(60 * 60 * 24 * 365) # years
[1] 1.194521

This is a simple way to find out the difference in years with the lubridate package:
as.numeric(as.Date("14-03-2013", format = "%d-%m-%Y") %--% as.Date("23-03-2014", format = "%d-%m-%Y"), "years")
This returns 1.023956
You can use floor() if you don't want the decimals.

try this for a months solution
StartDate <- strptime("14 January 2013", "%d %B %Y")
EventDates <- strptime(c("26 March 2014"), "%d %B %Y")
difftime(EventDates, StartDate)

A more "precise" calculation. That is, the number of week/month/quarter/year for a non-complete week/month/quarter/year is the fraction of calendar days in that week/month/quarter/year. For example, the number of months between 2016-02-22 and 2016-03-31 is 8/29 + 31/31 = 1.27586
explanation inline with code
#' Calculate precise number of periods between 2 dates
#'
#' @details The number of week/month/quarter/year for a non-complete week/month/quarter/year
#' is the fraction of calendar days in that week/month/quarter/year.
#' For example, the number of months between 2016-02-22 and 2016-03-31
#' is 8/29 + 31/31 = 1.27586
#'
#' @param startdate start Date of the interval
#' @param enddate end Date of the interval
#' @param period character. It must be one of 'day', 'week', 'month', 'quarter' and 'year'
#'
#' @examples
#' identical(numPeriods(as.Date("2016-02-15"), as.Date("2016-03-31"), "month"), 15/29 + 1)
#' identical(numPeriods(as.Date("2016-02-15"), as.Date("2016-03-31"), "quarter"), (15 + 31)/(31 + 29 + 31))
#' identical(numPeriods(as.Date("2016-02-15"), as.Date("2016-03-31"), "year"), (15 + 31)/366)
#'
#' @return exact number of periods between the two dates
#'
numPeriods <- function(startdate, enddate, period) {
numdays <- as.numeric(enddate - startdate) + 1
if (grepl("day", period, ignore.case=TRUE)) {
return(numdays)
} else if (grepl("week", period, ignore.case=TRUE)) {
return(numdays / 7)
}
#create a sequence of dates between start and end dates
effDaysinBins <- cut(seq(startdate, enddate, by="1 day"), period)
#use the earliest start date of the previous bins and create a breaks of periodic dates with
#user's period interval
intervals <- seq(from=as.Date(min(levels(effDaysinBins)), "%Y-%m-%d"),
by=paste("1",period),
length.out=length(levels(effDaysinBins))+1)
#create a sequence of dates between the earliest interval date and last date of the interval
#that contains the enddate
allDays <- seq(from=intervals[1],
to=intervals[intervals > enddate][1] - 1,
by="1 day")
#bin all days in the whole period using previous breaks
allDaysInBins <- cut(allDays, intervals)
#calculate ratio of effective days to all days in whole period
sum( tabulate(effDaysinBins) / tabulate(allDaysInBins) )
} #numPeriods
Please let me know if you find more boundary cases where the above solution does not work.
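To make the 8/29 + 31/31 example from the top of this answer concrete, here is a small base R check that does the same arithmetic by hand for months only. It is a sketch for illustration, not a replacement for numPeriods; days_in_month is a hypothetical helper:

```r
# Number of days in the calendar month that contains date d
days_in_month <- function(d) {
  first <- as.Date(format(d, "%Y-%m-01"))
  following <- seq(first, by = "1 month", length.out = 2)[2]
  as.integer(following - first)
}

start <- as.Date("2016-02-22")
end   <- as.Date("2016-03-31")
days  <- seq(start, end, by = "1 day")

# Days of the interval that fall into each month (both end points included)
covered <- as.integer(table(format(days, "%Y-%m")))      # 8 31
month_starts <- unique(as.Date(format(days, "%Y-%m-01")))
full <- vapply(seq_along(month_starts),
               function(i) days_in_month(month_starts[i]),
               integer(1))                               # 29 31

frac_months <- sum(covered / full)
frac_months  # 1.275862
```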

Related

Finding age in R [duplicate]

I am using data.table for the first time.
I have a column of about 400,000 birth dates in my table. I need to convert them from birth dates to ages.
What is the best way to do this?
I've been thinking about this and have been dissatisfied with the two answers so far. I like using lubridate, as @KFB did, but I also want things wrapped up nicely in a function, as in my answer using the eeptools package. So here's a wrapper function using the lubridate interval method with some nice options:
#' Calculate age
#'
#' By default, calculates the typical "age in years", with a
#' \code{floor} applied so that you are, e.g., 5 years old from
#' 5th birthday through the day before your 6th birthday. Set
#' \code{floor = FALSE} to return decimal ages, and change \code{units}
#' for units other than years.
#' @param dob date-of-birth, the day to start calculating age.
#' @param age.day the date on which age is to be calculated.
#' @param units unit to measure age in. Defaults to \code{"years"}. Passed to \code{\link{duration}}.
#' @param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}.
#' @return Age in \code{units}. Will be an integer if \code{floor = TRUE}.
#' @examples
#' my.dob <- as.Date('1983-10-20')
#' age(my.dob)
#' age(my.dob, units = "minutes")
#' age(my.dob, floor = FALSE)
age <- function(dob, age.day = lubridate::today(), units = "years", floor = TRUE) {
calc.age = lubridate::interval(dob, age.day) / lubridate::duration(num = 1, units = units)
if (floor) return(as.integer(floor(calc.age)))
return(calc.age)
}
Usage examples:
> my.dob <- as.Date('1983-10-20')
> age(my.dob)
[1] 31
> age(my.dob, floor = FALSE)
[1] 31.15616
> age(my.dob, units = "minutes")
[1] 16375680
> age(seq(my.dob, length.out = 6, by = "years"))
[1] 31 30 29 28 27 26
From the comments of this blog entry, I found the age_calc function in the eeptools package. It takes care of edge cases (leap years, etc.), checks inputs and looks quite robust.
library(eeptools)
x <- as.Date(c("2011-01-01", "1996-02-29"))
age_calc(x) # default is age in months, as of Sys.Date()
[1] 46.73333 224.83118
age_calc(x, units = "years") # but you can set it to years
[1] 3.893151 18.731507
floor(age_calc(x, units = "years"))
[1] 3 18
For your data
yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years"))
assuming you want age in integer years.
Assume you have a data.table, you could do below:
library(data.table)
library(lubridate)
# toy data
X = data.table(birth=seq(from=as.Date("1970-01-01"), to=as.Date("1980-12-31"), by="year"))
Sys.Date()
Option 1 : use "as.period" from the lubridate package
X[, age := as.period(Sys.Date() - birth)][]
birth age
1: 1970-01-01 44y 0m 327d 0H 0M 0S
2: 1971-01-01 43y 0m 327d 6H 0M 0S
3: 1972-01-01 42y 0m 327d 12H 0M 0S
4: 1973-01-01 41y 0m 326d 18H 0M 0S
5: 1974-01-01 40y 0m 327d 0H 0M 0S
6: 1975-01-01 39y 0m 327d 6H 0M 0S
7: 1976-01-01 38y 0m 327d 12H 0M 0S
8: 1977-01-01 37y 0m 326d 18H 0M 0S
9: 1978-01-01 36y 0m 327d 0H 0M 0S
10: 1979-01-01 35y 0m 327d 6H 0M 0S
11: 1980-01-01 34y 0m 327d 12H 0M 0S
Option 2 : if you do not like the format of Option 1, you could do below:
yr = duration(num = 1, units = "years")
X[, age := interval(birth, Sys.Date())/yr][] # new_interval() in old lubridate versions
# you get
birth age
1: 1970-01-01 44.92603
2: 1971-01-01 43.92603
3: 1972-01-01 42.92603
4: 1973-01-01 41.92329
5: 1974-01-01 40.92329
6: 1975-01-01 39.92329
7: 1976-01-01 38.92329
8: 1977-01-01 37.92055
9: 1978-01-01 36.92055
10: 1979-01-01 35.92055
11: 1980-01-01 34.92055
I believe Option 2 is the more desirable.
I prefer to do this using the lubridate package, borrowing syntax I originally encountered in another post.
It's necessary to standardize your input dates in terms of R date objects, preferably with the lubridate::mdy() or lubridate::ymd() or similar functions, as applicable. You can use the interval() function to generate an interval describing the time elapsed between the two dates, and then use the duration() function to define how this interval should be "diced".
I've summarized the simplest case for calculating an age from two dates below, using the most current syntax in R.
df$DOB <- mdy(df$DOB)
df$EndDate <- mdy(df$EndDate)
df$Calc_Age <- interval(start= df$DOB, end=df$EndDate)/
duration(num=1, units="years")
Age may be rounded down to the nearest complete integer using the base R floor() function, like so:
df$Calc_AgeF <- floor(df$Calc_Age)
Alternately, the digits= argument in the base R round() function can be used to round up or down, and specify the exact number of decimals in the returned value, like so:
df$Calc_Age2 <- round(df$Calc_Age, digits = 2) ## 2 decimals
df$Calc_Age0 <- round(df$Calc_Age, digits = 0) ## nearest integer
It's worth noting that once the input dates are passed through the calculation step described above (i.e., the interval() and duration() functions), the returned value will be numeric and no longer a date object in R. This matters because lubridate::floor_date() is limited strictly to date-time objects.
The above syntax works regardless whether the input dates occur in a data.table or data.frame object.
I wanted an implementation that didn't increase my dependencies beyond data.table, which is usually my only dependency. The data.table is only needed for mday, which means day of the month.
Development function
This function is logically how I would think about someone's age. I start with [current year] - [birth year] - 1, then add 1 if they've already had their birthday in the current year. To check for that offset I start by considering month, then (if necessary) day of month.
Here is that step by step implementation:
agecalc <- function(origin, current){
require(data.table)
y <- year(current) - year(origin) - 1
offset <- 0
if(month(current) > month(origin)) offset <- 1
if(month(current) == month(origin) &
mday(current) >= mday(origin)) offset <- 1
age <- y + offset
return(age)
}
Production function
This is the same logic refactored and vectorized:
agecalc <- function(origin, current){
require(data.table)
age <- year(current) - year(origin) - 1
ii <- (month(current) > month(origin)) | (month(current) == month(origin) &
mday(current) >= mday(origin))
age[ii] <- age[ii] + 1
return(age)
}
Experimental function that uses strings
You could also do a string comparison on the month / day part. Perhaps there are times when this is more efficient, for example if you had the year as a number and the birth date as a string.
agecalc_strings <- function(origin, current){
origin <- as.character(origin)
current <- as.character(current)
age <- as.numeric(substr(current, 1, 4)) - as.numeric(substr(origin, 1, 4)) - 1
if(substr(current, 6, 10) >= substr(origin, 6, 10)){
age <- age + 1
}
return(age)
}
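Note that agecalc_strings as written uses if(), so it only handles one pair of dates at a time. A vectorized variant could look like the sketch below; it assumes the inputs are Dates or "YYYY-MM-DD" strings, and agecalc_strings_vec is just an illustrative name:

```r
# Vectorized string-based age: compare the "MM-DD" parts element-wise
agecalc_strings_vec <- function(origin, current) {
  origin  <- as.character(origin)
  current <- as.character(current)
  age <- as.numeric(substr(current, 1, 4)) - as.numeric(substr(origin, 1, 4)) - 1
  # TRUE counts as 1, so this adds a year once the birthday has been reached
  age + (substr(current, 6, 10) >= substr(origin, 6, 10))
}

ages <- agecalc_strings_vec(as.Date("1985-08-13"),
                            as.Date(c("1986-08-12", "1986-08-13", "1986-09-12")))
ages
# [1] 0 1 1
```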
Some tests on the vectorized "production" version:
## Examples for specific dates to test the calculation with things like
## beginning and end of months, and leap years:
agecalc(as.IDate("1985-08-13"), as.IDate("1985-08-12"))
agecalc(as.IDate("1985-08-13"), as.IDate("1985-08-13"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-08-12"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-08-13"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-09-12"))
agecalc(as.IDate("2000-02-29"), as.IDate("2000-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2000-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-03-01"))
agecalc(as.IDate("2000-02-29"), as.IDate("2004-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2004-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2011-03-01"))
## Testing every age for every day over several years
## This test requires vectorized version:
d <- data.table(d=as.IDate("2000-01-01") + 0:10000)
d[ , b1 := as.IDate("2000-08-15")]
d[ , b2 := as.IDate("2000-02-29")]
d[ , age1_num := (d - b1) / 365]
d[ , age2_num := (d - b2) / 365]
d[ , age1 := agecalc(b1, d)]
d[ , age2 := agecalc(b2, d)]
d
Below is a trivial plot of ages as numeric and integer. As you can see, the
integer ages form a sort of stair-step pattern that is tangent to (but below) the
straight line of numeric ages.
plot(as.numeric(age1_num) ~ d, data = d, type = "l",
ylab = "ages", main = "ages plotted")
lines(age1 ~ d, data = d, col = "blue")
I wasn't happy with any of the responses when it comes to calculating the age in months or years, when dealing with leap years, so this is my function using the lubridate package.
Basically, it slices the interval between from and to into (up to) yearly chunks, and then adjusts the interval for whether that chunk is leap year or not. The total interval is the sum of the age of each chunk.
library(lubridate)
#' Get Age of Date relative to Another Date
#'
#' @param from,to the date or dates to consider
#' @param units the units to consider
#' @param floor logical as to whether to floor the result
#' @param simple logical as to whether to do a simple calculation, a simple calculation doesn't account for leap year.
#' @author Nicholas Hamilton
#' @export
age <- function(from, to = today(), units = "years", floor = FALSE, simple = FALSE) {
#Account for Leap Year if Working in Months and Years
if(!simple && length(grep("^(month|year)",units)) > 0){
df = data.frame(from,to)
calc = sapply(1:nrow(df),function(r){
#Start and Finish Points
st = df[r,1]; fn = df[r,2]
#If there is no difference, age is zero
if(st == fn){ return(0) }
#If there is a difference, age is not zero and needs to be calculated
sign = +1 #Age Direction
if(st > fn){ tmp = st; st = fn; fn = tmp; sign = -1 } #Swap and Change sign
#Determine the slice-points
mid = ceiling_date(seq(st,fn,by='year'),'year')
#Build the sequence
dates = unique( c(st,mid,fn) )
dates = dates[which(dates >= st & dates <= fn)]
#Determine the age of the chunks
chunks = sapply(head(seq_along(dates),-1),function(ix){
k = 365/( 365 + leap_year(dates[ix]) )
k*interval( dates[ix], dates[ix+1] ) / duration(num = 1, units = units)
})
#Sum the Chunks, and account for direction
sign*sum(chunks)
})
#If Simple Calculation or Not Months or Not years
}else{
calc = interval(from,to) / duration(num = 1, units = units)
}
if (floor) calc = as.integer(floor(calc))
calc
}
(Sys.Date() - yourDate) / 365.25
A very simple way of calculating the age from two dates without using any additional packages probably is:
df$age = with(df, as.Date(date_2, "%Y-%m-%d") - as.Date(date_1, "%Y-%m-%d"))
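One caveat with the two day-count approaches above: the subtraction gives a difference in days, and dividing by 365.25 is only an approximation, so flooring it can be off by one right on a birthday. A small base R sketch (the dates are an arbitrary example, the variable names are mine) that contrasts it with a calendar-based comparison:

```r
dob <- as.Date("2000-04-21")
on  <- as.Date("2019-04-21")   # the 19th birthday

days <- as.numeric(on - dob)   # 6939
approx_age <- days / 365.25    # 18.99795
floor(approx_age)              # 18, one year too low

# Calendar-based: year difference, minus 1 if the birthday has not happened yet
b <- as.POSIXlt(dob)
o <- as.POSIXlt(on)
not_yet <- (o$mon < b$mon) | (o$mon == b$mon & o$mday < b$mday)
cal_age <- (o$year - b$year) - not_yet
cal_age                        # 19
```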
Here is a (I think simpler) solution using lubridate:
library(lubridate)
age <- function(dob, on.day=today()) {
intvl <- interval(dob, on.day)
prd <- as.period(intvl)
return(prd@year)
}
Note that age_calc from the eeptools package in particular fails on cases with the year 2000 around birthdays.
Some examples that don't work in age_calc:
library(lubridate)
library(eeptools)
age_calc(ymd("1997-04-21"), ymd("2000-04-21"), units = "years")
age_calc(ymd("2000-04-21"), ymd("2019-04-21"), units = "years")
age_calc(ymd("2000-04-21"), ymd("2016-04-21"), units = "years")
Some of the other solutions also produce output that is not what I would intuitively expect for decimal ages when leap years are involved. I like @James_D's solution, which is precise and concise, but I wanted something where the decimal age is calculated as complete years plus the fraction of the year completed from their last birthday to their next birthday (which would be out of 365 or 366 days depending on the year). In the case of leap years I use lubridate's rollback function to use March 1st for non-leap years following February 29th. I used some test cases from @geneorama and added some of my own, and the output aligns with what I would expect.
library(lubridate)
# Calculate precise age from birthdate in ymd format
age_calculation <- function(birth_date, later_year) {
if (birth_date > later_year)
{
stop("Birth date is after the desired date!")
}
# Calculate the most recent birthday of the person based on the desired year
latest_bday <- ymd(add_with_rollback(birth_date, years((year(later_year) - year(birth_date))), roll_to_first = TRUE))
# Get amount of days between the desired date and the latest birthday
days_between <- as.numeric(days(later_year - latest_bday), units = "days")
# Get how many days are in the year between their most recent and next bdays
year_length <- as.numeric(days((add_with_rollback(latest_bday, years(1), roll_to_first = TRUE)) - latest_bday), units = "days")
# Get the year fraction (amount of year completed before next birthday)
fraction_year <- days_between/year_length
# Sum the difference of years with the year fraction
age_sum <- (year(later_year) - year(birth_date)) + fraction_year
return(age_sum)
}
test_list <- list(c("1985-08-13", "1986-08-12"),
c("1985-08-13", "1985-08-13"),
c("1985-08-13", "1986-08-13"),
c("1985-08-13", "1986-09-12"),
c("2000-02-29", "2000-02-29"),
c("2000-02-29", "2000-03-01"),
c("2000-02-29", "2001-02-28"),
c("2000-02-29", "2004-02-29"),
c("2000-02-29", "2011-03-01"),
c("1997-04-21", "2000-04-21"),
c("2000-04-21", "2016-04-21"),
c("2000-04-21", "2019-04-21"),
c("2017-06-15", "2018-04-30"),
c("2019-04-20", "2019-08-24"),
c("2020-05-25", "2021-11-25"),
c("2020-11-25", "2021-11-24"),
c("2020-11-24", "2020-11-25"),
c("2020-02-28", "2020-02-29"),
c("2020-02-29", "2020-02-28"))
for (i in 1:length(test_list))
{
print(paste0("Dates from ", test_list[[i]][1], " to ", test_list[[i]][2]))
result <- age_calculation(ymd(test_list[[i]][1]), ymd(test_list[[i]][2]))
print(result)
}
Output:
[1] "Dates from 1985-08-13 to 1986-08-12"
[1] 0.9972603
[1] "Dates from 1985-08-13 to 1985-08-13"
[1] 0
[1] "Dates from 1985-08-13 to 1986-08-13"
[1] 1
[1] "Dates from 1985-08-13 to 1986-09-12"
[1] 1.082192
[1] "Dates from 2000-02-29 to 2000-02-29"
[1] 0
[1] "Dates from 2000-02-29 to 2000-03-01"
[1] 0.00273224
[1] "Dates from 2000-02-29 to 2001-02-28"
[1] 0.9972603
[1] "Dates from 2000-02-29 to 2004-02-29"
[1] 4
[1] "Dates from 2000-02-29 to 2011-03-01"
[1] 11
[1] "Dates from 1997-04-21 to 2000-04-21"
[1] 3
[1] "Dates from 2000-04-21 to 2016-04-21"
[1] 16
[1] "Dates from 2000-04-21 to 2019-04-21"
[1] 19
[1] "Dates from 2017-06-15 to 2018-04-30"
[1] 0.8739726
[1] "Dates from 2019-04-20 to 2019-08-24"
[1] 0.3442623
[1] "Dates from 2020-05-25 to 2021-11-25"
[1] 1.50411
[1] "Dates from 2020-11-25 to 2021-11-24"
[1] 0.9972603
[1] "Dates from 2020-11-24 to 2020-11-25"
[1] 0.002739726
[1] "Dates from 2020-02-28 to 2020-02-29"
[1] 0.00273224
[1] "Dates from 2020-02-29 to 2020-02-28"
Error in age_calculation(ymd(test_list[[i]][1]), ymd(test_list[[i]][2])) :
Birth date is after the desired date!
As others have been saying, the trunc function is excellent to get integer age.
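For non-negative ages trunc() and floor() give the same integer; they only differ for negative values, which can matter if a reference date can fall before the birth date. A tiny illustration with made-up numbers:

```r
ages <- c(31.15616, 0.9972603, -0.5)

truncated <- trunc(ages)  # rounds toward zero:           31  0  0
floored   <- floor(ages)  # rounds toward minus infinity: 31  0 -1
```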
I realise there are a lot of answers but since I can't help myself, I might as well add to the discussion.
I'm building a package that's focused on dates and datetimes and in it I use a function called time_diff(). Here is a simplified version.
time_diff <- function(x, y, units, num = 1,
type = c("duration", "period"),
as_period = FALSE){
type <- match.arg(type)
units <- match.arg(units, c("picoseconds", "nanoseconds", "microseconds",
"milliseconds", "seconds", "minutes", "hours", "days",
"weeks", "months", "years"))
int <- lubridate::interval(x, y)
if (as_period || type == "period"){
if (as_period) int <- lubridate::as.period(int, unit = units)
unit <- lubridate::period(num = num, units = units)
} else {
unit <- do.call(get(paste0("d", units),
asNamespace("lubridate")),
list(x = num))
}
out <- int / unit
out
}
# Wrapper around the more general time_diff
age_years <- function(x, y){
trunc(time_diff(x, y, units = "years", num = 1,
type = "period", as_period = TRUE))
}
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
bday <- dmy("01-01-2000")
time_diff(bday, today(), "years", type = "period")
#> [1] 23.11233
leap1 <- dmy("29-02-2020")
leap2 <- dmy("28-02-2021")
leap3 <- dmy("01-03-2021")
# Many people might say this is wrong so use the more exact age_years
time_diff(leap1, leap2, "years", type = "period")
#> [1] 1
# age in years, accounting for leap years properly
age_years(leap1, leap2)
#> [1] 0
age_years(leap1, leap3)
#> [1] 1
# So to add a column of ages in years, one can do this..
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
my_data <- tibble(dob = seq(bday, today(), by = "day"))
my_data <- my_data %>%
mutate(age_years = age_years(dob, today()))
slice_head(my_data, n = 10)
#> # A tibble: 10 x 2
#> dob age_years
#> <date> <dbl>
#> 1 2000-01-01 23
#> 2 2000-01-02 23
#> 3 2000-01-03 23
#> 4 2000-01-04 23
#> 5 2000-01-05 23
#> 6 2000-01-06 23
#> 7 2000-01-07 23
#> 8 2000-01-08 23
#> 9 2000-01-09 23
#> 10 2000-01-10 23
Created on 2023-02-11 with reprex v2.0.2

Converting a number into time (0,5 of an hour = 00:30:00)

I am trying to convert a number into time format.
For example:
I am calculating how long an electric car has to be charged at an 11 kW charging station.
Energy demand - 2,8 kWh
Charging time = 2,8 kWh / 11 kW = 0,257 h
0,257 h = 15 min 25 sec. = 00:15:25
How can I convert 0,257 h into 00:15:25 in R?
Based on the example, we will assume that the input is less than 24 (but if that is not the case these could be modified to handle that depending on the definition of what such an input should produce).
1) chron::times Use chron times like this. times measures times in fractions of a day so divide the hours (.257) by 24 to give the fraction of a day that it represents.
library(chron)
times(.257 / 24)
## [1] 00:15:25
This gives a chron "times" class object. If x is such an object use format(x) to convert it to a character string, if desired.
2) POSIXct This uses no packages although it is longer. It returns the time as a character string. POSIXct measures time in seconds and so multiply the hours (.257) by 3600 as there are 3600 seconds in an hour.
format(as.POSIXct("1970-01-01") + 3600 * .257, "%H:%M:%S")
## [1] "00:15:25"
2a) This variation would also work. It is longer but it involves no conversion factors. It returns a character string.
format(as.POSIXct("1970-01-01") + as.difftime(.257, units = "hours"), "%H:%M:%S")
## [1] "00:15:25"
Updates: Added (2). Also added (2a) and improved (2).
The answer by @GGrothendieck seems to be the way to go here. But if you had to do this in base R, you could just compute the hour, minute, and second components and build the time string manually:
x <- 2.257 # number of hours
total <- round(x*60*60, digits=0) # the total number of seconds
hours <- trunc(total / (60*60))
minutes <- trunc((x - hours) * 60)
seconds <- total %% 60
ts <- paste0(formatC(hours, width=2, flag="0"), ":",
formatC(minutes, width=2, flag="0"), ":",
formatC(seconds, width=2, flag="0"))
ts
[1] "02:15:25"
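The same idea can be wrapped in a small vectorized helper that works purely on whole seconds, which also covers inputs of 24 hours or more. This is only a sketch; hours_to_hms is a made-up name and it assumes non-negative input:

```r
# Convert decimal hours to "HH:MM:SS" (hours are not wrapped at 24)
hours_to_hms <- function(x) {
  total <- round(x * 3600)                 # total number of whole seconds
  sprintf("%02d:%02d:%02d",
          as.integer(total %/% 3600),          # hours
          as.integer((total %% 3600) %/% 60),  # minutes
          as.integer(total %% 60))             # seconds
}

hms_out <- hours_to_hms(c(0.257, 2.257, 25.5))
hms_out
# [1] "00:15:25" "02:15:25" "25:30:00"
```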
The tidyverse solution would use the hms package:
hms::hms(0.257 * 60^2)
#> 00:15:25.2
Gives you an object of classes hms and difftime. If you want a string:
format(hms::hms(0.257 * 60^2))
#> [1] "00:15:25.2"

change a column from birth date to age in r

I am using data.table for the first time.
I have a column of about 400,000 ages in my table. I need to convert them from birth dates to ages.
What is the best way to do this?
I've been thinking about this and have been dissatisfied with the two answers so far. I like using lubridate, as #KFB did, but I also want things wrapped up nicely in a function, as in my answer using the eeptools package. So here's a wrapper function using the lubridate interval method with some nice options:
#' Calculate age
#'
#' By default, calculates the typical "age in years", with a
#' \code{floor} applied so that you are, e.g., 5 years old from
#' 5th birthday through the day before your 6th birthday. Set
#' \code{floor = FALSE} to return decimal ages, and change \code{units}
#' for units other than years.
#' #param dob date-of-birth, the day to start calculating age.
#' #param age.day the date on which age is to be calculated.
#' #param units unit to measure age in. Defaults to \code{"years"}. Passed to \link{\code{duration}}.
#' #param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}.
#' #return Age in \code{units}. Will be an integer if \code{floor = TRUE}.
#' #examples
#' my.dob <- as.Date('1983-10-20')
#' age(my.dob)
#' age(my.dob, units = "minutes")
#' age(my.dob, floor = FALSE)
age <- function(dob, age.day = today(), units = "years", floor = TRUE) {
calc.age = lubridate::interval(dob, age.day) / lubridate::duration(num = 1, units = units)
if (floor) return(as.integer(floor(calc.age)))
return(calc.age)
}
Usage examples:
> my.dob <- as.Date('1983-10-20')
> age(my.dob)
[1] 31
> age(my.dob, floor = FALSE)
[1] 31.15616
> age(my.dob, units = "minutes")
[1] 16375680
> age(seq(my.dob, length.out = 6, by = "years"))
[1] 31 30 29 28 27 26
From the comments of this blog entry, I found the age_calc function in the eeptools package. It takes care of edge cases (leap years, etc.), checks inputs and looks quite robust.
library(eeptools)
x <- as.Date(c("2011-01-01", "1996-02-29"))
age_calc(x[1],x[2]) # default is age in months
[1] 46.73333 224.83118
age_calc(x[1],x[2], units = "years") # but you can set it to years
[1] 3.893151 18.731507
floor(age_calc(x[1],x[2], units = "years"))
[1] 3 18
For your data
yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years"))
assuming you want age in integer years.
Assume you have a data.table, you could do below:
library(data.table)
library(lubridate)
# toy data
X = data.table(birth=seq(from=as.Date("1970-01-01"), to=as.Date("1980-12-31"), by="year"))
Sys.Date()
Option 1 : use "as.period" from lubriate package
X[, age := as.period(Sys.Date() - birth)][]
birth age
1: 1970-01-01 44y 0m 327d 0H 0M 0S
2: 1971-01-01 43y 0m 327d 6H 0M 0S
3: 1972-01-01 42y 0m 327d 12H 0M 0S
4: 1973-01-01 41y 0m 326d 18H 0M 0S
5: 1974-01-01 40y 0m 327d 0H 0M 0S
6: 1975-01-01 39y 0m 327d 6H 0M 0S
7: 1976-01-01 38y 0m 327d 12H 0M 0S
8: 1977-01-01 37y 0m 326d 18H 0M 0S
9: 1978-01-01 36y 0m 327d 0H 0M 0S
10: 1979-01-01 35y 0m 327d 6H 0M 0S
11: 1980-01-01 34y 0m 327d 12H 0M 0S
Option 2 : if you do not like the format of Option 1, you could do below:
yr = duration(num = 1, units = "years")
X[, age := new_interval(birth, Sys.Date())/yr][]
# you get
birth age
1: 1970-01-01 44.92603
2: 1971-01-01 43.92603
3: 1972-01-01 42.92603
4: 1973-01-01 41.92329
5: 1974-01-01 40.92329
6: 1975-01-01 39.92329
7: 1976-01-01 38.92329
8: 1977-01-01 37.92055
9: 1978-01-01 36.92055
10: 1979-01-01 35.92055
11: 1980-01-01 34.92055
Believe Option 2 should be the more desirable.
I prefer to do this using the lubridate package, borrowing syntax I originally encountered in another post.
It's necessary to standardize your input dates in terms of R date objects, preferably with the lubridate::mdy() or lubridate::ymd() or similar functions, as applicable. You can use the interval() function to generate an interval describing the time elapsed between the two dates, and then use the duration() function to define how this interval should be "diced".
I've summarized the simplest case for calculating an age from two dates below, using the most current syntax in R.
df$DOB <- mdy(df$DOB)
df$EndDate <- mdy(df$EndDate)
df$Calc_Age <- interval(start = df$DOB, end = df$EndDate) /
  duration(num = 1, units = "years")
Age may be rounded down to the nearest complete integer using the base R floor() function, like so:
df$Calc_AgeF <- floor(df$Calc_Age)
Alternately, the digits= argument in the base R round() function can be used to round up or down, and specify the exact number of decimals in the returned value, like so:
df$Calc_Age2 <- round(df$Calc_Age, digits = 2) ## 2 decimals
df$Calc_Age0 <- round(df$Calc_Age, digits = 0) ## nearest integer
It's worth noting that once the input dates have passed through the calculation step described above (i.e., the interval() and duration() functions), the returned value is numeric and no longer a date object in R. This matters because lubridate::floor_date() works only on date-time objects, so it cannot be used to round the result.
The above syntax works regardless of whether the input dates occur in a data.table or data.frame object.
I wanted an implementation that didn't increase my dependencies beyond data.table, which is usually my only dependency. The data.table is only needed for mday, which means day of the month.
Development function
This function is logically how I would think about someone's age. I start with [current year] - [birth year] - 1, then add 1 if they've already had their birthday in the current year. To check for that offset I start by considering month, then (if necessary) day of month.
Here is that step by step implementation:
agecalc <- function(origin, current){
require(data.table)
y <- year(current) - year(origin) - 1
offset <- 0
if(month(current) > month(origin)) offset <- 1
if(month(current) == month(origin) &
mday(current) >= mday(origin)) offset <- 1
age <- y + offset
return(age)
}
Production function
This is the same logic refactored and vectorized:
agecalc <- function(origin, current){
require(data.table)
age <- year(current) - year(origin) - 1
ii <- (month(current) > month(origin)) | (month(current) == month(origin) &
mday(current) >= mday(origin))
age[ii] <- age[ii] + 1
return(age)
}
Experimental function that uses strings
You could also do a string comparison on the month / day part. Perhaps there are times when this is more efficient, for example if you had the year as a number and the birth date as a string.
agecalc_strings <- function(origin, current){
origin <- as.character(origin)
current <- as.character(current)
age <- as.numeric(substr(current, 1, 4)) - as.numeric(substr(origin, 1, 4)) - 1
if(substr(current, 6, 10) >= substr(origin, 6, 10)){
age <- age + 1
}
return(age)
}
Some tests on the vectorized "production" version:
## Examples for specific dates to test the calculation with things like
## beginning and end of months, and leap years:
agecalc(as.IDate("1985-08-13"), as.IDate("1985-08-12"))
agecalc(as.IDate("1985-08-13"), as.IDate("1985-08-13"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-08-12"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-08-13"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-09-12"))
agecalc(as.IDate("2000-02-29"), as.IDate("2000-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2000-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-03-01"))
agecalc(as.IDate("2000-02-29"), as.IDate("2004-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2004-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2011-03-01"))
## Testing every age for every day over several years
## This test requires vectorized version:
d <- data.table(d=as.IDate("2000-01-01") + 0:10000)
d[ , b1 := as.IDate("2000-08-15")]
d[ , b2 := as.IDate("2000-02-29")]
d[ , age1_num := (d - b1) / 365]
d[ , age2_num := (d - b2) / 365]
d[ , age1 := agecalc(b1, d)]
d[ , age2 := agecalc(b2, d)]
d
Below is a trivial plot of ages as numeric and integer. As you can see the
integer ages are a sort of stair step pattern that is tangent to (but below) the
straight line of numeric ages.
# Column names match the test table d built above
plot(age1_num ~ d, d, type = "l",
     ylab = "ages", main = "ages plotted")
lines(age1 ~ d, d, col = "blue")
I wasn't happy with any of the responses when it comes to calculating the age in months or years, when dealing with leap years, so this is my function using the lubridate package.
Basically, it slices the interval between from and to into (up to) yearly chunks, and then adjusts the interval for whether that chunk is leap year or not. The total interval is the sum of the age of each chunk.
library(lubridate)
#' Get Age of Date relative to Another Date
#'
#' @param from,to the date or dates to consider
#' @param units the units to consider
#' @param floor logical as to whether to floor the result
#' @param simple logical as to whether to do a simple calculation, a simple calculation doesn't account for leap year.
#' @author Nicholas Hamilton
#' @export
age <- function(from, to = today(), units = "years", floor = FALSE, simple = FALSE) {
#Account for Leap Year if Working in Months and Years
if(!simple && length(grep("^(month|year)",units)) > 0){
df = data.frame(from,to)
calc = sapply(1:nrow(df),function(r){
#Start and Finish Points
st = df[r,1]; fn = df[r,2]
#If there is no difference, age is zero
if(st == fn){ return(0) }
#If there is a difference, age is not zero and needs to be calculated
sign = +1 #Age Direction
if(st > fn){ tmp = st; st = fn; fn = tmp; sign = -1 } #Swap and Change sign
#Determine the slice-points
mid = ceiling_date(seq(st,fn,by='year'),'year')
#Build the sequence
dates = unique( c(st,mid,fn) )
dates = dates[which(dates >= st & dates <= fn)]
#Determine the age of the chunks
chunks = sapply(head(seq_along(dates),-1),function(ix){
k = 365/( 365 + leap_year(dates[ix]) )
k*interval( dates[ix], dates[ix+1] ) / duration(num = 1, units = units)
})
#Sum the Chunks, and account for direction
sign*sum(chunks)
})
#If Simple Calculation or Not Months or Not years
}else{
calc = interval(from,to) / duration(num = 1, units = units)
}
if (floor) calc = as.integer(floor(calc))
calc
}
(Sys.Date() - yourDate) / 365.25
A very simple way of calculating the age from two dates without using any additional packages probably is:
df$age = with(df, as.Date(date_2, "%Y-%m-%d") - as.Date(date_1, "%Y-%m-%d"))
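Note that the subtraction above returns a difftime measured in days, not years. A minimal sketch of the conversion follows, assuming a toy data frame and an average year length of 365.25 days; both the toy values and the 365.25 convention are assumptions, not part of the answer.

```r
# Toy data; the column names date_1 and date_2 follow the line above
df <- data.frame(date_1 = c("2000-01-01", "2001-01-01"),
                 date_2 = c("2001-01-01", "2002-01-01"))

# Subtracting two Dates yields a difftime measured in days
df$age_days <- with(df, as.Date(date_2, "%Y-%m-%d") - as.Date(date_1, "%Y-%m-%d"))

# Approximate age in years, assuming an average year of 365.25 days
df$age_years <- as.numeric(df$age_days, units = "days") / 365.25
```

Wrap the last expression in floor() if integer ages are wanted, keeping in mind that a fixed year length can be off by a day around birthdays.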
Here is a (I think simpler) solution using lubridate:
library(lubridate)
age <- function(dob, on.day=today()) {
intvl <- interval(dob, on.day)
prd <- as.period(intvl)
return(prd@year)
}
Note that age_calc from the eeptools package in particular fails on cases with the year 2000 around birthdays.
Some examples that don't work in age_calc:
library(lubridate)
library(eeptools)
age_calc(ymd("1997-04-21"), ymd("2000-04-21"), units = "years")
age_calc(ymd("2000-04-21"), ymd("2019-04-21"), units = "years")
age_calc(ymd("2000-04-21"), ymd("2016-04-21"), units = "years")
Some of the other solutions also produce output that is not what I would intuitively want for decimal ages when leap years are involved. I like @James_D's solution, which is precise and concise, but I wanted the decimal age to be calculated as complete years plus the fraction of the year completed from the last birthday to the next birthday (out of 365 or 366 days, depending on the year). For people born on February 29th, I use lubridate's add_with_rollback function so that March 1st serves as the birthday in non-leap years. I used some test cases from @geneorama and added some of my own, and the output aligns with what I would expect.
library(lubridate)
# Calculate precise age from birthdate in ymd format
age_calculation <- function(birth_date, later_year) {
if (birth_date > later_year)
{
stop("Birth date is after the desired date!")
}
# Calculate the most recent birthday of the person based on the desired year
latest_bday <- ymd(add_with_rollback(birth_date, years((year(later_year) - year(birth_date))), roll_to_first = TRUE))
# Get amount of days between the desired date and the latest birthday
days_between <- as.numeric(days(later_year - latest_bday), units = "days")
# Get how many days are in the year between their most recent and next bdays
year_length <- as.numeric(days((add_with_rollback(latest_bday, years(1), roll_to_first = TRUE)) - latest_bday), units = "days")
# Get the year fraction (amount of year completed before next birthday)
fraction_year <- days_between/year_length
# Sum the difference of years with the year fraction
age_sum <- (year(later_year) - year(birth_date)) + fraction_year
return(age_sum)
}
test_list <- list(c("1985-08-13", "1986-08-12"),
c("1985-08-13", "1985-08-13"),
c("1985-08-13", "1986-08-13"),
c("1985-08-13", "1986-09-12"),
c("2000-02-29", "2000-02-29"),
c("2000-02-29", "2000-03-01"),
c("2000-02-29", "2001-02-28"),
c("2000-02-29", "2004-02-29"),
c("2000-02-29", "2011-03-01"),
c("1997-04-21", "2000-04-21"),
c("2000-04-21", "2016-04-21"),
c("2000-04-21", "2019-04-21"),
c("2017-06-15", "2018-04-30"),
c("2019-04-20", "2019-08-24"),
c("2020-05-25", "2021-11-25"),
c("2020-11-25", "2021-11-24"),
c("2020-11-24", "2020-11-25"),
c("2020-02-28", "2020-02-29"),
c("2020-02-29", "2020-02-28"))
for (i in 1:length(test_list))
{
print(paste0("Dates from ", test_list[[i]][1], " to ", test_list[[i]][2]))
result <- age_calculation(ymd(test_list[[i]][1]), ymd(test_list[[i]][2]))
print(result)
}
Output:
[1] "Dates from 1985-08-13 to 1986-08-12"
[1] 0.9972603
[1] "Dates from 1985-08-13 to 1985-08-13"
[1] 0
[1] "Dates from 1985-08-13 to 1986-08-13"
[1] 1
[1] "Dates from 1985-08-13 to 1986-09-12"
[1] 1.082192
[1] "Dates from 2000-02-29 to 2000-02-29"
[1] 0
[1] "Dates from 2000-02-29 to 2000-03-01"
[1] 0.00273224
[1] "Dates from 2000-02-29 to 2001-02-28"
[1] 0.9972603
[1] "Dates from 2000-02-29 to 2004-02-29"
[1] 4
[1] "Dates from 2000-02-29 to 2011-03-01"
[1] 11
[1] "Dates from 1997-04-21 to 2000-04-21"
[1] 3
[1] "Dates from 2000-04-21 to 2016-04-21"
[1] 16
[1] "Dates from 2000-04-21 to 2019-04-21"
[1] 19
[1] "Dates from 2017-06-15 to 2018-04-30"
[1] 0.8739726
[1] "Dates from 2019-04-20 to 2019-08-24"
[1] 0.3442623
[1] "Dates from 2020-05-25 to 2021-11-25"
[1] 1.50411
[1] "Dates from 2020-11-25 to 2021-11-24"
[1] 0.9972603
[1] "Dates from 2020-11-24 to 2020-11-25"
[1] 0.002739726
[1] "Dates from 2020-02-28 to 2020-02-29"
[1] 0.00273224
[1] "Dates from 2020-02-29 to 2020-02-28"
Error in age_calculation(ymd(test_list[[i]][1]), ymd(test_list[[i]][2])) :
Birth date is after the desired date!
As others have been saying, the trunc function is excellent to get integer age.
I realise there are a lot of answers but since I can't help myself, I might as well add to the discussion.
I'm building a package that's focused on dates and datetimes and in it I use a function called time_diff(). Here is a simplified version.
time_diff <- function(x, y, units, num = 1,
type = c("duration", "period"),
as_period = FALSE){
type <- match.arg(type)
units <- match.arg(units, c("picoseconds", "nanoseconds", "microseconds",
"milliseconds", "seconds", "minutes", "hours", "days",
"weeks", "months", "years"))
int <- lubridate::interval(x, y)
if (as_period || type == "period"){
if (as_period) int <- lubridate::as.period(int, unit = units)
unit <- lubridate::period(num = num, units = units)
} else {
unit <- do.call(get(paste0("d", units),
asNamespace("lubridate")),
list(x = num))
}
out <- int / unit
out
}
# Wrapper around the more general time_diff
age_years <- function(x, y){
trunc(time_diff(x, y, units = "years", num = 1,
type = "period", as_period = TRUE))
}
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
bday <- dmy("01-01-2000")
time_diff(bday, today(), "years", type = "period")
#> [1] 23.11233
leap1 <- dmy("29-02-2020")
leap2 <- dmy("28-02-2021")
leap3 <- dmy("01-03-2021")
# Many people might say this is wrong so use the more exact age_years
time_diff(leap1, leap2, "years", type = "period")
#> [1] 1
# age in years, accounting for leap years properly
age_years(leap1, leap2)
#> [1] 0
age_years(leap1, leap3)
#> [1] 1
# So to add a column of ages in years, one can do this..
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
my_data <- tibble(dob = seq(bday, today(), by = "day"))
my_data <- my_data %>%
mutate(age_years = age_years(dob, today()))
slice_head(my_data, n = 10)
#> # A tibble: 10 x 2
#> dob age_years
#> <date> <dbl>
#> 1 2000-01-01 23
#> 2 2000-01-02 23
#> 3 2000-01-03 23
#> 4 2000-01-04 23
#> 5 2000-01-05 23
#> 6 2000-01-06 23
#> 7 2000-01-07 23
#> 8 2000-01-08 23
#> 9 2000-01-09 23
#> 10 2000-01-10 23
Created on 2023-02-11 with reprex v2.0.2

Get date difference in years (floating point)

I want to correct source activity based on the difference between reference and measurement date and source half life (measured in years). Say I have
ref_date <- as.Date('06/01/08',format='%d/%m/%y')
and a column in my data.frame with the same date format, e.g.,
today <- as.Date(Sys.Date(), format='%d/%m/%y')
I can find the number of years between these dates using the lubridate package
year(today)-year(ref_date)
[1] 5
Is there a function I can use to get a floating point answer today - ref_date = 5.2y, for example?
Yes, of course: use difftime() wrapped in as.numeric():
R> as.numeric(difftime(as.Date("2003-04-05"), as.Date("2001-01-01"),
+ unit="weeks"))/52.25
[1] 2.2529
R>
Note that we do have to switch to weeks scaled by 52.25, as there is a bit of ambiguity there in terms of counting years: a February 29 comes around every 4 years, but not every 100th, and so on.
So you have to define that. difftime() handles all time units up to weeks. Months cannot be done for the same reason of the non-constant 'numerator'.
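The same choice of year length can also be made in days rather than weeks. The sketch below uses only base R and shows two common conventions side by side; which one is appropriate is an assumption you have to make for your own data.

```r
d1 <- as.Date("2001-01-01")
d2 <- as.Date("2003-04-05")

# difftime() supports "days" directly
days_apart <- as.numeric(difftime(d2, d1, units = "days"))

# Two common conventions for the length of a year (pick one and document it)
years_365    <- days_apart / 365     # fixed 365-day year
years_365_25 <- days_apart / 365.25  # average year including leap days
```

For a half-life correction the difference between the two conventions is usually small, but it grows with the length of the interval.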
The lubridate package contains a built-in function, time_length, which can help perform this task.
time_length(difftime(as.Date("2003-04-05"), as.Date("2001-01-01")), "years")
[1] 2.257534
time_length(difftime(as.Date("2017-03-01"), as.Date("2012-03-01")),"years")
[1] 5.00274
Documentation for the lubridate package can be found here.
Inspired by Bryan F's answer: time_length() works better when it is given an interval object.
time_length(interval(as.Date("2003-04-05"), as.Date("2001-01-01")), "years")
[1] -2.257534
time_length(difftime(as.Date("2017-03-01"), as.Date("2012-03-01")),"years")
[1] 5.00274
time_length(interval(as.Date("2017-03-01"), as.Date("2012-03-01")),"years")
[1] -5
As you can see, if you use interval() to get the time difference and then pass it to time_length(), time_length() takes into account the fact that not all months and years have the same number of days (e.g., leap years).
Not an exact answer to your question, but the answer from Dirk Eddelbuettel in some situations can produce small errors.
Please, consider the following example:
as.numeric(difftime(as.Date("2012-03-01"), as.Date("2017-03-01"), unit="weeks"))/52.25
[1] -4.992481
The correct answer here should have a magnitude of at least 5 years.
The following function (using lubridate package) will calculate a number of full years between two dates:
# Function to calculate an exact full number of years between two dates
year.diff <- function(firstDate, secondDate) {
yearsdiff <- year(secondDate) - year(firstDate)
monthsdiff <- month(secondDate) - month(firstDate)
daysdiff <- day(secondDate) - day(firstDate)
if ((monthsdiff < 0) | (monthsdiff == 0 & daysdiff < 0)) {
yearsdiff <- yearsdiff - 1
}
yearsdiff
}
You can modify it to calculate a fractional part depending on how you define the number of days in the last (not finished) year.
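One possible definition of the fractional part is sketched below in base R: the days elapsed since the last anniversary divided by the days between that anniversary and the next one. The helper name year_diff_frac is hypothetical, it handles a single pair of dates only, and it relies on seq() rolling a February 29 start over to March 1 in non-leap years.

```r
# Hypothetical helper (not from the answer above): whole years plus the
# elapsed fraction of the current anniversary-to-anniversary year.
year_diff_frac <- function(first_date, second_date) {
  stopifnot(length(first_date) == 1, length(second_date) == 1,
            second_date >= first_date)
  # Calendar-year gap, used only to decide how many anniversaries to generate
  n <- as.integer(format(second_date, "%Y")) - as.integer(format(first_date, "%Y"))
  # Anniversaries of first_date: element 1 is first_date itself
  anniv <- seq(first_date, by = "year", length.out = n + 2)
  # Index of the last anniversary on or before second_date
  k <- findInterval(as.numeric(second_date), as.numeric(anniv))
  whole <- k - 1
  frac <- as.numeric(second_date - anniv[k]) / as.numeric(anniv[k + 1] - anniv[k])
  whole + frac
}

year_diff_frac(as.Date("2012-03-01"), as.Date("2017-03-01"))
year_diff_frac(as.Date("2001-01-01"), as.Date("2003-04-05"))
```

With this definition an exact anniversary always yields a whole number, which is the behaviour asked for in the 2012-03-01 to 2017-03-01 example.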
You can use the function AnnivDates() of the package BondValuation:
R> library('BondValuation')
R> DateIndexes <- unlist(
+ suppressWarnings(
+ AnnivDates("2001-01-01", "2003-04-05", CpY=1)$DateVectors[2]
+ )
+ )
R> names(DateIndexes) <- NULL
R> DateIndexes[length(DateIndexes)] - DateIndexes[1]
[1] 2.257534
Click here for documentation of the package BondValuation.
To get the date difference in years (floating point), you can convert the dates to decimal years and then calculate their difference.
#Example Dates
x <- as.Date(c("2001-01-01", "2003-04-05"))
#Convert Date to decimal year:
date2DYear <- function(x) {
as.numeric(format(x,"%Y")) + #Get year and add
(as.numeric(format(x,"%j")) - 0.5) / #Day of the year divided by
as.numeric(format(as.Date(paste0(format(x,"%Y"), "-12-31")),"%j")) #days of the year
}
diff(date2DYear(x)) #Get the difference in years
#[1] 2.257534
I subtract 0.5 from the day of the year as it is not known if you are at the beginning or the end of the day and %j starts with 1.
I think the difference between 2012-03-01 and 2017-03-01 need not be 5 years, as 2012 has 366 days and 2017 has 365, and 2012-03-01 is the 61st day of the year while 2017-03-01 is the 60th.
x <- as.Date(c("2012-03-01", "2017-03-01"))
diff(date2DYear(x))
#[1] 4.997713
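The day-of-year and year-length figures behind this result can be checked directly in base R. This is only a small verification sketch; the variable names are illustrative.

```r
x <- as.Date(c("2012-03-01", "2017-03-01"))

# Day of the year for each date (%j counts from 1)
day_of_year <- as.integer(format(x, "%j"))

# Number of days in each date's year, taken from the day number of Dec 31
days_in_year <- as.integer(format(as.Date(paste0(format(x, "%Y"), "-12-31")), "%j"))

day_of_year   # 61 60
days_in_year  # 366 365
```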
Note that time_length and interval from lubridate need not give the same result when you accumulate time differences piecewise.
library(lubridate)
x <- as.Date(c("2012-01-01", "2012-03-01", "2012-12-31"))
time_length(interval(x[1], x[3]), "years")
#[1] 0.9972678
time_length(interval(x[1], x[2]), "years") +
time_length(interval(x[2], x[3]), "years")
#[1] 0.9995509 #!
diff(date2DYear(x[c(1,3)]))
#[1] 0.9972678
diff(date2DYear(x[c(1,2)])) + diff(date2DYear(x[c(2,3)]))
#[1] 0.9972678
x <- as.Date(c("2013-01-01", "2013-03-01", "2013-12-31"))
time_length(interval(x[1], x[3]), "years")
#[1] 0.9972603
time_length(interval(x[1], x[2]), "years") +
time_length(interval(x[2], x[3]), "years")
#[1] 0.9972603
diff(date2DYear(x[c(1,3)]))
#[1] 0.9972603
diff(date2DYear(x[c(1,2)])) + diff(date2DYear(x[c(2,3)]))
#[1] 0.9972603
Since you are already using lubridate package, you can obtain number of years in floating point using a simple trick:
find number of seconds in one year:
seconds_in_a_year <- as.integer((seconds(ymd("2010-01-01")) - seconds(ymd("2009-01-01"))))
now obtain number of seconds between the 2 dates you desire
seconds_between_dates <- as.integer(seconds(date1) - seconds(date2))
your final answer for number of years in floating points will be
years_between_dates <- seconds_between_dates / seconds_in_a_year
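The same idea can be sketched in base R with difftime() and units = "secs". The values given to date1 and date2 are placeholders chosen for illustration, and the year length is taken from the non-leap span 2009 to 2010, as in the steps above.

```r
# Placeholder dates; substitute your own
date1 <- as.Date("2003-04-05")
date2 <- as.Date("2001-01-01")

# Seconds in one (non-leap) year
seconds_in_a_year <- as.numeric(
  difftime(as.Date("2010-01-01"), as.Date("2009-01-01"), units = "secs"))

# Seconds between the two dates of interest
seconds_between_dates <- as.numeric(difftime(date1, date2, units = "secs"))

# Difference in years as a floating point number
years_between_dates <- seconds_between_dates / seconds_in_a_year
```

Because the reference year has 365 days, this is equivalent to dividing the difference in days by 365.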

Get the difference between dates in terms of weeks, months, quarters, and years

I have two dates let´s say 14.01.2013 and 26.03.2014.
I would like to get the difference between those two dates in terms of weeks(?), months(in the example 14), quarters(4) and years(1).
Do you know the best way to get this?
what about this:
# get difference between dates `"14.01.2013"` and `"26.03.2014"`
# weeks
difftime(strptime("26.03.2014", format = "%d.%m.%Y"),
strptime("14.01.2013", format = "%d.%m.%Y"),units="weeks")
Time difference of 62.28571 weeks
# months
(as.yearmon(strptime("26.03.2014", format = "%d.%m.%Y"))-
as.yearmon(strptime("14.01.2013", format = "%d.%m.%Y")))*12
[1] 14
# quarters
(as.yearqtr(strptime("26.03.2014", format = "%d.%m.%Y"))-
as.yearqtr(strptime("14.01.2013", format = "%d.%m.%Y")))*4
[1] 4
# years
year(strptime("26.03.2014", format = "%d.%m.%Y"))-
year(strptime("14.01.2013", format = "%d.%m.%Y"))
[1] 1
as.yearmon() and as.yearqtr() are in package zoo. year() is in package lubridate.
What do you think?
All the existing answers are imperfect (IMO) and either make assumptions about the desired output or don't provide flexibility for the desired output.
Based on the examples from the OP, and the OP's stated expected answers, I think these are the answers you are looking for (plus some additional examples that make it easy to extrapolate).
(This only requires base R and doesn't require zoo or lubridate)
Convert to Datetime Objects
date_strings = c("14.01.2013", "26.03.2014")
datetimes = strptime(date_strings, format = "%d.%m.%Y") # convert to datetime objects
Difference in Days
You can use the diff in days to get some of our later answers
diff_in_days = difftime(datetimes[2], datetimes[1], units = "days") # days
diff_in_days
#Time difference of 435.9583 days
Difference in Weeks
Difference in weeks is a special case of units = "weeks" in difftime()
diff_in_weeks = difftime(datetimes[2], datetimes[1], units = "weeks") # weeks
diff_in_weeks
#Time difference of 62.27976 weeks
Note that this is the same as dividing our diff_in_days by 7 (7 days in a week)
as.double(diff_in_days)/7
#[1] 62.27976
Difference in Years
With similar logic, we can derive years from diff_in_days
diff_in_years = as.double(diff_in_days)/365 # absolute years
diff_in_years
#[1] 1.194406
You seem to be expecting the diff in years to be "1", so I assume you just want to count absolute calendar years or something, which you can easily do by using floor()
# get desired output, given your definition of 'years'
floor(diff_in_years)
#[1] 1
Difference in Quarters
# get desired output for quarters, given your definition of 'quarters'
floor(diff_in_years * 4)
#[1] 4
Difference in Months
Can calculate this as a conversion from diff_in_years
# months, defined as absolute calendar months (this might be what you want, given your question details)
months_diff = diff_in_years*12
floor(months_diff)
#[1] 14
I know this question is old, but given that I still had to solve this problem just now, I thought I would add my answers. Hope it helps.
For weeks, you can use function difftime:
date1 <- strptime("14.01.2013", format="%d.%m.%Y")
date2 <- strptime("26.03.2014", format="%d.%m.%Y")
difftime(date2,date1,units="weeks")
Time difference of 62.28571 weeks
But difftime doesn't support units longer than weeks.
The following is a rather suboptimal workaround using cut.POSIXt for those longer units:
seq1 <- seq(date1,date2, by="days")
nlevels(cut(seq1,"months"))
15
nlevels(cut(seq1,"quarters"))
5
nlevels(cut(seq1,"years"))
2
However, this is the number of months, quarters or years spanned by your time interval, not the duration of the interval expressed in months, quarters or years (since those do not have a constant duration). Considering the comment you made on @SvenHohenstein's answer, I would think you can use nlevels(cut(seq1,"months")) - 1 for what you're trying to achieve.
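The "number of calendar boundaries crossed" can also be computed arithmetically from the year and month fields, without building a daily sequence. This is a base R sketch of that alternative, not the approach used above; the variable names are illustrative.

```r
d1 <- as.Date("14.01.2013", format = "%d.%m.%Y")
d2 <- as.Date("26.03.2014", format = "%d.%m.%Y")

# POSIXlt exposes the year and the zero-based month (January is 0)
lt1 <- as.POSIXlt(d1)
lt2 <- as.POSIXlt(d2)

years_apart    <- lt2$year - lt1$year
months_apart   <- 12 * years_apart + (lt2$mon - lt1$mon)
# Zero-based quarter index is the month index integer-divided by 3
quarters_apart <- 4 * years_apart + (lt2$mon %/% 3 - lt1$mon %/% 3)

c(months = months_apart, quarters = quarters_apart, years = years_apart)
```

For the dates in the question this gives 14 months, 4 quarters and 1 year, matching the values the question expects.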
I just wrote this for another question, then stumbled here.
library(lubridate)
#' Calculate age
#'
#' By default, calculates the typical "age in years", with a
#' \code{floor} applied so that you are, e.g., 5 years old from
#' 5th birthday through the day before your 6th birthday. Set
#' \code{floor = FALSE} to return decimal ages, and change \code{units}
#' for units other than years.
#' @param dob date-of-birth, the day to start calculating age.
#' @param age.day the date on which age is to be calculated.
#' @param units unit to measure age in. Defaults to \code{"years"}. Passed to \code{\link{duration}}.
#' @param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}.
#' @return Age in \code{units}. Will be an integer if \code{floor = TRUE}.
#' @examples
#' my.dob <- as.Date('1983-10-20')
#' age(my.dob)
#' age(my.dob, units = "minutes")
#' age(my.dob, floor = FALSE)
age <- function(dob, age.day = today(), units = "years", floor = TRUE) {
calc.age = interval(dob, age.day) / duration(num = 1, units = units)
if (floor) return(as.integer(floor(calc.age)))
return(calc.age)
}
Usage examples:
my.dob <- as.Date('1983-10-20')
age(my.dob)
# [1] 31
age(my.dob, floor = FALSE)
# [1] 31.15616
age(my.dob, units = "minutes")
# [1] 16375680
age(seq(my.dob, length.out = 6, by = "years"))
# [1] 31 30 29 28 27 26
Here is the still-missing lubridate answer (although Gregor's function is built on this package).
The lubridate timespan documentation is very helpful for understanding the difference between periods and duration. I also like the lubridate cheatsheet and this very useful thread
library(lubridate)
dates <- c(dmy('14.01.2013'), dmy('26.03.2014'))
span <- dates[1] %--% dates[2] #creating an interval object
#creating period objects
as.period(span, unit = 'year')
#> [1] "1y 2m 12d 0H 0M 0S"
as.period(span, unit = 'month')
#> [1] "14m 12d 0H 0M 0S"
as.period(span, unit = 'day')
#> [1] "436d 0H 0M 0S"
Periods do not accept weeks as units. But you can convert durations to weeks:
as.duration(span)/ dweeks(1)
#makes duration object (in seconds) and divides by duration of a week (in seconds)
#> [1] 62.28571
Created on 2019-11-04 by the reprex package (v0.3.0)
Here's a solution:
dates <- c("14.01.2013", "26.03.2014")
# Date format:
dates2 <- strptime(dates, format = "%d.%m.%Y")
dif <- diff(as.numeric(dates2)) # difference in seconds
dif/(60 * 60 * 24 * 7) # weeks
[1] 62.28571
dif/(60 * 60 * 24 * 30) # months
[1] 14.53333
dif/(60 * 60 * 24 * 30 * 3) # quarters
[1] 4.844444
dif/(60 * 60 * 24 * 365) # years
[1] 1.194521
This is a simple way to find out the difference in years with the lubridate package:
as.numeric(as.Date("14-03-2013", format = "%d-%m-%Y") %--% as.Date("23-03-2014", format = "%d-%m-%Y"), "years")
This returns 1.023956
You can use floor() if you don't want the decimals.
try this for a months solution
StartDate <- strptime("14 January 2013", "%d %B %Y")
EventDates <- strptime(c("26 March 2014"), "%d %B %Y")
difftime(EventDates, StartDate)
A more "precise" calculation. That is, the number of week/month/quarter/year for a non-complete week/month/quarter/year is the fraction of calendar days in that week/month/quarter/year. For example, the number of months between 2016-02-22 and 2016-03-31 is 8/29 + 31/31 = 1.27586
explanation inline with code
#' Calculate precise number of periods between 2 dates
#'
#' @details The number of week/month/quarter/year for a non-complete week/month/quarter/year
#' is the fraction of calendar days in that week/month/quarter/year.
#' For example, the number of months between 2016-02-22 and 2016-03-31
#' is 8/29 + 31/31 = 1.27586
#'
#' @param startdate start Date of the interval
#' @param enddate end Date of the interval
#' @param period character. It must be one of 'day', 'week', 'month', 'quarter' and 'year'
#'
#' @examples
#' identical(numPeriods(as.Date("2016-02-15"), as.Date("2016-03-31"), "month"), 15/29 + 1)
#' identical(numPeriods(as.Date("2016-02-15"), as.Date("2016-03-31"), "quarter"), (15 + 31)/(31 + 29 + 31))
#' identical(numPeriods(as.Date("2016-02-15"), as.Date("2016-03-31"), "year"), (15 + 31)/366)
#'
#' @return exact number of periods between
#'
numPeriods <- function(startdate, enddate, period) {
numdays <- as.numeric(enddate - startdate) + 1
if (grepl("day", period, ignore.case=TRUE)) {
return(numdays)
} else if (grepl("week", period, ignore.case=TRUE)) {
return(numdays / 7)
}
#create a sequence of dates between start and end dates
effDaysinBins <- cut(seq(startdate, enddate, by="1 day"), period)
#use the earliest start date of the previous bins and create a breaks of periodic dates with
#user's period interval
intervals <- seq(from=as.Date(min(levels(effDaysinBins)), "%Y-%m-%d"),
by=paste("1",period),
length.out=length(levels(effDaysinBins))+1)
#create a sequence of dates between the earliest interval date and last date of the interval
#that contains the enddate
allDays <- seq(from=intervals[1],
to=intervals[intervals > enddate][1] - 1,
by="1 day")
#bin all days in the whole period using previous breaks
allDaysInBins <- cut(allDays, intervals)
#calculate ratio of effective days to all days in whole period
sum( tabulate(effDaysinBins) / tabulate(allDaysInBins) )
} #numPeriods
Please let me know if you find more boundary cases where the above solution does not work.
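One way to look for such boundary cases is to cross-check the month case against an independent calculation. The sketch below recomputes the documented example (2016-02-22 to 2016-03-31) in base R by counting covered days per calendar month and dividing by each month's length. It is a separate check written for illustration, not part of numPeriods.

```r
start <- as.Date("2016-02-22")
end   <- as.Date("2016-03-31")

# Every day in the interval, inclusive of both end points
days <- seq(start, end, by = "1 day")

# Days covered in each calendar month, keyed by "YYYY-MM"
days_covered <- table(format(days, "%Y-%m"))

# Length of each of those months: first of next month minus first of month
month_start <- as.Date(paste0(names(days_covered), "-01"))
month_len <- sapply(seq_along(month_start), function(i) {
  as.numeric(seq(month_start[i], by = "month", length.out = 2)[2] - month_start[i])
})

# Sum of the covered fractions: 8/29 + 31/31
months_exact <- sum(as.numeric(days_covered) / month_len)
months_exact
```

Running the same pair of dates through numPeriods with period = "month" and comparing against months_exact is a quick way to spot disagreements.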

Resources