lubridate convert decimals into months - r

I have estimated a variable age.first.union as a time difference using lubridate, by subtracting the date of birth wdob from the date of wedding wdow. I got the following numeric vector:
head(wm$age.first.union, 3)
[1] 15.43014 12.67123 17.34247
I would like to have the decimals converted into months (and possibly also into days, but that's a minor detail), so the first value would be 15 years and 5 months. What I did was to create a series of new variables and then perform some calculations. To get the number of months, first, I duplicated and truncated the age.first.union variable. Then I estimated the difference between the two to get only the decimal part and then used proportions (e.g. 0.43 : 10 = x : 12 ) to get the months.
I looked into the lubridate documentation but I could not find much on this. I tried the following
years(floor(dseconds(15.43014)))
but I got only the years
[1] "15y 0m 0d 0H 0M 0S"
One idea would be to get the durations in seconds
seconds(floor(dyears(15.43014)))
[1] "486604895S"
but then the challenge would be that months have different lengths. Even an approximation of years = 365 days and months = 30 days would be more than good enough, but I do not know how to do it apart from lengthy calculations.
One final idea would be to have years and month using the calculation as described at the beginning of this post, and then merge the two variables into the final one using something similar to make_date (but it looks like a make_duration does not seem to exist yet).
The whole process looks quite cumbersome to me. Does anyone have a different take?
Many thanks
Manolo

While lubridate provides a function date_decimal to convert a decimal (fractional-year) date into a date-time, you are dealing with durations rather than dates, so it won't help here.
However, you can quite easily define a custom function to extract the integer year, month and fractional day (based on an average 30.42 days per month in a regular year):
age <- c(15.43014, 12.67123, 17.34247)
f <- function(x) {
year <- floor(x);
month <- floor((x - year) * 12);
day <- ((x - year) * 12 - month) * 30.42;
return(sprintf("%i years, %i months, %3.2f days", year, month, day))
}
lapply(age, f);
#[[1]]
#[1] "15 years, 5 months, 4.92 days"
#
#[[2]]
#[1] "12 years, 8 months, 1.67 days"
#
#[[3]]
#[1] "17 years, 4 months, 3.34 days"
Update
If you want to return the integer year, month and fractional day you can define f as
f <- function(x) {
year <- floor(x);
month <- floor((x - year) * 12);
day <- ((x - year) * 12 - month) * 30.42;
return(list(year = year, month = month, day = day))
}
which gives you e.g.
sapply(age, f);
#      [,1]     [,2]     [,3]
#year  15       12       17
#month 5        8        4
#day   4.918306 1.665799 3.335249
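The question mentions that an approximation of 365-day years and 30-day months would be good enough. A minimal base R sketch under exactly that assumption follows. Here split_age is a hypothetical helper name, and capping the month count at 11 is a choice made for this sketch so that the five leftover days of a 365-day year are not reported as a twelfth month.

```r
# Hypothetical helper: split decimal years into whole years, months and days,
# assuming 365-day years and 30-day months (an approximation, not calendar-exact).
split_age <- function(x) {
  total_days <- round(x * 365)        # decimal years -> whole days
  years <- total_days %/% 365
  rem <- total_days %% 365            # days left after the whole years
  months <- pmin(rem %/% 30, 11)      # cap at 11 months (assumption of this sketch)
  days <- rem - months * 30
  data.frame(years, months, days)
}

split_age(c(15.43014, 12.67123, 17.34247))
#   years months days
# 1    15      5    7
# 2    12      8    5
# 3    17      4    5
```

Because the result is a data frame, the columns can be bound straight onto the original data with cbind().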

We can define our own ym S3 class to represent year/month objects. Here we define several ym methods as well as extractor functions for years and months. The as.data.frame.ym method is a partial implementation. We have defined a month to be 1/12th of a year.
as.ym <- function(x, ...) structure(x, class = "ym")
as.data.frame.ym <- function(x, ...)
structure(list(x), row.names = seq_along(x), class = "data.frame")
years.ym <- as.integer
months.ym <- function(x) 12 * as.numeric(x) %% 1
format.ym <- function(x, ...) paste0(years.ym(x), "Y ", round(months.ym(x)), "M")
print.ym <- function(x, ...) print(format(x), ...)
# test
x <- c(15.43014, 12.67123, 17.34247) # test input
xx <- as.ym(x)
xx
## [1] "15Y 5M" "12Y 8M" "17Y 4M"
DF <- data.frame(x, xx)
DF
x xx
1 15.43014 15Y 5M
2 12.67123 12Y 8M
3 17.34247 17Y 4M
years.ym(xx)
## [1] 15 12 17
months.ym(xx)
## [1] 5.16168 8.05476 4.10964
class(xx)
## [1] "ym"
Days
To extend this to include days, as well, we assume that there are 365.25 days in a year and, again, we use 12 months in a year. We create a ymd S3 class for this.
as.ymd <- function(x, ...) structure(x, class = "ymd")
as.data.frame.ymd <- function(x, ...)
structure(list(x), row.names = seq_along(x), class = "data.frame")
years.ymd <- as.integer
months.ymd <- function(x) as.integer(12 * as.numeric(x) %% 1)
days.ymd <- function(x) (365.25 * as.numeric(x)) %% (365.25 / 12)
format.ymd <- function(x, ...)
paste0(years.ymd(x), "Y ", as.integer(months.ymd(x)), "M ", round(days.ymd(x), 1), "D")
print.ymd <- function(x, ...) print(format(x), ...)
xx <- as.ymd(x)
xx
## [1] "15Y 5M 4.9D" "12Y 8M 1.7D" "17Y 4M 3.3D"
DF <- data.frame(x, xx)
DF
x xx
1 15.43014 15Y 5M 4.9D
2 12.67123 12Y 8M 1.7D
3 17.34247 17Y 4M 3.3D
years.ymd(xx)
## [1] 15 12 17
months.ymd(xx)
## [1] 5 8 4
days.ymd(xx)
## [1] 4.921135 1.666758 3.337167
class(xx)
## [1] "ymd"


Finding age in R [duplicate]

I am using data.table for the first time.
I have a column of about 400,000 birth dates in my table, and I need to convert them to ages.
What is the best way to do this?
I've been thinking about this and have been dissatisfied with the two answers so far. I like using lubridate, as @KFB did, but I also want things wrapped up nicely in a function, as in my answer using the eeptools package. So here's a wrapper function using the lubridate interval method with some nice options:
#' Calculate age
#'
#' By default, calculates the typical "age in years", with a
#' \code{floor} applied so that you are, e.g., 5 years old from
#' 5th birthday through the day before your 6th birthday. Set
#' \code{floor = FALSE} to return decimal ages, and change \code{units}
#' for units other than years.
#' @param dob date-of-birth, the day to start calculating age.
#' @param age.day the date on which age is to be calculated.
#' @param units unit to measure age in. Defaults to \code{"years"}. Passed to \code{\link{duration}}.
#' @param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}.
#' @return Age in \code{units}. Will be an integer if \code{floor = TRUE}.
#' @examples
#' my.dob <- as.Date('1983-10-20')
#' age(my.dob)
#' age(my.dob, units = "minutes")
#' age(my.dob, floor = FALSE)
age <- function(dob, age.day = lubridate::today(), units = "years", floor = TRUE) {
calc.age = lubridate::interval(dob, age.day) / lubridate::duration(num = 1, units = units)
if (floor) return(as.integer(floor(calc.age)))
return(calc.age)
}
Usage examples:
> my.dob <- as.Date('1983-10-20')
> age(my.dob)
[1] 31
> age(my.dob, floor = FALSE)
[1] 31.15616
> age(my.dob, units = "minutes")
[1] 16375680
> age(seq(my.dob, length.out = 6, by = "years"))
[1] 31 30 29 28 27 26
From the comments of this blog entry, I found the age_calc function in the eeptools package. It takes care of edge cases (leap years, etc.), checks inputs and looks quite robust.
library(eeptools)
x <- as.Date(c("2011-01-01", "1996-02-29"))
age_calc(x) # default is age in months, measured up to today (Sys.Date())
[1] 46.73333 224.83118
age_calc(x, units = "years") # but you can set it to years
[1] 3.893151 18.731507
floor(age_calc(x, units = "years"))
[1] 3 18
For your data
yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years"))
assuming you want age in integer years.
Assume you have a data.table, you could do below:
library(data.table)
library(lubridate)
# toy data
X = data.table(birth=seq(from=as.Date("1970-01-01"), to=as.Date("1980-12-31"), by="year"))
Sys.Date()
Option 1: use as.period from the lubridate package
X[, age := as.period(Sys.Date() - birth)][]
birth age
1: 1970-01-01 44y 0m 327d 0H 0M 0S
2: 1971-01-01 43y 0m 327d 6H 0M 0S
3: 1972-01-01 42y 0m 327d 12H 0M 0S
4: 1973-01-01 41y 0m 326d 18H 0M 0S
5: 1974-01-01 40y 0m 327d 0H 0M 0S
6: 1975-01-01 39y 0m 327d 6H 0M 0S
7: 1976-01-01 38y 0m 327d 12H 0M 0S
8: 1977-01-01 37y 0m 326d 18H 0M 0S
9: 1978-01-01 36y 0m 327d 0H 0M 0S
10: 1979-01-01 35y 0m 327d 6H 0M 0S
11: 1980-01-01 34y 0m 327d 12H 0M 0S
Option 2 : if you do not like the format of Option 1, you could do below:
yr = duration(num = 1, units = "years")
X[, age := interval(birth, Sys.Date())/yr][] # new_interval() in older lubridate versions
# you get
birth age
1: 1970-01-01 44.92603
2: 1971-01-01 43.92603
3: 1972-01-01 42.92603
4: 1973-01-01 41.92329
5: 1974-01-01 40.92329
6: 1975-01-01 39.92329
7: 1976-01-01 38.92329
8: 1977-01-01 37.92055
9: 1978-01-01 36.92055
10: 1979-01-01 35.92055
11: 1980-01-01 34.92055
I believe Option 2 is usually the more desirable.
I prefer to do this using the lubridate package, borrowing syntax I originally encountered in another post.
It's necessary to standardize your input dates in terms of R date objects, preferably with the lubridate::mdy() or lubridate::ymd() or similar functions, as applicable. You can use the interval() function to generate an interval describing the time elapsed between the two dates, and then use the duration() function to define how this interval should be "diced".
I've summarized the simplest case for calculating an age from two dates below, using the most current syntax in R.
df$DOB <- mdy(df$DOB)
df$EndDate <- mdy(df$EndDate)
df$Calc_Age <- interval(start = df$DOB, end = df$EndDate) /
duration(num = 1, units = "years")
Age may be rounded down to the nearest complete integer using the base R floor() function, like so:
df$Calc_AgeF <- floor(df$Calc_Age)
Alternately, the digits= argument in the base R round() function can be used to round up or down, and specify the exact number of decimals in the returned value, like so:
df$Calc_Age2 <- round(df$Calc_Age, digits = 2) ## 2 decimals
df$Calc_Age0 <- round(df$Calc_Age, digits = 0) ## nearest integer
It's worth noting that once the input dates have passed through the calculation step described above (i.e., the interval() and duration() functions), the returned value is numeric and no longer a date object in R. This matters because lubridate::floor_date() works only on date-time objects, so use floor() or round() on the result instead.
The above syntax works regardless whether the input dates occur in a data.table or data.frame object.
I wanted an implementation that didn't increase my dependencies beyond data.table, which is usually my only dependency. The data.table is only needed for mday, which means day of the month.
Development function
This function is logically how I would think about someone's age. I start with [current year] - [birth year] - 1, then add 1 if they've already had their birthday in the current year. To check for that offset I start by considering month, then (if necessary) day of month.
Here is that step by step implementation:
agecalc <- function(origin, current){
require(data.table)
y <- year(current) - year(origin) - 1
offset <- 0
if(month(current) > month(origin)) offset <- 1
if(month(current) == month(origin) &&
mday(current) >= mday(origin)) offset <- 1
age <- y + offset
return(age)
}
Production function
This is the same logic refactored and vectorized:
agecalc <- function(origin, current){
require(data.table)
age <- year(current) - year(origin) - 1
ii <- (month(current) > month(origin)) | (month(current) == month(origin) &
mday(current) >= mday(origin))
age[ii] <- age[ii] + 1
return(age)
}
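For readers who want no package dependency at all, the same "has the birthday happened yet this year?" logic can be sketched in base R with POSIXlt fields. This is only an illustration: age_base is a hypothetical name, and the inputs are assumed to be Date vectors (or anything as.POSIXlt() accepts) of equal length or length one.

```r
# Base R sketch of the vectorized birthday check; no data.table needed.
age_base <- function(origin, current) {
  o <- as.POSIXlt(origin)
  cur <- as.POSIXlt(current)
  # TRUE where the birthday has already occurred in the current year
  had_birthday <- (cur$mon > o$mon) |
    (cur$mon == o$mon & cur$mday >= o$mday)
  # Year difference minus one, plus one if the birthday has passed
  (cur$year - o$year - 1L) + had_birthday
}

age_base(as.Date("1985-08-13"), as.Date(c("1986-08-12", "1986-08-13")))
# [1] 0 1
age_base(as.Date("2000-02-29"), as.Date(c("2001-02-28", "2001-03-01")))
# [1] 0 1
```

As in the data.table version, a person born on February 29 is treated as turning a year older on March 1 in non-leap years.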
Experimental function that uses strings
You could also do a string comparison on the month / day part. Perhaps there are times when this is more efficient, for example if you had the year as a number and the birth date as a string.
agecalc_strings <- function(origin, current){
origin <- as.character(origin)
current <- as.character(current)
age <- as.numeric(substr(current, 1, 4)) - as.numeric(substr(origin, 1, 4)) - 1
if(substr(current, 6, 10) >= substr(origin, 6, 10)){
age <- age + 1
}
return(age)
}
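The string version above uses if, so it only handles one pair of dates at a time. A vectorized variant could look like the sketch below. The name agecalc_strings_vec is hypothetical, and the inputs are assumed to be in zero-padded "YYYY-MM-DD" form, so that the "MM-DD" substrings compare in calendar order.

```r
# Vectorized sketch of the string-comparison approach.
agecalc_strings_vec <- function(origin, current) {
  origin <- as.character(origin)
  current <- as.character(current)
  years <- as.numeric(substr(current, 1, 4)) - as.numeric(substr(origin, 1, 4))
  # Zero-padded "MM-DD" strings sort in the same order as the dates themselves
  had_birthday <- substr(current, 6, 10) >= substr(origin, 6, 10)
  years - 1 + had_birthday
}

agecalc_strings_vec("1985-08-13", c("1986-08-12", "1986-08-13", "1986-09-12"))
# [1] 0 1 1
```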
Some tests on the vectorized "production" version:
## Examples for specific dates to test the calculation with things like
## beginning and end of months, and leap years:
agecalc(as.IDate("1985-08-13"), as.IDate("1985-08-12"))
agecalc(as.IDate("1985-08-13"), as.IDate("1985-08-13"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-08-12"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-08-13"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-09-12"))
agecalc(as.IDate("2000-02-29"), as.IDate("2000-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2000-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-03-01"))
agecalc(as.IDate("2000-02-29"), as.IDate("2004-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2004-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2011-03-01"))
## Testing every age for every day over several years
## This test requires vectorized version:
d <- data.table(d=as.IDate("2000-01-01") + 0:10000)
d[ , b1 := as.IDate("2000-08-15")]
d[ , b2 := as.IDate("2000-02-29")]
d[ , age1_num := (d - b1) / 365]
d[ , age2_num := (d - b2) / 365]
d[ , age1 := agecalc(b1, d)]
d[ , age2 := agecalc(b2, d)]
d
Below is a trivial plot of ages as numeric and integer. As you can see, the integer ages form a stair-step pattern that touches, but stays below, the straight line of numeric ages.
plot(as.numeric(age1_num) ~ d, data = d, type = "l",
ylab = "ages", main = "ages plotted")
lines(age1 ~ d, data = d, col = "blue")
I wasn't happy with any of the responses when it comes to calculating the age in months or years, when dealing with leap years, so this is my function using the lubridate package.
Basically, it slices the interval between from and to into (up to) yearly chunks, and then adjusts the interval for whether that chunk is leap year or not. The total interval is the sum of the age of each chunk.
library(lubridate)
#' Get Age of Date relative to Another Date
#'
#' @param from,to the date or dates to consider
#' @param units the units to consider
#' @param floor logical as to whether to floor the result
#' @param simple logical as to whether to do a simple calculation, a simple calculation doesn't account for leap year.
#' @author Nicholas Hamilton
#' @export
age <- function(from, to = today(), units = "years", floor = FALSE, simple = FALSE) {
#Account for Leap Year if Working in Months and Years
if(!simple && length(grep("^(month|year)",units)) > 0){
df = data.frame(from,to)
calc = sapply(1:nrow(df),function(r){
#Start and Finish Points
st = df[r,1]; fn = df[r,2]
#If there is no difference, age is zero
if(st == fn){ return(0) }
#If there is a difference, age is not zero and needs to be calculated
sign = +1 #Age Direction
if(st > fn){ tmp = st; st = fn; fn = tmp; sign = -1 } #Swap and Change sign
#Determine the slice-points
mid = ceiling_date(seq(st,fn,by='year'),'year')
#Build the sequence
dates = unique( c(st,mid,fn) )
dates = dates[which(dates >= st & dates <= fn)]
#Determine the age of the chunks
chunks = sapply(head(seq_along(dates),-1),function(ix){
k = 365/( 365 + leap_year(dates[ix]) )
k*interval( dates[ix], dates[ix+1] ) / duration(num = 1, units = units)
})
#Sum the Chunks, and account for direction
sign*sum(chunks)
})
#If Simple Calculation or Not Months or Not years
}else{
calc = interval(from,to) / duration(num = 1, units = units)
}
if (floor) calc = as.integer(floor(calc))
calc
}
(Sys.Date() - yourDate) / 365.25
A very simple way to calculate the age from two dates without any additional packages is probably the following. Note that the result is a difftime in days, so divide it by 365.25 for approximate years:
df$age = with(df, as.Date(date_2, "%Y-%m-%d") - as.Date(date_1, "%Y-%m-%d"))
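As a small worked example of that base R route, the sketch below takes the difference of two Date objects in days and converts it to approximate years. The two dates are made up for illustration, and dividing by 365.25 is the usual average-year approximation rather than an exact birthday-based age.

```r
# Illustrative dates (assumptions for this example only)
date_1 <- as.Date("1983-10-20")
date_2 <- as.Date("2014-12-15")

# Difference in days as a plain number
age_days <- as.numeric(difftime(date_2, date_1, units = "days"))
age_days
# [1] 11379

# Approximate age in completed years, using an average year of 365.25 days
floor(age_days / 365.25)
# [1] 31
```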
Here is a (I think simpler) solution using lubridate:
library(lubridate)
age <- function(dob, on.day=today()) {
intvl <- interval(dob, on.day)
prd <- as.period(intvl)
return(prd@year)
}
Note that age_calc from the eeptools package in particular fails on cases with the year 2000 around birthdays.
Some examples that don't work in age_calc:
library(lubridate)
library(eeptools)
age_calc(ymd("1997-04-21"), ymd("2000-04-21"), units = "years")
age_calc(ymd("2000-04-21"), ymd("2019-04-21"), units = "years")
age_calc(ymd("2000-04-21"), ymd("2016-04-21"), units = "years")
Some of the other solutions also give output that is not what I would intuitively expect for decimal ages when leap years are involved. I like @James_D's solution, which is precise and concise, but I wanted something where the decimal age is calculated as complete years plus the fraction of the year completed from the last birthday to the next birthday (which would be out of 365 or 366 days depending on the year). In the case of leap years I use lubridate's rollback function to use March 1st for non-leap years following February 29th. I used some test cases from @geneorama and added some of my own, and the output aligns with what I would expect.
library(lubridate)
# Calculate precise age from birthdate in ymd format
age_calculation <- function(birth_date, later_year) {
if (birth_date > later_year)
{
stop("Birth date is after the desired date!")
}
# Calculate the most recent birthday of the person based on the desired year
latest_bday <- ymd(add_with_rollback(birth_date, years((year(later_year) - year(birth_date))), roll_to_first = TRUE))
# Get amount of days between the desired date and the latest birthday
days_between <- as.numeric(days(later_year - latest_bday), units = "days")
# Get how many days are in the year between their most recent and next bdays
year_length <- as.numeric(days((add_with_rollback(latest_bday, years(1), roll_to_first = TRUE)) - latest_bday), units = "days")
# Get the year fraction (amount of year completed before next birthday)
fraction_year <- days_between/year_length
# Sum the difference of years with the year fraction
age_sum <- (year(later_year) - year(birth_date)) + fraction_year
return(age_sum)
}
test_list <- list(c("1985-08-13", "1986-08-12"),
c("1985-08-13", "1985-08-13"),
c("1985-08-13", "1986-08-13"),
c("1985-08-13", "1986-09-12"),
c("2000-02-29", "2000-02-29"),
c("2000-02-29", "2000-03-01"),
c("2000-02-29", "2001-02-28"),
c("2000-02-29", "2004-02-29"),
c("2000-02-29", "2011-03-01"),
c("1997-04-21", "2000-04-21"),
c("2000-04-21", "2016-04-21"),
c("2000-04-21", "2019-04-21"),
c("2017-06-15", "2018-04-30"),
c("2019-04-20", "2019-08-24"),
c("2020-05-25", "2021-11-25"),
c("2020-11-25", "2021-11-24"),
c("2020-11-24", "2020-11-25"),
c("2020-02-28", "2020-02-29"),
c("2020-02-29", "2020-02-28"))
for (i in 1:length(test_list))
{
print(paste0("Dates from ", test_list[[i]][1], " to ", test_list[[i]][2]))
result <- age_calculation(ymd(test_list[[i]][1]), ymd(test_list[[i]][2]))
print(result)
}
Output:
[1] "Dates from 1985-08-13 to 1986-08-12"
[1] 0.9972603
[1] "Dates from 1985-08-13 to 1985-08-13"
[1] 0
[1] "Dates from 1985-08-13 to 1986-08-13"
[1] 1
[1] "Dates from 1985-08-13 to 1986-09-12"
[1] 1.082192
[1] "Dates from 2000-02-29 to 2000-02-29"
[1] 0
[1] "Dates from 2000-02-29 to 2000-03-01"
[1] 0.00273224
[1] "Dates from 2000-02-29 to 2001-02-28"
[1] 0.9972603
[1] "Dates from 2000-02-29 to 2004-02-29"
[1] 4
[1] "Dates from 2000-02-29 to 2011-03-01"
[1] 11
[1] "Dates from 1997-04-21 to 2000-04-21"
[1] 3
[1] "Dates from 2000-04-21 to 2016-04-21"
[1] 16
[1] "Dates from 2000-04-21 to 2019-04-21"
[1] 19
[1] "Dates from 2017-06-15 to 2018-04-30"
[1] 0.8739726
[1] "Dates from 2019-04-20 to 2019-08-24"
[1] 0.3442623
[1] "Dates from 2020-05-25 to 2021-11-25"
[1] 1.50411
[1] "Dates from 2020-11-25 to 2021-11-24"
[1] 0.9972603
[1] "Dates from 2020-11-24 to 2020-11-25"
[1] 0.002739726
[1] "Dates from 2020-02-28 to 2020-02-29"
[1] 0.00273224
[1] "Dates from 2020-02-29 to 2020-02-28"
Error in age_calculation(ymd(test_list[[i]][1]), ymd(test_list[[i]][2])) :
Birth date is after the desired date!
As others have been saying, the trunc function is excellent to get integer age.
I realise there are a lot of answers but since I can't help myself, I might as well add to the discussion.
I'm building a package that's focused on dates and datetimes and in it I use a function called time_diff(). Here is a simplified version.
time_diff <- function(x, y, units, num = 1,
type = c("duration", "period"),
as_period = FALSE){
type <- match.arg(type)
units <- match.arg(units, c("picoseconds", "nanoseconds", "microseconds",
"milliseconds", "seconds", "minutes", "hours", "days",
"weeks", "months", "years"))
int <- lubridate::interval(x, y)
if (as_period || type == "period"){
if (as_period) int <- lubridate::as.period(int, unit = units)
unit <- lubridate::period(num = num, units = units)
} else {
unit <- do.call(get(paste0("d", units),
asNamespace("lubridate")),
list(x = num))
}
out <- int / unit
out
}
# Wrapper around the more general time_diff
age_years <- function(x, y){
trunc(time_diff(x, y, units = "years", num = 1,
type = "period", as_period = TRUE))
}
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
bday <- dmy("01-01-2000")
time_diff(bday, today(), "years", type = "period")
#> [1] 23.11233
leap1 <- dmy("29-02-2020")
leap2 <- dmy("28-02-2021")
leap3 <- dmy("01-03-2021")
# Many people might say this is wrong so use the more exact age_years
time_diff(leap1, leap2, "years", type = "period")
#> [1] 1
# age in years, accounting for leap years properly
age_years(leap1, leap2)
#> [1] 0
age_years(leap1, leap3)
#> [1] 1
# So to add a column of ages in years, one can do this..
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
my_data <- tibble(dob = seq(bday, today(), by = "day"))
my_data <- my_data %>%
mutate(age_years = age_years(dob, today()))
slice_head(my_data, n = 10)
#> # A tibble: 10 x 2
#> dob age_years
#> <date> <dbl>
#> 1 2000-01-01 23
#> 2 2000-01-02 23
#> 3 2000-01-03 23
#> 4 2000-01-04 23
#> 5 2000-01-05 23
#> 6 2000-01-06 23
#> 7 2000-01-07 23
#> 8 2000-01-08 23
#> 9 2000-01-09 23
#> 10 2000-01-10 23
Created on 2023-02-11 with reprex v2.0.2

Create n different dates in consecutive months from a starting year-month

I have a starting time specified as a year-month character, e.g. "2020-12". From the start, for each of T consecutive months, I need to generate n different dates (year-month-day), where the day is random.
Any help will be useful!
The data I'm working on:
data <- data.frame(
data = sample(seq(as.Date('2000/01/01'), as.Date('2020/01/01'), by="day"), 500),
price = round(runif(500, min = 10, max = 20),2),
quantity = round(rnorm(500,30),0)
)
func <- function(start, months, n) {
startdate <- as.Date(paste0(start, "-01"))
enddate <- seq(startdate, by = "month", length.out = months)
months <- seq_len(months)
enddate_lt <- as.POSIXlt(enddate)
enddate_lt$mon <- enddate_lt$mon + 1
enddate_lt$mday <- enddate_lt$mday - 1
days_per_month <- as.integer(format(enddate_lt, format = "%d"))
days <- lapply(days_per_month, sample, size = n)
dates <- Map(`+`, enddate - 1L, days) # offset from the day before the 1st, so day k lands on the k-th of its month
do.call(c, dates)
}
set.seed(2021)
func("2020-12", 4, 3)
# [1] "2020-12-07" "2020-12-06" "2020-12-14" "2021-01-26" "2021-01-07" "2021-01-12" "2021-02-20" "2021-02-06" "2021-02-27"
# [10] "2021-03-27" "2021-03-06" "2021-03-14"
func("2020-12", 5, 2)
# [1] "2020-12-05" "2020-12-15" "2021-01-07" "2021-01-09" "2021-02-23" "2021-02-12" "2021-03-19" "2021-03-28" "2021-04-18"
# [10] "2021-04-27"
func("2020-12", 2, 10)
# [1] "2020-12-28" "2020-12-29" "2020-12-03" "2020-12-14" "2020-12-08" "2020-12-26" "2020-12-04" "2020-12-05" "2020-12-22"
# [10] "2020-12-16" "2021-01-02" "2021-01-19" "2021-01-04" "2021-01-21" "2021-01-22" "2021-01-05" "2021-01-09" "2021-01-06"
# [19] "2021-01-18" "2021-01-11"
Most of the dancing with POSIXlt objects is because it gives us clean (base R) access to the number of days in a month, which makes sampling the days in a month rather simple. It can also be done (code-golf shorter) using the lubridate package, but I don't know that that is any more correct than this code is.
This just dumps out a sequence of random dates, with n days per month. It does not sort within each month, though it does output the months in order. (That's not a difficult extension; there just wasn't a requirement for it.) It doesn't return a data frame, but you can easily extend it to do so, or call data.frame(date = do.call(c, dates)) on the last line, depending on what you need to do with the output.
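The two extensions mentioned here, sorting within each month and returning a data frame, might be sketched as follows in base R. The function name random_month_dates is hypothetical. It derives each month's length from the gap to the first day of the next month instead of going through POSIXlt, and it assumes n does not exceed the number of days in any of the months.

```r
# Hypothetical variant: n sorted random dates per month, returned as a data frame.
random_month_dates <- function(start, months, n) {
  first <- seq(as.Date(paste0(start, "-01")), by = "month", length.out = months)
  # First day of each following month; the gap is the month length in days
  nxt <- seq(first[1], by = "month", length.out = months + 1)[-1]
  ndays <- as.integer(nxt - first)
  # Zero-based day offsets, sampled without replacement and sorted per month
  offsets <- lapply(ndays, function(k) sort(sample.int(k, n)) - 1L)
  data.frame(date = rep(first, each = n) + unlist(offsets))
}

set.seed(1)
random_month_dates("2020-12", 4, 3)
```

Because the offsets start at zero, the first of the month can be drawn and no date can spill into the following month.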
You could convert the start time to a class for monthly data, zoo::yearmon. Then use as.Date.yearmon and its frac argument ("a number between 0 and 1 inclusive that indicates the fraction of the way through the period that the result represents") with random values from runif (uniform between 0 and 1) to convert to a random date within each year-month.
start = "2020-12"
T = 3
n = 2
library(zoo)
set.seed(1)
as.Date(as.yearmon(start) + rep((1:T)/12, each = n), frac = runif(T * n))
# [1] "2021-01-08" "2021-01-12" "2021-02-16" "2021-02-25" "2021-03-07" "2021-03-27"

R Programming 30 day Months

I'm currently writing a script in the R Programming Language and I've hit a snag.
I have time series data organized in a way where there are 30 days in each month for 12 months in 1 year. However, I need the data organized in a proper 365 days in a year calendar, as in 30 days in a month, 31 days in a month, etc.
Is there a simple way for R to recognize there are 30 days in a month and to operate within that parameter? At the moment I have my script converting the number of days from the source in UNIX time and it counts up.
For example:
startingdate <- "20060101"
endingdate <- "20121230"
date <- seq(from = as.Date(startingdate, "%Y%m%d"), to = as.Date(endingdate, "%Y%m%d"), by = "days")
This would generate an array of dates with each month having 29 days/30 days/31 days etc. However, my data is currently organized as 30 days per month, regardless of 29 days or 31 days present.
Thanks.
The first 4 solutions are basically variations of the same theme using expand.grid. (3) uses magrittr and the others use no packages. The last two work by creating long sequence of numbers and then picking out the ones that have month and day in range.
1) apply This gives a series of yyyymmdd numbers such that there are 30 days in each month. Note that the line defining yrs in this case is the same as yrs <- 2006:2012 so if the years are handy we could shorten that line. Omit as.numeric in the line defining s if you want character string output instead. Also, s and d are the same because we have whole years so we could omit the line defining d and use s as the answer in this case and also in general if we are always dealing with whole years.
startingdate <- "20060101"
endingdate <- "20121230"
yrs <- seq(as.numeric(substr(startingdate, 1, 4)), as.numeric(substr(endingdate, 1, 4)))
g <- expand.grid(yrs, sprintf("%02d", 1:12), sprintf("%02d", 1:30))
s <- sort(as.numeric(apply(g, 1, paste, collapse = "")))
d <- s[ s >= startingdate & s <= endingdate ] # optional if whole years
Run some checks.
head(d)
## [1] 20060101 20060102 20060103 20060104 20060105 20060106
tail(d)
## 20121225 20121226 20121227 20121228 20121229 20121230
length(d) == length(2006:2012) * 12 * 30
## [1] TRUE
2) no apply An alternative variation would be this. In this and the following solutions we are using yrs as calculated in (1) so we omit it to avoid redundancy. Also, in this and the following solutions, the corresponding line to the one setting d is omitted, again, to avoid redundancy -- if you don't have whole years then add the line defining d in (1) replacing s in that line with s2.
g2 <- expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30))
s2 <- with(g2, sort(as.numeric(paste0(yr, mon, day))))
3) magrittr This could also be written using magrittr like this:
library(magrittr)
expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30)) %>%
with(paste0(yr, mon, day)) %>%
as.numeric %>%
sort -> s3
4) do.call Another variation.
g4 <- expand.grid(yrs, 1:12, 1:30)
s4 <- sort(as.numeric(do.call("sprintf", c("%d%02d%02d", g4))))
5) subset sequence Create a sequence of numbers from the starting date to the ending date and if each number is of the form yyyymmdd pick out those for which mm and dd are in range.
seq5 <- seq(as.numeric(startingdate), as.numeric(endingdate))
d5 <- seq5[ seq5 %/% 100 %% 100 %in% 1:12 & seq5 %% 100 %in% 1:30]
6) grep Using seq5 from (5)
d6 <- as.numeric(grep("(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|30)$", seq5, value = TRUE))
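All six variants produce the same yyyymmdd labels for a calendar with 30-day months. If a running day number in that 360-day calendar is also needed, for instance to align the labels with a plain index, the label can be decoded arithmetically. The sketch below uses a handful of made-up labels, and the variable names are illustrative only.

```r
# A few illustrative yyyymmdd labels from a 30-day-month calendar
labels <- c(20060101, 20060130, 20060201, 20061230, 20070101)

yr  <- labels %/% 10000        # yyyy
mon <- labels %/% 100 %% 100   # mm
day <- labels %% 100           # dd

# Running day number, counting from the first day of the first year
idx <- (yr - min(yr)) * 360 + (mon - 1) * 30 + day
idx
# [1]   1  30  31 360 361
```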
Here's an alternative:
date <- unclass(startingdate):unclass(endingdate) %% 30L
month <- rep(1:12, each = 30, length.out = NN <- length(date))
year <- rep(1:(NN %/% 360 + 1), each = 360, length.out = NN)
(of course, we can easily adjust by adding constants to taste if you want a specific day to be 0, or a specific month, etc.)

change a column from birth date to age in r

I am using data.table for the first time.
I have a column of about 400,000 birth dates in my table, and I need to convert them to ages.
What is the best way to do this?
I've been thinking about this and have been dissatisfied with the two answers so far. I like using lubridate, as @KFB did, but I also want things wrapped up nicely in a function, as in my answer using the eeptools package. So here's a wrapper function using the lubridate interval method with some nice options:
#' Calculate age
#'
#' By default, calculates the typical "age in years", with a
#' \code{floor} applied so that you are, e.g., 5 years old from
#' 5th birthday through the day before your 6th birthday. Set
#' \code{floor = FALSE} to return decimal ages, and change \code{units}
#' for units other than years.
#' @param dob date-of-birth, the day to start calculating age.
#' @param age.day the date on which age is to be calculated.
#' @param units unit to measure age in. Defaults to \code{"years"}. Passed to \code{\link{duration}}.
#' @param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}.
#' @return Age in \code{units}. Will be an integer if \code{floor = TRUE}.
#' @examples
#' my.dob <- as.Date('1983-10-20')
#' age(my.dob)
#' age(my.dob, units = "minutes")
#' age(my.dob, floor = FALSE)
age <- function(dob, age.day = lubridate::today(), units = "years", floor = TRUE) {
calc.age = lubridate::interval(dob, age.day) / lubridate::duration(num = 1, units = units)
if (floor) return(as.integer(floor(calc.age)))
return(calc.age)
}
Usage examples:
> my.dob <- as.Date('1983-10-20')
> age(my.dob)
[1] 31
> age(my.dob, floor = FALSE)
[1] 31.15616
> age(my.dob, units = "minutes")
[1] 16375680
> age(seq(my.dob, length.out = 6, by = "years"))
[1] 31 30 29 28 27 26
From the comments of this blog entry, I found the age_calc function in the eeptools package. It takes care of edge cases (leap years, etc.), checks inputs and looks quite robust.
library(eeptools)
x <- as.Date(c("2011-01-01", "1996-02-29"))
age_calc(x) # default is age in months, measured up to today (Sys.Date())
[1] 46.73333 224.83118
age_calc(x, units = "years") # but you can set it to years
[1] 3.893151 18.731507
floor(age_calc(x, units = "years"))
[1] 3 18
For your data
yourdata$age <- floor(age_calc(yourdata$birthdate, units = "years"))
assuming you want age in integer years.
Assume you have a data.table, you could do below:
library(data.table)
library(lubridate)
# toy data
X = data.table(birth=seq(from=as.Date("1970-01-01"), to=as.Date("1980-12-31"), by="year"))
Sys.Date()
Option 1: use as.period from the lubridate package
X[, age := as.period(Sys.Date() - birth)][]
birth age
1: 1970-01-01 44y 0m 327d 0H 0M 0S
2: 1971-01-01 43y 0m 327d 6H 0M 0S
3: 1972-01-01 42y 0m 327d 12H 0M 0S
4: 1973-01-01 41y 0m 326d 18H 0M 0S
5: 1974-01-01 40y 0m 327d 0H 0M 0S
6: 1975-01-01 39y 0m 327d 6H 0M 0S
7: 1976-01-01 38y 0m 327d 12H 0M 0S
8: 1977-01-01 37y 0m 326d 18H 0M 0S
9: 1978-01-01 36y 0m 327d 0H 0M 0S
10: 1979-01-01 35y 0m 327d 6H 0M 0S
11: 1980-01-01 34y 0m 327d 12H 0M 0S
Option 2 : if you do not like the format of Option 1, you could do below:
yr = duration(num = 1, units = "years")
X[, age := interval(birth, Sys.Date())/yr][] # new_interval() in older lubridate versions
# you get
birth age
1: 1970-01-01 44.92603
2: 1971-01-01 43.92603
3: 1972-01-01 42.92603
4: 1973-01-01 41.92329
5: 1974-01-01 40.92329
6: 1975-01-01 39.92329
7: 1976-01-01 38.92329
8: 1977-01-01 37.92055
9: 1978-01-01 36.92055
10: 1979-01-01 35.92055
11: 1980-01-01 34.92055
I believe Option 2 is usually the more desirable.
I prefer to do this using the lubridate package, borrowing syntax I originally encountered in another post.
It's necessary to standardize your input dates in terms of R date objects, preferably with the lubridate::mdy() or lubridate::ymd() or similar functions, as applicable. You can use the interval() function to generate an interval describing the time elapsed between the two dates, and then use the duration() function to define how this interval should be "diced".
I've summarized the simplest case for calculating an age from two dates below, using the most current syntax in R.
df$DOB <- mdy(df$DOB)
df$EndDate <- mdy(df$EndDate)
df$Calc_Age <- interval(start = df$DOB, end = df$EndDate) /
duration(num = 1, units = "years")
Age may be rounded down to the nearest complete integer using the base R floor() function, like so:
df$Calc_AgeF <- floor(df$Calc_Age)
Alternately, the digits= argument in the base R round() function can be used to round up or down, and specify the exact number of decimals in the returned value, like so:
df$Calc_Age2 <- round(df$Calc_Age, digits = 2) ## 2 decimals
df$Calc_Age0 <- round(df$Calc_Age, digits = 0) ## nearest integer
It's worth noting that once the input dates have passed through the calculation step described above (i.e., the interval() and duration() functions), the returned value is numeric and no longer a date object in R. This matters because lubridate::floor_date() works only on date-time objects, so the numeric result has to be rounded with floor() or round() instead.
The above syntax works regardless whether the input dates occur in a data.table or data.frame object.
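As a rough illustration of the calculation above, the sketch below divides the same interval once by a duration (fixed-length years) and once by a period (calendar years). The two dates are made up for the example, and the exact decimal returned by the duration version depends on how long lubridate considers a year to be.

```r
library(lubridate)

# Made-up example dates
dob <- ymd("2000-01-01")
end <- ymd("2010-01-01")
span <- interval(dob, end)

# Dividing by a duration measures elapsed time in fixed-length years
age_duration <- span / duration(num = 1, units = "years")

# Dividing by a period counts calendar years instead
age_period <- span / years(1)

floor(age_duration)
age_period
```

Both versions return plain numbers, so floor() and round() apply to either of them.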
I wanted an implementation that didn't increase my dependencies beyond data.table, which is usually my only dependency. The data.table is only needed for mday, which means day of the month.
Development function
This function is logically how I would think about someone's age. I start with [current year] - [birth year] - 1, then add 1 if they've already had their birthday in the current year. To check for that offset I start by considering month, then (if necessary) day of month.
Here is that step by step implementation:
agecalc <- function(origin, current){
require(data.table)
y <- year(current) - year(origin) - 1
offset <- 0
if(month(current) > month(origin)) offset <- 1
if(month(current) == month(origin) &
mday(current) >= mday(origin)) offset <- 1
age <- y + offset
return(age)
}
Production function
This is the same logic refactored and vectorized:
agecalc <- function(origin, current){
require(data.table)
age <- year(current) - year(origin) - 1
ii <- (month(current) > month(origin)) | (month(current) == month(origin) &
mday(current) >= mday(origin))
age[ii] <- age[ii] + 1
return(age)
}
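For comparison, the same birthday logic can be sketched in base R alone, assuming the inputs are Date objects (or strings that as.Date() can parse). The name agecalc_base is hypothetical and not part of the answer above; POSIXlt fields stand in for data.table's year(), month() and mday().

```r
# Hypothetical base-R variant of the production function above
agecalc_base <- function(origin, current) {
  o <- as.POSIXlt(as.Date(origin))
  cur <- as.POSIXlt(as.Date(current))
  # Year difference, assuming the birthday has not yet occurred this year
  age <- cur$year - o$year - 1
  # TRUE where the birthday has already been reached in the current year
  had_birthday <- (cur$mon > o$mon) |
    (cur$mon == o$mon & cur$mday >= o$mday)
  # Adding a logical counts TRUE as 1
  age + had_birthday
}

agecalc_base("1985-08-13", c("1986-08-12", "1986-08-13"))
# [1] 0 1
```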
Experimental function that uses strings
You could also do a string comparison on the month / day part. Perhaps there are times when this is more efficient, for example if you had the year as a number and the birth date as a string.
agecalc_strings <- function(origin, current){
origin <- as.character(origin)
current <- as.character(current)
age <- as.numeric(substr(current, 1, 4)) - as.numeric(substr(origin, 1, 4)) - 1
if(substr(current, 6, 10) >= substr(origin, 6, 10)){
age <- age + 1
}
return(age)
}
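The if() in the function above only accepts a single TRUE or FALSE, so it is not vectorized. A minimal sketch of a vectorized variant, assuming dates in "YYYY-MM-DD" form (the name agecalc_strings_vec is made up here):

```r
# Hypothetical vectorized variant of agecalc_strings
agecalc_strings_vec <- function(origin, current) {
  origin <- as.character(origin)
  current <- as.character(current)
  age <- as.numeric(substr(current, 1, 4)) - as.numeric(substr(origin, 1, 4)) - 1
  # Compare the "mm-dd" parts element-wise; TRUE counts as 1 when added
  age + (substr(current, 6, 10) >= substr(origin, 6, 10))
}

agecalc_strings_vec("1985-08-13", c("1986-08-12", "1986-08-13"))
# [1] 0 1
```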
Some tests on the vectorized "production" version:
## Examples for specific dates to test the calculation with things like
## beginning and end of months, and leap years:
agecalc(as.IDate("1985-08-13"), as.IDate("1985-08-12"))
agecalc(as.IDate("1985-08-13"), as.IDate("1985-08-13"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-08-12"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-08-13"))
agecalc(as.IDate("1985-08-13"), as.IDate("1986-09-12"))
agecalc(as.IDate("2000-02-29"), as.IDate("2000-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2000-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2001-03-01"))
agecalc(as.IDate("2000-02-29"), as.IDate("2004-02-28"))
agecalc(as.IDate("2000-02-29"), as.IDate("2004-02-29"))
agecalc(as.IDate("2000-02-29"), as.IDate("2011-03-01"))
## Testing every age for every day over several years
## This test requires vectorized version:
d <- data.table(d=as.IDate("2000-01-01") + 0:10000)
d[ , b1 := as.IDate("2000-08-15")]
d[ , b2 := as.IDate("2000-02-29")]
d[ , age1_num := (d - b1) / 365]
d[ , age2_num := (d - b2) / 365]
d[ , age1 := agecalc(b1, d)]
d[ , age2 := agecalc(b2, d)]
d
Below is a trivial plot of ages as numeric and integer. As you can see the
integer ages are a sort of stair step pattern that is tangent to (but below) the
straight line of numeric ages.
plot(as.numeric(age1_num) ~ d, data = d, type = "l",
ylab = "ages", main = "ages plotted")
lines(age1 ~ d, data = d, col = "blue")
I wasn't happy with any of the responses when it comes to calculating the age in months or years, when dealing with leap years, so this is my function using the lubridate package.
Basically, it slices the interval between from and to into (up to) yearly chunks, and then adjusts the interval for whether that chunk is leap year or not. The total interval is the sum of the age of each chunk.
library(lubridate)
#' Get Age of Date relative to Another Date
#'
#' @param from,to the date or dates to consider
#' @param units the units to consider
#' @param floor logical as to whether to floor the result
#' @param simple logical as to whether to do a simple calculation, a simple calculation doesn't account for leap year.
#' @author Nicholas Hamilton
#' @export
age <- function(from, to = today(), units = "years", floor = FALSE, simple = FALSE) {
#Account for Leap Year if Working in Months and Years
if(!simple && length(grep("^(month|year)",units)) > 0){
df = data.frame(from,to)
calc = sapply(1:nrow(df),function(r){
#Start and Finish Points
st = df[r,1]; fn = df[r,2]
#If there is no difference, age is zero
if(st == fn){ return(0) }
#If there is a difference, age is not zero and needs to be calculated
sign = +1 #Age Direction
if(st > fn){ tmp = st; st = fn; fn = tmp; sign = -1 } #Swap and Change sign
#Determine the slice-points
mid = ceiling_date(seq(st,fn,by='year'),'year')
#Build the sequence
dates = unique( c(st,mid,fn) )
dates = dates[which(dates >= st & dates <= fn)]
#Determine the age of the chunks
chunks = sapply(head(seq_along(dates),-1),function(ix){
k = 365/( 365 + leap_year(dates[ix]) )
k*interval( dates[ix], dates[ix+1] ) / duration(num = 1, units = units)
})
#Sum the Chunks, and account for direction
sign*sum(chunks)
})
#If Simple Calculation or Not Months or Not years
}else{
calc = interval(from,to) / duration(num = 1, units = units)
}
if (floor) calc = as.integer(floor(calc))
calc
}
(Sys.Date() - yourDate) / 365.25
A very simple way of calculating the age from two dates without using any additional packages probably is:
df$age = with(df, as.Date(date_2, "%Y-%m-%d") - as.Date(date_1, "%Y-%m-%d"))
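Note that the subtraction above returns a difftime in days rather than an age in years. A small sketch of the conversion, assuming an average year of 365.25 days as in the one-liner further up, and using a made-up one-row data frame:

```r
# Made-up example data
df <- data.frame(date_1 = "2000-01-01", date_2 = "2010-01-01")

# Difference in days as a plain number
days <- as.numeric(as.Date(df$date_2) - as.Date(df$date_1), units = "days")

# Approximate age in years, assuming 365.25 days per year
df$age_years <- days / 365.25
floor(df$age_years)
# [1] 10
```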
Here is a (I think simpler) solution using lubridate:
library(lubridate)
age <- function(dob, on.day=today()) {
intvl <- interval(dob, on.day)
prd <- as.period(intvl)
return(prd@year)
}
Note that age_calc from the eeptools package in particular fails on cases with the year 2000 around birthdays.
Some examples that don't work in age_calc:
library(lubridate)
library(eeptools)
age_calc(ymd("1997-04-21"), ymd("2000-04-21"), units = "years")
age_calc(ymd("2000-04-21"), ymd("2019-04-21"), units = "years")
age_calc(ymd("2000-04-21"), ymd("2016-04-21"), units = "years")
Some of the other solutions also produce output that is not what I would intuitively want for decimal ages when leap years are involved. I like @James_D's solution, which is precise and concise, but I wanted something where the decimal age is calculated as complete years plus the fraction of the year completed from the last birthday to the next birthday (which would be out of 365 or 366 days depending on the year). In the case of leap years I use lubridate's rollback function to use March 1st for non-leap years following February 29th. I used some test cases from @geneorama and added some of my own, and the output aligns with what I would expect.
library(lubridate)
# Calculate precise age from birthdate in ymd format
age_calculation <- function(birth_date, later_year) {
if (birth_date > later_year)
{
stop("Birth date is after the desired date!")
}
# Calculate the most recent birthday of the person based on the desired year
latest_bday <- ymd(add_with_rollback(birth_date, years((year(later_year) - year(birth_date))), roll_to_first = TRUE))
# Get amount of days between the desired date and the latest birthday
days_between <- as.numeric(days(later_year - latest_bday), units = "days")
# Get how many days are in the year between their most recent and next bdays
year_length <- as.numeric(days((add_with_rollback(latest_bday, years(1), roll_to_first = TRUE)) - latest_bday), units = "days")
# Get the year fraction (amount of year completed before next birthday)
fraction_year <- days_between/year_length
# Sum the difference of years with the year fraction
age_sum <- (year(later_year) - year(birth_date)) + fraction_year
return(age_sum)
}
test_list <- list(c("1985-08-13", "1986-08-12"),
c("1985-08-13", "1985-08-13"),
c("1985-08-13", "1986-08-13"),
c("1985-08-13", "1986-09-12"),
c("2000-02-29", "2000-02-29"),
c("2000-02-29", "2000-03-01"),
c("2000-02-29", "2001-02-28"),
c("2000-02-29", "2004-02-29"),
c("2000-02-29", "2011-03-01"),
c("1997-04-21", "2000-04-21"),
c("2000-04-21", "2016-04-21"),
c("2000-04-21", "2019-04-21"),
c("2017-06-15", "2018-04-30"),
c("2019-04-20", "2019-08-24"),
c("2020-05-25", "2021-11-25"),
c("2020-11-25", "2021-11-24"),
c("2020-11-24", "2020-11-25"),
c("2020-02-28", "2020-02-29"),
c("2020-02-29", "2020-02-28"))
for (i in 1:length(test_list))
{
print(paste0("Dates from ", test_list[[i]][1], " to ", test_list[[i]][2]))
result <- age_calculation(ymd(test_list[[i]][1]), ymd(test_list[[i]][2]))
print(result)
}
Output:
[1] "Dates from 1985-08-13 to 1986-08-12"
[1] 0.9972603
[1] "Dates from 1985-08-13 to 1985-08-13"
[1] 0
[1] "Dates from 1985-08-13 to 1986-08-13"
[1] 1
[1] "Dates from 1985-08-13 to 1986-09-12"
[1] 1.082192
[1] "Dates from 2000-02-29 to 2000-02-29"
[1] 0
[1] "Dates from 2000-02-29 to 2000-03-01"
[1] 0.00273224
[1] "Dates from 2000-02-29 to 2001-02-28"
[1] 0.9972603
[1] "Dates from 2000-02-29 to 2004-02-29"
[1] 4
[1] "Dates from 2000-02-29 to 2011-03-01"
[1] 11
[1] "Dates from 1997-04-21 to 2000-04-21"
[1] 3
[1] "Dates from 2000-04-21 to 2016-04-21"
[1] 16
[1] "Dates from 2000-04-21 to 2019-04-21"
[1] 19
[1] "Dates from 2017-06-15 to 2018-04-30"
[1] 0.8739726
[1] "Dates from 2019-04-20 to 2019-08-24"
[1] 0.3442623
[1] "Dates from 2020-05-25 to 2021-11-25"
[1] 1.50411
[1] "Dates from 2020-11-25 to 2021-11-24"
[1] 0.9972603
[1] "Dates from 2020-11-24 to 2020-11-25"
[1] 0.002739726
[1] "Dates from 2020-02-28 to 2020-02-29"
[1] 0.00273224
[1] "Dates from 2020-02-29 to 2020-02-28"
Error in age_calculation(ymd(test_list[[i]][1]), ymd(test_list[[i]][2])) :
Birth date is after the desired date!
As others have been saying, the trunc function is excellent to get integer age.
I realise there are a lot of answers but since I can't help myself, I might as well add to the discussion.
I'm building a package that's focused on dates and datetimes and in it I use a function called time_diff(). Here is a simplified version.
time_diff <- function(x, y, units, num = 1,
type = c("duration", "period"),
as_period = FALSE){
type <- match.arg(type)
units <- match.arg(units, c("picoseconds", "nanoseconds", "microseconds",
"milliseconds", "seconds", "minutes", "hours", "days",
"weeks", "months", "years"))
int <- lubridate::interval(x, y)
if (as_period || type == "period"){
if (as_period) int <- lubridate::as.period(int, unit = units)
unit <- lubridate::period(num = num, units = units)
} else {
unit <- do.call(get(paste0("d", units),
asNamespace("lubridate")),
list(x = num))
}
out <- int / unit
out
}
# Wrapper around the more general time_diff
age_years <- function(x, y){
trunc(time_diff(x, y, units = "years", num = 1,
type = "period", as_period = TRUE))
}
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
bday <- dmy("01-01-2000")
time_diff(bday, today(), "years", type = "period")
#> [1] 23.11233
leap1 <- dmy("29-02-2020")
leap2 <- dmy("28-02-2021")
leap3 <- dmy("01-03-2021")
# Many people might say this is wrong so use the more exact age_years
time_diff(leap1, leap2, "years", type = "period")
#> [1] 1
# age in years, accounting for leap years properly
age_years(leap1, leap2)
#> [1] 0
age_years(leap1, leap3)
#> [1] 1
# So to add a column of ages in years, one can do this..
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
my_data <- tibble(dob = seq(bday, today(), by = "day"))
my_data <- my_data %>%
mutate(age_years = age_years(dob, today()))
slice_head(my_data, n = 10)
#> # A tibble: 10 x 2
#> dob age_years
#> <date> <dbl>
#> 1 2000-01-01 23
#> 2 2000-01-02 23
#> 3 2000-01-03 23
#> 4 2000-01-04 23
#> 5 2000-01-05 23
#> 6 2000-01-06 23
#> 7 2000-01-07 23
#> 8 2000-01-08 23
#> 9 2000-01-09 23
#> 10 2000-01-10 23
Created on 2023-02-11 with reprex v2.0.2

Add a month to a Date [duplicate]

I am trying to add a month to a date I have, but so far I have not found a straightforward way to do it. Here is what I tried.
d <- as.Date("2004-01-31")
d + 60
# [1] "2004-03-31"
Adding a fixed number of days won't help, since months have different lengths.
seq(as.Date("2004-01-31"), by = "month", length = 2)
# [1] "2004-01-31" "2004-03-02"
The above might work, but again it is not straightforward.
It also seems to add 30 days or so to the date, which causes issues like the one below:
seq(as.Date("2004-01-31"), by = "month", length = 10)
# [1] "2004-01-31" "2004-03-02" "2004-03-31" "2004-05-01" "2004-05-31" "2004-07-01" "2004-07-31" "2004-08-31" "2004-10-01" "2004-10-31"
In the above, February is skipped entirely and two of the dates fall in March.
The following approach also failed for the month, although it succeeded for the year:
d <- as.POSIXlt(as.Date("2010-01-01"))
d$year <- d$year +1
d
# [1] "2011-01-01 UTC"
d <- as.POSIXlt(as.Date("2010-01-01"))
d$month <- d$month +1
d
Error in format.POSIXlt(x, usetz = TRUE) : invalid 'x' argument
What is the right method to do this ?
The %m+% operator from lubridate adds months without rolling past the last day of the resulting month.
library(lubridate)
(d <- ymd("2012-01-31"))
1 parsed with %Y-%m-%d
[1] "2012-01-31 UTC"
d %m+% months(1)
[1] "2012-02-29 UTC"
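To show why the operator matters, here is a short sketch comparing it with plain + on the same date. It assumes only lubridate's documented months(), %m+% and %m-%.

```r
library(lubridate)

d <- ymd("2012-01-31")

# Plain addition lands on the nonexistent 31 February and returns NA
d + months(1)

# %m+% rolls back to the last valid day of the target month
d %m+% months(1)

# %m-% does the same when subtracting months (31 November does not exist)
d %m-% months(2)
```

The rolled-back results are 2012-02-29 and 2011-11-30 respectively.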
It is ambiguous when you say "add a month to a date".
Do you mean
add 30 days?
increase the month part of the date by 1?
In both cases a whole package for a simple addition seems a bit exaggerated.
For the first point, of course, the simple + operator will do:
d=as.Date('2010-01-01')
d + 30
#[1] "2010-01-31"
As for the second I would just create a one line function as simple as that (and with a more general scope):
add.months= function(date,n) seq(date, by = paste (n, "months"), length = 2)[2]
You can use it with arbitrary months, including negative:
add.months(d, 3)
#[1] "2010-04-01"
add.months(d, -3)
#[1] "2009-10-01"
Of course, if you often want to add just a single month:
add.month=function(date) add.months(date,1)
add.month(d)
#[1] "2010-02-01"
If you add one month to 31 January, the result would be 31 February, which does not exist, so the best way to get the job done is to carry the missing 3 days over into the following month, March. So, correctly:
add.month(as.Date("2010-01-31"))
#[1] "2010-03-03"
In case, for some very special reason, you need to put a ceiling to the last available day of the month, it's a bit longer:
add.months.ceil=function (date, n){
#no ceiling
nC=add.months(date, n)
#ceiling
date=as.Date(format(date, "%Y-%m-01")) # first day of the month, base R only (day<- would need lubridate)
C=add.months(date, n+1)-1
#use ceiling in case of overlapping
if(nC>C) return(C)
return(nC)
}
As usual you could add a single month version:
add.month.ceil=function(date) add.months.ceil(date,1)
So:
d=as.Date('2010-01-31')
add.month.ceil(d)
#[1] "2010-02-28"
d=as.Date('2010-01-21')
add.month.ceil(d)
#[1] "2010-02-21"
And with decrements:
d=as.Date('2010-03-31')
add.months.ceil(d, -1)
#[1] "2010-02-28"
d=as.Date('2010-03-21')
add.months.ceil(d, -1)
#[1] "2010-02-21"
Besides, you didn't say whether you were interested in a scalar or a vector solution. As for the latter:
add.months.v= function(date,n) as.Date(sapply(date, add.months, n), origin="1970-01-01")
Note: the *apply family drops the class attribute, which is why it has to be rebuilt.
The vector version brings:
d=c(as.Date('2010/01/01'), as.Date('2010/01/31'))
add.months.v(d,1)
[1] "2010-02-01" "2010-03-03"
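An alternative sketch for the vector case keeps the Date class by combining the results with c() instead of sapply(), so no origin is needed. It assumes add.months as defined above (repeated here so the snippet runs on its own); add.months.v2 is a hypothetical name.

```r
# Scalar helper, as defined earlier in this answer
add.months <- function(date, n) seq(date, by = paste(n, "months"), length.out = 2)[2]

# lapply() over a Date vector yields Date elements; c() keeps their class
add.months.v2 <- function(date, n) do.call(c, lapply(date, add.months, n))

d <- as.Date(c("2010-01-01", "2010-01-31"))
add.months.v2(d, 1)
# [1] "2010-02-01" "2010-03-03"
```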
Hope you liked it))
Vanilla R has a naive difftime class, but the Lubridate CRAN package lets you do what you ask:
require(lubridate)
d <- ymd(as.Date('2004-01-01')) %m+% months(1)
d
[1] "2004-02-01"
Hope that helps.
The simplest way is to convert Date to POSIXlt format.
Then perform the arithmetic operation as follows:
date_1m_fwd <- as.POSIXlt("2010-01-01")
date_1m_fwd$mon <- date_1m_fwd$mon +1
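To get back to a Date after changing the mon field, the POSIXlt object can be passed to as.Date(), which normalises out-of-range fields. A brief sketch of that round trip:

```r
d <- as.POSIXlt(as.Date("2010-01-01"))
d$mon <- d$mon + 1       # mon is zero-based: 0 = January
as.Date(d)
# [1] "2010-02-01"

# A day that does not exist in the target month spills over into the next one
e <- as.POSIXlt(as.Date("2010-01-31"))
e$mon <- e$mon + 1
as.Date(e)
# [1] "2010-03-03"
```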
Moreover, in case you want to deal with Date columns in data.table, unfortunately the POSIXlt format is not supported.
Still, you can add months using base R code as follows:
library(data.table)
dt <- as.data.table(seq(as.Date("2010-01-01"), length.out=5, by="month"))
dt[,shifted_month:=tail(seq(V1[1], length.out=length(V1)+3, by="month"),length(V1))]
Hope it helps.
"mondate" is somewhat similar to "Date" except that adding n adds n months rather than n days:
> library(mondate)
> d <- as.Date("2004-01-31")
> as.mondate(d) + 1
mondate: timeunits="months"
[1] 2004-02-29
Here's a function that doesn't require any packages to be installed. You give it a Date object (or a character that it can convert into a Date), and it adds n months to that date without changing the day of the month (unless the month you land on doesn't have enough days in it, in which case it defaults to the last day of the returned month). Just in case it doesn't make sense reading it, there are some examples below.
Function definition
addMonth <- function(date, n = 1){
if (n == 0){return(date)}
if (n %% 1 != 0){stop("Input Error: argument 'n' must be an integer.")}
# Check to make sure we have a standard Date format
if (is.character(date)){date = as.Date(date)}
# Turn the year, month, and day into numbers so we can play with them
y = as.numeric(substr(as.character(date),1,4))
m = as.numeric(substr(as.character(date),6,7))
d = as.numeric(substr(as.character(date),9,10))
# Run through the computation
i = 0
# Adding months
if (n > 0){
while (i < n){
m = m + 1
if (m == 13){
m = 1
y = y + 1
}
i = i + 1
}
}
# Subtracting months
else if (n < 0){
while (i > n){
m = m - 1
if (m == 0){
m = 12
y = y - 1
}
i = i - 1
}
}
# If past 28th day in base month, make adjustments for February
if (d > 28 & m == 2){
# If it's a leap year, return the 29th day
if ((y %% 4 == 0 & y %% 100 != 0) | y %% 400 == 0){d = 29}
# Otherwise, return the 28th day
else{d = 28}
}
# If 31st day in base month but only 30 days in end month, return 30th day
else if (d == 31){if (m %in% c(1, 3, 5, 7, 8, 10, 12) == FALSE){d = 30}}
# Turn year, month, and day into strings and put them together to make a Date
y = as.character(y)
# If month is single digit, add a leading 0, otherwise leave it alone
if (m < 10){m = paste('0', as.character(m), sep = '')}
else{m = as.character(m)}
# If day is single digit, add a leading 0, otherwise leave it alone
if (d < 10){d = paste('0', as.character(d), sep = '')}
else{d = as.character(d)}
# Put them together and return the result as a Date
return(as.Date(paste(y,'-',m,'-',d, sep = '')))
}
Some examples
Adding months
> addMonth('2014-01-31', n = 1)
[1] "2014-02-28" # February, non-leap year
> addMonth('2014-01-31', n = 5)
[1] "2014-06-30" # June only has 30 days, so day of month dropped to 30
> addMonth('2014-01-31', n = 24)
[1] "2016-01-31" # Increments years when n is a multiple of 12
> addMonth('2014-01-31', n = 25)
[1] "2016-02-29" # February, leap year
Subtracting months
> addMonth('2014-01-31', n = -1)
[1] "2013-12-31"
> addMonth('2014-01-31', n = -7)
[1] "2013-06-30"
> addMonth('2014-01-31', n = -12)
[1] "2013-01-31"
> addMonth('2014-01-31', n = -23)
[1] "2012-02-29"
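The month-stepping loops can also be replaced by plain arithmetic on a running month count. The following is only a sketch of that idea in base R, not the function above; add_months_clamped is a hypothetical name, and it assumes n is a whole number.

```r
# Hypothetical vectorized alternative: add n months, clamping the day
# to the last day of the target month
add_months_clamped <- function(date, n) {
  lt <- as.POSIXlt(as.Date(date))
  # Months elapsed since year 0 (lt$mon is zero-based)
  total <- (lt$year + 1900) * 12 + lt$mon + n
  first <- as.Date(sprintf("%04d-%02d-01", total %/% 12, total %% 12 + 1))
  # First day of the following month, minus one day, is the last day
  nxt <- total + 1
  next_first <- as.Date(sprintf("%04d-%02d-01", nxt %/% 12, nxt %% 12 + 1))
  last_day <- as.integer(format(next_first - 1, "%d"))
  # Keep the original day of month unless the target month is shorter
  first + pmin(lt$mday, last_day) - 1
}

add_months_clamped("2014-01-31", c(1, 5, 25, -23))
# [1] "2014-02-28" "2014-06-30" "2016-02-29" "2012-02-29"
```

Because everything is computed with vector arithmetic, the same call handles a whole column of dates or several offsets at once.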
addedMonth <- seq(as.Date('2004-01-01'), length=2, by='1 month')[2]
addedQuarter <- seq(as.Date('2004-01-01'), length=2, by='1 quarter')[2]
I turned antonio's thoughts into a specific function:
library(DescTools)
> AddMonths(as.Date('2004-01-01'), 1)
[1] "2004-02-01"
> AddMonths(as.Date('2004-01-31'), 1)
[1] "2004-02-29"
> AddMonths(as.Date('2004-03-30'), -1)
[1] "2004-02-29"

Resources