How to find decimal representation of years in R?

Since I need reasonably accurate representations of years in decimal format (~ 4-5 digits of accuracy would work) I turned to the lubridate package. This is what I have tried:
refDate <- as.Date("2016-01-10")
endDate <- as.Date("2020-12-31")
daysInLeapYear <- 366
daysInRegYear <- 365
leapYearFractStart <- 0
leapYearRegStart <- 0
daysInterval <- as.interval(difftime(endDate, refDate, unit = "d"), start = refDate)
periodObject <- as.period(daysInterval)
if(leap_year(refDate)) {
leapYearFractStart <- (as.numeric(days_in_month(refDate))-as.numeric(format(refDate, "%d")))/daysInLeapYear
}
if(!leap_year(refDate)) {
leapYearRegStart <- (as.numeric(days_in_month(refDate))-as.numeric(format(refDate, "%d")))/daysInRegYear
}
returnData <- periodObject@year + (periodObject@month/12) + leapYearFractStart + leapYearRegStart
It is safe to assume that the end date is always at the end of a month, hence no leap year check at the end. Relying on lubridate for proper year/month counting I am adjusting for leap-years only for the start date.
I reckon this gets me to within 3 digits of accuracy only! In addition, it looks a bit crude.
Is there a more complete and accurate procedure to determine decimal representation of years in an interval?

It's very unclear what you're trying to do exactly here, which makes accuracy difficult to talk about.
lubridate has a function decimal_date which turns dates into decimals. But since 3 decimal places gives you 1000 possible positions within a year, when we only have 365/366 days, there are between 2 and 3 viable values that fall within a day. Accuracy depends on when in the day you want the result to fall.
> decimal_date(as.POSIXlt("2016-01-10 00:00:01"))
[1] 2016.025
> decimal_date(as.POSIXlt("2016-01-10 12:00:00"))
[1] 2016.026
> decimal_date(as.POSIXlt("2016-01-10 23:59:59"))
[1] 2016.027
In other words, going beyond 3 decimal places is only really important if you're interested in the time of day.

This solution uses only base R. We get the beginning of the year using cut(..., "year") and the number of days in the year by differencing it with the beginning of the next year obtained using cut(..., "year") on an arbitrary date in the following year. Finally use those quantities to get the fraction and add it to the year.
d <- as.Date(c("2015-01-31", "2016-01-01", "2016-01-10", "2016-12-31")) # sample input
year_begin <- as.Date(cut(d, "year"))
days_in_year <- as.numeric( as.Date(cut(year_begin + 366, "year")) - year_begin )
as.numeric(format(d, "%Y")) + as.numeric(d - year_begin) / days_in_year
## [1] 2015.082 2016.000 2016.025 2016.997
Alternately, using as.POSIXlt this variation crams it into one line:
with(unclass(as.POSIXlt(d)),1900+year+yday/as.numeric(as.Date(cut(d-yday+366,"y"))-d+yday))
## [1] 2015.082 2016.000 2016.025 2016.997
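To get the length of an interval in decimal years, as the question asks, the per-date calculation above can be wrapped in a function and applied to both endpoints. This is only a minimal base-R sketch: the name decimal_year is made up here for illustration, and each date is treated as the start of that day.

```r
# Hypothetical helper (the name is not from the answers above): Date -> decimal year
decimal_year <- function(d) {
  d <- as.Date(d)
  year_begin <- as.Date(cut(d, "year"))                 # Jan 1 of d's year
  next_begin <- as.Date(cut(year_begin + 366, "year"))  # Jan 1 of the following year
  days_in_year <- as.numeric(next_begin - year_begin)   # 365 or 366
  as.numeric(format(d, "%Y")) + as.numeric(d - year_begin) / days_in_year
}

refDate <- as.Date("2016-01-10")
endDate <- as.Date("2020-12-31")

# Interval length in decimal years (counts up to the start of endDate)
interval_years <- decimal_year(endDate) - decimal_year(refDate)
```

If the last day should be included in the interval, pass endDate + 1 instead.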

Related

Subtract decimal years from date in R

I tried to subtract decimal years from a date in order to get the initial date, something like this question, but I am using years with a decimal part, e.g. 5.5 years. I need the origin date from that difference, like this:
library(lubridate)
ymd("2021-05-21")-years(5.5)
# 2015-11-21 desired output
But this gives an error because the years function only accepts integers. How can I achieve this?
We could use years and months:
v1 <- 5.5
yr <- as.integer(v1)
mth <- as.integer((v1* 12) %% 12)
ymd("2021-05-21") - (years(yr) + months(mth))
#[1] "2015-11-21"
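The same years-plus-months idea can also be sketched in base R without lubridate, by converting the decimal years to a whole number of months and stepping back with seq(). The function name subtract_decimal_years is made up for this example, and rounding to whole months is an assumption (the answer above truncates instead).

```r
# Hypothetical helper: step a date back by a decimal number of years,
# rounded to whole calendar months (an assumption of this sketch).
subtract_decimal_years <- function(d, y) {
  n_months <- as.integer(round(y * 12))
  # seq() on a Date accepts steps such as "-66 months"
  seq(as.Date(d), by = paste(-n_months, "months"), length.out = 2)[2]
}

origin_date <- subtract_decimal_years("2021-05-21", 5.5)
```

Note that base R lets a day such as the 31st overflow into the following month when the target month is shorter, so this is best used with days that exist in every month.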
It is tricky to calculate time differences accurately, especially in years. Sources such as Wikipedia give the average length of a Gregorian year as g = 365.2425 days, which accounts for leap years (though not for leap seconds, which are irregular). We could take g as the average length of a year, neglecting the actual number of leap days in our time difference, and define a function add_yr() that should be reasonably valid for dates after October 15, 1582.
add_yr <- \(d, y) as.Date(d) + y * 365.2425
(prior to R 4.1.0, which introduced the \(x) shorthand, use this code: add_yr <- function(d, y) as.Date(d) + y * 365.2425)
This shows that we need to insert -5.495 instead of -5.5 to get OP's desired date ("2015-11-21").
add_yr("2021-05-21", -5.495)
# [1] "2015-11-21"
add_yr("2021-05-21", -5.5)
# [1] "2015-11-20"
The gain in accuracy is almost 2 days in this case:
(5.5 - 5.495) * 365.2425
# [1] 1.826212
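Going the other way makes the mismatch easier to see: dividing the actual day count between the two dates by the same average year length gives the decimal offset that the average-year model implies. This is just a sketch under the same g = 365.2425 assumption; yr_diff is a name invented here.

```r
# Hypothetical inverse of add_yr(): days between two dates divided by the
# average Gregorian year length (the same assumption as above).
yr_diff <- function(from, to) {
  as.numeric(as.Date(to) - as.Date(from)) / 365.2425
}

n_days  <- as.numeric(as.Date("2021-05-21") - as.Date("2015-11-21"))
implied <- yr_diff("2015-11-21", "2021-05-21")
```

Here n_days is 2008 and implied comes out slightly below 5.5, which is why subtracting a full 5.5 average years lands one day early.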

R difftime subtracts 2 days

I have some timedelta strings which were exported from Python. I'm trying to import them for use in R, but I'm getting some weird results.
When the timedeltas are small, I get results that are off by 2 days, e.g.:
> as.difftime('26 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of 24.20389 days
When they are larger, it doesn't work at all:
> as.difftime('36 days 04:53:36.000000000',format='%d days %H:%M:%S.000000000')
Time difference of NA secs
I also read into 'R' some time delta objects I had processed with 'Python' and had a similar issue with the 26 days 04:53:36.000000000 format. As Gregor said, %d in strptime is the day of the month as a zero padded decimal number so won't work with numbers >31 and there doesn't seem to be an option for cumulative days (probably because strptime is for date time objects and not time delta objects).
My solution was to convert the objects to strings and extract the numerical data as Gregor suggested and I did this using the gsub function.
# convert to strings
data$tdelta <- as.character(data$tdelta)
# extract numerical data
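# anchor the day count at the start of the string; a leading greedy '.*' would keep only its last digit
days <- as.numeric(gsub('^([0-9]+) days.*$','\\1',data$tdelta))
hours <- as.numeric(gsub('^.*ys ([0-9]+):.*$','\\1',data$tdelta))
minutes <- as.numeric(gsub('^.*:([0-9]+):.*$','\\1',data$tdelta))
# escape the dot so it matches the literal decimal point (fractional seconds are dropped)
seconds <- as.numeric(gsub('^.*:([0-9]+)\\..*$','\\1',data$tdelta))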
# add up numerical components to whatever units you want
time_diff_seconds <- seconds + minutes*60 + hours*60*60 + days*24*60*60
# add column to data frame
data$tdelta <- time_diff_seconds
That should allow you to do computations with the time differences. Hope that helps.
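The same extraction can be sketched as a single function that returns a difftime. This is only an illustration: parse_timedelta is a made-up name, and it assumes non-negative strings of the form "D days HH:MM:SS" with an optional fractional part, as in the examples above.

```r
# Hypothetical parser for strings like "26 days 04:53:36.000000000".
# Assumes non-negative timedeltas in exactly this layout; anything else gives NA.
parse_timedelta <- function(x) {
  pattern <- "^([0-9]+) days ([0-9]{2}):([0-9]{2}):([0-9]{2}(\\.[0-9]+)?)$"
  parts <- regmatches(x, regexec(pattern, x))
  secs <- vapply(parts, function(p) {
    if (length(p) == 0) return(NA_real_)   # no match
    v <- as.numeric(p[2:5])                # days, hours, minutes, seconds
    v[1] * 86400 + v[2] * 3600 + v[3] * 60 + v[4]
  }, numeric(1))
  as.difftime(secs, units = "secs")
}

td <- parse_timedelta(c("26 days 04:53:36.000000000",
                        "36 days 04:53:36.000000000"))
td_days <- as.numeric(td, units = "days")
```

Because the day count is parsed as a plain number rather than a day of the month, values above 31 work as well.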

Adding quarters to R date

I have R time series data where I am calculating the mean of all values up to a particular date and storing this mean at the date + 4 quarters. The dates are all month ends. To achieve this, I am looking to add 4 quarters to a date. My question is how I can add 4 quarters to an R Date. An illustration:
a <- as.Date("2006-01-01")
b <- as.Date("2011-01-01")
date_range <- quarter(seq.Date(a, b, by = "quarter"), with_year = TRUE)
> date_range[1] + 1
[1] 2007.1
> date_range[1] + quarter(1)
[1] 2007.1
> date_range[1] + 0.25
[1] 2006.35
One possible way I am thinking is to get year-quarter dates, and then adding 4 to it. But wasn't sure what is the best way to do this?
The problem is that quarters have different lengths. Q1 is shortest because it includes February (though it ties with Q2 in leap years). Things like this make "adding a quarter to a date" poorly defined. Even adding months to a date can be tricky at the ends of months - what is 1 month after January 31?
Beginnings of months are more straightforward, and I would recommend you use the 1st day of quarters rather than the last (if you must use a specific date). lubridate provides functions like floor_date() and ceiling_date() to which you can pass unit = "quarter" and they will return the first day of the current or subsequent quarter, respectively. You can also always add months(3) to a day at the beginning of a month, though of course if your intention is to add 4 quarters you may as well just add 1 year.
Just add 12 months or a year instead?
Or if it must be quarters, define yourself a function, like so:
quarters <- function(x) {
months(3*x)
}
and then use it to add to the date sequence:
date_range <- seq.Date(a, b, by = "quarter")
date_range + quarters(4)
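One caveat with the definition above is that a function named quarters masks base R's own quarters(). A base-R variant of the same idea, under a different (made-up) name and without lubridate, might look like the sketch below. It steps each date forward by 3 * n months with seq().

```r
# Hypothetical helper: shift each date by n quarters (3 * n calendar months).
# Intended for dates at the beginning of a month; base R lets days such as
# the 31st overflow into the next month.
add_quarters <- function(d, n) {
  d <- as.Date(d)
  shifted <- lapply(seq_along(d), function(i) {
    seq(d[i], by = paste(3 * n, "months"), length.out = 2)[2]
  })
  do.call(c, shifted)
}

a <- as.Date("2006-01-01")
b <- as.Date("2011-01-01")
date_range <- seq(a, b, by = "quarter")
one_year_on <- add_quarters(date_range, 4)
```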
Lubridate has a function for quarters already included. This is a much better solution than creating your own function.
https://www.rdocumentation.org/packages/lubridate/versions/1.7.4/topics/quarter
Old answer but to those arriving here, lubridate has a function %m+% that adds months and preserves month ends.
a <- as.Date("2006-01-01")
Add future months worth of dates:
The original poster wanted 4 quarters in future so that will be 12 months.
future_date <- a %m+% months(12)
future_date
[1] "2007-01-01"
You could also do years as the period:
future_date <- a %m+% years(1)
Remove months from date:
Subtract dates with %m-%
If you wanted a date 3 months ago from 1/1/2006:
past_date <- a %m-% months(3)
past_date
[1] "2005-10-01"
Example with dates not at end of months:
mplus will preserve days in month:
as.Date("2022-10-10") %m-% months(3)
[1] "2022-07-10"
For more, see documentation on "Add and subtract months to a date without exceeding the last day of the new month"
Note that other answers that use Date class will give irregularly spaced series and so are unsuitable for time series analysis.
To do this in such a way that time series analyses can be performed and noting the zoo tag on the question, the yearmon class represents year/month as year + fraction where fraction is 0 for Jan, 1/12 for Feb, 2/12 for Mar, ..., 11/12 for Dec. Thus adding 4 quarters is just a matter of adding 1. (Adding x quarters is done by adding x/4.)
library(zoo)
ym <- yearmon(2006) + 0:11/12 # months in 2006
ym + 1 # one year later
The first line below converts yearmon objects to end-of-month Dates, and the second line converts Dates back to yearmon. Using frac = 0 or omitting frac in the first line would convert to beginning-of-month dates.
d <- as.Date(ym, frac = 1) # d is Date vector of end-of-months
as.yearmon(d) # convert Date vector to yearmon
If your input dates represent quarters then there is also the yearqtr class which represents a year/quarter as year + fraction where fraction is 0, 1/4, 2/4, 3/4 for the 4 quarters of a year. Adding 4 quarters is done by adding 1 (or to add x quarters add x/4).
yq <- as.yearqtr(2006) + 0:3/4 # all quarters in 2006
yq + 1 # one year later
Conversions work similarly to yearmon:
d <- as.Date(yq, frac = 1) # d is Date vector of end-of-quarters
as.yearqtr(d) # convert Date vector to yearqtr

calculating ages in R by subtracting two dates columns

I have 2 columns with ~ 2000 rows of dates in them. One is a variable with a visit date (df$visitdate), and the other is a birth date of the individual (df$birthday).
Wondering if there is any simple way to subtract the visit date - birth date to create the variable "age at the time of the visit", accounting for leap years, etc.
I tried to use the following code (from an answer in a similar question) but it didn't work in my case.
Find the number of seconds in one year:
seconds_in_a_year <- as.integer((seconds(ymd("2010-01-01")) - seconds(ymd("2009-01-01"))))
Now obtain the number of seconds between the 2 dates you desire:
seconds_between_dates <- as.integer(seconds(date1) - seconds(date2))
Your final answer for the number of years as a floating point value will be:
years_between_dates <- seconds_between_dates / seconds_in_a_year
When I tried to apply this to my data frame (note: using variables rather than specific dates, so this may be the cause) I got the following:
seconds_in_a_year <- as.integer((seconds(ymd(df$visitdate)) - seconds(ymd(df$birthday))))
Warning message:
NAs introduced by coercion
Following the code along I got a final output of:
years_between_dates
[1] 1.157407e-05 [2] 1.157407e-05
Any help is greatly appreciated!
Subtracting from a Date object another Date object gives you the time difference in days, e.g.
> dates = as.Date(c("2007-03-01", "2004-05-23"))
>
> dates[1] - dates[2]
Time difference of 1012 days
So, assuming 365 days in a year
> age_time_visit = as.numeric(dates[1] - dates[2]) / 365
> age_time_visit
[1] 2.772603
There are various answers for this scattered around the internet.
I think the one I've typically used was inspired by Professor Ripley:
http://r.789695.n4.nabble.com/Calculate-difference-between-dates-in-years-td835196.html
age_years <- function(first, second)
{
lt <- data.frame(first, second)
age <- as.numeric(format(lt[,2],format="%Y")) - as.numeric(format(lt[,1],format="%Y"))
first <- as.Date(paste(format(lt[,2],format="%Y"),"-",format(lt[,1],format="%m-%d"),sep=""))
age[which(first > lt[,2])] <- age[which(first > lt[,2])] - 1
age
}
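For comparison, the same "completed years" logic can be sketched by comparing month and day fields directly. This is an illustrative base-R version under a made-up name (whole_years), not the code from the linked thread; it works on whole columns at once.

```r
# Hypothetical helper: completed years between birth and visit dates.
whole_years <- function(birth, visit) {
  b <- as.POSIXlt(birth)
  v <- as.POSIXlt(visit)
  age <- v$year - b$year
  # birthday not yet reached in the visit year
  not_yet <- (v$mon < b$mon) | (v$mon == b$mon & v$mday < b$mday)
  age - as.integer(not_yet)
}

ages <- whole_years(as.Date(c("2004-05-23", "2000-02-29")),
                    as.Date(c("2007-03-01", "2001-03-01")))
```

In the first pair the birthday in 2007 has not yet happened by March 1, so the result is 2 rather than 3.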
There's another approach at https://gist.github.com/mmparker/7254445
Or if you just want the raw decimal value of years, you can get the number of days and divide by 365.2425.
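Applied to two date columns, that could look like the sketch below. The data frame here is made up for illustration, with column names borrowed from the question, and 365.2425 is the average Gregorian year length, so the result is an approximation.

```r
# Made-up example data with the column names used in the question
df <- data.frame(
  birthday  = as.Date(c("2004-05-23", "1990-02-28")),
  visitdate = as.Date(c("2007-03-01", "2015-09-09"))
)

# Days between the two columns, then decimal years via the average year length
df$age_days  <- as.numeric(difftime(df$visitdate, df$birthday, units = "days"))
df$age_years <- df$age_days / 365.2425
```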
Here is an approach that accounts for leap years (don't know if this has been done before, but suspect it has...).
get.age <- function(from, to) {
require(lubridate) # for leap_year(...)
n <- as.integer(to-from)
n.l <- sum(leap_year(seq(from,to,by=1)))
n.l/366 + (n+1-n.l)/365
}
get.age(as.Date("2009-01-01"),as.Date("2012-12-31"))
# [1] 4
get.age(as.Date("2012-01-01"),as.Date("2012-01-31")) # 2012 was a leap year
# [1] 0.08469945
get.age(as.Date("2011-01-01"),as.Date("2011-01-31")) # 2011 was not
# [1] 0.08493151
So the basic idea is to create a vector with one element for every day between from and to (inclusive), then for each day account for whether that day is part of a leap year or not. Then we add up the leap year days and the non-leap year days separately and calculate the number of years as:
leap-year-days/366 + non-leap-year-days/365
This works for single dates (vectors of length 1). To enable this for columns of dates, as you asked, we use Vectorize(...).
vget.age <- Vectorize(get.age) # vectorized version
And then a demo:
# example data set
set.seed(1) # for reproducible example
today <- as.Date("2015-09-09")
df <- data.frame(birth.date=today-sample(1000:10000,2000)) # 2000 birthdays
result <- vget.age(df$birth.date,today) # how old are they?
head(result)
# [1] 9.282192 11.909589 16.854795 25.115068 7.706849 24.865753

How to convert in both directions between year,month,day and dates in R?

How to convert between year,month,day and dates in R?
I know one can do this via strings, but I would prefer to avoid converting to strings, partly because maybe there is a performance hit?, and partly because I worry about regionalization issues, where some of the world uses "year-month-day" and some uses "year-day-month".
It looks like ISOdate provides the direction year,month,day -> DateTime, although it first converts the numbers to a string, so if there is a way that doesn't go via a string I would prefer that.
I couldn't find anything that goes the other way, from datetimes to numerical values? I would prefer not needing to use strsplit or things like that.
Edit: just to be clear, what I have is, a data frame which looks like:
year month day hour somevalue
2004 1 1 1 1515353
2004 1 1 2 3513535
....
I want to be able to freely convert to this format:
time(hour units) somevalue
1 1515353
2 3513535
....
... and also be able to go back again.
Edit: to clear up some confusion on what 'time' (hour units) means, ultimately what I did was, and using information from How to find the difference between two dates in hours in R?:
forwards direction:
lh$time <- as.numeric( difftime(ISOdate(lh$year,lh$month,lh$day,lh$hour), ISOdate(2004,1,1,0), units="hours"))
lh$year <- NULL; lh$month <- NULL; lh$day <- NULL; lh$hour <- NULL
backwards direction:
... well, I didn't do backwards yet, but I imagine something like:
create difftime object out of lh$time (somehow...)
add ISOdate(2004,1,1,0) to difftime object
use one of the solution below to get the year,month,day, hour back
I suppose in the future, I could ask the exact problem I'm trying to solve, but I was trying to factorize my specific problem into generic reusable questions, but maybe that was a mistake?
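The backwards direction outlined in the question could be sketched roughly as follows, using only base R. The hour values are made up for the example; the origin ISOdate(2004,1,1,0) is the one from the question, and everything is kept in GMT, which is what ISOdate() uses by default.

```r
origin <- ISOdate(2004, 1, 1, 0)        # POSIXct, GMT
hours  <- c(1, 2, 8784)                 # made-up values; 8784 = 366 * 24

# hours since origin -> date-time
stamp <- origin + as.difftime(hours, units = "hours")

# date-time -> numeric components (POSIXlt: year is offset by 1900, mon is 0-based)
lt <- as.POSIXlt(stamp, tz = "GMT")
back <- data.frame(year  = lt$year + 1900,
                   month = lt$mon + 1,
                   day   = lt$mday,
                   hour  = lt$hour)

# forwards again, as in the question, to check the round trip
hours_again <- as.numeric(difftime(stamp, origin, units = "hours"))
```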
Because there are so many ways in which a date can be passed in from files, databases etc and for the reason you mention of just being written in different orders or with different separators, representing the inputted date as a character string is a convenient and useful solution. R doesn't hold the actual dates as strings and you don't need to process them as strings to work with them.
Internally R is using the operating system to do these things in a standard way. You don't need to manipulate strings at all - just perhaps convert some things from character to their numerical equivalent. For example, it is quite easy to wrap up both operations (forwards and backwards) in simple functions you can deploy.
toDate <- function(year, month, day) {
ISOdate(year, month, day)
}
toNumerics <- function(Date) {
stopifnot(inherits(Date, c("Date", "POSIXt")))
day <- as.numeric(strftime(Date, format = "%d"))
month <- as.numeric(strftime(Date, format = "%m"))
year <- as.numeric(strftime(Date, format = "%Y"))
list(year = year, month = month, day = day)
}
I forgo a single call to strftime() and subsequent splitting on a separator character because you don't like that kind of manipulation.
> toDate(2004, 12, 21)
[1] "2004-12-21 12:00:00 GMT"
> toNumerics(toDate(2004, 12, 21))
$year
[1] 2004
$month
[1] 12
$day
[1] 21
Internally R's datetime code works well and is well tested and robust if a bit complex in places because of timezone issues etc. I find the idiom used in toNumerics() more intuitive than having a date time as a list and remembering which elements are 0-based. Building on the functionality provided would seem easier than trying to avoid string conversions etc.
I'm a bit late to the party, but one other way to convert from integers to date is the lubridate::make_date function. See the example below from R for Data Science:
library(lubridate)
library(nycflights13)
library(tidyverse)
a <- flights %>%
mutate(date = make_date(year, month, day))
Found one solution for going from date to year,month,day.
Let's say we have a date object, that we'll create here using ISOdate:
somedate <- ISOdate(2004,12,21)
Then, we can get the numerical components of this as follows:
unclass(as.POSIXlt(somedate))
Gives:
$sec
[1] 0
$min
[1] 0
$hour
[1] 12
$mday
[1] 21
$mon
[1] 11
$year
[1] 104
Then one can get what one wants for example:
unclass(as.POSIXlt(somedate))$mon
Note that $year is [actual year] - 1900, $mon is 0-based, and $mday is 1-based (as per the POSIX standard).

Resources