Calculate time difference on only months and days - r

I would like to calculate the time difference considering ONLY the days and months.
For example:
> as.Date("2018-12-15")-as.Date("2018-12-16")
Time difference of -1 days
> as.Date("2008-12-15")-as.Date("2018-12-16")
Time difference of -3653 days
I want both of them to return -1.
Edit:
Leap years should not be considered, as we just want an approximation and the return value does not need to be exact.

As suggested by @Omry Atia, we can set the year component of both dates to the same year and then calculate the difference.
library(lubridate)
get_difference_without_years <- function(x, y) {
x <- ymd(x)
year(x) <- 2018
y <- ymd(y)
year(y) <- 2018
x - y
}
get_difference_without_years("2018-12-15", "2018-12-16")
#Time difference of -1 days
get_difference_without_years("2008-12-15", "2018-12-16")
#Time difference of -1 days
To keep it in base R
get_difference_without_years <- function(x, y) {
x <- as.Date(paste0("2018-", format(as.Date(x), "%m-%d")))
y <- as.Date(paste0("2018-", format(as.Date(y), "%m-%d")))
x - y
}
get_difference_without_years("2008-12-15", "2018-12-16")
#Time difference of -1 days
get_difference_without_years("2018-12-15", "2018-12-16")
#Time difference of -1 days

The question is not well defined when the dates straddle the end of February and one year is a leap year while the other is not. Ignoring that case, we can replace the year in each date with a leap year (2000) if either date falls in a leap year, and with a non-leap year (1999) otherwise, and then subtract:
library(lubridate)
d1 <- "2008-12-15"
d2 <- "2018-12-16"
yr <- 1999 + (leap_year(as.Date(d1)) || leap_year(as.Date(d2)))
as.Date(sub("....", yr, d1)) - as.Date(sub("....", yr, d2))
## Time difference of -1 days
ADDED
In a comment the poster indicated that we can ignore the problems introduced by leap years. In that case we can just pick a leap year as the year to substitute in, so that it always returns an answer. We do that below. We no longer need lubridate to check whether the dates fall in leap years.
as.Date(sub("....", 2000, d1)) - as.Date(sub("....", 2000, d2))
## Time difference of -1 days
(Alternatively, we could pick a year that is not a leap year. Since most years are not leap years, the result would less often be one day off for dates that straddle the end of February; however, it would fail if one of the dates is Feb 29th.)

If we're allowed to be a bit more approximate, ignoring leap-years, we can simplify things a bit by using %j (day of year) in format().
yd_diff <- function(x, y=NULL) {
x <- as.integer(format(x, "%j"))
if (is.null(y)) {
diff(x)
} else {
x - as.integer(format(y, "%j"))
}
}
d1 <- as.Date("2008-12-15")
d2 <- as.Date("2018-12-16")
yd_diff(d1, d2)
# 0
set.seed(1)
rd <- as.Date(sample(1:10000, 5), origin="1970-01-01")
yd_diff(rd)
# -30 180 65 -123
Even simpler, we can convert the dates to integers and take them modulo the number of days in a year. Conveniently, R allows the modulo operator with non-integer operands.
(as.integer(d1) %% 365.24) - (as.integer(d2) %% 365.24)
# -0.6
diff(as.integer(rd) %% 365.24)
# -30.72 180.80 64.84 -123.44

Another solution might be to extract only the day-of-year (DoY) from each date and then subtract, especially if leap years are important.
For example, the DoY for the following:
DayOfYear(2020, 12, 15) = 350 # leap year
DayOfYear(2018, 12, 15) = 349
DayOfYear(2016, 12, 15) = 350 # leap year
DayOfYear(2011, 12, 16) = 350
You can find many suggestions on how to get the DoY in the questions "extract day number of year from dates" and "How do you convert POSIX date to day of year in R?".
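As a minimal base R sketch of the day-of-year idea: format() with "%j" returns the 1-based day of the year, and subtracting two such values ignores the year entirely. The helper names day_of_year and doy_diff are made up for this illustration and are not from any package.

```r
# Day of year (1 = January 1st) for a vector of dates, base R only.
day_of_year <- function(d) as.integer(format(as.Date(d), "%j"))

# Difference in day-of-year between two dates, ignoring the year.
# (Hypothetical helper; in leap years, dates after Feb 28 shift by one.)
doy_diff <- function(x, y) day_of_year(x) - day_of_year(y)

day_of_year(c("2020-12-15", "2018-12-15", "2016-12-15", "2011-12-16"))
# 350 349 350 350
doy_diff("2018-12-15", "2018-12-16")
# -1
```

Note that doy_diff("2008-12-15", "2018-12-16") gives 0 rather than -1, because 2008 is a leap year; that is the approximation this approach accepts.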

Related

Count leap days for different date ranges in R

I am using R and lubridate.
I need to count the number of leap days occurring in a bunch of different date ranges. I have done lots of googling but most results seem to just want to find out if certain years are leap years but do not consider where you are starting and ending within each year, or are for different programs I am not familiar with.
I was thinking a function would be the best way to go but was struggling on getting the code down.
My idea was to count the number of leap years in the date range using lubridate's leap_year function, and then check the partial years at the beginning and end of the period and add/subtract to the leap year count if needed.
start_date <- as.Date("2008-03-31")
end_date <- as.Date("2020-09-30")
years_list <- seq(start_date, end_date, by="years")
leap_days <- sum(leap_year(years_list))
The next step would be to check the partial years and add/subtract from leap_days when needed, which is where I am struggling. The desired result for this situation would be 3 (leap years in 2012, 2016, and 2020). Ultimately, I would be checking lots of different date ranges, not just this one.
Any help is appreciated.
If you accept the premise that a "leap day" is always February 29, then perhaps
grep("-02-29", seq(start_date, end_date, by = "day"), value = TRUE)
# [1] "2012-02-29" "2016-02-29" "2020-02-29"
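This function seems to work, returning the total count of leap days. Note that it treats the first and last elements of x as the start and end of the range.
count_leap_days <- function(x){
  if(!require(lubridate)){
    stop("install package 'lubridate'")
  }
  # a leap first year only counts if the range starts in January or February
  first_leap <- if(leap_year(x[1])) month(x[1]) %in% 1:2
  x <- x[-1]
  n <- length(x)
  last_leap <- NULL  # stays NULL unless the last year is a leap year
  if(n > 0){
    if(leap_year(x[n])) {
      # a leap last year only counts if the range reaches Feb 29
      last_leap <- (month(x[n]) >= 3) || (month(x[n]) == 2 && day(x[n]) == 29)
      x <- x[-n]
    }
  }
  ly <- c(first_leap, leap_year(x), last_leap)
  sum(ly)
}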
count_leap_days(years_list)
#[1] 3
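For many date ranges it may be cheaper not to build a daily sequence at all. The following is a sketch in base R, assuming (as above) that a "leap day" is always February 29 and that both endpoints are inclusive; count_feb29 is an illustrative name, not an existing function.

```r
# Count the Feb 29ths that fall within [start, end], endpoints included.
count_feb29 <- function(start, end) {
  start <- as.Date(start)
  end <- as.Date(end)
  yrs <- seq(as.integer(format(start, "%Y")), as.integer(format(end, "%Y")))
  # Gregorian leap year rule
  is_leap <- (yrs %% 4 == 0 & yrs %% 100 != 0) | yrs %% 400 == 0
  # sprintf() returns character(0) when there are no leap years,
  # so the sum below is then simply 0
  feb29 <- as.Date(sprintf("%d-02-29", yrs[is_leap]), format = "%Y-%m-%d")
  sum(feb29 >= start & feb29 <= end)
}

count_feb29("2008-03-31", "2020-09-30")
# 3
```

The function handles one range at a time; for a table of ranges it could be applied row by row, for example with mapply(count_feb29, starts, ends).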

Converting Julian days into date format yyyy-dd-mm HH:MM:SS

With the help of the following function (original from R - convert POSIXct to fraction of julian day), I convert different starting points of observations, from date format to Julian days.
date <- as.POSIXct(c('2006-12-12 13:00:00', '2008-12-12 12:00:00', '2007-12-12 12:00:00'))
julian_conv <- function(x) {
if (is.na(x)) {
return(NA)
}
else {
j <-julian(x, origin = as.POSIXlt(paste0(format(x, "%Y"),'-01-01')))
temp <- unclass(j)
return(temp[1] + 1)
}
}
julian.days <- sapply(date, julian_conv)
The results:
print(julian.days)
[1] 346.5417 347.5417 346.5000
Then I took the average of those starting points.
mean <- mean(julian.days)
[1] 346.8611
Now I need to convert the average starting point back into a date format (yyyy-dd-mm HH:MM:SS), once for a common year and once for a leap year. How is this possible?
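You can add the Julian days, converted to seconds, to a date-time at the start of the year. Because julian_conv() adds 1 so that January 1st counts as day 1, subtract 1 again first (UTC is used here to sidestep daylight saving shifts):
2006 was a common year with 365 days:
as.POSIXct("2006-01-01", tz = "UTC") + (mean(julian.days) - 1) * 86400
2008 was a leap year:
as.POSIXct("2008-01-01", tz = "UTC") + (mean(julian.days) - 1) * 86400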
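As a sketch of the full round trip, assuming UTC throughout and the same numbering as julian_conv() above (January 1st, 00:00 is day 1.0): to_frac_doy and from_frac_doy are illustrative names, not existing functions.

```r
# Fractional day of year (1.0 = January 1st, 00:00) for a POSIXct in UTC.
to_frac_doy <- function(x) {
  start <- as.POSIXct(paste0(format(x, "%Y"), "-01-01"), tz = "UTC")
  as.numeric(difftime(x, start, units = "days")) + 1
}

# Inverse: fractional day of year plus a target year back to POSIXct (UTC).
from_frac_doy <- function(doy, year) {
  as.POSIXct(paste0(year, "-01-01"), tz = "UTC") + (doy - 1) * 86400
}

x <- as.POSIXct("2006-12-12 13:00:00", tz = "UTC")
d <- to_frac_doy(x)        # about 346.5417
from_frac_doy(d, 2006)     # recovers 2006-12-12 13:00:00 UTC
from_frac_doy(d, 2008)     # same day number in a leap year: one calendar day earlier
```

Subtracting 1 before adding matters: without it the result lands one day late, because adding the raw day number to January 1st counts that day twice.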

R Programming 30 day Months

I'm currently writing a script in the R Programming Language and I've hit a snag.
I have time series data organized in a way where there are 30 days in each month for 12 months in 1 year. However, I need the data organized in a proper 365 days in a year calendar, as in 30 days in a month, 31 days in a month, etc.
Is there a simple way for R to recognize there are 30 days in a month and to operate within that parameter? At the moment I have my script converting the number of days from the source in UNIX time and it counts up.
For example:
startingdate <- "20060101"
endingdate <- "20121230"
date <- seq(from = as.Date(startingdate, "%Y%m%d"), to = as.Date(endingdate, "%Y%m%d"), by = "days")
This would generate an array of dates with each month having 29 days/30 days/31 days etc. However, my data is currently organized as 30 days per month, regardless of 29 days or 31 days present.
Thanks.
The first 4 solutions are basically variations of the same theme using expand.grid. (3) uses magrittr and the others use no packages. The last two work by creating a long sequence of numbers and then picking out the ones that have month and day in range.
1) apply This gives a series of yyyymmdd numbers such that there are 30 days in each month. Note that the line defining yrs in this case is the same as yrs <- 2006:2012 so if the years are handy we could shorten that line. Omit as.numeric in the line defining s if you want character string output instead. Also, s and d are the same because we have whole years so we could omit the line defining d and use s as the answer in this case and also in general if we are always dealing with whole years.
startingdate <- "20060101"
endingdate <- "20121230"
yrs <- seq(as.numeric(substr(startingdate, 1, 4)), as.numeric(substr(endingdate, 1, 4)))
g <- expand.grid(yrs, sprintf("%02d", 1:12), sprintf("%02d", 1:30))
s <- sort(as.numeric(apply(g, 1, paste, collapse = "")))
d <- s[ s >= as.numeric(startingdate) & s <= as.numeric(endingdate) ] # optional if whole years
Run some checks.
head(d)
## [1] 20060101 20060102 20060103 20060104 20060105 20060106
tail(d)
## [1] 20121225 20121226 20121227 20121228 20121229 20121230
length(d) == length(2006:2012) * 12 * 30
## [1] TRUE
2) no apply An alternative variation would be this. In this and the following solutions we are using yrs as calculated in (1) so we omit it to avoid redundancy. Also, in this and the following solutions, the corresponding line to the one setting d is omitted, again, to avoid redundancy -- if you don't have whole years then add the line defining d in (1) replacing s in that line with s2.
g2 <- expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30))
s2 <- with(g2, sort(as.numeric(paste0(yr, mon, day))))
3) magrittr This could also be written using magrittr like this:
library(magrittr)
expand.grid(yr = yrs, mon = sprintf("%02d", 1:12), day = sprintf("%02d", 1:30)) %>%
with(paste0(yr, mon, day)) %>%
as.numeric %>%
sort -> s3
4) do.call Another variation.
g4 <- expand.grid(yrs, 1:12, 1:30)
s4 <- sort(as.numeric(do.call("sprintf", c("%d%02d%02d", g4))))
5) subset sequence Create a sequence of numbers from the starting date to the ending date and if each number is of the form yyyymmdd pick out those for which mm and dd are in range.
seq5 <- seq(as.numeric(startingdate), as.numeric(endingdate))
d5 <- seq5[ seq5 %/% 100 %% 100 %in% 1:12 & seq5 %% 100 %in% 1:30]
6) grep Using seq5 from (5)
d6 <- as.numeric(grep("(0[1-9]|1[0-2])(0[1-9]|[12][0-9]|30)$", seq5, value = TRUE))
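Here's an alternative (this assumes startingdate and endingdate have first been converted to Date objects, so that unclass() returns day counts):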
date <- unclass(startingdate):unclass(endingdate) %% 30L
month <- rep(1:12, each = 30, length.out = NN <- length(date))
year <- rep(1:(NN %/% 360 + 1), each = 360, length.out = NN)
(of course, we can easily adjust by adding constants to taste if you want a specific day to be 0, or a specific month, etc.)
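The index arithmetic behind the last approach can be sketched as follows, assuming the data consist of whole years of exactly 360 records each (12 months of 30 days), in order. The function name calendar_360 and its parameters are illustrative only.

```r
# Build year/month/day labels for a 360-day calendar (12 months x 30 days).
# start_year and n_years are illustrative parameters.
calendar_360 <- function(start_year, n_years) {
  i <- seq_len(n_years * 360L) - 1L          # 0-based index of each record
  year  <- as.integer(start_year) + i %/% 360L
  month <- (i %% 360L) %/% 30L + 1L          # 1..12
  day   <- i %% 30L + 1L                     # 1..30
  data.frame(year, month, day,
             label = sprintf("%d%02d%02d", year, month, day),
             stringsAsFactors = FALSE)
}

cal <- calendar_360(2006, 7)
head(cal$label, 3)
# "20060101" "20060102" "20060103"
tail(cal$label, 1)
# "20121230"
```

Binding these columns to the time series keeps the 30-day structure explicit; mapping the records onto a true 365-day calendar still requires a decision about the missing or surplus days, which this sketch leaves open.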

Count the number of Fridays or Mondays in Month in R

I would like a function that counts the number of specific weekdays per month,
e.g. Nov '13 -> 5 Fridays, while Dec '13 would return 4 Fridays.
Is there an elegant function that would return this?
library(lubridate)
num_days <- function(date){
x <- as.Date(date)
start = floor_date(x, "month")
count = days_in_month(x)
d = wday(start)
sol = ifelse(d > 4, 5, 4) # rough estimate: if the month starts late in the week (wday > 4), assume it has 5 Fridays
sol
}
num_days("2013-08-01")
num_days(today())
What would be a better way to do this?
1) Here d is the input, a Date class object, e.g. d <- Sys.Date(). The result gives the number of Fridays in the year/month that contains d. Replace 5 with 1 to get the number of Mondays:
first <- as.Date(cut(d, "month"))
last <- as.Date(cut(first + 31, "month")) - 1
sum(format(seq(first, last, "day"), "%w") == 5)
2) Alternatively, replace the last line with the following line. Here, the first term counts the Fridays from the Epoch (1970-01-01, a Thursday) through the last day of d's month, and the second term counts the Fridays from the Epoch up to, but not including, the first of d's month. Again, we replace both 5's with 1's to get the count of Mondays.
ceiling(as.numeric(last + 1 - 5 + 4) / 7) - ceiling(as.numeric(first - 5 + 4) / 7)
The second solution is slightly longer (although it has the same number of lines) but it has the advantage of being vectorized, i.e. d could be a vector of dates.
UPDATE: Added second solution.
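The vectorized idea can be wrapped into a small base R function. This is a sketch using an equivalent floor() formulation rather than the exact expression above; count_weekday is an illustrative name, and wday follows the "%w" convention (0 = Sunday, ..., 6 = Saturday).

```r
# Count how often weekday `wday` occurs in the month containing each
# element of `d`. Vectorized over `d`.
count_weekday <- function(d, wday = 5) {
  d <- as.Date(d)
  first <- as.Date(format(d, "%Y-%m-01"))                # first of the month
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))  # first of next month
  a <- as.numeric(first)       # days since 1970-01-01 (a Thursday)
  b <- as.numeric(next_first)
  # Day n has weekday (n + 4) %% 7, so count the n in [a, b) for which
  # (n + 4 - wday) is divisible by 7.
  floor((b + 3 - wday) / 7) - floor((a + 3 - wday) / 7)
}

count_weekday(c("2013-11-15", "2013-12-15"), 5)  # Fridays
# 5 4
count_weekday("2013-11-15", 1)                   # Mondays
# 4
```

Adding 31 days to the first of a month always lands in the following month, which is why next_first can be computed without special-casing December.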
There are a number of ways to do it. Here is one:
countFridays <- function(y, m) {
fr <- as.Date(paste(y, m, "01", sep="-"))
to <- fr + 31
dt <- seq(fr, to, by="1 day")
df <- data.frame(date=dt, mon=as.POSIXlt(dt)$mon, wday=as.POSIXlt(dt)$wday)
df <- subset(df, df$wday==5 & df$mon==df[1,"mon"])
return(nrow(df))
}
It creates the first of the month and a day in the next month.
It then creates a data frame of the dates with their month index (on a 0 to 11 range, but we only use this for comparison) and weekday.
We then subset to rows that are a) in the same month and b) on a Friday. That is your result set, and
we return the number of rows as your answer.
Note that this only uses base R code.
Without using lubridate -
#arguments to pass to function:
whichweekday <- 5
whichmonth <- 11
whichyear <- 2013
#function code:
firstday <- as.Date(paste('01',whichmonth,whichyear,sep="-"),'%d-%m-%Y')
lastday <- seq(firstday, length.out = 2, by = "1 month")[2] - 1 # last day of the month; also works for December
sum(
strftime(
seq.Date(
from = firstday,
to = lastday,
by = "day"),
'%w'
) == whichweekday)

Get date difference in years (floating point)

I want to correct source activity based on the difference between reference and measurement date and source half life (measured in years). Say I have
ref_date <- as.Date('06/01/08',format='%d/%m/%y')
and a column in my data.frame with the same date format, e.g.,
today <- as.Date(Sys.Date(), format='%d/%m/%y')
I can find the number of years between these dates using the lubridate package
year(today)-year(ref_date)
[1] 5
Is there a function I can use to get a floating point answer today - ref_date = 5.2y, for example?
Yes, of course, use difftime() with as.numeric():
R> as.numeric(difftime(as.Date("2003-04-05"), as.Date("2001-01-01"),
+ units="weeks"))/52.25
[1] 2.2529
Note that we do have to switch to weeks scaled by 52.25, as there is a bit of ambiguity
in counting years: a February 29 comes around every 4 years, but not in every 100th year, and so on.
So you have to define what a year means. difftime() handles all time units up to weeks. Months cannot be done for the same reason: their length is not constant.
The lubridate package contains a built-in function, time_length, which can help perform this task.
time_length(difftime(as.Date("2003-04-05"), as.Date("2001-01-01")), "years")
[1] 2.257534
time_length(difftime(as.Date("2017-03-01"), as.Date("2012-03-01")),"years")
[1] 5.00274
Documentation for the lubridate package can be found here.
Inspired by Bryan F's answer: time_length() works better when given an interval object.
time_length(interval(as.Date("2003-04-05"), as.Date("2001-01-01")), "years")
[1] -2.257534
time_length(difftime(as.Date("2017-03-01"), as.Date("2012-03-01")),"years")
[1] 5.00274
time_length(interval(as.Date("2017-03-01"), as.Date("2012-03-01")),"years")
[1] -5
As you can see, if you use interval() to get the time difference and then pass it to time_length(), the result takes into account that not all months and years have the same number of days (leap years, for example).
Not an exact answer to your question, but the answer from Dirk Eddelbuettel can produce small errors in some situations.
Please consider the following example:
as.numeric(difftime(as.Date("2012-03-01"), as.Date("2017-03-01"), units="weeks"))/52.25
[1] -4.992481
The correct answer here should be 5 full years (in absolute value).
The following function (using lubridate package) will calculate a number of full years between two dates:
# Function to calculate an exact full number of years between two dates
year.diff <- function(firstDate, secondDate) {
yearsdiff <- year(secondDate) - year(firstDate)
monthsdiff <- month(secondDate) - month(firstDate)
daysdiff <- day(secondDate) - day(firstDate)
if ((monthsdiff < 0) || (monthsdiff == 0 && daysdiff < 0)) {
yearsdiff <- yearsdiff - 1
}
yearsdiff
}
You can modify it to calculate a fractional part depending on how you define the number of days in the last (not finished) year.
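One possible way to add the fractional part, sketched in base R: count the full anniversaries, then divide the remaining days by the length of the current anniversary year. This is only one of several reasonable definitions; frac_years is an illustrative name, and the function expects single dates with to on or after from.

```r
# Years between two dates as full anniversaries plus the elapsed
# fraction of the current anniversary year. Scalar inputs, to >= from.
frac_years <- function(from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  stopifnot(length(from) == 1, length(to) == 1, to >= from)
  # k-th anniversary of `from` (k = 0 returns `from` itself)
  anniv <- function(k) seq(from, by = "year", length.out = k + 1)[k + 1]
  n <- as.integer(format(to, "%Y")) - as.integer(format(from, "%Y"))
  if (anniv(n) > to) n <- n - 1      # last anniversary not reached yet
  lo <- anniv(n)                     # most recent anniversary
  hi <- anniv(n + 1)                 # next anniversary
  n + as.numeric(to - lo) / as.numeric(hi - lo)
}

frac_years("2012-03-01", "2017-03-01")
# 5
frac_years("2001-01-01", "2003-04-05")
# 2.257534
```

With this definition an exact anniversary always yields a whole number, at the cost that a given number of days can map to slightly different fractions depending on whether the current anniversary year contains a Feb 29.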
You can use the function AnnivDates() of the package BondValuation:
R> library('BondValuation')
R> DateIndexes <- unlist(
+ suppressWarnings(
+ AnnivDates("2001-01-01", "2003-04-05", CpY=1)$DateVectors[2]
+ )
+ )
R> names(DateIndexes) <- NULL
R> DateIndexes[length(DateIndexes)] - DateIndexes[1]
[1] 2.257534
Click here for documentation of the package BondValuation.
To get the date difference in years (floating point), you can convert the dates to decimal years and then calculate their difference.
#Example Dates
x <- as.Date(c("2001-01-01", "2003-04-05"))
#Convert Date to decimal year:
date2DYear <- function(x) {
as.numeric(format(x,"%Y")) + #Get the year and add
(as.numeric(format(x,"%j")) - 0.5) / #Day of the year divided by
as.numeric(format(as.Date(paste0(format(x,"%Y"), "-12-31")),"%j")) #days of the year
}
diff(date2DYear(x)) #Get the difference in years
#[1] 2.257534
I subtract 0.5 from the day of the year as it is not known if you are at the beginning or the end of the day and %j starts with 1.
I think the difference between 2012-03-01 and 2017-03-01 need not be exactly 5 years, as 2012 has 366 days and 2017 has 365, so 2012-03-01 is the 61st day of its year and 2017-03-01 is the 60th.
x <- as.Date(c("2012-03-01", "2017-03-01"))
diff(date2DYear(x))
#[1] 4.997713
Note that time_length with interval from lubridate need not give the same result when you add up time differences cumulatively.
library(lubridate)
x <- as.Date(c("2012-01-01", "2012-03-01", "2012-12-31"))
time_length(interval(x[1], x[3]), "years")
#[1] 0.9972678
time_length(interval(x[1], x[2]), "years") +
time_length(interval(x[2], x[3]), "years")
#[1] 0.9995509 #!
diff(date2DYear(x[c(1,3)]))
#[1] 0.9972678
diff(date2DYear(x[c(1,2)])) + diff(date2DYear(x[c(2,3)]))
#[1] 0.9972678
x <- as.Date(c("2013-01-01", "2013-03-01", "2013-12-31"))
time_length(interval(x[1], x[3]), "years")
#[1] 0.9972603
time_length(interval(x[1], x[2]), "years") +
time_length(interval(x[2], x[3]), "years")
#[1] 0.9972603
diff(date2DYear(x[c(1,3)]))
#[1] 0.9972603
diff(date2DYear(x[c(1,2)])) + diff(date2DYear(x[c(2,3)]))
#[1] 0.9972603
Since you are already using the lubridate package, you can obtain the number of years as a floating-point value with a simple trick.
First, find the number of seconds in one year:
seconds_in_a_year <- as.integer((seconds(ymd("2010-01-01")) - seconds(ymd("2009-01-01"))))
Now obtain the number of seconds between the two dates of interest (date1 and date2 stand for your own dates):
seconds_between_dates <- as.integer(seconds(date1) - seconds(date2))
Your final answer for the number of years in floating point is then:
years_between_dates <- seconds_between_dates / seconds_in_a_year

Resources