I'm trying to calculate the difference-in-years between 2 dates.
time1 <- as.Date(x = "2017-02-14",
format = "%Y-%m-%d",
origin = "",
tz = "")
time2 <- as.Date(x = "1972-02-17",
format = "%Y-%m-%d",
origin = "",
tz = "")
as.integer(x = difftime(time1 = time1,
time2 = time2,
tz = "",
units = "days")) / 365
According to the above code, the difference-in-years is about 45.02466.
A glance at the 2 dates confirms that the difference-in-years is close to, but less than, 45.
The difference-in-years value should begin with 44.
Why is difftime inflating the number of years by 1?
Probably this happens since you divide by 365 days, whereas at least 44/4 = 11 years had a duration of 366 days. You could consider to divide by 365.25, which is a more correct average duration of a year.
Also, you might like to look up the lubridate package to see the interval of how many years have passed between two dates
library(lubridate)
> time1 <- as.Date(x = "2017-02-14",
+ format = "%Y-%m-%d",
+ origin = "",
+ tz = "")
>
> time2 <- as.Date(x = "1972-02-17",
+ format = "%Y-%m-%d",
+ origin = "",
+ tz = "")
>
>
> interval(time2, time1) %/% years(1)
[1] 44
There are an average of 365.25 days in a year due to leap years -- could be slightly different if the number of years is not a multiple of 4 but this is a reasonable long term average to use. By using 365 instead this inflated the answer in the question.
time1 <- as.Date("2017-02-14")
time2 <- as.Date("1972-02-17")
as.numeric(time1 - time2) / 365.25
## [1] 44.99384
Standardized years
If you want to be really exact about leap years there were 12 leap days between time1 and time2
tt <- seq(time2, time1, by = "day")
no_leap_days <- sum(format(tt, "%m-%d") == "02-29")
no_leap_days
## [1] 12
so another way to do this is to calculate the number of days between the two dates, subtract the number of leap days effectively standardizing every year to 365 days and then divide by 365:
as.numeric(time1 - time2 - no_leap_days) / 365
## [1] 44.99178
Note
Note that "Date" class does not have a time zone, origin is only used in as.Date if the argument is numeric and the format used in the question is the default format so format is not needed.
Related
I have a starting time specified as a year-month character, e.g. "2020-12". From the start, for each of T consecutive months, I need to generate n different dates (year-month-day), where the day is random.
Any help will be useful!
The data I'm working on:
data <- data.frame(
data = sample(seq(as.Date('2000/01/01'), as.Date('2020/01/01'), by="day"), 500),
price = round(runif(500, min = 10, max = 20),2),
quantity = round(rnorm(500,30),0)
)
func <- function(start, months, n) {
startdate <- as.Date(paste0(start, "-01"))
enddate <- seq(startdate, by = "month", length.out = months)
months <- seq_len(months)
enddate_lt <- as.POSIXlt(enddate)
enddate_lt$mon <- enddate_lt$mon + 1
enddate_lt$mday <- enddate_lt$mday - 1
days_per_month <- as.integer(format(enddate_lt, format = "%d"))
days <- lapply(days_per_month, sample, size = n)
dates <- Map(`+`, enddate, days)
do.call(c, dates)
}
set.seed(2021)
func("2020-12", 4, 3)
# [1] "2020-12-08" "2020-12-07" "2020-12-15" "2021-01-27" "2021-01-08" "2021-01-13" "2021-02-21" "2021-02-07" "2021-02-28"
# [10] "2021-03-28" "2021-03-07" "2021-03-15"
func("2020-12", 5, 2)
# [1] "2020-12-06" "2020-12-16" "2021-01-08" "2021-01-10" "2021-02-24" "2021-02-13" "2021-03-20" "2021-03-29" "2021-04-19"
# [10] "2021-04-28"
func("2020-12", 2, 10)
# [1] "2020-12-29" "2020-12-30" "2020-12-04" "2020-12-15" "2020-12-09" "2020-12-27" "2020-12-05" "2020-12-06" "2020-12-23"
# [10] "2020-12-17" "2021-01-03" "2021-01-20" "2021-01-05" "2021-01-22" "2021-01-23" "2021-01-06" "2021-01-10" "2021-01-07"
# [19] "2021-01-19" "2021-01-12"
Most of the dancing with POSIXlt objects is because it gives us clean (base R) access to the number of days in a month, which makes sampleing the days in a month rather simple. It can also be done (code-golf shorter) using the lubridate package, but I don't know that that is any more correct than this code is.
This just dumps out a sequence of random dates, with n days per month. It does not sort within each month, though it does output the months in order. (That's not a difficult extension, there just wasn't a requirement for it.) It doesn't put out a frame, you can easily extend this to fit in a frame or call data.frame(date = do.call(c, dates)) on the last line, depending on what you need to do with the output.
You could convert the start time to a class for monthly data, zoo::yearmon. Then use as.Date.yearmon and its frac argument ("a number between 0 and 1 inclusive that indicates the fraction of the way through the period that the result represents") with random values from runif (uniform between 0 and 1) to convert to a random date within each year-month.
start = "2020-12"
T = 3
n = 2
library(zoo)
set.seed(1)
as.Date(as.yearmon(start) + rep((1:T)/12, each = n), frac = runif(T * n))
# [1] "2021-01-08" "2021-01-12" "2021-02-16" "2021-02-25" "2021-03-07" "2021-03-27"
i have time string as "08:00","06:00"
and i wanna calculate difference between them
and divide it by 15 mins.
then results should be 8 in integer
i don't how to code in R
anybody can help me ?
Something like this using difftime?
difftime(
as.POSIXct("08:00", format = "%H:%M"),
as.POSIXct("06:00", format = "%H:%M"),
units = "mins") / 15
#Time difference of 8 mins
Or to convert to numeric
as.numeric(
difftime(as.POSIXct("08:00", format = "%H:%M"),
as.POSIXct("06:00", format = "%H:%M"),
units = "mins") / 15)
#[1] 8
It would be easy with lubridate, where we convert the strings in hm format and divide by 15 minutes.
library(lubridate)
(hm(a) - hm(b))/minutes(15)
#[1] 8
data
a <- "08:00"
b <- "06:00"
How can a date/time object in R be transformed on the fraction of a julian day?
For example, how can I turn this date:
date <- as.POSIXct('2006-12-12 12:00:00',tz='GMT')
into a number like this
> fjday
[1] 365.5
where julian day is elapsed day counted from the january 1st. The fraction 0.5 means that it's 12pm, and therefore half of the day.
This is just an example, but my real data covers all the 365 days of year 2006.
Since all your dates are from the same year (2006) this should be pretty easy:
julian(date, origin = as.POSIXct('2006-01-01', tz = 'GMT'))
If you or another reader happen to expand your dataset to other years, then you can set the origin for the beginning of each year as follows:
sapply(date, function(x) julian(x, origin = as.POSIXct(paste0(format(x, "%Y"),'-01-01'), tz = 'GMT')))
Have a look at the difftime function:
> unclass(difftime('2006-12-12 12:00:00', '2006-01-01 00:00:00', tz="GMT", units = "days"))
[1] 345.5
attr(,"units")
[1] "days"
A function to convert POSIX to julian day, an extension of the answer above, source it before using.
julian_conv <- function(x) {
if (is.na(x)) { # Because julian() cannot accept NA values
return(NA)
}
else {
j <-julian(x, origin = as.POSIXlt(paste0(format(x, "%Y"),'-01-01')))
temp <- unclass(j) # To unclass the object julian day to extract julian day
return(temp[1] + 1) # Because Julian day 1 is 1 e.g., 2016-01-01
}
}
Example:
date <- as.POSIXct('2006-12-12 12:00:00')
julian_conv(date)
#[1] 345.5
I have two dates let´s say 14.01.2013 and 26.03.2014.
I would like to get the difference between those two dates in terms of weeks(?), months(in the example 14), quarters(4) and years(1).
Do you know the best way to get this?
what about this:
# get difference between dates `"01.12.2013"` and `"31.12.2013"`
# weeks
difftime(strptime("26.03.2014", format = "%d.%m.%Y"),
strptime("14.01.2013", format = "%d.%m.%Y"),units="weeks")
Time difference of 62.28571 weeks
# months
(as.yearmon(strptime("26.03.2014", format = "%d.%m.%Y"))-
as.yearmon(strptime("14.01.2013", format = "%d.%m.%Y")))*12
[1] 14
# quarters
(as.yearqtr(strptime("26.03.2014", format = "%d.%m.%Y"))-
as.yearqtr(strptime("14.01.2013", format = "%d.%m.%Y")))*4
[1] 4
# years
year(strptime("26.03.2014", format = "%d.%m.%Y"))-
year(strptime("14.01.2013", format = "%d.%m.%Y"))
[1] 1
as.yearmon() and as.yearqtr() are in package zoo. year() is in package lubridate.
What do you think?
All the existing answers are imperfect (IMO) and either make assumptions about the desired output or don't provide flexibility for the desired output.
Based on the examples from the OP, and the OP's stated expected answers, I think these are the answers you are looking for (plus some additional examples that make it easy to extrapolate).
(This only requires base R and doesn't require zoo or lubridate)
Convert to Datetime Objects
date_strings = c("14.01.2013", "26.03.2014")
datetimes = strptime(date_strings, format = "%d.%m.%Y") # convert to datetime objects
Difference in Days
You can use the diff in days to get some of our later answers
diff_in_days = difftime(datetimes[2], datetimes[1], units = "days") # days
diff_in_days
#Time difference of 435.9583 days
Difference in Weeks
Difference in weeks is a special case of units = "weeks" in difftime()
diff_in_weeks = difftime(datetimes[2], datetimes[1], units = "weeks") # weeks
diff_in_weeks
#Time difference of 62.27976 weeks
Note that this is the same as dividing our diff_in_days by 7 (7 days in a week)
as.double(diff_in_days)/7
#[1] 62.27976
Difference in Years
With similar logic, we can derive years from diff_in_days
diff_in_years = as.double(diff_in_days)/365 # absolute years
diff_in_years
#[1] 1.194406
You seem to be expecting the diff in years to be "1", so I assume you just want to count absolute calendar years or something, which you can easily do by using floor()
# get desired output, given your definition of 'years'
floor(diff_in_years)
#[1] 1
Difference in Quarters
# get desired output for quarters, given your definition of 'quarters'
floor(diff_in_years * 4)
#[1] 4
Difference in Months
Can calculate this as a conversion from diff_years
# months, defined as absolute calendar months (this might be what you want, given your question details)
months_diff = diff_in_years*12
floor(month_diff)
#[1] 14
I know this question is old, but given that I still had to solve this problem just now, I thought I would add my answers. Hope it helps.
For weeks, you can use function difftime:
date1 <- strptime("14.01.2013", format="%d.%m.%Y")
date2 <- strptime("26.03.2014", format="%d.%m.%Y")
difftime(date2,date1,units="weeks")
Time difference of 62.28571 weeks
But difftime doesn't work with duration over weeks.
The following is a very suboptimal solution using cut.POSIXt for those durations but you can work around it:
seq1 <- seq(date1,date2, by="days")
nlevels(cut(seq1,"months"))
15
nlevels(cut(seq1,"quarters"))
5
nlevels(cut(seq1,"years"))
2
This is however the number of months, quarters or years spanned by your time interval and not the duration of your time interval expressed in months, quarters, years (since those do not have a constant duration). Considering the comment you made on #SvenHohenstein answer I would think you can use nlevels(cut(seq1,"months")) - 1 for what you're trying to achieve.
I just wrote this for another question, then stumbled here.
library(lubridate)
#' Calculate age
#'
#' By default, calculates the typical "age in years", with a
#' \code{floor} applied so that you are, e.g., 5 years old from
#' 5th birthday through the day before your 6th birthday. Set
#' \code{floor = FALSE} to return decimal ages, and change \code{units}
#' for units other than years.
#' #param dob date-of-birth, the day to start calculating age.
#' #param age.day the date on which age is to be calculated.
#' #param units unit to measure age in. Defaults to \code{"years"}. Passed to \link{\code{duration}}.
#' #param floor boolean for whether or not to floor the result. Defaults to \code{TRUE}.
#' #return Age in \code{units}. Will be an integer if \code{floor = TRUE}.
#' #examples
#' my.dob <- as.Date('1983-10-20')
#' age(my.dob)
#' age(my.dob, units = "minutes")
#' age(my.dob, floor = FALSE)
age <- function(dob, age.day = today(), units = "years", floor = TRUE) {
calc.age = interval(dob, age.day) / duration(num = 1, units = units)
if (floor) return(as.integer(floor(calc.age)))
return(calc.age)
}
Usage examples:
my.dob <- as.Date('1983-10-20')
age(my.dob)
# [1] 31
age(my.dob, floor = FALSE)
# [1] 31.15616
age(my.dob, units = "minutes")
# [1] 16375680
age(seq(my.dob, length.out = 6, by = "years"))
# [1] 31 30 29 28 27 26
Here the still lacking lubridate answer (although Gregor's function is built on this package)
The lubridate timespan documentation is very helpful for understanding the difference between periods and duration. I also like the lubridate cheatsheet and this very useful thread
library(lubridate)
dates <- c(dmy('14.01.2013'), dmy('26.03.2014'))
span <- dates[1] %--% dates[2] #creating an interval object
#creating period objects
as.period(span, unit = 'year')
#> [1] "1y 2m 12d 0H 0M 0S"
as.period(span, unit = 'month')
#> [1] "14m 12d 0H 0M 0S"
as.period(span, unit = 'day')
#> [1] "436d 0H 0M 0S"
Periods do not accept weeks as units. But you can convert durations to weeks:
as.duration(span)/ dweeks(1)
#makes duration object (in seconds) and divides by duration of a week (in seconds)
#> [1] 62.28571
Created on 2019-11-04 by the reprex package (v0.3.0)
Here's a solution:
dates <- c("14.01.2013", "26.03.2014")
# Date format:
dates2 <- strptime(dates, format = "%d.%m.%Y")
dif <- diff(as.numeric(dates2)) # difference in seconds
dif/(60 * 60 * 24 * 7) # weeks
[1] 62.28571
dif/(60 * 60 * 24 * 30) # months
[1] 14.53333
dif/(60 * 60 * 24 * 30 * 3) # quartes
[1] 4.844444
dif/(60 * 60 * 24 * 365) # years
[1] 1.194521
This is a simple way to find out the difference in years with the lubridate package:
as.numeric(as.Date("14-03-2013", format = "%d-%m-%Y") %--% as.Date("23-03-2014", format = "%d-%m-%Y"), "years")
This returns 1.023956
You can use floor() if you don't want the decimals.
try this for a months solution
StartDate <- strptime("14 January 2013", "%d %B %Y")
EventDates <- strptime(c("26 March 2014"), "%d %B %Y")
difftime(EventDates, StartDate)
A more "precise" calculation. That is, the number of week/month/quarter/year for a non-complete week/month/quarter/year is the fraction of calendar days in that week/month/quarter/year. For example, the number of months between 2016-02-22 and 2016-03-31 is 8/29 + 31/31 = 1.27586
explanation inline with code
#' Calculate precise number of periods between 2 dates
#'
#' #details The number of week/month/quarter/year for a non-complete week/month/quarter/year
#' is the fraction of calendar days in that week/month/quarter/year.
#' For example, the number of months between 2016-02-22 and 2016-03-31
#' is 8/29 + 31/31 = 1.27586
#'
#' #param startdate start Date of the interval
#' #param enddate end Date of the interval
#' #param period character. It must be one of 'day', 'week', 'month', 'quarter' and 'year'
#'
#' #examples
#' identical(numPeriods(as.Date("2016-02-15"), as.Date("2016-03-31"), "month"), 15/29 + 1)
#' identical(numPeriods(as.Date("2016-02-15"), as.Date("2016-03-31"), "quarter"), (15 + 31)/(31 + 29 + 31))
#' identical(numPeriods(as.Date("2016-02-15"), as.Date("2016-03-31"), "year"), (15 + 31)/366)
#'
#' #return exact number of periods between
#'
numPeriods <- function(startdate, enddate, period) {
numdays <- as.numeric(enddate - startdate) + 1
if (grepl("day", period, ignore.case=TRUE)) {
return(numdays)
} else if (grepl("week", period, ignore.case=TRUE)) {
return(numdays / 7)
}
#create a sequence of dates between start and end dates
effDaysinBins <- cut(seq(startdate, enddate, by="1 day"), period)
#use the earliest start date of the previous bins and create a breaks of periodic dates with
#user's period interval
intervals <- seq(from=as.Date(min(levels(effDaysinBins)), "%Y-%m-%d"),
by=paste("1",period),
length.out=length(levels(effDaysinBins))+1)
#create a sequence of dates between the earliest interval date and last date of the interval
#that contains the enddate
allDays <- seq(from=intervals[1],
to=intervals[intervals > enddate][1] - 1,
by="1 day")
#bin all days in the whole period using previous breaks
allDaysInBins <- cut(allDays, intervals)
#calculate ratio of effective days to all days in whole period
sum( tabulate(effDaysinBins) / tabulate(allDaysInBins) )
} #numPeriods
Please let me know if you find more boundary cases where the above solution does not work.
I'm working with some time data and I'm having problems converting a time difference to years and months.
My data looks more or less like this,
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
First I subtract the data of birth (DOB) form today's data (Today).
dfn$ageToday <- dfn$Today - dfn$DOB
This gives me the Time difference in days.
dfn$ageToday
Time differences in days
[1] 1875.866 1872.866 1869.866 1866.866 1863.866
[6] 1860.866 1857.866 1854.866 1851.866 1848.866
[11] 1845.866 1842.866 1839.866 1836.866 1833.866
[16] 1830.866 1827.866 1824.866 1821.866 1818.866
[21] 1815.866 1812.866 1809.866 1806.866 1803.866
[26] 1800.866
attr(,"tzone")
[1] ""
This is where first part of my question comes in; how do I convert this difference to years and months (rounded to months)? (i.e. 4.7, 4.11, etc.)
I read the ?difftime man page and the ?format, but I did not figure it out.
Any help would be appreciated.
Furthermore, I would like to melt my final object and if I try using melt on the data frame above using this command,
require(plyr)
require(reshape)
mdfn <- melt(dfn, id=c('Patient'))
I get this strange warning I haven't see before
Error in as.POSIXct.default(value) :
do not know how to convert 'value' to class "POSIXct"
So, my second question is; how do I create a time diffrence I can melt alongside my POSIXct variables? If I melt without dfn$ageToday everything works like a charm.
Thanks, Eric
The lubridatepackage makes working with dates and times, including finding time differences, really easy.
library("lubridate")
library("reshape2")
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
dfn$diff <- new_interval(dfn$DOB, dfn$Today) / duration(num = 1, units = "years")
mdfn <- melt(dfn, id=c('Patient'))
class(mdfn$value) # all values are coerced into numeric
The new_interval() function calculates the time difference between two dates. Note that there is a function today() that could substitute for your use of Sys.time. Finally note the duration() function that creates a standard, ehm, duration that you can use to divide the interval by a length of standard units, in this case, a unit of one year.
In case you want to preserve the contents of Today and DOB, then you may want to convert everything to character first and reconvert later...
library("lubridate")
library("reshape2")
dfn <- data.frame(
Today = Sys.time(),
DOB = seq(as.POSIXct('2007-03-27 00:00:01'), len= 26, by="3 day"),
Patient = factor(1:26, labels = LETTERS))
# Create standard durations for a year and a month
one.year <- duration(num = 1, units = "years")
one.month <- duration(num = 1, units = "months")
# Calculate the difference in years as float and integer
dfn$diff.years <- new_interval(dfn$DOB, dfn$Today) / one.year
dfn$years <- floor( new_interval(dfn$DOB, dfn$Today) / one.year )
# Calculate the modulo for number of months
dfn$diff.months <- round( new_interval(dfn$DOB, dfn$Today) / one.month )
dfn$months <- dfn$diff.months %% 12
# Paste the years and months together
# I am not using the decimal point so as not to imply this is
# a numeric representation of the diference
dfn$y.m <- paste(dfn$years, dfn$months, sep = '|')
# convert Today and DOB to character so as to preserve them in melting
dfn$Today <- as.character(dfn$Today)
dfn$DOB <- as.character(dfn$DOB)
# melt using string representation of difference between the two dates
dfn2 <- dfn[,c("Today", "DOB", "Patient", "y.m")]
mdfn2 <- melt(dfn2, id=c('Patient'))
# alternative melt using numeric representation of difference in years
dfn3 <- dfn[,c("Today", "DOB", "Patient", "diff.years")]
mdfn3 <- melt(dfn3, id=c('Patient'))