Mean of date interval in lubridate

How do you get the 'mean' (or, more accurately, the midpoint) of a lubridate interval object? I've tried the base mean() function, but that returns a double.
library(lubridate)
ex = interval(ymd("2009-05-01"), ymd("2009-07-01"))
mean(ex)
[1] 5270400

One option is to add half of the interval's duration to its start date. Try:
ex@start + as.duration(ex)/2
[1] "2009-05-31 12:00:00 UTC"

You could use the int_start() and int_end() functions to calculate the midpoint of an interval:
library(lubridate)
int_start(ex) + (int_end(ex) - int_start(ex))/2
[1] "2009-05-31 12:00:00 UTC"
You could wrap this in a small function for convenience:
int_midpoint <- function(interval) {
  int_start(interval) + (int_end(interval) - int_start(interval))/2
}
int_midpoint(ex)
[1] "2009-05-31 12:00:00 UTC"
This function will also work with lapply (though not sapply) on a vector:
lapply(c(ex, ex), int_midpoint)
[[1]]
[1] "2009-05-31 12:00:00 UTC"
[[2]]
[1] "2009-05-31 12:00:00 UTC"
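The arithmetic inside int_midpoint() is vectorized, so as far as I can tell it can also be called directly on a vector of intervals. The sketch below shows that, plus a base R alternative that takes the mean of the two endpoints. The names mids and mid_base are just for illustration.

```r
library(lubridate)

ex <- interval(ymd("2009-05-01"), ymd("2009-07-01"))

int_midpoint <- function(interval) {
  int_start(interval) + (int_end(interval) - int_start(interval)) / 2
}

# Call the function on a length-2 Interval vector: one midpoint per interval
mids <- int_midpoint(c(ex, ex))
length(mids)

# Base R alternative: the mean of the start and end instants
mid_base <- mean(c(int_start(ex), int_end(ex)))
mid_base
```

Both approaches return POSIXct values, so the result can be formatted or compared like any other date-time.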

Related

Convert number to date and datetime in R?

I have these numbers:
- 44384.520833333299 (to datetime). It should be 07/07/2021 12:30:00
- 44384 (to date). It should be 07/07/2021
How can I convert these numbers and a list of them in R?

openxlsx::convertToDateTime(44384.520833333299)
# [1] "2021-07-07 12:30:00 CEST"
openxlsx::convertToDate(44384)
# [1] "2021-07-07"

Another possibility, using lubridate:
library(lubridate)
# The first argument must be in seconds
as_datetime(3600*24*44384.520833333299, origin="1899-12-30")
#> [1] "2021-07-07 12:29:59 UTC"
as_date(44384, origin="1899-12-30")
#> [1] "2021-07-07"
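If you'd rather not load a package, the same conversion can be sketched in base R. It assumes the numbers are Excel serial dates (days since 1899-12-30) and that the times should be read as UTC. Rounding to whole seconds is my own addition to avoid the 12:29:59 result shown above. It also works on a whole vector of numbers.

```r
# Excel serial dates count days from 1899-12-30 (assumed origin)
serials <- c(44384.520833333299, 44384)

# Date only: drop the fractional day
dates <- as.Date(floor(serials), origin = "1899-12-30")

# Date-time: convert days to seconds, rounding to whole seconds to
# absorb floating-point error in the day fraction
datetimes <- as.POSIXct(round(serials * 86400),
                        origin = "1899-12-30", tz = "UTC")

format(dates)
format(datetimes, "%Y-%m-%d %H:%M:%S")
```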

Why does the lubridate::ymd_hms function add an NA observation when the "silent" argument is set TRUE?

Could anyone explain why the "silent=T" argument triggers a warning and an NA observation, and tell me how to avoid this?
x <- c("2010-04-14-04-35-59", "20100401120000")
ymd_hms(x, silent=T)
[1] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC" NA
Warning message:
1 failed to parse.
R version 3.4.0, lubridate version 1.6.0

Here, lubridate tries to parse "silent=T" as a date-time rather than treating it as an option. The argument that suppresses messages is called quiet:
lubridate::ymd_hms(x, quiet=TRUE)
[1] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
This happens because you can pass several vectors to a lubridate parsing function, and all of them are parsed:
x <- c("2010-04-14-04-35-59", "20100401120000")
y <- c("2010-04-14-04-35-59", "20100401120000")
z <- c("2010-04-14-04-35-59", "20100401120000")
lubridate::ymd_hms(x, y, z)
[1] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
[3] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
[5] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
So here, with silent=T, you're telling lubridate that silent=T is a vector to parse. Hence the NA.
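A quick way to check which argument names a function actually recognizes is to look at its formals. The sketch below does that for ymd_hms(); anything not listed there is collected by `...` and treated as data. The variable names are only for illustration.

```r
library(lubridate)

# Formal argument names of ymd_hms(); any other named argument
# falls into `...` and is parsed as a date-time
arg_names <- names(formals(ymd_hms))
"quiet" %in% arg_names
"silent" %in% arg_names

x <- c("2010-04-14-04-35-59", "20100401120000")
parsed <- ymd_hms(x, quiet = TRUE)
length(parsed)
```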

I ran into this issue when the dates did not all follow the same format. Check whether all of your dates use the same format; if they don't, parse_date_time() can solve the problem because it accepts several candidate formats:
parse_date_time(df$date, c("y/m/d","y/m/d HMS","m/d/y","m/d/y HM"))
Make sure that every format occurring in your data is included in the list.
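Here is a small self-contained sketch of that idea. The input vector is made up for illustration and mixes two formats; the orders are chosen to match exactly those two.

```r
library(lubridate)

# Hypothetical input mixing a year-first date and a month-first date-time
mixed <- c("2010/04/14", "04/14/2010 12:30")

# Each element is matched against the candidate orders
parsed <- parse_date_time(mixed, orders = c("ymd", "mdy HM"))

format(parsed, "%Y-%m-%d %H:%M")
```

If a value matches none of the orders, it comes back as NA with a warning, so checking anyNA(parsed) afterwards is a cheap sanity test.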

Handle durations but no dates

What is the best way to manipulate only durations in R? I mean I have a string vector like:
> test
[1] "00:04:06" "00:04:02" "00:04:16" "00:03:51" "00:03:55"
and I want to convert it to some specific class, which will understand these durations. I know I can use for example strptime:
> strptime(test, format = '%H:%M:%S')
[1] "2016-05-02 00:04:06 UTC" "2016-05-02 00:04:02 UTC" "2016-05-02 00:04:16 UTC" "2016-05-02 00:03:51 UTC" "2016-05-02 00:03:55 UTC"
but this creates a vector of real date-times with today's date. I'd like to avoid that, since it could cause trouble later in my application and the date is 'wrong' information.

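Code:
library(lubridate)
test <- c("00:04:06", "00:04:02", "00:04:16", "00:03:51", "00:03:55")
# hms() parses each string into a lubridate Period (no date attached)
t2 <- lapply(test, lubridate::hms)
# Note: the result below contains only the seconds component of each Period
as.numeric(unlist(t2))
Output:
[1] 6 2 16 51 55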
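Since the output above keeps only the seconds part, here is a sketch of how to get the full length of each duration instead. hms() is vectorized, so lapply() isn't needed; period_to_seconds() and as.duration() are both lubridate functions. The variable names are just for illustration.

```r
library(lubridate)

test <- c("00:04:06", "00:04:02", "00:04:16", "00:03:51", "00:03:55")

# Parse the whole vector at once into a Period vector
periods <- hms(test)

# Total length of each period in seconds (minutes and hours included)
total_seconds <- period_to_seconds(periods)
total_seconds

# Or keep them as a Duration vector for further arithmetic
durations <- as.duration(periods)
mean(total_seconds)
```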

Why does lubridate appear to change time zones for two dates combined into a vector?

I am seeing an unexpected result when using the lubridate package in R. I am simply trying to combine two dates into a vector. When I do so, the time zone changes. What is happening here?
> x <- ymd("2016-02-08")
> y <- ymd("2016-03-29")
> x
[1] "2016-02-08 UTC"
> y
[1] "2016-03-29 UTC"
> c(x,y)
[1] "2016-02-07 18:00:00 CST" "2016-03-28 19:00:00 CDT"

Using c() will remove the timezone attribute. Hence you have to reassign it:
> xy <- c(x,y)
> attr(xy, "tzone") <- "UTC"
> xy
[1] "2016-02-08 UTC" "2016-03-29 UTC"
Source and more information: Peter Ehlers on R Help
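As an alternative to setting the attribute by hand, lubridate's with_tz() changes the display time zone without changing the underlying instants. This sketch uses as.POSIXct() to build the inputs, which is an assumption on my part so that the example doesn't depend on what class ymd() returns in your lubridate version.

```r
library(lubridate)

x <- as.POSIXct("2016-02-08", tz = "UTC")
y <- as.POSIXct("2016-03-29", tz = "UTC")

# Combine, then set the display time zone explicitly
xy <- with_tz(c(x, y), tzone = "UTC")

attr(xy, "tzone")
format(xy, "%Y-%m-%d %Z")
```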

vector of POSIXct and sapply

What if you want to apply a function other than format to a list of POSIXct objects? For instance, say I want to take a vector of times, truncate those times to the hour, and apply an arbitrary function to each one of those times.
> obs.times=as.POSIXct(c('2010-01-02 12:37:45','2010-01-02 08:45:45','2010-01-09 14:45:53'))
> obs.truncated=trunc(obs.times, units="hours")
> obs.truncated
[1] "2010-01-02 12:00:00 EST" "2010-01-02 08:00:00 EST"
[3] "2010-01-09 14:00:00 EST"
Now, I would expect the length of obs.truncated to be 3 but
> length(obs.truncated)
[1] 9
So you can see that trying to apply a function to this vector is not going to work. The class of obs.truncated is
> class(obs.truncated)
[1] "POSIXt" "POSIXlt"
Any idea what is going on here? apply and length appear to be taking the first element of the vector as its own list.

The length() of such a POSIXlt object used to be reported as nine (the number of components in its underlying list), but that has since been corrected.
Also, when I run trunc(obs.times), the wrong thing happens: trunc() operates only once on the vector of three elements, so you do need apply() and friends.
Here is an example of using sapply() with component-wise resetting:
> sapply(obs.times, function(.) {
+ p <- as.POSIXlt(.);
+ p$min <- p$sec <- 0;
+ format(p) })
[1] "2010-01-02 12:00:00" "2010-01-02 08:00:00" "2010-01-09 14:00:00"
Whereas
> trunc(obs.times, units="hours")
[1] "2010-01-02 12:00:00 CST" "2010-01-02 08:00:00 CST"
[3] "2010-01-09 14:00:00 CST"
> class(trunc(obs.times, units="hours"))
[1] "POSIXt" "POSIXlt"
> length(trunc(obs.times, units="hours"))
[1] 1
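Another option, sketched here in base R, is to convert the result of trunc() straight back to POSIXct. A POSIXct vector is a plain numeric vector underneath, so its length is the number of time stamps and it can be indexed element by element. The tz = "UTC" setting and the names obs.hours and labels are my own choices for a reproducible example.

```r
obs.times <- as.POSIXct(
  c("2010-01-02 12:37:45", "2010-01-02 08:45:45", "2010-01-09 14:45:53"),
  tz = "UTC"
)

# trunc() returns POSIXlt; convert back to POSIXct right away
obs.hours <- as.POSIXct(trunc(obs.times, units = "hours"))
length(obs.hours)

# Apply an arbitrary function to each element by index, which keeps
# the POSIXct class of every element intact
labels <- vapply(seq_along(obs.hours),
                 function(i) format(obs.hours[i], "%H:%M"),
                 character(1))
labels
```

Iterating over indices rather than over the vector itself avoids sapply() stripping the class and handing your function bare numbers.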
