How do you get the 'mean' (or, more accurately, the midpoint) of a lubridate interval object? I've tried the base mean() function, but that returns a double.
library(lubridate)
ex = interval(ymd("2009-05-01"), ymd("2009-07-01"))
mean(ex)
[1] 5270400
One option is to add half of the interval's duration to its start date. Try:
ex@start + as.duration(ex)/2
[1] "2009-05-31 12:00:00 UTC"
You could use the int_start() and int_end() functions to calculate the midpoint of an interval:
library(lubridate)
int_start(ex) + (int_end(ex) - int_start(ex))/2
[1] "2009-05-31 12:00:00 UTC"
You could wrap this in a small function for convenience:
int_midpoint <- function(interval) {
int_start(interval) + (int_end(interval) - int_start(interval))/2
}
int_midpoint(ex)
[1] "2009-05-31 12:00:00 UTC"
This function will also work with lapply (though not sapply) on a vector:
lapply(c(ex, ex), int_midpoint)
[[1]]
[1] "2009-05-31 12:00:00 UTC"
[[2]]
[1] "2009-05-31 12:00:00 UTC"
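Since int_start() and int_length() are vectorised, the midpoint can probably also be computed for a whole vector of intervals without lapply. A minimal sketch, assuming a reasonably recent lubridate; the names starts, ends, ivs and mids and the second interval are made up for illustration:

```r
library(lubridate)

# Two example intervals built from start and end vectors (illustrative data)
starts <- ymd(c("2009-05-01", "2010-01-01"))
ends   <- ymd(c("2009-07-01", "2010-01-03"))
ivs    <- interval(starts, ends)

# int_length() gives each interval's length in seconds, so adding half of it
# to the start yields the midpoint as a POSIXct vector
mids <- int_start(ivs) + int_length(ivs) / 2
mids
```

Working in seconds avoids any dependence on the units that a difftime happens to pick.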
I have these numbers:
- 44384.520833333299 (to datetime). It should be 07/07/2021 12:30:00
- 44384 (to date). It should be 07/07/2021
How can I convert these numbers and a list of them in R?
openxlsx::convertToDateTime(44384.520833333299)
# [1] "2021-07-07 12:30:00 CEST"
openxlsx::convertToDate(44384)
# [1] "2021-07-07"
Another possibility, using lubridate:
library(lubridate)
# The first argument must be in seconds
as_datetime(3600*24*44384.520833333299, origin="1899-12-30")
#> [1] "2021-07-07 12:29:59 UTC"
as_date(44384, origin="1899-12-30")
#> [1] "2021-07-07"
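The same conversion should also work on a whole vector in base R. A minimal sketch, assuming the numbers are Excel-style serial dates counted in days from 1899-12-30 (the origin used above); the vector serials is invented for illustration, and rounding to whole seconds is an added step meant to avoid the floating-point 12:29:59 result:

```r
# Excel-style serial numbers: whole part = days, fraction = time of day
serials <- c(44384.520833333299, 44384)

# Dates: drop the fractional part before converting
dates <- as.Date(floor(serials), origin = "1899-12-30")

# Date-times: convert days to seconds and round to the nearest second
datetimes <- as.POSIXct(round(serials * 86400), origin = "1899-12-30", tz = "UTC")

format(dates)
format(datetimes, "%Y-%m-%d %H:%M:%S")
```

The result is in UTC here; pass a different tz if the spreadsheet times are meant as local times.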
Could anyone explain why the "silent=T" argument triggers a warning and an NA observation, and tell me how to avoid this?
x <- c("2010-04-14-04-35-59", "20100401120000")
ymd_hms(x, silent=T)
[1] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC" NA
Warning message:
1 failed to parse.
R version 3.4.0, lubridate version 1.6.0
Here, lubridate tries to parse "silent=T" as a date-time, because ymd_hms() has no silent argument. The argument that suppresses messages is quiet:
lubridate::ymd_hms(x, quiet=TRUE)
[1] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
This happens because you can pass several vectors to a lubridate parsing function:
x <- c("2010-04-14-04-35-59", "20100401120000")
y <- c("2010-04-14-04-35-59", "20100401120000")
z <- c("2010-04-14-04-35-59", "20100401120000")
lubridate::ymd_hms(x, y, z)
[1] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
[3] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
[5] "2010-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
So here, with silent=T, you're telling lubridate that silent=T is a vector to parse. Hence the NA.
I ran into this issue when the dates did not all share one format. Check that all the dates follow the same format. If they do not, parse_date_time() can solve the problem:
parse_date_time(df$date, c("y/m/d","y/m/d HMS","m/d/y","m/d/y HM"))
Make sure that every format occurring in the data is included in the list.
What is the best way to manipulate only durations in R? I mean I have a string vector like:
> test
[1] "00:04:06" "00:04:02" "00:04:16" "00:03:51" "00:03:55"
and I want to convert it to some specific class, which will understand these durations. I know I can use for example strptime:
> strptime(test, format = '%H:%M:%S')
[1] "2016-05-02 00:04:06 UTC" "2016-05-02 00:04:02 UTC" "2016-05-02 00:04:16 UTC" "2016-05-02 00:03:51 UTC" "2016-05-02 00:03:55 UTC"
but this creates a vector of real date-times with today's date. I'd like to avoid that, since it can cause trouble later in my application and it is 'wrong' information.
Code:
library(lubridate)
test <- c("00:04:06", "00:04:02", "00:04:16", "00:03:51", "00:03:55")
# hms() is vectorised and returns a Period; convert it to total seconds
t2 <- hms(test)
period_to_seconds(t2)
Output:
[1] 246 242 256 231 235
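To keep a class that understands durations rather than bare numbers, one option is lubridate's Duration class. A minimal sketch; the names durs, secs, total and average are illustrative, and doing the arithmetic on plain seconds is a cautious choice rather than a requirement:

```r
library(lubridate)

test <- c("00:04:06", "00:04:02", "00:04:16", "00:03:51", "00:03:55")

# hms() parses "HH:MM:SS" strings into a Period; as.duration() turns that
# into an exact number of seconds with no calendar date attached
durs <- as.duration(hms(test))

# Plain seconds for arithmetic
secs <- as.numeric(durs)
total <- sum(secs)
average <- mean(secs)

# Back to a Duration for display
as.duration(average)
```

No calendar date is involved at any point, which sidesteps the "today's date" problem of strptime().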
I am seeing an unexpected result when using the lubridate package in R. I am simply trying to combine two dates into a vector. When I do so, the time zone changes. What is happening here?
> x <- ymd("2016-02-08")
> y <- ymd("2016-03-29")
> x
[1] "2016-02-08 UTC"
> y
[1] "2016-03-29 UTC"
> c(x,y)
[1] "2016-02-07 18:00:00 CST" "2016-03-28 19:00:00 CDT"
Using c() will remove the timezone attribute. Hence you have to reassign it:
xy <- c(x,y)
attr(xy, "tzone") <- "UTC"
> xy
[1] "2016-02-08 UTC" "2016-03-29 UTC"
Source and more information: Peter Ehlers on R Help
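lubridate's with_tz() sets the same attribute in one step. A minimal sketch; passing tz = "UTC" to ymd() is an assumption made here so that the values are POSIXct date-times as in the question (newer lubridate versions otherwise return Date objects):

```r
library(lubridate)

x <- ymd("2016-02-08", tz = "UTC")
y <- ymd("2016-03-29", tz = "UTC")

# c() may drop the time zone attribute (it did in older R versions);
# with_tz() changes only how the instants are displayed, not the instants
xy <- with_tz(c(x, y), tzone = "UTC")
xy
```

Because only the display time zone changes, the underlying instants are the same as those in c(x, y).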
What if you want to apply a function other than format to a list of POSIXct objects? For instance, say I want to take a vector of times, truncate those times to the hour, and apply an arbitrary function to each one of those times.
> obs.times=as.POSIXct(c('2010-01-02 12:37:45','2010-01-02 08:45:45','2010-01-09 14:45:53'))
> obs.truncated=trunc(obs.times, units="hours")
> obs.truncated
[1] "2010-01-02 12:00:00 EST" "2010-01-02 08:00:00 EST"
[3] "2010-01-09 14:00:00 EST"
Now, I would expect the length of obs.truncated to be 3 but
> length(obs.truncated)
[1] 9
So you can see that trying to apply a function to this vector is not going to work. The class of obs.truncated is
> class(obs.truncated)
[1] "POSIXt" "POSIXlt"
Any idea what is going on here? apply and length appear to be taking the first element of the vector as its own list.
The length() of such a POSIXlt object used to be reported as nine, but that was recently corrected.
Also, when I call trunc(obs.times) the wrong thing happens: trunc() operates only once on the vector of three elements, so you do need apply() and friends.
So here is an example of using sapply() with component-wise resetting:
> sapply(obs.times, function(.) {
+ p <- as.POSIXlt(.);
+ p$min <- p$sec <- 0;
+ format(p) })
[1] "2010-01-02 12:00:00" "2010-01-02 08:00:00" "2010-01-09 14:00:00"
>
Whereas
> trunc(obs.times, units="hours")
[1] "2010-01-02 12:00:00 CST" "2010-01-02 08:00:00 CST"
[3] "2010-01-09 14:00:00 CST"
> class(trunc(obs.times, units="hours"))
[1] "POSIXt" "POSIXlt"
> length(trunc(obs.times, units="hours"))
[1] 1
>
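If the goal is simply a vector of hour-truncated times that behaves like the input, one option is to convert the POSIXlt result of trunc() back to POSIXct, which has one element per time regardless of R version. A minimal sketch; tz = "UTC" and the names obs.hour and hours are assumptions added for reproducibility and illustration:

```r
obs.times <- as.POSIXct(
  c("2010-01-02 12:37:45", "2010-01-02 08:45:45", "2010-01-09 14:45:53"),
  tz = "UTC"
)

# trunc() returns POSIXlt; converting back gives an ordinary POSIXct vector
obs.hour <- as.POSIXct(trunc(obs.times, units = "hours"))

length(obs.hour)

# Functions can now be applied element by element
hours <- sapply(seq_along(obs.hour), function(i) format(obs.hour[i], "%H:%M"))
hours
```

Iterating over indices rather than over the vector itself keeps the class and time zone of each element intact inside the function.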