How to calculate the mean of two timestamp columns in R?

I have a data frame in R in which two columns are datetimes (POSIX class). I need to calculate the mean datetime for each row.
Here's a reproducible example:
a <- c(
"2018-10-11 15:22:17",
"2018-10-10 16:30:37",
"2018-10-10 16:52:46",
"2018-10-10 16:58:33",
"2018-10-10 16:32:24")
b <- c(
"2018-10-11 15:25:12",
"2018-10-10 16:30:39",
"2018-10-10 16:55:14",
"2018-10-10 16:58:53",
"2018-10-10 16:32:27")
a <- strptime(a, format = "%Y-%m-%d %H:%M:%S")
b <- strptime(b, format = "%Y-%m-%d %H:%M:%S")
f <- data.frame(a, b)
The result should look like this:
a b time_mean
1 2018-10-11 15:22:17 2018-10-11 15:25:12 2018-10-11 15:23:44
2 2018-10-10 16:30:37 2018-10-10 16:30:39 2018-10-10 16:30:38
3 2018-10-10 16:52:46 2018-10-10 16:55:14 2018-10-10 16:54:00
4 2018-10-10 16:58:33 2018-10-10 16:58:53 2018-10-10 16:58:43
5 2018-10-10 16:32:24 2018-10-10 16:32:27 2018-10-10 16:32:25
I tried the following:
apply(f, 1, function(x) mean)
apply(f, 1, function(x) mean(c(x[1], x[2])))

Instead of using apply (which converts the data frame to a matrix and so strips off the class attributes), use Map:
f$time_mean <- do.call(c, Map(function(x, y) mean(c(x, y)), a, b))
f$time_mean
#[1] "2018-10-11 15:23:44 EDT" "2018-10-10 16:30:38 EDT" "2018-10-10 16:54:00 EDT" "2018-10-10 16:58:43 EDT"
#[5] "2018-10-10 16:32:25 EDT"
Or, working directly from the columns of the data.frame f:
do.call(c, Map(function(x, y) mean(c(x, y)), f$a, f$b))
Another option is to convert to numeric with xtfrm (see ?xtfrm; it also has a POSIXlt method), take the rowMeans, and convert back to a date-time class as in @jay.sf's post:
as.POSIXlt(rowMeans(sapply(f, xtfrm)), origin = "1970-01-01")
#[1] "2018-10-11 15:23:44 EDT" "2018-10-10 16:30:38 EDT" "2018-10-10 16:54:00 EDT" "2018-10-10 16:58:43 EDT"
#[5] "2018-10-10 16:32:25 EDT"

You could do the calculation on the underlying numeric values (seconds since 1970-01-01) and convert the result back:
f$time_mean <- as.POSIXct(sapply(seq(nrow(f)), function(x)
mean(as.numeric(f[x, ]))), origin="1970-01-01")
f
# a b time_mean
# 1 2018-10-11 15:22:17 2018-10-11 15:25:12 2018-10-11 15:23:44
# 2 2018-10-10 16:30:37 2018-10-10 16:30:39 2018-10-10 16:30:38
# 3 2018-10-10 16:52:46 2018-10-10 16:55:14 2018-10-10 16:54:00
# 4 2018-10-10 16:58:33 2018-10-10 16:58:53 2018-10-10 16:58:43
# 5 2018-10-10 16:32:24 2018-10-10 16:32:27 2018-10-10 16:32:25
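Both answers above loop over rows. Since data.frame() stores the strptime results as POSIXct, the midpoint can also be computed on whole columns at once. This is only a minimal sketch under my own assumptions: a cut-down copy of the example data and an explicit tz = "UTC" so the result does not depend on the session time zone.

```r
# Cut-down copy of the example data; tz = "UTC" is an assumption
a <- as.POSIXct(c("2018-10-11 15:22:17", "2018-10-10 16:30:37"), tz = "UTC")
b <- as.POSIXct(c("2018-10-11 15:25:12", "2018-10-10 16:30:39"), tz = "UTC")
f <- data.frame(a, b)

# Midpoint = a + half the gap between b and a, in seconds.
# Adding a plain number to a POSIXct keeps the POSIXct class.
f$time_mean <- f$a + (as.numeric(f$b) - as.numeric(f$a)) / 2
f
```

Adding half the gap to a, rather than averaging two large epoch values by hand, keeps the class and time zone of column a without an explicit origin argument.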

Related

How does as.POSIXct "tryFormats" work with multiple input types?

I am confused about how as.POSIXct tries different formats, and why it works in some cases and not in others.
In the following reprex, some dates are formatted as %m/%d/%Y %H:%M and others as %m/%d/%Y %H:%M:%S. As expected, when using the format option, only the dates fitting that format are successfully returned. I would have expected that using tryFormats with both formats would successfully convert all dates. Why is that not the case? And why does lubridate return what is expected?
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
options(lubridate.verbose = TRUE)
datetimes = c("3/1/2015 23:44",
"3/2/2015 1:01",
"3/29/2015 0:56",
"3/29/2015 1:00",
"3/29/2015 00:56:01")
as.POSIXct(datetimes, format = "%m/%d/%Y %H:%M")
#> [1] "2015-03-01 23:44:00 GMT" "2015-03-02 01:01:00 GMT"
#> [3] "2015-03-29 00:56:00 GMT" NA
#> [5] "2015-03-29 00:56:00 GMT"
as.POSIXct(datetimes, format = "%m/%d/%Y %H:%M:%S")
#> [1] NA NA
#> [3] NA NA
#> [5] "2015-03-29 00:56:01 GMT"
as.POSIXct(datetimes,
tryFormat = c("%m/%d/%Y %H:%M:%S", "%m/%d/%Y %H:%M"),
optional = T)
#> [1] NA NA NA NA NA
as.POSIXct(datetimes,
tryFormat = c("%m/%d/%Y %H:%M", "%m/%d/%Y %H:%M:%S"),
optional = T)
#> [1] NA NA NA NA NA
mdy_hms(datetimes, truncated = 1)
#> 1 parsed with %Om/%d/%Y %H:%M:%S
#> 0 parsed with %m/%d/%Y %H:%M:%S
#> 4 parsed with %Om/%d/%Y %H:%M
#> [1] "2015-03-01 23:44:00 UTC" "2015-03-02 01:01:00 UTC"
#> [3] "2015-03-29 00:56:00 UTC" "2015-03-29 01:00:00 UTC"
#> [5] "2015-03-29 00:56:01 UTC"
Created on 2022-03-21 by the reprex package (v2.0.1)
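As far as I can tell from the source of as.POSIXlt.character, tryFormats does not choose a format per element: it takes the first format that parses every non-NA element and applies that one format to the whole vector. With mixed formats no single candidate passes, so optional = TRUE returns all NA. The NA for "3/29/2015 1:00" in the first call is presumably a daylight-saving gap in the session's time zone, which would also make the %H:%M format fail that check. A minimal sketch of a per-element fallback in base R, assuming the timestamps are meant to be UTC (the names fmts, parsed and todo are mine):

```r
datetimes <- c("3/1/2015 23:44", "3/2/2015 1:01", "3/29/2015 0:56",
               "3/29/2015 1:00", "3/29/2015 00:56:01")

# Most specific format first: "%H:%M" would also accept the string
# with seconds (as the question's first call shows) and drop them.
fmts <- c("%m/%d/%Y %H:%M:%S", "%m/%d/%Y %H:%M")

# Start with an all-NA POSIXct vector and fill it format by format
parsed <- .POSIXct(rep(NA_real_, length(datetimes)), tz = "UTC")
for (fmt in fmts) {
  todo <- is.na(parsed)  # elements no earlier format could parse
  parsed[todo] <- as.POSIXct(datetimes[todo], format = fmt, tz = "UTC")
}
parsed
```

Fixing tz = "UTC" sidesteps local clock changes; if the data are really local times, the 01:00 entry would need separate handling.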

Convert two columns into date and time in r

I know this question has been asked over and over again. But this time, the problem is a little different.
a<-matrix(c("01-02-2014", "02-02-2014", "03-02-2014",
"04-02-2014","05-02-2014","0 1", "0 2", "0 3", "0 4","0 5"),nrow=5)
a<-data.frame(a)
names(a)<-c("date","time")
a$date<-as.Date(a$date, format="%d-%m-%Y")
So now I get this data frame.
date time
1 2014-02-01 0 1
2 2014-02-02 0 2
3 2014-02-03 0 3
4 2014-02-04 0 4
5 2014-02-05 0 5
As you can see, the time is the minute of the day, but it is not in the typical 00:00 form, so R doesn't recognize it as a time. How do I transform the time column into 00:00 format so I can merge it with the date column to form %Y%m%d %H:%M?
We can use sprintf after splitting the 'time' column by space (" ") to get the required format
a$time <- sapply(strsplit(as.character(a$time), " "),
function(x) do.call(sprintf, c(fmt = "%02d:%02d", as.list(as.numeric(x)))))
a$time
#[1] "00:01" "00:02" "00:03" "00:04" "00:05"
Then, paste the columns and convert to POSIXct
as.POSIXct(paste(a$date, a$time))
#[1] "2014-02-01 00:01:00 EST" "2014-02-02 00:02:00 EST"
#[3] "2014-02-03 00:03:00 EST" "2014-02-04 00:04:00 EST"
#[5] "2014-02-05 00:05:00 EST"
Or using lubridate we can directly convert it to POSIXct without formatting the 'time' column
library(lubridate)
ymd_hm(paste(a$date, a$time), tz = "EST")
#[1] "2014-02-01 00:01:00 EST" "2014-02-02 00:02:00 EST"
#[3] "2014-02-03 00:03:00 EST" "2014-02-04 00:04:00 EST"
#[5] "2014-02-05 00:05:00 EST"
You do not need to do anything. Just use it the way it is:
strptime(do.call(paste,a),"%Y-%m-%d %H %M","UTC")
[1] "2014-02-01 00:01:00 UTC" "2014-02-02 00:02:00 UTC"
[3] "2014-02-03 00:03:00 UTC" "2014-02-04 00:04:00 UTC"
[5] "2014-02-05 00:05:00 UTC"
Or even just:
strptime(paste(a$date,a$time),"%Y-%m-%d %H %M")
[1] "2014-02-01 00:01:00 PST" "2014-02-02 00:02:00 PST"
[3] "2014-02-03 00:03:00 PST" "2014-02-04 00:04:00 PST"
[5] "2014-02-05 00:05:00 PST"

How can I flag a regular time series with an irregular error sign in R?

I have a regular time series, for example:
library(lubridate)
start = parse_date_time("2018-01-01","%Y-%m-%d")
end = parse_date_time("2018-01-02","%Y-%m-%d")
series = seq(start,end,by=600)
> series
[1] "2018-01-01 00:00:00 UTC" "2018-01-01 00:10:00 UTC" "2018-01-01 00:20:00 UTC" "2018-01-01 00:30:00 UTC"
[5] "2018-01-01 00:40:00 UTC" "2018-01-01 00:50:00 UTC" "2018-01-01 01:00:00 UTC" "2018-01-01 01:10:00 UTC"
[9] "2018-01-01 01:20:00 UTC" "2018-01-01 01:30:00 UTC" "2018-01-01 01:40:00 UTC" "2018-01-01 01:50:00 UTC"
[13] "2018-01-01 02:00:00 UTC" "2018-01-01 02:10:00 UTC" "2018-01-01 02:20:00 UTC" "2018-01-01 02:30:00 UTC"...
I also have a data frame of irregular error periods (on/off times), for example:
error = data.frame(
on = parse_date_time(c("2018-01-01 00:13:57","2018-01-01 01:01:44"),"%Y-%m-%d %H:%M:%S"),
off = parse_date_time(c("2018-01-01 00:21:32","2018-01-01 02:33:45"),"%Y-%m-%d %H:%M:%S")
)
> error
on off
1 2018-01-01 00:13:57 2018-01-01 00:21:32
2 2018-01-01 01:01:44 2018-01-01 02:33:45
How can I flag my series with the error just like below?
> flag
series error
[1] "2018-01-01 00:00:00 UTC" "OK"
[2] "2018-01-01 00:10:00 UTC" "OK"
[3] "2018-01-01 00:20:00 UTC" "ERROR"
[4] "2018-01-01 00:30:00 UTC" "ERROR"
[5] "2018-01-01 00:40:00 UTC" "OK"
[6] "2018-01-01 00:50:00 UTC" "OK"
[7] "2018-01-01 01:00:00 UTC" "OK"
[8] "2018-01-01 01:10:00 UTC" "ERROR"
[9] "2018-01-01 01:20:00 UTC" "ERROR"
[10] "2018-01-01 01:30:00 UTC" "ERROR"
[11] "2018-01-01 01:40:00 UTC" "ERROR"
[12] "2018-01-01 01:50:00 UTC" "ERROR"
[13] "2018-01-01 02:00:00 UTC" "ERROR"
[14] "2018-01-01 02:10:00 UTC" "ERROR"
[15] "2018-01-01 02:20:00 UTC" "ERROR"
[16] "2018-01-01 02:30:00 UTC" "ERROR"
[17] "2018-01-01 02:40:00 UTC" "ERROR"
[18] "2018-01-01 02:50:00 UTC" "OK"
Here is a solution using map_lgl, because lubridate intervals play funny with dplyr functions for me. Note that I use ceiling_date on off to reproduce your desired output, even though it's not obvious to me why the last row counts as ERROR since, for example, row 4 in the output "2018-01-01 00:30:00 UTC" is after the first off value "2018-01-01 00:21:32". The key parts are simply the creation of intervals with interval (or alternatively, on %--% off) and then the use of any(%within%) to return a logical value for whether a given value in the series is inside one of the error intervals. ifelse lets us convert the values into character flags.
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#>
#> date
start = parse_date_time("2018-01-01","%Y-%m-%d")
end = parse_date_time("2018-01-02","%Y-%m-%d")
series = seq(start,end,by=600)
error = data.frame(
on = parse_date_time(c("2018-01-01 00:13:57","2018-01-01 01:01:44"),"%Y-%m-%d %H:%M:%S"),
off = parse_date_time(c("2018-01-01 00:21:32","2018-01-01 02:33:45"),"%Y-%m-%d %H:%M:%S")
) %>%
mutate(
off = ceiling_date(off, unit = "10 minutes"),
intvs = interval(on, off)
)
series %>%
tibble(dttm = .) %>%
bind_cols(status = map_lgl(series, ~ any(. %within% error$intvs))) %>%
mutate(status = ifelse(status == TRUE, "ERROR", "OK")) %>%
print(n = 20)
#> # A tibble: 145 x 2
#> dttm status
#> <dttm> <chr>
#> 1 2018-01-01 00:00:00 OK
#> 2 2018-01-01 00:10:00 OK
#> 3 2018-01-01 00:20:00 ERROR
#> 4 2018-01-01 00:30:00 ERROR
#> 5 2018-01-01 00:40:00 OK
#> 6 2018-01-01 00:50:00 OK
#> 7 2018-01-01 01:00:00 OK
#> 8 2018-01-01 01:10:00 ERROR
#> 9 2018-01-01 01:20:00 ERROR
#> 10 2018-01-01 01:30:00 ERROR
#> 11 2018-01-01 01:40:00 ERROR
#> 12 2018-01-01 01:50:00 ERROR
#> 13 2018-01-01 02:00:00 ERROR
#> 14 2018-01-01 02:10:00 ERROR
#> 15 2018-01-01 02:20:00 ERROR
#> 16 2018-01-01 02:30:00 ERROR
#> 17 2018-01-01 02:40:00 ERROR
#> 18 2018-01-01 02:50:00 OK
#> 19 2018-01-01 03:00:00 OK
#> 20 2018-01-01 03:10:00 OK
#> # ... with 125 more rows
Created on 2018-03-15 by the reprex package (v0.2.0).
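For comparison, the same flagging can be sketched in base R without lubridate intervals. This assumes the same rule as the answer above (each off time is rounded up to the next 10-minute boundary, and both interval ends are inclusive) and shortens the series to three hours; off_up and in_error are names of my own.

```r
series <- seq(as.POSIXct("2018-01-01 00:00:00", tz = "UTC"),
              as.POSIXct("2018-01-01 03:00:00", tz = "UTC"), by = 600)
on  <- as.POSIXct(c("2018-01-01 00:13:57", "2018-01-01 01:01:44"), tz = "UTC")
off <- as.POSIXct(c("2018-01-01 00:21:32", "2018-01-01 02:33:45"), tz = "UTC")

# Round each off time up to the next multiple of 600 seconds,
# mirroring ceiling_date(off, unit = "10 minutes")
off_up <- as.POSIXct(ceiling(as.numeric(off) / 600) * 600,
                     origin = "1970-01-01", tz = "UTC")

# A point is flagged if it falls inside any [on, off_up] interval
in_error <- vapply(seq_along(series),
                   function(i) any(series[i] >= on & series[i] <= off_up),
                   logical(1))

flag <- data.frame(series, error = ifelse(in_error, "ERROR", "OK"))
flag
```

With only two error periods the inner comparison is cheap; for many periods something like findInterval might scale better, but I have not tried that here.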

R matrix column to fill with timestamp

I'm trying to fill one matrix column with a date-time called up from another column.
B <- matrix(0, nrow(A) - 1, 3)
B[, 1] <- "Anne"
times <- as.POSIXct(tname$DT[1:955], format = "%Y-%m-%d %H:%M:%S")
B[, 2] <- times
When I print times, it shows them in the format "%Y-%m-%d %H:%M:%S":
[1] "2017-05-19 11:01:00 EDT" "2017-05-19 12:01:00 EDT" "2017-05-19 12:31:00 EDT" "2017-05-19 13:01:00 EDT"
[5] "2017-05-19 13:31:00 EDT" "2017-05-19 14:01:00 EDT" "2017-05-19 14:31:00 EDT" "2017-05-19 15:01:00 EDT"
[9] "2017-05-20 08:01:00 EDT" "2017-05-20 09:01:00 EDT" "2017-05-20 10:01:00 EDT" "2017-05-20 11:01:00 EDT" ....
However, when I print B[, 2] it gives me weird numbers:
[1] "1495206060" "1495209660" "1495211460" "1495213260" "1495215060" "1495216860" "1495218660" "1495220460"
[9] "1495281660" "1495285260" "1495288860" "1495292460" "1495296060" ....
How do I copy my dates and times into my matrix in the right format?
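A matrix can hold only one basic type and no per-column classes, so assigning a POSIXct vector into a character matrix drops the class and stores the underlying seconds since 1970-01-01 as text. A minimal sketch of two ways around that, using two made-up timestamps in place of tname$DT and assuming UTC: either store formatted strings in the matrix, or use a data frame so the column stays a real date-time.

```r
# Stand-in for tname$DT; the values and tz = "UTC" are assumptions
times <- as.POSIXct(c("2017-05-19 11:01:00", "2017-05-19 12:01:00"), tz = "UTC")

# Option 1: keep the matrix, but store formatted text
B <- matrix("", nrow = length(times), ncol = 3)
B[, 1] <- "Anne"
B[, 2] <- format(times, "%Y-%m-%d %H:%M:%S")
B

# Option 2: a data frame keeps each column's own class
D <- data.frame(name = "Anne", time = times)
str(D)
```

Option 2 is usually the better fit if the times are needed for arithmetic or sorting later, since the strings in B would have to be parsed again.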

Subsetting results from sapply

After I use sapply, I get a list, and I would like to access individual elements of those lists. So far, I have:
large.list <- sapply(1:length(visit_num), function(x)
seq(enter.shift.want[x], to= exit.prime[x], by= 'hour'))
where enter.shift.want and exit.prime are vectors of dates.
head(large.list, 2)
[[1]]
[1] "1982-05-17 13:00:00 PDT" "1982-05-17 14:00:00 PDT" "1982-05-17 15:00:00 PDT"
[4] "1982-05-17 16:00:00 PDT" "1982-05-17 17:00:00 PDT" "1982-05-17 18:00:00 PDT"
[7] "1982-05-17 19:00:00 PDT" "1982-05-17 20:00:00 PDT" "1982-05-17 21:00:00 PDT"
[10] "1982-05-17 22:00:00 PDT"
[[2]]
[1] "1982-07-14 13:00:00 PDT" "1982-07-14 14:00:00 PDT" "1982-07-14 15:00:00 PDT"
[4] "1982-07-14 16:00:00 PDT" "1982-07-14 17:00:00 PDT" "1982-07-14 18:00:00 PDT"
[7] "1982-07-14 19:00:00 PDT" "1982-07-14 20:00:00 PDT" "1982-07-14 21:00:00 PDT"
[10] "1982-07-14 22:00:00 PDT"
I would like to have large.list[1] as a vector of dates/time.
Then I would like to do
large.list[1]<=enter.shift.want[1]
and get a vector of TRUE and FALSE results. Then I would like to generalize and do
large.list[n]<= enter.shift.want[n] for each n in 1:length(visit_num), and add up the TRUE/FALSE values.
Thanks in advance.
If enter.shift.want is a list or a vector with same number of elements as large.list, here is one way to apply it to the whole list.
res <- Map(`<=`, large.list, enter.shift.want)
res1 <- Map(`<=`, large.list, enter.shift.want1)
To get the total number of TRUE per list element
colSums(do.call(cbind, res))
#[1] 3 3
Or
sapply(res, sum)
#[1] 3 3
sapply(res1,sum)
#[1] 3 7
data
large.list <- list(structure(c(390488400, 390492000, 390495600, 390499200,
390502800, 390506400, 390510000, 390513600, 390517200, 390520800
), class = c("POSIXct", "POSIXt"), tzone = "PDT"), structure(c(395499600,
395503200, 395506800, 395510400, 395514000, 395517600, 395521200,
395524800, 395528400, 395532000), class = c("POSIXct", "POSIXt"
), tzone = "PDT"))
v1 <- c('1982-05-17 00:00:00', '1982-07-14 00:00:00')
enter.shift.want <- lapply(v1, function(x) seq(as.POSIXct(x, tz='PDT'),
length.out=10, by='3 hour'))
enter.shift.want1 <- as.POSIXct(c('1982-05-17 15:00:00',
'1982-07-14 19:00:00'), tz='PDT')
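On the first part of the question: large.list[1] is still a list (of length one), whereas large.list[[1]] extracts the date-time vector itself, which can then be compared directly. A minimal sketch with rebuilt example data, assuming UTC rather than the "PDT" label used above; cutoff stands in for enter.shift.want1.

```r
large.list <- list(
  seq(as.POSIXct("1982-05-17 13:00:00", tz = "UTC"), by = "hour", length.out = 10),
  seq(as.POSIXct("1982-07-14 13:00:00", tz = "UTC"), by = "hour", length.out = 10)
)
cutoff <- as.POSIXct(c("1982-05-17 15:00:00", "1982-07-14 19:00:00"), tz = "UTC")

# [[ ]] returns the POSIXct vector; [ ] would return a one-element list
first <- large.list[[1]]
first <= cutoff[1]

# Generalised: count the TRUEs for each n
counts <- vapply(seq_along(large.list),
                 function(n) sum(large.list[[n]] <= cutoff[n]),
                 integer(1))
counts
```

vapply with integer(1) guarantees a plain integer vector back, which avoids the list-or-matrix ambiguity that sapply can produce.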

Resources