Collapse multiple rows based on period start and period end time values - gaps-and-islands

I would like to collapse multiple rows based on their period start and period end values. If the period end of one row is the same as the period start of the next row, the rows should be collapsed into one, showing the minimum start date and the maximum end date:
IDENTIFIER_1 IDENTIFIER_2 PERIOD_START PERIOD_END
4 44858 2022-07-15 07:45:00.000 2022-07-15 07:50:00.000
4 44858 2022-07-15 08:15:00.000 2022-07-15 08:20:00.000
4 44858 2022-07-15 08:20:00.000 2022-07-15 08:25:00.000
4 44858 2022-07-15 08:25:00.000 2022-07-15 08:30:00.000
4 44858 2022-07-15 08:30:00.000 2022-07-15 08:35:00.000
4 44858 2022-07-15 09:05:00.000 2022-07-15 09:10:00.000
4 44858 2022-07-15 09:10:00.000 2022-07-15 09:15:00.000
4 44858 2022-07-15 10:20:00.000 2022-07-15 10:25:00.000
4 44858 2022-07-15 10:25:00.000 2022-07-15 10:30:00.000
4 44858 2022-07-15 11:15:00.000 2022-07-15 11:20:00.000
So the result should look like this:
IDENTIFIER_1 IDENTIFIER_2 PERIOD_START PERIOD_END
4 44858 2022-07-15 07:45:00.000 2022-07-15 07:50:00.000
4 44858 2022-07-15 08:15:00.000 2022-07-15 08:35:00.000
4 44858 2022-07-15 09:05:00.000 2022-07-15 09:15:00.000
4 44858 2022-07-15 10:20:00.000 2022-07-15 10:30:00.000
4 44858 2022-07-15 11:15:00.000 2022-07-15 11:20:00.000
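The collapsing rule described above is the usual gaps-and-islands pattern: flag each row that continues the previous one, turn the flags into a running island number, then take the minimum start and maximum end per island. The question is about SQL, so the following is only a minimal sketch of the logic in base R, using four of the rows above; the column names `id1`, `id2`, `island` and the `tz = "UTC"` setting are assumptions made for illustration.

```r
# Four rows taken from the sample data (column names are illustrative)
x <- data.frame(
  id1 = 4,
  id2 = 44858,
  period_start = as.POSIXct(c("2022-07-15 07:45:00", "2022-07-15 08:15:00",
                              "2022-07-15 08:20:00", "2022-07-15 09:05:00"), tz = "UTC"),
  period_end   = as.POSIXct(c("2022-07-15 07:50:00", "2022-07-15 08:20:00",
                              "2022-07-15 08:25:00", "2022-07-15 09:10:00"), tz = "UTC")
)

# Order rows so that "previous row" is well defined
x <- x[order(x$id1, x$id2, x$period_start), ]
n <- nrow(x)

# A row continues the previous one if the identifiers match and
# its start equals the previous row's end
same_key  <- c(FALSE, x$id1[-1] == x$id1[-n] & x$id2[-1] == x$id2[-n])
continues <- c(FALSE, x$period_start[-1] == x$period_end[-n]) & same_key

# Every row that does not continue the previous one opens a new island
x$island <- cumsum(!continues)

# One output row per island: min start, max end
collapsed <- do.call(rbind, lapply(split(x, x$island), function(g) {
  data.frame(id1 = g$id1[1], id2 = g$id2[1],
             period_start = min(g$period_start),
             period_end   = max(g$period_end))
}))
collapsed
```

In SQL the same three steps are typically written with `LAG()` for the flag, a windowed `SUM()` for the island number, and `GROUP BY` for the final aggregation.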

Related

Weird behavior when wrapping purrr::map within dplyr::mutate

I am running into some errors I do not fully understand when calling purrr::map inside dplyr::mutate. The reproducible code is as follows:
library(purrr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tibble)
# data
test_dset <- structure(list(genus = c("Aureitalea", "Aureivirga", "Auricoccucs"),
t_count = c(0L, 0L, 0L), n = c(1L, 1L, 1L),
ncbi_id = list("1176327", "1433990", character(0)),
g_test = list(c(`1176327` = 0),
c(`1433990` = 0),
structure(numeric(0), .Names = character(0)))),
class = c("rowwise_df", "tbl_df", "tbl", "data.frame"),
row.names = c(NA, -3L),
groups = structure(list(.rows = structure(list(1L, 2L, 3L),
ptype = integer(0),
class = c("vctrs_list_of","vctrs_vctr", "list"))),
row.names = c(NA, -3L),
class = c("tbl_df", "tbl", "data.frame")))
test_dset
#> # A tibble: 3 × 5
#> # Rowwise:
#> genus t_count n ncbi_id g_test
#> <chr> <int> <int> <list> <list>
#> 1 Aureitalea 0 1 <chr [1]> <dbl [1]>
#> 2 Aureivirga 0 1 <chr [1]> <dbl [1]>
#> 3 Auricoccucs 0 1 <chr [0]> <dbl [0]>
# process a vector of pvals
proc_gtest <- function(pvals){
if (length(pvals) == 0){
return(NA_character_)
}
sig <- which(pvals < 0.05)
if (length(sig) == 0){
return(NA_character_)
} else {
return(names(pvals)[sig])
}
}
# returns errors
test_dset |> mutate(ncbi_filt = map(g_test, proc_gtest))
#> Error: Problem with `mutate()` column `ncbi_filt`.
#> ℹ `ncbi_filt = map(g_test, proc_gtest)`.
#> ℹ `ncbi_filt` must be size 1, not 0.
#> ℹ Did you mean: `ncbi_filt = list(map(g_test, proc_gtest))` ?
#> ℹ The error occurred in row 3.
# this is okay
map(test_dset$g_test, proc_gtest)
#> [[1]]
#> [1] "1176327"
#>
#> [[2]]
#> [1] "1433990"
#>
#> [[3]]
#> [1] NA
# adding list doesn't work because it returns a list of NULL
# with names as the quantities I wanted.
test_dset |> mutate(ncbi_filt = list(map(g_test, proc_gtest))) |> pull(ncbi_filt)
#> [[1]]
#> [[1]]$`1176327`
#> NULL
#>
#>
#> [[2]]
#> [[2]]$`1433990`
#> NULL
#>
#>
#> [[3]]
#> named list()
Created on 2021-10-13 by the reprex package (v2.0.1)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.1.1 (2021-08-10)
#> os macOS Mojave 10.14.6
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2021-10-13
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> ! package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.1.0)
#> cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.0)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.0)
#> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.1.0)
#> dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 4.1.0)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.0)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
#> knitr 1.34 2021-09-09 [1] CRAN (R 4.1.0)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.1.0)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0)
#> pillar 1.6.2 2021-07-29 [1] CRAN (R 4.1.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> P R.cache 0.15.0 2021-04-30 [?] CRAN (R 4.1.0)
#> P R.methodsS3 1.8.1 2020-08-26 [?] CRAN (R 4.1.0)
#> P R.oo 1.24.0 2020-08-26 [?] CRAN (R 4.1.0)
#> P R.utils 2.11.0 2021-09-26 [?] CRAN (R 4.1.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.0)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
#> sessioninfo 1.1.1 2018-11-05 [3] CRAN (R 4.1.0)
#> stringi 1.7.4 2021-08-25 [1] CRAN (R 4.1.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
#> P styler 1.6.2 2021-09-23 [?] CRAN (R 4.1.0)
#> tibble * 3.1.4 2021-08-25 [1] CRAN (R 4.1.0)
#> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0)
#> xfun 0.26 2021-09-14 [1] CRAN (R 4.1.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] /Users/quangnguyen/research/microbe_set_trait/renv/library/R-4.1/x86_64-apple-darwin17.0
#> [2] /private/var/folders/fs/hp4_8vfs665_nqytkhjc8s6w0000gn/T/Rtmp6ZA9pW/renv-system-library
#> [3] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
#>
#> P ── Loaded and on-disk path mismatch.
My understanding is that the error occurs because the mapped function returns nothing at row 3. The fix dplyr suggests is to wrap everything in a list.
However:
I am using the original map, which should already return a list (other tutorials on using map to transform list columns in tibbles also do not wrap the call in list). Wrapping it in list returns a list of NULL elements, where the values I want to extract appear as the names of this new list.
My function does return a value even if the element in the list is empty (it returns NA_character_).
As seen in the reprex, map on its own works and returns a list of length 3, with NA assigned to the empty row as per the logic of the custom function. Right now I am working around this by generating a separate list and attaching it to the data frame afterwards, but I would love to understand what I am looking at!
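The problem is the rowwise grouping. In a rowwise data frame, g_test inside mutate() refers to the single list element of the current row (a numeric vector), so map() iterates over the values of that vector instead of over the list column. In row 3 the vector is empty, so map() returns a zero-length list and mutate() complains that the result has size 0. Since map() already loops over each element of the list column, just ungroup first: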
library(dplyr)
library(purrr)
test_dset %>%
ungroup %>%
mutate(ncbi_filt = map(g_test, proc_gtest))
# A tibble: 3 × 6
genus t_count n ncbi_id g_test ncbi_filt
<chr> <int> <int> <list> <list> <list>
1 Aureitalea 0 1 <chr [1]> <dbl [1]> <chr [1]>
2 Aureivirga 0 1 <chr [1]> <dbl [1]> <chr [1]>
3 Auricoccucs 0 1 <chr [0]> <dbl [0]> <chr [1]>
Or use map_chr to return a character vector (since a single value is returned for each element):
test_dset %>%
ungroup %>%
mutate(ncbi_filt = map_chr(g_test, proc_gtest))
# A tibble: 3 × 6
genus t_count n ncbi_id g_test ncbi_filt
<chr> <int> <int> <list> <list> <chr>
1 Aureitalea 0 1 <chr [1]> <dbl [1]> 1176327
2 Aureivirga 0 1 <chr [1]> <dbl [1]> 1433990
3 Auricoccucs 0 1 <chr [0]> <dbl [0]> <NA>
If you keep the rowwise grouping, you can apply the function directly to g_test and wrap the result in list(), which is needed when the output can have length greater than 1 or a varying structure:
test_dset %>%
mutate(ncbi_filt = list(proc_gtest(g_test)))
# A tibble: 3 × 6
# Rowwise:
genus t_count n ncbi_id g_test ncbi_filt
<chr> <int> <int> <list> <list> <list>
1 Aureitalea 0 1 <chr [1]> <dbl [1]> <chr [1]>
2 Aureivirga 0 1 <chr [1]> <dbl [1]> <chr [1]>
3 Auricoccucs 0 1 <chr [0]> <dbl [0]> <chr [1]>
For this data the function returns a single value per row, so wrapping it in list() is not needed either:
test_dset %>%
mutate(ncbi_filt = proc_gtest(g_test))
# A tibble: 3 × 6
# Rowwise:
genus t_count n ncbi_id g_test ncbi_filt
<chr> <int> <int> <list> <list> <chr>
1 Aureitalea 0 1 <chr [1]> <dbl [1]> 1176327
2 Aureivirga 0 1 <chr [1]> <dbl [1]> 1433990
3 Auricoccucs 0 1 <chr [0]> <dbl [0]> <NA>
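To see where the NULL entries and the "size 0" error in the question come from, it can help to reproduce what happens to a single row outside of dplyr. The following is a minimal base-R sketch: `lapply` stands in for `purrr::map` here (an assumption made to keep the example dependency-free), and `row1` and `row3` are the bare vectors that a rowwise `mutate()` would see for rows 1 and 3.

```r
# Same logic as proc_gtest in the question, written compactly
proc_gtest <- function(pvals) {
  if (length(pvals) == 0) return(NA_character_)
  sig <- which(pvals < 0.05)
  if (length(sig) == 0) NA_character_ else names(pvals)[sig]
}

# In a rowwise data frame, g_test is the bare vector of the current row
row1 <- c(`1176327` = 0)
row3 <- structure(numeric(0), .Names = character(0))

# Mapping over the vector visits each value without its name,
# so names(pvals) is NULL inside the function
per_element_row1 <- lapply(row1, proc_gtest)   # named list holding NULL
per_element_row3 <- lapply(row3, proc_gtest)   # zero-length list

# Calling the function on the whole vector gives the intended result
whole_row1 <- proc_gtest(row1)
whole_row3 <- proc_gtest(row3)
```

The zero-length list for row 3 is what triggers "must be size 1, not 0", and the named list of NULL for row 1 matches the output seen after wrapping the call in `list()`.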

R: Computing difference in values for multiple groups/variables in R

Is there a way to calculate the difference between each pair of groups efficiently? Ideally, I want to create a new column with mutate() that shows the difference (in one column, in long format). I don't want to have to compute the difference between each pair of groups individually.
That is, I want to find the difference in values between each pair of groups, on a given date and hour:
arc1045 - arc1046,
arc1045 - arc1047,
arc1045 - arc1048,
arc1045 - arc1050,
arc1046 - arc1047,
arc1046 - arc1048,
.
.
.
The data frame can be retrieved using the code below.
structure(list(date = structure(c(18215, 18215, 18215, 18215,
18215), class = "Date"), hour = 9:13, arc1045 = c(15.2933333333333,
16.1275, 17.0366666666667, 18.36, 19.2725), arc1046 = c(14.8133333333333,
15.615, 16.3733333333333, 17.405, 18.4), arc1047 = c(15.0233333333333,
15.93, 16.8133333333333, 18.17, 18.6125), arc1048 = c(14.45,
15.31, 15.9333333333333, 16.965, 18.06), arc1050 = c(14.45, 15.2875,
15.9466666666667, 16.89, 18.1025)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
#> date hour arc1045 arc1046 arc1047 arc1048 arc1050
#> 1 2019-11-15 9 15.29333 14.81333 15.02333 14.45000 14.45000
#> 2 2019-11-15 10 16.12750 15.61500 15.93000 15.31000 15.28750
#> 3 2019-11-15 11 17.03667 16.37333 16.81333 15.93333 15.94667
#> 4 2019-11-15 12 18.36000 17.40500 18.17000 16.96500 16.89000
#> 5 2019-11-15 13 19.27250 18.40000 18.61250 18.06000 18.10250
Created on 2020-11-04 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Catalina 10.15.7
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-11-04
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> backports 1.1.10 2020-09-15 [1] CRAN (R 4.0.2)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.2)
#> cli 2.1.0 2020-10-12 [1] CRAN (R 4.0.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
#> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2)
#> ps 1.4.0 2020-10-07 [1] CRAN (R 4.0.2)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.8 2020-10-08 [1] CRAN (R 4.0.2)
#> rmarkdown 2.5 2020-10-21 [1] CRAN (R 4.0.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.2)
#> usethis 1.6.3 2020-09-17 [1] CRAN (R 4.0.2)
#> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2)
#> xfun 0.19.1 2020-10-31 [1] Github (yihui/xfun#621896e)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
Thank you.
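You can put your data frame into long form with pivot_longer, then do a full_join of the long data with itself by date, hour, and row number to get all combinations. Using distinct on an order-independent pair name keeps one row per pair and removes mirrored duplicates (e.g., arc1045 - arc1046 and arc1046 - arc1045). The code assumes the data frame from the question is stored as df.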
library(tidyverse)
df_long <- df %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = starts_with("arc"))
df_long %>%
full_join(df_long, by = c("date", "hour", "rn")) %>%
filter(name.x != name.y) %>%
distinct(date, hour, rn,
comb_name = paste0(pmin(name.x, name.y), pmax(name.x, name.y)),
.keep_all = TRUE) %>%
mutate(diff = value.x - value.y) %>%
select(date, hour, comb_name, diff)
Output
date hour comb_name diff
<date> <int> <chr> <dbl>
1 2019-11-15 9 arc1045arc1046 0.480
2 2019-11-15 9 arc1045arc1047 0.270
3 2019-11-15 9 arc1045arc1048 0.843
4 2019-11-15 9 arc1045arc1050 0.843
5 2019-11-15 9 arc1046arc1047 -0.210
6 2019-11-15 9 arc1046arc1048 0.363
7 2019-11-15 9 arc1046arc1050 0.363
8 2019-11-15 9 arc1047arc1048 0.573
9 2019-11-15 9 arc1047arc1050 0.573
10 2019-11-15 9 arc1048arc1050 0
...
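If you would rather stay in wide format, the same pairwise differences can be sketched in base R with `combn`, which enumerates each unordered pair of columns exactly once. This is only an illustrative alternative, not the approach from the answer above; the small `df` below uses rounded values from the first two rows and three of the columns in the question.

```r
# Rounded subset of the question's data (two hours, three arc columns)
df <- data.frame(
  date = as.Date("2019-11-15"),
  hour = 9:10,
  arc1045 = c(15.29, 16.13),
  arc1046 = c(14.81, 15.62),
  arc1047 = c(15.02, 15.93)
)

# All unordered pairs of arc columns
arc_cols <- grep("^arc", names(df), value = TRUE)
pairs <- combn(arc_cols, 2, simplify = FALSE)

# One block of rows per pair, stacked into long format
diffs <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(
    date = df$date,
    hour = df$hour,
    comb_name = paste(p[1], p[2], sep = " - "),
    diff = df[[p[1]]] - df[[p[2]]]
  )
}))
diffs
```

Because `combn` never produces both (a, b) and (b, a), no de-duplication step is needed.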

Convert string to datetime object in r

I have a vector of time strings that I want to convert to datetime objects. I tried as.POSIXct and did not get the expected outcome. I want datetimes like 00:30, 01:30, etc.
Is there any easy code for doing this?
> times
[1] "00:30" "01:30" "02:30" "03:30" "04:30" "05:30" "06:30" "07:30" "08:30" "09:30" "10:30" "11:30" "12:30" "13:30" "14:30"
[16] "15:30" "16:30" "17:30" "18:30" "19:30" "20:30" "21:30" "22:30" "23:30"
> times <- as.POSIXct(times, format = '%H:%M')
[1] "2020-03-11 00:30:00 CDT" "2020-03-11 01:30:00 CDT" "2020-03-11 02:30:00 CDT" "2020-03-11 03:30:00 CDT"
[5] "2020-03-11 04:30:00 CDT" "2020-03-11 05:30:00 CDT" "2020-03-11 06:30:00 CDT" "2020-03-11 07:30:00 CDT"
[9] "2020-03-11 08:30:00 CDT" "2020-03-11 09:30:00 CDT" "2020-03-11 10:30:00 CDT" "2020-03-11 11:30:00 CDT"
[13] "2020-03-11 12:30:00 CDT" "2020-03-11 13:30:00 CDT" "2020-03-11 14:30:00 CDT" "2020-03-11 15:30:00 CDT"
[17] "2020-03-11 16:30:00 CDT" "2020-03-11 17:30:00 CDT" "2020-03-11 18:30:00 CDT" "2020-03-11 19:30:00 CDT"
[21] "2020-03-11 20:30:00 CDT" "2020-03-11 21:30:00 CDT" "2020-03-11 22:30:00 CDT" "2020-03-11 23:30:00 CDT"
As previous comments and answers suggested, the POSIXct (i.e., datetime) class in R always stores dates along with times. If you convert from a character object with just times to that class, today's date is added by default (if you want another date, you could do, for example, this: as.POSIXct(paste("2020-01-01", times), format = "%Y-%m-%d %H:%M")).
However, this should almost never be a problem, since you can use format(times, format = "%H:%M") or, in ggplot2, scale_x_datetime with date_labels to get just the times back. For plotting, this would look something like this:
times <- c("00:30", "01:30", "02:30", "03:30", "04:30", "05:30", "06:30", "07:30", "08:30", "09:30", "10:30", "11:30", "12:30", "13:30", "14:30",
"15:30", "16:30", "17:30", "18:30", "19:30", "20:30", "21:30", "22:30", "23:30")
library(tidyverse)
df <- tibble(
time_chr = times,
time = as.POSIXct(times, format = "%H:%M"),
value = rnorm(length(times))
)
df
#> # A tibble: 24 x 3
#> time_chr time value
#> <chr> <dttm> <dbl>
#> 1 00:30 2020-03-12 00:30:00 0.352
#> 2 01:30 2020-03-12 01:30:00 -0.547
#> 3 02:30 2020-03-12 02:30:00 -0.574
#> 4 03:30 2020-03-12 03:30:00 0.843
#> 5 04:30 2020-03-12 04:30:00 0.798
#> 6 05:30 2020-03-12 05:30:00 -0.620
#> 7 06:30 2020-03-12 06:30:00 0.213
#> 8 07:30 2020-03-12 07:30:00 1.21
#> 9 08:30 2020-03-12 08:30:00 0.370
#> 10 09:30 2020-03-12 09:30:00 0.497
#> # … with 14 more rows
ggplot(df, aes(x = time, y = value)) +
geom_line() +
scale_x_datetime(date_labels = "%H:%M")
Created on 2020-03-12 by the reprex package (v0.3.0)
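The round trip described above (attach a date, then format the date away again) can be checked with a few lines of base R. This is a small sketch; the fixed date "2020-01-01" comes from the answer, while `tz = "UTC"` is an assumption added here so the result does not depend on the local time zone or daylight saving changes.

```r
times <- c("00:30", "01:30", "23:30")

# Parse with an explicit placeholder date instead of today's date
parsed <- as.POSIXct(paste("2020-01-01", times),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")

# Format drops the date again and returns plain "HH:MM" strings
back <- format(parsed, format = "%H:%M")
back
```

The values in `back` are character strings again, so this is useful for labels and printing rather than for arithmetic on times.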
In base R you can use as.difftime to convert a string to a time object:
as.difftime(times, "%H:%M")
#Time differences in mins
# [1] 30 90 150 210 270 330 390 450 510 570 630 690 750 810 870
#[16] 930 990 1050 1110 1170 1230 1290 1350 1410
You can also use the hms package:
library(hms)
head(as_hms(paste0(times, ":00")))
#00:30:00
#01:30:00
#02:30:00
#03:30:00
#04:30:00
#05:30:00
or the lubridate package, as already suggested by @jpmam1:
library(lubridate)
hm(times)
# [1] "30M 0S" "1H 30M 0S" "2H 30M 0S" "3H 30M 0S" "4H 30M 0S"
# [6] "5H 30M 0S" "6H 30M 0S" "7H 30M 0S" "8H 30M 0S" "9H 30M 0S"
#[11] "10H 30M 0S" "11H 30M 0S" "12H 30M 0S" "13H 30M 0S" "14H 30M 0S"
#[16] "15H 30M 0S" "16H 30M 0S" "17H 30M 0S" "18H 30M 0S" "19H 30M 0S"
#[21] "20H 30M 0S" "21H 30M 0S" "22H 30M 0S" "23H 30M 0S"
as.POSIXct stores dates along with your times. If you need the values only for plotting, that causes no problem and you can use the answer from @JBGruber. Otherwise, storing dates where there are none should be avoided, or the dates should be set to values that are obviously wrong:
head(as.POSIXct(paste("9999-1-1", times)))
#[1] "9999-01-01 00:30:00 CET" "9999-01-01 01:30:00 CET"
#[3] "9999-01-01 02:30:00 CET" "9999-01-01 03:30:00 CET"
#[5] "9999-01-01 04:30:00 CET" "9999-01-01 05:30:00 CET"

How to calculate mean of two timestamp columns in R?

I have a data frame in R where two columns are datetimes (POSIX class). I need to calculate the mean datetime for each row.
Here is a reproducible example:
a <- c(
"2018-10-11 15:22:17",
"2018-10-10 16:30:37",
"2018-10-10 16:52:46",
"2018-10-10 16:58:33",
"2018-10-10 16:32:24")
b <- c(
"2018-10-11 15:25:12",
"2018-10-10 16:30:39",
"2018-10-10 16:55:14",
"2018-10-10 16:58:53",
"2018-10-10 16:32:27")
a <- strptime(a, format = "%Y-%m-%d %H:%M:%S")
b <- strptime(b, format = "%Y-%m-%d %H:%M:%S")
f <- data.frame(a, b)
The result should look like this:
a b time_mean
1 2018-10-11 15:22:17 2018-10-11 15:25:12 2018-10-11 15:23:44
2 2018-10-10 16:30:37 2018-10-10 16:30:39 2018-10-10 16:30:38
3 2018-10-10 16:52:46 2018-10-10 16:55:14 2018-10-10 16:54:00
4 2018-10-10 16:58:33 2018-10-10 16:58:53 2018-10-10 16:58:43
5 2018-10-10 16:32:24 2018-10-10 16:32:27 2018-10-10 16:32:25
I tried the following:
apply(f, 1, function(x) mean)
apply(f, 1, function(x) mean(c(x[1], x[2])))
Instead of using apply (which converts the data frame to a matrix and so strips off the class attributes), use Map:
f$time_mean <- do.call(c, Map(function(x, y) mean(c(x, y)), a, b))
f$time_mean
#[1] "2018-10-11 15:23:44 EDT" "2018-10-10 16:30:38 EDT" "2018-10-10 16:54:00 EDT" "2018-10-10 16:58:43 EDT"
#[5] "2018-10-10 16:32:25 EDT"
Or, taking the columns directly from the data frame f:
do.call(c, Map(function(x, y) mean(c(x, y)), f$a, f$b))
Another option is to convert to numeric with xtfrm (see ?xtfrm; it has a POSIXlt method), take the rowMeans, and convert back to a date-time class, as in @jay.sf's post:
as.POSIXlt(rowMeans(sapply(f, xtfrm)), origin = "1970-01-01")
#[1] "2018-10-11 15:23:44 EDT" "2018-10-10 16:30:38 EDT" "2018-10-10 16:54:00 EDT" "2018-10-10 16:58:43 EDT"
#[5] "2018-10-10 16:32:25 EDT"
You could calculate with the underlying numeric values:
f$time_mean <- as.POSIXct(sapply(seq(nrow(f)), function(x)
mean(as.numeric(f[x, ]))), origin="1970-01-01")
f
# a b time_mean
# 1 2018-10-11 15:22:17 2018-10-11 15:25:12 2018-10-11 15:23:44
# 2 2018-10-10 16:30:37 2018-10-10 16:30:39 2018-10-10 16:30:38
# 3 2018-10-10 16:52:46 2018-10-10 16:55:14 2018-10-10 16:54:00
# 4 2018-10-10 16:58:33 2018-10-10 16:58:53 2018-10-10 16:58:43
# 5 2018-10-10 16:32:24 2018-10-10 16:32:27 2018-10-10 16:32:25
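Since a POSIXct value is stored as seconds since 1970-01-01, the row-wise mean can also be computed in one vectorised step without looping over rows. The following is a minimal sketch using the first two rows of the example; `tz = "UTC"` is an assumption added so the printed result does not depend on the local time zone.

```r
a <- as.POSIXct(c("2018-10-11 15:22:17", "2018-10-10 16:30:37"), tz = "UTC")
b <- as.POSIXct(c("2018-10-11 15:25:12", "2018-10-10 16:30:39"), tz = "UTC")
f <- data.frame(a, b)

# Average the underlying seconds, then convert back to POSIXct
f$time_mean <- as.POSIXct((as.numeric(f$a) + as.numeric(f$b)) / 2,
                          origin = "1970-01-01", tz = "UTC")
f
```

Half seconds are kept internally; the default print and `format()` with `%S` simply drop the fractional part.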

R matrix column to fill with timestamp

I'm trying to fill one matrix column with date-times taken from a column of another object.
B <- matrix(0, nrow(A) - 1, 3)
B[, 1] <- "Anne"
times <- as.POSIXct(tname$DT[1:955], format = "%Y-%m-%d %H:%M:%S")
B[, 2] <- times
When I print times, the values are shown in the format "%Y-%m-%d %H:%M:%S":
[1] "2017-05-19 11:01:00 EDT" "2017-05-19 12:01:00 EDT" "2017-05-19 12:31:00 EDT" "2017-05-19 13:01:00 EDT"
[5] "2017-05-19 13:31:00 EDT" "2017-05-19 14:01:00 EDT" "2017-05-19 14:31:00 EDT" "2017-05-19 15:01:00 EDT"
[9] "2017-05-20 08:01:00 EDT" "2017-05-20 09:01:00 EDT" "2017-05-20 10:01:00 EDT" "2017-05-20 11:01:00 EDT" ....
However, when I print B[, 2], I get numbers stored as strings instead:
[1] "1495206060" "1495209660" "1495211460" "1495213260" "1495215060" "1495216860" "1495218660" "1495220460"
[9] "1495281660" "1495285260" "1495288860" "1495292460" "1495296060" ....
How do I copy my dates and times into my matrix in the right format?
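A matrix can hold only one atomic type, so assigning POSIXct values into a character matrix drops the class and stores the underlying seconds since 1970 as text, which is what the output above shows. As a minimal sketch (with two made-up rows standing in for `tname$DT`, and `tz = "UTC"` assumed for reproducibility), you can either format the times to strings before assigning, or use a data frame so the column keeps its date-time class.

```r
# Stand-in for tname$DT from the question
times <- as.POSIXct(c("2017-05-19 11:01:00", "2017-05-19 12:01:00"), tz = "UTC")

# Option 1: keep the matrix, but store formatted strings
B <- matrix("", nrow = length(times), ncol = 3)
B[, 1] <- "Anne"
B[, 2] <- format(times, "%Y-%m-%d %H:%M:%S")

# Option 2: use a data frame, where each column keeps its own class
df <- data.frame(name = "Anne", time = times)
```

Option 1 gives readable text but no date arithmetic; option 2 keeps `time` as POSIXct, so differences and sorting still work.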
