Computing difference in values for multiple groups/variables in R

Is there a way to calculate the difference between each pair of groups efficiently? Ideally, I want to create a new column with mutate() that holds the differences (in one column, in long format). I don't want to have to compute the difference between each pair of groups individually.
That is, I want to find the difference in values between each pair of groups for a given date and hour:
arc1045 - arc1046,
arc1045 - arc1047,
arc1045 - arc1048,
arc1045 - arc1050,
arc1046 - arc1047,
arc1046 - arc1048,
.
.
.
The data frame can be retrieved using the code below.
structure(list(date = structure(c(18215, 18215, 18215, 18215,
18215), class = "Date"), hour = 9:13, arc1045 = c(15.2933333333333,
16.1275, 17.0366666666667, 18.36, 19.2725), arc1046 = c(14.8133333333333,
15.615, 16.3733333333333, 17.405, 18.4), arc1047 = c(15.0233333333333,
15.93, 16.8133333333333, 18.17, 18.6125), arc1048 = c(14.45,
15.31, 15.9333333333333, 16.965, 18.06), arc1050 = c(14.45, 15.2875,
15.9466666666667, 16.89, 18.1025)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
#> date hour arc1045 arc1046 arc1047 arc1048 arc1050
#> 1 2019-11-15 9 15.29333 14.81333 15.02333 14.45000 14.45000
#> 2 2019-11-15 10 16.12750 15.61500 15.93000 15.31000 15.28750
#> 3 2019-11-15 11 17.03667 16.37333 16.81333 15.93333 15.94667
#> 4 2019-11-15 12 18.36000 17.40500 18.17000 16.96500 16.89000
#> 5 2019-11-15 13 19.27250 18.40000 18.61250 18.06000 18.10250
Created on 2020-11-04 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Catalina 10.15.7
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-11-04
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> backports 1.1.10 2020-09-15 [1] CRAN (R 4.0.2)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.2)
#> cli 2.1.0 2020-10-12 [1] CRAN (R 4.0.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
#> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2)
#> ps 1.4.0 2020-10-07 [1] CRAN (R 4.0.2)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.8 2020-10-08 [1] CRAN (R 4.0.2)
#> rmarkdown 2.5 2020-10-21 [1] CRAN (R 4.0.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.2)
#> usethis 1.6.3 2020-09-17 [1] CRAN (R 4.0.2)
#> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2)
#> xfun 0.19.1 2020-10-31 [1] Github (yihui/xfun#621896e)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
Thank you.

You can put your data frame into long form with pivot_longer, then do a full_join to get all combinations by date, hour, and row number. Using distinct you can get unique combinations and remove duplicates (e.g., arc1045 - arc1046 and arc1046 - arc1045).
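library(tidyverse)
# `df` is the data frame from the question.
# Reshape to long form: one row per date/hour/arc column.
df_long <- df %>%
  mutate(rn = row_number()) %>%
  pivot_longer(cols = starts_with("arc"))
# Self-join so every arc column is paired with every other one in the same row,
# then keep each unordered pair only once.
df_long %>%
  full_join(df_long, by = c("date", "hour", "rn")) %>%
  filter(name.x != name.y) %>%
  distinct(date, hour, rn,
           comb_name = paste0(pmin(name.x, name.y), pmax(name.x, name.y)),
           .keep_all = TRUE) %>%
  mutate(diff = value.x - value.y) %>%
  select(date, hour, comb_name, diff)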
Output
date hour comb_name diff
<date> <int> <chr> <dbl>
1 2019-11-15 9 arc1045arc1046 0.480
2 2019-11-15 9 arc1045arc1047 0.270
3 2019-11-15 9 arc1045arc1048 0.843
4 2019-11-15 9 arc1045arc1050 0.843
5 2019-11-15 9 arc1046arc1047 -0.210
6 2019-11-15 9 arc1046arc1048 0.363
7 2019-11-15 9 arc1046arc1050 0.363
8 2019-11-15 9 arc1047arc1048 0.573
9 2019-11-15 9 arc1047arc1050 0.573
10 2019-11-15 9 arc1048arc1050 0
...
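As an alternative sketch of the same idea, the unordered pairs can be enumerated directly with base R's combn instead of a self-join. This is not the approach from the answer above; the small data frame below is a shortened, rounded stand-in for the question's data, and the " - " separator in comb_name is an illustrative choice.

```r
# Shortened stand-in for the question's data (values rounded, three arc columns)
df <- data.frame(
  date = as.Date("2019-11-15"),
  hour = 9:10,
  arc1045 = c(15.29, 16.13),
  arc1046 = c(14.81, 15.62),
  arc1047 = c(15.02, 15.93)
)

# All unordered pairs of arc columns, each pair appearing exactly once
arc_cols <- grep("^arc", names(df), value = TRUE)
pairs <- combn(arc_cols, 2, simplify = FALSE)

# One block of rows per pair, stacked into long format
pairwise_diff <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(
    date = df$date,
    hour = df$hour,
    comb_name = paste(p[1], p[2], sep = " - "),
    diff = df[[p[1]]] - df[[p[2]]]
  )
}))

print(pairwise_diff)
```

With three arc columns and two rows this gives 3 pairs x 2 rows = 6 rows; with the question's five columns there would be 10 pairs per row.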

Related

strange behavior using rbind with data.table (>= 1.13.0) in combination with data.frame

Using rbind to combine a data.frame containing a character column with a data.table containing an IDate column (the result of fread) converts the IDate to its internal integer representation. This may be by design, but if not, it is a bug. fread has supported IDate since data.table 1.13.0 (see https://github.com/Rdatatable/data.table/blob/master/NEWS.md).
The example below shows that the data.table method of rbind handles this correctly (it throws an error), but the data.frame method of rbind does not.
I don't know how or where this can or should be fixed.
library(data.table)
df1 <- data.frame(date = "2020-11-05")
dt1 <- data.table(date = "2020-11-05")
dt2 <- fread("date\n2020-11-05")
rbind(dt1, dt2) # ok -- throws error: rbind.data.table
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 1 of item 2 does not match with column 1 of item 1.
## not ok -- converts int representation of IDate to character: rbind.data.frame
rbind(df1, dt2)
#> date
#> 1 2020-11-05
#> 2 18571
## the other way round: ok -- throws an error: rbind.data.table
rbind(dt2, df1)
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 1 of item 2 does not match with column 1 of item 1.
### solution
dt3 <- fread("date\n2020-11-05", colClasses = "character")
rbind(dt1, dt3)
#> date
#> 1: 2020-11-05
#> 2: 2020-11-05
Created on 2020-11-05 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.3 (2020-10-10)
#> os Debian GNU/Linux 10 (buster)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate de_AT.UTF-8
#> ctype de_AT.UTF-8
#> tz Europe/Vienna
#> date 2020-11-05
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> backports 1.2.0 2020-11-02 [1] CRAN (R 4.0.3)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.3)
#> cli 2.1.0 2020-10-12 [1] CRAN (R 4.0.3)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> data.table * 1.13.2 2020-10-19 [1] CRAN (R 4.0.3)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.3)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.2)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.3)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
#> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.3)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.3)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.3)
#> ps 1.4.0 2020-10-07 [1] CRAN (R 4.0.3)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.3)
#> rlang 0.4.8 2020-10-08 [1] CRAN (R 4.0.3)
#> rmarkdown 2.5 2020-10-21 [1] CRAN (R 4.0.3)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.3)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 3.0.0 2020-10-31 [1] CRAN (R 4.0.3)
#> usethis 1.6.3 2020-09-17 [1] CRAN (R 4.0.3)
#> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.3)
#> xfun 0.19 2020-10-30 [1] CRAN (R 4.0.3)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/lib/R/site-library
#> [3] /usr/lib/R/library
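The 18571 in the rbind(df1, dt2) output above is the date expressed as a count of days since 1970-01-01. A minimal base R sketch of that representation follows. It uses base Date rather than data.table's IDate so that it runs without extra packages; that substitution is an assumption made for illustration. It also shows converting to character explicitly before binding, in the same spirit as the colClasses = "character" fix.

```r
d <- as.Date("2020-11-05")

# The underlying number: days since 1970-01-01
days <- as.integer(d)
print(days)

# Converting the number back recovers the date
back <- as.Date(days, origin = "1970-01-01")
print(back)

# Converting to character first keeps the readable form when binding
combined <- rbind(
  data.frame(date = "2020-11-05"),
  data.frame(date = as.character(d))
)
print(combined)
```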

Why does dplyr::mutate_at() on the first element in a rowwise-tibble also take effect on the rest of the elements?

In the following code, I define a tibble df with two columns: name holds the character vector c("a", "b", "c"), and data holds a list of tibbles, each with a single column value. I'd like to rename each tibble's value column to the string in the corresponding row of name, i.e. "a", "b" and "c". To work row by row I used dplyr::rowwise(), but the change made for the first element (renaming the column to "a") also shows up in the remaining elements: from the second row on, the tibble printed before the rename already has a column called "a". As a result, the rename does nothing for the later rows, because they no longer have a column named "value". Do I have to use purrr::map() here instead of the tidier row-wise manipulation?
Could you please give an answer using the rowwise/mutate_at approach? Thanks.
library(tidyverse)
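#> Warning: package 'tidyverse' was built under R version 3.6.3
#> Warning: package 'ggplot2' was built under R version 3.6.1
#> Warning: package 'tibble' was built under R version 3.6.3
#> Warning: package 'tidyr' was built under R version 3.6.1
#> Warning: package 'readr' was built under R version 3.6.1
#> Warning: package 'purrr' was built under R version 3.6.1
#> Warning: package 'dplyr' was built under R version 3.6.3
#> Warning: package 'stringr' was built under R version 3.6.1
#> Warning: package 'forcats' was built under R version 3.6.3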
df <- tibble::tibble(name = c("a", "b", "c"),
                     data = list(tibble::tibble(value = 1:10)))
df_mutate <- df %>%
  dplyr::rowwise() %>%
  dplyr::mutate_at("data", ~ {
    print(.x)
    colnames(.x)[colnames(.x) %in% "value"] <- name
    list(.x)
  }) %>%
  dplyr::ungroup()
#> # A tibble: 10 x 1
#> value
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6
#> 7 7
#> 8 8
#> 9 9
#> 10 10
#> # A tibble: 10 x 1
#> a
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6
#> 7 7
#> 8 8
#> 9 9
#> 10 10
#> # A tibble: 10 x 1
#> a
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6
#> 7 7
#> 8 8
#> 9 9
#> 10 10
Created on 2020-06-19 by the reprex package (v0.3.0)
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 3.6.0 (2019-04-26)
#> os Windows Server x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Chinese (Simplified)_China.936
#> ctype Chinese (Simplified)_China.936
#> tz Asia/Taipei
#> date 2020-06-19
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.1)
#> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
#> broom 0.5.6 2020-04-20 [1] CRAN (R 3.6.3)
#> callr 3.4.0 2019-12-09 [1] CRAN (R 3.6.2)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.1)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.3)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.1)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.1)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.2)
#> dbplyr 1.4.2 2019-06-17 [1] CRAN (R 3.6.3)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.1)
#> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.1)
#> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.2)
#> dplyr * 1.0.0 2020-05-29 [1] CRAN (R 3.6.3)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.1)
#> fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.1)
#> forcats * 0.5.0 2020-03-01 [1] CRAN (R 3.6.3)
#> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.1)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.1)
#> ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.1)
#> glue 1.4.1 2020-05-13 [1] CRAN (R 3.6.3)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.1)
#> haven 2.2.0 2019-11-08 [1] CRAN (R 3.6.3)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.1)
#> hms 0.5.2 2019-10-30 [1] CRAN (R 3.6.2)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1)
#> httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.1)
#> jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.1)
#> knitr 1.26 2019-11-12 [1] CRAN (R 3.6.2)
#> lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.0)
#> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.1)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.3)
#> lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.1)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.1)
#> modelr 0.1.6 2020-02-22 [1] CRAN (R 3.6.3)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.1)
#> nlme 3.1-143 2019-12-10 [1] CRAN (R 3.6.2)
#> pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.2)
#> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.1)
#> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.1)
#> processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.1)
#> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.1)
#> purrr * 0.3.3 2019-10-18 [1] CRAN (R 3.6.1)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.2)
#> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.2)
#> readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.1)
#> readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.1)
#> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.1)
#> reprex 0.3.0 2019-05-16 [1] CRAN (R 3.6.3)
#> rlang 0.4.6 2020-05-02 [1] CRAN (R 3.6.3)
#> rmarkdown 2.0 2019-12-12 [1] CRAN (R 3.6.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.1)
#> rvest 0.3.5 2019-11-08 [1] CRAN (R 3.6.3)
#> scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
#> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
#> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.2)
#> tibble * 3.0.1 2020-04-20 [1] CRAN (R 3.6.3)
#> tidyr * 1.0.0 2019-09-11 [1] CRAN (R 3.6.1)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 3.6.3)
#> tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 3.6.3)
#> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.1)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.1)
#> vctrs 0.3.0 2020-05-11 [1] CRAN (R 3.6.3)
#> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.1)
#> xfun 0.11 2019-11-12 [1] CRAN (R 3.6.2)
#> xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.1)
#> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] C:/Users/xzhu/Documents/R/win-library/3.6
#> [2] C:/Program Files/R/R-3.6.0/library
Yes, you can use map2:
library(dplyr)
df %>% mutate(data = purrr::map2(name, data, ~{names(.y) <- .x;.y}))
Or Map in base R:
df$data <- Map(function(x, y) {names(y) <- x;y}, df$name, df$data)
If you want to use rowwise, a similar approach would be:
df %>% rowwise() %>% mutate(data = {names(data) <- name;list(data)})
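As a self-contained check of the base R Map approach above, here is a minimal sketch that renames only the column called value and leaves any other columns untouched. It uses plain data frames rather than tibbles so that it runs without extra packages; that substitution, and the shorter 1:3 column, are assumptions made for illustration.

```r
# A data frame with a list column of small data frames (stand-in for the tibble)
df <- data.frame(name = c("a", "b", "c"))
df$data <- rep(list(data.frame(value = 1:3)), 3)

# Walk over name and data in parallel; rename only the "value" column
df$data <- Map(function(nm, d) {
  names(d)[names(d) == "value"] <- nm
  d
}, df$name, df$data)

# Each nested data frame now carries its own row's name as the column name
print(sapply(df$data, names))
```

Because each call of the function receives its own copy of the nested data frame, renaming one element cannot affect the others.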

How can I tell what arima model this code is running?

I'm reading over some R code, and I've come across a line where the function prototype doesn't seem to match what I've seen in the library's API (fabletools).
fitted_model = a_time_series %>%
  filter(date <= tsibble::year(someyear)) %>%
  fabletools::model(arima = ARIMA(time))
...where time is a column of a_time_series. How do I tell which ARIMA model this is using?
(e.g. arima(1,1,1) or arima(0,1,1), etc.)
I've checked this documentation; however, the function prototypes don't seem to match.
When ARIMA() is called without an explicit order, as in ARIMA(time), fable selects the model automatically. You can see which model was chosen in the formatted output in the console. If you need this display as text, you can use the format() function.
library(fable)
#> Loading required package: fabletools
library(tsibble)
library(dplyr)
tourism %>%
group_by(Purpose) %>%
summarise(Trips = sum(Trips)) %>%
model(auto_arima = ARIMA(Trips)) %>%
mutate(format(auto_arima))
#> # A mable: 4 x 3
#> # Key: Purpose [4]
#> Purpose auto_arima `format(auto_arima)`
#> <chr> <model> <chr>
#> 1 Business <ARIMA(0,1,1)(0,1,1)[4]> <ARIMA(0,1,1)(0,1,1)[4]>
#> 2 Holiday <ARIMA(0,1,1)(0,1,1)[4]> <ARIMA(0,1,1)(0,1,1)[4]>
#> 3 Other <ARIMA(0,1,1)(1,0,0)[4]> <ARIMA(0,1,1)(1,0,0)[4]>
#> 4 Visiting <ARIMA(1,0,1)(2,1,0)[4]> <ARIMA(1,0,1)(2,1,0)[4]>
Created on 2020-06-12 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.2 (2019-12-12)
#> os Ubuntu 18.04.4 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en_AU:en
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-06-12
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> anytime 0.3.7 2020-01-20 [1] CRAN (R 3.6.1)
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.1)
#> backports 1.1.7 2020-05-13 [1] RSPM (R 3.6.3)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 3.6.2)
#> cli 2.0.2 2020-02-28 [1] RSPM (R 3.6.2)
#> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.1)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.1)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.1)
#> devtools 2.2.2 2020-02-17 [1] RSPM (R 3.6.2)
#> digest 0.6.25 2020-02-23 [1] RSPM (R 3.6.2)
#> distributional 0.1.0.9000 2020-06-10 [1] local
#> dplyr * 1.0.0 2020-05-29 [1] CRAN (R 3.6.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 3.6.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.1)
#> fable * 0.2.1 2020-06-11 [1] local
#> fabletools * 0.2.0 2020-06-11 [1] local
#> fansi 0.4.1 2020-01-08 [1] RSPM (R 3.6.2)
#> farver 2.0.3 2020-01-16 [1] CRAN (R 3.6.1)
#> feasts 0.1.4 2020-06-04 [1] local
#> fs 1.4.1 2020-04-04 [1] RSPM (R 3.6.3)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.1)
#> ggplot2 3.3.1 2020-05-28 [1] CRAN (R 3.6.2)
#> glue 1.4.1.9000 2020-05-26 [1] Github (tidyverse/glue#a605000)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.1)
#> highr 0.8 2019-03-20 [1] CRAN (R 3.6.1)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1)
#> knitr 1.28 2020-02-06 [1] RSPM (R 3.6.2)
#> lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.2)
#> lifecycle 0.2.0.9000 2020-03-19 [1] Github (r-lib/lifecycle#355dcba)
#> lubridate 1.7.8 2020-04-06 [1] RSPM (R 3.6.3)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.1)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.1)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.1)
#> nlme 3.1-142 2019-11-07 [2] CRAN (R 3.6.2)
#> pillar 1.4.4 2020-05-25 [1] Github (r-lib/pillar#2f5ad11)
#> pkgbuild 1.0.8 2020-05-07 [1] RSPM (R 3.6.3)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 3.6.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2)
#> processx 3.4.2 2020-02-09 [1] RSPM (R 3.6.2)
#> progressr 0.6.0 2020-05-19 [1] CRAN (R 3.6.2)
#> ps 1.3.3 2020-05-08 [1] RSPM (R 3.6.3)
#> purrr 0.3.4 2020-04-17 [1] RSPM (R 3.6.3)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1)
#> Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 3.6.2)
#> remotes 2.1.1 2020-02-15 [1] RSPM (R 3.6.2)
#> rlang 0.4.6.9000 2020-05-20 [1] Github (r-lib/rlang#691b5a8)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.1)
#> scales 1.1.1 2020-05-11 [1] RSPM (R 3.6.3)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.1)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
#> testthat 2.3.2 2020-03-02 [1] RSPM (R 3.6.3)
#> tibble 3.0.1 2020-04-20 [1] RSPM (R 3.6.3)
#> tidyr 1.1.0 2020-05-20 [1] RSPM (R 3.6.3)
#> tidyselect 1.1.0 2020-05-11 [1] RSPM (R 3.6.3)
#> tsibble * 0.9.0 2020-06-02 [1] Github (tidyverts/tsibble#c837e83)
#> urca 1.3-0 2016-09-06 [1] CRAN (R 3.6.1)
#> usethis 1.5.1.9000 2020-01-31 [1] Github (r-lib/usethis#7d8b066)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.1)
#> vctrs 0.3.0.9000 2020-05-28 [1] Github (r-lib/vctrs#373e1ce)
#> withr 2.2.0 2020-04-20 [1] RSPM (R 3.6.3)
#> xfun 0.13 2020-04-13 [1] RSPM (R 3.6.3)
#> yaml 2.2.1 2020-02-01 [1] RSPM (R 3.6.2)
#>
#> [1] /home/mitchell/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /opt/R/3.6.2/lib/R/library

Calculating upper and lower confidence intervals by group in dplyr summarise()

I am trying to make a table that shows N (number of observations), percent frequency (of answers > 0), and the lower and upper confidence intervals for percent frequency, and I want to group this by type.
Example of data
dat <- data.frame(
"type" = c("B","B","A","B","A","A","B","A","A","B","A","A","A","B","B","B"),
"num" = c(3,0,0,9,6,0,4,1,1,5,6,1,3,0,0,0)
)
Expected output (with values filled in):
Type N Percent Lower 95% CI Upper 95% CI
A
B
Attempt
library(dplyr)
library(qwraps2)
table<-dat %>%
group_by(type) %>%
summarise(N=n(),
mean.ci = mean_ci(dat$num),
"Percent"=n_perc(num > 0))
This worked to get N and percent frequency, but returned an error: "Column must be length 1 (a summary value), not 3" when I added in mean_ci
The second code I tried, found here:
table2<-dat %>%
group_by(type) %>%
summarise(N.num=n(),
mean.num = mean(dat$num),
sd.num = sd(dat$num),
"Percent"=n_perc(num > 0)) %>%
mutate(se.num = sd.num / sqrt(N.num),
lower.ci = 100*(mean.num - qt(1 - (0.05 / 2), N.num - 1) * se.num),
upper.ci = 100*(mean.num + qt(1 - (0.05 / 2), N.num - 1) * se.num))
# A tibble: 2 x 8
# type N.num mean.num sd.num Percent se.num lower.ci upper.ci
# <fct> <int> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
#1 A 8 2.44 2.83 "6 (75.00\\%)" 1.00 7.35 480.
#2 B 8 2.44 2.83 "4 (50.00\\%)" 1.00 7.35 480.
This gave me an output, but the confidence intervals are not logical.
The output of mean_ci is a vector of length 3. This may be unexpected because the package adds a print method, so in the console it looks like a single character value rather than a numeric vector of length 3. You can see the underlying data structure with str.
mean_ci(dat$num) %>% str
# 'qwraps2_mean_ci' Named num [1:3] 2.44 1.05 3.82
# - attr(*, "names")= chr [1:3] "mean" "lcl" "ucl"
# - attr(*, "alpha")= num 0.05
In summarize, each element of each output column must have length 1, so supplying a length-3 object for a single "cell" (column element) results in an error. A workaround is to wrap the length-3 vector in a list, which makes it a length-1 list. You can then use unnest_wider to spread it into 3 columns (making the table "wider").
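library(tidyverse)
library(qwraps2)  # provides mean_ci() and n_perc()
dat %>%
  group_by(type) %>%
  summarise(N = n(),
            mean.ci = list(mean_ci(num)),  # use num, not dat$num, so the grouping applies
            "Percent" = n_perc(num > 0)) %>%
  unnest_wider(mean.ci)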
# # A tibble: 2 x 6
# type N mean lcl ucl Percent
# <fct> <int> <dbl> <dbl> <dbl> <chr>
# 1 A 8 2.25 0.523 3.98 "6 (75.00\\%)"
# 2 B 8 2.62 0.344 4.91 "4 (50.00\\%)"
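For comparison, here is a minimal base R sketch of the same per-group summary. Note that it computes the statistics on each group's own values; the second attempt in the question used dat$num inside summarise, which refers to the whole column and is why both groups showed identical numbers. The sketch assumes a normal-quantile interval (qnorm), which appears to match the figures in the output above; it is not qwraps2's implementation, and the pct_positive column name is illustrative.

```r
dat <- data.frame(
  type = c("B","B","A","B","A","A","B","A","A","B","A","A","A","B","B","B"),
  num  = c(3,0,0,9,6,0,4,1,1,5,6,1,3,0,0,0)
)

# Split num by type so every statistic uses only that group's values
groups <- split(dat$num, dat$type)

ci_by_type <- do.call(rbind, lapply(names(groups), function(g) {
  x  <- groups[[g]]
  n  <- length(x)
  se <- sd(x) / sqrt(n)        # standard error of the mean
  z  <- qnorm(1 - 0.05 / 2)    # assumed: normal quantile for a 95% interval
  data.frame(
    type = g,
    N = n,
    mean = mean(x),
    lcl = mean(x) - z * se,
    ucl = mean(x) + z * se,
    pct_positive = 100 * mean(x > 0)
  )
}))

print(ci_by_type)
```

Replacing qnorm(1 - 0.05 / 2) with qt(1 - 0.05 / 2, n - 1) would give the wider t-based interval that the question's second attempt was aiming for.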
IceCreamToucan’s answer is very good. I’m posting this answer to offer a
different way to present the information.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(qwraps2)
dat <- data.frame("type" = c("B","B","A","B","A","A","B","A","A","B","A","A","A","B","B","B"),
"num" = c(3,0,0,9,6,0,4,1,1,5,6,1,3,0,0,0))
When building the dplyr::summarize call you can use the qwraps2::frmtci
call to format the output of qwraps2::mean_ci into a character string of
length one.
I would also recommend using the data pronoun .data so you can be explicit
about the variables to summarize.
dat %>%
dplyr::group_by(type) %>%
dplyr::summarize(N = n(),
mean.ci = qwraps2::frmtci(qwraps2::mean_ci(.data$num)),
Percent = qwraps2::n_perc(.data$num > 0))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 4
#> type N mean.ci Percent
#> <chr> <int> <chr> <chr>
#> 1 A 8 2.25 (0.52, 3.98) "6 (75.00\\%)"
#> 2 B 8 2.62 (0.34, 4.91) "4 (50.00\\%)"
Created on 2020-09-15 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Catalina 10.15.6
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Denver
#> date 2020-09-15
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.9 2020-08-24 [1] CRAN (R 4.0.2)
#> callr 3.4.4 2020-09-07 [1] CRAN (R 4.0.2)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.1 2020-07-21 [1] CRAN (R 4.0.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> dplyr * 1.0.2 2020-08-18 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.0)
#> knitr 1.29 2020-06-23 [1] CRAN (R 4.0.0)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> pillar 1.4.6 2020-07-10 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2)
#> ps 1.3.4 2020-08-11 [1] CRAN (R 4.0.2)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
#> qwraps2 * 0.5.0 2020-09-14 [1] local
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.0)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
#> rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> tibble 3.0.3 2020-07-10 [1] CRAN (R 4.0.2)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.0)
#> vctrs 0.3.4 2020-08-29 [1] CRAN (R 4.0.2)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.17 2020-09-09 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Group output from naniar using dplyr, nesting/unnesting, compatibility with newer version of R

I had a piece of R script running to get an overview on missing values in a repeated measures data frame. I used naniar and dplyr from the tidyverse and it worked fine. I used the combination to group the output by different factors (e.g. study, day, participant,...):
miss_trigger <- data_mlm_npu_filter[,c("Trigger_counter", "stadi_AU")] %>%
group_by(Trigger_counter) %>%
miss_var_summary()
Now, some months later, I first got the warning message
#Warning message:
# `cols` is now required.
#Please use `cols = c(data)`
After searching for the warning message, I found that something has changed with nesting/unnesting, but this did not help me fix the warning or work out what changes to apply to my code.
And now after updating R to 3.6.2, I am just getting:
Error in group_by_fun(data, .fun = miss_var_summary()) :
could not find function "group_by_fun"
The miss_var_summary function itself works without problems. So, I would really just like to group my output from naniar as before. What do I have to do? Apparently I am missing some key information or understanding of the packages I am using that I would need to fix this myself.
This was a bug introduced by a new version of tidyr; this should now work:
library(naniar)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
oceanbuoys %>%
group_by(year) %>%
miss_var_summary()
#> # A tibble: 14 x 4
#> # Groups: year [2]
#> year variable n_miss pct_miss
#> <dbl> <chr> <int> <dbl>
#> 1 1997 air_temp_c 77 20.9
#> 2 1997 latitude 0 0
#> 3 1997 longitude 0 0
#> 4 1997 sea_temp_c 0 0
#> 5 1997 humidity 0 0
#> 6 1997 wind_ew 0 0
#> 7 1997 wind_ns 0 0
#> 8 1993 humidity 93 25.3
#> 9 1993 air_temp_c 4 1.09
#> 10 1993 sea_temp_c 3 0.815
#> 11 1993 latitude 0 0
#> 12 1993 longitude 0 0
#> 13 1993 wind_ew 0 0
#> 14 1993 wind_ns 0 0
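To make clear what the grouped summary computes, here is a rough base R sketch of a per-group missing-value count. It is not naniar's implementation: miss_by_group is a hypothetical helper written for this illustration, the built-in airquality data stands in for oceanbuoys, and unlike the output above the rows are not sorted by n_miss.

```r
# Hypothetical helper: count and percentage of NAs per variable within each group
miss_by_group <- function(data, group) {
  vars <- setdiff(names(data), group)
  pieces <- lapply(split(data, data[[group]]), function(d) {
    data.frame(
      group = d[[group]][1],
      variable = vars,
      n_miss = vapply(vars, function(v) sum(is.na(d[[v]])), integer(1)),
      pct_miss = vapply(vars, function(v) 100 * mean(is.na(d[[v]])), numeric(1)),
      row.names = NULL
    )
  })
  do.call(rbind, pieces)
}

# airquality ships with R; Month plays the role of the grouping variable
res <- miss_by_group(airquality, "Month")
print(head(res, 10))
```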
Created on 2020-05-14 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.0 (2020-04-24)
#> os macOS Mojave 10.14.6
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-05-14
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.6 2020-04-05 [1] CRAN (R 4.0.0)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.0)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> colorspace 1.4-2 2020-02-27 [1] R-Forge (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.0 2020-04-10 [1] CRAN (R 4.0.0)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> dplyr * 0.8.99.9002 2020-05-04 [1] Github (tidyverse/dplyr#8710f8a)
#> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.4.1 2020-04-04 [1] CRAN (R 4.0.0)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0)
#> ggplot2 3.3.0 2020-03-05 [1] CRAN (R 4.0.0)
#> glue 1.4.0 2020-04-03 [1] CRAN (R 4.0.0)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.0)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.4.0 2019-10-04 [1] CRAN (R 4.0.0)
#> knitr 1.28 2020-02-06 [1] CRAN (R 4.0.0)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.0)
#> naniar * 0.5.1 2020-04-30 [1] CRAN (R 4.0.0)
#> pillar 1.4.4 2020-05-05 [1] CRAN (R 4.0.0)
#> pkgbuild 1.0.8 2020-05-07 [1] CRAN (R 4.0.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> pkgload 1.0.2 2018-10-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.2 2020-02-09 [1] CRAN (R 4.0.0)
#> ps 1.3.3 2020-05-08 [1] CRAN (R 4.0.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.4.6 2020-04-09 [1] CRAN (R 4.0.0)
#> remotes 2.1.1 2020-02-15 [1] CRAN (R 4.0.0)
#> rlang 0.4.6 2020-05-02 [1] CRAN (R 4.0.0)
#> rmarkdown 2.1 2020-01-20 [1] CRAN (R 4.0.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> tibble 3.0.1 2020-04-20 [1] CRAN (R 4.0.0)
#> tidyr 1.0.3 2020-05-07 [1] CRAN (R 4.0.0)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.0)
#> vctrs 0.2.99.9011 2020-05-04 [1] Github (r-lib/vctrs#0ca806c)
#> visdat 0.5.3 2019-02-15 [1] CRAN (R 4.0.0)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.13 2020-04-13 [1] CRAN (R 4.0.0)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Resources