The output of the qwraps2 code is not as expected - r

I am trying to get the summary table output for one of the datasets, but the output is not in the form of a tidy table:
options(qwraps2_markup = "markdown")
age_summary <- list("Age" =
  list("Min" = ~min(.data$Age),
       "Max" = ~max(.data$Age),
       "Mean" = ~mean_sd(.data$Age)))
age_tab <- summary_table(insurance, age_summary)
age_tab
When I knit the R Markdown file, the summary table looks like the raw output shown in the Console rather than the expected formatted summary table.

The object generated by qwraps2::summary_table is a character matrix with
the class attribute qwraps2_summary_table. The
qwraps2:::print.qwraps2_summary_table and qwraps2:::print.qable methods
are responsible for the way the table is presented in the output. Chunk
options will be responsible for how the table is rendered in the output
document.
Update: as of qwraps2 version 0.5.0 the use of the .data pronoun is no
longer needed or recommended.
options(qwraps2_markup = "markdown")
library(qwraps2)
eg_data <- data.frame(Age = rnorm(1000, mean = 54, sd = 10))
age_summary <- list("Age" =
  list(
    "Min" = ~ min(Age),
    "Max" = ~ max(Age),
    "Mean" = ~ mean_sd(Age)
  )
)
age_table <- summary_table(eg_data, age_summary)
Take a look at the structure of age_table
str(age_table)
#> 'qwraps2_summary_table' chr [1:3, 1] "25.1137149669314" "83.5664804448924" ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:3] "Min" "Max" "Mean"
#> ..$ : chr "eg_data (N = 1,000)"
#> - attr(*, "rgroups")= Named int 3
#> ..- attr(*, "names")= chr "Age"
#> - attr(*, "n")= int 1000
As noted above, the object is a 3 x 1 character matrix of class
qwraps2_summary_table. To see the return in the R console:
print.default(age_table)
#> eg_data (N = 1,000)
#> Min "25.1137149669314"
#> Max "83.5664804448924"
#> Mean "53.75 ± 9.94"
#> attr(,"rgroups")
#> Age
#> 3
#> attr(,"n")
#> [1] 1000
#> attr(,"class")
#> [1] "qwraps2_summary_table" "matrix" "array"
Since options(qwraps2_markup = "markdown") has been set, the printing
method will return a markdown table:
age_table
#>
#>
#> | |eg_data (N = 1,000) |
#> |:-----------------|:-------------------|
#> |**Age** | |
#> | Min |25.1137149669314 |
#> | Max |83.5664804448924 |
#> | Mean |53.75 ± 9.94 |
Make sure you have the results = "asis" chunk option set in your .Rmd file
so that the table will render correctly in your output document.
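As a minimal sketch of that chunk option, the following R script writes a small .Rmd file whose summary-table chunk uses results = "asis". The file name, title, and output format are placeholders chosen for illustration, and the render call is left commented out because it assumes the rmarkdown and qwraps2 packages are installed.

```r
# Three backticks, built programmatically so this example is easy to embed.
fence <- strrep("`", 3)

# Lines of a minimal R Markdown document (hypothetical content; adjust as needed).
rmd_lines <- c(
  "---",
  "title: \"Summary table demo\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r setup, include = FALSE}"),
  "library(qwraps2)",
  "options(qwraps2_markup = \"markdown\")",
  fence,
  "",
  # results = "asis" passes the markdown table through to the document untouched.
  paste0(fence, "{r age-table, results = \"asis\"}"),
  "eg_data <- data.frame(Age = rnorm(1000, mean = 54, sd = 10))",
  "age_summary <- list(\"Age\" = list(",
  "  \"Min\"  = ~ min(Age),",
  "  \"Max\"  = ~ max(Age),",
  "  \"Mean\" = ~ mean_sd(Age)",
  "))",
  "summary_table(eg_data, age_summary)",
  fence
)

# Write the document to a temporary location.
rmd_path <- file.path(tempdir(), "summary_table_demo.Rmd")
writeLines(rmd_lines, rmd_path)

# Uncomment to knit (assumes rmarkdown and qwraps2 are installed):
# rmarkdown::render(rmd_path)
```

Without results = "asis", knitr wraps the printed text in a code block, which would explain a table that looks like Console output.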
Created on 2020-09-14 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Catalina 10.15.6
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Denver
#> date 2020-09-14
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.9 2020-08-24 [1] CRAN (R 4.0.2)
#> callr 3.4.4 2020-09-07 [1] CRAN (R 4.0.2)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.1 2020-07-21 [1] CRAN (R 4.0.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.0)
#> knitr 1.29 2020-06-23 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2)
#> ps 1.3.4 2020-08-11 [1] CRAN (R 4.0.2)
#> qwraps2 * 0.5.0 2020-09-14 [1] local
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.0)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
#> rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.17 2020-09-09 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Error with tune_bayes() about Gaussian Process being fit on no data

Description
I am trying to use tune_bayes() to optimize two parameters in a decision tree, but the function always crashes with the following error:
#> i Gaussian process model
#> ! The Gaussian process model is being fit using 2 features but only has 0
#> data points to do so. This may cause errors or a poor model fit.
#> ! Gaussian process model: no non-missing arguments to min; returning Inf, ...
#> x Gaussian process model: Error in seq_len(n - 1L): argument must be coerc...
#> ! An error occurred when creating candidates parameters: Error in seq_len(n - 1L) :
#> argument must be coercible to non-negative integer
#> Error: `best` should be a single, non-missing numeric
With tune_grid() the recipes and all work just fine but for some reason tune_bayes() does not want to run. I tried to update the tuneable parameters to some other range using update(...) but this did not help either. I do not understand why there seems to be no available data.
Reprex
library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
titanic <- read_csv("train.csv")
#>
#> -- Column specification --------------------------------------------------------
#> cols(
#> PassengerId = col_double(),
#> Survived = col_double(),
#> Pclass = col_double(),
#> Name = col_character(),
#> Sex = col_character(),
#> Age = col_double(),
#> SibSp = col_double(),
#> Parch = col_double(),
#> Ticket = col_character(),
#> Fare = col_double(),
#> Cabin = col_character(),
#> Embarked = col_character()
#> )
titanic <- titanic %>%
  mutate(
    Survived = factor(
      Survived,
      levels = c(0, 1),
      labels = c("Deceased", "Survived")
    )
  )
set.seed(123)
titanic_split <- initial_split(titanic, strata = Survived)
titanic_train <- training(titanic_split)
titanic_test <- testing(titanic_split)
set.seed(456)
titanic_folds <- vfold_cv(titanic_train, strata = Survived)
titanic_rec <- recipe(formula = Survived ~ ., data = titanic_train) %>%
  update_role(PassengerId, new_role = "Id") %>%
  step_mutate(Cabin = if_else(is.na(Cabin), "Missing", "Available")) %>%
  step_impute_median(Age) %>%
  step_mutate(
    title = str_match(Name, ", ([:alpha:]+)\\."),
    title = if_else(is.na(title[, 2]), "NA", title[, 2])
  ) %>%
  step_other(title, threshold = 0.02, other = "Other") %>%
  step_rm(Ticket, Embarked) %>%
  update_role(Name, new_role = "Id") %>%
  step_string2factor(all_nominal_predictors())
titanic_spec <-
  decision_tree(mode = "classification", tree_depth = tune(), min_n = tune()) %>%
  set_mode("classification") %>%
  set_engine("rpart")
titanic_wf <- workflow() %>%
  add_recipe(titanic_rec) %>%
  add_model(titanic_spec)
params <- parameters(titanic_wf) %>%
  finalize()
options(tidymodels.dark = TRUE)
#doParallel::registerDoParallel(cores = 6)
set.seed(14834)
titanic_tune <- tune_bayes(
  object = titanic_wf,
  resamples = titanic_folds,
  param_info = params,
  iter = 30,
  initial = 5,
  metrics = metric_set(sensitivity, specificity, mcc, roc_auc),
  control = control_bayes(
    verbose = TRUE,
    no_improve = 5,
    save_pred = TRUE,
    pkgs = "tidyverse"
  )
)
#>
#> > Generating a set of 5 initial parameter results
#> v Initialization complete
#>
#> Optimizing sensitivity using the expected improvement
#>
#> -- Iteration 1 -----------------------------------------------------------------
#>
#> i Current best: sensitivity=NA (#iter NA)
#> i Gaussian process model
#> ! The Gaussian process model is being fit using 2 features but only has 0
#> data points to do so. This may cause errors or a poor model fit.
#> ! Gaussian process model: no non-missing arguments to min; returning Inf, ...
#> x Gaussian process model: Error in seq_len(n - 1L): argument must be coerc...
#> ! An error occurred when creating candidates parameters: Error in seq_len(n - 1L) :
#> argument must be coercible to non-negative integer
#> Error: `best` should be a single, non-missing numeric
#> x Optimization stopped prematurely; returning current results.
Created on 2021-07-05 by the reprex package (v2.0.0)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.0.5 (2021-03-31)
#> os Windows 10 x64
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2021-07-05
#>
#> - Packages -------------------------------------------------------------------
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.3)
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.3)
#> broom * 0.7.6 2021-04-05 [1] CRAN (R 4.0.5)
#> cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.0.3)
#> class 7.3-18 2021-01-24 [2] CRAN (R 4.0.5)
#> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.3)
#> codetools 0.2-18 2020-11-04 [2] CRAN (R 4.0.5)
#> colorspace 2.0-1 2021-05-04 [1] CRAN (R 4.0.5)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.4)
#> DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.3)
#> dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.0.5)
#> dials * 0.0.9 2020-09-16 [1] CRAN (R 4.0.4)
#> DiceDesign 1.9 2021-02-13 [1] CRAN (R 4.0.4)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
#> dplyr * 1.0.6 2021-05-05 [1] CRAN (R 4.0.5)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.3)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.3)
#> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.0.5)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.5)
#> forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.0.5)
#> foreach 1.5.1 2020-10-15 [1] CRAN (R 4.0.3)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.3)
#> furrr 0.2.2 2021-01-29 [1] CRAN (R 4.0.5)
#> future 1.21.0 2020-12-10 [1] CRAN (R 4.0.3)
#> generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.3)
#> ggplot2 * 3.3.3 2020-12-30 [1] CRAN (R 4.0.4)
#> globals 0.14.0 2020-11-22 [1] CRAN (R 4.0.3)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.3)
#> gower 0.2.2 2020-06-23 [1] CRAN (R 4.0.3)
#> GPfit 1.0-8 2019-02-08 [1] CRAN (R 4.0.4)
#> gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.3)
#> hardhat 0.1.5 2020-11-09 [1] CRAN (R 4.0.4)
#> haven 2.3.1 2020-06-01 [1] CRAN (R 4.0.3)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.0.5)
#> hms 1.0.0 2021-01-13 [1] CRAN (R 4.0.5)
#> htmltools 0.5.1.9005 2021-07-01 [1] Github (rstudio/htmltools#7fbab16)
#> httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.5)
#> infer * 0.5.4.9000 2021-03-27 [1] Github (tidymodels/infer#66d24a0)
#> ipred 0.9-11 2021-03-12 [1] CRAN (R 4.0.4)
#> iterators 1.0.13 2020-10-15 [1] CRAN (R 4.0.3)
#> jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.3)
#> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.5)
#> lattice 0.20-41 2020-04-02 [2] CRAN (R 4.0.5)
#> lava 1.6.9 2021-03-11 [1] CRAN (R 4.0.4)
#> lhs 1.1.1 2020-10-05 [1] CRAN (R 4.0.4)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.4)
#> listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.3)
#> lubridate 1.7.10 2021-02-26 [1] CRAN (R 4.0.4)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.3)
#> MASS 7.3-53.1 2021-02-12 [2] CRAN (R 4.0.5)
#> Matrix 1.3-2 2021-01-06 [2] CRAN (R 4.0.5)
#> modeldata * 0.1.0 2020-10-22 [1] CRAN (R 4.0.4)
#> modelr 0.1.8 2020-05-19 [1] CRAN (R 4.0.3)
#> munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.3)
#> nnet 7.3-15 2021-01-24 [2] CRAN (R 4.0.5)
#> parallelly 1.25.0 2021-04-30 [1] CRAN (R 4.0.5)
#> parsnip * 0.1.5.9003 2021-05-22 [1] Github (tidymodels/parsnip#46a2018)
#> pillar 1.6.1 2021-05-16 [1] CRAN (R 4.0.5)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.3)
#> plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.3)
#> pROC 1.17.0.1 2021-01-13 [1] CRAN (R 4.0.4)
#> prodlim 2019.11.13 2019-11-17 [1] CRAN (R 4.0.4)
#> ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.5)
#> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.0.3)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
#> Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.4)
#> readr * 1.4.0 2020-10-05 [1] CRAN (R 4.0.5)
#> readxl 1.3.1 2019-03-13 [1] CRAN (R 4.0.3)
#> recipes * 0.1.16.9000 2021-05-29 [1] Github (tidymodels/recipes#0806713)
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.5)
#> rlang * 0.4.11.9000 2021-07-01 [1] Github (r-lib/rlang#dc03e44)
#> rmarkdown 2.9.1 2021-07-01 [1] Github (rstudio/rmarkdown#1ea3575)
#> rpart * 4.1-15 2019-04-12 [1] CRAN (R 4.0.5)
#> rsample * 0.1.0 2021-05-08 [1] CRAN (R 4.0.5)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.3)
#> rvest 1.0.0 2021-03-09 [1] CRAN (R 4.0.5)
#> scales * 1.1.1 2020-05-11 [1] CRAN (R 4.0.3)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.3)
#> stringi 1.6.2 2021-05-17 [1] CRAN (R 4.0.5)
#> stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.0.3)
#> survival 3.2-10 2021-03-16 [2] CRAN (R 4.0.5)
#> tibble * 3.1.2 2021-05-16 [1] CRAN (R 4.0.5)
#> tidymodels * 0.1.3 2021-04-19 [1] CRAN (R 4.0.5)
#> tidyr * 1.1.3 2021-03-03 [1] CRAN (R 4.0.5)
#> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.3)
#> tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.0.5)
#> timeDate 3043.102 2018-02-21 [1] CRAN (R 4.0.3)
#> tune * 0.1.5.9000 2021-05-22 [1] Github (tidymodels/tune#b0e83a7)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.3)
#> vctrs * 0.3.8 2021-04-29 [1] CRAN (R 4.0.3)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.5)
#> workflows * 0.2.2 2021-03-10 [1] CRAN (R 4.0.4)
#> workflowsets * 0.0.2 2021-04-16 [1] CRAN (R 4.0.5)
#> xfun 0.24 2021-06-15 [1] CRAN (R 4.0.5)
#> xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.5)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.3)
#> yardstick * 0.0.8 2021-03-28 [1] CRAN (R 4.0.5)
#>
#> [1] C:/Users/Albert/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.5/library

strange behavior using rbind with data.table (>= 1.13.0) in combination with data.frame

Trying to rbind a data.table containing an IDate (result of fread) to a data.frame containing a character converts the IDate to its internal integer representation. Probably this is by design, but if not it's a bug. fread supports IDate since data.table 1.13.0 (see https://github.com/Rdatatable/data.table/blob/master/NEWS.md).
The example below shows that the data.table method of rbind can deal with it correctly (throw an error), but the data.frame method of rbind does not.
I don't know how and where this can/should be fixed.
library(data.table)
df1 <- data.frame(date = "2020-11-05")
dt1 <- data.table(date = "2020-11-05")
dt2 <- fread("date\n2020-11-05")
rbind(dt1, dt2) # ok -- throws error: rbind.data.table
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 1 of item 2 does not match with column 1 of item 1.
## not ok -- converts int representation of IDate to character: rbind.data.frame
rbind(df1, dt2)
#> date
#> 1 2020-11-05
#> 2 18571
## the other way round: ok -- throws an error: rbind.data.table
rbind(dt2, df1)
#> Error in rbindlist(l, use.names, fill, idcol): Class attribute on column 1 of item 2 does not match with column 1 of item 1.
### solution
dt3 <- fread("date\n2020-11-05", colClasses = "character")
rbind(dt1, dt3)
#> date
#> 1: 2020-11-05
#> 2: 2020-11-05
Created on 2020-11-05 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.3 (2020-10-10)
#> os Debian GNU/Linux 10 (buster)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate de_AT.UTF-8
#> ctype de_AT.UTF-8
#> tz Europe/Vienna
#> date 2020-11-05
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> backports 1.2.0 2020-11-02 [1] CRAN (R 4.0.3)
#> callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.3)
#> cli 2.1.0 2020-10-12 [1] CRAN (R 4.0.3)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> data.table * 1.13.2 2020-10-19 [1] CRAN (R 4.0.3)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.2 2020-09-18 [1] CRAN (R 4.0.3)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.3)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.2)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.3)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
#> knitr 1.30 2020-09-22 [1] CRAN (R 4.0.3)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.3)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.3)
#> ps 1.4.0 2020-10-07 [1] CRAN (R 4.0.3)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.3)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.3)
#> rlang 0.4.8 2020-10-08 [1] CRAN (R 4.0.3)
#> rmarkdown 2.5 2020-10-21 [1] CRAN (R 4.0.3)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.3)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 3.0.0 2020-10-31 [1] CRAN (R 4.0.3)
#> usethis 1.6.3 2020-09-17 [1] CRAN (R 4.0.3)
#> withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.3)
#> xfun 0.19 2020-10-30 [1] CRAN (R 4.0.3)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /usr/local/lib/R/site-library
#> [2] /usr/lib/R/site-library
#> [3] /usr/lib/R/library
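
One way to sidestep the silent coercion reported above is to make the column classes agree explicitly before binding, so the result no longer depends on which rbind method is dispatched. The sketch below assumes data.table >= 1.13.0 (so that fread returns an IDate column); the object names are illustrative only.

```r
library(data.table)

df1 <- data.frame(date = "2020-11-05", stringsAsFactors = FALSE)
dt2 <- fread("date\n2020-11-05")  # date is read as IDate in data.table >= 1.13.0

# Option 1: convert the IDate column to character, then bind two data.tables.
dt2_chr <- copy(dt2)[, date := as.character(date)]
combined_chr <- rbind(as.data.table(df1), dt2_chr)

# Option 2: keep real dates by converting the character column to IDate instead.
dt1_date <- as.data.table(df1)[, date := as.IDate(date)]
combined_date <- rbind(dt1_date, dt2)

combined_chr
combined_date
```

Converting both inputs to data.table first also means a remaining class mismatch raises an error instead of passing silently.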

Column not being recognised as variable in R [duplicate]

This question already has answers here:
Convert row names into first column
(9 answers)
Closed 2 years ago.
Hi,
I just transposed a large data set and realised that the first column doesn't have a column name. I have included an extract of the dataset. I tried names(df)[1] <- "Year", but it changed the variable name of the second column instead of the first. Is there a way to give the first column a variable name?
df <- structure(list(Construction = c("3209.4", "3307.0", "3519.3", "3693.0",
"3545.1", "3620.2"), Manufacturing = c(" 654.9", " 692.9", " 785.1",
" 810.1", " 744.8", " 793.6")), row.names = c("1975 1Q", "1975 2Q",
"1975 3Q", "1975 4Q", "1976 1Q", "1976 2Q"), class = "data.frame")
df
#> Construction Manufacturing
#> 1975 1Q 3209.4 654.9
#> 1975 2Q 3307.0 692.9
#> 1975 3Q 3519.3 785.1
#> 1975 4Q 3693.0 810.1
#> 1976 1Q 3545.1 744.8
#> 1976 2Q 3620.2 793.6
Created on 2020-09-03 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Catalina 10.15.5
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Melbourne
#> date 2020-09-03
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> backports 1.1.9 2020-08-24 [1] CRAN (R 4.0.2)
#> callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.2)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.2)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
#> devtools 2.3.1 2020-07-21 [1] CRAN (R 4.0.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.2)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
#> knitr 1.29 2020-06-23 [1] CRAN (R 4.0.2)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.4.3 2020-07-05 [1] CRAN (R 4.0.2)
#> ps 1.3.4 2020-08-11 [1] CRAN (R 4.0.2)
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.2)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
#> rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.2)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.2)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.2)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.2)
#> xfun 0.16 2020-07-24 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
What looks like the first column is the row.names attribute and not a column, which is why names(df)[1] renamed Construction instead. To create a column from the row names, use rownames_to_column from tibble:
library(tibble)
library(dplyr)
df <- df %>%
rownames_to_column('Year')
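
For comparison, the same result can be sketched in base R without extra packages. The example below uses a shortened copy of the question's data; the name df_year is only illustrative.

```r
# Shortened version of the data frame from the question.
df <- data.frame(
  Construction  = c("3209.4", "3307.0", "3519.3"),
  Manufacturing = c(" 654.9", " 692.9", " 785.1"),
  row.names     = c("1975 1Q", "1975 2Q", "1975 3Q"),
  stringsAsFactors = FALSE
)

# names(df) lists only real columns; the quarter labels are not among them.
names(df)

# Build a new data frame with the row names as a proper first column.
# row.names = NULL resets the row names to 1, 2, 3, ...
df_year <- data.frame(Year = rownames(df), df,
                      row.names = NULL, stringsAsFactors = FALSE)
names(df_year)
```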

Calculating upper and lower confidence intervals by group in dplyr summarise()

I am trying to make a table that shows N (number of observations), percent frequency (of answers > 0), and the lower and upper confidence intervals for percent frequency, and I want to group this by type.
Example of data
dat <- data.frame(
  "type" = c("B","B","A","B","A","A","B","A","A","B","A","A","A","B","B","B"),
  "num" = c(3,0,0,9,6,0,4,1,1,5,6,1,3,0,0,0)
)
Expected output (with values filled in):
Type N Percent Lower 95% CI Upper 95% CI
A
B
Attempt
library(dplyr)
library(qwraps2)
table<-dat %>%
  group_by(type) %>%
  summarise(N=n(),
            mean.ci = mean_ci(dat$num),
            "Percent"=n_perc(num > 0))
This worked to get N and percent frequency, but returned an error: "Column must be length 1 (a summary value), not 3" when I added in mean_ci.
The second code I tried, found here:
table2<-dat %>%
  group_by(type) %>%
  summarise(N.num=n(),
            mean.num = mean(dat$num),
            sd.num = sd(dat$num),
            "Percent"=n_perc(num > 0)) %>%
  mutate(se.num = sd.num / sqrt(N.num),
         lower.ci = 100*(mean.num - qt(1 - (0.05 / 2), N.num - 1) * se.num),
         upper.ci = 100*(mean.num + qt(1 - (0.05 / 2), N.num - 1) * se.num))
# A tibble: 2 x 8
# type N.num mean.num sd.num Percent se.num lower.ci upper.ci
# <fct> <int> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
#1 A 8 2.44 2.83 "6 (75.00\\%)" 1.00 7.35 480.
#2 B 8 2.44 2.83 "4 (50.00\\%)" 1.00 7.35 480.
This gave me an output, but the confidence intervals are not logical.
The output of mean_ci is a vector of length 3. This may be unexpected because the package provides a print method, so in the console the result looks like a single character value rather than a numeric vector of length 3. You can see the underlying data structure with str.
mean_ci(dat$num) %>% str
# 'qwraps2_mean_ci' Named num [1:3] 2.44 1.05 3.82
# - attr(*, "names")= chr [1:3] "mean" "lcl" "ucl"
# - attr(*, "alpha")= num 0.05
In summarize, each element of each output column must have length 1, so supplying a length-3 object for a single "cell" (column element) results in an error. A workaround is to wrap the length-3 vector in a list, which makes it a length-1 list. You can then use unnest_wider to separate it into 3 columns (making the table "wider").
library(tidyverse)
dat %>%
  group_by(type) %>%
  summarise(N = n(),
            mean.ci = list(mean_ci(num)),
            "Percent" = n_perc(num > 0)) %>%
  unnest_wider(mean.ci)
# # A tibble: 2 x 6
# type N mean lcl ucl Percent
# <fct> <int> <dbl> <dbl> <dbl> <chr>
# 1 A 8 2.25 0.523 3.98 "6 (75.00\\%)"
# 2 B 8 2.62 0.344 4.91 "4 (50.00\\%)"
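As a side note on the second attempt in the question: both rows came out identical because dat$num refers to the whole column and ignores the grouping, whereas a bare num is evaluated within each group. The sketch below repeats that attempt with bare column names and without the 100* multiplier. It uses t quantiles as the question did, so the limits need not match the mean_ci output above; ci_by_type is an illustrative name.

```r
library(dplyr)

dat <- data.frame(
  type = c("B","B","A","B","A","A","B","A","A","B","A","A","A","B","B","B"),
  num  = c(3,0,0,9,6,0,4,1,1,5,6,1,3,0,0,0)
)

ci_by_type <- dat %>%
  group_by(type) %>%
  summarise(
    N        = n(),
    mean.num = mean(num),  # bare column name: computed per group
    sd.num   = sd(num)
  ) %>%
  mutate(
    se.num   = sd.num / sqrt(N),
    lower.ci = mean.num - qt(1 - 0.05 / 2, N - 1) * se.num,
    upper.ci = mean.num + qt(1 - 0.05 / 2, N - 1) * se.num
  )

ci_by_type
```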
IceCreamToucan’s answer is very good. I’m posting this answer to offer a
different way to present the information.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(qwraps2)
dat <- data.frame("type" = c("B","B","A","B","A","A","B","A","A","B","A","A","A","B","B","B"),
                  "num" = c(3,0,0,9,6,0,4,1,1,5,6,1,3,0,0,0))
When building the dplyr::summarize call you can use the qwraps2::frmtci
call to format the output of qwraps2::mean_ci into a character string of
length one.
I would also recommend using the data pronoun .data so you can be explicit
about the variables to summarize.
dat %>%
  dplyr::group_by(type) %>%
  dplyr::summarize(N = n(),
                   mean.ci = qwraps2::frmtci(qwraps2::mean_ci(.data$num)),
                   Percent = qwraps2::n_perc(.data$num > 0))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 4
#> type N mean.ci Percent
#> <chr> <int> <chr> <chr>
#> 1 A 8 2.25 (0.52, 3.98) "6 (75.00\\%)"
#> 2 B 8 2.62 (0.34, 4.91) "4 (50.00\\%)"
Created on 2020-09-15 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Catalina 10.15.6
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Denver
#> date 2020-09-15
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.9 2020-08-24 [1] CRAN (R 4.0.2)
#> callr 3.4.4 2020-09-07 [1] CRAN (R 4.0.2)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.1 2020-07-21 [1] CRAN (R 4.0.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> dplyr * 1.0.2 2020-08-18 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.0)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.0)
#> knitr 1.29 2020-06-23 [1] CRAN (R 4.0.0)
#> lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> pillar 1.4.6 2020-07-10 [1] CRAN (R 4.0.2)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2)
#> ps 1.3.4 2020-08-11 [1] CRAN (R 4.0.2)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
#> qwraps2 * 0.5.0 2020-09-14 [1] local
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.0)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
#> rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> tibble 3.0.3 2020-07-10 [1] CRAN (R 4.0.2)
#> tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.0)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
#> utf8 1.1.4 2018-05-24 [1] CRAN (R 4.0.0)
#> vctrs 0.3.4 2020-08-29 [1] CRAN (R 4.0.2)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.17 2020-09-09 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

Table R Markdown qwraps2 and knitr::kable. Column missing

I have the following dummy data.frame
set.seed(12345)
df<-data.frame(var1=floor(runif(10,1000000,5000000)), group=rep(c("A","B"),5), event=rep(c("Yes","No"),5))
I would like to create a summary table of it. I tried to use qwraps2 as follows:
summary<-list("VAlue1" =
                list("min" = ~ min(.data$var1),
                     "max" = ~ max(.data$var1),
                     "mean (sd)" = ~ qwraps2::mean_sd(.data$var1)),
              "Group" =
                list("Yes" = ~ qwraps2::n_perc0(.data$group == "A"),
                     "No" = ~ qwraps2::n_perc0(.data$group == "B")))
knitr::kable(
  qwraps2::summary_table(dplyr::group_by(df, event),summary )
)
The output is unfortunately missing the variable to look at:
| |event: No (N = 5) |event: Yes (N = 5) |
|:---------|:--------------------------------|:----------------------------------|
|min |2591303 |1315253 |
|max |4232714 |4711820 |
|mean (sd) |3,456,579.40 ± 672,665.35 |3,029,844.00 ± 1,572,709.32 |
|Yes |0 (0) |5 (100) |
|No |5 (100) |0 (0) |
How do I incorporate the category "Value1" and "Group"?
Thank you!
suggestions for other packages are welcome, too.
The object returned by summary_table is a character matrix with the
additional S3 class qwraps2_summary_table. The row group names Value1 and
Group are not part of the character matrix itself; they are stored as
attributes. The print method for the qwraps2_summary_table object builds
the table as needed for the appropriate markup language, LaTeX or markdown.
Two edits to the posted example will give you the table you are looking for:
1. Add options(qwraps2_markup = "markdown") to your script. The default
markup language is LaTeX; setting this option changes the default to
markdown.
2. Do not wrap summary_table inside of knitr::kable: this prevents the
needed print method from being called.
options(qwraps2_markup = "markdown")
set.seed(12345)
df <- data.frame(var1 = floor(runif(10,1000000,5000000)),
                 group = rep(c("A","B"),5),
                 event = rep(c("Yes","No"),5))
summary <- list("Value1" =
                  list("min" = ~ min(var1),
                       "max" = ~ max(var1),
                       "mean (sd)" = ~ qwraps2::mean_sd(var1)),
                "Group" =
                  list("Yes" = ~ qwraps2::n_perc0(group == "A"),
                       "No" = ~ qwraps2::n_perc0(group == "B")))
tab <- qwraps2::summary_table(df, summaries = summary, by = "event")
str(tab)
#> 'qwraps2_summary_table' chr [1:5, 1:2] "1665487" "4958947" ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : chr [1:5] "min" "max" "mean (sd)" "Yes" ...
#> ..$ : chr [1:2] "No (N = 5)" "Yes (N = 5)"
#> - attr(*, "rgroups")= Named int [1:2] 3 2
#> ..- attr(*, "names")= chr [1:2] "Value1" "Group"
tab
#>
#>
#> | |No (N = 5) |Yes (N = 5) |
#> |:----------------------|:----------------------------------|:--------------------------------|
#> |**Value1** | | |
#> | min |1665487 |2300381 |
#> | max |4958947 |4043929 |
#> | mean (sd) |3,741,784.20 ± 1,370,520.00 |3,392,933.80 ± 782,295.15 |
#> |**Group** | | |
#> | Yes |0 (0) |5 (100) |
#> | No |5 (100) |0 (0) |
Created on 2020-09-15 by the reprex package (v0.3.0)
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Catalina 10.15.6
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Denver
#> date 2020-09-15
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
#> backports 1.1.9 2020-08-24 [1] CRAN (R 4.0.2)
#> callr 3.4.4 2020-09-07 [1] CRAN (R 4.0.2)
#> cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.0)
#> crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.0)
#> desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.0)
#> devtools 2.3.1 2020-07-21 [1] CRAN (R 4.0.2)
#> digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.0)
#> ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
#> fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.0)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [1] CRAN (R 4.0.0)
#> htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.0)
#> knitr 1.29 2020-06-23 [1] CRAN (R 4.0.0)
#> magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.0)
#> memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.0)
#> pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
#> pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.0)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
#> processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2)
#> ps 1.3.4 2020-08-11 [1] CRAN (R 4.0.2)
#> qwraps2 0.5.0 2020-09-14 [1] local
#> R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.0)
#> Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.0)
#> remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
#> rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
#> rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.0)
#> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.0)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.0)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
#> testthat 2.3.2 2020-03-02 [1] CRAN (R 4.0.0)
#> usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.0)
#> withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.0)
#> xfun 0.17 2020-09-09 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
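
To illustrate why the row group labels disappear when something other than the class-specific print method formats the table, here is a toy sketch in base R. The class toy_summary_table and its print method are made up for this illustration and are not qwraps2's implementation. They only mimic the idea of a character matrix that carries its group labels in an attribute.

```r
# Toy stand-in for a grouped summary table (hypothetical class name).
tab <- structure(
  matrix(c("1", "9", "5", "0"),
         ncol = 1,
         dimnames = list(c("min", "max", "Yes", "No"), "value")),
  rgroups = c(Value1 = 2L, Group = 2L),  # group label -> number of rows
  class = c("toy_summary_table", "matrix")
)

# The group labels are not among the row names ...
rownames(tab)
# ... they exist only in the attribute.
attr(tab, "rgroups")

# A class-specific print method is what weaves the labels into the output.
print.toy_summary_table <- function(x, ...) {
  rg <- attr(x, "rgroups")
  last <- cumsum(rg)          # last row index of each group
  first <- last - rg + 1L     # first row index of each group
  cells <- unclass(x)         # plain matrix for simple indexing
  for (g in seq_along(rg)) {
    cat("**", names(rg)[g], "**\n", sep = "")
    for (i in first[g]:last[g]) {
      cat("  ", rownames(cells)[i], ": ", cells[i, 1], "\n", sep = "")
    }
  }
  invisible(x)
}

print(tab)
```

Any formatter that works only from the matrix cells and its dimnames has no way to recover Value1 and Group, which is consistent with the kable output shown in the question.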
