Case when doesn't run for factor columns [duplicate] - r

This question already has answers here:
case_when in mutate pipe
(7 answers)
Closed 2 years ago.
This seems simple enough, so maybe it's a bug or I am missing something very basic. I am trying to convert Species to a binary variable and am using case_when for this simple operation, but I receive an error that I don't think should arise.
iris %>%
dplyr::mutate(Species=as.factor(Species),
Species=case_when(Species=="setosa"~"virginica",
TRUE~Species))
Error: Problem with `mutate()` input `Species`.
x must be a character vector, not a `factor` object.
i Input `Species` is `case_when(Species == "setosa" ~ "virginica", TRUE ~ Species)`.
Details on session info
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] conflicted_1.0.4 extrafontdb_1.0 extrafont_0.17 forcats_0.5.0
[5] purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.4
[9] tidyverse_1.3.0 ggplot2_3.3.2 dplyr_1.0.2 stringr_1.4.0
loaded via a namespace (and not attached):
[1] qpdf_1.1 xfun_0.19 tidyselect_1.1.0
[4] haven_2.3.1 snakecase_0.11.0 colorspace_1.4-1
[7] vctrs_0.3.4 generics_0.1.0 usethis_1.6.3
[10] htmltools_0.5.0 yaml_2.2.1 utf8_1.1.4
[13] rlang_0.4.8 pillar_1.4.6 glue_1.4.2
[16] withr_2.3.0 DBI_1.1.0 dbplyr_2.0.0
[19] modelr_0.1.8 readxl_1.3.1 lifecycle_0.2.0
[22] munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0
[25] rvest_0.3.6 memoise_1.1.0 evaluate_0.14
[28] knitr_1.30 curl_4.3 fansi_0.4.1
[31] Rttf2pt1_1.3.8 broom_0.7.2 pdftools_2.3.1
[34] Rcpp_1.0.5 scales_1.1.1 backports_1.2.0
[37] jsonlite_1.7.1 fs_1.5.0 hms_0.5.3
[40] askpass_1.1 digest_0.6.27 stringi_1.5.3
[43] grid_4.0.3 cli_2.1.0 tools_4.0.3
[46] magrittr_1.5 crayon_1.3.4 pkgconfig_2.0.3
[49] ellipsis_0.3.1 xml2_1.3.2 reprex_0.3.0
[52] lubridate_1.7.9 tidytuesdayR_1.0.1 assertthat_0.2.1
[55] rmarkdown_2.5 httr_1.4.2 rstudioapi_0.12
[58] R6_2.5.0 compiler_4.0.3

Using case_when on factor variables is a bit tricky.
case_when is type-strict, meaning all the right-hand-side values must evaluate to the same type. The first value you have is of type character ("virginica") and the TRUE value is of type factor, hence the type mismatch error. In addition, every value should be a factor with the same levels as your original data. Incorporating these changes, you could do:
library(dplyr)
iris %>%
mutate(Species=case_when(Species == "setosa" ~
factor("virginica", levels = unique(.$Species)),
TRUE ~ Species))
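As a point of comparison, the same recoding can be sketched in base R without case_when at all, by renaming the factor level directly. This is a different technique from the answer above, not a restatement of it; the variable name `species` is only illustrative.

```r
# Work on a copy of the factor column from the built-in iris data.
species <- iris$Species

# Renaming "setosa" to an existing level name merges the two levels,
# so the result is still a factor and no type mismatch can occur.
levels(species)[levels(species) == "setosa"] <- "virginica"

# Inspect the recoded factor: two levels remain.
table(species)
```

Because the operation touches only the level labels, the column stays a factor throughout, which is the property the case_when version has to enforce by hand.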

In the iris data set, the Species column is already a factor. You want character type here, so:
iris %>%
dplyr::mutate(Species=as.character(Species),
Species=case_when(Species=="setosa" ~ "virginica", TRUE ~ Species))
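If a factor is still wanted at the end, one option is to convert back after the recoding. The following is a minimal sketch of that idea, assuming dplyr is attached; the name `recoded` is only illustrative.

```r
library(dplyr)

recoded <- iris %>%
  mutate(
    # Work on character values so both sides of case_when share a type.
    Species = as.character(Species),
    Species = case_when(Species == "setosa" ~ "virginica",
                        TRUE ~ Species),
    # Convert back to a factor; levels are rebuilt from the values present.
    Species = factor(Species)
  )

table(recoded$Species)
```

Note that factor() rebuilds the levels from the remaining values, so "setosa" no longer appears as an (empty) level.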

Related

Tidyr's crossing() function not producing expected names in output

I apologize in advance that I could not create a reproducible example, but when I do tidyr::crossing on some dataframes, I get a crossed tibble where the x variables have the form: x$col_name and the y variables have y$col_name. If I do:
crossing(iris,mtcars)
I get names that don't have the x$ prefix, as desired. I checked that the classes of the input dataframes are the same as in the example above, and there are no duplicate names in the data I'm working with. I can't share the data for the usual privacy reasons. I realize there is not much to work with here, but I'm hoping someone here is experienced enough with tidyr to understand this issue.
Here is some session info:
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] writexl_1.4.0 readxl_1.4.0 lubridate_1.8.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9
[7] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] cellranger_1.1.0 pillar_1.7.0 compiler_4.2.0 dbplyr_2.2.0 tools_4.2.0
[6] jsonlite_1.8.0 lifecycle_1.0.1 gtable_0.3.0 pkgconfig_2.0.3 rlang_1.0.2
[11] reprex_2.0.1 rstudioapi_0.13 DBI_1.1.3 cli_3.3.0 haven_2.5.0
[16] xml2_1.3.3 withr_2.5.0 httr_1.4.3 fs_1.5.2 generics_0.1.2
[21] vctrs_0.4.1 hms_1.1.1 grid_4.2.0 tidyselect_1.1.2 glue_1.6.2
[26] R6_2.5.1 fansi_1.0.3 tzdb_0.3.0 modelr_0.1.8 magrittr_2.0.3
[31] backports_1.4.1 scales_1.2.0 ellipsis_0.3.2 rvest_1.0.2 assertthat_0.2.1
[36] colorspace_2.0-3 utf8_1.2.2 stringi_1.7.6 munsell_0.5.0 broom_0.8.0
[41] crayon_1.5.1
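Okay, the answer is simple. Passing the data frames unnamed gives plain column names, while passing them as named arguments gives the x$ and y$ prefixes: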
crossing(iris,mtcars)
crossing(x = iris,y = mtcars)
This is rather odd behavior in my opinion.
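The difference can be reproduced on small data. The sketch below uses two hypothetical data frames, `a` and `b`, in place of the private data: named arguments become packed data-frame columns (printed as x$id, y$grp), and tidyr::unpack() flattens them again.

```r
library(tidyr)

# Hypothetical stand-ins for the private data frames.
a <- data.frame(id = 1:2)
b <- data.frame(grp = c("u", "v"))

# Unnamed data frames are spliced: the columns keep their own names.
flat <- crossing(a, b)
names(flat)

# Named data frames become packed columns called x and y.
packed <- crossing(x = a, y = b)
names(packed)

# unpack() restores the plain column names.
unpacked <- unpack(packed, c(x, y))
names(unpacked)
```

So dropping the argument names in the call, or unpacking afterwards, both give the unprefixed names.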

Using variables inside function R. Error in is_data_pronoun(expr)

I have created a function in R and have been using it in research for well over a month now. The function makes use of the embrace operator {{. Its purpose isn't really important, but at the beginning of the function I create a tibble from the supplied data, which I then use throughout the function.
f <- function(data, x, y, z){
tb <- data %>%
transmute("var1" = {{x}},
"var2" = {{y}},
"var3" = {{z}})
# Do some stuff with tb
return(tb)
}
My data, let's call it df, already has the variable names x, y, and z, so I have been able to call the function by just passing in the data, as shown below.
df <- tibble("x" = 1:3,
"y" = 4:6,
"z" = 7:9)
f(data = df)
> output
However, today I installed tidymodels and right after I installed it I have been getting an error.
f(data = df)
>Error in is_data_pronoun(expr) :
argument "expr" is missing, with no default
It seems that all I have to do to fix this error is name the variables in the function call, as shown below.
f(data = df, x = x, y = y, z = z)
># A tibble: 3 x 3
var1 var2 var3
<int> <int> <int>
1 1 4 7
2 2 5 8
3 3 6 9
This is kind of annoying, as I would now have to go through my whole file and add x = x, y = y, z = z wherever I call the function. Does anyone have any idea why I am getting this error, why it has all of a sudden come up, and how to fix it? I am planning on publishing the function for others to use, which is why I'm using {{. I have also already completely uninstalled R and all my packages and reinstalled what I was using except for tidymodels, and I am still getting the error. My guess is that it has something to do with an updated version of dplyr?
Here is my session info
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tseries_0.10-50 knitr_1.38
[3] tidyquant_1.0.3 quantmod_0.4.18
[5] TTR_0.24.3 PerformanceAnalytics_2.0.4
[7] xts_0.12.1 zoo_1.8-9
[9] lubridate_1.8.0 forecast_8.16
[11] timetk_2.8.0 forcats_0.5.1
[13] stringr_1.4.0 dplyr_1.0.8
[15] purrr_0.3.4 readr_2.1.2
[17] tidyr_1.2.0 tibble_3.1.6
[19] ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 ggsignif_0.6.3 ellipsis_0.3.2 class_7.3-20
[5] fs_1.5.2 rstudioapi_0.13 ggpubr_0.4.0 listenv_0.8.0
[9] furrr_0.2.3 prodlim_2019.11.13 fansi_1.0.3 xml2_1.3.3
[13] codetools_0.2-18 splines_4.1.3 jsonlite_1.8.0 broom_0.7.12
[17] dbplyr_2.1.1 compiler_4.1.3 httr_1.4.2 backports_1.4.1
[21] assertthat_0.2.1 Matrix_1.4-0 fastmap_1.1.0 cli_3.2.0
[25] htmltools_0.5.2 tools_4.1.3 gtable_0.3.0 glue_1.6.2
[29] Rcpp_1.0.8.3 carData_3.0-5 cellranger_1.1.0 fracdiff_1.5-1
[33] vctrs_0.4.0 urca_1.3-0 nlme_3.1-155 lmtest_0.9-40
[37] timeDate_3043.102 gower_1.0.0 xfun_0.30 globals_0.14.0
[41] rvest_1.0.2 lifecycle_1.0.1 rstatix_0.7.0 future_1.24.0
[45] MASS_7.3-55 scales_1.1.1 ipred_0.9-12 hms_1.1.1
[49] parallel_4.1.3 yaml_2.3.5 curl_4.3.2 rpart_4.1.16
[53] stringi_1.7.6 hardhat_0.2.0 lava_1.6.10 rlang_1.0.2
[57] pkgconfig_2.0.3 rsample_0.1.1 evaluate_0.15 lattice_0.20-45
[61] recipes_0.2.0 tidyselect_1.1.2 parallelly_1.31.0 magrittr_2.0.3
[65] R6_2.5.1 generics_0.1.2 DBI_1.1.2 pillar_1.7.0
[69] haven_2.4.3 withr_2.5.0 survival_3.2-13 abind_1.4-5
[73] nnet_7.3-17 future.apply_1.8.1 modelr_0.1.8 crayon_1.5.1
[77] car_3.0-12 Quandl_2.11.0 utf8_1.2.2 tzdb_0.3.0
[81] rmarkdown_2.13 grid_4.1.3 readxl_1.4.0 reprex_2.0.1
[85] digest_0.6.29 munsell_0.5.0 quadprog_1.5-8
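One possible workaround for the question above, offered as a sketch rather than a confirmed fix: capture each argument with enquo(), test whether it was supplied with rlang::quo_is_missing(), and fall back to a column of the same name when it was not. The helper `or_column` is hypothetical and not part of dplyr or rlang, and the sketch assumes the data really contains columns x, y, and z.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: use the captured argument if one was supplied,
# otherwise fall back to the column named by `default`.
or_column <- function(quo, default) {
  if (quo_is_missing(quo)) sym(default) else quo
}

f <- function(data, x, y, z) {
  # Capture each argument without evaluating it.
  x <- or_column(enquo(x), "x")
  y <- or_column(enquo(y), "y")
  z <- or_column(enquo(z), "z")
  # Inject the captured expressions (or fallback column names).
  data %>%
    transmute(var1 = !!x, var2 = !!y, var3 = !!z)
}

df <- tibble(x = 1:3, y = 4:6, z = 7:9)
f(df)          # falls back to columns x, y, z
f(df, x = z)   # an explicit argument still takes precedence
```

This keeps existing calls such as f(data = df) working without relying on how a missing embraced argument happens to be handled by a given dplyr version.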

RStudio Viewer Error: "session/viewhtml...." not found

I recently installed a daily build version of R Studio, 1.4.671. Since that installation, anything that runs in the viewer (e.g. gt or lavaanPlot) gives me an error like this:
/session/viewhtml528813ce72d/index.html?viewer_pane=1&capabilities=1&host=http%3A%2F%2F127.0.0.1%3A27742 not found
I have fully uninstalled 1.4.671, restarted my computer, and reinstalled the version that worked this morning, 1.3.1056. Not sure
This is becoming quite a problem because I am not able to easily see any model coefficients that I am currently working on (in a neat way, they are messy in the console).
I have also reset RStudio's state following https://support.rstudio.com/hc/en-us/articles/200534577-Resetting-RStudio-s-State and removed my .Renviron file.
Update: if the error shows but I choose to export as HTML, the HTML file works.
Update2: both running Shiny and knitting an RMarkdown document to HTML works. It's just displaying something inside RStudio's viewer that is causing issues.
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] gt_0.2.1 patchwork_1.0.0 waffle_1.0.1
[4] plotly_4.9.2.1 ggstance_0.3.4 ggridges_0.5.2
[7] foreign_0.8-78 gghighlight_0.3.0 gridExtra_2.3
[10] readxl_1.3.1 emmeans_1.4.7 broom_0.5.6
[13] fastDummies_1.6.1 modelsummary_0.5.0 tables_0.9.3
[16] gtsummary_1.3.2 janitor_2.0.1 haven_2.3.1
[19] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.0
[22] purrr_0.3.4 readr_1.3.1 tidyr_1.1.0
[25] tibble_3.0.1 ggplot2_3.3.1 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] nlme_3.1-147 fs_1.4.1 lubridate_1.7.8
[4] RColorBrewer_1.1-2 httr_1.4.1 tools_4.0.0
[7] backports_1.1.7 DT_0.13 R6_2.4.1
[10] DBI_1.1.0 lazyeval_0.2.2 colorspace_1.4-1
[13] withr_2.2.0 tidyselect_1.1.0 extrafontdb_1.0
[16] curl_4.3 compiler_4.0.0 cli_2.0.2
[19] rvest_0.3.5 xml2_1.3.2 sandwich_2.5-1
[22] labeling_0.3 sass_0.2.0 scales_1.1.1
[25] checkmate_2.0.0 mvtnorm_1.1-0 commonmark_1.7
[28] digest_0.6.25 rmarkdown_2.2 pkgconfig_2.0.3
[31] htmltools_0.5.0 extrafont_0.17 dbplyr_1.4.4
[34] htmlwidgets_1.5.1 rlang_0.4.6 rstudioapi_0.11
[37] farver_2.0.3 generics_0.0.2 zoo_1.8-8
[40] jsonlite_1.6.1 magrittr_1.5 Matrix_1.2-18
[43] Rcpp_1.0.4.6 munsell_0.5.0 fansi_0.4.1
[46] lifecycle_0.2.0 stringi_1.4.6 multcomp_1.4-13
[49] yaml_2.2.1 snakecase_0.11.0 MASS_7.3-51.5
[52] plyr_1.8.6 grid_4.0.0 blob_1.2.1
[55] crayon_1.3.4 lattice_0.20-41 splines_4.0.0
[58] hms_0.5.3 knitr_1.28 pillar_1.4.4
[61] estimability_1.3 codetools_0.2-16 reprex_0.3.0
[64] glue_1.4.1 packrat_0.5.0 evaluate_0.14
[67] data.table_1.12.8 modelr_0.1.8 vctrs_0.3.0
[70] Rttf2pt1_1.3.8 cellranger_1.1.0 gtable_0.3.0
[73] assertthat_0.2.1 xfun_0.14 xtable_1.8-4
[76] coda_0.19-3 survival_3.1-12 viridisLite_0.3.0
[79] TH.data_1.0-10 ellipsis_0.3.1

How to run an lapply in parallel in R?

So I have tried a few different ways of doing this, but each returns a different error, which is making me question whether I'm even doing it correctly.
So without any parallel components, we have the following:
all_necks <- lapply(b_list, b_fun)
This works perfectly; b_list is a dataframe and b_fun is a ton of joins and functions which are to be done on the list.
Because each run takes about 5 minutes and there are 550 elements in b_list, I need this to be faster to be practical.
I tried future_lapply but got the following error:
library(future.apply)
options(future.globals.maxSize= 178258920000)
plan(multiprocess, workers = 5) ## Parallelize using five cores
all_necks <- future_lapply(b_list, b_fun)
ERROR:
Error in serialize(data, node$con) : error writing to connection
Then I tried foreach and got the following:
library(doParallel)
cl <- makeCluster(detectCores())
registerDoParallel(cl)
all_necks <- foreach(i = 1:b_list %dopar% {b_fun})
ERROR:
There were 16 warnings (use warnings() to see them)
1: In environment() : closing unused connection 19 (<-DESKTOP-XXX)
2: In environment() : closing unused connection 18 (<-DESKTOP-XXX)
...
I must be doing this incorrectly but I really just want this long lapply to run faster via parallel processing.
I would prefer to do this on 5 cores.
EDIT: Session Info Added
R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] future.apply_1.5.0 future_1.17.0 formattable_0.2.0.1 lubridate_1.7.9 data.table_1.12.8 chron_2.3-55
[7] Nmisc_0.3.5 anytime_0.3.7 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.0 purrr_0.3.4
[13] readr_1.3.1 tidyr_1.1.0 tibble_3.0.1 ggplot2_3.3.2 tidyverse_1.3.0 jsonlite_1.6.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 lattice_0.20-41 listenv_0.8.0 assertthat_0.2.1 digest_0.6.25 R6_2.4.1
[7] cellranger_1.1.0 backports_1.1.7 reprex_0.3.0 evaluate_0.14 httr_1.4.1 pillar_1.4.4
[13] rlang_0.4.6 readxl_1.3.1 rstudioapi_0.11 furrr_0.1.0 blob_1.2.1 rmarkdown_2.3
[19] htmlwidgets_1.5.1 munsell_0.5.0 tinytex_0.24 broom_0.5.6 compiler_4.0.1 modelr_0.1.8
[25] xfun_0.14 pkgconfig_2.0.3 globals_0.12.5 htmltools_0.5.0 tidyselect_1.1.0 codetools_0.2-16
[31] fansi_0.4.1 crayon_1.3.4 dbplyr_1.4.4 withr_2.2.0 rappdirs_0.3.1 grid_4.0.1
[37] nlme_3.1-148 gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0 magrittr_1.5 scales_1.1.1
[43] cli_2.0.2 stringi_1.4.6 fs_1.4.1 xml2_1.3.2 ellipsis_0.3.1 generics_0.0.2
[49] vctrs_0.3.1 tools_4.0.1 glue_1.4.1 hms_0.5.3 parallel_4.0.1 colorspace_1.4-1
[55] rvest_0.3.5 knitr_1.28 haven_2.3.1
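For reference, a parallel lapply can be sketched with the base parallel package. The function and list below (`b_fun_demo`, `b_list_demo`) are hypothetical stand-ins, since the real b_fun and b_list are not shown in the question, and the worker count is reduced to keep the demo light.

```r
library(parallel)

# Hypothetical stand-ins for b_fun and b_list from the question.
b_fun_demo <- function(x) x^2
b_list_demo <- as.list(1:8)

# Start a small cluster; the question asks for 5 workers.
cl <- makeCluster(2)

# parLapply mirrors lapply(X, FUN) but spreads the elements over the
# workers; the cluster is stopped even if an element fails.
results <- tryCatch(
  parLapply(cl, b_list_demo, b_fun_demo),
  finally = stopCluster(cl)
)

str(results)
```

Each worker is a fresh R session, so a real b_fun would also need its packages and any global objects made available on the workers, for example with clusterEvalQ() and clusterExport(). With foreach, the call would take the form foreach(elem = b_list) %dopar% b_fun(elem), with %dopar% placed outside the foreach() parentheses and the function actually called on each element.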

Why did this aggregation R code give the total sum rather than the monthly aggregation of daily expenses?

In the expected result of the code below, Monthly should hold the monthly aggregation of the daily expenses: expenses has 366 entries, so Monthly should have 12. Why does the code below give a single entry with the total sum in the Monthly variable?
The article at https://ro-che.info/articles/2017-02-22-group_by_month_r shows the same code producing the desired outcome, the 12 monthly aggregated values.
So why does my code, which is exactly the same, give only the total sum, as shown in the screenshot of RStudio on my laptop?
library(dplyr)
library(lubridate)
set.seed(2017)
#options(digits=4)
(expenses <- data_frame(
date=seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by=1),
amount=rgamma(length(date), shape = 2, scale = 20)))
Monthly<-expenses %>% group_by(month=floor_date(date, "month")) %>%
summarize(amount=sum(amount))
monthly2<-expenses %>% mutate(Mon=month(date), Day=day(date)) %>%
group_by(Mon,Day) %>%
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.7.4 plyr_1.8.4 XML_3.98-1.20 leaflet_2.0.2
[5] raster_2.9-23 sp_1.3-1 elevatr_0.2.0 clifro_3.2-2
[9] knitr_1.23 MASS_7.3-51.4 forcats_0.4.0 stringr_1.4.0
[13] dplyr_0.8.3 purrr_0.3.2 readr_1.3.1 tidyr_0.8.3
[17] tibble_2.1.3 ggplot2_3.2.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 lattice_0.20-38 utf8_1.1.4
[4] assertthat_0.2.1 zeallot_0.1.0 digest_0.6.20
[7] mime_0.7 R6_2.4.0 cellranger_1.1.0
[10] backports_1.1.4 httr_1.4.0 pillar_1.4.2
[13] rlang_0.4.0 curl_4.0 lazyeval_0.2.2
[16] readxl_1.3.1 rstudioapi_0.10 labeling_0.3
[19] htmlwidgets_1.3 RCurl_1.95-4.12 munsell_0.5.0
[22] shiny_1.3.2 broom_0.5.2 compiler_3.6.1
[25] httpuv_1.5.1 modelr_0.1.4 xfun_0.8
[28] pkgconfig_2.0.2 htmltools_0.3.6 tidyselect_0.2.5
[31] codetools_0.2-16 fansi_0.4.0 crayon_1.3.4
[34] withr_2.1.2 later_0.8.0 bitops_1.0-6
[37] grid_3.6.1 nlme_3.1-140 jsonlite_1.6
[40] xtable_1.8-4 gtable_0.3.0 magrittr_1.5
[43] scales_1.0.0 cli_1.1.0 stringi_1.4.3
[46] reshape2_1.4.3 promises_1.0.1 xml2_1.2.1
[49] generics_0.0.2 vctrs_0.2.0 RColorBrewer_1.1-2
[52] tools_3.6.1 glue_1.3.1 hms_0.5.0
[55] crosstalk_1.0.0 yaml_2.2.0 colorspace_1.4-1
[58] rvest_0.3.4 haven_2.1.1
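A likely explanation, assuming plyr was attached after dplyr (the session info above lists plyr among the attached packages): plyr::summarize masks dplyr::summarize and ignores the grouping, so a single total is returned. The sketch below sidesteps the masking by calling dplyr::summarise explicitly; `dates` and `monthly` are illustrative names, and tibble() replaces the deprecated data_frame().

```r
library(dplyr)
library(lubridate)

set.seed(2017)
dates <- seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = 1)
expenses <- tibble(
  date = dates,
  amount = rgamma(length(dates), shape = 2, scale = 20)
)

monthly <- expenses %>%
  group_by(month = floor_date(date, "month")) %>%
  # Namespace-qualified call: immune to masking by plyr::summarize.
  dplyr::summarise(amount = sum(amount))

nrow(monthly)
```

Loading plyr before dplyr, or not attaching plyr at all, should have the same effect as the explicit dplyr:: prefix.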
