I have written a function in R and have been using it in my research for well over a month now. The function makes use of the embrace operator {{. Its purpose isn't really important, but at the beginning of the function I create a tibble from the supplied data, which I then use throughout the function.
f <- function(data, x, y, z) {
  tb <- data %>%
    transmute("var1" = {{x}},
              "var2" = {{y}},
              "var3" = {{z}})
  # Do some stuff with tb
  return(tb)
}
My data, let's call it df, already has variables named x, y, and z, so I have been able to call the function by passing only the data, as shown below.
df <- tibble("x" = 1:3,
"y" = 4:6,
"z" = 7:9)
f(data = df)
> output
However, today I installed tidymodels, and right after installing it I started getting an error.
f(data = df)
>Error in is_data_pronoun(expr) :
argument "expr" is missing, with no default
It seems that all I have to do to fix this error is name the variables in the function call, as shown below.
f(data = df, x = x, y = y, z = z)
># A tibble: 3 x 3
var1 var2 var3
<int> <int> <int>
1 1 4 7
2 2 5 8
3 3 6 9
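For reference, here is a minimal sketch of what the explicit form does, assuming dplyr is attached. The data frame df2 and its column names a, b, and c are made up for illustration; the point is only that {{ forwards whatever column name the caller supplies, so the arguments do not have to match the names x, y, and z.

```r
library(dplyr)

# Same shape as the function above: each embraced argument is
# forwarded to transmute() as a column reference.
f <- function(data, x, y, z) {
  data %>%
    transmute(var1 = {{ x }},
              var2 = {{ y }},
              var3 = {{ z }})
}

# Hypothetical data whose columns are NOT called x, y, z
df2 <- tibble(a = 1:3, b = 4:6, c = 7:9)

# The caller names the columns explicitly
res <- f(df2, x = a, y = b, z = c)
res
```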
This is kind of annoying, as I would now have to go through my whole file and add x = x, y = y, z = z everywhere I call the function. Does anyone have any idea why I am getting this error, why it has suddenly come up, and how to fix it? I am planning on publishing the function for others to use, which is why I'm using {{. I have also completely uninstalled R and all my packages and reinstalled everything I was using except tidymodels, and I am still getting the error. My guess is that it has something to do with an updated version of dplyr?
Here is my session info
> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tseries_0.10-50 knitr_1.38
[3] tidyquant_1.0.3 quantmod_0.4.18
[5] TTR_0.24.3 PerformanceAnalytics_2.0.4
[7] xts_0.12.1 zoo_1.8-9
[9] lubridate_1.8.0 forecast_8.16
[11] timetk_2.8.0 forcats_0.5.1
[13] stringr_1.4.0 dplyr_1.0.8
[15] purrr_0.3.4 readr_2.1.2
[17] tidyr_1.2.0 tibble_3.1.6
[19] ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 ggsignif_0.6.3 ellipsis_0.3.2 class_7.3-20
[5] fs_1.5.2 rstudioapi_0.13 ggpubr_0.4.0 listenv_0.8.0
[9] furrr_0.2.3 prodlim_2019.11.13 fansi_1.0.3 xml2_1.3.3
[13] codetools_0.2-18 splines_4.1.3 jsonlite_1.8.0 broom_0.7.12
[17] dbplyr_2.1.1 compiler_4.1.3 httr_1.4.2 backports_1.4.1
[21] assertthat_0.2.1 Matrix_1.4-0 fastmap_1.1.0 cli_3.2.0
[25] htmltools_0.5.2 tools_4.1.3 gtable_0.3.0 glue_1.6.2
[29] Rcpp_1.0.8.3 carData_3.0-5 cellranger_1.1.0 fracdiff_1.5-1
[33] vctrs_0.4.0 urca_1.3-0 nlme_3.1-155 lmtest_0.9-40
[37] timeDate_3043.102 gower_1.0.0 xfun_0.30 globals_0.14.0
[41] rvest_1.0.2 lifecycle_1.0.1 rstatix_0.7.0 future_1.24.0
[45] MASS_7.3-55 scales_1.1.1 ipred_0.9-12 hms_1.1.1
[49] parallel_4.1.3 yaml_2.3.5 curl_4.3.2 rpart_4.1.16
[53] stringi_1.7.6 hardhat_0.2.0 lava_1.6.10 rlang_1.0.2
[57] pkgconfig_2.0.3 rsample_0.1.1 evaluate_0.15 lattice_0.20-45
[61] recipes_0.2.0 tidyselect_1.1.2 parallelly_1.31.0 magrittr_2.0.3
[65] R6_2.5.1 generics_0.1.2 DBI_1.1.2 pillar_1.7.0
[69] haven_2.4.3 withr_2.5.0 survival_3.2-13 abind_1.4-5
[73] nnet_7.3-17 future.apply_1.8.1 modelr_0.1.8 crayon_1.5.1
[77] car_3.0-12 Quandl_2.11.0 utf8_1.2.2 tzdb_0.3.0
[81] rmarkdown_2.13 grid_4.1.3 readxl_1.4.0 reprex_2.0.1
[85] digest_0.6.29 munsell_0.5.0 quadprog_1.5-8
I apologize in advance that I could not create a reproducible example, but when I call tidyr::crossing on some data frames, I get a crossed tibble where the x variables have the form x$col_name and the y variables have the form y$col_name. If I do:
crossing(iris,mtcars)
I get names that don't have the x$ prefix, as desired. I checked that the classes of the input data frames are the same as in the example above, and there are no duplicate names in the data I'm working with. I can't share the data for the usual privacy reasons. I realize there is not much to work with here, but I'm hoping someone here is experienced enough with tidyr to understand this issue.
Here is some session info:
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] writexl_1.4.0 readxl_1.4.0 lubridate_1.8.0 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9
[7] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] cellranger_1.1.0 pillar_1.7.0 compiler_4.2.0 dbplyr_2.2.0 tools_4.2.0
[6] jsonlite_1.8.0 lifecycle_1.0.1 gtable_0.3.0 pkgconfig_2.0.3 rlang_1.0.2
[11] reprex_2.0.1 rstudioapi_0.13 DBI_1.1.3 cli_3.3.0 haven_2.5.0
[16] xml2_1.3.3 withr_2.5.0 httr_1.4.3 fs_1.5.2 generics_0.1.2
[21] vctrs_0.4.1 hms_1.1.1 grid_4.2.0 tidyselect_1.1.2 glue_1.6.2
[26] R6_2.5.1 fansi_1.0.3 tzdb_0.3.0 modelr_0.1.8 magrittr_2.0.3
[31] backports_1.4.1 scales_1.2.0 ellipsis_0.3.2 rvest_1.0.2 assertthat_0.2.1
[36] colorspace_2.0-3 utf8_1.2.2 stringi_1.7.6 munsell_0.5.0 broom_0.8.0
[41] crayon_1.5.1
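Okay, the answer is simple: the prefixes come from naming the arguments. The first call below gives plain column names, the second gives the x$ and y$ prefixed ones: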
crossing(iris,mtcars)
crossing(x = iris,y = mtcars)
This is rather odd behavior in my opinion.
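A small sketch of the difference, assuming tidyr 1.0.0 or later is installed. The data frames a and b are made up for illustration. As far as I understand it, a named data frame argument is kept as a single packed data-frame column, which a tibble prints as x$col_name, while unnamed data frames have their columns spliced in directly.

```r
library(tidyr)

# Hypothetical toy inputs
a <- data.frame(id = 1:2)
b <- data.frame(grp = c("u", "v"))

# Unnamed data frames: their columns appear under their own names
plain <- crossing(a, b)
names(plain)

# Named data frames: each input becomes one data-frame column
# called x or y, which prints as x$id and y$grp
packed <- crossing(x = a, y = b)
names(packed)

# unpack() flattens the packed columns back into ordinary ones
flat <- unpack(packed, c(x, y))
names(flat)
```

So dropping the argument names, or calling unpack() afterwards, should get rid of the prefixes.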
Using the orange data that accompanies the FactoMineR package (http://factominer.free.fr/book/orange.csv), I ran a PCA and then a second PCA with supplementary information. In the second case, calling dimdesc() gives an error that I have not been able to find a solution for online.
I am posting here in case someone can help. My code is below. Thanks in advance for your comments and hints.
data_orange<-read.delim("orange.csv", header = T, sep = ";")
data_orange_subset <- data_orange[,1:8]
res.pca<-PCA(data_orange_subset, graph = F)
dimdesc(res.pca,axes=1:2)
--> This works
When considering supplementary information in the PCA:
data_orange_2 <- data_orange[,-c(16,17)]
res.pca.all <- PCA(data_orange_2, graph = F,
quanti.sup = 9:15,
quali.sup = 1)
dimdesc(res.pca.all, axes = 1:2)
Error in if (sum(tabF[, 2] <= proba) > 0) resF <- tabF[tabF[, 2] <= proba, :
missing value where TRUE/FALSE needed
I have checked the data frame for NA values and there are none.
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.1
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
[4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] factoextra_1.0.7.999 ggplot2_3.3.6 FactoMineR_2.4
loaded via a namespace (and not attached):
[1] ggrepel_0.9.1 Rcpp_1.0.8.3 lattice_0.20-45 tidyr_1.2.0
[5] assertthat_0.2.1 digest_0.6.29 utf8_1.2.2 R6_2.5.1
[9] backports_1.4.1 evaluate_0.15 pillar_1.7.0 rlang_1.0.3
[13] rstudioapi_0.13 minqa_1.2.4 car_3.1-0 nloptr_2.0.3
[17] Matrix_1.4-1 DT_0.23 rmarkdown_2.13 labeling_0.4.2
[21] splines_4.1.2 lme4_1.1-29 htmlwidgets_1.5.4 munsell_0.5.0
[25] broom_0.8.0 compiler_4.1.2 xfun_0.31 pkgconfig_2.0.3
[29] faraway_1.0.7 htmltools_0.5.2 flashClust_1.01-2 tidyselect_1.1.2
[33] tibble_3.1.7 gridExtra_2.3 dendextend_1.15.2 viridisLite_0.4.0
[37] fansi_1.0.3 crayon_1.5.1 dplyr_1.0.9 withr_2.5.0
[41] ggpubr_0.4.0 MASS_7.3-56 leaps_3.1 grid_4.1.2
[45] nlme_3.1-157 gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.2
[49] magrittr_2.0.3 scales_1.2.0 cli_3.3.0 carData_3.0-5
[53] farver_2.1.0 ggsignif_0.6.3 viridis_0.6.2 scatterplot3d_0.3-41
[57] ellipsis_0.3.2 generics_0.1.2 vctrs_0.4.1 boot_1.3-28
[61] ggsci_2.9 tools_4.1.2 glue_1.6.2 purrr_0.3.4
[65] abind_1.4-5 fastmap_1.1.0 yaml_2.3.5 colorspace_2.0-3
[69] cluster_2.1.3 rstatix_0.7.0 knitr_1.39
a<- paste0("\U2265",80)
b<- paste0("\U2265",80)
data <- data.frame(a,b)
write.csv(data, "C:/NMPED Data Transformation/Newfile.csv", row.names = F, na = "",fileEncoding = "UTF-8")
But the output shows as:
a b
80 80
The expected output is:
a b
≥80 ≥80
My sessionInfo() output is:
sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 LC_MONETARY=English_India.1252 LC_NUMERIC=C LC_TIME=English_India.1252
attached base packages:
stats graphics grDevices utils datasets methods base
other attached packages:
haven_2.5.0 ggmap_3.0.0 readxl_1.3.1 RODBC_1.3-19 data.table_1.14.2 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
readr_2.1.1 tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
tidyselect_1.1.1 lattice_0.20-45 colorspace_2.0-2 vctrs_0.3.8 generics_0.1.1 utf8_1.2.2 rlang_0.4.12 pillar_1.6.4
glue_1.5.0 withr_2.4.3 DBI_1.1.1 sp_1.4-6 dbplyr_2.1.1 modelr_0.1.8 plyr_1.8.6 jpeg_0.1-9
lifecycle_1.0.1 munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0 rvest_1.0.2 RgoogleMaps_1.4.5.3 tzdb_0.2.0 fansi_0.5.0
broom_0.7.10 Rcpp_1.0.7 scales_1.1.1 backports_1.3.0 jsonlite_1.7.2 fs_1.5.1 rjson_0.2.20 hms_1.1.1
png_0.1-7 stringi_1.7.6 grid_4.1.3 cli_3.1.0 tools_4.1.3 bitops_1.0-7 magrittr_2.0.1 crayon_1.4.2
pkgconfig_2.0.3 ellipsis_0.3.2 xml2_1.3.3 reprex_2.0.1 lubridate_1.8.0 assertthat_0.2.1 httr_1.4.2 rstudioapi_0.13
R6_2.5.1 compiler_4.1.3
Use the locale English_India.utf8.
In the example below I first set your locale (since mine is different) and the wanted symbols are not printed.
Then I set LC_CTYPE = "English_India.utf8" and the greater-than-or-equal-to symbols are there.
In the end I reset the original locale.
old_loc <- Sys.getlocale("LC_CTYPE")
a <- paste0("\U2265",80)
b <- paste0("\U2265",80)
data <- data.frame(a, b)
Sys.setlocale("LC_CTYPE", "English_India.1252")
data
#> a b
#> 1 =80 =80
Sys.setlocale("LC_CTYPE", "English_India.utf8")
#> [1] "English_India.utf8"
data
#> a b
#> 1 ≥80 ≥80
write.csv(data, "~/Temp/Newfile.csv",
row.names = FALSE,
na = "",
fileEncoding = "UTF-8")
Sys.setlocale("LC_CTYPE", old_loc)
#> [1] "Portuguese_Portugal.utf8"
Created on 2022-05-01 by the reprex package (v2.0.1)
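To check that the symbol actually made it into the file, one option is to read the CSV back and inspect the code point. This is only a sketch: the temporary file path is my own choice, and the round trip is only expected to succeed when the session is running in a UTF-8 locale, which is what the locale change above is meant to achieve.

```r
# Write to a temporary file instead of a fixed path
tmp <- tempfile(fileext = ".csv")

data <- data.frame(a = paste0("\u2265", 80),
                   b = paste0("\u2265", 80))

write.csv(data, tmp, row.names = FALSE, fileEncoding = "UTF-8")

# Read the file back, declaring the same encoding
back <- read.csv(tmp, fileEncoding = "UTF-8")

# Code point of the first character read back;
# U+2265 (greater-than-or-equal-to) is 8805 in decimal
first_cp <- utf8ToInt(enc2utf8(back$a[1]))[1]

# TRUE when the current session uses a UTF-8 locale
is_utf8_locale <- isTRUE(l10n_info()[["UTF-8"]])

first_cp
is_utf8_locale
```

If is_utf8_locale is FALSE, I would not trust the round trip, which matches the behaviour described in the question.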
I'm following Jan Kirenz's tutorial for classification using tidymodels. Everything went well until I tried to evaluate the model using the function fit_resamples(). I keep getting the error message Error in UseMethod("required_pkgs") : no applicable method for 'required_pkgs' applied to an object of class "workflow".
The code he uses in that section is:
log_res <-
log_wflow %>%
fit_resamples(
resamples = cv_folds,
metrics = metric_set(
recall, precision, f_meas,
accuracy, kap,
roc_auc, sens, spec),
control = control_resamples(
save_pred = TRUE)
)
I tried using the example from the function's documentation page and I get the same error message.
library(tidymodels)
set.seed(6735)
folds <- vfold_cv(mtcars, v = 5)
spline_rec <- recipe(mpg ~ ., data = mtcars) %>%
step_ns(disp) %>%
step_ns(wt)
lin_mod <- linear_reg() %>%
set_engine("lm")
control <- control_resamples(save_pred = TRUE)
spline_res <- fit_resamples(lin_mod, spline_rec, folds, control = control)
#Error in UseMethod("required_pkgs") : no applicable method for 'required_pkgs' applied to an object of class "workflow"
Does anyone know what is the issue here and how could I resolve it? I have been unable to find any mention of this issue.
Here is my sessionInfo():
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] workflowsets_0.1.0 tune_0.1.6 modeldata_0.1.1 infer_1.0.0 dials_0.0.9 scales_1.1.1 broom_0.7.9
[8] tidymodels_0.1.3 yardstick_0.0.8 GGally_2.1.2 keras_2.6.0 xgboost_1.4.1.1 ranger_0.13.1 parsnip_0.1.7
[15] recipes_0.1.16 workflows_0.2.3 rsample_0.1.0 skimr_2.1.3 visdat_0.5.3 gt_0.3.1 forcats_0.5.1
[22] stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_2.0.1 tidyr_1.1.3 tibble_3.1.4 ggplot2_3.3.5
[29] tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] colorspace_2.0-2 ellipsis_0.3.2 class_7.3-19 base64enc_0.1-3 fs_1.5.0 rstudioapi_0.13 listenv_0.8.0
[8] furrr_0.2.3 farver_2.1.0 bit64_4.0.5 prodlim_2019.11.13 fansi_0.5.0 lubridate_1.7.10 xml2_1.3.2
[15] codetools_0.2-18 splines_4.1.1 knitr_1.33 zeallot_0.1.0 jsonlite_1.7.2 pROC_1.18.0 dbplyr_2.1.1
[22] png_0.1-7 tfruns_1.5.0 compiler_4.1.1 httr_1.4.2 backports_1.2.1 assertthat_0.2.1 Matrix_1.3-4
[29] fastmap_1.1.0 cli_3.0.1 htmltools_0.5.2 tools_4.1.1 gtable_0.3.0 glue_1.4.2 Rcpp_1.0.7
[36] DiceDesign_1.9 cellranger_1.1.0 vctrs_0.3.8 iterators_1.0.13 timeDate_3043.102 gower_0.2.2 xfun_0.25
[43] globals_0.14.0 rvest_1.0.1 lifecycle_1.0.0 future_1.22.1 MASS_7.3-54 ipred_0.9-11 vroom_1.5.4
[50] hms_1.1.0 parallel_4.1.1 RColorBrewer_1.1-2 yaml_2.2.1 curl_4.3.2 reticulate_1.20 sass_0.4.0
[57] rpart_4.1-15 reshape_0.8.8 stringi_1.7.4 tensorflow_2.6.0 foreach_1.5.1 checkmate_2.0.0 lhs_1.1.1
[64] hardhat_0.1.6 lava_1.6.10 repr_1.1.3 rlang_0.4.11 pkgconfig_2.0.3 lattice_0.20-44 labeling_0.4.2
[71] bit_4.0.4 tidyselect_1.1.1 parallelly_1.27.0 plyr_1.8.6 magrittr_2.0.1 R6_2.5.1 generics_0.1.0
[78] DBI_1.1.1 pillar_1.6.2 haven_2.4.3 whisker_0.4 withr_2.4.2 survival_3.2-11 nnet_7.3-16
[85] future.apply_1.8.1 modelr_0.1.8 crayon_1.4.1 utf8_1.2.2 tzdb_0.1.2 grid_4.1.1 readxl_1.3.1
[92] data.table_1.14.0 reprex_2.0.1 digest_0.6.27 GPfit_1.0-8 munsell_0.5.0
The second chunk in your question works fine for me when the tune package is attached. I think it is better to attach the tidymodels family through the library(tidymodels) wrapper rather than attaching the packages individually.
If the tidymodels package is installed correctly (run a <- require(tidymodels); a should be TRUE), this piece of code will work:
library(tidymodels)
set.seed(6735)
folds <- vfold_cv(mtcars, v = 5)
spline_rec <- recipe(mpg ~ ., data = mtcars) %>%
step_ns(disp) %>%
step_ns(wt)
lin_mod <- linear_reg() %>%
set_engine("lm")
control <- control_resamples(save_pred = TRUE)
spline_res <- fit_resamples(lin_mod, spline_rec, folds, control = control)
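If it still fails, it may help to look at which of the relevant packages are installed and in which versions, since a mismatch between them is one plausible cause (that is an assumption on my part, not something I have confirmed for this error). The package list below is my own selection. A rough sketch using only base R:

```r
# Packages from the tidymodels family involved in the example above
pkgs <- c("tune", "workflows", "parsnip", "recipes", "rsample")

# requireNamespace() returns TRUE/FALSE without attaching anything
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Print the version of every package that was found
for (p in pkgs[installed]) {
  cat(p, as.character(packageVersion(p)), "\n")
}
```

Updating the whole set together, for example by reinstalling tidymodels, keeps the versions in step with each other.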
This seems simple enough; maybe it's a bug, or I am missing something very basic. I am trying to convert Species to a binary variable using case_when, but I receive an error that I don't think should arise.
iris %>%
dplyr::mutate(Species=as.factor(Species),
Species=case_when(Species=="setosa"~"virginica",
TRUE~Species))
Error: Problem with `mutate()` input `Species`.
x must be a character vector, not a `factor` object.
i Input `Species` is `case_when(Species == "setosa" ~ "virginica", TRUE ~ Species)`.
Details on session info
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] conflicted_1.0.4 extrafontdb_1.0 extrafont_0.17 forcats_0.5.0
[5] purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.4
[9] tidyverse_1.3.0 ggplot2_3.3.2 dplyr_1.0.2 stringr_1.4.0
loaded via a namespace (and not attached):
[1] qpdf_1.1 xfun_0.19 tidyselect_1.1.0
[4] haven_2.3.1 snakecase_0.11.0 colorspace_1.4-1
[7] vctrs_0.3.4 generics_0.1.0 usethis_1.6.3
[10] htmltools_0.5.0 yaml_2.2.1 utf8_1.1.4
[13] rlang_0.4.8 pillar_1.4.6 glue_1.4.2
[16] withr_2.3.0 DBI_1.1.0 dbplyr_2.0.0
[19] modelr_0.1.8 readxl_1.3.1 lifecycle_0.2.0
[22] munsell_0.5.0 gtable_0.3.0 cellranger_1.1.0
[25] rvest_0.3.6 memoise_1.1.0 evaluate_0.14
[28] knitr_1.30 curl_4.3 fansi_0.4.1
[31] Rttf2pt1_1.3.8 broom_0.7.2 pdftools_2.3.1
[34] Rcpp_1.0.5 scales_1.1.1 backports_1.2.0
[37] jsonlite_1.7.1 fs_1.5.0 hms_0.5.3
[40] askpass_1.1 digest_0.6.27 stringi_1.5.3
[43] grid_4.0.3 cli_2.1.0 tools_4.0.3
[46] magrittr_1.5 crayon_1.3.4 pkgconfig_2.0.3
[49] ellipsis_0.3.1 xml2_1.3.2 reprex_0.3.0
[52] lubridate_1.7.9 tidytuesdayR_1.0.1 assertthat_0.2.1
[55] rmarkdown_2.5 httr_1.4.2 rstudioapi_0.12
[58] R6_2.5.0 compiler_4.0.3
Using case_when on factor variables is a bit tricky.
case_when is type-strict, meaning all the right-hand-side values must evaluate to the same type. The first value you have is of type character ("virginica") and the TRUE value is a factor, hence the type mismatch error. In addition, every value should be a factor with the same levels as your original data. Incorporating these changes, you could do:
library(dplyr)
iris %>%
mutate(Species=case_when(Species == "setosa" ~
factor("virginica", levels = unique(.$Species)),
TRUE ~ Species))
The Species column in the iris data set is already a factor by default. You want character type here, so:
iris %>%
dplyr::mutate(Species=as.character(Species),
Species=case_when(Species=="setosa" ~ "virginica", TRUE ~ Species))
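If the goal is just to merge setosa into virginica and keep Species as a factor, another option is to skip case_when and rename the level in base R. This is a sketch of an alternative technique, not what the answers above do; the variable name sp is my own. Assigning an existing level name to another level merges the two.

```r
# Work on a copy of the factor column
sp <- iris$Species

# Rename the "setosa" level to "virginica"; because that name
# already exists, R merges the two levels into one
levels(sp)[levels(sp) == "setosa"] <- "virginica"

# Two levels remain, so the variable is now binary
levels(sp)
table(sp)
```

The result can be assigned back with iris$Species <- sp, or used inside mutate().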