dplyr::between issue dplyr 1.0.7 - r

I have opened an old script and the following use of between() no longer works. A previous question of mine shows that it used to work. Unfortunately, I am not sure which version I was using then.
library(lubridate)
library(tidyverse)
df <- data.frame(date1 = c("2011-09-18", "2013-03-06", "2013-08-08"),
date2 = c("2012-02-18", "2014-03-06", "2015-02-03"))
df$date1 <- as.Date(parse_date_time(df$date1, "ymd"))
df$date2 <- as.Date(parse_date_time(df$date2, "ymd"))
df
# date1 date2
# 1 2011-09-18 2012-02-18
# 2 2013-03-06 2014-03-06
# 3 2013-08-08 2015-02-03
df$y_2014 <- if_else(between(2014, year(df$date1), year(df$date2)), 1, 0, as.numeric(NA))
#between(2014, year(df$date1), year(df$date2))
Error: left must be length 1
sessionInfo()
R version 4.0.3 (2020-10-10)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
[7] tibble_3.0.5 ggplot2_3.3.3 tidyverse_1.3.0 lubridate_1.7.9.2
loaded via a namespace (and not attached):
[1] tinytex_0.29 tidyselect_1.1.0 xfun_0.20 haven_2.3.1 colorspace_2.0-0 vctrs_0.3.8
[7] generics_0.1.0 htmltools_0.5.1 yaml_2.2.1 utf8_1.1.4 rlang_0.4.10 pillar_1.6.4
[13] withr_2.4.0 glue_1.4.2 DBI_1.1.1 dbplyr_2.1.1 readxl_1.3.1 modelr_0.1.8
[19] fortunes_1.5-4 lifecycle_1.0.1 cellranger_1.1.0 munsell_0.5.0 gtable_0.3.0 rvest_0.3.6
[25] memoise_1.1.0 evaluate_0.14 knitr_1.30 ps_1.5.0 curl_4.3 fansi_0.4.2
[31] urltools_1.7.3 triebeard_0.3.0 broom_0.7.3 Rcpp_1.0.7 scales_1.1.1 backports_1.2.1
[37] jsonlite_1.7.2 fs_1.5.0 hms_1.0.0 pingr_2.0.1 digest_0.6.27 stringi_1.5.3
[43] processx_3.4.5 cowplot_1.1.1 grid_4.0.3 rprojroot_2.0.2 cli_2.2.0 tools_4.0.3
[49] magrittr_2.0.1 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.2 xml2_1.3.2 reprex_0.3.0
[55] datapasta_3.1.0 assertthat_0.2.1 rmarkdown_2.6 httr_1.4.2 rstudioapi_0.13 R6_2.5.0
[61] speedtest_0.2.0 compiler_4.0.3
Has anyone come across this issue before?
thanks

You could make use of rowwise() if you don't want to load data.table just for its between():
library(lubridate)
library(tidyverse)
df %>%
  rowwise() %>%
  mutate(y_2014 = if_else(between(2014, year(date1), year(date2)), 1, 0, as.numeric(NA)))
# A tibble: 3 x 3
# Rowwise:
#   date1      date2      y_2014
#   <date>     <date>      <dbl>
# 1 2011-09-18 2012-02-18      0
# 2 2013-03-06 2014-03-06      1
# 3 2013-08-08 2015-02-03      1
Call ungroup() afterwards if you want to run further mutate(), transmute() or summarise() steps on the data frame; otherwise they will still be applied row by row.
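Because only the boundaries vary by row here, the same check can also be written with ordinary comparison operators, which are vectorized and avoid between() altogether. A minimal base R sketch, assuming the three date pairs from the question (the names start_year and end_year are made up for illustration):

```r
# Same three date pairs as in the question
date1 <- as.Date(c("2011-09-18", "2013-03-06", "2013-08-08"))
date2 <- as.Date(c("2012-02-18", "2014-03-06", "2015-02-03"))

# Extract the calendar year of each date without extra packages
start_year <- as.integer(format(date1, "%Y"))
end_year   <- as.integer(format(date2, "%Y"))

# Element-wise range check: is 2014 inside [start_year, end_year]?
# as.numeric() turns TRUE/FALSE into 1/0 and keeps NA as NA
y_2014 <- as.numeric(start_year <= 2014 & 2014 <= end_year)

print(y_2014)
```

This gives one value per row without rowwise(), so no ungroup() step is needed afterwards.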

Running GeneRegulatoryNetwork package Pando results in MatrixGenerics error: "MatrixGenerics:::.load_next_suggested_package_to_search(x)"

I have posted this query on several platforms but haven't received a response yet, so I am trying my luck here.
I ran the joint RNA and ATAC multiomic tutorial up to peak calling, added the MACS2 peak set to the Seurat object (d149 in this case), and started running Pando from there. Later I get an error from MatrixGenerics:::.load_next_suggested_package_to_search(x). I am not sure what I did wrong. Please help. I have run this on 3 different systems and still get the same error. Is there a particular order in which packages need to be installed in R?
library(Pando)
data(motifs)
d149.pando <-Seurat::FindVariableFeatures(d149, assay='RNA')
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> d149.pando <- initiate_grn(d149.pando)
> d149.pando <- find_motifs(
d149.pando,
pfm = motifs,
genome = BSgenome.Hsapiens.UCSC.hg38
)
Adding TF info
Building motif matrix
Finding motif positions
Creating Motif object
> d149.pando <- infer_grn(d149.pando)
Selecting candidate regulatory regions near genes
Error in MatrixGenerics:::.load_next_suggested_package_to_search(x) :
Failed to find a rowMaxs() method for lgCMatrix objects.
sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /ncbs_gs/nlsas_data/usershares/praghu/yojetsharma/.conda/envs/Signac/lib/libopenblasp-r0.3.21.so
locale:
[1] LC_CTYPE=C LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] DelayedMatrixStats_1.19.0 DelayedArray_0.23.2
[3] Matrix_1.5-1 MatrixGenerics_1.9.1
[5] matrixStats_0.62.0 Pando_0.5.1
[7] BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.65.2
[9] rtracklayer_1.57.0 Biostrings_2.65.6
[11] XVector_0.37.1 EnsDb.Hsapiens.v86_2.99.0
[13] ensembldb_2.21.5 AnnotationFilter_1.21.0
[15] GenomicFeatures_1.49.7 AnnotationDbi_1.59.1
[17] Biobase_2.57.1 GenomicRanges_1.49.1
[19] GenomeInfoDb_1.33.7 IRanges_2.31.2
[21] S4Vectors_0.35.4 BiocGenerics_0.43.4
[23] sp_1.5-0 SeuratObject_4.1.2
[25] Seurat_4.2.0 Signac_1.8.0
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 scattermore_0.8
[3] R.methodsS3_1.8.2 tidyr_1.2.1
[5] ggplot2_3.3.6 bit64_4.0.5
[7] knitr_1.40 R.utils_2.12.0
[9] irlba_2.3.5 data.table_1.14.2
[11] rpart_4.1.16 KEGGREST_1.37.3
[13] TFBSTools_1.35.0 RCurl_1.98-1.8
[15] generics_0.1.3 cowplot_1.1.1
[17] RSQLite_2.2.17 RANN_2.6.1
[19] future_1.28.0 ggpointdensity_0.1.0
[21] bit_4.0.4 tzdb_0.3.0
[23] spatstat.data_2.2-0 xml2_1.3.3
[25] httpuv_1.6.6 SummarizedExperiment_1.27.3
[27] assertthat_0.2.1 DirichletMultinomial_1.39.0
[29] viridis_0.6.2 xfun_0.33
[31] hms_1.1.2 promises_1.2.0.1
[33] fansi_1.0.3 restfulr_0.0.15
[35] progress_1.2.2 caTools_1.18.2
[37] dbplyr_2.2.1 igraph_1.3.5
[39] DBI_1.1.3 htmlwidgets_1.5.4
[41] spatstat.geom_2.4-0 purrr_0.3.4
[43] ellipsis_0.3.2 dplyr_1.0.10
[45] backports_1.4.1 annotate_1.75.0
[47] biomaRt_2.53.2 deldir_1.0-6
[49] sparseMatrixStats_1.9.0 vctrs_0.4.1
[51] ROCR_1.0-11 abind_1.4-5
[53] cachem_1.0.6 withr_2.5.0
[55] grr_0.9.5 ggforce_0.3.4
[57] progressr_0.11.0 checkmate_2.1.0
[59] sctransform_0.3.5 GenomicAlignments_1.33.1
[61] prettyunits_1.1.1 goftest_1.2-3
[63] cluster_2.1.4 lazyeval_0.2.2
[65] seqLogo_1.63.0 crayon_1.5.1
[67] hdf5r_1.3.6 pkgconfig_2.0.3
[69] tweenr_2.0.2 nlme_3.1-159
[71] ProtGenerics_1.29.0 nnet_7.3-17
[73] pals_1.7 rlang_1.0.5
[75] globals_0.16.1 lifecycle_1.0.2
[77] miniUI_0.1.1.1 filelock_1.0.2
[79] BiocFileCache_2.5.0 dichromat_2.0-0.1
[81] polyclip_1.10-0 lmtest_0.9-40
[83] Matrix.utils_0.9.8 zoo_1.8-11
[85] base64enc_0.1-3 ggridges_0.5.3
[87] png_0.1-7 viridisLite_0.4.1
[89] rjson_0.2.21 bitops_1.0-7
[91] R.oo_1.25.0 KernSmooth_2.23-20
[93] blob_1.2.3 stringr_1.4.1
[95] parallelly_1.32.1 spatstat.random_2.2-0
[97] readr_2.1.2 jpeg_0.1-9
[99] CNEr_1.33.0 scales_1.2.1
[101] memoise_2.0.1 magrittr_2.0.3
[103] plyr_1.8.7 ica_1.0-3
[105] zlibbioc_1.43.0 compiler_4.2.0
[107] BiocIO_1.7.1 RColorBrewer_1.1-3
[109] fitdistrplus_1.1-8 Rsamtools_2.13.4
[111] cli_3.4.0 listenv_0.8.0
[113] patchwork_1.1.2 pbapply_1.5-0
[115] htmlTable_2.4.1 Formula_1.2-4
[117] MASS_7.3-58.1 mgcv_1.8-40
[119] tidyselect_1.1.2 stringi_1.7.8
[121] yaml_2.3.5 latticeExtra_0.6-30
[123] ggrepel_0.9.1 grid_4.2.0
[125] VariantAnnotation_1.43.3 fastmatch_1.1-3
[127] tools_4.2.0 future.apply_1.9.1
[129] parallel_4.2.0 rstudioapi_0.14
[131] TFMPvalue_0.0.8 foreign_0.8-82
[133] gridExtra_2.3 farver_2.1.1
[135] Rtsne_0.16 ggraph_2.0.6
[137] BiocManager_1.30.18 digest_0.6.29
[139] rgeos_0.5-10 pracma_2.4.2
[141] shiny_1.7.2 motifmatchr_1.19.0
[143] Rcpp_1.0.9 later_1.3.0
[145] RcppAnnoy_0.0.19 httr_1.4.4
[147] biovizBase_1.45.0 colorspace_2.0-3
[149] XML_3.99-0.10 tensor_1.5
[151] reticulate_1.26 splines_4.2.0
[153] uwot_0.1.14 RcppRoll_0.3.0
[155] spatstat.utils_2.3-1 graphlayouts_0.8.1
[157] mapproj_1.2.8 plotly_4.10.0
[159] xtable_1.8-4 jsonlite_1.8.0
[161] poweRlaw_0.70.6 tidygraph_1.2.2
[163] R6_2.5.1 Hmisc_4.7-1
[165] pillar_1.8.1 htmltools_0.5.3
[167] mime_0.12 glue_1.6.2
[169] fastmap_1.1.0 BiocParallel_1.31.12
[171] codetools_0.2-18 maps_3.4.0
[173] utf8_1.2.2 lattice_0.20-45
[175] spatstat.sparse_2.1-1 tibble_3.1.8
[177] curl_4.3.2 leiden_0.4.3
[179] gtools_3.9.3 GO.db_3.15.0
[181] interp_1.1-3 survival_3.4-0
[183] munsell_0.5.0 GenomeInfoDbData_1.2.8
[185] reshape2_1.4.4 gtable_0.3.1
[187] spatstat.core_2.4-4

Conditional replacement using if else statement throwing error in R v 4.2.0

I have a dataset with a lot of repeating values.
Example of the data structure:
GRID DATE TIME TAGLFT TAGRT COL LOCX LOCY YEAR TOTAL.sum
<chr> <date> <chr> <chr> <chr> <chr> <dbl> <dbl> <int> <int>
AG 2004-06-01 09:33 47962 47963 P/- 20.5 0 2004 2
AG 2004-06-01 09:33 47962 47963 P/- 20.5 0.5 2004 2
I am trying to conditionally replace LOCY values if the sum column for the group_by is greater than 1 like so:
data %>%
group_by(GRID, DATE, TIME, TAGLFT, TAGRT, COL, LOCX, YEAR) %>%
mutate(LOCY=if(TOTAL.sum>1) first(LOCY) else LOCY)
Desired output is:
GRID DATE TIME TAGLFT TAGRT COL LOCX LOCY YEAR TOTAL.sum
<chr> <date> <chr> <chr> <chr> <chr> <dbl> <dbl> <int> <int>
AG 2004-06-01 09:33 47962 47963 P/- 20.5 0 2004 2
AG 2004-06-01 09:33 47962 47963 P/- 20.5 0 2004 2
The script above was working fine until I updated my R to v 4.2.0.
Now when I run the mutate part of my script, I get the following error:
Error in `mutate()`:
! Problem while computing `LOCY = if (sum > 1) first(LOCY) else LOCY`.
ℹ The error occurred in group 73: GRID = "AG", DATE = 2004-06-01, TIME = "09:33", TAGLFT = "47962", TAGRT = "47963", COL = "P/-", LOCX = 20.5, YEAR = 2004.
Caused by error in `if (TOTAL.sum > 1) ...`:
! the condition has length > 1
Run `rlang::last_error()` to see where the error occurred.
Any ideas what might be going on here? Is it a problem with my script? Am I missing something with my syntax?
In case it's needed, here is my sessionInfo()
> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] adehabitatHR_0.4.19 adehabitatLT_0.3.25 CircStats_0.2-6 boot_1.3-28
[5] MASS_7.3-57 adehabitatMA_0.3.14 ade4_1.7-19 deldir_1.0-6
[9] jpeg_0.1-9 ggmap_3.0.0 RgoogleMaps_1.4.5.3 raster_3.5-15
[13] maps_3.4.0 rgdal_1.5-30 rgeos_0.5-9 maptools_1.1-4
[17] sp_1.4-7 data.table_1.14.2 sf_1.0-7 spatsoc_0.1.16
[21] DescTools_0.99.45 glmmTMB_1.1.3 lme4_1.1-29 Matrix_1.4-1
[25] ggnetwork_0.5.10 scales_1.2.0 cowplot_1.1.1 gridExtra_2.3
[29] psych_2.2.5 beepr_1.3 here_1.0.1 magrittr_2.0.3
[33] rcompanion_2.4.15 devtools_2.4.3 usethis_2.1.6 forcats_0.5.1
[37] stringr_1.4.0 purrr_0.3.4 readr_2.1.2 tibble_3.1.7
[41] ggplot2_3.3.6 tidyverse_1.3.1 lubridate_1.8.0 tidyr_1.2.0
[45] plyr_1.8.7 krsp_0.0.2 dplyr_1.0.9
loaded via a namespace (and not attached):
[1] readxl_1.4.0 backports_1.4.1 TMB_1.8.1 splines_4.2.0
[5] TH.data_1.1-1 fansi_1.0.3 memoise_2.0.1 tzdb_0.3.0
[9] remotes_2.4.2 modelr_0.1.8 matrixStats_0.62.0 sandwich_3.0-1
[13] prettyunits_1.1.1 colorspace_2.0-3 rvest_1.0.2 haven_2.5.0
[17] callr_3.7.0 crayon_1.5.1 jsonlite_1.8.0 libcoin_1.0-9
[21] Exact_3.1 keyring_1.3.0 survival_3.3-1 zoo_1.8-10
[25] glue_1.6.2 gtable_0.3.0 emmeans_1.7.4-1 pkgbuild_1.3.1
[29] mvtnorm_1.1-3 DBI_1.1.2 Rcpp_1.0.8.3 xtable_1.8-4
[33] tmvnsim_1.0-2 units_0.8-0 foreign_0.8-82 proxy_0.4-26
[37] stats4_4.2.0 httr_1.4.3 modeltools_0.2-23 ellipsis_0.3.2
[41] pkgconfig_2.0.3 multcompView_0.1-8 dbplyr_2.1.1 utf8_1.2.2
[45] tidyselect_1.1.2 rlang_1.0.2 munsell_0.5.0 cellranger_1.1.0
[49] tools_4.2.0 cachem_1.0.6 cli_3.3.0 generics_0.1.2
[53] audio_0.1-10 broom_0.8.0 fastmap_1.1.0 processx_3.5.3
[57] fs_1.5.2 coin_1.4-2 rootSolve_1.8.2.3 nlme_3.1-157
[61] xml2_1.3.3 brio_1.1.3 compiler_4.2.0 rstudioapi_0.13
[65] png_0.1-7 e1071_1.7-9 testthat_3.1.4 reprex_2.0.1
[69] stringi_1.7.6 ps_1.7.0 desc_1.4.1 lattice_0.20-45
[73] classInt_0.4-3 nloptr_2.0.2 vctrs_0.4.1 pillar_1.7.0
[77] lifecycle_1.0.1 lmtest_0.9-40 estimability_1.3 bitops_1.0-7
[81] lmom_2.8 R6_2.5.1 RMySQL_0.10.23 KernSmooth_2.23-20
[85] gld_2.6.4 sessioninfo_1.2.2 codetools_0.2-18 assertthat_0.2.1
[89] pkgload_1.2.4 rjson_0.2.21 rprojroot_2.0.3 withr_2.5.0
[93] nortest_1.0-4 mnormt_2.0.2 multcomp_1.4-19 expm_0.999-6
[97] parallel_4.2.0 hms_1.1.1 terra_1.5-21 grid_4.2.0
[101] coda_0.19-4 class_7.3-20 minqa_1.2.4 numDeriv_2016.8-1.1
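A plain if statement accepts only a condition of length one. Since R 4.2.0 a longer condition is an error rather than a warning, which is why the script stopped working after the update. Use the vectorized ifelse() function instead (the column is named sum in this reproduction; use TOTAL.sum if that is its name in your data):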
data %>%
  group_by(GRID, DATE, TIME, TAGLFT, TAGRT, COL, LOCX, YEAR) %>%
  mutate(LOCY = ifelse(sum > 1, first(LOCY), LOCY))
#> # A tibble: 2 × 10
#> # Groups:   GRID, DATE, TIME, TAGLFT, TAGRT, COL, LOCX, YEAR [1]
#>   GRID  DATE       TIME  TAGLFT TAGRT COL    LOCX  LOCY  YEAR   sum
#>   <chr> <chr>      <chr>  <int> <int> <chr> <dbl> <dbl> <int> <int>
#> 1 AG    2004-06-01 09:33  47962 47963 P/-    20.5     0  2004     2
#> 2 AG    2004-06-01 09:33  47962 47963 P/-    20.5     0  2004     2
Created on 2022-05-27 by the reprex package (v2.0.1)
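To see the difference outside of dplyr, here is a small base R sketch. It assumes a single group holding the two rows from the question; the vector names total_sum and locy are invented for the illustration. It shows two ways to avoid passing a length-two condition to if.

```r
# Hypothetical single group mirroring the two example rows
total_sum <- c(2, 2)
locy      <- c(0, 0.5)

# Option 1: ifelse() evaluates the condition element by element
locy_vectorized <- ifelse(total_sum > 1, locy[1], locy)

# Option 2: collapse the condition to a single TRUE/FALSE with all(),
# which is safe to hand to a plain if statement
locy_collapsed <- if (all(total_sum > 1)) rep(locy[1], length(locy)) else locy

print(locy_vectorized)
print(locy_collapsed)
```

Option 2 is closer to the original intent when the sum is constant within a group, since it makes one decision per group rather than one per row.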

app$vspace error in building phylogenetic tree in R

I am working with phylogenetic trees. I import the tree file with ggtree::read.tree and read the accompanying information with readxl::read_xlsx, and I want to visualize that information on the tree. When I try to add color and shape information with the ggtree::geom_tippoint function (taken from the xlsx; I also tried assigning it to a variable first, but that didn't work), I get the error "Error in app$vspace(new_style$margin-top %||% 0) : attempt to apply non-function".
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Turkish_Turkey.1254 LC_CTYPE=Turkish_Turkey.1254
#> [3] LC_MONETARY=Turkish_Turkey.1254 LC_NUMERIC=C
#> [5] LC_TIME=Turkish_Turkey.1254
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.13 knitr_1.36 magrittr_2.0.1 R.cache_0.15.0
#> [5] rlang_1.0.1 fastmap_1.1.0 fansi_0.5.0 stringr_1.4.0
#> [9] styler_1.6.2 highr_0.9 tools_4.1.1 xfun_0.26
#> [13] R.oo_1.24.0 utf8_1.2.2 cli_3.2.0 withr_2.4.3
#> [17] htmltools_0.5.2 ellipsis_0.3.2 yaml_2.2.1 digest_0.6.28
#> [21] tibble_3.1.5 lifecycle_1.0.1 crayon_1.5.0 purrr_0.3.4
#> [25] R.utils_2.11.0 vctrs_0.3.8 fs_1.5.0 glue_1.4.2
#> [29] evaluate_0.14 rmarkdown_2.11 reprex_2.0.1 stringi_1.7.5
#> [33] compiler_4.1.1 pillar_1.7.0 R.methodsS3_1.8.1 backports_1.4.1
#> [37] pkgconfig_2.0.3
The contents of the nwk file are as follows.
(((((((A:4,B:4):6,C:5):8,D:6):3,E:21):10,((F:4,G:12):14,H:8):13):13,((I:5,J:2):30,(K:11,L:11):2):17):4,M:56);
xlsx file content is as follows.
label  con          host  rb   color    shape
A      Japan        Sol   Tsw  #ee4444  15
B      Japan        Sol   Sw5  #ee4444  15
C      South Korea  Sol   Tsw  #ee4444  15
D      South Korea  Cap        #A1CD42  16
E      China        Sol   Tsw  #ee4444  15
F      Italy        Cap   Tsw  #A1CD42  15
G      USA          Cap        #A1CD42  16
H      USA          Per   Sw5  #86d4ea  15
K      Italy        Sol   Sw5  #ee4444  15
L      Italy        Cap        #A1CD42  16
M      Turkey       Per   Tsw  #86d4ea  15
J      Turkey       Sol        #ee4444  16
I      Turkey       Cap   Sw5  #A1CD42  15
d1<- read.tree(file = "D:/Download/tree_newick.nwk")
d1a<-data.frame(read_xlsx(path="D:/Download/tree_newichk_info.xlsx", sheet = "Sheet1"))
d2<-ggtree(d1, layout = "circular")+xlim(-5, NA) %<+% d1a
d3<-d2+geom_text(aes(label=node), hjust=.3)+
geom_tiplab(aes(,color=d1a$con , label=label,size=10))+
geom_tippoint(aes(shape=ifelse(rb==c("Tsw","Sw5"),15, ifelse (rb!=c("Tsw","Sw5"), 16,17))), color= ifelse(d1a$host == "Cap",'#A1CD42', ifelse (d1a$host== "Sol", '#ee4444','#86d4ea')))
d3
shape_f<-ifelse(d1a$rb==c("Tsw","Sw5"),15, ifelse (d1a$rb!=c("Tsw","Sw5"), 16,17))
color_f=ifelse(d1a$host == "Cap",'#A1CD42', ifelse (d1a$host== "Sol", '#ee4444','#86d4ea'))
d4<-d2+geom_text(aes(label=node), hjust=.3)+geom_tiplab(aes(label=label))+geom_tippoint(aes(shape=shape_f,color=color_f))
d4
shape_d<-d1a$shape
color_d<-d1a$color
d5<-d2+ geom_text(aes(label=node), hjust=.3)+geom_tiplab(aes(label=label))+geom_tippoint(aes(shape=shape_d,color=color_d))
d5
sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=Turkish_Turkey.1254 LC_CTYPE=Turkish_Turkey.1254 LC_MONETARY=Turkish_Turkey.1254 LC_NUMERIC=C
[5] LC_TIME=Turkish_Turkey.1254
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reprex_2.0.1 shiny_1.7.1 forcats_0.5.1 stringr_1.4.0 purrr_0.3.4 readr_2.0.2
[7] tidyr_1.1.4 tibble_3.1.5 tidyverse_1.3.1 readxl_1.3.1 ggnewscale_0.4.6 ggtreeExtra_1.2.3
[13] ggtree_3.0.4 treeio_1.16.2 tidytree_0.3.8 ggplot2_3.3.5 dplyr_1.0.7 ape_5.6-1
[19] treedataverse_0.0.1 BiocManager_1.30.16
loaded via a namespace (and not attached):
[1] nlme_3.1-152 fs_1.5.0 lubridate_1.8.0 httr_1.4.2 R.cache_0.15.0 tools_4.1.1 backports_1.4.1
[8] bslib_0.3.1 utf8_1.2.2 R6_2.5.1 DBI_1.1.1 lazyeval_0.2.2 colorspace_2.0-3 withr_2.4.3
[15] processx_3.5.2 tidyselect_1.1.2 compiler_4.1.1 cli_3.2.0 rvest_1.0.2 xml2_1.3.2 labeling_0.4.2
[22] sass_0.4.0 scales_1.1.1 callr_3.7.0 digest_0.6.28 yulab.utils_0.0.4 R.utils_2.11.0 rmarkdown_2.11
[29] pkgconfig_2.0.3 htmltools_0.5.2 styler_1.6.2 highr_0.9 dbplyr_2.1.1 fastmap_1.1.0 rlang_1.0.1
[36] rstudioapi_0.13 gridGraphics_0.5-1 jquerylib_0.1.4 farver_2.1.0 generics_0.1.2 jsonlite_1.7.2 R.oo_1.24.0
[43] magrittr_2.0.1 ggplotify_0.1.0 patchwork_1.1.1 Rcpp_1.0.8 munsell_0.5.0 fansi_0.5.0 clipr_0.7.1
[50] R.methodsS3_1.8.1 lifecycle_1.0.1 stringi_1.7.5 yaml_2.2.1 grid_4.1.1 parallel_4.1.1 promises_1.2.0.1
[57] crayon_1.5.0 miniUI_0.1.1.1 lattice_0.20-44 haven_2.4.3 hms_1.1.1 ps_1.6.0 knitr_1.36
[64] pillar_1.7.0 glue_1.4.2 evaluate_0.14 ggfun_0.0.5 modelr_0.1.8 vctrs_0.3.8 tzdb_0.1.2
[71] httpuv_1.6.3 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1 cachem_1.0.6 xfun_0.26 mime_0.12
[78] xtable_1.8-4 broom_0.7.9 later_1.3.0 aplot_0.1.3 ellipsis_0.3.2
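A side observation on the shape logic in the code above, not a claimed explanation for the app$vspace error: comparing a column with == against c("Tsw","Sw5") pairs the values up position by position (recycling the shorter vector) instead of testing set membership. The base R sketch below uses the rb values of rows H, K, L and M from the table, with the empty cell assumed to be read as NA, to show how that differs from %in%.

```r
# rb values for tips H, K, L, M; the empty cell is assumed to be NA
rb <- c("Sw5", "Sw5", NA, "Tsw")

# == compares position by position against the recycled vector
# c("Tsw", "Sw5", "Tsw", "Sw5"), so valid values can come out FALSE
elementwise <- rb == c("Tsw", "Sw5")

# %in% asks whether each value is a member of the set; NA gives FALSE
membership <- rb %in% c("Tsw", "Sw5")

# Shape 15 when rb is one of the two known values, 16 otherwise
shape <- ifelse(membership, 15, 16)

print(elementwise)
print(membership)
print(shape)
```

With %in% the computed shapes agree with the shape column of the spreadsheet for these four rows.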

How to make a column exist in r?

I have a very large dataset where I want to take a column of identifiers (CP), first edit the identifiers so that they match the format used in another file, and then check for CP matches between the files.
I do the editing of the CP first with:
fullGWAS <- fread('file.csv',sep=",")
colnames(fullGWAS)[1] <- "CP"
fullGWAS2<-gsub("_.*","",fullGWAS$CP)
fullGWAS2 <-data.frame(fullGWAS2)
colnames(fullGWAS2)[1] <- "CP"
fullGWAS3 <- select(fullGWAS, c(2:15))
gwasdf <- cbind(fullGWAS2, fullGWAS3)
As an example gwasdf looks like:
> head(gwasdf)
CP chr bpos a1 a2 freq BETAsbp Psbp BETAdbp Pdbp BETApp Ppp minP
1 1:2556125 1 2556125 t c 0.3255 -0.0262 0.41300 -0.0113 0.5388 -0.0157 0.4690 0.41300
2 1:2556548 1 2556548 t c 0.3261 -0.0274 0.39270 -0.0121 0.5096 -0.0160 0.4615 0.39270
3 1:2556709 1 2556709 a g 0.3257 -0.0263 0.41210 -0.0116 0.5266 -0.0155 0.4749 0.41210
4 12:11366987 12 11366987 t c 0.9443 0.0355 0.61460 0.0019 0.9631 0.0185 0.7007 0.61460
5 17:21949792 17 21949792 a c 0.4570 -0.0384 0.20690 -0.0043 0.8065 -0.0212 0.3050 0.20690
6 17:21955349 17 21955349 t g 0.5253 0.0505 0.09562 0.0103 0.5574 0.0248 0.2303 0.09562
minTRAIT BETAmean
1 SBP -0.01875
2 SBP -0.01975
3 SBP -0.01895
4 SBP 0.01870
5 SBP -0.02135
6 SBP 0.03040
I can see CP is here yet when I try to check this I get:
exists("gwasdf$CP")
[1] FALSE
class(gwasdf)
[1] "data.frame"
nrow(gwasdf)
[1] 7083535
Why is this false and how can I make it be true?
I am trying to ultimately check whether the CP identifiers are present in another file with follow-up code using:
CPmatches <- df2[CP %in% gwasdf$CP] #df2 is another file I just read in
mismatchextract <- subset(gwasdf, !(CP %in% df2$CP))
For extra info I use RStudio with:
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] matrixStats_0.57.0 sqldf_0.4-11 RSQLite_2.2.1 gsubfn_0.7
[5] proto_1.0.0 data.table_1.13.2 forcats_0.5.0 stringr_1.4.0
[9] dplyr_1.0.2 purrr_0.3.4 readr_1.4.0 tidyr_1.1.2
[13] tibble_3.0.4 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] tidyselect_1.1.0 haven_2.3.1 tcltk_4.0.2 colorspace_1.4-1 vctrs_0.3.4
[6] generics_0.1.0 chron_2.3-56 blob_1.2.1 rlang_0.4.8 pillar_1.4.7
[11] glue_1.4.1 withr_2.3.0 DBI_1.1.0 bit64_4.0.5 dbplyr_2.0.0
[16] modelr_0.1.8 readxl_1.3.1 lifecycle_0.2.0 munsell_0.5.0 gtable_0.3.0
[21] cellranger_1.1.0 rvest_0.3.6 memoise_1.1.0 fansi_0.4.1 broom_0.7.2
[26] Rcpp_1.0.5 scales_1.1.1 backports_1.1.10 jsonlite_1.7.1 fs_1.5.0
[31] bit_4.0.4 hms_0.5.3 digest_0.6.27 stringi_1.5.3 grid_4.0.2
[36] cli_2.2.0 tools_4.0.2 magrittr_2.0.1 crayon_1.3.4 pkgconfig_2.0.3
[41] ellipsis_0.3.1 xml2_1.3.2 reprex_0.3.0 lubridate_1.7.9 assertthat_0.2.1
[46] httr_1.4.2 rstudioapi_0.13 R6_2.5.0 compiler_4.0.2
Something like this, using dplyr and the %in% operator? This assumes there are two separate datasets and that the goal is to subset one of them based on whether its elements also appear in the other.
qwasdf_1 <- data.frame(
CP1 = c("1:2556125", "1:2556548", "99:12345678")
)
qwasdf_2 <- data.frame(
CP2 = c("1:2556125", "1:2556548", "1:2556709")
)
library(dplyr)
qwasdf_1 %>%
filter(CP1 %in% qwasdf_2$CP2)
#> CP1
#> 1 1:2556125
#> 2 1:2556548
Created on 2020-11-23 by the reprex package (v0.3.0)
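On the exists() part of the question: exists() takes the name of an object, so exists("gwasdf$CP") looks for an object literally called "gwasdf$CP" and finds none. The column can be checked through the data frame's names instead. A minimal base R sketch, using a small made-up stand-in for gwasdf and a hypothetical second table df2:

```r
# Small stand-in for the large gwasdf in the question
gwasdf <- data.frame(
  CP  = c("1:2556125", "1:2556548", "12:11366987"),
  chr = c(1, 1, 12)
)

# Hypothetical second file, already read in
df2 <- data.frame(CP = c("1:2556125", "99:1"))

# exists() searches for an object with this exact name, not a column
object_found <- exists("gwasdf$CP")

# Test for the column by name instead
column_found <- "CP" %in% names(gwasdf)

# Rows of gwasdf whose CP does / does not occur in df2
matches    <- gwasdf[gwasdf$CP %in% df2$CP, ]
mismatches <- gwasdf[!(gwasdf$CP %in% df2$CP), ]

print(object_found)
print(column_found)
print(nrow(matches))
print(nrow(mismatches))
```

The FALSE from exists() therefore says nothing about the column, and the follow-up %in% code can be used as long as "CP" appears in names(gwasdf).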

group_by and summarise previously (10 minutes ago) worked on my data frame but now don't

I have a dataframe I am manipulating that I ran group_by and summarise on a few minutes ago. After a forced restart of my computer (due to company IT) my group_by function no longer works. I have had this error sporadically for the last month or so.
Here's my code:
covid_per10k_hosp <- datasetv5_pat %>%
ungroup() %>%
mutate(death2=case_when(death=="deceased"~1, TRUE~0)) %>%
group_by(PROV_ID) %>%
summarize(n_deaths=sum(death2))
example data:
PAT_ID PROV_ID death
1 A deceased
2 A alive
3 B deceased
4 B deceased
Expected Output:
PROV_ID n_deaths
A 1
B 2
Actual Output:
PROV_ID n_deaths
A 1
A 1
B 2
B 2
Edit: in response to comments asking for additional information, here is the output of sessionInfo():
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] finalfit_1.0.2 ggsci_2.9 icd_4.0.9 ggpubr_0.4.0
[5] readr_1.3.1 vroom_1.3.1 knitr_1.29 tableone_0.12.0
[9] dplyr_1.0.2 summarytools_0.9.6 expss_0.10.6 Hmisc_4.4-1
[13] ggplot2_3.3.2 Formula_1.2-3 survival_3.2-3 lattice_0.20-38
loaded via a namespace (and not attached):
[1] tidyr_1.1.1 bit64_4.0.5 splines_3.6.0 carData_3.0-4
[5] assertthat_0.2.1 latticeExtra_0.6-29 pander_0.6.3 cellranger_1.1.0
[9] pillar_1.4.6 backports_1.1.7 glue_1.4.2 digest_0.6.25
[13] RColorBrewer_1.1-2 pryr_0.1.4 ggsignif_0.6.0 checkmate_2.0.0
[17] colorspace_1.4-1 htmltools_0.5.0 Matrix_1.2-17 survey_4.0
[21] plyr_1.8.6 pkgconfig_2.0.3 broom_0.7.0 haven_2.3.1
[25] magick_2.4.0 purrr_0.3.4 scales_1.1.1 jpeg_0.1-8.1
[29] openxlsx_4.1.5 rio_0.5.16 htmlTable_2.0.1 tibble_3.0.3
[33] generics_0.0.2 car_3.0-9 ellipsis_0.3.1 withr_2.2.0
[37] nnet_7.3-12 cli_2.0.2 magrittr_1.5 crayon_1.3.4
[41] readxl_1.3.1 mice_3.11.0 fansi_0.4.1 rstatix_0.6.0
[45] forcats_0.5.0 foreign_0.8-71 rapportools_1.0 tools_3.6.0
[49] data.table_1.13.0 hms_0.5.3 mitools_2.4 lifecycle_0.2.0
[53] matrixStats_0.56.0 stringr_1.4.0 munsell_0.5.0 cluster_2.0.8
[57] zip_2.1.0 packrat_0.5.0 compiler_3.6.0 rlang_0.4.7
[61] grid_3.6.0 rstudioapi_0.11 htmlwidgets_1.5.1 tcltk_3.6.0
[65] base64enc_0.1-3 boot_1.3-22 gtable_0.3.0 codetools_0.2-16
[69] abind_1.4-5 DBI_1.1.0 curl_4.3 R6_2.4.1
[73] gridExtra_2.3 lubridate_1.7.9 utf8_1.1.4 bit_4.0.4
[77] stringi_1.4.6 Rcpp_1.0.5 vctrs_0.3.2 rpart_4.1-15
[81] png_0.1-7 tidyselect_1.1.0 xfun_0.16
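Nothing in the question pins down the cause, so the following is only a diagnostic sketch. It computes the expected per-provider counts in base R, which does not depend on which package's summarize() a bare call resolves to, and it lists where a function of that name is currently defined. The data frame dat is a made-up copy of the four example rows.

```r
# Made-up copy of the example data from the question
dat <- data.frame(
  PAT_ID  = 1:4,
  PROV_ID = c("A", "A", "B", "B"),
  death   = c("deceased", "alive", "deceased", "deceased")
)

# 1 for deceased, 0 otherwise (same idea as the case_when() step)
dat$death2 <- as.integer(dat$death == "deceased")

# One row per provider with the number of deaths, using base R only
n_deaths <- aggregate(death2 ~ PROV_ID, data = dat, FUN = sum)
names(n_deaths)[2] <- "n_deaths"
print(n_deaths)

# Lists the attached packages that define summarize(); a bare call
# uses the first entry. Empty if no attached package defines it.
print(find("summarize"))
```

If the base R result matches the expected output while the pipeline does not, writing dplyr::summarize() with an explicit namespace is one way to rule out another package's function being picked up.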
