I'd like to use dplyr::mutate to add p-values to a dataframe but it's not working and I can't get my head around why.
This works:
my_add<-function(x, y) x + y
str(my_add(5, 15))
#> num 20
df <- data.frame(success=c(5,8,4), fail=c(15,13,18))
mutate(df, total=my_add(success, fail))
#> success fail total
#> 1 5 15 20
#> 2 8 13 21
#> 3 4 18 22
But this doesn't:
my_binom <- function(x, y) binom.test(x, y)$"p.value"
str(my_binom(5, 20))
#> num 0.0414
df <- data.frame(success=c(5,8), total=c(20,21))
mutate(df, p_value=my_binom(success, total))
#> success total p_value
#> 1 5 20 0.5810547
#> 2 8 21 0.5810547
df <- data.frame(success=c(5,8,4), total=c(20,21,22))
mutate(df, p_value=my_binom(success, total))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: incorrect length of 'x'.
Both functions take the same input and return a single numeric, so I can't wrap my head around this discrepancy. Can someone enlighten me as to what's going on? Thanks!
Session info:
sessionInfo()
#> R version 3.4.1 (2017-06-30)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: OS X El Capitan 10.11.6
#>
#> Matrix products: default
#> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] bindrcpp_0.2 dplyr_0.7.4
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.4.1 magrittr_1.5 assertthat_0.2.0 R6_2.2.2 tools_3.4.1
#> [6] glue_1.1.1 tibble_1.3.4 yaml_2.1.14 Rcpp_0.12.14 pkgconfig_2.0.1
#> [11] rlang_0.1.2 bindr_0.1
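binom.test() is not vectorised over its arguments, so it has to be called once per row. map2_dbl() does that and returns a plain numeric column rather than a list column:
mutate(df, p_value = purrr::map2_dbl(success, total, my_binom))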
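As a rough illustration of why the two functions behave differently (a sketch of my own, not taken from the original post): `+` works element-wise on whole columns, whereas binom.test() runs a single test per call. When x has length 2 it is read as c(successes, failures) and n is ignored, which would explain why both rows got the same p-value in the two-row example. Any other length above 1 is rejected with the "incorrect length of 'x'" error. The base R version below applies my_binom row by row with mapply(); the argument name n is my own choice.

```r
# binom.test() handles one test per call, so wrap it for a single (x, n) pair
my_binom <- function(x, n) binom.test(x, n)$p.value

df <- data.frame(success = c(5, 8, 4), total = c(20, 21, 22))

# mapply() calls my_binom once per row and returns a plain numeric vector
df$p_value <- mapply(my_binom, df$success, df$total)

# A length-2 x is interpreted as c(successes, failures); n is then ignored,
# so this is the same test as 5 successes out of 13 trials
p_pair <- binom.test(c(5, 8), c(20, 21))$p.value
p_5_of_13 <- binom.test(5, 13)$p.value
```

The same mapply() call should also work inside mutate(), i.e. mutate(df, p_value = mapply(my_binom, success, total)), if you would rather not depend on purrr.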
I have read a single-variable .nc file into R as a SpatRaster object using the excellent terra package, with the intention of fitting geostatistical models based on the cell centroids. For this I need to construct a dataframe with columns corresponding to "lon, lat, value" using data from the SpatRaster. This feels like a task which might have a standard solution, but I'm unfamiliar with R's spatial statistics ecosystem.
Any advice/suggestions would be much appreciated.
It's even more straightforward to use the function terra::as.data.frame(). See https://rspatial.github.io/terra/reference/as.data.frame.html
library(terra)
#> terra version 1.3.4
# make test raster with terra::rast()
a <- terra::rast(ncols = 10, nrows = 10,
xmin = -84, xmax = -83,
ymin = 42, ymax = 43)
# give it some values
values(a) <- 1:ncell(a)
plot(a)
a_df <- terra::as.data.frame(a, xy = TRUE, na.rm = FALSE)
# take special note of default values
head(a_df)
#> x y lyr.1
#> 1 -83.95 42.95 1
#> 2 -83.85 42.95 2
#> 3 -83.75 42.95 3
#> 4 -83.65 42.95 4
#> 5 -83.55 42.95 5
#> 6 -83.45 42.95 6
packageVersion("terra")
#> [1] '1.3.4'
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] terra_1.3-4
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.7 codetools_0.2-18 lattice_0.20-44 digest_0.6.27
#> [5] withr_2.4.2 grid_4.1.0 magrittr_2.0.1 reprex_2.0.0
#> [9] evaluate_0.14 highr_0.9 rlang_0.4.11 stringi_1.7.3
#> [13] cli_3.0.1 fs_1.5.0 sp_1.4-5 raster_3.4-13
#> [17] rmarkdown_2.9 tools_4.1.0 stringr_1.4.0 glue_1.4.2
#> [21] xfun_0.24 yaml_2.2.1 compiler_4.1.0 htmltools_0.5.1.1
#> [25] knitr_1.33
Created on 2021-10-21 by the reprex package (v2.0.0)
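To make it clearer what the x and y columns contain, here is a base R sketch of my own that recomputes the cell centres from the extent and the grid size. It assumes a regular grid and cells numbered row-wise from the top-left corner, which is what the output above suggests. The column names lon, lat and value come from the question. This is only an illustration, not how terra computes coordinates internally.

```r
# Grid definition copied from the test raster above
ncols <- 10
nrows <- 10
xmin <- -84; xmax <- -83
ymin <- 42;  ymax <- 43

# Cell size in each direction
res_x <- (xmax - xmin) / ncols
res_y <- (ymax - ymin) / nrows

# Cell centres sit half a cell in from each edge; rows run from ymax downwards
lon <- xmin + (seq_len(ncols) - 0.5) * res_x
lat <- ymax - (seq_len(nrows) - 0.5) * res_y

# expand.grid() varies its first argument fastest, matching row-wise cell order
centroids <- expand.grid(lon = lon, lat = lat)
centroids$value <- seq_len(nrow(centroids))
head(centroids)
```

In practice terra::as.data.frame(a, xy = TRUE) followed by renaming the columns is simpler; the sketch is mainly useful for checking that the coordinates really are centroids rather than cell corners.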
library(terra)
#> Warning: package 'terra' was built under R version 4.0.5
#> terra version 1.3.22
r <- rast(ncol=10, nrow=10)
values(r) <- runif(ncell(r))
plot(r)
p <- terra::as.points(r)
df <- data.frame(values(p), geom(p))
head(df)
#> lyr.1 geom part x y hole
#> 1 0.9557333 1 1 -162 81 0
#> 2 0.2974196 2 1 -126 81 0
#> 3 0.9703617 3 1 -90 81 0
#> 4 0.3046196 4 1 -54 81 0
#> 5 0.7334711 5 1 -18 81 0
#> 6 0.8880635 6 1 18 81 0
dplyr::lag works fine with integers with the first entry being <NA>, but with bit64::integer64 the first entry is a huge number.
This is my setting:
library(tidyverse)
library(magrittr)
#> ...
library(bit64)
#> Loading required package: bit
#> Attaching package bit
#> ...
#> Attaching package bit64
#> ...
#> The following object is masked from 'package:bit':
#>
#> still.identical
#> The following objects are masked from 'package:base':
#>
#> :, %in%, is.double, match, order, rank
library(reprex)
sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-apple-darwin18.6.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
#>
#> ...
#>
#> other attached packages:
#> [1] reprex_0.3.0 bit64_0.9-7 bit_1.1-14 magrittr_1.5
#> [5] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.1 purrr_0.3.2
#> [9] readr_1.3.1 tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.0
#> [13] tidyverse_1.2.1
#>
#> ...
Here is a minimal reprex:
tib_int64 <- tibble(A_int = as.integer(c(1,2,3)),
A_int64 = as.integer64(c(1,2,3)))
tib_int64 %>% mutate(B = lag(A_int), C = lag(A_int64))
#> # A tibble: 3 x 4
#> A_int A_int64 B C
#> <int> <int64> <int> <int64>
#> 1 1 1 NA 9218868437227407266
#> 2 2 2 1 1
#> 3 3 3 2 2
The first entry in the C column should be <NA> like in the B column.
Is this a dplyr problem or a bit64 problem?
This is not too difficult to work around, but shouldn't this be filed as a bug?
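For reference, one possible workaround, assuming bit64 is installed: build the lagged vector with integer64-aware operations so that the padding is NA_integer64_ rather than a base NA. lag_int64 is a hypothetical helper name, not part of dplyr or bit64. As a side note, 9218868437227407266 is the 64-bit pattern of R's NA_real_ (0x7FF00000000007A2) read as an integer, which suggests that a double NA was inserted and the result was then relabelled as integer64. I have not verified where exactly that happens.

```r
library(bit64)

# Hypothetical helper: lag an integer64 vector, padding with NA_integer64_
lag_int64 <- function(x, n = 1L) {
  stopifnot(is.integer64(x), n >= 0)
  n <- min(n, length(x))
  # c() dispatches to bit64's method because its first argument is integer64
  c(rep(NA_integer64_, n), x[seq_len(length(x) - n)])
}

x <- as.integer64(c(1, 2, 3))
lagged <- lag_int64(x)
lagged
```

Inside a pipeline this would be used as mutate(C = lag_int64(A_int64)).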
I have written a function, part of which converts a matrix to a tibble. This works without issues in tibble 1.4.2 but causes an error in 2.0.1.
The code that causes the error is as follows
library(tibble)
library(magrittr)
testmerge <- matrix( data = NA, ncol = 6 + 1, nrow = 0) %>%
as.tibble
The Error message is below
I can solve the problem by doing the following
testmerge <- matrix( data = NA, ncol = 6 + 1, nrow = 0) %>%
as.data.frame() %>%
as_tibble
But this seems a bit long-winded.
What is happening that has caused this change? And how can I easily end up with a tibble of just empty columns?
You need to specify .name_repair; see ?as_tibble:
library(tibble)
library(magrittr)
sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] magrittr_1.5 tibble_2.0.1
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.0 digest_0.6.18 crayon_1.3.4 rprojroot_1.3-2
#> [5] backports_1.1.2 evaluate_0.11 pillar_1.3.1 rlang_0.3.1
#> [9] stringi_1.2.4 rmarkdown_1.10 tools_3.5.2 stringr_1.3.1
#> [13] yaml_2.2.0 compiler_3.5.2 pkgconfig_2.0.2 htmltools_0.3.6
#> [17] knitr_1.20
Your code worked just fine for me with tibble_1.4.2, as you describe. After upgrading to tibble_2.0.1, I get the same error you had, but with a slightly more informative message that includes the sentence "Use .name_repair to specify repair.":
testmerge <- matrix( data = NA, ncol = 6 + 1, nrow = 0) %>%
as_tibble()
#> Error: Columns 1, 2, 3, 4, 5, … (and 2 more) must be named.
#> Use .name_repair to specify repair.
testmerge <- matrix( data = NA, ncol = 6 + 1, nrow = 0) %>%
as_tibble(.name_repair = "unique")
#> New names:
#> * `` -> `..1`
#> * `` -> `..2`
#> * `` -> `..3`
#> * `` -> `..4`
#> * `` -> `..5`
#> * … and 2 more
testmerge
#> # A tibble: 0 x 7
#> # … with 7 variables: ..1 <lgl>, ..2 <lgl>, ..3 <lgl>, ..4 <lgl>,
#> # ..5 <lgl>, ..6 <lgl>, ..7 <lgl>
Update: in the comments, @NelsonGon links to a GitHub issue whose discussion seems to have led to this new behavior.
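If the columns simply need some names, another option is to name the matrix columns before converting, so that as_tibble() has nothing to repair. A minimal sketch, assuming tibble is installed; the V1 to V7 names are placeholders I picked for illustration.

```r
library(tibble)

# Name the columns up front so as_tibble() has nothing to repair
m <- matrix(NA, nrow = 0, ncol = 6 + 1,
            dimnames = list(NULL, paste0("V", 1:7)))

testmerge <- as_tibble(m)
testmerge
```

This gives an empty tibble with seven logical columns and avoids the "New names:" message, at the cost of having to choose the names yourself.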
I only recently learned of tibble::lst, which creates a list object but automatically names list items. I'm using this as a shortcut within a %>% workflow that makes use of the names as the .id argument in map_dfr, so the automatic naming is really helpful.
However, the names are coming in with quotation marks around them. I noticed this because they awkwardly printed in axis tick labels in a ggplot, i.e. I had a label saying "Hartford" instead of Hartford.
I looked through issues on the tidyverse/tibble github but didn't find anything. Is this a bug, or am I doing something wrong?
library(dplyr)
library(purrr)
cities <- lst("New Haven", "Bridgeport", "Hartford")
cities
#> $`"New Haven"`
#> [1] "New Haven"
#>
#> $`"Bridgeport"`
#> [1] "Bridgeport"
#>
#> $`"Hartford"`
#> [1] "Hartford"
cities %>%
map_dfr(~tibble(dummy = rnorm(1)), .id = "city")
#> # A tibble: 3 x 2
#> city dummy
#> <chr> <dbl>
#> 1 "\"New Haven\"" -0.956
#> 2 "\"Bridgeport\"" 0.533
#> 3 "\"Hartford\"" -0.0553
At first I thought it might be to escape the space in "New Haven", but it happens with single characters as well:
lst("a", "b", "c")
#> $`"a"`
#> [1] "a"
#>
#> $`"b"`
#> [1] "b"
#>
#> $`"c"`
#> [1] "c"
It works as I expect when I provide names, but that defeats this advantage that lst has over the base list.
lst(a = "a", b = "b", c = "c")
#> $a
#> [1] "a"
#>
#> $b
#> [1] "b"
#>
#> $c
#> [1] "c"
Pretty sure I'm up to date on tidyverse-related packages, but here's my session info just in case:
sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] purrr_0.2.5 dplyr_0.7.6
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_0.12.18 knitr_1.20 bindr_0.1.1 magrittr_1.5
#> [5] tidyselect_0.2.4 R6_2.2.2 rlang_0.2.2 fansi_0.3.0
#> [9] stringr_1.3.1 tools_3.5.1 utf8_1.1.4 cli_1.0.0
#> [13] htmltools_0.3.6 yaml_2.2.0 assertthat_0.2.0 rprojroot_1.3-2
#> [17] digest_0.6.16 tibble_1.4.2 crayon_1.3.4 bindrcpp_0.2.2
#> [21] glue_1.3.0 evaluate_0.11 rmarkdown_1.10 stringi_1.2.4
#> [25] compiler_3.5.1 pillar_1.3.0 backports_1.1.2 pkgconfig_2.0.2
lst() is really meant to be used with variables, such as:
xa<-"a"
xb<-"b"
xc<-"c"
lst(xa,xb,xc)
# $xa
# [1] "a"
# $xb
# [1] "b"
# $xc
# [1] "c"
It doesn't play well with literal, unnamed values. It takes the name of each element from the unevaluated expression you pass in, so if you pass in a character literal, that unevaluated expression still includes the quotes. I think you just want list() here, possibly with names:
cities <- list("New Haven", "Bridgeport", "Hartford")
names(cities)<-unname(cities)
cities
# $`New Haven`
# [1] "New Haven"
# $Bridgeport
# [1] "Bridgeport"
# $Hartford
# [1] "Hartford"
Or just write your own function:
nlist <- function(...) {
setNames(list(...), c(...))
}
cities <- nlist("New Haven", "Bridgeport", "Hartford")
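To see where the quotes come from, here is a small base R sketch of naming elements from their unevaluated expressions. auto_names is a hypothetical helper written for this illustration. It mimics the idea only and is not how lst() is actually implemented.

```r
# Hypothetical helper: return the deparsed, unevaluated argument expressions
auto_names <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]
  vapply(exprs, deparse, character(1))
}

xa <- "a"

# The symbol xa deparses to "xa", but the literal "a" deparses to "\"a\"",
# i.e. the quotes are part of the expression text
nms <- auto_names(xa, "a")
print(nms)
```

The variable yields a clean name, while the string literal keeps its quotation marks, which is the same effect you saw in the axis labels.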
In "Check if two file paths resolve to the same file", the solution is to use normalizePath; however, this does not appear to be definitive for directories:
td1 <- tempdir()
td2 <- paste0(td1, "/")
dir.exists(td1) && dir.exists(td2)
#> [1] TRUE
file.create(file.path(td1, "foo.txt"))
#> [1] TRUE
file.exists(file.path(td2, "foo.txt"))
#> [1] TRUE
normalizePath(td1) == normalizePath(td2)
#> [1] FALSE
sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
#> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
#> [5] LC_TIME=English_Australia.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.5.1 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
#> [5] tools_3.5.1 htmltools_0.3.6 yaml_2.2.0 Rcpp_0.12.18
#> [9] stringi_1.1.7 rmarkdown_1.10 knitr_1.20 stringr_1.3.1
#> [13] digest_0.6.16 evaluate_0.11
Created on 2018-09-05 by the reprex package (v0.2.0).
Is there a method that is reliable (or more reliable) at identifying directories?
If you're open to using a package for this, I'm having a lot of success with the fs package for path operations that are robust across OSes. When it normalizes via fs::path_norm(), it will strip this trailing slash on Windows, for example.
td1 <- tempdir()
td2 <- paste0(td1, "/")
dir.exists(td1) && dir.exists(td2)
#> [1] TRUE
file.create(file.path(td1, "foo.txt"))
#> [1] TRUE
file.exists(file.path(td2, "foo.txt"))
#> [1] TRUE
normalizePath(td1) == normalizePath(td2)
#> [1] FALSE
library(fs)
path_norm(td1) == path_norm(td2)
#> [1] TRUE
sessionInfo()
#> R version 3.4.3 (2017-11-30)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.4.3 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
#> [5] tools_3.4.3 htmltools_0.3.6 yaml_2.1.16 Rcpp_0.12.18
#> [9] stringi_1.1.6 rmarkdown_1.8 knitr_1.17 stringr_1.2.0
#> [13] digest_0.6.16 evaluate_0.10.1
Created on 2018-09-05 by the reprex package (v0.2.0).
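If you would rather stay in base R, one option is to strip trailing separators yourself before calling normalizePath(). This is only a sketch: strip_trailing_sep and same_path are hypothetical helper names. Roots such as "/" or "C:/" are deliberately left alone. It does not deal with other sources of mismatch, such as case differences on case-insensitive file systems.

```r
# Drop trailing separators, but leave roots such as "/" or "C:/" untouched
strip_trailing_sep <- function(path) {
  sub("(?<=[^:/\\\\])[/\\\\]+$", "", path, perl = TRUE)
}

# Hypothetical helper: TRUE when both paths normalise to the same string
same_path <- function(a, b) {
  na <- normalizePath(strip_trailing_sep(a), winslash = "/", mustWork = FALSE)
  nb <- normalizePath(strip_trailing_sep(b), winslash = "/", mustWork = FALSE)
  na == nb
}

td1 <- tempdir()
td2 <- paste0(td1, "/")
same_path(td1, td2)
```

Setting winslash = "/" also makes the comparison insensitive to forward versus backward slashes in the normalised result on Windows.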