dplyr lag has a problem with integer64 from bit64

dplyr::lag works fine with integers, returning <NA> as the first entry, but with bit64::integer64 the first entry is a huge number.
This is my setup:
library(tidyverse)
library(magrittr)
#> ...
library(bit64)
#> Loading required package: bit
#> Attaching package bit
#> ...
#> Attaching package bit64
#> ...
#> The following object is masked from 'package:bit':
#>
#> still.identical
#> The following objects are masked from 'package:base':
#>
#> :, %in%, is.double, match, order, rank
library(reprex)
sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-apple-darwin18.6.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
#>
#> ...
#>
#> other attached packages:
#> [1] reprex_0.3.0 bit64_0.9-7 bit_1.1-14 magrittr_1.5
#> [5] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.1 purrr_0.3.2
#> [9] readr_1.3.1 tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.0
#> [13] tidyverse_1.2.1
#>
#> ...
Here is a minimal reprex:
tib_int64 <- tibble(A_int = as.integer(c(1,2,3)),
A_int64 = as.integer64(c(1,2,3)))
tib_int64 %>% mutate(B = lag(A_int), C = lag(A_int64))
#> # A tibble: 3 x 4
#> A_int A_int64 B C
#> <int> <int64> <int> <int64>
#> 1 1 1 NA 9218868437227407266
#> 2 2 2 1 1
#> 3 3 3 2 2
The first entry in the C column should be <NA> like in the B column.
Is this a dplyr problem or a bit64 problem?
This is not too difficult to work around, but shouldn't this be filed as a bug?
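One observation that may help narrow this down: 9218868437227407266 is 0x7FF00000000007A2, the bit pattern R uses for a double NA (NA_real_), so the value looks like a double NA reinterpreted as a 64-bit integer rather than random garbage. That does not by itself settle which package should handle it. As a workaround, here is a minimal sketch that builds the lagged vector without dplyr::lag, assuming only bit64's own c() method, subsetting, and NA_integer64_ (the helper name lag_int64 is made up for illustration):

```r
library(bit64)

# Hypothetical helper: lag an integer64 vector by one position.
# Putting NA_integer64_ first makes c() dispatch to bit64's integer64 method,
# so the missing value is stored as an integer64 NA.
lag_int64 <- function(x) {
  stopifnot(is.integer64(x))
  if (length(x) == 0L) return(x)
  c(NA_integer64_, x[-length(x)])
}

x <- as.integer64(c(1, 2, 3))
lagged <- lag_int64(x)
is.na(lagged)
```

In the reprex this would be called as mutate(C = lag_int64(A_int64)); the sketch handles a lag of one only and has no order_by argument.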

Related

How to remove square frame around windrose in openair package in R

In the windrose example below, I would like to modify a few things:
remove the square frame around the windrose, so there is no border between the windrose itself and the legend.
remove the "title" which says "Frequency of counts..".
modify the labels of the directions N, E, W and S to "North", "East", "West" and "South".
add a small white background to each of the percentage numbers itself (from 2% to 10%).
An answer to any one of these points would already help! Sorry, I wasn't allowed to attach images, but the code below should reproduce the plot.
Code:
if (!require("openair")) install.packages("openair")
#> Loading required package: openair
data_wind <- mydata
head(data_wind)
#> # A tibble: 6 × 10
#> date ws wd nox no2 o3 pm10 so2 co pm25
#> <dttm> <dbl> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
#> 1 1998-01-01 00:00:00 0.6 280 285 39 1 29 4.72 3.37 NA
#> 2 1998-01-01 01:00:00 2.16 230 NA NA NA 37 NA NA NA
#> 3 1998-01-01 02:00:00 2.76 190 NA NA 3 34 6.83 9.60 NA
#> 4 1998-01-01 03:00:00 2.16 170 493 52 3 35 7.66 10.2 NA
#> 5 1998-01-01 04:00:00 2.4 180 468 78 2 34 8.07 8.91 NA
#> 6 1998-01-01 05:00:00 3 190 264 42 0 16 5.50 3.05 NA
windRose(
mydata = data_wind,
auto.text = TRUE,
paddle = FALSE,
angle = 10,
seg = 1,
key.position = "right",
key.header = "wind speed [m/s]",
key.footer = "",
key = list(plot.style = "border"),
grid.line = list(value = 2, lty = 1, pch = 3, col = "black"),
offset = 2.5,
max.freq = 10,
angle.scale = 45,
border = "black"
)
Created on 2022-07-29 by the reprex package (v2.0.1)
The openair package version is 2.10.0. If you need any information about my OS etc.:
sessionInfo()
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8
#> [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C
#> [5] LC_TIME=German_Germany.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] rstudioapi_0.13 knitr_1.39 magrittr_2.0.3 R.cache_0.16.0
#> [5] rlang_1.0.3 fastmap_1.1.0 fansi_1.0.3 stringr_1.4.0
#> [9] styler_1.7.0 highr_0.9 tools_4.2.1 xfun_0.31
#> [13] R.oo_1.25.0 utf8_1.2.2 cli_3.3.0 withr_2.5.0
#> [17] htmltools_0.5.2 ellipsis_0.3.2 yaml_2.3.5 digest_0.6.29
#> [21] tibble_3.1.7 lifecycle_1.0.1 crayon_1.5.1 purrr_0.3.4
#> [25] R.utils_2.12.0 vctrs_0.4.1 fs_1.5.2 glue_1.6.2
#> [29] evaluate_0.15 rmarkdown_2.14 reprex_2.0.1 stringi_1.7.6
#> [33] compiler_4.2.1 pillar_1.7.0 R.methodsS3_1.8.2 pkgconfig_2.0.3
Thanks in advance!

How to reverse factor levels for step_woe preprocessing recipe step

I am applying a WOE (weight of evidence) transformation to my features (using 'step_woe' from the 'embed' package) within the 'recipes' framework, but by default it takes the 0 value as the reference, so the WOE values are reversed.
I am trying to relevel the target to set "1" as the reference, but the results are the same (no change in the direction of the WOE values). Any idea how to get it right?
Here is an example. First I create an example dataset with a target (0s and 1s) and one feature ('yes', 'no') that are perfectly related. Then I apply the step_woe transformation with the reference level set to either '0' or '1' and compare the results, which show no difference.
library(tidyverse)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
library(embed)
example_df <-
tibble(
target = rbinom(1000, 1, 0.5),
feature = ifelse(target == 1, "yes", "no")
) %>%
mutate_all(as.factor) %>%
print()
#> # A tibble: 1,000 x 2
#> target feature
#> <fct> <fct>
#> 1 0 no
#> 2 1 yes
#> 3 0 no
#> 4 0 no
#> 5 1 yes
#> 6 0 no
#> 7 1 yes
#> 8 1 yes
#> 9 0 no
#> 10 0 no
#> # … with 990 more rows
woe_recipe_0 <-
recipe(target ~ feature, data = example_df) %>%
step_relevel(target, ref_level = "0") %>%
embed::step_woe(all_nominal_predictors(), outcome = "target") %>%
prep(., retain = FALSE)
tidy(woe_recipe_0, number = 2)
#> # A tibble: 2 x 10
#> terms value n_tot n_0 n_1 p_0 p_1 woe outcome id
#> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 feature no 493 493 0 1 0 20.0 target woe_nY7AB
#> 2 feature yes 507 0 507 0 1 -20.0 target woe_nY7AB
woe_recipe_1 <-
recipe(target ~ feature, data = example_df) %>%
step_relevel(target, ref_level = "1") %>%
embed::step_woe(all_nominal_predictors(), outcome = "target") %>%
prep(., retain = FALSE)
tidy(woe_recipe_1, number = 2)
#> # A tibble: 2 x 10
#> terms value n_tot n_0 n_1 p_0 p_1 woe outcome id
#> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 feature no 493 493 0 1 0 20.0 target woe_Lt6pK
#> 2 feature yes 507 0 507 0 1 -20.0 target woe_Lt6pK
sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Red Hat Enterprise Linux
#>
#> Matrix products: default
#> BLAS: /opt/R/3.5.1/lib64/R/lib/libRblas.so
#> LAPACK: /opt/R/3.5.1/lib64/R/lib/libRlapack.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] embed_0.1.5 recipes_0.1.17 forcats_0.4.0 stringr_1.4.0
#> [5] dplyr_1.0.7 purrr_0.3.4 readr_1.3.1 tidyr_1.1.2
#> [9] tibble_3.0.4 ggplot2_3.3.5 tidyverse_1.3.0
#>
#> loaded via a namespace (and not attached):
#> [1] httr_1.4.1 jsonlite_1.6 splines_3.5.1
#> [4] prodlim_2019.11.13 modelr_0.1.5 RcppParallel_5.0.2
#> [7] assertthat_0.2.1 highr_0.8 cellranger_1.1.0
#> [10] yaml_2.2.0 ipred_0.9-12 pillar_1.6.2
#> [13] backports_1.2.1 lattice_0.20-35 glue_1.5.1
#> [16] reticulate_1.13 digest_0.6.27 rvest_0.3.5
#> [19] colorspace_2.0-0 htmltools_0.4.0 Matrix_1.2-14
#> [22] timeDate_3043.102 pkgconfig_2.0.3 broom_0.7.6
#> [25] haven_2.2.0 scales_1.1.0 whisker_0.4
#> [28] gower_0.2.1 lava_1.6.6 generics_0.1.0
#> [31] ellipsis_0.3.2 withr_2.4.1 keras_2.2.5.0
#> [34] nnet_7.3-12 cli_2.4.0 survival_2.42-3
#> [37] magrittr_2.0.1 crayon_1.4.1 readxl_1.3.1
#> [40] evaluate_0.14 fs_1.3.1 fansi_0.4.2
#> [43] MASS_7.3-51.4 xml2_1.2.2 class_7.3-14
#> [46] tools_3.5.1 hms_1.1.1 lifecycle_1.0.1
#> [49] munsell_0.5.0 reprex_0.3.0 compiler_3.5.1
#> [52] rlang_0.4.12 grid_3.5.1 rstudioapi_0.11
#> [55] base64enc_0.1-3 rmarkdown_1.18 gtable_0.3.0
#> [58] DBI_1.1.1 R6_2.5.0 tfruns_1.4
#> [61] lubridate_1.7.4 knitr_1.26 tensorflow_2.0.0
#> [64] uwot_0.1.5 utf8_1.2.1 zeallot_0.1.0
#> [67] stringi_1.4.3 Rcpp_1.0.7 vctrs_0.3.8
#> [70] rpart_4.1-15 dbplyr_2.1.1 tidyselect_1.1.1.9000
#> [73] xfun_0.11
Created on 2022-02-02 by the reprex package (v0.3.0)
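The signs in the tidy() output above are consistent with a WOE of the form log(p_0 / p_1). Under that assumption, switching the reference class is the same as negating the woe column, since log(p_1 / p_0) = -log(p_0 / p_1). The base R sketch below illustrates only that arithmetic on a small made-up dataset; it does not call embed and does not show how to change what step_woe itself computes:

```r
# Small made-up dataset (not perfectly separated, so the logs stay finite)
target  <- c(0, 0, 0, 1, 1, 1, 1, 0)
feature <- c("no", "no", "no", "yes", "yes", "yes", "no", "yes")

# Counts per feature level and target class
tab <- table(feature, target)

# Share of each class that falls into each feature level
p_0 <- tab[, "0"] / sum(tab[, "0"])
p_1 <- tab[, "1"] / sum(tab[, "1"])

# WOE with class 0 in the numerator, and with class 1 in the numerator
woe_ref0 <- log(p_0 / p_1)
woe_ref1 <- log(p_1 / p_0)

cbind(woe_ref0, woe_ref1)
```

If only the direction matters, multiplying the transformed columns by -1 after baking the recipe would be one possible workaround.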

How to construct lon, lat, value dataframe from terra SpatRaster

I have read a single-variable .nc file into R as a SpatRaster object using the excellent terra package, with the intention of fitting geostatistical models based on the cell centroids. For this I need to construct a dataframe with columns corresponding to "lon, lat, value" using data from the SpatRaster. This feels like a task which might have a standard solution, but I'm unfamiliar with R's spatial statistics ecosystem.
Any advice/suggestions would be much appreciated.
It's even more straightforward to use the function terra::as.data.frame(). See https://rspatial.github.io/terra/reference/as.data.frame.html
library(terra)
#> terra version 1.3.4
# make test raster with terra::rast()
a <- terra::rast(ncols = 10, nrows = 10,
xmin = -84, xmax = -83,
ymin = 42, ymax = 43)
# give it some values
values(a) <- 1:ncell(a)
plot(a)
a_df <- terra::as.data.frame(a, xy = TRUE, na.rm = FALSE)
# take special note of default values
head(a_df)
#> x y lyr.1
#> 1 -83.95 42.95 1
#> 2 -83.85 42.95 2
#> 3 -83.75 42.95 3
#> 4 -83.65 42.95 4
#> 5 -83.55 42.95 5
#> 6 -83.45 42.95 6
packageVersion("terra")
#> [1] '1.3.4'
sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] terra_1.3-4
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.7 codetools_0.2-18 lattice_0.20-44 digest_0.6.27
#> [5] withr_2.4.2 grid_4.1.0 magrittr_2.0.1 reprex_2.0.0
#> [9] evaluate_0.14 highr_0.9 rlang_0.4.11 stringi_1.7.3
#> [13] cli_3.0.1 fs_1.5.0 sp_1.4-5 raster_3.4-13
#> [17] rmarkdown_2.9 tools_4.1.0 stringr_1.4.0 glue_1.4.2
#> [21] xfun_0.24 yaml_2.2.1 compiler_4.1.0 htmltools_0.5.1.1
#> [25] knitr_1.33
Created on 2021-10-21 by the reprex package (v2.0.0)
library(terra)
#> Warning: package 'terra' was built under R version 4.0.5
#> terra version 1.3.22
r <- rast(ncol=10, nrow=10)
values(r) <- runif(ncell(r))
plot(r)
p <- terra::as.points(r)
df <- data.frame(values(p), geom(p))
head(df)
#> lyr.1 geom part x y hole
#> 1 0.9557333 1 1 -162 81 0
#> 2 0.2974196 2 1 -126 81 0
#> 3 0.9703617 3 1 -90 81 0
#> 4 0.3046196 4 1 -54 81 0
#> 5 0.7334711 5 1 -18 81 0
#> 6 0.8880635 6 1 18 81 0

R: Unwanted quotation marks in tibble::lst names

I only recently learned of tibble::lst, which creates a list object but automatically names list items. I'm using this as a shortcut within a %>% workflow that makes use of the names as the .id argument in map_dfr, so the automatic naming is really helpful.
However, the names are coming in with quotation marks around them. I noticed this because they awkwardly printed in axis tick labels in a ggplot, i.e. I had a label saying "Hartford" instead of Hartford.
I looked through issues on the tidyverse/tibble github but didn't find anything. Is this a bug, or am I doing something wrong?
library(dplyr)
library(purrr)
cities <- lst("New Haven", "Bridgeport", "Hartford")
cities
#> $`"New Haven"`
#> [1] "New Haven"
#>
#> $`"Bridgeport"`
#> [1] "Bridgeport"
#>
#> $`"Hartford"`
#> [1] "Hartford"
cities %>%
map_dfr(~tibble(dummy = rnorm(1)), .id = "city")
#> # A tibble: 3 x 2
#> city dummy
#> <chr> <dbl>
#> 1 "\"New Haven\"" -0.956
#> 2 "\"Bridgeport\"" 0.533
#> 3 "\"Hartford\"" -0.0553
At first I thought it might be to escape the space in "New Haven", but it happens with single characters as well:
lst("a", "b", "c")
#> $`"a"`
#> [1] "a"
#>
#> $`"b"`
#> [1] "b"
#>
#> $`"c"`
#> [1] "c"
It works as I expect when I provide names, but that defeats this advantage that lst has over the base list.
lst(a = "a", b = "b", c = "c")
#> $a
#> [1] "a"
#>
#> $b
#> [1] "b"
#>
#> $c
#> [1] "c"
Pretty sure I'm up to date on tidyverse-related packages, but here's my session info just in case:
sessionInfo()
#> R version 3.5.1 (2018-07-02)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS High Sierra 10.13.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] purrr_0.2.5 dplyr_0.7.6
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_0.12.18 knitr_1.20 bindr_0.1.1 magrittr_1.5
#> [5] tidyselect_0.2.4 R6_2.2.2 rlang_0.2.2 fansi_0.3.0
#> [9] stringr_1.3.1 tools_3.5.1 utf8_1.1.4 cli_1.0.0
#> [13] htmltools_0.3.6 yaml_2.2.0 assertthat_0.2.0 rprojroot_1.3-2
#> [17] digest_0.6.16 tibble_1.4.2 crayon_1.3.4 bindrcpp_0.2.2
#> [21] glue_1.3.0 evaluate_0.11 rmarkdown_1.10 stringi_1.2.4
#> [25] compiler_3.5.1 pillar_1.3.0 backports_1.1.2 pkgconfig_2.0.2
lst() is really meant to be used with variables, such as:
xa <- "a"
xb <- "b"
xc <- "c"
lst(xa, xb, xc)
# $xa
# [1] "a"
# $xb
# [1] "b"
# $xc
# [1] "c"
It doesn't play well with literal, unnamed values. It takes the name of each element from the unevaluated expression you pass in, so if you pass in a character literal, that unevaluated expression still includes the quotes. I think you just want list() here, possibly with names:
cities <- list("New Haven", "Bridgeport", "Hartford")
names(cities)<-unname(cities)
cities
# $`New Haven`
# [1] "New Haven"
# $Bridgeport
# [1] "Bridgeport"
# $Hartford
# [1] "Hartford"
or just write your own function
nlist <- function(...) {
setNames(list(...), c(...))
}
cities <- nlist("New Haven", "Bridgeport", "Hartford")
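To see where the quotes come from, the naming rule can be imitated in base R. This is only a rough sketch of the idea (turning each unevaluated argument into text), not tibble's actual implementation, and auto_names is a made-up helper:

```r
# Hypothetical helper: return the text of each unevaluated argument
auto_names <- function(...) {
  # Capture the arguments as expressions, dropping the leading `list` symbol
  exprs <- as.list(substitute(list(...)))[-1]
  vapply(exprs, deparse, character(1))
}

xa <- "a"
nms <- auto_names(xa, "b")
nms
# The symbol xa deparses to "xa"; the literal "b" deparses to "\"b\"",
# which is the name with embedded quotes seen in the question.
```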

Inconsistent function behavior in dplyr::mutate

I'd like to use dplyr::mutate to add p-values to a dataframe but it's not working and I can't get my head around why.
This works:
my_add<-function(x, y) x + y
str(my_add(5, 15))
#> num 20
df <- data.frame(success=c(5,8,4), fail=c(15,13,18))
mutate(df, total=my_add(success, fail))
#> success fail total
#> 1 5 15 20
#> 2 8 13 21
#> 3 4 18 22
But this doesn't:
my_binom <- function(x, y) binom.test(x, y)$"p.value"
str(my_binom(5, 20))
#> num 0.0414
df <- data.frame(success=c(5,8), total=c(20,21))
mutate(df, p_value=my_binom(success, total))
#> success total p_value
#> 1 5 20 0.5810547
#> 2 8 21 0.5810547
df <- data.frame(success=c(5,8,4), total=c(20,21,22))
mutate(df, p_value=my_binom(success, total))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: incorrect length of 'x'.
Both functions take the same input and return a single numeric, so I can't wrap my head around this discrepancy. Can someone enlighten me as to what's going on? Thanks!
Session info:
sessionInfo()
#> R version 3.4.1 (2017-06-30)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: OS X El Capitan 10.11.6
#>
#> Matrix products: default
#> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] bindrcpp_0.2 dplyr_0.7.4
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.4.1 magrittr_1.5 assertthat_0.2.0 R6_2.2.2 tools_3.4.1
#> [6] glue_1.1.1 tibble_1.3.4 yaml_2.1.14 Rcpp_0.12.14 pkgconfig_2.0.1
#> [11] rlang_0.1.2 bindr_0.1
mutate(df, p_value = purrr::map2(success, total, my_binom))
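The difference is that + is vectorized and binom.test is not. mutate passes whole columns to the function, and binom.test treats an x of length 2 as c(successes, failures), ignoring n, which is why both rows get the same p-value (it is the p-value for 5 successes and 8 failures). An x of length 3 is invalid, hence the error. Calling the function once per row avoids this. Note that purrr::map2() returns a list column; purrr::map2_dbl() would return a plain numeric one. A base R sketch of the same idea using mapply:

```r
my_binom <- function(x, y) binom.test(x, y)$p.value

df <- data.frame(success = c(5, 8, 4), total = c(20, 21, 22))

# Call my_binom once per (success, total) pair; the result is a numeric vector
df$p_value <- mapply(my_binom, df$success, df$total)
df

# What happened in the two-row case: x of length 2 is read as
# c(successes, failures), so this equals binom.test(5, 13)
two_row <- binom.test(c(5, 8), c(20, 21))$p.value
```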

Resources