dcast with empty left-hand side in formula

I'm having problems using dcast without "id" variables. The expected result is basically a transposition -- creating a 1-row data frame with as many columns as there are rows in the original data frame.
I've tried different approaches, but only "hacks" seem to work for now. Before filing a bug, I wanted to double-check if I'm missing something.
d <- data.frame(variable=letters[1:3], value=1:3)
d
## variable value
## 1 a 1
## 2 b 2
## 3 c 3
reshape2::dcast(d, ...~variable)
## . a b c
## 1 . 1 2 3
reshape2::dcast(d, .~variable)
## . a b c
## 1 . 1 2 3
reshape2::dcast(d, ~variable)
## Error: subscript out of bounds
reshape2::dcast(d, 0~variable)
## 0 a b c
## 1 0 1 2 3
sessionInfo()
## R version 3.1.1 (2014-07-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets base
##
## loaded via a namespace (and not attached):
## [1] evaluate_0.5.6 formatR_1.0 knitr_1.6.18 methods_3.1.1
## [5] plyr_1.8.1 Rcpp_0.11.2 reshape2_1.4 stringr_0.6.2
## [9] tools_3.1.1 ulimit_0.0-2
What am I doing wrong? Why is dcast creating the odd and useless . column when using ...~variable or .~variable as formula?
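For comparison, the one-row layout described above can also be built without a cast formula. This is only a minimal base-R sketch, not a reshape2 feature: the name `wide` is made up for illustration, and it assumes each entry of `variable` occurs exactly once.

```r
d <- data.frame(variable = letters[1:3], value = 1:3)

# Name each value after its `variable` entry, then turn the named
# list into a one-row data frame (one column per original row).
# as.character() guards against `variable` being stored as a factor.
wide <- as.data.frame(as.list(setNames(d$value, as.character(d$variable))))

wide
```

Because no formula is involved, there is no left-hand side and hence no placeholder column to drop afterwards.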


R: strange results when looking at the unique elements of two simple strings

I am absolutely puzzled at what I see.
I read an excel file and when I look at the unique values in a column of strings, I do not understand the result.
I can reproduce this in a minimal reprex (see below): why does dd have two unique elements, whereas dd2 has just one?
Any suggestion is appreciated.
dd <- c("Grant", "Grant")
dd2 <- c("Grant", "Grant")
unique(dd)
#> [1] "Grant" "Grant"
length(unique(dd))
#> [1] 2
unique(dd2)
#> [1] "Grant"
length(unique(dd2))
#> [1] 1
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Debian GNU/Linux 11 (bullseye)
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#>
#> locale:
#> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
#> [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
#> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] knitr_1.33 magrittr_2.0.1 rlang_0.4.11 fansi_0.5.0
#> [5] stringr_1.4.0 styler_1.5.1 highr_0.9 tools_4.1.1
#> [9] xfun_0.25 utf8_1.2.2 withr_2.4.2 htmltools_0.5.1.1
#> [13] ellipsis_0.3.2 yaml_2.2.1 digest_0.6.27 tibble_3.1.3
#> [17] lifecycle_1.0.0 crayon_1.4.1 purrr_0.3.4 vctrs_0.3.8
#> [21] fs_1.5.0 glue_1.4.2 evaluate_0.14 rmarkdown_2.10
#> [25] reprex_2.0.1 stringi_1.7.3 compiler_4.1.1 pillar_1.6.2
#> [29] backports_1.2.1 pkgconfig_2.0.3
Created on 2021-09-13 by the reprex package (v2.0.1)
The raw values seem to be different, probably as a result of copying
sapply(dd, charToRaw)
$`Grant`
[1] ef bb bf 47 72 61 6e 74
$Grant
[1] 47 72 61 6e 74
whereas with dd2, it is the same
sapply(dd2, charToRaw)
Grant Grant
[1,] 47 47
[2,] 72 72
[3,] 61 61
[4,] 6e 6e
[5,] 74 74
There seems to be an extra character in the first case (the leading bytes ef bb bf are the UTF-8 byte order mark)
nchar(dd)
[1] 6 5
If we remove that first character, unique() returns a single value
unique(c(substring(dd[1],2), dd[2]))
[1] "Grant"
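The leading bytes ef bb bf shown above are the UTF-8 encoding of the byte order mark (U+FEFF). Rather than dropping the first character by position, one could strip that mark only where it is present. The following is a sketch under the assumption that the stray character is always U+FEFF at the start of the string; the name `clean` is chosen here for illustration.

```r
# Recreate the situation: the first string starts with an invisible BOM.
dd <- c("\ufeffGrant", "Grant")

nchar(dd)

# Remove a leading U+FEFF if there is one; strings without it are untouched.
clean <- sub("^\ufeff", "", dd)

unique(clean)
```

Unlike substring(dd[1], 2), this can be applied to the whole vector without first checking which elements are affected.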

'as.tibble' causes error in tibble 2.0.1 but not 1.4.2

I have written a function part of which converts a matrix to a tibble. This works without issues in tibble 1.4.2 but causes an error in 2.0.1.
The code that causes the error is as follows
library(tibble)
library(magrittr)
testmerge <- matrix( data = NA, ncol = 6 + 1, nrow = 0) %>%
as.tibble
The Error message is below
I can solve the problem by doing the following
testmerge <- matrix( data = NA, ncol = 6 + 1, nrow = 0) %>%
as.data.frame() %>%
as_tibble
But this seems a bit long-winded.
What is happening that has caused this change? And how can I easily end up with a tibble of just empty columns?
You need to specify .name_repair; see ?as_tibble:
library(tibble)
library(magrittr)
sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] magrittr_1.5 tibble_2.0.1
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.0 digest_0.6.18 crayon_1.3.4 rprojroot_1.3-2
#> [5] backports_1.1.2 evaluate_0.11 pillar_1.3.1 rlang_0.3.1
#> [9] stringi_1.2.4 rmarkdown_1.10 tools_3.5.2 stringr_1.3.1
#> [13] yaml_2.2.0 compiler_3.5.2 pkgconfig_2.0.2 htmltools_0.3.6
#> [17] knitr_1.20
Your code worked just fine for me with tibble_1.4.2, as you describe. After upgrading to tibble_2.0.1, I get the same error you had, but with a slightly more informative message that includes the sentence "Use .name_repair to specify repair.":
testmerge <- matrix( data = NA, ncol = 6 + 1, nrow = 0) %>%
as_tibble()
#> Error: Columns 1, 2, 3, 4, 5, … (and 2 more) must be named.
#> Use .name_repair to specify repair.
testmerge <- matrix( data = NA, ncol = 6 + 1, nrow = 0) %>%
as_tibble(.name_repair = "unique")
#> New names:
#> * `` -> `..1`
#> * `` -> `..2`
#> * `` -> `..3`
#> * `` -> `..4`
#> * `` -> `..5`
#> * … and 2 more
testmerge
#> # A tibble: 0 x 7
#> # … with 7 variables: ..1 <lgl>, ..2 <lgl>, ..3 <lgl>, ..4 <lgl>,
#> # ..5 <lgl>, ..6 <lgl>, ..7 <lgl>
Update: in the comments, @NelsonGon links to a GitHub issue, the discussion of which seems to have led to this new behavior.

Unclear warning when defining custom pipe operator

In my process I need to perform many dplyr::inner_joins. Thought I might define a custom pipe operator for it as explained here:
library(tidyverse)
library(rlang)
df1 <- tibble(a = 1:10, b = 11:20)
df2 <- tibble(a = 1:10, c = 21:30)
`%J>%` <- function(lhs, rhs){
inner_join(lhs, rhs)
}
df1 %J>% df2
This works as expected and I get:
Joining, by = "a"
# A tibble: 10 x 3
a b c
<int> <int> <int>
1 1 11 21
2 2 12 22
3 3 13 23
4 4 14 24
5 5 15 25
6 6 16 26
7 7 17 27
8 8 18 28
9 9 19 29
10 10 20 30
But then also a warning:
Warning message:
`chr_along()` is soft-deprecated as of rlang 0.2.0.
This warning is displayed once per session.
Plot thickens if I don't include library(rlang) at all (in a new session), in which case I get no warnings:
library(tidyverse)
df1 <- tibble(a = 1:10, b = 11:20)
df2 <- tibble(a = 1:10, c = 21:30)
`%J>%` <- function(lhs, rhs){
inner_join(lhs, rhs)
}
df1 %J>% df2
Obviously I don't have to include library(rlang) at all in this example, but if I did, this is one weird warning. Where is it coming from, and how can I avoid it if I did want to include library(rlang)?
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)
Matrix products: default
locale:
[1] LC_COLLATE=English_Israel.1252 LC_CTYPE=English_Israel.1252 LC_MONETARY=English_Israel.1252 LC_NUMERIC=C LC_TIME=English_Israel.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rlang_0.3.0.1 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.6 purrr_0.2.5 readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_3.1.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.19 cellranger_1.1.0 pillar_1.3.0 compiler_3.5.1 plyr_1.8.4 bindr_0.1.1 tools_3.5.1 packrat_0.4.9-3 jsonlite_1.5 lubridate_1.7.4 nlme_3.1-137
[12] gtable_0.2.0 lattice_0.20-35 pkgconfig_2.0.2 cli_1.0.1 rstudioapi_0.8 haven_1.1.2 bindrcpp_0.2.2 withr_2.1.2 xml2_1.2.0 httr_1.3.1 hms_0.4.2
[23] grid_3.5.1 tidyselect_0.2.4 glue_1.3.0 R6_2.2.2 fansi_0.3.0 readxl_1.1.0 modelr_0.1.2 magrittr_1.5 backports_1.1.2 scales_1.0.0 rvest_0.3.2
[34] assertthat_0.2.0 colorspace_1.3-2 utf8_1.1.4 stringi_1.2.4 lazyeval_0.2.1 munsell_0.5.0 broom_0.5.0 crayon_1.3.4
From your description, I would say that if you load rlang as part of the tidyverse (i.e. just load tidyverse), then R will use the tidyverse's rlang, which is automatically updated within the tidyverse. If you load tidyverse first and then rlang, then R will use the last one seen, which is the one you loaded manually. Thus, if you did not update rlang manually, it will give the warning.
The problem should go away if you manually update rlang.

Strange behavior when subsetting with column names quoted with backticks in I of data.table

Look at the following example generated with reprex:
library(data.table)
DT <- data.table(id = letters[1:3], `counts(a>=0)` = 1:3)
DT[`counts(a>=0)` >= 2] # 1
#> id counts(a>=0)
#> 1: b 2
#> 2: c 3
DT[`counts(a>=0)` == 2] # 2
#> Error in `[.data.table`(DT, `counts(a>=0)` == 2): Column(s) [counts(a] not found in x
DT[id == "a"] # 3
#> id counts(a>=0)
#> 1: a 1
As both the lines marked with #1 and #3 work, I wonder why subsetting with `counts(a>=0)` == 2 (#2) doesn't work.
SessionInfo:
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] reprex_0.1.2 data.table_1.11.2
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 rprojroot_1.3-2 digest_0.6.15 crayon_1.3.4 withr_2.1.2 assertthat_0.2.0 R6_2.2.2
[8] backports_1.1.2 magrittr_1.5 formatR_1.5 evaluate_0.10.1 stringi_1.1.6 debugme_1.1.0 rstudioapi_0.7
[15] callr_2.0.2 whisker_0.3-2 rmarkdown_1.9 devtools_1.13.5 tools_3.4.4 stringr_1.3.0 yaml_2.1.17
[22] compiler_3.4.4 htmltools_0.3.6 memoise_1.1.0 knitr_1.20
It works for me with:
DT[as.numeric(`counts(a>=0)`) == 2]

Inconsistent function behavior in dplyr::mutate

I'd like to use dplyr::mutate to add p-values to a dataframe but it's not working and I can't get my head around why.
This works:
my_add<-function(x, y) x + y
str(my_add(5, 15))
#> num 20
df <- data.frame(success=c(5,8,4), fail=c(15,13,18))
mutate(df, total=my_add(success, fail))
#> success fail total
#> 1 5 15 20
#> 2 8 13 21
#> 3 4 18 22
But this doesn't:
my_binom <- function(x, y) binom.test(x, y)$"p.value"
str(my_binom(5, 20))
#> num 0.0414
df <- data.frame(success=c(5,8), total=c(20,21))
mutate(df, p_value=my_binom(success, total))
#> success total p_value
#> 1 5 20 0.5810547
#> 2 8 21 0.5810547
df <- data.frame(success=c(5,8,4), total=c(20,21,22))
mutate(df, p_value=my_binom(success, total))
#> Error in mutate_impl(.data, dots) :
#> Evaluation error: incorrect length of 'x'.
Both functions take the same input and return a single numeric, so I can't wrap my head around this discrepancy. Can someone enlighten me as to what's going on? Thanks!
Session info:
sessionInfo()
#> R version 3.4.1 (2017-06-30)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: OS X El Capitan 10.11.6
#>
#> Matrix products: default
#> BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] bindrcpp_0.2 dplyr_0.7.4
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_3.4.1 magrittr_1.5 assertthat_0.2.0 R6_2.2.2 tools_3.4.1
#> [6] glue_1.1.1 tibble_1.3.4 yaml_2.1.14 Rcpp_0.12.14 pkgconfig_2.0.1
#> [11] rlang_0.1.2 bindr_0.1
mutate(df, p_value = purrr::map2(success, total, my_binom))
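The discrepancy seems to come from binom.test() not being vectorised over its arguments: per its documentation, x is either the number of successes or a length-2 vector of successes and failures. With two rows, c(5, 8) is therefore read as 5 successes and 8 failures, which gives the same p-value of 0.5810547 for both rows, and with three rows the length check fails. Note also that purrr::map2() returns a list column. A base-R sketch of the same row-wise idea, using mapply() so that the result is a plain numeric vector, might look like this (the name `p_two_rows` is only for illustration):

```r
my_binom <- function(x, y) binom.test(x, y)$p.value

df <- data.frame(success = c(5, 8, 4), total = c(20, 21, 22))

# Call my_binom() once per row; mapply() simplifies the results
# to a numeric vector with one p-value per row.
df$p_value <- mapply(my_binom, df$success, df$total)

# The two-row case from the question: x = c(5, 8) is treated as
# 5 successes and 8 failures, and the second argument is ignored.
p_two_rows <- my_binom(c(5, 8), c(20, 21))

df
```

Inside mutate(), purrr::map2_dbl(success, total, my_binom) should give the same numeric column, assuming purrr is installed.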
