tidyr::expand_grid() not behaving as expected; what am I missing?


Here's my reprex:
library(tidyverse)
# make some data
a = tibble(b=1:2,c=2:1)
print(a)
#> # A tibble: 2 x 2
#> b c
#> <int> <int>
#> 1 1 2
#> 2 2 1
expand_grid(a) # doesn't produce the expected output
#> # A tibble: 2 x 2
#> b c
#> <int> <int>
#> 1 1 2
#> 2 2 1
# expected output achieved by:
(
  a
  %>% as.list()
  %>% map(unique)
  %>% cross_df()
)
#> # A tibble: 4 x 2
#> b c
#> <int> <int>
#> 1 1 2
#> 2 2 2
#> 3 1 1
#> 4 2 1
Created on 2021-08-17 by the reprex package (v2.0.0)

This differs from the behavior of expand.grid() in base R. The behavior is similar, though, if we call expand_grid() through do.call() or one of its purrr counterparts: invoke() (retired) or exec().
library(purrr)
library(tidyr)
invoke(expand_grid, a)
exec(expand_grid, !!! a) # from Mike Lawrence's comment
-output
# A tibble: 4 x 2
b c
<int> <int>
1 1 2
2 1 1
3 2 2
4 2 1
That is, expand.grid() can work on a list directly:
expand.grid(a)
expand.grid(unclass(a))
whereas expand_grid() treats the same list as a single list-column:
expand_grid(unclass(a))
# A tibble: 2 x 1
`unclass(a)`
<named list>
1 <int [2]>
2 <int [2]>
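To make the difference concrete, here is a minimal sketch, assuming a tidyr version in which expand_grid() accepts spliced arguments (it does so through rlang's dynamic dots, as the exec() call above suggests). The names as_unit, crossed, crossed_unique and with_z are only illustrative. The idea is that a data frame passed as one argument is expanded as a unit, while splicing with !!! passes each column as its own argument.

```r
library(tidyr)

a <- tibble::tibble(b = 1:2, c = 2:1)

# A data frame passed as one argument is a single unit: its rows stay together.
as_unit <- expand_grid(a)

# Splicing with !!! passes each column as a separate argument,
# so the columns are crossed with each other.
crossed <- expand_grid(!!!as.list(a))

# De-duplicate each column first if the columns may hold repeated values.
crossed_unique <- expand_grid(!!!lapply(a, unique))

# The data frame is still expanded against any other argument.
with_z <- expand_grid(a, z = 1:2)

nrow(as_unit)
nrow(crossed)
```

This also avoids cross_df(), which, as far as I know, has been deprecated since purrr 1.0.0 in favour of expand_grid().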

Related

tidy syntax for matrix to tibble by index?

I have a matrix foo and want to create a data.frame or tibble like bar with the data in a long format with the indices as columns. What's a simple way to do this in the tidyverse?
z <- c(1,8,6,4,7,3,2,4,7)
foo <- matrix(z,3,3)
bar <- expand.grid(j=1:3,i=1:3)
bar$z <- z
foo
bar

Here are two ways.
The first is in fact a base R solution if you swap magrittr's pipe for R's native pipe operator, |>.
The second is a tidyverse solution which I find too complicated.
suppressPackageStartupMessages(
library(tidyverse)
)
z <- c(1,8,6,4,7,3,2,4,7)
foo <- matrix(z,3,3)
bar <- expand.grid(j=1:3,i=1:3)
bar$z <- z
cbind(
  i = foo %>% row() %>% c(),
  j = foo %>% col() %>% c(),
  z = foo %>% c()
) %>%
  as.data.frame()
#> i j z
#> 1 1 1 1
#> 2 2 1 8
#> 3 3 1 6
#> 4 1 2 4
#> 5 2 2 7
#> 6 3 2 3
#> 7 1 3 2
#> 8 2 3 4
#> 9 3 3 7
foo %>%
  t() %>%
  as.data.frame() %>%
  pivot_longer(everything(), values_to = "z") %>%
  mutate(i = c(row(foo)), j = c(col(foo))) %>%
  select(-name) %>%
  relocate(z, .after = j)
#> # A tibble: 9 × 3
#> i j z
#> <int> <int> <dbl>
#> 1 1 1 1
#> 2 2 1 8
#> 3 3 1 6
#> 4 1 2 4
#> 5 2 2 7
#> 6 3 2 3
#> 7 1 3 2
#> 8 2 3 4
#> 9 3 3 7
Created on 2022-10-12 with reprex v2.0.2

Another base R method would be to take advantage of as.table() and as.data.frame():
as.data.frame(lapply(as.data.frame(as.table(foo)), as.numeric),
              col.names = c("row", "col", "val"))
#> row col val
#> 1 1 1 1
#> 2 2 1 8
#> 3 3 1 6
#> 4 1 2 4
#> 5 2 2 7
#> 6 3 2 3
#> 7 1 3 2
#> 8 2 3 4
#> 9 3 3 7
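The row()/col() idea from the first answer can also be written without any pipes. The following is only a sketch of that variant, not a different algorithm; it builds the long tibble directly, relying on the fact that c() flattens a matrix in column-major order.

```r
library(tibble)

z <- c(1, 8, 6, 4, 7, 3, 2, 4, 7)
foo <- matrix(z, 3, 3)

bar <- tibble(
  i = c(row(foo)),  # row index of every cell, in column-major order
  j = c(col(foo)),  # column index of every cell, same order
  z = c(foo)        # cell values, same order
)
bar
```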

Add a column to a `tibble` that gives its list position

I have a list of tibbles and I want to add a column to each tibble that represents its position in the list.
Let's say I have the following:
library(tidyverse)
l <- list(
tibble(x = 1:3, y = rev(x)),
tibble(a = 3:1, b = rev(a))
)
Which produces:
> l
[[1]]
# A tibble: 3 x 2
x y
<int> <int>
1 1 3
2 2 2
3 3 1
[[2]]
# A tibble: 3 x 2
a b
<int> <int>
1 3 1
2 2 2
3 1 3
How can I use tidyverse syntax to get out the following:
> l
[[1]]
# A tibble: 3 x 3
x y list_pos
<int> <int> <int>
1 1 3 1
2 2 2 1
3 3 1 1
[[2]]
# A tibble: 3 x 3
a b list_pos
<int> <int> <int>
1 3 1 2
2 2 2 2
3 1 3 2

A possible solution:
library(tidyverse)
imap(l, ~ bind_cols(.x, pos = .y))
#> [[1]]
#> # A tibble: 3 x 3
#> x y pos
#> <int> <int> <int>
#> 1 1 3 1
#> 2 2 2 1
#> 3 3 1 1
#>
#> [[2]]
#> # A tibble: 3 x 3
#> a b pos
#> <int> <int> <int>
#> 1 3 1 2
#> 2 2 2 2
#> 3 1 3 2
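One caveat worth sketching: imap() passes the element names as .y when the list is named, so the new column would hold names rather than positions. A variant that always uses the integer position, assuming the column should be called list_pos as in the question, could look like this:

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)

l <- list(
  tibble(x = 1:3, y = rev(x)),
  tibble(a = 3:1, b = rev(a))
)

# map2() pairs each tibble with its integer position, so this also works
# for named lists (where imap() would pass the names instead).
out <- map2(l, seq_along(l), ~ mutate(.x, list_pos = .y))
out
```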

iterative functions in R

I’m trying to create multiple new score columns based on other columns. I’d like to use a function to minimize copy pasting large blocks of code.
I’m trying to do something like:
Myfunction <- function(column){
  Column_df <- old_df %>%
    mutate(column.score = if_else(column == 1, "yes", "no"))
}
Score_df <- Myfunction(c(math, reading, science))
But I’m getting an error saying object math is not found

Starting with an example data frame as below
df <- purrr::map_dfc(c('math', 'reading', 'science', 'history'),
~ rlang::list2(!!.x := sample(1:3, 10, TRUE)))
df
#> # A tibble: 10 × 4
#> math reading science history
#> <int> <int> <int> <int>
#> 1 2 1 3 1
#> 2 3 2 3 1
#> 3 2 2 2 2
#> 4 2 3 1 2
#> 5 3 3 1 2
#> 6 1 2 3 2
#> 7 3 3 2 1
#> 8 3 3 3 2
#> 9 1 2 2 1
#> 10 2 2 2 3
You can create new "score" columns with a function by passing your columns argument to across() inside {{ }}, and using the .names argument to append ".score" to each column name.
If you want only the "score" columns in the output, rather than adding them to the existing columns, use transmute() instead of mutate().
library(dplyr, warn.conflicts = FALSE)
Myfunction <- function(df, columns){
  df %>%
    mutate(across({{ columns }}, ~ if_else(. == 1, 'yes', 'no'),
                  .names = '{.col}.score'))
}
df %>%
  Myfunction(c(math, reading, science))
#> # A tibble: 10 × 7
#> math reading science history math.score reading.score science.score
#> <int> <int> <int> <int> <chr> <chr> <chr>
#> 1 2 1 3 1 no yes no
#> 2 3 2 3 1 no no no
#> 3 2 2 2 2 no no no
#> 4 2 3 1 2 no no yes
#> 5 3 3 1 2 no no yes
#> 6 1 2 3 2 yes no no
#> 7 3 3 2 1 no no no
#> 8 3 3 3 2 no no no
#> 9 1 2 2 1 yes no no
#> 10 2 2 2 3 no no no
Created on 2022-01-18 by the reprex package (v2.0.1)
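If the column names are stored as strings rather than typed as bare names, the same pattern should still work by wrapping the character vector in all_of(). This is a small sketch on made-up data; add_scores and cols are illustrative names, not part of the answer above.

```r
library(dplyr, warn.conflicts = FALSE)

# Small deterministic example data (made up for illustration).
df <- tibble(
  math    = c(1L, 2L, 3L),
  reading = c(2L, 1L, 1L),
  science = c(3L, 3L, 1L)
)

add_scores <- function(df, columns) {
  df %>%
    mutate(across({{ columns }}, ~ if_else(.x == 1, "yes", "no"),
                  .names = "{.col}.score"))
}

cols <- c("math", "science")            # column names held as strings
scored <- add_scores(df, all_of(cols))  # all_of() selects columns by name
scored
```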

dplyr `slice_max` interpolation not working

Given a data.frame:
library(tidyverse)
set.seed(0)
df <- tibble(A = 1:10, B = rnorm(10), C = rbinom(10,2,0.6))
var <- "B"
I'd like to filter the data frame by the highest values of the variable named in var. Logically, I'd do either:
df %>%
slice_max({{ var }}, n = 5)
#> # A tibble: 1 × 3
#> A B C
#> <int> <dbl> <int>
#> 1 1 1.26 1
df %>%
slice_max(!! var, n = 5)
#> # A tibble: 1 × 3
#> A B C
#> <int> <dbl> <int>
#> 1 1 1.26 1
But neither interpolation is working... what am I missing here?
Expected output would be the same as:
df %>%
slice_max(B, n = 5)
#> # A tibble: 5 × 3
#> A B C
#> <int> <dbl> <int>
#> 1 10 2.40 0
#> 2 3 1.33 2
#> 3 4 1.27 1
#> 4 1 1.26 1
#> 5 5 0.415 2

I think you need to use the newer .data pronoun:
df %>%
slice_max(.data[[var]] , n = 5)
#> # A tibble: 5 × 3
#> A B C
#> <int> <dbl> <int>
#> 1 10 2.40 0
#> 2 3 1.33 2
#> 3 4 1.27 1
#> 4 1 1.26 1
#> 5 5 0.415 2
I am puzzled as to why your approach returns only the first row, though!
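One way to investigate is to look at what actually gets injected. The sketch below uses rlang::expr() to build the calls without evaluating them; the variable names are illustrative. It shows that !! var injects the string "B" itself, so slice_max() presumably orders by a length-one constant rather than by the column, which would be consistent with the single row returned.

```r
library(rlang)

var <- "B"

# expr() shows the call after injection without evaluating it.
injected_string <- expr(slice_max(df, !!var, n = 5))
injected_symbol <- expr(slice_max(df, !!sym(var), n = 5))

injected_string  # the string "B" is passed as the ordering value
injected_symbol  # the column B is referenced
```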

We may convert the string to a symbol with sym() and unquote it with !!:
library(dplyr)
df %>%
slice_max(!! rlang::sym(var), n = 5)
-output
# A tibble: 5 × 3
A B C
<int> <dbl> <int>
1 10 2.40 0
2 3 1.33 2
3 4 1.27 1
4 1 1.26 1
5 5 0.415 2
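Either approach can be wrapped in a small helper when the column name arrives as a string. This is a minimal sketch using the .data pronoun; top_by is a hypothetical name, not a dplyr function.

```r
library(dplyr, warn.conflicts = FALSE)

set.seed(0)
df <- tibble(A = 1:10, B = rnorm(10), C = rbinom(10, 2, 0.6))

# Hypothetical helper: `col` is a column name given as a string.
top_by <- function(data, col, n = 5) {
  slice_max(data, .data[[col]], n = n)
}

top_by(df, "B")
```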

modify_at to remove NA values in each element in a list

I have a big list of small datasets like this:
>> my_list
[[1]]
# A tibble: 6 x 2
Year FIPS
<dbl> <chr>
1 2015 12001
2 2015 51013
3 2015 12081
4 2015 12115
5 2015 12127
6 2015 42003
[[2]]
# A tibble: 9 x 2
Year FIPS
<dbl> <chr>
1 2017 04013
2 2017 10003
3 2017 NA
4 2017 25005
5 2017 25009
6 2017 25013
7 2017 25017
8 2017 25021
9 2017 25027
...
I want to remove the NAs from each tibble using modify_at because it looks like a clean way to do it. This is my attempt:
my_list %>% modify_at(c("FIPS"), drop_na)
I also tried na.omit, but I get the same error in both cases:
Error: character indexing requires a named object
Can anyone help me here, please? What am I doing wrong?

Creating some data.
library(tidyverse)
mylist <-
list(tibble(a = c(1, 2, NA),
b = c(2, 2, 2)),
tibble(c = rep(1, 5),
d = sample(c(NA, 2), 5, replace = TRUE)))
The .at argument in purrr::modify_at() specifies the list element to modify, not the column within the dataframe nested in the list. purrr::modify() works for your purposes.
modify(mylist, drop_na)
#> [[1]]
#> # A tibble: 2 x 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
#> 2 2 2
#>
#> [[2]]
#> # A tibble: 4 x 2
#> c d
#> <dbl> <dbl>
#> 1 1 2
#> 2 1 2
#> 3 1 2
#> 4 1 2
purrr::map() also works. Since your input and output are both list objects, map() is sufficient here, while modify() would be preferred if your input is of a class other than a regular list and you want to preserve that class attribute in the output.
map(mylist, drop_na)
#> [[1]]
#> # A tibble: 2 x 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
#> 2 2 2
#>
#> [[2]]
#> # A tibble: 4 x 2
#> c d
#> <dbl> <dbl>
#> 1 1 2
#> 2 1 2
#> 3 1 2
#> 4 1 2
In base R:
lapply(mylist, na.omit)
#> [[1]]
#> # A tibble: 2 x 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
#> 2 2 2
#>
#> [[2]]
#> # A tibble: 4 x 2
#> c d
#> <dbl> <dbl>
#> 1 1 2
#> 2 1 2
#> 3 1 2
#> 4 1 2
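If the goal is to drop rows only where one particular column is missing, as the FIPS in the original attempt suggests, the column can be passed on to drop_na() inside map(). The data below is made up for illustration, and the second call is a sketch of what modify_at() does when given a list position instead.

```r
library(tidyr)
library(purrr)

# Made-up data: element 2 has one NA in Year and one NA in FIPS.
my_list <- list(
  tibble::tibble(Year = 2015, FIPS = c("12001", "51013")),
  tibble::tibble(Year = c(2017, NA, 2017), FIPS = c("04013", "10003", NA))
)

# Drop rows only where FIPS is missing; NAs in other columns are kept.
by_column <- map(my_list, ~ drop_na(.x, FIPS))

# modify_at() picks list elements, here by position: only element 2 is cleaned.
second_only <- modify_at(my_list, 2, drop_na)
```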
