Here's my reprex:
library(tidyverse)
# make some data
a = tibble(b=1:2,c=2:1)
print(a)
#> # A tibble: 2 x 2
#> b c
#> <int> <int>
#> 1 1 2
#> 2 2 1
expand_grid(a) # doesn't produce the expected output
#> # A tibble: 2 x 2
#> b c
#> <int> <int>
#> 1 1 2
#> 2 2 1
# expected output achieved by:
(
a
%>% as.list()
%>% map(unique)
%>% cross_df()
)
#> # A tibble: 4 x 2
#> b c
#> <int> <int>
#> 1 1 2
#> 2 2 2
#> 3 1 1
#> 4 2 1
Created on 2021-08-17 by the reprex package (v2.0.0)
This differs from the behavior of expand.grid in base R. However, the result is similar if we call expand_grid through do.call (or one of the purrr equivalents: invoke, now retired, or exec):
library(purrr)
library(tidyr)
invoke(expand_grid, a)
exec(expand_grid, !!! a) # from Mike Lawrence's comment
-output
# A tibble: 4 x 2
b c
<int> <int>
1 1 2
2 1 1
3 2 2
4 2 1
That is, expand.grid can work on a list directly:
expand.grid(a)
expand.grid(unclass(a))
whereas expand_grid treats the same list as a single list-column:
expand_grid(unclass(a))
# A tibble: 2 x 1
`unclass(a)`
<named list>
1 <int [2]>
2 <int [2]>
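The do.call route mentioned above is not shown in the answer. A minimal sketch, assuming the same tibble a as in the question (the name grid is only illustrative), could look like this:

```r
library(tibble)
library(tidyr)

# Same data as in the question
a <- tibble(b = 1:2, c = 2:1)

# A tibble is a list of columns, so do.call() passes each column
# to expand_grid() as a separate named argument
grid <- do.call(expand_grid, as.list(a))
print(grid)
```

Note that the row order differs from expand.grid(a): expand_grid varies the first column slowest, while expand.grid varies it fastest.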
I have a matrix foo and want to create a data.frame or tibble like bar with the data in a long format with the indices as columns. What's a simple way to do this in the tidyverse?
z <- c(1,8,6,4,7,3,2,4,7)
foo <- matrix(z,3,3)
bar <- expand.grid(j=1:3,i=1:3)
bar$z <- z
foo
bar
Here are two ways.
The first is in fact a base R solution; just swap magrittr's pipe for R's native pipe operator |>.
The second is a tidyverse solution, which I find too complicated.
suppressPackageStartupMessages(
library(tidyverse)
)
z <- c(1,8,6,4,7,3,2,4,7)
foo <- matrix(z,3,3)
bar <- expand.grid(j=1:3,i=1:3)
bar$z <- z
cbind(
i = foo %>% row() %>% c(),
j = foo %>% col() %>% c(),
z = foo %>% c()
) %>%
as.data.frame()
#> i j z
#> 1 1 1 1
#> 2 2 1 8
#> 3 3 1 6
#> 4 1 2 4
#> 5 2 2 7
#> 6 3 2 3
#> 7 1 3 2
#> 8 2 3 4
#> 9 3 3 7
foo %>%
t() %>%
as.data.frame() %>%
pivot_longer(everything(), values_to = "z") %>%
mutate(i = c(row(foo)), j = c(col(foo))) %>%
select(-name) %>%
relocate(z, .after = j)
#> # A tibble: 9 × 3
#> i j z
#> <int> <int> <dbl>
#> 1 1 1 1
#> 2 2 1 8
#> 3 3 1 6
#> 4 1 2 4
#> 5 2 2 7
#> 6 3 2 3
#> 7 1 3 2
#> 8 2 3 4
#> 9 3 3 7
Created on 2022-10-12 with reprex v2.0.2
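For reference, the first approach written with the native pipe might look like the sketch below. It assumes R 4.1 or later, where |> is available, and needs no packages; the name long is only illustrative.

```r
z <- c(1, 8, 6, 4, 7, 3, 2, 4, 7)
foo <- matrix(z, 3, 3)

# row() and col() return matrices of row and column indices;
# c() flattens each of them column by column, matching the order of c(foo)
long <- cbind(
  i = foo |> row() |> c(),
  j = foo |> col() |> c(),
  z = foo |> c()
) |>
  as.data.frame()

print(long)
```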
Another base R method would be to take advantage of as.table and as.data.frame:
as.data.frame(lapply(as.data.frame(as.table(foo)), as.numeric),
col.names = c("row", "col", "val"))
#> row col val
#> 1 1 1 1
#> 2 2 1 8
#> 3 3 1 6
#> 4 1 2 4
#> 5 2 2 7
#> 6 3 2 3
#> 7 1 3 2
#> 8 2 3 4
#> 9 3 3 7
I have a list of tibbles and I want to add a column to each tibble that represents its position in the list.
Let's say I have the following:
library(tidyverse)
l <- list(
tibble(x = 1:3, y = rev(x)),
tibble(a = 3:1, b = rev(a))
)
Which produces:
> l
[[1]]
# A tibble: 3 x 2
x y
<int> <int>
1 1 3
2 2 2
3 3 1
[[2]]
# A tibble: 3 x 2
a b
<int> <int>
1 3 1
2 2 2
3 1 3
How can I use tidyverse syntax to get out the following:
> l
[[1]]
# A tibble: 3 x 3
x y list_pos
<int> <int> <int>
1 1 3 1
2 2 2 1
3 3 1 1
[[2]]
# A tibble: 3 x 3
a b list_pos
<int> <int> <int>
1 3 1 2
2 2 2 2
3 1 3 2
A possible solution:
library(tidyverse)
imap(l, ~ bind_cols(.x, pos = .y))
#> [[1]]
#> # A tibble: 3 x 3
#> x y pos
#> <int> <int> <int>
#> 1 1 3 1
#> 2 2 2 1
#> 3 3 1 1
#>
#> [[2]]
#> # A tibble: 3 x 3
#> a b pos
#> <int> <int> <int>
#> 1 3 1 2
#> 2 2 2 2
#> 3 1 3 2
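This works because imap() hands each element to the function as .x and, for an unnamed list, its integer position as .y. A variant of the same idea, sketched here with mutate() and the column name list_pos from the question (both choices are assumptions, not part of the answer above):

```r
library(purrr)
library(dplyr, warn.conflicts = FALSE)

l <- list(
  tibble(x = 1:3, y = rev(x)),
  tibble(a = 3:1, b = rev(a))
)

# .x is the tibble, .y is its position in the (unnamed) list;
# mutate() recycles the single index value down the new column
out <- imap(l, ~ mutate(.x, list_pos = .y))
print(out)
```

If the list had names, .y would be the element name rather than its position.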
I’m trying to create multiple new score columns based on other columns. I’d like to use a function to minimize copy pasting large blocks of code.
I’m trying to do something like:
Myfunction <- function(column){
Column_df <- old_df %>%
mutate(column.score = if_else(column == 1, "yes", "no")
)
}
Score_df <- Myfunction(c(math, reading, science))
But I’m getting an error saying that object math is not found.
Starting with an example data frame as below
df <- purrr::map_dfc(c('math', 'reading', 'science', 'history'),
~ rlang::list2(!!.x := sample(1:3, 10, TRUE)))
df
#> # A tibble: 10 × 4
#> math reading science history
#> <int> <int> <int> <int>
#> 1 2 1 3 1
#> 2 3 2 3 1
#> 3 2 2 2 2
#> 4 2 3 1 2
#> 5 3 3 1 2
#> 6 1 2 3 2
#> 7 3 3 2 1
#> 8 3 3 3 2
#> 9 1 2 2 1
#> 10 2 2 2 3
You can create new "score" columns with a function by passing your columns argument to across inside {{ }}, and using the .names argument to append ".score" to each column name.
If you want only the "score" columns in the output, rather than to add them to existing columns, use transmute instead of mutate.
library(dplyr, warn.conflicts = FALSE)
Myfunction <- function(df, columns){
df %>%
mutate(across({{ columns }}, ~ if_else(. == 1, 'yes', 'no'),
.names = '{.col}.score'))
}
df %>%
Myfunction(c(math, reading, science))
#> # A tibble: 10 × 7
#> math reading science history math.score reading.score science.score
#> <int> <int> <int> <int> <chr> <chr> <chr>
#> 1 2 1 3 1 no yes no
#> 2 3 2 3 1 no no no
#> 3 2 2 2 2 no no no
#> 4 2 3 1 2 no no yes
#> 5 3 3 1 2 no no yes
#> 6 1 2 3 2 yes no no
#> 7 3 3 2 1 no no no
#> 8 3 3 3 2 no no no
#> 9 1 2 2 1 yes no no
#> 10 2 2 2 3 no no no
Created on 2022-01-18 by the reprex package (v2.0.1)
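The transmute variant mentioned above is not shown in the reprex. A rough sketch, using a small hand-made data frame and a hypothetical function name score_only (neither is from the original answer), might be:

```r
library(dplyr, warn.conflicts = FALSE)

# Small illustrative data frame (not the sampled one above)
df <- tibble(
  math    = c(1L, 2L, 3L),
  reading = c(1L, 1L, 2L),
  science = c(3L, 1L, 1L),
  history = c(2L, 2L, 1L)
)

# Same idea as Myfunction, but transmute() keeps only the new columns
score_only <- function(df, columns) {
  df %>%
    transmute(across({{ columns }}, ~ if_else(.x == 1, "yes", "no"),
                     .names = "{.col}.score"))
}

res <- score_only(df, c(math, reading, science))
print(res)
```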
Given a data.frame:
library(tidyverse)
set.seed(0)
df <- tibble(A = 1:10, B = rnorm(10), C = rbinom(10,2,0.6))
var <- "B"
I'd like to filter the data frame by the highest values of the variable named in var. Logically, I'd do either:
df %>%
slice_max({{ var }}, n = 5)
#> # A tibble: 1 × 3
#> A B C
#> <int> <dbl> <int>
#> 1 1 1.26 1
df %>%
slice_max(!! var, n = 5)
#> # A tibble: 1 × 3
#> A B C
#> <int> <dbl> <int>
#> 1 1 1.26 1
But neither interpolation is working... what am I missing here?
Expected output would be the same as:
df %>%
slice_max(B, n = 5)
#> # A tibble: 5 × 3
#> A B C
#> <int> <dbl> <int>
#> 1 10 2.40 0
#> 2 3 1.33 2
#> 3 4 1.27 1
#> 4 1 1.26 1
#> 5 5 0.415 2
I think you need to use the newer .data pronoun, which looks up a column by the name stored in a string:
df %>%
slice_max(.data[[var]] , n = 5)
#> # A tibble: 5 × 3
#> A B C
#> <int> <dbl> <int>
#> 1 10 2.40 0
#> 2 3 1.33 2
#> 3 4 1.27 1
#> 4 1 1.26 1
#> 5 5 0.415 2
I am puzzled as to why your approach returns only the first row, though!
We may convert the string to a symbol with sym and unquote it with !!:
library(dplyr)
df %>%
slice_max(!! rlang::sym(var), n = 5)
-output
# A tibble: 5 × 3
A B C
<int> <dbl> <int>
1 10 2.40 0
2 3 1.33 2
3 4 1.27 1
4 1 1.26 1
5 5 0.415 2
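Either approach can be wrapped in a function that takes the column name as a string. The sketch below uses the .data pronoun; the helper name top_by and the small data frame are made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# Small deterministic example data (not the random data from the question)
df <- tibble(A = 1:6, B = c(0.5, 2.1, -1, 3.3, 1.2, 0))

# Hypothetical helper: `var` is a column name given as a string
top_by <- function(data, var, n) {
  data %>%
    slice_max(.data[[var]], n = n)
}

res <- top_by(df, "B", n = 3)
print(res)
```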
I have a big list of small datasets like this:
>> my_list
[[1]]
# A tibble: 6 x 2
Year FIPS
<dbl> <chr>
1 2015 12001
2 2015 51013
3 2015 12081
4 2015 12115
5 2015 12127
6 2015 42003
[[2]]
# A tibble: 9 x 2
Year FIPS
<dbl> <chr>
1 2017 04013
2 2017 10003
3 2017 NA
4 2017 25005
5 2017 25009
6 2017 25013
7 2017 25017
8 2017 25021
9 2017 25027
...
I want to remove the NAs from each tibble using modify_at because it looks like a clean way to do it. This is my attempt:
my_list %>% modify_at(c("FIPS"), drop_na)
I tried also with na.omit, but I get the same error in both cases:
Error: character indexing requires a named object
Can anyone help me here, please? What am I doing wrong?
Creating some data.
library(tidyverse)
mylist <-
list(tibble(a = c(1, 2, NA),
b = c(2, 2, 2)),
tibble(c = rep(1, 5),
d = sample(c(NA, 2), 5, replace = TRUE)))
The .at argument in purrr::modify_at() specifies the list element to modify, not the column within the dataframe nested in the list. purrr::modify() works for your purposes.
modify(mylist, drop_na)
#> [[1]]
#> # A tibble: 2 x 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
#> 2 2 2
#>
#> [[2]]
#> # A tibble: 4 x 2
#> c d
#> <dbl> <dbl>
#> 1 1 2
#> 2 1 2
#> 3 1 2
#> 4 1 2
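To illustrate what .at actually selects, here is a small sketch with a named list; the element names first and second are invented for the example. Only the element matched by .at is passed to drop_na, which is presumably also why a character .at fails on an unnamed list like the one in the question.

```r
library(purrr)
library(tidyr)
library(tibble)

# Named list, so that elements can be selected by name
named_list <- list(
  first  = tibble(a = c(1, 2, NA), b = c(2, 2, 2)),
  second = tibble(c = c(1, NA),    d = c(2, 2))
)

# .at picks list elements; only "first" is modified
out <- modify_at(named_list, "first", drop_na)
print(out)
```

Here out$second still contains its NA, because modify_at never touched it.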
purrr::map() also works. Since your input and output are both plain lists, map() is sufficient here, while modify() would be preferred if your input has a class other than a regular list and you want to preserve that class in the output.
map(mylist, drop_na)
#> [[1]]
#> # A tibble: 2 x 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
#> 2 2 2
#>
#> [[2]]
#> # A tibble: 4 x 2
#> c d
#> <dbl> <dbl>
#> 1 1 2
#> 2 1 2
#> 3 1 2
#> 4 1 2
base R
lapply(mylist, na.omit)
#> [[1]]
#> # A tibble: 2 x 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
#> 2 2 2
#>
#> [[2]]
#> # A tibble: 4 x 2
#> c d
#> <dbl> <dbl>
#> 1 1 2
#> 2 1 2
#> 3 1 2
#> 4 1 2
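If the goal is to drop rows only where one particular column is missing, the column can be passed to drop_na inside the mapped function. A minimal sketch, assuming tibbles shaped like those in the question (the values are a shortened, made-up subset):

```r
library(purrr)
library(tidyr)
library(tibble)

# Shortened stand-in for the list in the question
my_list <- list(
  tibble(Year = 2015, FIPS = c("12001", "51013")),
  tibble(Year = 2017, FIPS = c("04013", NA, "25005"))
)

# drop_na() only considers the FIPS column; NAs elsewhere would be kept
cleaned <- map(my_list, ~ drop_na(.x, FIPS))
print(cleaned)
```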