R purrr: how to rename a column of a nested df

I have a list of data frames, each with two columns named "place" and "data".
"place" is a character and "data" is a nested data frame with one numeric column named "value".
For each data frame from the list, I'd like to rename the "value" column of the nested data frame with the value of "place" column.
library(tidyverse)
some_dt = tibble(place = c("a","a", "b","b","c","c"),
value = c(1,2,1,4,5,6))
# here is a list of data frames...
ls_df <-
some_dt %>%
group_by(place) %>%
nest() %>%
split(.$place)
I tried:
map2(ls_df$data,
ls_df$place,
~rename(.x, .y = "value"))
or:
map2(ls_df$data,
ls_df$place,
~rename_with(.x, ~ .y, "value"))
but I'm getting an empty list as result.
How can I rename the "value" column with the content of the outer data frame column?

We may loop over the list ('ls_df') with map, extract the 'place' value, and then rename the 'value' column of the nested 'data' tibble with that value:
library(dplyr)
library(purrr)
ls_df2 <- map(ls_df, ~ {
nm <- .x$place
.x$data[[1]] <- .x$data[[1]] %>%
rename_with(~ nm, "value")
.x
})
Checking the result:
> map(ls_df2, ~ .x$data)
$a
$a[[1]]
# A tibble: 2 × 1
a
<dbl>
1 1
2 2
$b
$b[[1]]
# A tibble: 2 × 1
b
<dbl>
1 1
2 4
$c
$c[[1]]
# A tibble: 2 × 1
c
<dbl>
1 5
2 6
Note that split() returns a list of data frames, so the 'place' and 'data' columns cannot be accessed directly on 'ls_df'. Both lookups return NULL, which is why the map2() attempts gave an empty list, i.e.
> ls_df$data
NULL
> ls_df$place
NULL
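To make the point above concrete, here is a minimal sketch that rebuilds 'ls_df' as in the question and shows where the columns actually live. It assumes dplyr, tidyr and purrr are installed; nothing beyond the question's own objects is introduced.

```r
library(dplyr)
library(tidyr)
library(purrr)

some_dt <- tibble(place = c("a", "a", "b", "b", "c", "c"),
                  value = c(1, 2, 1, 4, 5, 6))

# Same construction as in the question: a named list of one-row nested tibbles
ls_df <- some_dt %>%
  group_by(place) %>%
  nest() %>%
  split(.$place)

# The list itself has no element called 'place' ...
is.null(ls_df$place)
#> [1] TRUE

# ... but every element of the list does have that column
ls_df$a$place
#> [1] "a"

# So the columns have to be reached element by element
map_chr(ls_df, ~ .x$place)
```

This is why the working answers either map over 'ls_df' itself or skip the split() step.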
Another option is to start from 'some_dt' and use nest_by(), which keeps everything in one nested tibble:
some_dt %>%
nest_by(place) %>%
mutate(data = data %>%
rename_with(~ place, value) %>%
list(.)) %>%
ungroup
# A tibble: 3 × 2
place data
<chr> <list>
1 a <tibble [2 × 1]>
2 b <tibble [2 × 1]>
3 c <tibble [2 × 1]>

You can also iterate over each list element and then use mutate to rename the nested data frame using the place.
ls_df %>%
modify(~ mutate(.x,
data = map(data,
~ set_names(.x, first(place)))))
In this case, you can actually simplify this further.
ls_df %>%
modify(~ mutate(.x,
data = map2(data, place, set_names)))
# which can collapse down to as simple as this
ls_df %>%
modify(mutate, data = map2(data, place, set_names))
With that approach, it is worth asking whether you need the list at all. The nested tibble may be easier to work with directly.
ls_df %>%
bind_rows() %>%
mutate(data = map2(data, place, set_names))
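If the list is not needed, the split() and bind_rows() round trip can probably be dropped as well. The following is a sketch under that assumption: it nests 'some_dt' directly with tidyr::nest(data = value) (a variation, not the code from the question) and then checks the column name of each nested tibble.

```r
library(dplyr)
library(tidyr)
library(purrr)

some_dt <- tibble(place = c("a", "a", "b", "b", "c", "c"),
                  value = c(1, 2, 1, 4, 5, 6))

nested <- some_dt %>%
  nest(data = value) %>%                        # one row per place
  mutate(data = map2(data, place, set_names))   # rename each nested tibble

# Each nested tibble now carries the name of its place
map_chr(nested$data, names)
#> [1] "a" "b" "c"
```

Note that set_names() replaces all column names, so this only suits nested tibbles with a single column, as in the question.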

You could also try something like this:
library(tidyverse)
map(ls_df,
~ map2(.x$place,
.x$data,
~rename(.y,
!!sym(.x) := value)
)
)
#> $a
#> $a[[1]]
#> # A tibble: 2 x 1
#> a
#> <dbl>
#> 1 1
#> 2 2
#>
#>
#> $b
#> $b[[1]]
#> # A tibble: 2 x 1
#> b
#> <dbl>
#> 1 1
#> 2 4
#>
#>
#> $c
#> $c[[1]]
#> # A tibble: 2 x 1
#> c
#> <dbl>
#> 1 5
#> 2 6

You can create a function which renames using base colnames() then map that over all the list elements as follows:
# The fn:
rnm <- function(x) {
colnames(x$data[[1]]) <- x$place
x
}
# Result:
res <- ls_df |> purrr::map(.f = rnm)
# Check if it's the desired output:
res$a$data
# [[1]]
# A tibble: 2 × 1
# a
# <dbl>
# 1 1
# 2 2

Related

Accessing variable name in for loop in R?

I am trying to run a for loop where I randomly subsample a dataset using the sample_n command. I also want to name each new subsampled dataframe "df1", "df2", "df3", where the numbers correspond to i in the for loop. I know the way I wrote this code is wrong and why I am getting the error. How can I access "df" and "i" in the for loop so that it reads as df1, df2, etc.? Happy to clarify if needed. Thanks!
for (i in 1:9){ print(get(paste("df", i, sep=""))) = sub %>%
group_by(dietAandB) %>%
sample_n(1) }
Error in print(get(paste("df", i, sep = ""))) = sub %>% group_by(dietAandB) %>% :
target of assignment expands to non-language object
Instead of using get you could use assign.
Using some fake example data:
library(dplyr, warn.conflicts = FALSE)
sub <- data.frame(
dietAandB = LETTERS[1:2]
)
for (i in 1:2) {
assign(paste0("df", i), sub %>% group_by(dietAandB) %>% sample_n(1) |> ungroup())
}
df1
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
df2
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
But the more R-ish way to do this would be to use a list instead of creating single objects:
df <- list()
for (i in 1:2) {
  df[[i]] <- sub %>% group_by(dietAandB) %>% sample_n(1) |> ungroup()
}
df
#> [[1]]
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
#>
#> [[2]]
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
Or, more concisely, use lapply instead of a for loop:
df <- lapply(1:2, function(x) sub %>% group_by(dietAandB) %>% sample_n(1) |> ungroup())
df
#> [[1]]
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
#>
#> [[2]]
#> # A tibble: 2 × 1
#> dietAandB
#> <chr>
#> 1 A
#> 2 B
It depends on the sample size, which is missing from your question. As an example, I used the mtcars dataset (32 rows) and drew three subsamples of size 20 from it:
library(dplyr)
for (i in 1:3) {
assign(paste0("df", i), sample_n(mtcars, 20))
}
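The two suggestions above (a list instead of separate objects, and an explicit sample size) can be combined so that the names df1, df2, ... survive as list names. This is only a sketch: the seed, the sample size of 20 and the name 'subsamples' are assumptions for illustration, and slice_sample() is used as the current dplyr replacement for sample_n().

```r
library(dplyr)

set.seed(1)      # assumed seed, only to make the sketch reproducible
n_samples <- 3   # assumed number of subsamples

# Draw the subsamples into a list instead of separate global objects
subsamples <- lapply(seq_len(n_samples), function(i) {
  slice_sample(mtcars, n = 20)
})

# Name the list elements df1, df2, df3
names(subsamples) <- paste0("df", seq_len(n_samples))

names(subsamples)
#> [1] "df1" "df2" "df3"
nrow(subsamples$df1)
#> [1] 20
```

Each subsample is then available as subsamples$df1 or subsamples[["df1"]], and the whole set can be passed to lapply() or purrr::map() in one go.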

How to paste strings between tibble's character column and rows in a nested list-column

I have a tibble with one character column and one list-column that nests dataframes. I want to collapse the dataframes in the list-column (using dplyr::bind_rows()) and append the respective value from the character column for each row.
Example
library(tibble)
my_tibble <-
tibble(category = c("color", "shape"),
items = list(tibble(item = c("red", "blue"), value = c(1, 2)),
tibble(item = c("square", "triangle"), value = c(1, 2))
))
> my_tibble
## # A tibble: 2 x 2
## category items
## <chr> <list>
## 1 color <tibble [2 x 2]>
## 2 shape <tibble [2 x 2]>
I know how to collapse the entire items column:
library(dplyr)
my_tibble %>%
pull(items) %>%
bind_rows()
## # A tibble: 4 x 2
## item value
## <chr> <dbl>
## 1 red 1
## 2 blue 2
## 3 square 1
## 4 triangle 2
But what I'm trying to achieve is to paste the values from the category column of my_tibble to get:
desired output
## # A tibble: 4 x 2
## item value
## <chr> <dbl>
## 1 color_red 1
## 2 color_blue 2
## 3 shape_square 1
## 4 shape_triangle 2
How can I do this?
UPDATE
I think that tidyr::unnest_longer() brings me closer to the target:
library(tidyr)
my_tibble %>%
unnest_longer(items)
# A tibble: 4 x 2
category items$item $value
<chr> <chr> <dbl>
1 color red 1
2 color blue 2
3 shape square 1
4 shape triangle 2
But not sure how to progress. Trying to append with tidyr::unite() fails:
my_tibble %>%
unnest_longer(items) %>%
unite("category", `items$item`)
Error: Can't subset columns that don't exist.
x Column items$item doesn't exist.
unnest() returns an output that's easier to work with than unnest_longer():
library(tidyr)
my_tibble %>%
unnest(items) %>%
unite(col = item, category, item)
## # A tibble: 4 x 2
## item value
## <chr> <dbl>
## 1 color_red 1
## 2 color_blue 2
## 3 shape_square 1
## 4 shape_triangle 2
It's not the nicest way, but it works. Try this:
library(dplyr)
my_tibble %>%
group_by(category) %>%
group_modify(~data.frame(.$items)) %>%
ungroup() %>%
mutate(item=paste(category,item,sep="_")) %>%
select(-category)
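Since the question started from bind_rows(), another route is to prefix the items inside each nested tibble first and bind afterwards. The sketch below assumes purrr is available and uses map2() over the two columns; it is one possible variant, not the method from either answer.

```r
library(dplyr)
library(purrr)

my_tibble <- tibble(
  category = c("color", "shape"),
  items = list(tibble(item = c("red", "blue"), value = c(1, 2)),
               tibble(item = c("square", "triangle"), value = c(1, 2)))
)

# .x is one category, .y is the matching nested tibble
result <- map2(my_tibble$category, my_tibble$items,
               ~ mutate(.y, item = paste(.x, item, sep = "_"))) %>%
  bind_rows()

result$item
#> [1] "color_red"      "color_blue"     "shape_square"   "shape_triangle"
```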

Generate a list of tibble from a tibble by using map and select

I want to generate a list of tibbles from one tibble with the following code.
tbl = tibble(id=1:10, a = rnorm(10), b = rnorm(10))
tbl_list = c("a", "b") %>% map(~ tbl %>% select(c("id", .)))
The output I want is
tbl_list
[[1]]
# A tibble: 2 x 2
id a
<int> <dbl>
1 1 -0.704
2 2 -0.917
[[2]]
# A tibble: 2 x 2
id a
<int> <dbl>
1 1 -0.704
2 2 -0.917
However, it shows the error message,
"c("id", .) must evaluate to column positions or names, not a list" ,
so it seems that . is not recognized as a character, but as a list.
Could you tell me how to avoid this error?
You can use .x to access the element
library(tidyverse)
c("a", "b") %>% map(~ tbl %>% select(c("id", .x)))
#[[1]]
# A tibble: 10 x 2
# id a
# <int> <dbl>
# 1 1 1.42
# 2 2 1.51
# 3 3 -0.385
#...
#[[2]]
# A tibble: 10 x 2
# id b
# <int> <dbl>
# 1 1 1.42
# 2 2 0.100
# 3 3 1.28
#....
You can also use ., but inside a pipe chain . refers to the object on the left-hand side of the pipe, i.e. tbl in this case, which is why the original code returns an error. One way to use . is to drop the inner pipe:
c("a", "b") %>% map(~select(tbl, c('id', .)))
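A slightly more explicit variant is sketched below. It assumes a recent tidyselect, wraps the column name in all_of() to make clear that .x holds a character value rather than a column, and names the list elements with set_names() so they can be retrieved by name. The seed is an assumption added only for reproducibility.

```r
library(dplyr)
library(purrr)

set.seed(42)  # assumed seed, only for reproducibility
tbl <- tibble(id = 1:10, a = rnorm(10), b = rnorm(10))

tbl_list <- c("a", "b") %>%
  set_names() %>%                       # names the vector by its own values
  map(~ select(tbl, id, all_of(.x)))    # .x is the current column name

names(tbl_list)
#> [1] "a" "b"
names(tbl_list$b)
#> [1] "id" "b"
```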

R collapse column to form numeric list

In R, how do I collapse a column to form another column of numeric list type,
like when we define a numeric vector as l = c(1,2,3)?
df <- read.table(text = "X Y
a 26
a 3
a 24
b 8
b 1
b 4
", header = TRUE)
I am trying this with dplyr but it gives me character list column
> df %>% group_by(X) %>% summarise(lst= paste0(Y, collapse = ","))
# A tibble: 2 x 2
X lst
<fct> <chr>
1 a 26,3,24
2 b 8,1,4
Group by X, then summarise Y as a list:
library(dplyr)
out <- df %>%
group_by(X) %>%
summarise(Y = list(Y))
out
# A tibble: 2 x 2
# X Y
# <fct> <list>
#1 a <int [3]>
#2 b <int [3]>
The Y column now looks like this
out$Y
#[[1]]
#[1] 26 3 24
#
#[[2]]
#[1] 8 1 4
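Because the elements of the list column stay numeric, they can be used for further computation, which is the practical difference from the pasted character version. A small sketch, assuming purrr is available (the columns 'n' and 'total' are made up for illustration):

```r
library(dplyr)
library(purrr)

df <- data.frame(X = c("a", "a", "a", "b", "b", "b"),
                 Y = c(26L, 3L, 24L, 8L, 1L, 4L))

out <- df %>%
  group_by(X) %>%
  summarise(Y = list(Y))

# Work on the numeric vectors stored in the list column
out2 <- out %>%
  mutate(n = lengths(Y),            # number of values per group
         total = map_dbl(Y, sum))   # sum of the values per group

out2$total
#> [1] 53 13
```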
nest seems to be another option but this would result in a list column of tibbles (not what you want I think)
df %>%
group_by(X) %>%
nest()
# A tibble: 2 x 2
# X data
# <fct> <list>
#1 a <tibble [3 × 1]>
#2 b <tibble [3 × 1]>
A data.table solution:
library(data.table)
dt <- as.data.table(df)[, list(Y=list(Y)), by="X"]
> dt
X Y
1: a 26, 3,24
2: b 8,1,4
> dt$Y
[[1]]
[1] 26 3 24
[[2]]
[1] 8 1 4

dplyr: passing a grouped tibble to a custom function

(The following scenario simplifies my actual situation)
My data comes from villages, and I would like to summarize an outcome variable by a village variable.
> data
village A Z Y
<chr> <int> <int> <dbl>
1 a 1 1 500
2 a 1 1 400
3 a 1 0 800
4 b 1 0 300
5 b 1 1 700
For example, I would like to calculate the mean of Y only using Z==z by villages. In this case, I want to have (500 + 400)/2 = 450 for village "a" and 700 for village "b".
Please note that the actual situation is more complicated and I cannot directly use this answer, but the point is I need to pass a grouped tibble and a global variable (z) to my function.
z <- 1 # z takes 0 or 1
data %>%
group_by(village) %>% # grouping by village
summarize(Y_village = Y_hat_village(., z)) # pass a part of tibble and a global variable
Y_hat_village <- function(data_village, z){
# This function takes a part of tibble (`data_village`) and a variable `z`
# Calculate the mean for a specific z in a village
data_z <- data_village %>% filter(Z==get("z"))
return(mean(data_z$Y))
}
However, I found that . passes the entire tibble, so the code above returns the same value for every group.
There are a couple things you can simplify. One is in your function: since you're passing in a value z to the function, you don't need to use get("z"). You have a z in the global environment that you pass in; or, more safely, assign your z value to a variable with some other name so you don't run into scoping issues, and pass that in to the function. In this case, I'm calling it z_val.
library(tidyverse)
z_val <- 1
Y_hat_village2 <- function(data, z) {
data_z <- data %>% filter(Z == z)
return(mean(data_z$Y))
}
You can make the function call on each group using do, which will get you a list-column, and then unnesting that column. Again note that I'm passing in the variable z_val to the argument z.
df %>%
group_by(village) %>%
do(y_hat = Y_hat_village2(., z = z_val)) %>%
unnest(y_hat)
#> # A tibble: 2 x 2
#> village y_hat
#> <chr> <dbl>
#> 1 a 450
#> 2 b 700
However, do is being deprecated in favor of purrr::map, which I am still having trouble getting the hang of. In this case, you can group and nest, which gives a column of data frames called data, then map over that column and again supply z = z_val. When you unnest the y_hat column, you still have the original data as a nested column, since you wanted access to the rest of the columns still.
df %>%
group_by(village) %>%
nest() %>%
mutate(y_hat = map(data, ~Y_hat_village2(., z = z_val))) %>%
unnest(y_hat)
#> # A tibble: 2 x 3
#> village data y_hat
#> <chr> <list> <dbl>
#> 1 a <tibble [3 × 3]> 450
#> 2 b <tibble [2 × 3]> 700
Just to check that everything works okay, I also passed in z = 0 to check for 1. scoping issues, and 2. that other values of z work.
df %>%
group_by(village) %>%
nest() %>%
mutate(y_hat = map(data, ~Y_hat_village2(., z = 0))) %>%
unnest(y_hat)
#> # A tibble: 2 x 3
#> village data y_hat
#> <chr> <list> <dbl>
#> 1 a <tibble [3 × 3]> 800
#> 2 b <tibble [2 × 3]> 300
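Another way to hand each group's rows to the function, without do() or an explicit nest(), is dplyr::group_modify(). This is a swapped-in alternative rather than part of the answer above; the sketch rebuilds the sample data as a tibble and reuses Y_hat_village2 and z_val as defined earlier.

```r
library(dplyr)

df <- tibble(village = c("a", "a", "a", "b", "b"),
             A = 1,
             Z = c(1, 1, 0, 0, 1),
             Y = c(500, 400, 800, 300, 700))

Y_hat_village2 <- function(data, z) {
  data_z <- data %>% filter(Z == z)
  mean(data_z$Y)
}

z_val <- 1

res <- df %>%
  group_by(village) %>%
  # .x is the slice of rows for the current village (without the grouping column)
  group_modify(~ tibble(y_hat = Y_hat_village2(.x, z = z_val))) %>%
  ungroup()

res$y_hat
#> [1] 450 700
```

group_modify() expects the function to return a data frame, hence the tibble() wrapper around the single mean.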
As an extension/modification to @patL's answer, you can also wrap the tidyverse solution within purrr::map to return a list of two tibbles, one for each z value:
z <- c(0, 1);
map(z, ~df %>% filter(Z == .x) %>% group_by(village) %>% summarise(Y.mean = mean(Y)))
#[[1]]
## A tibble: 2 x 2
# village Y.mean
# <fct> <dbl>
#1 a 800.
#2 b 300.
#
#[[2]]
## A tibble: 2 x 2
# village Y.mean
# <fct> <dbl>
#1 a 450.
#2 b 700.
Sample data
df <- read.table(text =
" village A Z Y
1 a 1 1 500
2 a 1 1 400
3 a 1 0 800
4 b 1 0 300
5 b 1 1 700 ", header = T)
You can use dplyr to accomplish it:
library(dplyr)
df %>%
group_by(village) %>%
filter(Z == 1) %>%
summarise(Y_village = mean(Y))
## A tibble: 2 x 2
# village Y_village
# <chr> <dbl>
#1 a 450
#2 b 700
To get all columns:
df %>%
group_by(village) %>%
filter(Z == 1) %>%
mutate(Y_village = mean(Y)) %>%
distinct(village, A, Z, Y_village)
## A tibble: 2 x 4
## Groups: village [2]
# village A Z Y_village
# <chr> <dbl> <dbl> <dbl>
#1 a 1 1 450
#2 b 1 1 700
data
df <- tibble(village = c("a", "a", "a", "b", "b"),
             A = rep(1, 5),
             Z = c(1, 1, 0, 0, 1),
             Y = c(500, 400, 800, 300, 700))
