Split a list of dataframes into separate dataframes

So I have a list, shown below, and I want to split it into three separate dataframes (named Banana, Strawberry and Apple) as shown in the expected output. I have already seen this (Splitting List into dataframe R), but it's the exact opposite of what I want. I don't want to combine them; I want to split them into three dataframes with the same names as the list headers.
list_a <- list(`Banana` = c(8.7), `Strawberry` = c(2.3), `Apple` = c(3.5))
Expected output:
DF1
  Banana
1    8.7
DF2
  Strawberry
1        2.3
DF3
  Apple
1   3.5
Any solution, preferably in the tidyverse, would be greatly appreciated. The actual problem has a lot more columns in the list.

First convert them all to a tibble:
list_a <- list(`Banana` = c(8.7), `Strawberry` = c(2.3), `Apple` = c(3.5))
list_a <- purrr::map(list_a, tibble::as_tibble)
Then send this to the global environment:
list2env(list_a, envir = .GlobalEnv)
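If the goal is just to have one data frame per list element, a base R sketch can also write them into a separate environment instead of .GlobalEnv, which keeps the workspace clean. The environment name fruit_env is hypothetical; the functions used (lapply(), setNames(), list2env(), new.env()) are all base R.

```r
# Example list from the question
list_a <- list(Banana = 8.7, Strawberry = 2.3, Apple = 3.5)

# Wrap each element in a one-column data frame named after the element
dfs <- lapply(names(list_a), function(nm) setNames(data.frame(list_a[[nm]]), nm))
names(dfs) <- names(list_a)

# list2env() writes every named element into the target environment and
# returns that environment, so nothing is added to the global workspace
fruit_env <- list2env(dfs, envir = new.env())

ls(fruit_env)
fruit_env$Banana
```

Passing envir = .GlobalEnv instead reproduces the answer above exactly.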

We can use imap() to get the names and then set_names() to name each tibble and the list:
library(purrr)
library(dplyr)
library(stringr)
imap(list_a, ~ set_names(tibble(.x), .y)) %>%
set_names(str_c("DF", 1:3)) %>%
list2env(.GlobalEnv)
DF1
# A tibble: 1 x 1
# Banana
# <dbl>
#1 8.7
DF2
# A tibble: 1 x 1
# Strawberry
# <dbl>
#1 2.3
DF3
# A tibble: 1 x 1
# Apple
# <dbl>
#1 3.5
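If we need the names and values as separate columns (unnest() comes from tidyr, and group_split() now takes .keep instead of the deprecated keep):
library(tibble)
library(tidyr)
enframe(list_a) %>%
unnest(c(value)) %>%
group_split(rn = row_number(), .keep = FALSE) %>%
set_names(str_c("DF", 1:3)) %>%
list2env(.GlobalEnv)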
DF1
# A tibble: 1 x 2
# name value
# <chr> <dbl>
#1 Banana 8.7
DF2
# A tibble: 1 x 2
# name value
# <chr> <dbl>
#1 Strawberry 2.3
DF3
# A tibble: 1 x 2
# name value
# <chr> <dbl>
#1 Apple 3.5
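For comparison, a base R sketch of the same row-wise split, assuming split() is acceptable in place of group_split(). The DF1..DFn names are generated from the list length rather than hard-coded as 1:3.

```r
list_a <- list(Banana = 8.7, Strawberry = 2.3, Apple = 3.5)

# Long format: one row per list element, like enframe() + unnest()
long <- data.frame(name = names(list_a),
                   value = unlist(list_a, use.names = FALSE),
                   stringsAsFactors = FALSE)

# split() orders the pieces by factor level; fixing the levels to the
# original order keeps Banana, Strawberry, Apple in sequence
parts <- split(long, factor(long$name, levels = long$name))
names(parts) <- paste0("DF", seq_along(parts))

parts$DF2
```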

A tidyverse way would be
library(tidyverse)
new_list <- set_names(map2(list_a,names(list_a),
~tibble(!!.y := .x)), str_c("df", 1:3))
It can be done in base R as well:
new_list <- setNames(Map(function(x, y) setNames(data.frame(x), y),
list_a,names(list_a)), paste0("df", 1:3))
Now we can write it into the global environment:
list2env(new_list, .GlobalEnv)

Less straightforward than the previous answers, but you can also get it using a for loop:
for (i in seq_along(list_a)) {
  # one-column data frame named after the i-th list element
  df <- data.frame(unlist(list_a[[i]]))
  colnames(df) <- names(list_a)[i]
  assign(names(list_a)[i], df, envir = .GlobalEnv)
}

Related

how to pass column names including space in R

Assume my column names are User ID and name.
How should I pass these column names to functions like the ones below?
df %>%
group_by(User ID) %>%
count(name)
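Apparently, group_by() and similar functions do not accept column names that contain spaces.
Wrap the name in backticks. It also helps to create the data with tibble() rather than data.frame(): by default data.frame() converts `User ID` to User.ID, so the original name no longer exists.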
library(tidyverse)
df <- tibble(`User ID` = 1:2, x = 5:6)
df %>%
group_by(`User ID`) %>%
summarise(total = sum(x))
#> # A tibble: 2 × 2
#> `User ID` total
#> <int> <int>
#> 1 1 5
#> 2 2 6
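A base R sketch of why data.frame() causes trouble here and how check.names = FALSE avoids it. The object names df_base, df_keep and totals are only for illustration.

```r
# data.frame() repairs syntactically invalid names by default
df_base <- data.frame(`User ID` = 1:2, x = 5:6)
names(df_base)
#> [1] "User.ID" "x"

# check.names = FALSE keeps the space, as tibble() does
df_keep <- data.frame(`User ID` = 1:2, x = 5:6, check.names = FALSE)
names(df_keep)
#> [1] "User ID" "x"

# Base R code refers to the column with backticks or a quoted string
df_keep$`User ID`
totals <- tapply(df_keep$x, df_keep[["User ID"]], sum)
totals
```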

Iterating name of a field with dplyr::summarise function

This is my first time here; I'll try to explain my problem as clearly as possible.
I'm working on erosion data for farms stored as pixels (e.g. 1 farm = 10 pixels, so 10 rows in my df). I have 4 data frames in a list, and I would like to calculate the mean erosion for each farm. I thought about looping over the name of the erosion field, but my problem is that my data frames don't use exactly the same name (either ERO13 or ERO17). I don't want to rely on the position of the field because it could change between data frames; I only want to use the name, which varies.
Here's an example:
df1 <- data.frame(ID = c(1,1,2), ERO13 = c(2,4,6))
df2 <- data.frame(ID = c(4,4,6), ERO17 = c(4,5,12))
lst_df <- list(df1,df2)
for (df in lst_df){
cur_df <- df
cur_df <- cur_df %>%
group_by(ID) %>%
summarise(current_name_of_erosion_field = mean(current_name_of_erosion_field))
}
I tried with
for (df in lst_df){
cur_df <- df
cur_camp <- names(cur_df)[2]
cur_df <- cur_df %>%
group_by(ID) %>%
summarise(cur_camp = mean(cur_camp))
}
but, first, it doesn't work because cur_camp is treated as a character string rather than as the column it names, and second, it relies on the column position.
How can I build current_name_of_erosion_field here?
We may convert the string to a symbol and evaluate it with !!, or pass the string to across(). Also, as we are using a for loop, make sure to create a list to store the output. To assign to a column name stored in an object, use := with !!:
out <- vector('list', length(lst_df))
for (i in seq_along(lst_df)){
cur_df <- lst_df[[i]]
cur_camp <- names(cur_df)[2]
cur_df <- cur_df %>%
group_by(ID) %>%
summarise(!!cur_camp := mean(!! sym(cur_camp)))
out[[i]] <- cur_df
}
Output:
> out
[[1]]
# A tibble: 2 × 2
ID ERO13
<dbl> <dbl>
1 1 3
2 2 6
[[2]]
# A tibble: 2 × 2
ID ERO17
<dbl> <dbl>
1 4 4.5
2 6 12
Or we can use across():
out <- vector('list', length(lst_df))
for (i in seq_along(lst_df)){
cur_df <- lst_df[[i]]
cur_camp <- names(cur_df)[2]
cur_df <- cur_df %>%
group_by(ID) %>%
summarise(across(all_of(cur_camp), mean))
out[[i]] <- cur_df
}
Output:
> out
[[1]]
# A tibble: 2 × 2
ID ERO13
<dbl> <dbl>
1 1 3
2 2 6
[[2]]
# A tibble: 2 × 2
ID ERO17
<dbl> <dbl>
1 4 4.5
2 6 12
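A sketch that avoids both the positional lookup and the explicit loop, assuming each data frame has exactly one column whose name starts with ERO. The helper name summarise_erosion is hypothetical; it uses dplyr's .data pronoun to look up the column from a string, which is an alternative to sym() and !!.

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 1, 2), ERO13 = c(2, 4, 6))
df2 <- data.frame(ID = c(4, 4, 6), ERO17 = c(4, 5, 12))
lst_df <- list(df1, df2)

# Hypothetical helper: find the erosion column by name pattern, not position
summarise_erosion <- function(df) {
  cur_camp <- grep("^ERO", names(df), value = TRUE)
  stopifnot(length(cur_camp) == 1)  # assumes exactly one ERO column
  df %>%
    group_by(ID) %>%
    # .data[[...]] reads the column named by the string; := names the output
    summarise(!!cur_camp := mean(.data[[cur_camp]]))
}

out <- lapply(lst_df, summarise_erosion)
out[[2]]
```

lapply() also takes care of collecting the results, so no pre-allocated list is needed.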
A slightly different approach would be to bind the dataframes and use pivot_longer to separate the erosion name from the erosion value. Then you can take the mean of the values without having to specify the name.
library(tidyverse)
df1 <- data.frame(ID = c(1,1,2), ERO13 = c(2,4,6))
df2 <- data.frame(ID = c(4,4,6), ERO17 = c(4,5,12))
bind_rows(df1, df2) %>%
pivot_longer(starts_with('ERO'),
names_to = 'ERO',
values_drop_na = TRUE) %>%
group_by(ID, ERO) %>%
summarize(value = mean(value))
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups` argument.
#> # A tibble: 4 x 3
#> # Groups: ID [4]
#> ID ERO value
#> <dbl> <chr> <dbl>
#> 1 1 ERO13 3
#> 2 2 ERO13 6
#> 3 4 ERO17 4.5
#> 4 6 ERO17 12
Created on 2022-01-14 by the reprex package (v2.0.0)
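The same name-agnostic idea also works per data frame in base R. A sketch assuming every column other than ID is an erosion column to be averaged:

```r
df1 <- data.frame(ID = c(1, 1, 2), ERO13 = c(2, 4, 6))
df2 <- data.frame(ID = c(4, 4, 6), ERO17 = c(4, 5, 12))
lst_df <- list(df1, df2)

# aggregate() averages every column in x within each ID group and keeps
# the original column names, so ERO13/ERO17 never need to be typed
out <- lapply(lst_df, function(d) {
  value_cols <- setdiff(names(d), "ID")
  aggregate(d[value_cols], by = d["ID"], FUN = mean)
})
out[[1]]
```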

Concat a list column in R

I'm trying to concatenate characters within a list column in R.
When I try the approach below, the result is not 'abc' but the whole vector converted to a single character string.
What is the right approach?
library(tidyverse)
tibble(b=list(letters[1:3])) %>%
mutate(b = paste(b))
#> # A tibble: 1 x 1
#> b
#> <chr>
#> 1 "c(\"a\", \"b\", \"c\")"
Created on 2020-10-14 by the reprex package (v0.3.0)
Maybe you need something like the following:
library(tidyverse)
tibble(b=list(letters[1:3])) %>%
mutate(b = sapply(b,paste,collapse = ""))
giving
# A tibble: 1 x 1
b
<chr>
1 abc
Try this. Keep in mind that the column in your tibble is a list, so you can use either of these approaches:
library(tidyverse)
tibble(b=list(letters[1:3])) %>%
mutate(b = lapply(b,function(x)paste0(x,collapse = '')))
Or this:
#Code 2
tibble(b=list(letters[1:3])) %>%
mutate(b = sapply(b,function(x)paste0(x,collapse = '')))
Output:
# A tibble: 1 x 1
b
<chr>
1 abc
In the first case the result stays in a list column, whereas in the second it is a plain character vector.
We can also use map_chr() from purrr with str_c() from stringr:
library(dplyr)
library(stringr)
library(purrr)
tibble(b=list(letters[1:3])) %>%
mutate(b = map_chr(b, str_c, collapse=""))
# A tibble: 1 x 1
# b
# <chr>
#1 abc
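Why the original attempt fails, plus a type-stable base R variant: paste() converts a list with as.character(), which deparses each element. The sketch uses vapply() rather than sapply() so the result is guaranteed to be a character vector with exactly one string per element (vapply() errors otherwise). The example list b is made up for illustration.

```r
# paste() deparses each list element, which is where 'c("a", "b", "c")' comes from
paste(list(letters[1:3]))
#> [1] "c(\"a\", \"b\", \"c\")"

# vapply() applies paste(collapse = "") to each element and
# insists on one character string per element
b <- list(letters[1:3], c("x", "y"))
collapsed <- vapply(b, paste, character(1), collapse = "")
collapsed
#> [1] "abc" "xy"
```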

Using mutate_at with mutate_if

I'm in the process of creating a generic function in my package. The goal is to find columns that are percent columns, and then to use parse_number on them if they are character columns. I haven't been able to figure out a solution using mutate_at and ifelse. I've pasted a reprex below.
library(tidyverse)
df <- tibble::tribble(
~name, ~pass_percent, ~attendance_percent, ~grade,
"Jon", "90%", 0.85, "B",
"Jim", "100%", 1, "A"
)
percent_names <- df %>% select(ends_with("percent"))%>% names()
# Error due to attendance_percent already being in numeric value
if (percent_names %>% length() > 0) {
df <-
df %>%
dplyr::mutate_at(percent_names, readr::parse_number)
}
#> Error in parse_vector(x, col_number(), na = na, locale = locale, trim_ws = trim_ws): is.character(x) is not TRUE
Your attendance_percent variable is numeric, not character, and parse_number() only accepts character vectors. So a solution would be:
edited_parse_number <- function(x, ...) {
if (mode(x) == 'numeric') {
x
} else {
parse_number(x, ...)
}
}
df %>%
dplyr::mutate_at(vars(percent_names), edited_parse_number)
# name pass_percent attendance_percent grade
# <chr> <dbl> <dbl> <chr>
#1 Jon 90 0.85 B
#2 Jim 100 1 A
OR
If you don't want to use that extra function, keep only the character variables at the beginning:
percent_names <- df %>%
select(ends_with("percent")) %>%
select_if(is.character) %>%
names()
percent_names
# [1] "pass_percent"
df %>%
dplyr::mutate_at(vars(percent_names), parse_number)
# name pass_percent attendance_percent grade
# <chr> <dbl> <dbl> <chr>
# 1 Jon 90 0.85 B
# 2 Jim 100 1 A
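In dplyr 1.0 and later, the _at/_if verbs are superseded by across(). A sketch of the same "character percent columns only" selection in one step, assuming dplyr 1.0 or newer and a tidyselect version that supports where() combined with &:

```r
library(dplyr)
library(readr)

df <- tibble::tribble(
  ~name, ~pass_percent, ~attendance_percent, ~grade,
  "Jon", "90%",  0.85, "B",
  "Jim", "100%", 1,    "A"
)

# Select columns that both end in "percent" AND are character,
# then parse only those; attendance_percent is already numeric
res <- df %>%
  mutate(across(ends_with("percent") & where(is.character), parse_number))
res
```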
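Alternatively, without having to create a function, you can put an ifelse() statement into mutate_at(). Because is.character(.) returns a single value, ifelse() would return a single value for the whole column, so rowwise() is used to apply it one row at a time: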
if (percent_names %>% length() > 0) {
df <-
df %>% rowwise() %>%
dplyr::mutate_at(vars(percent_names), ~ifelse(is.character(.),
parse_number(.),
.))
}
Source: local data frame [2 x 4]
Groups: <by row>
# A tibble: 2 x 4
name pass_percent attendance_percent grade
<chr> <dbl> <dbl> <chr>
1 Jon 90 0.85 B
2 Jim 100 1 A
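A sketch of the same conditional without rowwise(), assuming dplyr 1.0 or newer: a plain if/else, unlike ifelse(), returns whichever whole vector it selects, so it can decide once per column.

```r
library(dplyr)
library(readr)

df <- tibble::tribble(
  ~name, ~pass_percent, ~attendance_percent, ~grade,
  "Jon", "90%",  0.85, "B",
  "Jim", "100%", 1,    "A"
)

# is.character() gives one TRUE/FALSE for the whole column, and if/else
# then returns the entire parsed (or untouched) column
res <- df %>%
  mutate(across(ends_with("percent"),
                ~ if (is.character(.x)) parse_number(.x) else .x))
res
```

Dropping rowwise() also means the result is an ordinary, ungrouped tibble.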

tidyverse - prefered way to turn a named vector into a data.frame/tibble

Using the tidyverse a lot, I often face the challenge of turning named vectors into a data.frame/tibble whose columns are the names of the vector.
What is the preferred, tidyverse-idiomatic way of doing this?
EDIT: This is related to this question and this GitHub issue.
So I want:
require(tidyverse)
vec <- c("a" = 1, "b" = 2)
to become this:
# A tibble: 1 × 2
a b
<dbl> <dbl>
1 1 2
I can do this via e.g.:
vec %>% enframe %>% spread(name, value)
vec %>% t %>% as_tibble
Use case example:
require(tidyverse)
require(rvest)
txt <- c('<node a="1" b="2"></node>',
'<node a="1" c="3"></node>')
txt %>% map(read_xml) %>% map(xml_attrs) %>% map_df(~t(.) %>% as_tibble)
Which gives
# A tibble: 2 × 3
a b c
<chr> <chr> <chr>
1 1 2 <NA>
2 1 <NA> 3
This is now directly supported using bind_rows (introduced in dplyr 0.7.0):
library(tidyverse)
vec <- c("a" = 1, "b" = 2)
bind_rows(vec)
#> # A tibble: 1 x 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
This quote from https://cran.r-project.org/web/packages/dplyr/news.html explains the change:
bind_rows() and bind_cols() now accept vectors. They are treated as rows by the former and columns by the latter. Rows require inner names like c(col1 = 1, col2 = 2), while columns require outer names: col1 = c(1, 2). Lists are still treated as data frames but can be spliced explicitly with !!!, e.g. bind_rows(!!! x) (#1676).
With this change, it means that the following line in the use case example:
txt %>% map(read_xml) %>% map(xml_attrs) %>% map_df(~t(.) %>% as_tibble)
can be rewritten as
txt %>% map(read_xml) %>% map(xml_attrs) %>% map_df(bind_rows)
which is also equivalent to
txt %>% map(read_xml) %>% map(xml_attrs) %>% { bind_rows(!!! .) }
The equivalence of the different approaches is demonstrated in the following example:
library(tidyverse)
library(rvest)
txt <- c('<node a="1" b="2"></node>',
'<node a="1" c="3"></node>')
temp <- txt %>% map(read_xml) %>% map(xml_attrs)
# x, y, and z are identical
x <- temp %>% map_df(~t(.) %>% as_tibble)
y <- temp %>% map_df(bind_rows)
z <- bind_rows(!!! temp)
identical(x, y)
#> [1] TRUE
identical(y, z)
#> [1] TRUE
z
#> # A tibble: 2 x 3
#> a b c
#> <chr> <chr> <chr>
#> 1 1 2 <NA>
#> 2 1 <NA> 3
The idiomatic way would be to splice the vector with !!! within a tibble() call, so the named vector elements become column definitions:
library(tibble)
vec <- c("a" = 1, "b" = 2)
tibble(!!!vec)
#> # A tibble: 1 x 2
#> a b
#> <dbl> <dbl>
#> 1 1 2
Created on 2019-09-14 by the reprex package (v0.3.0)
This works for me: c("a" = 1, "b" = 2) %>% t() %>% tbl_df() (note that tbl_df() is deprecated in current dplyr; as_tibble() is its replacement).
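Interestingly, you can use the as_tibble() method for lists to do this in one call. Calling that method directly isn't best practice, since it isn't exported; converting the vector with as.list() and calling the exported as_tibble() generic gives the same result: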
tibble:::as_tibble.list(vec)
as_tibble(as.list(c(a=1, b=2)))
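The same as.list() step works without tibble. A base R sketch, for cases where a plain data.frame is enough:

```r
vec <- c(a = 1, b = 2)

# as.list() keeps the names, and each list element becomes one column
df <- as.data.frame(as.list(vec))
df
#>   a b
#> 1 1 2
```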
