Set names of dataframes inside lists - r

I've got a rather large list that contains many dataframes of the same length. I'd like to rename all the column names in the list. I've tried to use purrr::map, but have hit various issues. Is there a better way to do this?
Here is a reprex of the approach and issues I'm having with it. Thanks.
library(tidyverse)
org_names <- names(
  starwars %>%
    select_if(Negate(is.list))
)
df <- starwars %>%
  select_if(Negate(is.list))
names(df) <- sample(LETTERS, length(df), replace = F)
df_ls <- list(df, list(df, df), list(df, df, df), df, list(df, df))
map(df_ls, function(x){
  x %>%
    set_names(org_names)
})
#> `nm` must be `NULL` or a character vector the same length as `x`

Because some of the elements are nested lists, use a condition to check whether each element is a data frame. If it is, apply set_names directly; otherwise loop over the inner list and apply set_names to each data frame in it
library(tidyverse)
map(df_ls, ~ if (is.data.frame(.x)) {
  .x %>% set_names(org_names)
} else {
  map(.x, ~ .x %>% set_names(org_names))
})
Or it can be made more compact with map_if:
out <- map_if(df_ls, is.data.frame, set_names, org_names,
              .else = ~ map(.x, set_names, org_names))
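Both versions above handle exactly one level of nesting. If the data frames could sit at arbitrary depth, a recursive helper is one option. The following is a minimal sketch in base R, not part of the original answer; `rename_nested` is a hypothetical name, and the toy data frame stands in for the starwars-derived one.

```r
# Hypothetical helper (not from the original answer): recursively rename
# every data frame found at any depth of a nested list.
rename_nested <- function(x, new_names) {
  if (is.data.frame(x)) {
    # Check for data frames first: a data frame is also a list.
    stats::setNames(x, new_names)
  } else if (is.list(x)) {
    lapply(x, rename_nested, new_names = new_names)
  } else {
    x  # leave anything that is neither a data frame nor a list untouched
  }
}

# Toy data standing in for the question's df_ls
df <- data.frame(A = 1:2, B = c("u", "v"))
nested <- list(df, list(df, list(df, df)), df)

out <- rename_nested(nested, c("id", "label"))
names(out[[2]][[2]][[1]])
# [1] "id"    "label"
```

The order of the checks matters here: testing is.list before is.data.frame would recurse into the columns of each data frame instead of renaming it.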

Related

Looping error with lists: for function not works inside purrr::map2 in R

I built a function to use inside purrr::map2 and run it over two lists. When I run the steps of the function separately, it works fine. But inside map2 it apparently runs for the first pair of elements (.x[[1]], .y[[1]]) and then, on the second round, throws an error in the for loop.
How can I find out why it's not working?
PS: It's hard to put an example of the data here because they are lists with very specific characteristics for this function. I'm sorry.
The function follows:
df <- list()
build_HUW_raster <- function(.x, .y) {
  list.time <- .x %>%
    split(.$id) %>%
    purrr::map(~list(t=as.matrix(.x$date),
                     xy=unname(as.matrix(.x[,c(22,23)])))
    )
  for(i in 1:50){
    cat(i," ")
    path=list.time[[i]]
    ctmc=ctmcmove::path2ctmc(path$xy,path$t,r,method="LinearInterp")
    df[[i]] <- as.data.frame(do.call(cbind, ctmc))
  }
  df <- df %>% purrr::map(~ group_by(., ec) %>%
                            summarise(rt = mean(rt)) %>%
                            arrange(desc(rt))
  )
  stacktime <- df %>% purrr::map(~ rename(., cell = ec)) %>%
    map(~dplyr::left_join(cargo.grid, ., by="cell", copy=T)) %>%
    map(~raster::rasterize(., r, field="rt", na.rm=F, background=0)) %>%
    raster::stack()
  stackprop <- .y %>%
    split(.$id) %>%
    purrr::map(~ raster::rasterize(., y = r,
                                   field=.$proportion,
                                   fun=function(x, ...)median(x))) %>%
    raster::stack()
  stack_huw <- raster::overlay(raster::calc(stacktime, fun=function(x)
    ifelse(is.na(x), NA, x/sum(x, na.rm=T))), stackprop, fun=function(x,y)x*y
  )
  raster_mean <- raster::stackApply(stack_huw,
                                    indices = rep(1,raster::nlayers(stack_huw)),
                                    fun = "mean",
                                    na.rm = F
  )
}
result.list <- purrr::map2(.x=list1, .y=list2, .f=build_HUW_raster)
The cause lies in which element map iterates over. [[ extracts the list element itself, and map then loops over the contents of that element: the individual values if it is a vector or matrix, or the columns if it is a data.frame, since those are its units. [, by contrast, returns a list of length one that still contains the element, so map passes the whole element to the function
list(1, 2, 3)[1]
[[1]]
[1] 1
vs
list(1, 2, 3)[[1]]
[1] 1
When the function applied inside map requires a specific structure (colSums, for example, needs a matrix or data.frame, i.e. an object with a dim attribute), it fails if the input was extracted with [[, because map then hands it one column at a time
> map(replicate(2, data.frame(col1 = 1:5, col2 = 6:10), simplify = FALSE)[[1]], colSums)
Error in .f(.x[[i]], ...) :
'x' must be an array of at least two dimensions
> map(replicate(2, data.frame(col1 = 1:5, col2 = 6:10), simplify = FALSE)[1], colSums)
[[1]]
col1 col2
15 40
Here, we may change the code to
purrr::map2(.x=list1[1], .y=list2[1], .f=build_HUW_raster)
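On the question of how to find out which iteration fails and why, one general option is to wrap the function with purrr::safely, so that map2 finishes and returns the error for each pair instead of stopping. This is a sketch only, not part of the original answer; `risky` is a hypothetical stand-in for build_HUW_raster, whose data are not available here.

```r
library(purrr)

# Hypothetical stand-in for a function that fails on some inputs.
risky <- function(x, y) {
  if (y == 0) stop("y must be non-zero")
  x / y
}

# safely() returns a wrapper that never throws: each call yields a list
# with a `result` and an `error` component (one of them is NULL).
safe_risky <- safely(risky)

res <- map2(list(1, 2, 3), list(1, 0, 2), safe_risky)

# Indices of the iterations that raised an error
failed <- which(!map_lgl(res, ~ is.null(.x$error)))
failed
# [1] 2

# The message of the captured error
conditionMessage(res[[2]]$error)
# [1] "y must be non-zero"
```

Once the failing index is known, the corresponding pair of inputs can be inspected on its own, outside map2.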

How do I use map to drop a variable from a list of data frames

How do I drop a variable with the same name in a list of dataframes using map? Sadly the variable appears in a different position in each data frame, so I can't drop it using its position. It has to be with its name.
var1<-rnorm(100)
var2<-sample(letters, 100, replace=T)
var3<-rnorm(100)
df<-data.frame(var1, var2, var3)
df2<-data.frame(var1, var3, var2)
list1<-list(df, df2)
library(purrr)
#This works, but it won't help me because var2 is in different positions.
list1 %>%
map(., `[`, -2)
#This does not work.
list1 %>%
map(., `[`, -c("var2"))
With dplyr loaded, select can drop the column by name
library(dplyr)
map(list1, ~ .x %>% select(-var2))
Or, if the column name is stored in a character variable, using a curly-curly expression
name_excl <- "var2"
map(list1, ~ .x %>% select(-{{name_excl}}))
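When the column name is held in a character variable, two other spellings may be worth knowing. The sketch below is not from the original answer: it uses the tidyselect helper all_of, and a base R alternative built on setdiff that needs no packages. The small data frames are stand-ins for the ones in the question.

```r
library(dplyr)
library(purrr)

# Toy data: var2 sits in a different position in each data frame
df  <- data.frame(var1 = 1:3, var2 = c("a", "b", "c"), var3 = 4:6)
df2 <- df[c("var1", "var3", "var2")]
list1 <- list(df, df2)

name_excl <- "var2"

# tidyselect: all_of() selects columns named in a character vector
out_tidy <- map(list1, ~ select(.x, -all_of(name_excl)))

# base R: keep every column whose name is not in name_excl
out_base <- lapply(list1, function(d) d[setdiff(names(d), name_excl)])

names(out_tidy[[2]])
# [1] "var1" "var3"
```

all_of raises an error if the column is missing from a data frame; any_of is the lenient variant that silently skips missing names.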

How to apply functions over a list of list of dataframes?

I've got a list, state_list, which contains 4 lists (wa, tex, cin and ohi), each of which contains around 60 dataframes. I want to apply the same functions to these dataframes. For example, I want to add a new column with a mean, like this:
library(dplyr)
df # example df from one of the lists
df %>% group_by(x) %>% mutate(mean_value = mean(value))
How can I do this?
We can use a nested map to loop over the list
library(purrr)
library(dplyr)
out <- map(state_list, ~ map(.x, ~ .x %>%
                               group_by(x) %>%
                               mutate(mean_value = mean(value))))
Or using base R
out <- lapply(state_list, function(lst1) lapply(lst1,
function(dat) transform(dat, mean_value = ave(value, x))))
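To see what the nested loop produces, here is a small self-contained run of the base R version. The toy state_list below is an assumption made for illustration (two states with one data frame each); the real list is not shown in the question.

```r
# Toy stand-in for state_list: a list of lists of data frames
state_list <- list(
  wa  = list(data.frame(x = c("a", "a", "b"), value = c(1, 3, 10))),
  tex = list(data.frame(x = c("a", "b", "b"), value = c(2, 4, 6)))
)

# Outer lapply walks the states, inner lapply walks each state's data frames;
# ave() returns the group mean of `value` within each level of `x`.
out <- lapply(state_list, function(lst1) {
  lapply(lst1, function(dat) transform(dat, mean_value = ave(value, x)))
})

out$wa[[1]]$mean_value
# [1]  2  2 10
out$tex[[1]]$mean_value
# [1] 2 5 5
```

The shape of the input is preserved: out is still a list of lists, with the list names (wa, tex) carried over.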

R: Rename some variables in list of dataframes to match others

I have a list of data frames with inconsistent but overlapping variables. Some of the shared variables have similar but not identical names. I would like to conditionally rename the variable so that it is consistent across datasets. The way to do this one at a time would be
library(tidyverse)
df_1 <- starwars
df_2 <- starwars %>% rename(haircolor = hair_color)
df_3 <- starwars
df_list <- list(df_1, df_2, df_3)
df_list[[2]] <- df_list[[2]] %>% rename(hair_color = haircolor)
But I would like this to be flexible such that I can just feed in a list of any size and it will rename any variable titled hair_color as haircolor. Is there a way to purrr::map over these in a way that renames conditionally on the variable existing? The most basic interpretation would look something like:
df_list %>%
purrr::map( ~ rename(., hair_color = haircolor))
We can target the column with a select helper inside rename_at, so that it is renamed only in the data frames where it exists
library(dplyr)
library(purrr)
df_list %>%
purrr::map( ~ .x %>%
rename_at(vars(matches('hair_color')), ~ 'haircolor'))
Or use an if/else condition
df_list %>%
purrr::map( ~ if('hair_color' %in% names(.)) {
rename(., haircolor = hair_color)
} else .)
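In dplyr 1.0.0 and later, rename also accepts tidyselect helpers, which gives another way to express "rename only if present". The following is a sketch rather than part of the original answer; `lookup` is a name chosen for illustration, and the two small data frames stand in for the starwars copies.

```r
library(dplyr)
library(purrr)

# Toy data: one frame has the old name, the other already has the new one
df_a <- data.frame(name = "x", hair_color = "brown")
df_b <- data.frame(name = "y", haircolor = "black")

# Named character vector in the form new_name = "old_name"
lookup <- c(haircolor = "hair_color")

# any_of() skips names that are absent, so df_b passes through unchanged
out <- map(list(df_a, df_b), ~ rename(.x, any_of(lookup)))

map(out, names)
# [[1]]
# [1] "name"      "haircolor"
#
# [[2]]
# [1] "name"      "haircolor"
```

Because the lookup is an ordinary named vector, more old/new pairs can be added to it without touching the map call.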

Split a data.frame by group into a list of vectors rather than a list of data.frames

I have a data.frame which maps an id column to a group column, and the id column is not unique because the same id can map to multiple groups:
set.seed(1)
df <- data.frame(id = paste0("id", sample(1:10,300,replace = T)), group = c(rep("A",100), rep("B",100), rep("C",100)), stringsAsFactors = F)
I'd like to convert this data.frame into a list where each element is the ids in each group.
This seems a bit slow for the size of data I'm working with:
library(dplyr)
df.list <- lapply(unique(df$group), function(g) dplyr::filter(df, group == g)$id)
So I was thinking about this:
df.list <- df %>%
dplyr::group_by(group) %>%
dplyr::group_split()
Assuming it is faster than my first option, any idea how to get it to return the same output as in the first option rather than a list of data.frames?
Using base R only, with split. It should be faster than filtering with == once for each unique group
with(df, split(id, group))
Or, with tidyverse, pull the column after group_split. group_split returns a list of data.frames/tibbles, so it could be slower than the split-only method above. We can recover some performance by dropping the group column (.keep = FALSE; older dplyr releases spelled this argument keep) and then pulling the 'id' column from each list element to create the list of vectors
library(dplyr)
library(purrr)
df %>%
  group_split(group, .keep = FALSE) %>%
  map(~ .x %>%
        pull(id))
Or use {} with pipe
df %>%
{split(.$id, .$group)}
Or wrap with with
df %>%
with(., split(id, group))
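One difference between the approaches is the shape of the result: split returns a list named by group, while the lapply over unique(df$group) in the question returns an unnamed list. A small base R sketch on toy data (not the 300-row frame from the question) shows this:

```r
# Toy data: the same id can appear in more than one group
df <- data.frame(
  id    = c("id1", "id2", "id1", "id3"),
  group = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# split() returns one character vector of ids per group, named by group
by_group <- split(df$id, df$group)
by_group$A
# [1] "id1" "id2"
by_group$B
# [1] "id1" "id3"

# Drop the names if the unnamed shape of the lapply() version is needed
unnamed <- unname(by_group)
```

Keeping the names is usually convenient, since each vector can then be looked up by its group label.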
