I have 2 lists inside a list in R. Each sublist contains a different number of dataframes. The data looks like this:
df1 <- data.frame(x = 1:5, y = letters[1:5])
df2 <- data.frame(x = 1:15, y = letters[1:15])
df3 <- data.frame(x = 1:25, y = letters[1:25])
df4 <- data.frame(x = 1:6, y = letters[1:6])
df5 <- data.frame(x = 1:8, y = letters[1:8])
l1 <- list(df1, df2)
l2 <- list(df3, df4, df5)
mylist <- list(l1, l2)
I want to count the total number of dataframes I have in mylist (answer should be 5, as I have 5 data frames in total).
Using lengths():
sum(lengths(mylist)) # 5
From the official documentation:
[...] a more efficient version of sapply(x, length)
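As a quick check of the quoted claim, here is a minimal sketch comparing lengths() with sapply(x, length). The list `nested` is a made-up toy example, not the asker's data.

```r
# Toy nested list (hypothetical data, for illustration only)
nested <- list(list(1, 2), list(3, 4, 5))

# lengths() returns the length of each top-level element
lengths(nested)

# It gives the same result as the sapply() form it replaces
identical(lengths(nested), sapply(nested, length))

# Summing the per-element lengths counts the inner elements
sum(lengths(nested))
```

Note that this only counts one level down, so it fits the two-level structure in the question but not deeper nesting.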
library(purrr)
mylist |> map_int(length) |> sum()
You can try
lapply(mylist,length) |> unlist() |> sum()
How about this:
sum(sapply(mylist, length))
length(unlist(mylist, recursive = FALSE)) should work.
Another possible solution:
library(tidyverse)
mylist %>% flatten %>% length
#> [1] 5
You can unlist and use length.
length(unlist(mylist, recursive = FALSE))
# [1] 5
For lists of arbitrary length, one can use rrapply::rrapply:
length(rrapply(mylist, classes = "data.frame", how = "flatten"))
# 5
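If adding a package is not an option, the same idea can be sketched in base R with a small recursive helper. The name `count_dfs` is hypothetical, and this is only one way to do it, assuming that data frames are the only thing to be counted.

```r
# Rebuild the example data from the question
df1 <- data.frame(x = 1:5, y = letters[1:5])
df2 <- data.frame(x = 1:15, y = letters[1:15])
df3 <- data.frame(x = 1:25, y = letters[1:25])
df4 <- data.frame(x = 1:6, y = letters[1:6])
df5 <- data.frame(x = 1:8, y = letters[1:8])
mylist <- list(list(df1, df2), list(df3, df4, df5))

# count_dfs() is a hypothetical helper: it counts data frames at any depth
count_dfs <- function(x) {
  # A data frame is a leaf: count it and do not descend into its columns
  if (is.data.frame(x)) return(1L)
  # A plain list: sum the counts of its elements
  if (is.list(x)) return(sum(vapply(x, count_dfs, integer(1))))
  # Anything else is not a data frame
  0L
}

count_dfs(mylist)
# 5
```

The is.data.frame() check has to come before is.list(), because a data frame is itself a list of columns.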
Related
I am looking to mutate the same variables with two or more dataframes. What is the best way to implement to reduce redundant code?
library(dplyr)
df1 <- tibble(a = 0.125068, b = 0.144623)
df2 <- tibble(a = 0.226018, b = 0.423600)
df1 <- df1 %>%
  mutate(a = round(a, 1),
         b = round(b, 2))
df2 <- df2 %>%
  mutate(a = round(a, 1),
         b = round(b, 2))
It may be interesting to put the dataframes in a list first:
my_dfs <- list(df1, df2)
Then use a loop-apply function like lapply:
lapply(my_dfs, \(x) mutate(x, a = round(a, 1),
                           b = round(b, 2)))
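If we really need the dataframes in the global environment rather than in a dedicated list, we can pipe the result of lapply() into list2env(). The list must be named first, because list2env() uses the element names as variable names:
lapply(my_dfs, \(x) mutate(x, a = round(a, 1),
                           b = round(b, 2))) |>
  setNames(c("df1", "df2")) |>
  list2env(envir = .GlobalEnv)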
You could make a function
rnd <- function(x) {
  x %>%
    mutate(a = round(a, 1),
           b = round(b, 2))
}
df1 %>% rnd()
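The function approach and the list approach combine naturally. Below is a minimal sketch that uses base R's transform() in place of dplyr::mutate() so that it runs without extra packages. The names `rnd_base` and `rounded` are hypothetical, and the input is rebuilt with data.frame() rather than tibble().

```r
# Same values as in the question, as plain data frames
df1 <- data.frame(a = 0.125068, b = 0.144623)
df2 <- data.frame(a = 0.226018, b = 0.423600)

# rnd_base() is a hypothetical base R counterpart of rnd()
rnd_base <- function(x) {
  transform(x, a = round(a, 1), b = round(b, 2))
}

# Apply the function to every data frame in a named list
rounded <- lapply(list(df1 = df1, df2 = df2), rnd_base)

rounded$df1
rounded$df2
```

Keeping the results in `rounded` avoids overwriting df1 and df2; assign them back only if the originals are no longer needed.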
This is my data:
df1 <- data.frame(x = 1:5, y = letters[1:5])
df2 <- data.frame(x = 1:15, y = letters[1:15])
df3 <- data.frame(x = 1:25, y = letters[1:25])
df4 <- data.frame(x = 1:6, y = letters[1:6])
df5 <- data.frame(x = 1:8, y = letters[1:8])
l1 <- list(df1, df2)
l2 <- list(df3, df4, df5)
mylist <- list(l1, l2)
I want to calculate the mean of the x column in all data frames inside mylist, and put them in a new empty list (or vector), like so:
mean_vec <- c(
mean(df1$x),
mean(df2$x),
mean(df3$x),
mean(df4$x),
mean(df5$x)
)
Another possible solution, based on purrr::map_depth:
library(tidyverse)
map_depth(mylist, 2, ~ mean(.x$x)) %>% unlist
#> [1] 3.0 8.0 13.0 3.5 4.5
Or using rrapply::rrapply, a solution that is now shorter thanks to @Maël's comment, for which I am grateful:
library(rrapply)
library(magrittr)
rrapply(mylist, condition = is.numeric, f = mean, how = "unlist") %>% unname
#> [1] 3.0 8.0 13.0 3.5 4.5
You can unlist your nested list and compute the mean for each:
mean_vec <- sapply(unlist(mylist, recursive = FALSE), function(dat) mean(dat$x))
mean_vec
# [1] 3.0 8.0 13.0 3.5 4.5
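A slightly stricter variant of the same idea, sketched here with vapply(), which fails loudly if any result is not a single number. The intermediate name `all_dfs` is an assumption added for readability.

```r
# Rebuild the example data from the question
df1 <- data.frame(x = 1:5, y = letters[1:5])
df2 <- data.frame(x = 1:15, y = letters[1:15])
df3 <- data.frame(x = 1:25, y = letters[1:25])
df4 <- data.frame(x = 1:6, y = letters[1:6])
df5 <- data.frame(x = 1:8, y = letters[1:8])
mylist <- list(list(df1, df2), list(df3, df4, df5))

# Remove one level of nesting: a flat list of five data frames
all_dfs <- unlist(mylist, recursive = FALSE)

# vapply() checks that every mean is a length-one numeric
mean_vec <- vapply(all_dfs, function(dat) mean(dat$x), numeric(1))
mean_vec
# 3.0 8.0 13.0 3.5 4.5
```

This assumes exactly two levels of nesting, as in the question; a deeper structure would need a recursive approach.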
Another option with rapply:
# mean() returns NA (with a warning) for the character column y, so keep only x
means <- rapply(mylist, mean)
unname(means[names(means) == "x"])
# [1] 3.0 8.0 13.0 3.5 4.5
A purrr solution
library(purrr)
library(dplyr)
mylist %>%
  map_depth(., 2, ~ .x %>% summarise(mean = mean(x, na.rm = TRUE))) %>%
  bind_rows() %>%
  pull()
I'm trying to loop through a list of data frames, dropping columns that don't match some condition. I want to change the data frames such that they're missing 1 column essentially. After executing the function, I'm able to change the LIST of data frames, but not the original data frames themselves.
df1 <- data.frame(
  a = c("John","Peter","Dylan"),
  b = c(1, 2, 3),
  c = c("yipee", "ki", "yay"))
df2 <- data.frame(
  a = c("Ray","Bob","Derek"),
  b = c(4, 5, 6),
  c = c("yum", "yummy", "donuts"))
df3 <- data.frame(
  a = c("Bill","Sam","Nate"),
  b = c(7, 8, 9),
  c = c("I", "eat", "cake"))
l <- list(df1, df2, df3)
drop_col <- function(x) {
  x <- x[, !names(x) %in% c("e", "b", "f")]
  return(x)
}
l <- lapply(l, drop_col)
When I call the list l, I get a list of data frames with the changes I want. When I call an element in the list, df1 or df2 or df3, they do not have a dropped column.
I've looked at this solution and many others, but I'm obviously missing something.
The list l and the dataframes df1, df2, etc. are independent objects; changing one has no effect on the other. One way to get the modified dataframes is to assign names to the list elements and then write them back to the global environment.
l <- lapply(l, drop_col)
names(l) <- paste0("df", 1:3)
list2env(l, .GlobalEnv)
The problem is that when you are creating l, you are filling it with copies of your data frames df1, df2, df3.
In R, it is not generally possible to pass references to variables. One workaround is to write the modified list back into an environment with list2env(), as @Ronak Shah does.
Another is to use get() and <<- to change the variable within the function.
drop_cols <- function(x) {
  for(iter in x)
    do.call("<<-", list(iter, drop_col(get(iter))))
}
drop_cols(c("df1","df2","df3"))
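The copy behaviour described above can be checked directly. This is a minimal sketch on a made-up data frame `df` (not the asker's data): modifying the copy stored in the list leaves the original variable untouched.

```r
# Hypothetical toy data frame
df <- data.frame(a = 1:3, b = 4:6)

# Putting df into a list stores a copy (by value semantics)
l <- list(df = df)

# Drop column b from the copy inside the list
l$df$b <- NULL

names(l$df)  # the list element has changed
names(df)    # the original still has both columns
```

This is why the result of lapply() has to be assigned back, either to the list or to the individual variables.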
df1 <- data.frame(
  a = c("John","Peter","Dylan"),
  b = c(1, 2, 3),
  c = c("yipee", "ki", "yay"))
df2 <- data.frame(
  a = c("Ray","Bob","Derek"),
  b = c(4, 5, 6),
  c = c("yum", "yummy", "donuts"))
df3 <- data.frame(
  a = c("Bill","Sam","Nate"),
  b = c(7, 8, 9),
  c = c("I", "eat", "cake"))
# Name the list elements:
l <- list(df1 = df1, df2 = df2, df3 = df3)
drop_col <- function(x) {
  x <- x[, !names(x) %in% c("e", "b", "f")]
  return(x)
}
l <- lapply(l, drop_col)
# View altered dfs:
View(l[["df1"]])
I have a list where the list elements are tibbles or lists that contain multiple tibbles. I would like to get a list where all the tibbles are on the same level.
How would I do that?
library(tibble)
tib_1 <- tibble(a = 1:4, b = LETTERS[1:4])
tib_2 <- tibble(c = 1:4, d = LETTERS[1:4])
tib_3 <- tibble(e = 1:4, f = LETTERS[1:4])
tib_4 <- tibble(g = 1:4, h = LETTERS[1:4])
my_list <- list(tib_1, tib_2, list(tib_3, tib_4))
desired_list <- list(tib_1, tib_2, tib_3, tib_4)
We can just use flatten
library(rlang)
out <- flatten(my_list)
Checking:
identical(desired_list, out)
#[1] TRUE
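For comparison, here is a base R sketch of the same flattening. Note that unlist(my_list, recursive = FALSE) is not enough here, because the top-level tibbles are themselves lists and would be split into their columns. The helper name `flatten_dfs` is hypothetical, and plain data frames stand in for the tibbles so the example needs no packages.

```r
# Stand-ins for tib_1 ... tib_4 (plain data frames, hypothetical names)
d1 <- data.frame(a = 1:4, b = LETTERS[1:4])
d2 <- data.frame(c = 1:4, d = LETTERS[1:4])
d3 <- data.frame(e = 1:4, f = LETTERS[1:4])
d4 <- data.frame(g = 1:4, h = LETTERS[1:4])
my_list <- list(d1, d2, list(d3, d4))

# flatten_dfs() is a hypothetical helper: it collects data frames at any depth
flatten_dfs <- function(x) {
  # Wrap a data frame in a list so that c() keeps it whole
  if (is.data.frame(x)) return(list(x))
  # Otherwise flatten each element and concatenate the resulting lists
  do.call(c, lapply(x, flatten_dfs))
}

out <- flatten_dfs(my_list)
length(out)
# 4
```

Since tibbles are also data frames, the same is.data.frame() check should apply to them, though that is not tested here.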
I would like to loop through a list of dataframes and change the column names (I want each of the columns to have the same name)
Does anyone have a solution using the following data?
df <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df2 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df3 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
x <- list(df, df2, df3)
Either using a for loop or apply? I would actually love to see both if possible.
Thanks,
Ben
Both hrbrmstr and David Arenburg's answers are perfect.
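Since the answers referred to are not reproduced here, the following is a minimal sketch of both variants that were asked for. It is not necessarily what those answers proposed. The target names in `new_names` are made up, and the list is called `dfs` here rather than `x` for readability.

```r
df <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df2 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
df3 <- data.frame(x = 1:10, y = 2:11, z = 3:12)
dfs <- list(df, df2, df3)

# Hypothetical new column names, one per existing column
new_names <- c("a", "b", "c")

# Variant 1: for loop over the list positions
renamed_loop <- dfs
for (i in seq_along(renamed_loop)) {
  names(renamed_loop[[i]]) <- new_names
}

# Variant 2: lapply() with setNames(), which returns a renamed copy
renamed_apply <- lapply(dfs, setNames, new_names)

names(renamed_loop[[1]])
names(renamed_apply[[3]])
```

Both variants change only the list elements; df, df2 and df3 in the global environment keep their original column names.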