Is there a neat way to convert a nested data.frame to a hierarchical list?
I do it below with a for loop, but ideally there is a neater solution that generalizes to an arbitrary number of nested columns.
library(dplyr)

nested_df <- expand.grid(V1 = c('a','b','c'),
                         V2 = c('z','y')) %>%
  group_by_all() %>%
  do(x = runif(10)) %>%
  ungroup()
nested_ls <- list()
for (v1 in unique(nested_df$V1)) {
  for (v2 in unique(nested_df$V2)) {
    nested_ls[[v1]][[v2]] <- nested_df %>%
      filter(V1 == v1 & V2 == v2) %>%
      pull(x) %>%
      unlist()
  }
}
str(nested_ls)
If you don't strictly need the names z and y and can work with [[1]] and [[2]] instead, you can simply do:
split(nested_df$x, nested_df$V1)
If you need the names, then:
lapply(split(nested_df, nested_df$V1), function(i)split(i$x, i$V2))
# Or, as @Frank mentions in the comments, we can use setNames
lapply(split(nested_df, nested_df$V1), function(i) setNames(i$x, i$V2))
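The question also asks for something that generalizes to an arbitrary number of nesting columns. One possible sketch is a small recursive wrapper around split. The helper name nest_split and the toy data frame demo_df below are made up for illustration, and the sketch assumes the values live in a list-column as in the question:

```r
# Hypothetical helper (not from the original answers): split `df` by each
# column named in `keys`, in order, and return the `value` column at the leaves.
nest_split <- function(df, keys, value) {
  if (length(keys) == 0) {
    # Leaf: flatten the list-column entries for this group into one vector
    return(unlist(df[[value]]))
  }
  # drop = TRUE discards grouping levels that have no rows in this subset
  groups <- split(df, df[[keys[1]]], drop = TRUE)
  lapply(groups, nest_split, keys = keys[-1], value = value)
}

# Toy data shaped like the question's nested_df (base R only)
set.seed(1)
demo_df <- expand.grid(V1 = c("a", "b", "c"), V2 = c("z", "y"),
                       stringsAsFactors = FALSE)
demo_df$x <- lapply(seq_len(nrow(demo_df)), function(i) runif(10))

nested <- nest_split(demo_df, keys = c("V1", "V2"), value = "x")
str(nested)
```

Adding a third grouping column would only require a longer keys vector. Note that split orders the groups by sorted key, so the inner names come out as y, z rather than in order of appearance.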
This should be a simple problem to solve, but I am unable to get the exact output I would like. I have a nested list of data frames, and I would like to remove every data frame with fewer than 50 rows from the list.
Here's a reproducible example of what I have tried:
L <- list(iris,mtcars,iris)
O <- list(iris,mtcars,iris)
H <- list(iris,mtcars,iris)
List <- list(L,O,H)
test <- lapply(List, function(x) lapply(x, function(x) if (nrow(x)<50) NULL else x))
This works for the first list, but it replaces the mtcars data frames in the nested lists with NULL; it doesn't remove them from the list. Unfortunately, it also doesn't loop through the other lists. I have also tried using the filter function:
test <- lapply(List, function(x) lapply(x, function(x) filter(x, nrow(x)>50)))
This has the same issue of not looping through all the lists, and for the first list it leaves me with an empty data frame that is still an element of the list. My last attempt was a for loop, which I tried on just the first list in the nest. It mostly worked, but I'd like to find a less clunky way to do this if possible. It also returns an error: Error in List[[1]][[ii]] : subscript out of bounds
for (ii in seq_along(List[[1]])){
n_rows = nrow(List[[1]][[ii]])
if (n_rows < 20){
List[[1]][[ii]] = NULL
}
}
I am hopeful there is a simple solution just around the corner!
One option could be:
lapply(List, function(x) Filter(function(y) nrow(y) >= 50, x))
With purrr library:
List %>% map(~keep(.x, ~nrow(.x) >= 50))
Here is an option with sapply/lapply
lapply(List, function(x) x[sapply(x, nrow)>=50])
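To see why these behave differently from the attempts in the question, here is a small sketch using the built-in iris (150 rows) and mtcars (32 rows) data sets. It also shows one way the for loop could be made to work, by iterating in reverse; that variant is an illustration, not part of the original answers:

```r
inner <- list(iris, mtcars, iris)
List <- list(inner, inner, inner)

# lapply always returns one element per input, so a NULL result keeps its slot
with_nulls <- lapply(inner, function(x) if (nrow(x) < 50) NULL else x)
length(with_nulls)   # still 3; the middle element is NULL

# Filter subsets the list instead, so the short data frames disappear
kept <- lapply(List, function(x) Filter(function(y) nrow(y) >= 50, x))
lengths(kept)        # one count per inner list

# Assigning NULL to a list element shortens the list, which is what leads to
# "subscript out of bounds" when looping forwards over the original indices.
# Looping in reverse avoids touching an index that no longer exists.
looped <- inner
for (ii in rev(seq_along(looped))) {
  if (nrow(looped[[ii]]) < 50) looped[[ii]] <- NULL
}
length(looped)
```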
I have the same problem as this guy: returning from list to data.frame after lapply
Whilst they solved his specific problem, no one actually answered his original question about how to get dataframes out of a list.
I have a list of data frames:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
And I want to filter/replace etc on them all.
So my function is:
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
And I use lapply to run the function on them all like this:
a = lapply(dfPreList, DoThis)
As the other post stated, these data frames are now stuck in this list (a), and I need a for loop to get them out, which just cannot be the correct way of doing it.
This is my current working way of applying the function to the dataframes and then getting them out:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
dfPreListstr= list('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
a = lapply(dfPreList, DoThis)
for( i in seq_along(dfPreList)){
assign(dfPreListstr[[i]], as.data.frame(a[i]))
}
Is there a way of doing this without having to rely on for loops and string names of the dataframes? I.e. a one-liner with the lapply?
Many thanks for your help
You can assign names to the list and then use list2env.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
list2env(a, .GlobalEnv)
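A self-contained sketch of the same idea follows. The two toy data frames and the do_this function are simplified stand-ins (base R only, without adorn_totals) for the question's objects, and the results are copied into a fresh environment rather than .GlobalEnv so that nothing in the workspace is overwritten:

```r
# Toy stand-ins for the question's data frames (assumed to have a `year` column)
yearlyFunding <- data.frame(year = 2013:2019, amount = c(1, NA, 3, 4, NA, 6, 7))
yearlyPubs    <- data.frame(year = 2013:2019, n = 1:7)

# Simplified stand-in for DoThis(): filter by year, then replace NA with 0
do_this <- function(x) {
  x <- x[x$year >= 2015 & x$year <= 2018, , drop = FALSE]
  x[is.na(x)] <- 0
  x
}

# Naming the list up front means lapply() carries the names through,
# so no separate vector of name strings is needed
processed <- lapply(
  list(yearlyFunding = yearlyFunding, yearlyPubs = yearlyPubs),
  do_this
)

# Copy each list element into an environment under its own name
target <- new.env()
list2env(processed, envir = target)
ls(target)
```

Passing envir = .GlobalEnv instead would overwrite the original data frames in place, as in the answer above. Keeping the results in the named list and using processed$yearlyFunding is often enough on its own.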
Another way would be to unlist the list, then convert the content into data frame.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
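yearlyFunding <- data.frame(matrix(unlist(a$yearlyFunding), nrow = nrow(a$yearlyFunding), ncol = ncol(a$yearlyFunding)))
yearlyPubs <- data.frame(matrix(unlist(a$yearlyPubs), nrow = nrow(a$yearlyPubs), ncol = ncol(a$yearlyPubs)))
yearlyAuthors <- data.frame(matrix(unlist(a$yearlyAuthors), nrow = nrow(a$yearlyAuthors), ncol = ncol(a$yearlyAuthors)))
Since unlist returns a vector, we first build a matrix and then convert it to a data frame. The dimensions are taken from the processed elements of a, because DoThis changes the number of rows. Note that this route drops the original column names and coerces every column to a single type, so assigning a$yearlyFunding directly is usually simpler.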
I am merging a list of tibbles, 80000 of them to be precise. I think some of them are NULL or empty data frames, but I am having trouble filtering them out.
I am using the following code, with no success:
category_data_non_empty <- Filter(Negate(is.null), category_data_names)
category_data_df <- reduce(function(x ,y) merge(x, y, by=names(x)[1]), category_data_non_empty)
What other tidy ways could I try?
And the winner was (thank you all for the help):
category_data_non_empty <- lapply(category_data_names, function(x) !is.null(dim(x))) %>% unlist(use.names = FALSE)
category_data_df <- category_data_names[category_data_non_empty] %>% bind_rows
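Consider using NROW in Filter to remove NULL elements and empty data frames from the list. Note also that base Reduce takes the function first and the list second, unlike purrr::reduce, which takes the list first.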
category_data_non_empty <- Filter(NROW, category_data_names)
category_data_df <- Reduce(function(x ,y) merge(x, y, by=names(x)[1]),
category_data_non_empty)
Your current attempt, Filter(Negate(is.null), ...), does drop the NULL elements, but it leaves empty (zero-row) data frames in the list. Written with an explicit anonymous function, it is equivalent to:
Filter(function(df) !is.null(df), category_data_names)
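The difference between the two predicates can be checked on a small made-up list (the element names and contents below are illustrative only):

```r
items <- list(
  a = data.frame(id = 1:2, v = c(10, 20)),
  b = NULL,                                          # a NULL element
  c = data.frame(id = integer(0), v = numeric(0)),   # a zero-row data frame
  d = data.frame(id = 3L, v = 30)
)

# NROW() is 0 for NULL and for zero-row data frames; Filter treats 0 as FALSE
non_empty <- Filter(NROW, items)
names(non_empty)

# Negate(is.null) only removes the NULL element; the empty data frame stays
non_null <- Filter(Negate(is.null), items)
names(non_null)

# The surviving data frames can then be stacked
combined <- do.call(rbind, non_empty)
nrow(combined)
```

One caveat: NROW(NA) is 1, so a bare NA element would pass the NROW filter and would need its own check.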
I feed inputList to my custom function and, after a few simple filtering steps, end up with a data.frame, resultDF, which needs to be turned back into a list. I used relist to give resultDF the same structure as inputList, but I got an error. Is there a simple way to relist resultDF? Sorry for the simple question.
Here is input data.frame within the list:
inputList <- list(
bar=data.frame(from=c(8,18,33,53),
to=c(14,21,39,61), val=c(48,7,10,8)),
cat=data.frame(from=c(6,15,20,44),
to=c(10,17,34,51), val=c(54,21,14,12)),
foo=data.frame(from=c(11,43), to=c(36,49), val=c(49,13)))
After several workflows, I end up with this data.frame:
resultDF <- data.frame(
from=c(53,8,6,15,11,44,43,44,43),
to=c(61,14,10,17,36,51,49,51,49),
val=c(8,48,54,21,49,12,13,12,13)
)
I need to relist resultDF with the same structure as inputList. I used the relist method, but I got an error.
This is my desired list:
desiredList <- list(
bar=data.frame(from=c(8,53), to=c(14,61), val=c(48,8)),
cat=data.frame(from=c(6,15,44,44), to=c(10,17,51,51), val=c(54,21,12,12)),
foo=data.frame(from=c(11,43,43), to=c(36,49,49), val=c(49,13,13))
)
How can I achieve desiredList ? Thanks in advance :)
We can loop through the 'inputList' and check whether the pasted row elements in 'resultDF' are %in% list elements and use that index to subset the 'resultDF'
lapply(inputList, function(x) resultDF[do.call(paste, resultDF) %in% do.call(paste, x),])
Another option is a join and then split. We rbind the 'inputList' to a data.table with an additional column 'grp' specifying the list names, join with the 'resultDF' on the column names of 'resultDF', and finally split the dataset using the 'grp' column
library(data.table)
dt <- rbindlist(inputList, idcol = "grp")[resultDF, on = names(resultDF)]
split(dt[,-1, with = FALSE], dt$grp)
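As a quick check, the first (base R) approach can be run on the question's own data. This is only a sketch of how one might verify it; the variable name res is introduced here for illustration:

```r
inputList <- list(
  bar = data.frame(from = c(8, 18, 33, 53), to = c(14, 21, 39, 61),
                   val = c(48, 7, 10, 8)),
  cat = data.frame(from = c(6, 15, 20, 44), to = c(10, 17, 34, 51),
                   val = c(54, 21, 14, 12)),
  foo = data.frame(from = c(11, 43), to = c(36, 49), val = c(49, 13))
)
resultDF <- data.frame(
  from = c(53, 8, 6, 15, 11, 44, 43, 44, 43),
  to   = c(61, 14, 10, 17, 36, 51, 49, 51, 49),
  val  = c(8, 48, 54, 21, 49, 12, 13, 12, 13)
)

# Build one string key per row, then keep the resultDF rows whose key
# also appears in the corresponding element of inputList
result_keys <- do.call(paste, resultDF)
res <- lapply(inputList, function(x) {
  resultDF[result_keys %in% do.call(paste, x), ]
})

sapply(res, nrow)   # rows per list element
```

The rows in each element keep the order they have in resultDF, so bar comes back as 53 then 8 rather than sorted as in desiredList. Because the key is built with paste and a space separator, this relies on the pasted values being unambiguous, which holds for plain numeric columns like these.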
This should be a simple one, I hope. I have several data frames loaded into my workspace, labelled df01 to df100, with not all numbers represented. I'd like to plot a specific column across all datasets, for example in a box plot. How do I refer to all objects starting with df using globbing, i.e.:
boxplot(df00$col1, df02$col1, df04$col1)
=
boxplot(df*$col1)
The idiomatic approach is to work with lists, or to use a separate environment.
You can create this list using ls with a pattern:
df.names <- ls(pattern = '^df')
# note
# ls(pattern ='^df[[:digit:]]{2,}')
# may be safer if there are objects starting with df you don't want
df.list <- mget(df.names)
# note if you are using a version of R prior to R 3.0.0
# you will need `envir = parent.frame()`
# mget(ls(pattern = 'df'), envir = parent.frame())
# use `lapply` to extract the relevant columns
df.col1 <- lapply(df.list, '[[', 'col1')
# call boxplot
boxplot(df.col1)
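Here is a self-contained sketch of the same steps, using a separate environment and the stricter pattern so that unrelated objects are skipped. The environment env and the toy objects in it are made up for illustration:

```r
# A separate environment holding two toy data frames and one unrelated object
env <- new.env()
assign("df00", data.frame(col1 = c(1, 2, 3)), envir = env)
assign("df02", data.frame(col1 = c(4, 5, 6)), envir = env)
assign("dframe_notes", "not a data frame", envir = env)

# Match only names of the form df followed by two or more digits
df_names <- ls(envir = env, pattern = "^df[[:digit:]]{2,}$")

# Collect the matching objects into a named list
df_list <- mget(df_names, envir = env)

# Extract one column from each data frame; the names carry over as box labels
col1 <- lapply(df_list, `[[`, "col1")
boxplot(col1)
```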
Try this:
nums <- sprintf("%02d", 0:100)
dfs.names <- Filter(exists, paste0("df", nums))
dfs.obj <- lapply(dfs.names, get)
dfs.col1 <- lapply(dfs.obj, `[[`, "col1")
do.call(boxplot, dfs.col1)