I am merging a list of tibbles, about 80,000 of them. I think some elements are NULL or empty data frames, but I am having trouble filtering them out.
I am using the following code, with no success:
category_data_non_empty <- Filter(Negate(is.null), category_data_names)
category_data_df <- reduce(function(x ,y) merge(x, y, by=names(x)[1]), category_data_non_empty)
What other tidy approaches could I use?
And the winner was (thank you all for the help):
category_data_non_empty <- lapply(category_data_names, function(x) !is.null(dim(x))) %>% unlist(use.names = FALSE)
category_data_df <- category_data_names[category_data_non_empty] %>% bind_rows
Consider NROW in Filter to remove NULL elements and empty (zero-row) data frames from the list. Note that a bare NA element has NROW equal to 1, so it is kept.
category_data_non_empty <- Filter(NROW, category_data_names)
category_data_df <- Reduce(function(x ,y) merge(x, y, by=names(x)[1]),
category_data_non_empty)
Your current Filter(Negate(is.null), ...) call already drops the NULL elements, but it leaves empty (zero-row) data frames in the list. Written with an explicit anonymous function, it is equivalent to:
Filter(function(df) !is.null(df), category_data_names)
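To see how the two filters differ, here is a minimal base R sketch on a made-up four-element list (the toy data frames and their columns are assumptions for illustration, not the real category data):

```r
# Toy list: two real data frames, one NULL, one empty data frame (illustrative only)
category_data_names <- list(
  data.frame(id = 1:2, a = c("x", "y")),
  NULL,
  data.frame(),
  data.frame(id = 2:3, b = c(TRUE, FALSE))
)

# Drops only the NULL element; the empty data frame survives
non_null <- Filter(Negate(is.null), category_data_names)

# NROW is 0 for both NULL and zero-row data frames, so both are dropped
non_empty <- Filter(NROW, category_data_names)

# Merge the remaining data frames on their first column
merged <- Reduce(function(x, y) merge(x, y, by = names(x)[1]), non_empty)

length(non_null)
length(non_empty)
merged
```

If the tibbles share columns and should be stacked rather than joined, bind_rows on the filtered list (as in the accepted code above) is the natural replacement for the Reduce/merge step.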
I am trying to write a function to dynamically group_by every combination of a character vector.
This is how I set up my list:
stuff <- c("type", "country", "color")
stuff_ListStr <- do.call("c", lapply(seq_along(stuff), function(i) combn(stuff, i, FUN = list)))
stuff_ListChar <- sapply(stuff_ListStr, paste, collapse = ", ")
stuff_ListSym <- lapply(stuff_ListChar, as.symbol)
Then I threw it into a loop.
b <- list()
for (each in stuff_ListSym) {
a <- answers_wfh %>%
group_by(!!each) %>%
summarize(n=n())
b <- append(b, a)
}
So essentially I want to replicate this
... group_by(type),
... group_by(country),
... group_by(type, country),
... and the rest of the combinations. Then I want to put all the summaries into one list (a list of tibbles/lists).
It's totally failing. This is my error message:
Error: Column `type, country` is unknown.
Not only that, b is not giving me what I want. It's a list with length 12 already when I only expected 2 before it failed. One tibble grouped by 'type' and the second by 'country'.
I'm new to R in general but thought tidy eval was really cool and wanted to try. Any tips here?
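I think this is a tidy evaluation problem. as.symbol turns the string "type, country" into a single symbol, so dplyr looks for one column with that name, and !! cannot split it. Use !!! with rlang::syms on the character vectors in stuff_ListStr to unquote several columns at once. Also wrap a in list() when appending; otherwise append splices the tibble's columns into b one by one, which is why b was longer than expected.
b <- list()
for (each in stuff_ListStr) {
  # each is a character vector such as c("type", "country")
  a <- answers_wfh %>%
    group_by(!!!rlang::syms(each)) %>%
    summarize(n = n())
  # append the whole tibble as a single list element
  b <- append(b, list(a))
}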
I think lapply would be better than a for loop in your situation, since you want to end up with a list.
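A minimal sketch of the lapply version, assuming dplyr (1.0 or later, for the .groups argument) and rlang are installed. The answers_wfh tibble below is made-up stand-in data, and combos is a hypothetical name for the list of column combinations:

```r
library(dplyr)

# Made-up stand-in for answers_wfh (illustrative only)
answers_wfh <- tibble(
  type    = c("a", "a", "b", "b"),
  country = c("US", "UK", "US", "US"),
  color   = c("red", "red", "blue", "red")
)

stuff <- c("type", "country", "color")

# Every combination of 1, 2 and 3 column names, as a flat list of character vectors
combos <- unlist(
  lapply(seq_along(stuff), function(i) combn(stuff, i, simplify = FALSE)),
  recursive = FALSE
)

# One summary tibble per combination; lapply returns them as a list
b <- lapply(combos, function(cols) {
  answers_wfh %>%
    group_by(!!!rlang::syms(cols)) %>%
    summarize(n = n(), .groups = "drop")
})

length(b)
```

Because lapply builds the list itself, there is no need to pre-allocate b or call append inside a loop.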
Since you use variable names as arguments of functions, you might be more comfortable with data.table than dplyr. If you want the equivalent data.table implementation:
library(data.table)
setDT(answers_wfh)
lapply(stuff_ListStr, function(g) answers_wfh[, .(n = .N), by = g])
You can have a look at this blog post I wrote on the subject of SE vs NSE in dplyr and data.table
I think stuff_ListStr is enough to get what you want. You could use group_by_at, which accepts a character vector.
library(dplyr)
library(rlang)
purrr::map(stuff_ListStr, ~answers_wfh %>% group_by_at(.x) %>% summarize(n=n()))
A better option is to use count. However, count does not accept character vectors, so this needs some non-standard evaluation:
purrr::map(stuff_ListStr, ~answers_wfh %>% count(!!!syms(.x)))
I have the same problem as this guy: returning from list to data.frame after lapply
Whilst they solved his specific problem, no one actually answered his original question about how to get dataframes out of a list.
I have a list of data frames:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
And I want to filter/replace etc on them all.
So my function is:
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
And I use lapply to run the function on them all like this:
a = lapply(dfPreList, DoThis)
As the other post stated, these data frames are now stuck in this list (a), and I need a for loop to get them out, which just cannot be the correct way of doing it.
This is my current working way of applying the function to the dataframes and then getting them out:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
dfPreListstr= list('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
a = lapply(dfPreList, DoThis)
for( i in seq_along(dfPreList)){
assign(dfPreListstr[[i]], as.data.frame(a[i]))
}
Is there a way of doing this without having to rely on for loops and string names of the dataframes? I.e. a one-liner with the lapply?
Many thanks for your help
You can assign names to the list and then use list2env.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
list2env(a, .GlobalEnv)
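The same idea can be sketched without typing the names twice by building the list with mget, which returns a list already named after the objects. The toy data frames and the simplified DoThis below are assumptions for illustration (base R only, without the adorn_totals step from janitor):

```r
# Made-up stand-ins for the real data frames (illustrative only)
yearlyFunding <- data.frame(year = 2013:2019, amount = c(1, NA, 3, 4, NA, 6, 7))
yearlyPubs    <- data.frame(year = 2013:2019, pubs   = c(NA, 2, 3, NA, 5, 6, 7))

# Simplified stand-in for DoThis: filter the years, then replace NA with 0
DoThis <- function(x) {
  x <- x[x$year >= 2015 & x$year <= 2018, , drop = FALSE]
  x[is.na(x)] <- 0
  x
}

df_names <- c("yearlyFunding", "yearlyPubs")

# mget looks the objects up by name and returns a named list
a <- lapply(mget(df_names), DoThis)

# Write each element back to a variable named after its list name
list2env(a, envir = environment())

yearlyFunding
```

Keeping the data frames in the named list a and working on a$yearlyFunding directly is usually tidier than overwriting globals, but list2env does what the question asks.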
Another way would be to unlist each list element and then convert the content into a data frame.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
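yearlyFunding <- data.frame(matrix(unlist(a$yearlyFunding), nrow= nrow(a$yearlyFunding), ncol= ncol(a$yearlyFunding)))
yearlyPubs <- data.frame(matrix(unlist(a$yearlyPubs), nrow= nrow(a$yearlyPubs), ncol= ncol(a$yearlyPubs)))
yearlyAuthors <- data.frame(matrix(unlist(a$yearlyAuthors), nrow= nrow(a$yearlyAuthors), ncol= ncol(a$yearlyAuthors)))
Since unlist returns a vector, we first build a matrix and then convert it to a data frame. The dimensions must come from the processed element of a, because its row count differs from the original after filtering and adding the totals row. Note also that unlist coerces all columns to a common type and that the column names are lost.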
Is there a neat way to convert a nested data.frame to a hierarchical list?
I do it below with a for loop, but ideally there is a neater solution that generalizes to an arbitrary number of nested columns.
nested_df <- expand.grid(V1 = c('a','b','c'),
V2 = c('z','y'))%>%
group_by_all()%>%
do(x=runif(10))%>%
ungroup
nested_ls <- list()
for(v1 in unique(nested_df$V1)){
for(v2 in unique(nested_df$V2)){
nested_ls[[v1]][[v2]] <- nested_df%>%
filter(V1==v1 & V2==v2)%>%
pull(x)%>%
unlist
}
}
str(nested_ls)
If you are not very strict with the names z and y, and can also work with [[1]] and [[2]], then you can directly do,
split(nested_df$x, nested_df$V1)
If you need the names, then
lapply(split(nested_df, nested_df$V1), function(i)split(i$x, i$V2))
# Or, as @Frank mentions in the comments, we can use setNames
lapply(split(nested_df, nested_df$V1), function(i) setNames(i$x, i$V2))
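For an arbitrary number of nesting columns, one possible generalization is to apply split recursively. This is a rough sketch in base R: to_nested_list is a hypothetical helper name, and the data frame is rebuilt here without dplyr so the example is self-contained:

```r
set.seed(1)

# Same shape as nested_df: two key columns plus a list column x
nested_df <- expand.grid(V1 = c("a", "b", "c"), V2 = c("z", "y"),
                         stringsAsFactors = FALSE)
nested_df$x <- lapply(seq_len(nrow(nested_df)), function(i) runif(10))

# Hypothetical helper: split on the first key, then recurse on the remaining keys
to_nested_list <- function(df, keys, value) {
  if (length(keys) == 0) {
    # No keys left: return the stored values as a plain vector
    return(unlist(df[[value]]))
  }
  groups <- split(df, df[[keys[1]]], drop = TRUE)
  lapply(groups, to_nested_list, keys = keys[-1], value = value)
}

nested_ls <- to_nested_list(nested_df, keys = c("V1", "V2"), value = "x")
str(nested_ls)
```

Adding a third grouping column only requires extending the keys argument, for example keys = c("V1", "V2", "V3").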
I feed inputList to my custom function and, after several steps (a few simple filters), I end up with the data.frame resultDF, which needs to be split back into a list. I tried relist to give resultDF the same structure as inputList, but I got an error. Is there a simple way to relist resultDF? Sorry if this is a basic question.
Here are the input data.frames in the list:
inputList <- list(
bar=data.frame(from=c(8,18,33,53),
to=c(14,21,39,61), val=c(48,7,10,8)),
cat=data.frame(from=c(6,15,20,44),
to=c(10,17,34,51), val=c(54,21,14,12)),
foo=data.frame(from=c(11,43), to=c(36,49), val=c(49,13)))
After several workflows, I end up with this data.frame:
resultDF <- data.frame(
from=c(53,8,6,15,11,44,43,44,43),
to=c(61,14,10,17,36,51,49,51,49),
val=c(8,48,54,21,49,12,13,12,13)
)
I need to relist resultDF with the same structure as inputList. I used the relist method, but I got an error.
This is my desired list:
desiredList <- list(
bar=data.frame(from=c(8,53), to=c(14,61), val=c(48,8)),
cat=data.frame(from=c(6,15,44,44), to=c(10,17,51,51), val=c(54,21,12,12)),
foo=data.frame(from=c(11,43,43), to=c(36,49,49), val=c(49,13,13))
)
How can I achieve desiredList ? Thanks in advance :)
We can loop through the 'inputList' and check whether the pasted row elements in 'resultDF' are %in% list elements and use that index to subset the 'resultDF'
lapply(inputList, function(x) resultDF[do.call(paste, resultDF) %in% do.call(paste, x),])
Another option is a join and then split. We rbind the 'inputList' to a data.table with an additional column 'grp' specifying the list names, join with the 'resultDF' on the column names of 'resultDF', and finally split the dataset using the 'grp' column
library(data.table)
dt <- rbindlist(inputList, idcol = "grp")[resultDF, on = names(resultDF)]
split(dt[,-1, with = FALSE], dt$grp)
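The same join-then-split idea can be sketched in base R with merge, for readers who do not use data.table. The names stacked, merged and out are hypothetical, and this assumes no row appears in more than one element of inputList (true for the data in the question):

```r
inputList <- list(
  bar = data.frame(from = c(8, 18, 33, 53), to = c(14, 21, 39, 61), val = c(48, 7, 10, 8)),
  cat = data.frame(from = c(6, 15, 20, 44), to = c(10, 17, 34, 51), val = c(54, 21, 14, 12)),
  foo = data.frame(from = c(11, 43), to = c(36, 49), val = c(49, 13))
)
resultDF <- data.frame(
  from = c(53, 8, 6, 15, 11, 44, 43, 44, 43),
  to   = c(61, 14, 10, 17, 36, 51, 49, 51, 49),
  val  = c(8, 48, 54, 21, 49, 12, 13, 12, 13)
)

# Stack the list into one data frame, tagging each row with its list name
stacked <- do.call(rbind, Map(cbind, grp = names(inputList), inputList))

# Join on all columns of resultDF to recover the group of each result row
merged <- merge(resultDF, stacked, by = names(resultDF))

# Split back into a list, keeping only the original columns
out <- split(merged[names(resultDF)], merged$grp)
out
```

merge sorts by the join columns by default, so the row order inside each element follows from, not the order of resultDF.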
This works:
onion$yearone$id %in% mask$yearone
This doesn't:
onion[1][1] %in% mask[1]
onion[1]['id'] %in% mask[1]
Why? I am looking for a way to vectorize over the parallel data frames in DF and the id vectors in memberids (onion and mask in the example below), so that for each year I keep only the rows whose ids are present in both. For now I'm using a for loop, but I haven't found the right way to express the index. Help?
Example data:
yearone <- data.frame(id=c("b","b","c","a","a"),v=rnorm(5))
onion <- list()
onion[[1]] <- yearone
names(onion) <- 'yearone'
mask <- list()
mask[[1]] <- c('a','c')
names(mask) <- 'yearone'
The '$' operator is not the same as the '[' operator. If 'yearone' and 'ids' are in fact the first items in those lists, you should see that this gives the same result as the first call:
DF[[1]][[1]] %in% memberids[[1]]
Why we should think that accessing yearpathall should give the same results is entirely unclear at this point, but using the "[[" operator will possibly give an atomic vector, whereas using "[" will certainly not. On a list, the "[" operator always returns a result of the same class as its first argument, so in this case it would be a list rather than a vector, for both 'DF' and 'memberids'. The %in% operator is just an infix version of match and expects an atomic vector for both of its arguments.
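A short sketch of the difference, using the onion and mask example data from the question (with v fixed to 1:5 instead of rnorm so the example is deterministic; the variable names single, double, ids and in_mask are introduced here for illustration):

```r
yearone <- data.frame(id = c("b", "b", "c", "a", "a"), v = 1:5)
onion <- list(yearone = yearone)
mask  <- list(yearone = c("a", "c"))

# "[" keeps the container: still a list of length 1
single <- onion[1]
class(single)

# "[[" extracts the element: the data frame itself
double <- onion[[1]]
class(double)

# Extract all the way down to the id vector, then test membership
ids <- onion[[1]][["id"]]
in_mask <- ids %in% mask[[1]]
in_mask

# Same result as the "$" version from the question
identical(in_mask, onion$yearone$id %in% mask$yearone)
```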
Here is an approach using Map
# some data
onion <- replicate(5,data.frame(id = sample(letters[1:3], 5,T), v = 1:5),
simplify = F)
mask <- replicate(5, sample(letters[1:3],2), simplify = F)
names(onion) <- names(mask) <- paste0('year', seq_along(onion))
A function that will do the matching
get_matches <- function(data, id, mask){
rows <- data[[id]] %in% mask
data[rows,]
}
Map(get_matches , data = onion, mask = mask, MoreArgs = list(id = 'id'))
This seems to be the answer I was seeking:
merge(mask[1],onion[[1]], by.x = names(mask[1]), by.y = names(onion[[1]][1]))
And applied to parallel lists of dataframes:
result <- list()
for (i in seq_along(onion)) {
result[[i]] <- merge(mask[i],onion[[i]], by.x = names(mask[i]), by.y = names(onion[[i]][1]))
}