Remove Non-Matching Dataframe Names Nested in A List - r

I have two lists consisting of dataframes - df_quintile and disease_df_quintile. I do not know how to represent them concisely, but this is what they look like in RStudio:
Notice that df_quintile consists of 5 dataframes (dataframes 1 through 5), while disease_df_quintile consists of 4 (dataframes 2 through 5). I would like to cross-check both lists and remove any dataframes that are not shared by both - so in this case, I would like to remove the first dataframe from the df_quintile list. How can I achieve this?
Thank you.

Independently of the content of the lists, you can first find the common names and then subset the lists:
##-- Fake lists
l1 <- as.list(1:5)
names(l1) <- 1:5
l2 <- as.list(2:5)
names(l2) <- 2:5
##-- Common names and subsetting
common_names <- intersect(names(l1), names(l2))
l1 <- l1[common_names]
l2 <- l2[common_names]

You can match the lists' names and keep the common ones.
# positions in df_quintile of the names that also appear in disease_df_quintile
keep <- match(names(disease_df_quintile), names(df_quintile))
new_df_quintile <- df_quintile[keep]
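For a quick check, here is a minimal sketch with named stand-ins for the two lists (the real lists hold dataframes; the element names "df1" to "df5" are assumed purely for illustration):
df_quintile <- setNames(as.list(1:5), paste0("df", 1:5))
disease_df_quintile <- setNames(as.list(2:5), paste0("df", 2:5))
keep <- match(names(disease_df_quintile), names(df_quintile))
new_df_quintile <- df_quintile[keep]
names(new_df_quintile)
# [1] "df2" "df3" "df4" "df5"   (the non-shared first element is gone)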

Related

Split a dataframe into a list of nested data frames and matrices

I'd like to split the diamonds data frame into a list of 5 dataframes, grouped by cut. This documentation got me started:
https://dplyr.tidyverse.org/reference/group_split.html
library(ggplot2); library(dplyr)   # for the diamonds dataset and group_split()
diamonds_g <- diamonds %>% group_split(cut) %>% setNames(unique(diamonds$cut))
My desired output is a list of 5 nested lists. Each nested list contains one data frame and one matrix, such that:
View(diamonds_g[[1]])
factors <- diamonds_g[[1]][2:4]
mat <- diamonds_g[[1]][6:10]
So each nested list (one per cut) contains one data frame named factors with n rows (depending on how many diamonds are classified as that cut) and 3 columns, and one matrix named mat with n rows and 5 columns. In other words, the lowest level of the list (the nested data frame and matrix) should have identical names across the 5 nested lists. How do I proceed?
Thank you.
Do you mean something like this?
result <- lapply(diamonds_g, function(x)
  list(factors = x[2:4], mat = as.matrix(x[6:10])))
We can use the tidyverse:
library(dplyr)
library(purrr)
result <- map(diamonds_g, ~ list(factors = .x[2:4], mat = as.matrix(.x[6:10])))
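A side note on the setup code, not part of either answer: group_split() returns the groups in the order of the factor levels of cut, so naming with levels(diamonds$cut) rather than unique(diamonds$cut) keeps the labels aligned with their contents. A quick structural check on result (either answer above) might look like this; the column positions assume the standard diamonds column order:
sapply(result, names)        # every element should expose "factors" and "mat"
dim(result[[1]]$factors)     # n x 3  (cut, color, clarity)
dim(result[[1]]$mat)         # n x 5  (table, price, x, y, z)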

How to split a list of vectors into sub-lists of increasing size

I have a list of n vectors. I would like to split it into sub-lists where the number of vectors in each sub-list increases sequentially from one sub-list to the next. For example,
if I have a list with 6 vectors, then I would like to split it as follows: the first sub-list contains one vector, the second contains 2 vectors, and so on.
Suppose I have the list x as follows:
x <- list(x1=c(1,2,3), x2=c(1,4,3), x3=c(3,4,6), x4=c(4,8,4), x5=c(4,33,4), x6=c(9,6,7))
Then, I would like to split it into 3 lists,
list1 = x1
list2 = list(x2, x3)
list3 = list(x4,x5, x6)
I asked a similar question (How to splitting a list of vectors to small lists in decreasing order in r), but for decreasing order.
How can I generalize this to an arbitrary number of vectors - for example, to 10 or 20 vectors?
Any ideas, please?
I'd stick them all in a list of lists
MyLists <- list()
i <- 1
for (inc in 1:3) {
  MyLists[[inc]] <- x[i:(i + inc - 1)]
  i <- i + inc
}
Now MyLists[[1]] is list1, etc.
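To address the follow-up about 10 or 20 vectors, here is a minimal sketch (not from the answer above) that derives the number of sub-lists from length(x), assuming that length is a triangular number (1, 3, 6, 10, 15, 21, ...):
split_increasing <- function(x) {
  n <- length(x)
  k <- (sqrt(8 * n + 1) - 1) / 2        # solve k * (k + 1) / 2 = n
  stopifnot(k == floor(k))              # length must be a triangular number
  split(x, rep(seq_len(k), seq_len(k))) # group sizes 1, 2, ..., k
}
split_increasing(x)   # with 6 vectors this gives the same grouping as MyLists, named "1", "2", "3"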
Building off farnsy's answer: if you need each list as a separate indexed object in the global environment, you could do something like this.
# your starter list
x <- list(x1=c(1,2,3), x2=c(1,4,3), x3=c(3,4,6),
          x4=c(4,8,4), x5=c(4,33,4), x6=c(9,6,7))
# using a paste/parse/eval approach to build and evaluate a string
i <- 1
for (inc in 1:3) {
  eval(parse(text =
    paste0("list", inc, " <- list(",
           paste0("x$", names(x)[i:(i + inc - 1)], collapse = ","),
           ")")
  ))
  i <- i + inc
}
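If the goal is simply to have list1, list2, ... as separate objects, an assign()-based sketch avoids eval(parse()); note that the sub-lists keep their element names here, unlike the string-built version:
i <- 1
for (inc in 1:3) {
  assign(paste0("list", inc), x[i:(i + inc - 1)])  # creates list1, list2, list3
  i <- i + inc
}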

merge list of lists in R

I have a list of lists, where some lists are NULL (contain nothing), and some lists contain 12 columns and 1 row. Let's say this list of lists is named pages.
I would like to merge the lists that contain the 12 columns and 1 row into a dataframe, so that I have a final dataframe of 12 columns and x rows.
I first tried:
final_df <- Reduce(function(x,y) merge(x, y, all=TRUE), pages)
which yielded a dataframe with the right 12 columns, but no rows, so it was empty.
I then tried:
listofvectors <- list()
for (i in 1:length(pages)) {listofvectors <- c(listofvectors, pages[[i]])}
which just appended the elements of every list one after another.
I finally tried playing with:
final<-do.call(c, unlist(pages, recursive=FALSE))
which only resulted in a very long value.
What am I missing? Who can help me out? Thanks a lot for your input.
The merge function is for joining data on common column values (commonly called a join). You need to use rbind instead (the r is for row; use cbind to stick columns together).
do.call(rbind, pages) # equivalent to rbind(pages[[1]], pages[[2]], ...)
do.call(rbind, pages[lengths(pages) > 0]) # removing the 0-length elements
If you have additional issues, please provide a reproducible example in your question. This code works on this example:
x = list(data.frame(x = 1), NULL, data.frame(x = 2))
do.call(rbind, x)
# x
# 1 1
# 2 2
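If dplyr is already in use, bind_rows() is a common alternative; a minimal sketch, assuming the non-NULL elements are data.frames with matching columns:
library(dplyr)
final_df <- bind_rows(pages)   # NULL elements are skipped automatically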

Converting specific parts of lists to a dataframe

I have a large list of 2 elements containing lists of species containing lists of 25 vectors, resembling a set like this:
l1 <- list(time=runif(100), space=runif(100))
l2 <- list(time=runif(100), space=runif(100))
list1 <- list(test1=list(species1=l1, species2=l2),test2=list(species1=l1, species2=l2))
I think it's essentially a list of lists of lists of vectors.
I want to create a data.frame from all space-vectors of all 'species' in just one of the two sublists:
final <- as.data.frame(cbind(unlist(list1[[2]]$species1$space), unlist(list1[[2]]$species2$space)))
names(final) <- names(list1[[2]])
Essentially, I need a loop/apply command that navigates through the species elements of list1[[2]] and picks all the vectors called space.
Thank you very much!
We can use a nested loop to extract the 'space' elements
data.frame(lapply(list1, function(x)
  sapply(x, "[", 'space')))
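If only one of the two top-level elements is wanted, as the question states, a sketch restricted to list1[[2]] could look like this (each column is one species' space vector):
final <- as.data.frame(lapply(list1[[2]], function(sp) sp$space))
names(final)   # "species1" "species2"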

intersecting across 10 large data sets and merging automatically

I have 10 data.frames with 2 columns with names s and p. s is for sequence and p is for p-values. I want to find the sequences that intersect across all data.frames, so I did this:
# 10 data.frames are a, b, c, ..., j
masterseq_list <- Reduce(intersect, list(a$s, b$s, c$s, d$s, e$s, f$s, g$s,h$s, i$s,j$s))
I'd like to take masterseq_list and merge each data.frame a:j by this reduced set of sequences, so I am left with each data.frame containing only the sequences in masterseq_list in its s column, with the p-values remaining intact. I know I can use code like this somehow, but I'm really not sure how to do it when the values I want to merge on are currently stored in a list.
total <- merge(dataframeA, dataframeB, by = "s")
The files are really big so I'd like to find a way to automate this, how can I loop through this faster and efficiently? Thanks so much!
I'd start by putting all the data.frames in a list first:
my_l <- list(a,b,c)
# now get intersection
isect <- Reduce(intersect, lapply(my_l, "[[", 1))
isect
# [1] "gtcg" "gtcgg" "gggaa" "cttg"
# subset the original data.frames for just this intersecting rows
lapply(my_l, function(x) subset(x, s %in% isect))
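To also merge the filtered tables into one data.frame keyed by s (the question's second step), a sketch along these lines could work; the p_a, p_b, ... column names are made up here just to keep the p-value columns distinguishable after the merge:
filtered <- lapply(my_l, function(x) subset(x, s %in% isect))
names(filtered) <- letters[seq_along(filtered)]            # "a", "b", "c", ...
for (nm in names(filtered)) {
  names(filtered[[nm]])[names(filtered[[nm]]) == "p"] <- paste0("p_", nm)
}
total <- Reduce(function(x, y) merge(x, y, by = "s"), filtered)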
