Merge list of lists in R

I have a list of lists, where some elements are NULL (contain nothing) and some contain 12 columns and 1 row. Let's say this list of lists is named pages.
I would like to merge the elements that contain 12 columns and 1 row into a data frame, so that I end up with a final data frame of 12 columns and x rows.
I first tried:
final_df <- Reduce(function(x,y) merge(x, y, all=TRUE), pages)
which yielded a dataframe with the right 12 columns, but no rows, so it was empty.
I then tried:
listofvectors <- list()
for (i in 1:length(pages)) {listofvectors <- c(listofvectors, pages[[i]])}
which just pasted every list below each other.
I finally tried playing with:
final<-do.call(c, unlist(pages, recursive=FALSE))
which only resulted in a very long value.
What am I missing? Who can help me out? Thanks a lot for your input.

The merge function is for joining data on common column values (commonly called a join). You need rbind instead (the r stands for row; use cbind to stick columns together).
do.call(rbind, pages) # equivalent to rbind(pages[[1]], pages[[2]], ...)
do.call(rbind, pages[lengths(pages) > 0]) # removing the 0-length elements
If you have additional issues, please provide a reproducible example in your question. This code works on this example:
x = list(data.frame(x = 1), NULL, data.frame(x = 2))
do.call(rbind, x)
# x
# 1 1
# 2 2
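As a slightly fuller sketch of the same idea, the snippet below builds a made-up pages list (three columns instead of twelve, with hypothetical column names) and stacks the non-NULL elements with rbind. It assumes every non-NULL element has the same column names.

```r
# Hypothetical stand-in for `pages`: one-row data frames mixed with NULLs
pages <- list(
  data.frame(id = 1, name = "a", score = 0.5),
  NULL,
  data.frame(id = 2, name = "b", score = 0.7),
  NULL
)

# Drop the zero-length (NULL) elements, then stack the rest row by row
non_empty <- pages[lengths(pages) > 0]
final_df <- do.call(rbind, non_empty)

dim(final_df)  # rows = number of non-NULL elements, columns = 3
```

If the column names differ between elements, rbind stops with an error, so it is worth checking them first.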

Related

How can I create smaller data frames out of a nested list in R

My list has 12000 entries. Each entry consists of 16 columns and 8 rows.
I would like to create a data frame for every single entry. I'm interested in 3 of the 16 columns (X,Y and Z coordinates)
I already tried this:
data_frame12000 <- as.data.frame(do.call(cbind, list_small_read_laz))
This and other functions only create one big data.frame with all the 16 columns for each entry.
Can anybody help me?
Thank You in advance!
If I understand correctly, you have a list of 12000 elements, each containing a data frame with 8 rows and 16 columns. I also assume the column names are the same for all list elements.
First, select the X, Y and Z columns from each list element:
library(tidyverse)
# assuming your list name is 'list_small_read_laz'
reduced_column <- map(list_small_read_laz,~ select(.,X,Y,Z))
Then combine all entries into a single dataframe:
df_reduced_column <- map_dfr(reduced_column, as.data.frame)
Hope this is what you are looking for.
If you have a list of 12000 dataframes you can generate a list of dataframes with only the desired columns using lapply. Here is an example using mtcars:
cars1 <- mtcars
cars2 <- cars1
cars3 <- cars2
list1 <- list(cars1, cars2, cars3)
df_list <- lapply(list1, function(x) x[, c(2, 4, 6)]) # column numbers are used
final_df <- Reduce(rbind, df_list) # if you want all of the dataframes combined by rows
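The same approach can select columns by name rather than by position, which may be safer if the column order varies. This is a minimal sketch on made-up data: the entries list and the extra intensity column are hypothetical, and it assumes each element really has columns named X, Y and Z.

```r
set.seed(1)

# Hypothetical stand-in for the real list: 3 entries of 8 rows each
entries <- lapply(1:3, function(i) {
  data.frame(X = runif(8), Y = runif(8), Z = runif(8), intensity = runif(8))
})

# One smaller data frame per entry, keeping only the coordinates
xyz_list <- lapply(entries, function(d) d[, c("X", "Y", "Z")])

# Optional: stack them into a single data frame
xyz_all <- do.call(rbind, xyz_list)
```

xyz_list keeps one data frame per entry, which seems closest to what the question asks for; xyz_all is only needed if a single table is wanted.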

Remove Non-Matching Dataframe Names Nested in A List

I have two lists of data frames, df_quintile and disease_df_quintile. I do not know how to represent them concisely, but this is what they look like in RStudio:
Notice that df_quintile consists of 5 data frames (data frames 1 through 5), while disease_df_quintile consists of 4 (data frames 2 through 5). I would like to cross-check both lists and remove any data frames that are not shared by both lists, so in this case I would like to remove the first data frame from the df_quintile list. How can I achieve this?
Thank you.
Independently of the content of the lists, you can first find the common names and then subset the lists:
##-- Fake lists
l1 <- as.list(1:5)
names(l1) <- 1:5
l2 <- as.list(2:5)
names(l2) <- 2:5
##-- Common names and subsetting
common_names <- intersect(names(l1), names(l2))
l1 <- l1[common_names]
l2 <- l2[common_names]
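You can match the lists' names and keep the common ones. This assumes every name in disease_df_quintile also appears in df_quintile; otherwise match returns NA for the missing names.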
keep <- match(names(disease_df_quintile), names(df_quintile))
new_df_quintile <- df_quintile[keep]
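When each list may contain names the other lacks, intersect avoids the NA positions that match would produce. The sketch below uses made-up lists with hypothetical names q1 to q4; only the names matter, not the data frame contents.

```r
# Hypothetical lists of data frames with partially overlapping names
df_quintile <- list(
  q1 = data.frame(v = 1),
  q2 = data.frame(v = 2),
  q3 = data.frame(v = 3)
)
disease_df_quintile <- list(
  q2 = data.frame(v = 20),
  q3 = data.frame(v = 30),
  q4 = data.frame(v = 40)
)

# Names present in both lists
common <- intersect(names(df_quintile), names(disease_df_quintile))

# Keep only the shared data frames, in the same order in both lists
df_quintile_common <- df_quintile[common]
disease_common <- disease_df_quintile[common]

names(df_quintile_common)  # "q2" "q3"
```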

Transpose R Table

I am trying to transpose a table I have created from a list of lists.
Each nested list has this format:
list(storm_name=NA, storm_level=NA, file_date=NA, file_time=NA,
date=NA, time=NA, actual_or_forecast=NA, lat=NA, long=NA, max_wind=NA,
gusts=NA, eye_speed=NA, eye_location=NA, storm_end=NA)
In short, each row has 14 elements within it.
storm_df <- as.data.frame(matrix(unlist(list1), nrow=length(unlist(list1[1]))))
The code I have written above so far creates the table where the orientation is 14 rows x N (number of inner lists) columns whereas I would like it to be N rows x 14 columns.
Does anyone see what I am doing wrong?
Thanks in advance!
Let's use do.call, rbind, and lapply:
## data
l1 <- list(storm_name=NA, storm_level=NA, file_date=NA, file_time=NA,
date=NA, time=NA, actual_or_forecast=NA, lat=NA, long=NA, max_wind=NA,
gusts=NA, eye_speed=NA, eye_location=NA, storm_end=NA)
big_list <- list(l1, l1, l1)
## make data.frame
do.call('rbind', lapply(big_list, data.frame))
Stepping through this: first we run lapply on big_list, so a data.frame is created for each item in big_list. Try data.frame(l1) to see the result of each call.
Then we use do.call('rbind', ...) because lapply returns a list of data.frames and we want to "stack" them on top of each other.
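To check the orientation, here is a small sketch with made-up values and only four of the fourteen fields (the storm names and numbers are invented for illustration). Each inner list becomes a one-row data frame, so the stacked result has one row per inner list.

```r
# Hypothetical records with a reduced set of fields
records <- list(
  list(storm_name = "A", lat = 10.5, long = -60.1, max_wind = 80),
  list(storm_name = "B", lat = 12.0, long = -62.3, max_wind = 95)
)

# One-row data frame per record, then stack the rows
storm_df <- do.call(rbind, lapply(records, as.data.frame))

dim(storm_df)  # 2 rows (records) by 4 columns (fields)
```

Unlike the matrix(unlist(...)) approach, this keeps each column's type, so numeric fields are not coerced to character.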

How to cbind many data frames with a loop?

I have 105 data frames of class xts/zoo and I want to combine their 6th columns into a data frame.
So, I created a data frame that contains all the data frame names to use it with a 'for' function:
mydata <- AAL
for (i in 2:105) {
k <- top100[i,1] # The first column contains all the data frame names
mydata <- cbind(mydata, k)
}
It's obviously wrong, but I have no idea how to cbind so many data frames with completely different names (my data frame names are NASDAQ symbols), nor how to pick the 6th column of each.
Thank you in advance
Try the foreach package. There may be a more elegant way to do this, but this approach works.
library(foreach)
# create simple one-column matrices (t() of a data frame) with columns named 'A' and 'B'
df1 <- t(data.frame(1, 2, 3))
df2 <- t(data.frame(4, 5, 6))
colnames(df1) <- 'A'
colnames(df2) <- 'B'
# make a list
dfs <- list(df1, df2)
# join column by column; this preserves the column names
foreach(x = 1:2, .combine = cbind) %do% {  # don't forget this directive
  dfs[[x]]
}
The result will be:
A B
X1 1 4
X2 2 5
X3 3 6
To pick column number 6:
df[,6]
First, you should store all of your data.frames in a list. You can then use a combination of lapply and do.call to extract and recombine the sixth columns of each of the data.frames:
# Create sample data
df_list <- lapply(1:105, function(x) {
as.data.frame(matrix(sample(1:1000, 100), ncol = 10))
})
# Extract the sixth column from each data.frame
extracted_cols <- lapply(df_list, function(x) x[6])
# Combine all of the columns together into a new data.frame
result <- do.call("cbind", extracted_cols)
One way to get all of your preexisting data.frames into a list would be to use lapply along with get:
df_list <- lapply(top100[[1]], get)
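Putting the two steps together, the sketch below looks objects up by name with mget and binds their sixth columns, labelling each column with its symbol. It uses plain data frames with equal row counts as stand-ins; MSFT and the numbers are made up, and real xts/zoo objects with differing dates would need aligning first.

```r
# Hypothetical stand-ins for two of the per-symbol data frames
AAL  <- as.data.frame(matrix(1:60,   ncol = 6))
MSFT <- as.data.frame(matrix(61:120, ncol = 6))

# Character vector of object names (top100[[1]] in the question)
symbols <- c("AAL", "MSFT")

# Fetch all objects by name into a named list
df_list <- mget(symbols, envir = environment())

# Pull the 6th column of each and bind them; list names become column names
sixth <- lapply(df_list, function(d) d[[6]])
result <- as.data.frame(sixth)

names(result)  # "AAL" "MSFT"
```

If the names are stored as a factor, wrapping them in as.character() before the lookup avoids surprises.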

intersecting across 10 large data sets and merging automatically

I have 10 data.frames with 2 columns with names s and p. s is for sequence and p is for p-values. I want to find the sequences that intersect across all data.frames, so I did this:
# 10 data.frames are a, b, c, ..., j
masterseq_list <- Reduce(intersect, list(a$s, b$s, c$s, d$s, e$s, f$s, g$s,h$s, i$s,j$s))
I'd like to take masterseq_list and merge each dataframe a:j by this new reduced sequence so I am left with each data.frame having masterseq_list as the new column instead of s and the p-values remaining intact. I know I can use this code somehow but I'm really not sure how to do it if the column I want is currently a list.
total <- merge(data frameA,data frameB,by="s")
The files are really big so I'd like to find a way to automate this, how can I loop through this faster and efficiently? Thanks so much!
I'd start by putting all the data.frames in a list first:
my_l <- list(a,b,c)
# now get intersection
isect <- Reduce(intersect, lapply(my_l, "[[", 1))
isect
# [1] "gtcg" "gtcgg" "gggaa" "cttg"
# subset the original data.frames to just these intersecting rows
lapply(my_l, function(x) subset(x, s %in% isect))
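If a single table with one p-value column per data frame is wanted, the filtered list can be merged on s afterwards. This is a sketch on made-up data: the sequences, p-values and the p_a/p_b/p_c column names are invented, and it assumes s is unique within each data frame.

```r
# Hypothetical small data frames with columns s (sequence) and p (p-value)
a    <- data.frame(s = c("gtcg", "cttg", "aaaa"), p = c(0.01, 0.20, 0.50))
b    <- data.frame(s = c("cttg", "gtcg", "cccc"), p = c(0.03, 0.40, 0.60))
c_df <- data.frame(s = c("gtcg", "tttt", "cttg"), p = c(0.05, 0.70, 0.10))
dfs  <- list(a = a, b = b, c = c_df)

# Sequences present in every data frame
common <- Reduce(intersect, lapply(dfs, function(d) as.character(d$s)))

# Keep only those rows in each data frame
filtered <- lapply(dfs, function(d) d[d$s %in% common, ])

# Rename p to p_<name> so the columns stay distinct, then merge on s
renamed <- Map(function(d, nm) setNames(d, c("s", paste0("p_", nm))),
               filtered, names(filtered))
merged <- Reduce(function(x, y) merge(x, y, by = "s"), renamed)
```

Filtering before merging keeps each merge small, which may help when the files are large.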
