How to merge subsets of data frames from a list (i.e., merge all of the first dfs from each list component) - r

I have seen a number of answers as to how to merge dataframes from a list when each list element is a single data frame. However, in my case, each list element contains two data frames. I want to merge all of the first together and all of the second. As a dummy example:
lst<-list()
lst[[1]]<-list(data.frame(cat=c(1:5), type=c(11:15)), data.frame(group=c("A","B","C"), num=c(1:3)))
lst[[2]]<-list(data.frame(cat=c(22:26), type=c(50:54)), data.frame(group=c("H","I","J"), num=c(7:9)))
I want to merge the first elements together and the second elements together, to yield two data frames:
df1:
cat type
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 22 50
7 23 51
8 24 52
9 25 53
10 26 54
df2:
group num
1 A 1
2 B 2
3 C 3
4 H 7
5 I 8
6 J 9
I am sure there is some straightforward way to do this (somehow with do.call and rbind??) but I cannot figure out how to reference the various elements within each list properly.
Clearly with this small example I could just do it manually by:
df1<-rbind(lst[[1]][[1]], lst[[2]][[1]])
However, my actual list includes hundreds of data frames. I can do it by creating a loop and rbinding in one at a time sequentially, but I'm sure there is a more efficient way...Thanks for any help!

You can use Reduce, which lets you supply your own combining function, to rbind the data frames. Reduce takes two elements of the list at a time and collapses them into one using that function. Because each element here holds two data frames that must be bound separately, the combining function uses Map to rbind them pairwise:
Reduce(function(x, y) Map(rbind, x, y), lst)
# [[1]]
# cat type
# 1 1 11
# 2 2 12
# 3 3 13
# 4 4 14
# 5 5 15
# 6 22 50
# 7 23 51
# 8 24 52
# 9 25 53
# 10 26 54
# [[2]]
# group num
# 1 A 1
# 2 B 2
# 3 C 3
# 4 H 7
# 5 I 8
# 6 J 9
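Or a possibly faster way, which extracts the i-th data frame from every list element and binds them in a single rbind call: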
lapply(1:2, function(i) do.call(rbind, lapply(lst, `[[`, i)))
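
The do.call approach above hard-codes 1:2. A minimal sketch of a more general version follows, assuming every list element holds the same number of data frames in the same order; the name combined is only illustrative:

```r
# Dummy data from the question
lst <- list(
  list(data.frame(cat = 1:5, type = 11:15),
       data.frame(group = c("A", "B", "C"), num = 1:3)),
  list(data.frame(cat = 22:26, type = 50:54),
       data.frame(group = c("H", "I", "J"), num = 7:9))
)

# Take the number of positions from the first element
# (assumption: all elements share the same layout)
combined <- lapply(seq_along(lst[[1]]), function(i) {
  # pull the i-th data frame out of every element, then bind them in one call
  do.call(rbind, lapply(lst, `[[`, i))
})

df1 <- combined[[1]]
df2 <- combined[[2]]
```

Because rbind is called once per position instead of once per pair, this should avoid repeatedly copying a growing data frame when the list has hundreds of elements.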

Related

merging every nth columns from different data frames as text files

I have three data frames with 50 columns. I would like to write 50 text files in a way that the first text file should contain the first columns from the three data frames, the second text file should contain the second columns from the data frames, and so on. How can I use the column headers, which are the same in the three data frames, as the names of the output text files? Kindly suggest an R solution.
df1 <- data.frame(C1=c(1,2),C2=c(3,4))
df2 <- data.frame(C1=c(5,6),C2=c(7,8))
df3 <- data.frame(C1=c(9,10),C2=c(11,12))
Sample data, a list of three identically-structured frames:
frames <- list(
  frame1 = data.frame(x=1:3, y=4:6, z=7:9),
  frame2 = data.frame(x=11:13, y=14:16, z=17:19),
  frame3 = data.frame(x=21:23, y=24:26, z=27:29))
If instead you have three individual (not list of) frames, create the list with
frames <- list(df1 = df1, df2 = df2, df3 = df3)
Transposing in a fashion:
newframes <- lapply(seq_len(ncol(frames[[1]])), function(i) {
  as.data.frame(lapply(frames, `[[`, i))
})
names(newframes) <- names(frames[[1]])
newframes
# $x
# frame1 frame2 frame3
# 1 1 11 21
# 2 2 12 22
# 3 3 13 23
# $y
# frame1 frame2 frame3
# 1 4 14 24
# 2 5 15 25
# 3 6 16 26
# $z
# frame1 frame2 frame3
# 1 7 17 27
# 2 8 18 28
# 3 9 19 29
Writing to files:
for (nm in names(newframes)) write.csv(newframes[[nm]], paste0(nm, ".csv"))
This implementation uses the name of the original frame as each column name, which is less ambiguous (in my opinion) than giving them all the same column name (x,x,x, y,y,y, etc.). If you really need that, though (and don't care that R will complain if you try to do follow-on work with it), then before writing the files, run:
newframes <- Map(function(nm, x) `colnames<-`(x, rep(nm, ncol(x))),
                 names(newframes), newframes)
newframes
# $x
# x x x
# 1 1 11 21
# 2 2 12 22
# 3 3 13 23
# $y
# y y y
# 1 4 14 24
# 2 5 15 25
# 3 6 16 26
# $z
# z z z
# 1 7 17 27
# 2 8 18 28
# 3 9 19 29
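
Applied to the df1, df2 and df3 from the question, the same idea might look like the sketch below. The output directory (tempdir()), the tab separator and the .txt extension are assumptions made for illustration, and write.table is used here in place of write.csv to produce plain text files:

```r
df1 <- data.frame(C1 = c(1, 2), C2 = c(3, 4))
df2 <- data.frame(C1 = c(5, 6), C2 = c(7, 8))
df3 <- data.frame(C1 = c(9, 10), C2 = c(11, 12))

# Name the list elements so the output columns say which frame they came from
frames <- list(df1 = df1, df2 = df2, df3 = df3)

out_dir <- tempdir()  # hypothetical output location; change as needed

for (nm in names(frames[[1]])) {
  # extract column `nm` from every frame and put the pieces side by side
  cols <- as.data.frame(lapply(frames, `[[`, nm))
  # one file per column header, e.g. C1.txt, C2.txt
  write.table(cols, file.path(out_dir, paste0(nm, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}
```

Looping over names(frames[[1]]) rather than column positions means the column header is available directly as the file name.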

Split a dataset into a list of dataframes with equal number of columns

I have a data set with 36 columns and a single observation. I want to split it into a list of data frames with 3 columns each and then rbind them into a single data frame.
I have been using the following code:
m=12
nc<-ncol(df)
df1<-lapply(split(as.list(df), cut(1:nc, m, labels = FALSE)), as.data.frame)
df1<-do.call("rbind",df1)
This code works, but the problem comes when I try to run it in a Shiny app.
Can someone suggest a replacement for the above code?
We can split the one-row data frame by generating a grouping sequence:
do.call("rbind", split(c(t(df)), rep(seq(1, ncol(df)/3), each = 3)))
where
rep(seq(1, ncol(df)/3), each = 3)
would generate
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12
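
As a sketch of an alternative that is not part of the answer above, the same reshaping can be done by filling a 3-column matrix row by row. The one-row data frame below is made-up sample data, and the approach assumes all columns are of the same type:

```r
# Hypothetical one-row data frame with 36 columns
df <- as.data.frame(matrix(1:36, nrow = 1))

# Approach from the answer: group every 3 consecutive values, then rbind
grp <- rep(seq_len(ncol(df) / 3), each = 3)
by_split <- do.call("rbind", split(c(t(df)), grp))

# Alternative: fill a 3-column matrix row by row
by_matrix <- matrix(unlist(df, use.names = FALSE), ncol = 3, byrow = TRUE)
```

Both results are 12 x 3 matrices with the same values; wrap either in as.data.frame() if a data frame is needed.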

Create all possible combinations from two values for each element in a vector in R [duplicate]

I have been trying to create vectors where each element can take one of two values, drawn from two different vectors.
For example, if there are two vectors a and b, where a is c(6,2,9) and b is c(12,5,15) then the output should be 8 vectors given as follows,
6 2 9
6 2 15
6 5 9
6 5 15
12 2 9
12 2 15
12 5 9
12 5 15
The following piece of code works,
aa1 <- c(6,12)
aa2 <- c(2,5)
aa3 <- c(9,15)
for (a1 in 1:2)
  for (a2 in 1:2)
    for (a3 in 1:2)
    {
      v <- c(aa1[a1], aa2[a2], aa3[a3])
      print(v)
    }
But I was wondering if there is a simpler way to do this than writing several nested for loops, whose number grows linearly with the number of elements in the final vector.
expand.grid is a function that makes all combinations of whatever vectors you pass it. In this case, though, you need to rearrange your vectors first so that you have a pair of first elements, a pair of second elements, and a pair of third elements. The ultimate call is then:
expand.grid(c(6, 12), c(2, 5), c(9, 15))
A quick way to rearrange the vectors in base R is Map, the multivariate version of lapply, with c() as the function:
a <- c(6, 2, 9)
b <- c(12, 5, 15)
Map(c, a, b)
## [[1]]
## [1] 6 12
##
## [[2]]
## [1] 2 5
##
## [[3]]
## [1] 9 15
Conveniently expand.grid is happy with either individual vectors or a list of vectors, so we can just call:
expand.grid(Map(c, a, b))
## Var1 Var2 Var3
## 1 6 2 9
## 2 12 2 9
## 3 6 5 9
## 4 12 5 9
## 5 6 2 15
## 6 12 2 15
## 7 6 5 15
## 8 12 5 15
If Map is confusing, you can instead put a and b in a list and use purrr::transpose, which does the same thing: it flips a list of two elements of length three into a list of three elements of length two:
library(purrr)
list(a, b) %>% transpose() %>% expand.grid()
and return the same thing.
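
expand.grid varies its first argument fastest, so the rows above come out in a different order from the listing in the question. If the question's order matters, one possible sketch is to reverse the list before the call and reverse the columns afterwards; the names pairs, grid and combos are only illustrative:

```r
a <- c(6, 2, 9)
b <- c(12, 5, 15)

# Pair up the candidates for each position: (6, 12), (2, 5), (9, 15)
pairs <- Map(c, a, b)

# Reverse the list so the last position varies fastest ...
grid <- expand.grid(rev(pairs))

# ... then put the columns back in their original order
combos <- unname(as.matrix(grid[, rev(seq_along(grid))]))
combos
# first rows: 6 2 9 / 6 2 15 / 6 5 9 / 6 5 15
```

Each row of combos is one of the 8 vectors, and the approach does not depend on the length of a and b as long as they are equal.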
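I think what you're looking for is expand.grid. Note that when called directly on a and b, it returns all 9 pairings of an element of a with an element of b, rather than the 8 three-element vectors shown in the question: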
a <- c(6,2,9)
b <- c(12,5,15)
expand.grid(a,b)
Var1 Var2
1 6 12
2 2 12
3 9 12
4 6 5
5 2 5
6 9 5
7 6 15
8 2 15
9 9 15

How can I make a list from an existing data frame, where each object in the list contains a vector built from one or more rows of the data frame?

I am very new to R and still getting my head around it, so my question may be very basic, but please help me out!
I have a large data frame, with more than 400000 rows.
GENE_ID p1 p2 p3 ...
41 1 2 3
41 4 5 6
41 7 8 9
85 1 2 3
1923 1 2 3
1923 4 5 6
First, I wanted to simply use GENE_ID as the row names, but that failed because some gene IDs are not unique.
Now I am thinking of turning this data frame into a list in which each object contains the expression levels of one gene.
So what I want is a list that looks something like:
mylist$41
[1] 1 2 3 4 5 6 7 8 9
mylist$85
[1] 1 2 3
mylist$1923
[1] 1 2 3 4 5 6
Any advice to achieve this would be greatly appreciated.
We can melt by 'GENE_ID' and then split to get a list of vectors (note that the values come out in column order rather than the row order shown in the question):
library(reshape2)
mylist <- melt(df1, id.vars = 'GENE_ID')
split(mylist$value, mylist$GENE_ID)
#$`41`
#[1] 1 4 7 2 5 8 3 6 9
#$`85`
#[1] 1 2 3
#$`1923`
#[1] 1 4 2 5 3 6
Also, we can do this in base R
v1 <- unlist(df1[-1], use.names = FALSE)
grp <- rep(df1[,1], ncol(df1[-1]))
split(v1, grp)
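
If the row-by-row order from the question (1 2 3 4 5 6 ...) is needed, a minimal base R sketch is to transpose the value columns before flattening. The data frame below is rebuilt from the sample rows in the question and assumes exactly the three columns p1 to p3:

```r
df1 <- data.frame(GENE_ID = c(41, 41, 41, 85, 1923, 1923),
                  p1 = c(1, 4, 7, 1, 1, 4),
                  p2 = c(2, 5, 8, 2, 2, 5),
                  p3 = c(3, 6, 9, 3, 3, 6))

# t() turns each original row into a column, so as.vector() reads row by row
v1 <- as.vector(t(df1[-1]))

# repeat each gene ID once per value column so it lines up with v1
grp <- rep(df1$GENE_ID, each = ncol(df1) - 1)

mylist <- split(v1, grp)
mylist[["41"]]
```

Elements are then accessed with mylist[["41"]] or with backticks after $, since a bare number is not a valid name after $.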

Making a data frame that is a subset of two data frames

I am stumped again.
I have two data frames
dataframe1
a b c
[1] 21 12 22
[2] 11 9 6
[3] 4 6 7
and
dataframe2
f g h
[1] 21 12 22
[2] 11 9 6
[3] 4 6 7
I want to take the first column of dataframe1 and make three new data frames, each pairing it with one of the columns f, g and h as the second column.
Obviously I could just do a subset over and over
subset1 <- cbind(dataframe1[,1], dataframe2[,1])
subset2 <- cbind(dataframe1[,1], dataframe2[,2])
but my data frames will have variable numbers of columns and a very large number of rows, so I am looking for something a little more general. My two data frames will always have the same number of rows.
The closest I have come was with apply and cbind, but I got either a set of three rows (a and f, a and g, a and h, each combined into a single numeric vector) or a single data frame with four columns: a, f, g, h.
Help is deeply appreciated.
You can use lapply to iterate over the columns of dataframe2 like so:
lapply(dataframe2, function(x) as.data.frame(cbind(dataframe1[,1], x)))
This will result in a list object where each entry corresponds to a column of dataframe2. For example:
$f
V1 x
1 21 21
2 11 11
3 4 4
$g
V1 x
1 21 12
2 11 9
3 4 6
$h
V1 x
1 21 22
2 11 6
3 4 7
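
The result above loses the original column names (V1 and x). A possible variant, sketched here with the sample values from the question, iterates over the column names instead and uses single-bracket indexing so each piece stays a one-column data frame and keeps its name; pairs is just an illustrative name:

```r
dataframe1 <- data.frame(a = c(21, 11, 4), b = c(12, 9, 6), c = c(22, 6, 7))
dataframe2 <- data.frame(f = c(21, 11, 4), g = c(12, 9, 6), h = c(22, 6, 7))

pairs <- lapply(names(dataframe2), function(nm) {
  # dataframe1[1] and dataframe2[nm] are one-column data frames,
  # so cbind keeps the column names a and nm
  cbind(dataframe1[1], dataframe2[nm])
})
names(pairs) <- names(dataframe2)

pairs$g
```

This works for any number of columns in dataframe2, provided both data frames have the same number of rows.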

Resources