How to convert list file to matrix - r

I have a list file as below:
> results
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[[1]][[1]][[1]][[1]]
[1] "1" "inflammation" "37.5" "A" "B"
[6] "F"
[[1]][[1]][[2]]
[[1]][[1]][[2]][[1]]
[1] "1" "Apoptosis" "37.5" "C" "G" "H"
[[1]][[1]][[3]]
[[1]][[1]][[3]][[1]]
[1] "1" "Repair" "25" "A" "H"
[[2]]
[[2]][[1]]
[[2]][[1]][[1]]
[[2]][[1]][[1]][[1]]
[1] "2" "inflammation" "20" "F"
[[2]][[1]][[2]]
[[2]][[1]][[2]][[1]]
[1] "2" "Apoptosis" "40" "G" "H"
[[2]][[1]][[3]]
[[2]][[1]][[3]][[1]]
[1] "2" "Repair" "20" "H"
Also this is the output of dput function:
dput(results)
list(list(list(list(c("1", "inflammation", "37.5", "A", "B",
"F")), list(c("1", "Apoptosis", "37.5", "C", "G", "H")), list(
c("1", "Repair", "25", "A", "H")))), list(list(list(c("2",
"inflammation", "20", "F")), list(c("2", "Apoptosis", "40", "G",
"H")), list(c("2", "Repair", "20", "H")))), list(list(list(c("3",
"inflammation", "25", "F")), list(c("3", "Apoptosis", "25", "C"
)), list(c("3", "Repair", "0")))), list(list(list(c("4", "inflammation",
"50", "A", "B", "F")), list(c("4", "Apoptosis", "33.3333333333333",
"G", "H")), list(c("4", "Repair", "33.3333333333333", "A", "H"
)))))
Then I want to make a matrix like this
Number pathway weight genes
1 inflammation 37.5 A, B, F
1 Apoptosis 37.5 C, G, H
1 Repair 25 A, H
2 inflammation 20 F
2 Apoptosis 40 G, H
2 Repair 20 H
Is there any trick for this? The genes column contains a varying number of genes per row.

First you should unlist the results a few times. Then you have to paste the genes together, and finally you can rbind the data. Here's what this could look like:
lst <- unlist(unlist(unlist(results, recursive=FALSE), recursive=FALSE), recursive=FALSE)
df <- do.call(rbind, lapply(lst,
  function(x){
    data.frame(Number=as.numeric(x[1]),
               pathway=x[2],
               weight=as.numeric(x[3]),
               genes=paste(x[4:max(4, length(x))], collapse=", "))
  }))
df
## Number pathway weight genes
## 1 1 inflammation 37.50000 A, B, F
## 2 1 Apoptosis 37.50000 C, G, H
## 3 1 Repair 25.00000 A, H
## 4 2 inflammation 20.00000 F
## 5 2 Apoptosis 40.00000 G, H
## 6 2 Repair 20.00000 H
## 7 3 inflammation 25.00000 F
## 8 3 Apoptosis 25.00000 C
## 9 3 Repair 0.00000 NA
## 10 4 inflammation 50.00000 A, B, F
## 11 4 Apoptosis 33.33333 G, H
## 12 4 Repair 33.33333 A, H
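One detail worth noting: for a row with no genes, `x[4]` is `NA` and `paste()` turns it into the string "NA" rather than a real missing value. The following is a minimal sketch of a variant that keeps a true `NA`, assuming the same nesting as in the question. The trimmed-down `results` and the helper name `collapse_genes` are illustrative, not part of the original answer.

```r
# Trimmed-down copy of the nested list from the question (same nesting depth)
results <- list(
  list(list(list(c("1", "inflammation", "37.5", "A", "B", "F")),
            list(c("1", "Repair", "25", "A", "H")))),
  list(list(list(c("3", "Apoptosis", "25", "C")),
            list(c("3", "Repair", "0"))))
)

# Peel off the three list layers that wrap each character vector
lst <- unlist(unlist(unlist(results, recursive = FALSE),
                     recursive = FALSE), recursive = FALSE)

# Hypothetical helper: join everything after the third element,
# or return a real NA when there are no genes
collapse_genes <- function(x) {
  if (length(x) > 3) paste(x[-(1:3)], collapse = ", ") else NA_character_
}

# Build each column in one pass instead of rbind-ing one-row data frames
df <- data.frame(
  Number  = as.numeric(vapply(lst, `[`, character(1), 1)),
  pathway = vapply(lst, `[`, character(1), 2),
  weight  = as.numeric(vapply(lst, `[`, character(1), 3)),
  genes   = vapply(lst, collapse_genes, character(1)),
  stringsAsFactors = FALSE
)
df
```

Building the columns with `vapply()` also avoids creating one small data frame per row, which may matter for long lists.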

I would probably approach the issue at hand like shadow did above, but here's also an alternative recursive implementation that does not fix the number of unlists, i.e. it can handle varying depth within the list of lists.
It just traverses the tree until it finds an element other than list and performs some specified formatting on it, and finally rbinds it all to a single matrix:
recurse.format <- function(
  x,
  format = function(z) { c(z[1:3], if (length(z) > 3) paste(z[4:length(z)], collapse=",") else NA) }
){
  if(is.list(x)){
    # pass 'format' along so a custom formatter is used at every depth
    do.call("rbind", lapply(x, FUN=recurse.format, format=format))
  }else{
    format(x)
  }
}
mat <- recurse.format(results)
colnames(mat) <- c("Number", "Pathway", "Weight", "Genes")
print(mat)
Number Pathway Weight Genes
[1,] "1" "inflammation" "37.5" "A,B,F"
[2,] "1" "Apoptosis" "37.5" "C,G,H"
[3,] "1" "Repair" "25" "A,H"
[4,] "2" "inflammation" "20" "F"
[5,] "2" "Apoptosis" "40" "G,H"
[6,] "2" "Repair" "20" "H"
[7,] "3" "inflammation" "25" "F"
[8,] "3" "Apoptosis" "25" "C"
[9,] "3" "Repair" "0" NA
[10,] "4" "inflammation" "50" "A,B,F"
[11,] "4" "Apoptosis" "33.3333333333333" "G,H"
[12,] "4" "Repair" "33.3333333333333" "A,H"
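Because a matrix holds a single type, every column above is character. If numeric columns are needed afterwards, one possible follow-up is to convert the matrix to a data frame and cast the columns. This is only a sketch; the two-row `mat` below is a stand-in for the full result.

```r
# Stand-in for the character matrix produced above (two rows only)
mat <- rbind(c("1", "inflammation", "37.5", "A,B,F"),
             c("3", "Repair", "0", NA))
colnames(mat) <- c("Number", "Pathway", "Weight", "Genes")

# Convert to a data frame, then cast the numeric-looking columns
df <- as.data.frame(mat, stringsAsFactors = FALSE)
df$Number <- as.integer(df$Number)
df$Weight <- as.numeric(df$Weight)
str(df)
```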

Related

R - Expand Grid Without Duplicates

I need a function similar to expand.grid but without the combinations of duplicate elements.
Here is a simplified version of my problem.
X1 = c("x","y","z")
X2 = c("A","B","C")
X3 = c("y","C","G")
d <- expand.grid(X1,X2,X3)
d
Var1 Var2 Var3
1 x A y
2 y A y
3 z A y
4 x B y
. . . .
. . . .
. . . .
23 y B G
24 z B G
25 x C G
26 y C G
27 z C G
d has 27 rows, but 6 of them contain duplicate values, which I do not need: rows 2, 5, 8, 16, 17 & 18.
Is there a way to get the other 21 rows, which do not contain any duplicates?
Note that in the real case the vectors have more than 3 elements (c("x","y","z","k","m"...), up to 50) and there are more than 3 vectors (X4, X5, X6... up to 11). Because of this, the expanded object gets really large and RAM cannot handle it.
In RcppAlgos*, there is a function called comboGrid that does the trick:
library(RcppAlgos) ## as of v2.4.3
comboGrid(X1, X2, X3, repetition = FALSE)
# Var1 Var2 Var3
# [1,] "x" "A" "C"
# [2,] "x" "A" "G"
# [3,] "x" "A" "y"
# [4,] "x" "B" "C"
# [5,] "x" "B" "G"
# [6,] "x" "B" "y"
# [7,] "x" "C" "G"
# [8,] "x" "C" "y"
# [9,] "y" "A" "C"
# [10,] "y" "A" "G"
# [11,] "y" "B" "C"
# [12,] "y" "B" "G"
# [13,] "y" "C" "G"
# [14,] "z" "A" "C"
# [15,] "z" "A" "G"
# [16,] "z" "A" "y"
# [17,] "z" "B" "C"
# [18,] "z" "B" "G"
# [19,] "z" "B" "y"
# [20,] "z" "C" "G"
# [21,] "z" "C" "y"
Large Test
set.seed(42)
rnd_lst <- lapply(1:11, function(x) {
sort(sample(LETTERS, sample(26, 1)))
})
## Number of results that expand.grid would return if your machine
## had enough memory... over 300 billion!!!
prettyNum(prod(lengths(rnd_lst)), big.mark = ",")
# [1] "365,634,846,720"
exp_grd_test <- expand.grid(rnd_lst)
# Error: vector memory exhausted (limit reached?)
system.time(cmb_grd_test <- comboGrid(rnd_lst, repetition=FALSE))
# user system elapsed
# 9.866 0.330 10.196
dim(cmb_grd_test)
# [1] 3036012 11
head(cmb_grd_test)
# Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11
# [1,] "A" "E" "C" "B" "D" "G" "F" "H" "J" "I" "K"
# [2,] "A" "E" "C" "B" "D" "G" "F" "H" "J" "I" "L"
# [3,] "A" "E" "C" "B" "D" "G" "F" "H" "J" "I" "M"
# [4,] "A" "E" "C" "B" "D" "G" "F" "H" "J" "I" "N"
# [5,] "A" "E" "C" "B" "D" "G" "F" "H" "J" "I" "O"
# [6,] "A" "E" "C" "B" "D" "G" "F" "H" "J" "I" "P"
* I am the author of RcppAlgos
(Sorry, I just realized that your problem is as much a size problem, so removing them post-generation may not be feasible. For that, this may not be the best answer, but I'll keep it around for smaller-and-related questions.)
base R
I hard-code "3", but you can use ncol(d) and/or ncol(d)-1 for programmatic use.
d[lengths(apply(d, 1, unique)) > 2, ]
# Var1 Var2 Var3
# 1 x A y
# 3 z A y
# 4 x B y
# 6 z B y
# 7 x C y
# 9 z C y
# 10 x A C
# 11 y A C
# 12 z A C
# 13 x B C
# 14 y B C
# 15 z B C
# 19 x A G
# 20 y A G
# 21 z A G
# 22 x B G
# 23 y B G
# 24 z B G
# 25 x C G
# 26 y C G
# 27 z C G
(The row names are not reset, you can see the gaps to verify it is not 27 rows.)
And to verify, here are the rows with dupes:
d[lengths(apply(d, 1, unique)) < 3, ]
# Var1 Var2 Var3
# 2 y A y
# 5 y B y
# 8 y C y
# 16 x C C
# 17 y C C
# 18 z C C
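A possible caveat with the filter above: when every row has the same number of unique values, `apply(d, 1, unique)` returns a matrix rather than a list, and `lengths()` then no longer gives one count per row. A sketch of a variant that sidesteps this uses `anyDuplicated()`, which always returns a single number per row. As with the answer above, it still requires the full grid to fit in memory.

```r
X1 <- c("x", "y", "z")
X2 <- c("A", "B", "C")
X3 <- c("y", "C", "G")
d <- expand.grid(X1, X2, X3, stringsAsFactors = FALSE)

# anyDuplicated() returns 0 when a row has no repeated value
keep <- apply(d, 1, anyDuplicated) == 0
d_unique <- d[keep, ]
nrow(d_unique)
```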

Adding elements to a list according to another list in R

I would like to look up the elements of one list (y) by name and append them to another list (x). The y list does not necessarily contain every element named in x.
The two lists look like this:
> x
$a
[1] "c" "d" "e"
$b
[1] "f" "g" "h"
> y
$c
[1] 10 11 12
$d
[1] 20 21 22
$f
[1] 40 41 42
$g
[1] 50 51 52
The desired output is:
> z
$a
[1] "c" "d" "e" "10" "11" "12" "20" "21" "22"
$b
[1] "f" "g" "h" "40" "41" "42" "50" "51" "52"
I came up with this solution:
z <- x  # start from a copy of x so there is something to append to
for (i in seq_along(x)){
  for (j in seq_along(x[[i]])){
    if (isTRUE(x[[i]][j] == names(y[x[[i]][j]]))){
      new <- y[[x[[i]][j]]]
      z[[i]] <- c(z[[i]], new)
    }
  }
}
But I would have liked to know if the same thing could have been done with lapply(x, function(x) something) in a more efficient way perhaps.
You can use lapply with c and unlist:
lapply(x, function(i) c(i, unlist(y[i], use.names = FALSE)))
#$a
#[1] "c" "d" "e" "10" "11" "12" "20" "21" "22"
#
#$b
#[1] "f" "g" "h" "40" "41" "42" "50" "51" "52"
Data:
x <- list(a = c("c", "d", "e"), b = c("f", "g", "h"))
y <- list(c = 10:12, d = 20:22, f = 40:42, g = 50:52)
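The reason this copes with names that are missing from y (such as "e" and "h") is that indexing a list by an unknown name yields a NULL element, which `unlist()` then drops. A small sketch to illustrate, using the same data; the variable name `lookup` is only for demonstration.

```r
x <- list(a = c("c", "d", "e"), b = c("f", "g", "h"))
y <- list(c = 10:12, d = 20:22, f = 40:42, g = 50:52)

# "e" is not a name in y: the result is a list holding a single NULL
lookup <- y["e"]
is.null(lookup[[1]])

# unlist() silently drops the NULL, so only matched names contribute;
# c() then coerces the integers to character alongside the labels
z <- lapply(x, function(i) c(i, unlist(y[i], use.names = FALSE)))
z
```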
Or using map from purrr
library(purrr)
map(x, ~ c(.x, flatten_chr(y[.x])))
-output
#$a
#[1] "c" "d" "e" "10" "11" "12" "20" "21" "22"
#$b
#[1] "f" "g" "h" "40" "41" "42" "50" "51" "52"
data
x <- list(a = c("c", "d", "e"), b = c("f", "g", "h"))
y <- list(c = 10:12, d = 20:22, f = 40:42, g = 50:52)

Creating an edgelist from Patent data in R

I am trying to create an edgelist out of patent data of the form:
PatentID InventorIDs CoinventorIDs
1 A ; B C,D,E ; F,G,H,C
2 J ; K ; L M,O ; N ; P, Q
What I would like is the edgelist below showing the connections between inventors and patents. (the semicolons separate the coinventors associated with each primary inventor):
1 A B
1 A C
1 A D
1 A E
1 B F
1 B G
1 B H
1 B C
2 J K
2 J L
2 J M
2 J O
2 K N
2 L P
2 L Q
Is there an easy way to do this with igraph in R?
I'm confused by the edges going between the InventorIDs. But here is a kind of brute-force function that you could just apply by row. There may be a better way with igraph, it being a massive library, but once you have the data in this form it should be simple to convert to an igraph data structure.
Note that this leaves out the edges between primary inventors.
## A function to make the edges for each row
rowFunc <- function(row) {
  tmp <- lapply(row[2:3], strsplit, '\\s*;\\s*')
  tmp2 <- lapply(tmp[[2]], strsplit, '\\s*,\\s*')  # also drop spaces around commas
  do.call(rbind, mapply(cbind, row[[1]], unlist(tmp[[1]]), unlist(tmp2, recursive=FALSE)))
}
## Apply the function by row
do.call(rbind, apply(dat, 1, rowFunc))
# [,1] [,2] [,3]
# [1,] "1" "A" "C"
# [2,] "1" "A" "D"
# [3,] "1" "A" "E"
# [4,] "1" "B" "F"
# [5,] "1" "B" "G"
# [6,] "1" "B" "H"
# [7,] "1" "B" "C"
# [8,] "2" "J" "M"
# [9,] "2" "J" "O"
# [10,] "2" "K" "N"
# [11,] "2" "L" "P"
# [12,] "2" "L" "Q"
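The edges between primary inventors that are missing here (1 A B, 2 J K, 2 J L in the desired output) could be produced separately and rbind-ed on. The sketch below assumes that the first inventor listed is linked to each of the other primary inventors, which is what the desired output suggests but the question does not state. The data frame `dat` and the helper `primary_edges` are illustrative reconstructions.

```r
# Assumed input, reconstructed from the table in the question
dat <- data.frame(
  PatentID      = c(1, 2),
  InventorIDs   = c("A ; B", "J ; K ; L"),
  CoinventorIDs = c("C,D,E ; F,G,H,C", "M,O ; N ; P, Q"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: link the first primary inventor to the others
primary_edges <- function(id, inventors) {
  inv <- trimws(strsplit(inventors, ";", fixed = TRUE)[[1]])
  if (length(inv) < 2) return(NULL)   # a single inventor has no such edge
  cbind(as.character(id), inv[1], inv[-1])
}

edges <- do.call(rbind, Map(primary_edges, dat$PatentID, dat$InventorIDs))
edges
```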

How to do basic row name mapping of matrix in R?

I have a very big matrix called A, and I need to add one column to it: the mapped names of its row names, taken from another matrix called B.
The row names of matrix A appear in B's ID column, and the mapped names are in B's Sample column.
Here is a simple reproducible example and the expected output.
A<-cbind(c("a","b", "c"), c(1,2,3), c(10,14,15))
rownames(A)<-c("q","w","r")
> A
[,1] [,2] [,3]
q "a" "1" "10"
w "b" "2" "14"
r "c" "3" "15"
And the second matrix, in which the row names of matrix A are in column ID and their mapped names are in the Sample column:
B<-cbind(c("d1","f2","g5","y4"),c("q","L","w","r"),c("qw","we","zr","ls"))
colnames(B)<-c("M","ID","Sample")
> B
M ID Sample
[1,] "d1" "q" "qw"
[2,] "f2" "L" "we"
[3,] "g5" "w" "zr"
[4,] "y4" "r" "ls"
>
Here is the expected output:
A<-cbind(c("a","b", "c"), c(1,2,3), c(10,14,15), c("qw","zr","ls"))
rownames(A)<-c("q","w","r")
> A
[,1] [,2] [,3] [,4]
q "a" "1" "10" "qw"
w "b" "2" "14" "zr"
r "c" "3" "15" "ls"
>
Could someone help me implement this in R?
You can also use the merge function in R.
> A <-matrix( data = NA, nrow = 3, ncol =3)
> A[1,] <- c("a" , "1", "10")
> A[2,] <- c( "b" , "2" , "14")
> A[3,] <- c("c" , "3" , "15")
>
> row.names(A) = c("q","w","r")
>
>
> B <- matrix(data = "NA" , nrow = 4, ncol = 3)
> B[1,] <- c("d1" ,"q" ,"qw")
> B[2,] <- c( "f2" ,"L" ,"we")
> B[3,] <- c("g5" ,"w", "zr")
> B[4,] <- c("y4", "r", "ls" )
> colnames(B) = c("M", "ID", "Sample")
> A
[,1] [,2] [,3]
q "a" "1" "10"
w "b" "2" "14"
r "c" "3" "15"
> B
M ID Sample
[1,] "d1" "q" "qw"
[2,] "f2" "L" "we"
[3,] "g5" "w" "zr"
[4,] "y4" "r" "ls"
>
> C <- merge(A, B, by.x = 0, by.y = "ID" )
> D <- C[,-5]
> D
Row.names V1 V2 V3 Sample
1 q a 1 10 qw
2 r c 3 15 ls
3 w b 2 14 zr
You were almost there, just putting the sample matrices together.
While we cannot use the $ operator on matrices, we can use the dimnames (as well as the row/column numbers) to subset the matrix. Then we can find which IDs are in the row names of A with %in%:
> cbind(A, B[,"Sample"][B[,"ID"] %in% rownames(A)])
# [,1] [,2] [,3] [,4]
# q "a" "1" "10" "qw"
# w "b" "2" "14" "zr"
# r "c" "3" "15" "ls"
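Note that `%in%` keeps the matching rows in B's order, so the line above lines up only because the IDs in B happen to appear in the same order as the row names of A. A sketch of an order-independent variant uses `match()` to look up the position of each row name of A in B; the names `idx` and `A2` are illustrative.

```r
A <- cbind(c("a", "b", "c"), c(1, 2, 3), c(10, 14, 15))
rownames(A) <- c("q", "w", "r")
B <- cbind(c("d1", "f2", "g5", "y4"), c("q", "L", "w", "r"), c("qw", "we", "zr", "ls"))
colnames(B) <- c("M", "ID", "Sample")

# Position in B of each row name of A (NA where there is no match)
idx <- match(rownames(A), B[, "ID"])

# Pull the mapped names in A's row order and bind them on
A2 <- cbind(A, Sample = B[idx, "Sample"])
A2
```

Row names of A without a counterpart in B would get `NA` in the new column instead of silently shifting the remaining values.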

Appending values with different order in R

I have two data elements in R:
data1
1 M
2 T
3 Z
4 A
5 J
data2 values
[1,] "A" "aa"
[2,] "J" "ab"
[3,] "M" "ac"
[4,] "T" "ad"
[5,] "Z" "ae"
I would like to get:
data1 values
[1,] "M" "ac"
[2,] "T" "ad"
[3,] "Z" "ae"
[4,] "A" "aa"
[5,] "J" "ab"
How can I append the values to data1 so that they follow the order of data1 rather than the order of data2?
You can get this behavior with the match function:
dat1 = data.frame(data1=c("M", "T", "Z", "A", "J"), stringsAsFactors=FALSE)
dat2 = data.frame(data2=c("A", "J", "M", "T", "Z"),
values=c("aa", "ab", "ac", "ad", "ae"), stringsAsFactors=FALSE)
dat2[match(dat1$data1, dat2$data2),]
# data2 values
# 3 M ac
# 4 T ad
# 5 Z ae
# 1 A aa
# 2 J ab
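If the goal is to attach the values to dat1 itself rather than reorder dat2, the same `match()` index can be used to fill a new column. A minimal sketch, reusing the data frames defined above:

```r
dat1 <- data.frame(data1 = c("M", "T", "Z", "A", "J"), stringsAsFactors = FALSE)
dat2 <- data.frame(data2 = c("A", "J", "M", "T", "Z"),
                   values = c("aa", "ab", "ac", "ad", "ae"),
                   stringsAsFactors = FALSE)

# match() gives, for each entry of dat1$data1, its row position in dat2
dat1$values <- dat2$values[match(dat1$data1, dat2$data2)]
dat1
```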
