I have a graph stored as a data frame in which each row represents a link. Some nodes in the graph have a group of aliases. I want to merge the alias information into the graph. For example, given the graph
node1 node2
1 A E
2 B F
3 C G
Node A has aliases A1 and A2, and node E has aliases E1 and E2, so all the connections between {A, A1, A2} and {E, E1, E2} should be added to the data frame. I think I can use the igraph package to achieve that.
Let's store the number of names per node (the original plus its aliases) in a lookup table with row names:
> aliases=data.frame(count=c(1,2,2,3,2,1))
> row.names(aliases)=c("A","B","C","E","F","G")
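This function expands one row of the edge data frame into every pairing of the two nodes' names, giving the original node the suffix 0:
expand.row=function(r){
  e1=r[1];e2=r[2]                    # the two endpoint names
  n1=aliases[e1,1];n2=aliases[e2,1]  # number of names for each endpoint
  expand.grid(                       # every pairing of the numbered names
    paste(e1,seq_len(n1)-1,sep=""),
    paste(e2,seq_len(n2)-1,sep=""))
}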
So that:
> expand.row(c("B","E"))
Var1 Var2
1 B0 E0
2 B1 E0
3 B0 E1
4 B1 E1
5 B0 E2
6 B1 E2
Then apply it over the edge data frame (called d here) and bind the pieces into a single data frame:
> full = do.call(rbind,apply(as.matrix(d),1,expand.row))
> full
Var1 Var2
1 A0 E0
2 A0 E1
3 A0 E2
4 B0 F0
5 B1 F0
6 B0 F1
7 B1 F1
8 C0 G0
9 C1 G0
and there's your graph:
> g = graph.edgelist(as.matrix(full))
> plot(g)
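If the alias names themselves are known, as in the question (A1, A2 for A and E1, E2 for E), the same idea works with a named list instead of counts. This is only a sketch in base R; the names edges, alias, members and expand_edge are made up for illustration, and nodes without an entry in alias are assumed to have no aliases.

```r
# Edge list from the question
edges <- data.frame(node1 = c("A", "B", "C"),
                    node2 = c("E", "F", "G"),
                    stringsAsFactors = FALSE)

# Hypothetical alias lookup: node name -> character vector of aliases
alias <- list(A = c("A1", "A2"), E = c("E1", "E2"))

# A node together with its aliases (alias[[node]] is NULL when there are none)
members <- function(node) c(node, alias[[node]])

# All pairings between the two name groups of one edge
expand_edge <- function(n1, n2) {
  expand.grid(node1 = members(n1), node2 = members(n2),
              stringsAsFactors = FALSE)
}

# Expand every edge and stack the results
full <- do.call(rbind, Map(expand_edge, edges$node1, edges$node2))
rownames(full) <- NULL
full
```

The resulting two-column data frame can be passed to igraph as a character matrix in the same way as above.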
Suppose you have a factor variable whose level labels come in pairs
(such as 'a1' and 'a2', 'b1' and 'b2', etc.), and these pairs have unequal n-sizes.
x <- factor(c(rep("a1", 10), rep("a2", 15),rep("b1", 5), rep("b2", 30),rep("c1", 33), rep("c2", 22)))
> table(x)
a1 a2 b1 b2 c1 c2
10 15 5 30 33 22
Now suppose you want to randomly downsample the larger level of each pair to
equalize their n-sizes. Here's the desired outcome:
a1 a2 b1 b2 c1 c2
10 10 5 5 22 22
I have found that caret::downSample() can downsample to equalize all the levels of
a factor:
x_ds <- caret::downSample(1:115, x)
table(x_ds$Class)
a1 a2 b1 b2 c1 c2
5 5 5 5 5 5
I have the idea of using split() together with downSample(), but I'm having trouble figuring out a way to split on the level pairs. How could this be done?
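One possible approach, sketched in base R without caret: derive the pair from the level label by stripping the trailing digits, split the row indices by pair, and sample each level of a pair down to the size of the smaller one. The assumption that a pair is identified by the label minus its trailing number is mine, and the variable names pair, idx and x_ds are only illustrative.

```r
set.seed(1)  # make the random sample reproducible

x <- factor(c(rep("a1", 10), rep("a2", 15), rep("b1", 5),
              rep("b2", 30), rep("c1", 33), rep("c2", 22)))

# Pair label: the level name without its trailing digits ("a", "b", "c")
pair <- sub("[0-9]+$", "", as.character(x))

idx <- unlist(lapply(split(seq_along(x), pair), function(rows) {
  # Row indices of each level inside this pair
  by_level <- split(rows, droplevels(x[rows]))
  n_min <- min(lengths(by_level))
  # Keep n_min randomly chosen rows from every level of the pair
  unlist(lapply(by_level, function(r) r[sample.int(length(r), n_min)]))
}), use.names = FALSE)

x_ds <- x[sort(idx)]
table(x_ds)
```

For a data frame, the same idx vector can be used to subset the rows instead of the factor alone.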
I have two data frames with different columns and rows:
The first table has the names of specifically expressed genes in rows (750 entries) with statistical results (p-value, fold change) in columns (a 750x2 matrix).
The second table has the names of all expressed genes (13,000) with their associated genes in the same row (rows can be up to 100 entries long; a 13,000x100 matrix).
I want to create a data frame with the 750 expressed gene names from the first table, using a match function in R to pull in the associated genes from the second table.
Example:
First Data table
Name Fold Change P-value
A 0 3
B 1 4
F 2 6
H 1 8
Second Data table
Name X1 X2 X3 X4 X5
A A1 A2 A3 A4 A5
B B1 B2 B3 B4 B5
C C1 C2 C3 C4 C5
D D1 D2 D3 D4 D5
E E1 E2 E3 E4 E5
F F1 F2 F3 F4 F5
Desired Output
Name X1 X2 X3 X4 X5
A A1 A2 A3 A4 A5
B B1 B2 B3 B4 B5
D D1 D2 D3 D4 D5
F F1 F2 F3 F4 F5
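A minimal sketch of one way to do this with match(), assuming the gene-name column is called Name in both tables. The data frames table1 and table2 below are rebuilt from the sample rows above, and the column names FoldChange and PValue are my own spelling of the headers.

```r
# First table: the genes of interest with their statistics
table1 <- data.frame(Name = c("A", "B", "F", "H"),
                     FoldChange = c(0, 1, 2, 1),
                     PValue = c(3, 4, 6, 8),
                     stringsAsFactors = FALSE)

# Second table: every gene with its associated genes in columns X1..X5
names2 <- c("A", "B", "C", "D", "E", "F")
table2 <- data.frame(Name = names2, stringsAsFactors = FALSE)
for (k in 1:5) table2[[paste0("X", k)]] <- paste0(names2, k)

# Position of each table1 gene in table2 (NA when it is absent)
idx <- match(table1$Name, table2$Name)

# Keep the table2 rows for the genes that were found
result <- table2[idx[!is.na(idx)], ]
rownames(result) <- NULL
result
```

With the sample tables as typed, only A, B and F occur in both, so those are the rows this returns. If genes missing from the second table should be kept with NA entries, merge(table1, table2, by = "Name", all.x = TRUE) is an alternative.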
I have a bipartite edgelist that I would like to convert into a unipartite graph of just the 'from' nodes. I need to do this with a sparse matrix because of the size. Unfortunately, this means that easier solutions such as bipartite.projection(graph) freeze everything. My data looks like:
To From Weight
A1 B2 1
A1 B3 1
A2 B2 1
A3 B3 1
A3 B4 1
A4 B2 1
A4 B3 1
Using the Matrix package, I've created a sparse matrix with the correct dimensions, but for some reason only the diagonal is populated. For the sparse matrix I used:
myMat <- sparseMatrix(as.integer(as.factor(df$from)),
as.integer(as.factor(df$from)),
x = df$weight,
dimnames = list(levels(as.factor(df$from)),
levels(as.factor(df$from))
)
)
returns:
B2 B3 B4
B2 2 . .
B3 . 2 .
B4 . . 1
The diagonal summed the weight, but the rest of the matrix is empty where I was expecting it to have filled with the summed weight as well.
What I'd like is:
B2 B3 B4
B2 . 2 .
B3 2 . 1
B4 . 1 .
As this is a matrix of one side of the bipartite graph with the matrix[i,j] representing the count of df$to values connecting any two df$from values. This would then be the weight I would give to edges in any network graph.
What about as_adjacency_matrix which returns sparse?
library(igraph)
df <- read.table(textConnection("
To From Weight
A1 B2 1
A1 B3 1
A2 B2 1
A3 B3 1
A3 B4 1"), header=T, stringsAsFactors=F)
g <- graph.data.frame(df[,1:2])
V(g)$type <- V(g)$name %in% df[,1]
is.bipartite(g)
as_adjacency_matrix(g)
I ended up using some matrix algebra rather than a predefined function to get my result. By changing the sparseMatrix call slightly, so that rows are the 'from' nodes and columns are the 'to' nodes, and then multiplying by the transpose, I got the matrix I was looking for:
myMat <- sparseMatrix(as.integer(as.factor(df$from)),
as.integer(as.factor(df$to)),
x = 1,
dimnames = list(levels(as.factor(df$from)),
levels(as.factor(df$to))
)
)
finalMat <- myMat %*% t(myMat)
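For reference, here is a self-contained sketch of the same projection using the seven sample rows from the question. It assumes lower-case column names to and from, and it uses tcrossprod(), which computes the product with the transpose directly. The product also puts each node's degree on the diagonal, so the diagonal is zeroed afterwards to match the desired output; inc and proj are illustrative names.

```r
library(Matrix)

# Sample edge list from the question (column names assumed lower-case)
df <- data.frame(
  to   = c("A1", "A1", "A2", "A3", "A3", "A4", "A4"),
  from = c("B2", "B3", "B2", "B3", "B4", "B2", "B3"),
  stringsAsFactors = FALSE
)

from_f <- factor(df$from)
to_f   <- factor(df$to)

# Sparse incidence matrix: rows are 'from' nodes, columns are 'to' nodes
inc <- sparseMatrix(i = as.integer(from_f),
                    j = as.integer(to_f),
                    x = 1,
                    dimnames = list(levels(from_f), levels(to_f)))

# inc %*% t(inc): entry [i, j] counts the 'to' nodes shared by i and j
proj <- tcrossprod(inc)

# Remove the self-counts on the diagonal and drop the explicit zeros
diag(proj) <- 0
proj <- drop0(proj)
proj
```

For these rows the B2/B3 entry is 2 and the B3/B4 entry is 1, as in the desired matrix.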
Suppose the following situation. There are two tables, each with data of different quality. Both of them have the same variables A, B and C. Variables in the first table are called A1, B1 and C1, while those in the second table are called A2, B2 and C2.
The first table can be updated with the second table. There are six possible combinations:
A1, B1, C2
A1, B2, C1
A2, B1, C1
A1, B2, C2
A2, B1, C2
A2, B2, C1
The question is how to get that in R. What I'm using is what follows:
require(utils)
require(stringr)
vars <- c("A1", "B1", "C1", "A2", "B2", "C2")
combine <- function(data, n){
com1 = combn(data, n)# make all combinations
com2 = c(str_sub(com1, end=-2L))# remove the number in the end of the name
com3 = matrix(com2, nrow = dim(com1)[1], ncol = dim(com1)[2])# vector to matrix
com3 = split(com3, rep(1:ncol(com3), each = nrow(com3)))# matrix to list
com3 = lapply(com3, duplicated)# find list elements with duplicated names
com3 = lapply(com3, function(X){X[which(!any(X == TRUE))]})# identify duplicated names
pos = which(as.numeric(com3) == 0)# get position of duplicates
com3 = com1[,pos]# return elements from the original list
com3 = split(com3, rep(1:ncol(com3), each = nrow(com3)))# matrix to list
com3 = lapply(com3, sort)# sort by alphabetical order
com3 = as.data.frame(com3, stringsAsFactors = FALSE)# matrix to data frame
res = list(positions = pos, combinations = com3)# return position and combinations
return(res)
}
combine(vars, 3)
$positions
[1] 1 4 6 10 11 15 17 20
$combinations
X1 X2 X3 X4 X5 X6 X7 X8
1 A1 A1 A1 A1 A2 A2 A2 A2
2 B1 B1 B2 B2 B1 B1 B2 B2
3 C1 C2 C1 C2 C1 C2 C1 C2
I'd like to know if anyone knows a more straightforward solution than creating all possible combinations and afterwards cleaning up the result as my function does.
You're overthinking the problem. Just use expand.grid:
> expand.grid(c('A1','A2'),c('B1','B2'),c('C1','C2'))
Var1 Var2 Var3
1 A1 B1 C1
2 A2 B1 C1
3 A1 B2 C1
4 A2 B2 C1
5 A1 B1 C2
6 A2 B1 C2
7 A1 B2 C2
8 A2 B2 C2
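If the variable names are only available as a single vector, as in the question, the groups can be built first. A small sketch, assuming each name is a stem followed by a table number; groups and grid are illustrative names.

```r
vars <- c("A1", "B1", "C1", "A2", "B2", "C2")

# Group the names by their stem (the name without the trailing number)
groups <- split(vars, sub("[0-9]+$", "", vars))

# One choice per group, all combinations
grid <- do.call(expand.grid, c(groups, stringsAsFactors = FALSE))
grid
```

This also works for more than two tables or more than three variables, since expand.grid takes one argument per group.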
I am stuck on this problem and would be happy for advice. I have the following data.frame:
c1 <- factor(c("a","a","a","a"))
c2 <- factor(c("b","b","y","b"))
c3 <- factor(c("c","y","z","c"))
c4 <- factor(c("y","z","","y"))
c5 <- factor(c("z","","","z"))
x <- data.frame(c1,c2,c3,c4,c5)
So this data looks like this:
  c1 c2 c3 c4 c5
1  a  b  c  y  z
2  a  b  y  z
3  a  y  z
4  a  b  c  y  z
So in each row, there is a sequence of varying length of a, b, c which concludes with values for y and z. What I need to do is to move values y and z each to separate column that I can work with, so the data looks like this:
  c6 c7 c8 c9 c10
1  a  b  c  y   z
2  a  b     y   z
3  a        y   z
4  a  b  c  y   z
I have worked out how to get the length of each sequence per row and added it as a column, so I know which columns y and z are located in:
x$not.na <- apply(x, 1, function(r) length(which(!r=="")))
But I am stuck on how to loop(?) over each row to perform the necessary cut and paste of y and z.
Something like this:
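lastTwoToEnd<-function(x){
  # positions of the last two non-empty entries in the row
  i<-sum(x!="")-1:0
  # everything else first, then those two entries at the end
  x[c(setdiff(seq_along(x),i),i)]
}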
data.frame(t(apply(x,1,lastTwoToEnd)))
##   X1 X2 X3 X4 X5
## 1  a  b  c  y  z
## 2  a  b     y  z
## 3  a        y  z
## 4  a  b  c  y  z
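If the goal is only to pull y and z out into their own columns, an alternative is to index the matrix directly with (row, column) pairs. This is a sketch under the assumption that every row has at least two non-empty entries; y_col and z_col are illustrative names.

```r
x <- data.frame(c1 = c("a", "a", "a", "a"),
                c2 = c("b", "b", "y", "b"),
                c3 = c("c", "y", "z", "c"),
                c4 = c("y", "z", "",  "y"),
                c5 = c("z", "",  "",  "z"),
                stringsAsFactors = FALSE)

m <- as.matrix(x)
n <- rowSums(m != "")          # number of non-empty entries per row
rows <- seq_len(nrow(m))

# A two-column index matrix picks one cell per row
y_col <- m[cbind(rows, n - 1)] # second-to-last non-empty entry
z_col <- m[cbind(rows, n)]     # last non-empty entry

data.frame(y = y_col, z = z_col)
```

This avoids apply() and keeps the original columns untouched.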