I have a directed graph, i.e. an adjacency matrix of order n x n.
I need to find all the cycles present in it along with the vertices involved in the cycle.
Here is an example:
  A B C D
A 0 1 1 1
B 1 0 1 0
C 1 0 0 0
D 1 0 0 0
The output should be similar to:
No.of cycles found : 4
A->B->A
A->B->C->A
A->C->A
A->D->A
You should be looking for elementary cycles, in which no vertex (other than the begin/end vertex) appears more than once. For those there are algorithms whose running time is linear in nodes + edges per cycle found. See Johnson (1975), http://www.cs.tufts.edu/comp/150GA/homeworks/hw1/Johnson%2075.PDF, for example. This comes from the second answer at "Finding all cycles in a directed graph", which is better than the first IMHO.
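To make the idea of elementary cycles concrete, here is a minimal Python sketch run on the example matrix from the question. It is a plain depth-first search, not Johnson's algorithm, so it does not have the same running-time guarantee. The function name elementary_cycles and the rule of reporting each cycle from its lowest-numbered vertex are assumptions made for this illustration.

```python
def elementary_cycles(names, adj):
    """List every elementary cycle once, starting at its lowest-indexed vertex."""
    n = len(names)
    cycles = []

    def dfs(start, v, path, on_path):
        for w in range(n):
            if not adj[v][w]:
                continue
            if w == start:
                # The edge closes a cycle back to the start vertex.
                cycles.append("->".join(names[i] for i in path + [start]))
            elif w > start and w not in on_path:
                # Only visit higher-indexed vertices, so no cycle is reported twice.
                on_path.add(w)
                dfs(start, w, path + [w], on_path)
                on_path.discard(w)

    for s in range(n):
        dfs(s, s, [s], {s})
    return cycles


names = ["A", "B", "C", "D"]
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
cycles = elementary_cycles(names, adj)
print("No. of cycles found:", len(cycles))
for c in cycles:
    print(c)
```

For large or dense graphs the number of cycles can grow exponentially, so an implementation of Johnson's algorithm is the better choice there.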
I am new to bioinformatics in general and would really appreciate some help and tips with the project I'm working on.
My data of protein-protein interactions is stored in a table (in MySQL) with binary information about tissue-specificity. Now I am trying to create an undirected graph with igraph in R, but could not understand what type of data structure I should use without losing the tissue-information (Adjacency matrix, edge list..?).
Thank you in advance!
The data itself is about 200k rows, but here is an example of the structure:
symbol1  symbol2  adipose_tissue  adrenal_gland  amygdala  bone
POT1     PRMT7    0               0              0         0
CNBP     HNRNPAB  1               1              1         1
TRIAP1   BAG3     1               1              1         1
NR5A1    RALY     0               1              0         0
TPI1     CCDC8    1               1              1         1
MRPS22   BARD1    0               0              0         1
TOP2A    CCDC8    0               0              0         1
MYH9     TRIM72   0               0              0         0
ATXN7    TAF12    1               0              0         1
PSEN1    STT3B    1               1              1         1
ATP5F1   TSG101   1               1              1         1
BRCA1    UTP4     0               0              1         1
Bioinformatics apart, this is a question of data-wrangling in igraph. Igraph can build graph objects from both matrices and lists in many formats, so you should avoid too much pre-conversion. I suggest you build your graph using graph_from_data_frame().
I assume that the data structure described above is relational and therefore basically already an edge list of relations between the proteins uniprot1 and uniprot2. This mock-up sample data would then mimic your data structure.
data <- data.frame(uniprot1 = c('Q94X','Q95X','Q435','QUUU','0982'),
                   uniprot2 = c('QUUU','Q94X','Q95X','Q95X','Q94X'),
                   symbol = c('Symbol A', 'Symbol B', 'Symbol C', 'Symbol D', 'Symbol E'),
                   adipose_tissue = c(1,0,0,1,1),
                   bone = c(0,0,0,1,1))
To keep variables other than just the relational edges between vertices, you can either create them alongside your graph-objects, or add and manipulate them later manually.
Attributes naturally belong either to vertices or to edges. A vertex attribute in your data would be a protein name, size or other characteristic. An edge attribute would be the relational strength, type, or any other characteristic of the link between two proteins. If your graph had a vertex attribute called understandable_name_of_protein, you'd access it like so:
V(g)$understandable_name_of_protein
Edge-attributes follow the same principle through E(g)$attribute. When you load the example data above, all your edge-attributes should jump right into your graph like this:
# Build an undirected graph using the edges described in `data`
g <- graph_from_data_frame(data, directed=FALSE)
# Check that the data was correctly imported as edge attributes
E(g)$bone
# Add the edge attribute `color`, which will be displayed when plotting the graph
E(g)$color <- ifelse(E(g)$bone == 0, 'green','black')
# plot to see the graph with the bone-attribute visible as edge-color
plot(g)
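The same idea, an edge list whose rows carry the tissue flags as edge attributes, can be sketched without any graph library. The Python snippet below is only an illustration: the EDGES list copies four rows from the sample table in the question, and the helper name edges_in_tissue is hypothetical, not part of igraph.

```python
# Each entry is (protein_a, protein_b, edge attributes); rows copied from the sample table.
EDGES = [
    ("POT1", "PRMT7", {"adipose_tissue": 0, "adrenal_gland": 0, "amygdala": 0, "bone": 0}),
    ("NR5A1", "RALY", {"adipose_tissue": 0, "adrenal_gland": 1, "amygdala": 0, "bone": 0}),
    ("MRPS22", "BARD1", {"adipose_tissue": 0, "adrenal_gland": 0, "amygdala": 0, "bone": 1}),
    ("BRCA1", "UTP4", {"adipose_tissue": 0, "adrenal_gland": 0, "amygdala": 1, "bone": 1}),
]


def edges_in_tissue(edges, tissue):
    """Return the undirected edges flagged as present in the given tissue."""
    return [(a, b) for a, b, attrs in edges if attrs[tissue] == 1]


print(edges_in_tissue(EDGES, "bone"))
print(edges_in_tissue(EDGES, "adrenal_gland"))
```

Because the flags travel with each edge, no tissue information is lost, which is the property an adjacency matrix alone would not give you.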
I am trying to understand the basic math of a 4x4 matrix in Maya 3d software, and I can't seem to find anything specific enough to my scenario that I can understand.
I basically have an object with a matrix like this:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
I know that the bottom row represents translation, and that the 1s on the diagonal are the scale values.
But... if I rotate the object in X by 30 degrees, then I get a matrix like this:
1 0 0 0
0 0.8 0.5 0
0 -0.5 0.8 0
0 0 0 1
Firstly, how would I go about mathematically calculating the rotation x value from knowing only the matrix?
Secondly, how would I go about calculating the matrix value based on only knowing rotations, translations and scale of a 3d object?
Since this is Autodesk Maya, you can use the OpenMaya API:
import maya.cmds as cmds
import maya.api.OpenMaya as om
# The object of interest (not named `object`, which would shadow a Python builtin):
obj = "pCube1"
# Get the transform matrix as a list of 16 floats
m_list = cmds.xform(obj, query=True, matrix=True)
# Create the MMatrix object
m = om.MMatrix(m_list)
# Get the MTransformationMatrix
mt = om.MTransformationMatrix(m)
# Get the rotations as Euler angles
rot = mt.rotation()
# Rotations in radians (as if rotated in the xyz order):
print(rot.x, rot.y, rot.z)
# Rotations in degrees:
print(om.MAngle(rot.x).asDegrees(), om.MAngle(rot.y).asDegrees(), om.MAngle(rot.z).asDegrees())
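For the underlying math, here is a minimal sketch in plain Python (no Maya needed) that covers only the simple case from the question: a rotation about X combined with scale and translation. It assumes the layout shown in the question, with translation in the bottom row, and composes the matrix as scale, then rotation, then translation. The function names compose_x and rotation_x_degrees are made up for this example; a general solution with rotations about all three axes depends on the rotation order and is more involved.

```python
import math


def compose_x(tx, ty, tz, rx_deg, sx, sy, sz):
    """Build a 4x4 matrix (translation in the bottom row) from scale, X rotation, translation."""
    c = math.cos(math.radians(rx_deg))
    s = math.sin(math.radians(rx_deg))
    return [
        [sx, 0.0, 0.0, 0.0],
        [0.0, sy * c, sy * s, 0.0],
        [0.0, -sz * s, sz * c, 0.0],
        [tx, ty, tz, 1.0],
    ]


def rotation_x_degrees(m):
    """Recover the X rotation; the second row is (0, sy*cos, sy*sin, 0), so scale cancels out."""
    return math.degrees(math.atan2(m[1][2], m[1][1]))


m = compose_x(0.0, 0.0, 0.0, 30.0, 1.0, 1.0, 1.0)
for row in m:
    print([round(value, 3) for value in row])
print(rotation_x_degrees(m))
```

With a 30 degree rotation the middle entries come out as roughly 0.866 and 0.5, which the matrix in the question shows rounded to 0.8 and 0.5.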
Given a symmetric binary similarity matrix M (1 = similarity), I want to extract all (potentially overlapping) subsets, where all elements within a set are mutually similar.
  A B C D E
A 1 1 0 0 0
B 1 1 1 1 0
C 0 1 1 1 1
D 0 1 1 1 1
E 0 0 1 1 1
Also, sets contained within other sets should be discarded (e.g. {D,E} is contained in {C,D,E}). For the matrix the result would be: {A,B}, {B,C,D}, {C,D,E}
How can I easily achieve this?
I suspect that there is some clustering algorithm for this, but I am unaware of the name for these types of problems. To which (mathematical) class of problems does this task belong?
Code
M <- matrix(c(1,1,0,0,0,
1,1,1,1,0,
0,1,1,1,1,
0,1,1,1,1,
0,0,1,1,1), ncol = 5, byrow = TRUE)
colnames(M) <- rownames(M) <- LETTERS[1:5]
PS. This may smell like a homework assignment, but it's actually a problem I encountered in my job :)
A clique is a subgraph that is completely connected.
What you are looking for is hence (maximal) clique detection.
https://en.wikipedia.org/wiki/Clique_problem
Beware that the results can be much larger than you anticipate. Consider a graph where each edge is 1 with a probability of p. For p close to 1, almost any subset is a clique. Finding maximum cliques then becomes expensive. p can also be chosen to maximize the number of maximal cliques...
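As an illustration of maximal clique detection on the matrix from the question, here is a short Python sketch of the basic Bron-Kerbosch recursion (without pivoting). The function name maximal_cliques is chosen for this example; in R, igraph's max_cliques() should do the same job on a graph built from M.

```python
def maximal_cliques(names, matrix):
    """Enumerate all maximal cliques of a symmetric 0/1 similarity matrix."""
    n = len(names)
    # Neighbour sets, ignoring the diagonal (every element is similar to itself).
    neighbors = {i: {j for j in range(n) if j != i and matrix[i][j]} for i in range(n)}
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it, x: already handled vertices.
        if not p and not x:
            cliques.append(sorted(names[i] for i in r))
            return
        for v in list(p):
            expand(r | {v}, p & neighbors[v], x & neighbors[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(n)), set())
    return sorted(cliques)


names = ["A", "B", "C", "D", "E"]
M = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
print(maximal_cliques(names, M))
```

Sets contained in larger sets, such as {D,E}, never appear, because a clique is only reported when no vertex is left that could extend it.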
I'm trying to change the node colors in a particular graph, but the V(gsna)$color command doesn't work. For some reason, I can change shapes, but not colors. Specifically, I want to change the colors of nodes that occupy each cohesive block. The default colors don't read well in black and white print.
The dataset, sna, is a 2-mode, asymmetric incidence matrix. Here's an example (in reality, the dataset is much larger):
      Attr1 Attr2 Attr3 Attr4 Attr5
Subj1     1     0     0     1     1
Subj2     1     0     0     1     1
Subj3     1     0     1     0     1
Subj4     1     0     0     1     1
Subj5     0     1     0     0     0
Subj6     0     1     1     0     0
I used the cohesive.blocks() command to create hierarchically nested blocks. Subjects are represented by circles, attributes are represented by squares.
Here is my code:
library(igraph)
as.matrix(sna) -> sna
gsna <- graph.incidence(sna)
bloc <- cohesive.blocks(gsna)
par(mar=c(.05,.05,.05,.05),cex=.8)
V(gsna)[V(gsna)$type == 1]$shape <- "square"
V(gsna)[V(gsna)$type == 0]$shape <- "circle"
plot(bloc,gsna,layout=layout.fruchterman.reingold,vertex.size=5,edge.color="gray40",
vertex.label.color="black",mark.groups=blocks(bloc))
I also tried using vcol <- colorRampPalette(c("red4","green","aliceblue")) and adding the vertex.color=vcol option to the plot() function, but that doesn't change anything.
I would like to thank Professor Ronald Breiger for answering this question in person.
The solution is as follows:
group1 <- bloc$blocks[[1]]
a <- V(gsna)[group1]$color <- rep("blue4", length(group1))
group2 <- bloc$blocks[[2]]
b <- V(gsna)[group2]$color <- rep("deeppink", length(group2))
group3 <- bloc$blocks[[3]]
c <- V(gsna)[group3]$color <- rep("greenyellow", length(group3))
plot(gsna,layout=layout.fruchterman.reingold,vertex.size=5,edge.color="gray40",
vertex.label.color="black",vertex.color=V(gsna)$color,mark.groups=blocks(bloc))
The rep() command is used to replicate a particular color (e.g., blue) for each element of a block (e.g., group1). Be sure to specify the color for each block AND the length of the block. If you leave out the length, it will not color all of the nodes in the block.
The order of the commands is important in this case because there are hierarchical clusters. Colors should be assigned in order from largest (in this case, blocks[[1]]) to smallest (blocks[[3]]).
In the plot command, it is sufficient to input V(gsna)$color.
I am currently working on a dynamic temporal network.
Header: Time Sender Receiver
1 1 2
1 1 3
2 2 1
2 2 1
3 1 2
3 1 2
The above is a sample of my dataset.
There are 3 time periods (sessions), each with an edge list between nodes.
I want to compute centrality measures for each time period.
I am thinking about writing a script that computes centrality measures within each time period.
However, I am wondering whether there are R libraries that can handle this problem.
Does anyone know of one?
Jinie
I tried to write the code for subsetting data based on Time as follows:
uniq <-unique(unlist(df$Time))
uniq
[1] 1 2 3
for (i in 1:length(uniq)){
t[i]<-subset(df, Time==uniq[i])
net[i] <-as.matrix(t[i])
netT[i]<-net[i][,-3] #removing time column
#### getting edgelist
netT[i][,1]=as.character(net[i][,1])
netT[i][,2]=as.character(net[i][,2])
g [i]=graph.edgelist(netT [i], directed=T)
g[i]
}
However, I got an error message ( Error in t[i] <- subset(df, Time == uniq[i]) :
object of type 'closure' is not subsettable)
Do you know why? I am kind of new to R so it is hard to figure it out.
I guess t[i] is the problem. I don't know how to assign t[i] as a data frame.
The networkDynamic R library is helpful for this sort of thing (disclaimer: I'm a package maintainer).
library(networkDynamic)
# a data frame with your input data
raw<-data.frame(time=c(1,1,2,2,3,3),
sender=c(1,1,2,2,1,1),
receiver=c(2,3,1,1,2,2))
# add another time column to define a start and end time for each edge spell
raw2<-cbind(raw$time,raw$time+1,raw$sender,raw$receiver)
# create a networkDynamic object using this edge timing info
nd<-networkDynamic(edge.spells=raw2)
# load the sna library with static network measures
library(sna)
# apply degree measure to static networks extracted at default time points
lapply(get.networks(nd),degree)
[[1]]
[1] 2 1 1
[[2]]
[1] 1 1 0
[[3]]
[1] 1 1 0
You could try the igraph library. I'm not familiar with it, but I find this question interesting enough to code up an answer, so here we go:
Because you've got a directed network (senders and receivers), you're going to need two measures of centrality: indegree and outdegree.
Calculating this is fairly simple; the complication is splitting by time points. So for each time point we need to do the following:
Create an adjacency matrix indicating for each row (sender) the number of connections to each column (receiver).
From that we can simply add up the connections in the rows to get the outdegree, and the connections in the columns for the indegree.
Assuming your data is stored in a data.frame named df we can use split to split your data.frame by time point:
nodes <- unique(c(unique(df$Sender), unique(df$Receiver)))
centrality <- lapply(split(df, df$Time), function(time.df) {
  # One row per sender and one column per receiver, named by node id
  adj <- matrix(0, length(nodes), length(nodes), dimnames=list(nodes, nodes))
  for (i in 1:nrow(time.df)) {
    # Index by name rather than position, so node ids need not be 1..n
    sender <- as.character(time.df[i, "Sender"])
    receiver <- as.character(time.df[i, "Receiver"])
    adj[sender, receiver] <- adj[sender, receiver] + 1
  }
  list(indegree=colSums(adj), outdegree=rowSums(adj))
})
names(centrality) <- paste0("Time.Point.", 1:length(centrality))
If we run the code on your data (I've replaced the Senders and Receivers with letters for clarity):
> centrality
$Time.Point.1
$Time.Point.1$indegree
a b c
0 1 1
$Time.Point.1$outdegree
a b c
2 0 0
$Time.Point.2
$Time.Point.2$indegree
a b c
2 0 0
$Time.Point.2$outdegree
a b c
0 2 0
$Time.Point.3
$Time.Point.3$indegree
a b c
0 2 0
$Time.Point.3$outdegree
a b c
2 0 0
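For comparison, the same split-by-time-point counting can be sketched in a few lines of Python using only the standard library. The function name degree_by_period is hypothetical, the rows are the sample data from the question with the original numeric node ids, and nodes with degree 0 are simply omitted rather than listed.

```python
from collections import Counter, defaultdict

# (time, sender, receiver) rows from the sample dataset in the question.
rows = [(1, 1, 2), (1, 1, 3), (2, 2, 1), (2, 2, 1), (3, 1, 2), (3, 1, 2)]


def degree_by_period(rows):
    """Count indegree and outdegree per node, separately for each time period."""
    out_deg = defaultdict(Counter)
    in_deg = defaultdict(Counter)
    for time, sender, receiver in rows:
        out_deg[time][sender] += 1
        in_deg[time][receiver] += 1
    return {
        t: {"indegree": dict(in_deg[t]), "outdegree": dict(out_deg[t])}
        for t in sorted(out_deg)
    }


result = degree_by_period(rows)
for t, degrees in result.items():
    print(t, degrees)
```

Repeated edges within a period are counted each time, which matches the adjacency-matrix approach above.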