Objective (in R): extract nodes and edges from a data frame and use them to model a graph.
I am trying to learn how to use DiagrammeR, or any other graph modeling library, to build a graph such as the one below (you can follow the link [The GRAPH1]) from a data frame:
The data frame:
a b c classes
1 2 0 a
0 0 2 b
0 1 0 c
I have used the DiagrammeR library and defined the nodes and edges manually with these commands:
library(DiagrammeR)
grViz("
digraph boxes_and_circles{
#add the node statement
node[shape=box]
a; b; c;
#add the edge statement
a->a [label=1]; a-> b[label=2]; b->c[label=2]; c->b[label=1]
graph [nodesep=0.1]
}
")
Could you help me to understand how I can get the nodes and edges automatically? Thank you in advance.
You can do this with the igraph package. The first three columns of your data frame form an adjacency matrix, and igraph has a function, graph_from_adjacency_matrix(), that turns such a matrix into a graph. My code below also adds a layout that places the vertices in the positions shown in your sample graph.
## Your data
df = read.table(text="a b c classes
1 2 0 a
0 0 2 b
0 1 0 c",
header=TRUE)
library(igraph)
g = graph_from_adjacency_matrix(as.matrix(df[,1:3]), weighted=TRUE)
LO = matrix(c(0,0,0,3,2,1), ncol=2)
plot(g, layout=LO, edge.label=E(g)$weight, vertex.shape="rectangle",
vertex.color="white", edge.curved=c(0,0,0.15,0.15))
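If you would rather stay with DiagrammeR, the nodes and edges can also be pulled out of the same adjacency matrix as plain data frames and pasted into a DOT string. This is only a minimal sketch in base R: the names m, idx, nodes, edges and dot are made up for illustration, and it assumes the classes column gives the row (source) node of each row.

```r
## Same data as above
df <- read.table(text = "a b c classes
1 2 0 a
0 0 2 b
0 1 0 c", header = TRUE)

## Adjacency matrix: rows are source nodes, columns are target nodes
m <- as.matrix(df[, 1:3])
rownames(m) <- as.character(df$classes)

## Every non-zero cell is one edge; arr.ind = TRUE returns its row/column position
idx <- which(m != 0, arr.ind = TRUE)
edges <- data.frame(
  from  = rownames(m)[idx[, "row"]],
  to    = colnames(m)[idx[, "col"]],
  label = m[idx],
  stringsAsFactors = FALSE
)
nodes <- data.frame(id = colnames(m), stringsAsFactors = FALSE)

## Assemble a DOT description from the two tables
edge_stmts <- sprintf("%s -> %s [label=%s]", edges$from, edges$to, edges$label)
dot <- paste0(
  "digraph boxes_and_circles {\n",
  "  node [shape=box]\n  ",
  paste(nodes$id, collapse = "; "), ";\n  ",
  paste(edge_stmts, collapse = ";\n  "), "\n",
  "  graph [nodesep=0.1]\n}"
)
cat(dot, "\n")
## The string can then be passed to DiagrammeR::grViz(dot)
```

The edges table has one row per non-zero cell, so the four edges written by hand in the question come out without being typed in.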
In igraph in R, if we use as_adjacency_matrix(g) on an undirected graph g, I sometimes see an entry of 2 instead of just 0 or 1. I am used to getting only 0s and 1s. What does a 2 mean, and how can I get rid of it?
as_adj() returns a matrix in which every edge between two vertices is counted. If there is more than one edge between two vertices, the matrix will contain entries greater than one. Compare these two examples:
# Single edges give weights of 1
g <- make_empty_graph(directed=F) %>% add_vertices(3) %>% add_edges(c(1,2,2,3,3,1))
plot(g)
( as_adj(g) )
# Multiple links give weights > 1
g <- g %>% add_edges(c(1,2,2,3,2,3))
( as_adj(g) )
plot(g)
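To get rid of the 2s, one option is to collapse the duplicate edges before building the matrix. The sketch below uses igraph's simplify() for that; the graph is a small made-up example with one doubled edge, and g_simple is just an illustrative name.

```r
library(igraph)

## Triangle 1-2-3 with the edge 2-3 entered twice
g <- make_graph(c(1, 2, 2, 3, 3, 1, 2, 3), directed = FALSE)
any_multiple(g)                          # is there a multi-edge?
as_adjacency_matrix(g, sparse = FALSE)   # contains a 2 for the pair (2, 3)

## Merge parallel edges, keep loops as they are
g_simple <- simplify(g, remove.multiple = TRUE, remove.loops = FALSE)
as_adjacency_matrix(g_simple, sparse = FALSE)
```

Whether merging is appropriate depends on whether the repeated edges carry meaning in your data; if they do, keeping the counts as edge weights may be the better choice.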
I have a graph like the one below:
adj <- read.table(text = "
A B C D
A 0 1 0 0
B 1 0 1 1
C 0 1 0 0
D 0 1 0 0
", header = T)
g <- graph_from_adjacency_matrix(as.matrix(adj))
I want to compute each node's distance as follows:
distMat <- 1/2^distances(g)
for (i in 1:nrow(distMat)) {
res[i] <- sum(distMat[i, ]) - distMat[i, i]
}
names(res) <- V(g)$name[V(g)]
res
The number of values in the result should equal the number of nodes in the graph, but it does not: I get 5 values instead of 4. Any idea how to fix it?
Your code does not return 5 values. One reason that you may run into trouble is that your code is much more complicated than it needs to be. Look at what it does:
res <- 1:nrow(distMat)
for (i in 1:nrow(distMat)) {
res[i]<-sum( distMat[i,]) - distMat[i,i]
}
is a loop over the rows of your distance matrix: each row is summed and distMat[i,i] is then subtracted. That term is always 1, because distMat[i,i] refers to each node's distance to itself, which is 0, and 1/2^0 = 1. A nicer rewrite of the same calculation would be:
res <- rowSums(1/2^distances(g))-1
It then becomes easier to see that this kind of calculation is really a centrality measure. A node that is close to all other nodes, i.e. one with a low sum of distances, will be associated with high centrality.
igraph has functions to compute a whole range of documented and established centrality measures. See degree(), closeness(), or betweenness(). What is the advantage of yours?
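As a quick check that the loop and the rowSums() rewrite really agree, here is a small sketch on the four-node graph from the question. The matrix is typed in directly and built as an undirected graph, which is an assumption on my part (the adjacency matrix is symmetric); res_loop and res_vec are illustrative names.

```r
library(igraph)

## The four-node graph from the question: B is connected to A, C and D
adj <- matrix(c(0, 1, 0, 0,
                1, 0, 1, 1,
                0, 1, 0, 0,
                0, 1, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(LETTERS[1:4], LETTERS[1:4]))
g <- graph_from_adjacency_matrix(adj, mode = "undirected")

distMat <- 1 / 2^distances(g)

## Loop version, with res created before it is filled
res_loop <- numeric(nrow(distMat))
for (i in seq_len(nrow(distMat))) {
  res_loop[i] <- sum(distMat[i, ]) - distMat[i, i]
}
names(res_loop) <- V(g)$name

## Vectorised version
res_vec <- rowSums(distMat) - 1

res_loop
res_vec
```

Both give one value per node (four in total), with B, the hub, scoring highest.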
Look at your centrality measure and play around using this code:
# Make random graph with more nodes and calculate your centrality measure as res
graph <- erdos.renyi.game(80, 100, "gnm", directed=FALSE)
res <- rowSums(1/2^distances(graph))-1
# Colour each node according to their distance to every other node
colfunc <- colorRampPalette(c("yellow", "red"))
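# One colour per rounded score from 0 to round(max), so that the index used below always exists
gradient <- colfunc(round(max(res, na.rm=TRUE)) + 1)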
V(graph)$color <- gradient[round(res)+1]
# Plot it
plot(graph, vertex.label="")
Play around by using res <- betweenness(graph) for example.
I have a PCA that shows two really big clusters, and I don't know how to figure out which of my samples are in each cluster.
If it helps, I'm using prcomp to generate the PCA:
pca1 <- autoplot(prcomp(df), label = TRUE, label.size = 2)
My approach has been to attempt to cluster the PCA output using kmeans with 2 groups to get the clusters:
pca <- prcomp(df, scale.=TRUE)
clust <- kmeans(pca$x[,1:2], centers=2)$cluster
I can then make a beautiful plot, but I am still lost as to which samples are in each cluster. For reference, here is the plot generated when I graph the kmeans output:
As you can see in the first PCA plot, the labels literally say which sample each dot is. My ideal output would be a two column txt file with the sample name in one column, and the group it belongs to in the other column.
All that aside, if there is a better way, please let me know.
Thanks in advance.
Here is a chunk of my data:
a b c b e
Sample_1013 312011 624559 625898 534309 220415
Sample_1046 474774 949458 951145 843049 366136
Sample_104 645363 1290450 1292520 919474 272200
Sample_1057 267319 534685 535294 690574 422645
Sample_106 414065 830571 834527 657354 234130
Sample_107 299289 602483 603756 566256 262153
In my question, clust is the name of the output from my kmeans:
clust <- kmeans(pca$x[,1:2], centers=2)$cluster
I typed clust into the terminal and got which samples belong to each group:
> clust
Sample_1013 Sample_1046 Sample_104 Sample_1057 Sample_106 Sample_107
1 1 1 1 1 1
Sample_1098 Sample_109 Sample_1109 Sample_1129 Sample_1130 Sample_1140
1 1 1 1 1 1
Sample_1149 Sample_115 Sample_118 Sample_1220 Sample_1223 Sample_1225
1 1 1 1 1 1
Hopefully this helps someone.
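To get the two-column text file asked for in the question, the named vector can be turned into a data frame and written out. This is a rough sketch using only the six samples shown above; the column names a to e, the seed, and the file name clusters.txt are my own choices for illustration.

```r
## The six samples shown in the question (column names a..e are assumed)
df <- read.table(text = "a b c d e
Sample_1013 312011 624559 625898 534309 220415
Sample_1046 474774 949458 951145 843049 366136
Sample_104 645363 1290450 1292520 919474 272200
Sample_1057 267319 534685 535294 690574 422645
Sample_106 414065 830571 834527 657354 234130
Sample_107 299289 602483 603756 566256 262153", header = TRUE)

set.seed(1)  # kmeans starts from random centres
pca   <- prcomp(df, scale. = TRUE)
clust <- kmeans(pca$x[, 1:2], centers = 2)$cluster

## clust is a named vector: the names are the samples, the values the groups
out <- data.frame(sample = names(clust), cluster = unname(clust))

out_file <- file.path(tempdir(), "clusters.txt")
write.table(out, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Reading the file back with read.table(out_file, header = TRUE, sep = "\t") should return one row per sample.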
I have a list of clusters, say from cluster 1 to cluster 3, along with their membership, as in the example below. I would like to display the clusters in a radial format. I was thinking of using the as.phylo function in the ape package to display this, but that requires creating an hclust object. If anyone knows how to do this, whether by creating an hclust object or otherwise, it would be much appreciated.
Many Thanks!
cl var numberOfCluster
1 a 1
1 b 1
1 c 1
1 d 1
1 a 2
1 b 2
2 c 2
2 d 2
3 a 3
1 b 3
2 c 3
2 d 3
Thanks very much!
(This is a copy of my answer to a similar question on Cross Validated.)
Assuming you can create an hclust object (from variables on which a distance measure can be defined), this can be done by combining two packages: circlize and dendextend.
The plot can be made using the circlize_dendrogram function, which allows much more refined control than the "fan" layout of the plot.phylo function.
# install.packages("dendextend")
# install.packages("circlize")
library(dendextend)
library(circlize)
# create a dendrogram
hc <- hclust(dist(datasets::mtcars))
dend <- as.dendrogram(hc)
# modify the dendrogram to have some colors in the branches and labels
dend <- dend %>%
color_branches(k=4) %>%
color_labels
# plot the radial plot
par(mar = rep(0,4))
# circlize_dendrogram(dend, dend_track_height = 0.8)
circlize_dendrogram(dend, labels_track_height = NA, dend_track_height = .4)
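For the membership table in the question, one possible way to get an hclust object is to define the distance between two variables as the number of cluster levels at which they are assigned to different clusters. That distance is my own assumption, not something the question specifies, and the names memb, wide and d are illustrative. A minimal sketch in base R:

```r
## Membership table from the question
memb <- read.table(text = "cl var numberOfCluster
1 a 1
1 b 1
1 c 1
1 d 1
1 a 2
1 b 2
2 c 2
2 d 2
3 a 3
1 b 3
2 c 3
2 d 3", header = TRUE, stringsAsFactors = FALSE)

## Rows = variables, columns = number of clusters, cells = cluster label
wide <- tapply(memb$cl, list(memb$var, memb$numberOfCluster), function(x) x[1])

## Assumed distance: number of levels at which two variables get different labels
n <- nrow(wide)
d <- matrix(0, n, n, dimnames = list(rownames(wide), rownames(wide)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- sum(wide[i, ] != wide[j, ])
  }
}

hc   <- hclust(as.dist(d), method = "average")
dend <- as.dendrogram(hc)
## dend can then be passed to circlize_dendrogram(dend) as shown above
```

With this data, cutree(hc, 2) puts a and b in one group and c and d in the other, which matches the two-cluster row of the table.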
I have a matrix as below:
jerry peter king
jerry 1 0 0
peter 0 1 0
king 1 1 1
Now I am trying to draw a graph standing for the matrix with the code below:
t <- read.table("../data/table.dat");
adjm <- data.matrix(t);
g1 <- graph.adjacency(adjm,add.colnames=NULL);
plot(g1, main="social network", vertex.color="white", edge.color="grey", vertex.size=8,
vertex.frame.color="yellow");
The vertices are labelled with their ids, so my question is: how do I set the vertex labels from the dimnames of the matrix?
I have tried the code
vertex.label=attr(adjm,"dimnames")
but get the wrong graph.
There are 2 ways to do this:
1. When you create the graph object, assign the names to a vertex attribute called label. This is the default that plot.igraph() looks for when plotting.
g1 <- graph.adjacency(adjm,add.colnames='label')
2. Use the V iterator to extract the name vertex attribute, which is how they are stored if you use add.colnames=NULL.
plot(g1, main="social network", vertex.color="white", edge.color="grey", vertex.size=8, vertex.frame.color="yellow", vertex.label=V(g1)$name)
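Both options can be checked without plotting. The sketch below types the matrix in directly instead of reading table.dat, and uses graph_from_adjacency_matrix(), the current name for graph.adjacency() in recent igraph versions; g2 is an illustrative name.

```r
library(igraph)

people <- c("jerry", "peter", "king")
adjm <- matrix(c(1, 0, 0,
                 0, 1, 0,
                 1, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(people, people))

## Default (add.colnames = NULL): column names are stored in the "name" attribute
g1 <- graph_from_adjacency_matrix(adjm)
V(g1)$name

## add.colnames = "label": column names are stored in the "label" attribute instead
g2 <- graph_from_adjacency_matrix(adjm, add.colnames = "label")
V(g2)$label
```

Note that attr(adjm, "dimnames") is a list holding both the row names and the column names, which is probably why it gave the wrong labels; colnames(adjm) would be the character vector of vertex names.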
Either way will give you your desired result. Something like: