How to plot overlapping clusters in R, given a list of vertices for each group and the edge list?

I have a CSV file containing the edge list of a graph. After running CONGA
(Cluster-Overlap Newman Girvan Algorithm), the result is a list of the vertices in each group.
I don't know how to plot it in R so that each group has a different color in the graph.
I can plot the graph from the edge list in R, but I don't know how to mark the vertices of each group.
Input: an edge list file and a list of the vertices in each group.
Output: a graph with a different color for each group.
The output should look roughly like this:
My English isn't good. Thanks for your support.

You plot colors using $color of vertices. Try to assign a color like V(g)$color <- 'green'.
It is better if you give us some code.
You say you get a list of your group members. Convert the list to a membership vector, and assign a color to each unique group value. I wrote this example code. I think it shows what you're after.
library(igraph)
get_a_random_network <- function() {
  # Function to get some random data to use as an example
  g <- erdos.renyi.game(100, 60, type="gnm", directed=F, loops=FALSE)
  g <- g %>% delete_vertices( V(g)[degree(g)==0] )
  (g)
}
# Get sample data
g <- get_a_random_network()
# Use a clustering algorithm to determine groups. You said you had a list. I use this to generate example data.
groups <- cluster_fast_greedy(g)
# Look at the vertices
(V(g))
# Look at what groups they belong to:
(groups$membership)
# Here you write that you have "list of vertices of each group". You don't
# give us code, but I assume that you have data that looks like this:
CONGA_list <- lapply(1:max(groups$membership),function(x) V(g)[groups$membership ==x])
(CONGA_list)
# This is where you should really have provided a code example.
# You could convert a list like CONGA_list to a vector like this:
membership_groups <- rep(0, length(V(g)))
for(x in seq_along(CONGA_list)){
  membership_groups[as.vector(CONGA_list[[x]])] <- x
}
(membership_groups == groups$membership)
# You give color to your network by first telling each vertex which group it belongs to
V(g)$membership <- groups$membership
# Then we assign a color. I use a vector of R colors which I get like this...
colors = grDevices::colors()[grep('gr(a|e)y', grDevices::colors(), invert = T)]
# ... and then I sample from them to give each group a color.
colors <- sample(colors, max(V(g)$membership))
V(g)$color <- colors[V(g)$membership]
(V(g)$color)
# The plot will work with the colors in V(g)$color
plot(g, vertex.size=7, vertex.label=NA)
Good luck
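Because CONGA produces overlapping groups, a vertex can appear in more than one group, and a single membership vector keeps only one of them (in the loop above, the last group wins). The following is a minimal sketch of one alternative, assuming igraph's mark.groups plot argument is acceptable for your purpose. The edge list and the conga_groups list are made up for illustration; they stand in for your CSV file and your CONGA output.

```r
library(igraph)

# Hypothetical edge list; in practice this would come from read.csv().
edges <- data.frame(from = c(1, 1, 2, 3, 3, 4),
                    to   = c(2, 3, 3, 4, 5, 5))
g <- graph_from_data_frame(edges, directed = FALSE)

# Hypothetical overlapping result: vertex "3" belongs to both groups.
conga_groups <- list(c("1", "2", "3"), c("3", "4", "5"))

# Translate vertex names into vertex indices for each group.
group_ids <- lapply(conga_groups, function(v) match(v, V(g)$name))

# Count memberships per vertex and colour the overlapping ones differently.
n_memberships <- tabulate(unlist(group_ids), nbins = vcount(g))
V(g)$color <- ifelse(n_memberships > 1, "orange", "lightblue")

# mark.groups shades one region per group, so overlaps stay visible.
plot(g, mark.groups = group_ids, vertex.size = 15)
```

Here the shaded regions carry the group information and the vertex colour only flags vertices that sit in more than one group.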

Related

Create bipartite graph in R?

So this question has been asked here and here... but I can't seem to adapt it to my problem. I am trying to create a bipartite graph using the igraph package in R, that looks something like this:
The code I'm using to try this is:
# create all pairs and turn into vector for graph edges
pairs <- expand.grid(1:6, 1:6) # create all pairs
pairs <- pairs[!pairs$Var1 == pairs$Var2, ] # remove matching rows
ed <- as.vector(t(pairs)) # turn into vector
# create graph
g <- make_empty_graph(n = 6)
g <- add_edges(graph = g, edges = ed)
plot(g)
This will create a graph... but I'm trying to make it resemble the graph in the image, with, say, (1,2,3) on the top and (4,5,6) on the bottom.
I tried using make_bipartite_graph() and layout_as_bipartite... but I can't seem to get it to work... any suggestions?
If the graph is created straight from the data.frame it will not be a bipartite graph.
library(igraph)
g <- graph_from_data_frame(df)
is.bipartite(g)
#[1] FALSE
But it will be a bipartite graph if created from the incidence matrix.
tdf <- table(df)
g <- graph.incidence(tdf, weighted = TRUE)
is.bipartite(g)
#[1] TRUE
Now plot it.
colrs <- c("green", "cyan")[V(g)$type + 1L]
plot(g, vertex.color = colrs, layout = layout_as_bipartite)
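For the six-vertex case in the question, a minimal sketch using make_bipartite_graph() directly might look like the following. It assumes that only edges between (1,2,3) and (4,5,6) are wanted, since a bipartite graph cannot contain edges within one side; the types vector and the pairs data frame are illustrative.

```r
library(igraph)

# FALSE marks one side (vertices 1-3), TRUE the other (vertices 4-6).
types <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

# All pairs that connect one side to the other.
pairs <- expand.grid(top = 1:3, bottom = 4:6)
ed <- as.vector(t(pairs))  # flatten to c(from1, to1, from2, to2, ...)

g <- make_bipartite_graph(types, ed)

# layout_as_bipartite puts each type on its own row.
colrs <- c("green", "cyan")[V(g)$type + 1L]
plot(g, vertex.color = colrs, layout = layout_as_bipartite(g))
```

If the rows come out the other way round from what you want, flipping the values in types should swap them.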

R Indexing a matrix to use in plot coordinates

I'm trying to plot a temporal social network in R. My approach is to create a master graph and layout for all nodes. Then, I will subset the graph based on a series of vertex IDs. However, when I do this and lay out the graph, I get completely different node locations. I think I'm subsetting the layout matrix incorrectly, but I can't locate where my issue is because I've done some smaller matrix subsets and everything seems to work fine.
I have some example code and an image of the issue in the network plots.
library(igraph)
# make graph
g <- barabasi.game(25)
# make layout and set some aesthetics
set.seed(123)
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sample(1:25, 15, F)
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph')
The vertices in the second plot should match layout of those in the first.
Unfortunately, you set the random seed after you generated the graph,
so we cannot exactly reproduce your result. I will use the same code but
with set.seed before the graph generation. This makes the result look
different than yours, but will be reproducible.
When I run your code, I do not see exactly the same problem as you are
showing.
Your code (with set.seed moved and scales added)
library(igraph)
library(scales) # for rescale function
# make graph
set.seed(123)
g <- barabasi.game(25)
# make layout and set some aesthetics
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
## V(g)$names = 1:25
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sort(sample(1:25, 15, F))
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph', vertex.label=V(sub_g)$names)
When I run your code, both graphs have nodes in the same
positions. That is not what I see in the graph in your question.
I suggest that you run just this code and see if you don't get
the same result (nodes in the same positions in both graphs).
The only difference between the two graphs in my version is the
node labels. When you take the subgraph, it renumbers the nodes
from 1 to 15 so the labels on the nodes disagree. You can fix
this by storing the node labels in the graph before taking the
subgraph. Specifically, add V(g)$names = 1:25 immediately after
your statement E(g)$arrow.size = .1. Then run the whole thing
again, starting at set.seed(123). This will preserve the
original numbering as the node labels.
The graph looks slightly different because the new, sub-graph
does not take up all of the space and so is stretched to use
up the empty space.
A possible quick workaround: draw the same full graph, but color the nodes and edges that you don't need in the color of your background. Depending on your purposes, that may be good enough.
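A minimal sketch that combines the points from the first answer, under a few assumptions of mine: the original vertex numbers are stored in the label attribute before subsetting, the layout rows are picked using those stored numbers (so the row order follows the vertex order of the subgraph even if the sampled IDs are unsorted), and the layout is normalised once so that rescale = FALSE keeps both plots on the same coordinates. sample_pa() is used here in place of barabasi.game().

```r
library(igraph)

set.seed(123)
g <- sample_pa(25)

# Store the original numbering; plot() uses the 'label' attribute by default.
V(g)$label <- seq_len(vcount(g))

# Normalise the master layout once so both plots share one coordinate system.
l <- norm_coords(layout_nicely(g), xmin = -1, xmax = 1, ymin = -1, ymax = 1)

v_ids <- sample(vcount(g), 15)
sub_g <- induced_subgraph(g, v_ids)

# Pick layout rows by the stored original numbers of the subgraph's vertices.
sub_l <- l[V(sub_g)$label, , drop = FALSE]

par(mfrow = c(1, 2), mar = c(1, 1, 5, 1))
plot(g, layout = l, rescale = FALSE, main = "Entire\ngraph")
plot(sub_g, layout = sub_l, rescale = FALSE, main = "Sub\ngraph")
```

With rescale = FALSE the subgraph is not stretched to fill the panel, so each node should land where it sits in the full graph.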

How to add labels to original data given clustering result using hclust

Say I have some unlabeled data which I know should be clustered into six categories, for example this dataset:
library(tidyverse)
ts <- read_table(url("http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_control.data"), col_names = FALSE)
If I create an hclust object with a sample of 60 from the original dataset like so:
n <- 10
s <- sample(1:100, n)
idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
ts.samp <- ts[idx,]
observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), rep(4,n), rep(5,n), rep(6,n))
# compute DTW distances
library(dtw)#Dynamic Time Warping (DTW)
distMatrix <- dist(ts.samp, method= 'DTW')
# hierarchical clustering
hc <- hclust(distMatrix, method='average')
I know that I can then add the labels to the dendrogram for viewing like this:
observedLabels <- c(rep(1,n), rep(2,n), rep(3,n), rep(4,n), rep(5,n), rep(6,n))
plot(hc, labels=observedLabels, main="")
However, I would like to add the correct labels to the initial data frame that was clustered. So for ts.samp I would like to add an extra column with the label of the cluster that each observation has been assigned to.
It would seem that ts.samp$cluster <- hc$label should add the cluster to the data frame, however hc$label returns NULL.
Can anyone help with extracting this information?
You need to choose a level at which to cut your dendrogram; the cut is what forms the groups.
Use:
labels <- cutree(hc, k = 3) # set k to whatever is most appropriate for your data
ts.samp$grouping <- labels
Let's look at the dendrogram in order to find the best number for k:
plot(hc, main="")
abline(h=500, col = "red") # cut at height 500 forms 2 groups
abline(h=300, col = "blue") # cut at height 300 forms 3/4 groups
It looks like either 2 or 3 might be good. Look for the largest jump along the vertical axis (Height) between successive merges.
Then draw a horizontal line at that height and count the branches it crosses; that is the number of clusters formed.
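As a self-contained illustration of the same idea, here is a minimal sketch on the built-in iris data rather than the DTW example above (the choice of data, distance and k = 3 is mine, purely for illustration). cutree() returns one cluster number per observation, in the order of the original rows, which is why it can be assigned straight to a new column.

```r
# Cluster the four numeric columns of iris with average linkage.
x <- iris[, 1:4]
hc <- hclust(dist(x), method = "average")

# Cut the tree into k groups; the result is aligned with the rows of x.
x$cluster <- cutree(hc, k = 3)

# Compare the clusters with the known labels.
print(table(cluster = x$cluster, species = iris$Species))

# Cutting by height instead of by number of groups is also possible.
by_height <- cutree(hc, h = 2)
```

For the question's data, k = 6 would correspond to the six expected categories.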

Highlight Subset Cells From Heatmap By Row/Col Index

I'm trying to visually inspect, and extract, subsets of large heatmaps. For example, I'd like to roughly extract the row/col indices for clusters like the one I circled below:
Following the advice from here, I hope to achieve this by creating rectangles around subsets of cells by index and repeat until I've highlighted areas close enough to what I want.
Using some simpler data, I tried this:
library(gplots)
set.seed(100)
# Input data 4x5 matrix
nx <- 5
ny <- 4
dat <- matrix(runif(20, 1, 10), nrow=ny, ncol=nx)
# Get hierarchically clustered heatmap matrix
hm <- heatmap.2(dat, main="Test HM", key=T, trace="none")
hmat <- dat[rev(hm$rowInd), hm$colInd]
# Logical matrix with the same dimensions as our data
# indicating which cells I want to subset
selection <- matrix(rep(F,20), nrow=4)
# For example: the third row
selection[3,] <- T
#selection <- dat>7 # Typical subsets like this don't work either
# Function for making selection rectangles around selection cells
makeRects <- function(cells){
coords = expand.grid(1:nx,1:ny)[cells,]
xl=coords[,1]-0.49
yb=coords[,2]-0.49
xr=coords[,1]+0.49
yt=coords[,2]+0.49
rect(xl,yb,xr,yt,border="black",lwd=3)
}
# Re-make heatmap with rectangles based on the selection
# Use the already computed heatmap matrix and don't recluster
heatmap.2(hmat, main="Heatmap - Select 3rd Row", key=T, trace="none",
dendrogram="none", Rowv=F, Colv=F,
add.expr={makeRects(selection)})
This does not work. Here is the result. Instead of the third row being highlighted, we see a strange pattern:
It must have to do with this line:
coords = expand.grid(1:nx,1:ny)[cells,]
# with parameters filled...
coords = expand.grid(1:5,1:4)[selection,]
Can anyone explain what's going on here? I'm not sure why my subset isn't working even though it is similar to the one in the other question.
Very close. I think you made a typo in the makeRects() function. In my hands, it works with a few changes.
# Function for making selection rectangles around selection cells
makeRects <- function(cells){
coords = expand.grid(ny:1, 1:nx)[cells,]
xl=coords[,2]-0.49
yb=coords[,1]-0.49
xr=coords[,2]+0.49
yt=coords[,1]+0.49
rect(xl,yb,xr,yt,border="black",lwd=3)
}
# Re-make heatmap with rectangles based on the selection
# Use the already computed heatmap matrix and don't recluster
heatmap.2(hmat, main="Heatmap - Select 3rd Row", key=T, trace="none",
dendrogram="none", Rowv=F, Colv=F,
add.expr={makeRects(selection)})
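The reason the change works can be checked without drawing anything. R stores matrices column by column, so the logical selection matrix is read down the columns when it is used as an index. The grid of coordinates therefore has to vary its row (y) value fastest, and y has to run from ny down to 1 because the heatmap draws the first matrix row at the top. A small sketch comparing the two grids (the names wrong and right are mine):

```r
nx <- 5
ny <- 4

# Select the third row of a 4 x 5 matrix.
selection <- matrix(FALSE, nrow = ny, ncol = nx)
selection[3, ] <- TRUE

# Column-major positions of the selected cells.
print(which(selection))

# Original grid: x varies fastest, which does not match column-major order.
wrong <- expand.grid(1:nx, 1:ny)[as.vector(selection), ]

# Corrected grid: y varies fastest and is flipped; Var1 is y, Var2 is x.
right <- expand.grid(ny:1, 1:nx)[as.vector(selection), ]

print(wrong)
print(right)
```

In right, every selected cell has the same y value and x runs from 1 to 5, which is one full row; in wrong, the cells are scattered across rows.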

How can I extract the matrix derived from a heatmap created with gplots after hierarchical clustering?

I am making a heatmap, but I can't assign the result to a variable to check it before plotting. RStudio plots it automatically. I would like to get the list of row names in the order of the heatmap. I'm not sure if this is possible. I'm using this code:
hm <- heatmap.2( assay(vsd)[ topVarGenes, ], scale="row",
trace="none", dendrogram="both",
col = colorRampPalette( rev(brewer.pal(9, "RdBu")) )(255),
ColSideColors = c(Controle="gray", Col1.7G2="darkgreen", JG="blue", Mix="orange")[
colData(vsd)$condition ] )
You can assign the plot to an object. The plot will still be drawn in the plot window, however, you'll also get a list with all the data for each plot element. Then you just need to extract the desired plot elements from the list. For example:
library(gplots)
p = heatmap.2(as.matrix(mtcars), dendrogram="both", scale="row")
p is a list with all the elements of the plot.
p # Outputs all the data in the list; lots of output to the console
str(p) # Structure of p; also lots of output to the console
names(p) # Names of all the list elements
p$rowInd # Ordering of the data rows
p$carpet # The heatmap values
You'll see all the other values associated with the dendrogram and the heatmap if you explore the list elements.
For others out there, a more complete way to capture a matrix representation of the heatmap created by gplots:
matrix_map <- p$carpet
matrix_map <- t(matrix_map)
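To get the row names in the order of the heatmap, which is what the question asks for, the rowInd and colInd elements can be used to index the original names. A minimal sketch on mtcars follows; it assumes, as the dat[rev(hm$rowInd), hm$colInd] line in the previous question does, that rowInd runs from the bottom of the plot to the top, so it is reversed to read top to bottom.

```r
library(gplots)

x <- as.matrix(mtcars)
p <- heatmap.2(x, dendrogram = "both", scale = "row", trace = "none")

# Row names as they appear in the heatmap, read from top to bottom.
rows_top_to_bottom <- rownames(x)[rev(p$rowInd)]

# Column names as they appear, read from left to right.
cols_left_to_right <- colnames(x)[p$colInd]

print(head(rows_top_to_bottom))
print(cols_left_to_right)
```

It is worth comparing the first few names against the plot once to confirm the direction for your own data.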

Resources