From a distance matrix to an adjacency matrix in R

I have a 1024x1024 distance matrix containing the pairwise distances between all terms. I want to build a graph from it, so I computed a minimum spanning tree and then its adjacency matrix.
My distance matrix is distMat:
library(igraph)
matrix_of_distances <- as.matrix(distMat)
myGraph <- graph.adjacency(matrix_of_distances, weighted=TRUE)
This graph is complete: it contains every possible edge, because the distance between every pair of terms is finite. I need a sparser graph:
mst <- as.undirected(minimum.spanning.tree(myGraph))
From that sparse graph I can compute the adjacency matrix with:
adjacency <- as_adjacency_matrix(mst, type = c("both", "upper", "lower"), attr = NULL, edges = FALSE, names = TRUE, sparse =igraph_opt("sparsematrices"))
Now I want to create the adjacency matrix differently, from another minimum spanning tree object. Suppose I have created another spanning tree with vegan's spantree:
library(vegan)
spt <- spantree(matrix_of_distances)
If I do:
adjacency <- as_adjacency_matrix(spt, type = c("both", "upper", "lower"), attr = NULL, edges = FALSE, names = TRUE, sparse =igraph_opt("sparsematrices"))
I get the error:
Error in as_adjacency_matrix(spt, type = c("both", "upper", "lower"), : Not a graph object
To restate: I'm trying to generate an adjacency matrix from a minimum spanning tree. How can I fix this?

The error occurs because you are calling as_adjacency_matrix on an object of class spantree, whereas it expects an igraph object.
Since you are using igraph, one simple solution would be to compute the minimum spanning tree from your original "distance graph" with igraph's function mst.
Here is how to compute the minimum spanning tree with spantree:
require(vegan)
data(dune)
dis <- vegdist(dune)
tr <- spantree(dis)
The result is the following tree, drawn with plot(tr, type="t"):
[Figure: spanning tree of the dune sites computed by spantree]
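If you would rather keep the tree that spantree computed, its components can be turned into an igraph object by hand. This sketch assumes the layout described in vegan's spantree documentation: for each node i from 2 to n, kid[i - 1] is the node it is linked to (NA if there is no link) and dist[i - 1] is the length of that link. The names g_spt and A_spt are illustrative only.

```r
library(vegan)
library(igraph)

data(dune)
dis <- vegdist(dune)   # Bray-Curtis dissimilarities, as above
tr <- spantree(dis)

# spantree() stores the links from node 2 onwards:
# tr$kid[i - 1] is the node that node i is linked to (NA if none),
# tr$dist[i - 1] is the length of that link
n <- length(tr$kid) + 1
el <- cbind(2:n, tr$kid)
linked <- !is.na(el[, 2])

# Build an undirected igraph tree from those links, keeping the distances
g_spt <- make_graph(as.vector(t(el[linked, , drop = FALSE])), n = n, directed = FALSE)
E(g_spt)$weight <- tr$dist[linked]
if (!is.null(tr$labels)) V(g_spt)$name <- tr$labels

# as_adjacency_matrix() now accepts the tree
A_spt <- as_adjacency_matrix(g_spt)
```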
You can get the same result using only igraph functions:
library(igraph)
g <- graph.adjacency(as.matrix(dis), mode = "undirected", weighted = TRUE)
g_mst <- mst(g)
And the resulting tree looks like this (plot(g_mst, vertex.color=NA, vertex.size=10, edge.arrow.size=0.5)):
[Figure: the same tree computed with igraph's mst]
Once you have your igraph tree, you can transform it into an adjacency matrix with as_adjacency_matrix, as you already know:
A <- as_adjacency_matrix(g_mst)
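The sketch below puts the igraph-only route together on the same dune data and checks it against spantree. It assumes a recent igraph that provides graph_from_adjacency_matrix() (the newer name for graph.adjacency()). Since different MST algorithms can break ties differently, it compares total tree length rather than the individual edges; the total length of a minimum spanning tree is unique.

```r
library(vegan)
library(igraph)

data(dune)
dis <- vegdist(dune)
tr <- spantree(dis)

# Undirected weighted graph from the distance matrix; mst() uses the
# "weight" edge attribute automatically
g <- graph_from_adjacency_matrix(as.matrix(dis), mode = "undirected", weighted = TRUE)
g_mst <- mst(g)

# Dense 0/1 adjacency matrix of the tree (keep sparse = TRUE for a 1024 x 1024 case)
A <- as_adjacency_matrix(g_mst, sparse = FALSE)

# Both trees should have the same total length
total_igraph <- sum(E(g_mst)$weight)
total_vegan <- sum(tr$dist)
all.equal(total_igraph, total_vegan)
```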

Related

Plotting a dendrogram with actual data as 'height' of the tree, and not distances, in R

I want to generate a dendrogram from an association matrix (i.e., a square matrix that contains association data for each pair of individuals in my population). The dendrogram should show the association index (scale 0 to 1) on the y axis and individuals as leaf labels.
I did as follows:
# Loaded my data, i.e. the association matrix:
HWI <- as.matrix(read.csv2("HWI.csv"))
#Computed distances:
data.dist=dist(HWI, method = "euclidean")
#Did the clustering using the 'average-linkage method':
data.hclust=hclust(data.dist,method="average")
#Plotted the dendrogram:
hcd <- as.dendrogram(data.hclust)
plot(hcd, type = "rectangle", xlab = "Associative distance between individuals",horiz = TRUE)
The problem is, the height of the dendrogram is the distance measure, on which the clustering is based, and not the association index.
Does anyone know how I can plot the dendrogram with the association index (original data file) as y-axis instead?
Thanks a million for your suggestions.
Eve
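One possible approach, assuming HWI is a symmetric matrix of association indices between 0 and 1: cluster on 1 - association directly instead of on Euclidean distances between rows of the matrix, then relabel the height axis in association units. The 4 x 4 matrix below is made-up example data, not the contents of HWI.csv.

```r
# Hypothetical 4 x 4 association matrix standing in for HWI.csv
ids <- c("A", "B", "C", "D")
HWI <- matrix(c(1.0, 0.8, 0.2, 0.1,
                0.8, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.6,
                0.1, 0.2, 0.6, 1.0),
              nrow = 4, dimnames = list(ids, ids))

# Use dissociation (1 - association) directly as the distance
d <- as.dist(1 - HWI)
hc <- hclust(d, method = "average")

# With average linkage, each merge height is 1 minus the mean association
# between the two merged groups, so 1 - height is on the association scale
assoc_at_merge <- 1 - hc$height

# Suppress the default height axis and draw one labelled in association units
hcd <- as.dendrogram(hc)
plot(hcd, type = "rectangle", ylim = c(0, 1), yaxt = "n",
     ylab = "Association index")
ticks <- seq(0, 1, by = 0.2)
axis(2, at = 1 - ticks, labels = ticks)
```

With this labelling the axis reads 1 at the leaves and decreases towards the root, and each merge sits at the average association between the two groups it joins.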

Color nodes of a graph created with igraph proportionally to eigenvector centrality of that node

I am using the igraph library in R. I have created an MST graph with the function mst, based on distances stored in a data frame called tree:
gf <- graph_from_data_frame(tree, directed = FALSE)
mstgf <- mst(gf, weights = tree$distance)
I have calculated the eigenvector centrality of each node in the MST as:
ec <- eigen_centrality(mstgf, directed=T, weights=NA)$vector
I have then joined the vector of eigenvector centralities to the data.frame tree:
x <- cbind(names(ec), as.numeric(ec)) %>% as_tibble() %>% mutate(V2 = as.numeric(V2)) %>%
rename(from = V1)
tree <- tree %>% inner_join(.,x, by = "from")
I want to plot the MST with the nodes colored according to their eigenvector centrality. I am plotting with the code below, but I don't know how to set the vertex.color argument to achieve this:
plot.igraph(mstgf,
vertex.color = round(tree$V2,0),
edge.color = "blue",
edge.curved = TRUE,
edge.width = 1
)
Once you've calculated the centralities of your choice, you want to 1) scale the values to a meaningful categorical range (like 1, 2, 3, 4, 5) and 2) associate your centrality categories with colors from a gradient. You don't need to keep joining and calculating outside igraph: the values can be stored directly as vertex attributes.
Start with a random network:
# Random network
g <- erdos.renyi.game(100,250,'gnm', directed=F)
1) make categories
This maps every eigenvector centrality to an integer between 1 and 10:
# Calculate eigen centrality and check the distribution. We're attaching the
# result of eigen_centrality() straight onto the vertices as vertex attributes
V(g)$ec <- eigen_centrality(g, directed=T, weights=NA)$vector
hist(V(g)$ec)
# You could use the scales package, or define this normalisation function:
normalize <- function(x){(x-min(x))/(max(x)-min(x))}
(V(g)$ec_index <- round(normalize(V(g)$ec) * 9) + 1)
#ec_index should now be a category between 1 and 10 for your centralities
You can use any number of categories you like.
2) Attach colours based on the index
There are several packages and ways to load colour-ranges in R (colorspace, colorRamps, RColorBrewer etc).
# Build a color-mapping with 10 categories and set the color of each
# node to the corresponding color of each centrality-measure category
V(g)$color <- colorRampPalette(c("turquoise", "yellow","red"))(10)[V(g)$ec_index]
# Look at what we did
table(V(g)$color)
plot(g, vertex.label=NA, vertex.size=5)
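This example should produce something along the lines of the following graph:
[Figure: random network with nodes coloured from turquoise (low eigenvector centrality) to red (high)]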
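Applied to the MST from the question, the same two steps can work on the tree's own vertices, so no join with the tree data frame is needed. This is a sketch: the complete graph with random weights is a hypothetical stand-in for the question's data, and the 10-colour gradient reuses the palette from the answer.

```r
library(igraph)

set.seed(42)
# Hypothetical stand-in for the question's data: a complete graph on 30
# vertices with random distances stored in the "weight" edge attribute
g <- make_full_graph(30)
E(g)$weight <- runif(ecount(g))
mstg <- mst(g)   # uses E(g)$weight automatically

# Store centrality on the tree's vertices instead of joining to a data frame
V(mstg)$ec <- eigen_centrality(mstg, weights = NA)$vector

# Map centralities to categories 1..10, then to a 10-colour gradient
rng <- range(V(mstg)$ec)
V(mstg)$ec_index <- round((V(mstg)$ec - rng[1]) / (rng[2] - rng[1]) * 9) + 1
palette10 <- colorRampPalette(c("turquoise", "yellow", "red"))(10)
V(mstg)$color <- palette10[V(mstg)$ec_index]

# plot.igraph uses the "color" vertex attribute when vertex.color is not given
plot(mstg, vertex.label = NA, vertex.size = 5, edge.color = "blue")
```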

Applying Graph Clustering Algorithms on the (famous) Iris data set

My question deals with the application of graph clustering algorithms. Most times, I see that graphs are made by using nodes and edges within the data. For example, suppose we have social media data: each individual in the data could be represented as a node and the relationship between individuals could be represented as edges. Using this information, we could build a graph and then perform graph clustering algorithms (e.g. Louvain Clustering) on this graph.
Sometimes, graphs can also be made using distances between points. Distances between points can be thought of as edges. For example, in the Spectral Clustering algorithm, a KNN (k nearest neighbor) graph is built from the data and the K-Means clustering algorithm is then run on the leading eigenvectors of this graph's Laplacian.
My question is this: Suppose we take the famous Iris data and remove the response variable ("Species"). Would it make any sense to create a graph of this Iris data in which each node corresponds to an individual flower and edges correspond to pairwise Euclidean distances between the points? Assuming this is a logical and correct approach, could graph clustering algorithms then be performed on this Iris graph?
Below, I have attempted to first create a graph of the Iris data using pairwise Euclidean distances (in R). I then performed Louvain Clustering and Infomap Clustering on the resulting graph. After that, I attempted to create a KNN graph of the Iris data and perform MST (minimum spanning tree) clustering on this KNN graph, as well as perform Louvain Clustering.
Could someone please provide an opinion on what I have done? Is this intuitive and does it make mathematical sense? As a way of "cheating" - the Iris data only has 3 species. Thus, if a given clustering algorithm returns significantly more than 3 clusters, we know that the graph and/or the clustering algorithm may not be the best choice. However, in real applications, we are unable to know how many "true" classes exist within the data.
library(igraph)
library(network)
library(reshape2)
library(mstknnclust)
library(visNetwork)
library(cluster)
# louvain clustering done on a distance-based graph - maybe this is correct
x <- iris[,1:4]
dist <- daisy(x,
metric = "euclidean"
)
d_mat <- as.matrix(dist)
d_long <- melt(d_mat)
colnames(d_long) <- c("from", "to", "correlation")
d_mat_long <- d_long[which(d_long$correlation > .5),]
graph <- graph_from_data_frame(d_mat_long, directed = FALSE)
nodes <- as_data_frame(graph, what = "vertices")
colnames(nodes) <- "id"
nodes$label <- nodes$id
links <- as_data_frame(graph, what = "edges")
visNetwork(nodes, links) %>% visIgraphLayout(layout = "layout_with_fr")
cluster <- cluster_louvain(graph)
nodes$cluster <- cluster$membership
nodes$color <- ifelse(nodes$cluster == 1, "red", "blue")
visNetwork(nodes, links) %>% visIgraphLayout(layout = "layout_with_fr") %>% visOptions(selectedBy = "cluster") %>% visNodes(color = "color")
# infomap and louvain clustering done on a distance-based graph but with a different algorithm: I think this is wrong
imc <- cluster_infomap(graph)
membership(imc)
communities(imc)
plot(imc, graph)
lc <- cluster_louvain(graph, weights = NULL)
membership(lc)
communities(lc)
plot(lc, graph)
# mst spanning algorithm on the knn graph: based on the number of clusters, I think this is wrong
cg <- generate.complete.graph(1:nrow(x),d_mat)
##Generates kNN graph
knn <- generate.knn(cg)
plot(knn$knn.graph,
main=paste("kNN \n k=", knn$k, sep=""))
results <- mst.knn(d_mat)
igraph::V(results$network)$label.cex <- seq(0.6,0.6,length.out=2)
plot(results$network, vertex.size=8,
vertex.color=igraph::clusters(results$network)$membership,
layout=igraph::layout.fruchterman.reingold(results$network, niter=10000),
main=paste("MST-kNN \n Clustering solution \n Number of clusters=",results$cnumber,sep="" ))
# louvain clustering and infomap done on the knn graph - maybe this is correct
#louvain
lc <- cluster_louvain(knn$knn.graph, weights = NULL)
membership(lc)
communities(lc)
plot(lc, knn$knn.graph)
imc <- cluster_infomap(knn$knn.graph)
membership(imc)
communities(imc)
plot(imc, knn$knn.graph)
"louvain clustering done on a distance based graph - maybe this is correct"
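Not really. Distances make sense as edge weights for path-based measures such as betweenness centrality, where a larger weight means a longer path. Community detection methods such as Louvain instead treat edges (and edge weights) as connection strength, and keeping only pairs with distance > .5, as your code does, connects flowers that are far apart rather than close together. If your interest is similarity, convert distances to similarities first.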
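A minimal sketch of that conversion for iris, assuming a Gaussian kernel as the similarity and a symmetrised k-nearest-neighbour graph to keep it sparse. The bandwidth (median pairwise distance) and k = 10 are arbitrary choices, not part of the original answer. Louvain then treats the similarities as edge strengths, and the final table is the "cheating" check against the known species.

```r
library(igraph)

x <- as.matrix(iris[, 1:4])
d <- as.matrix(dist(x))   # pairwise Euclidean distances

# Distance -> similarity with a Gaussian kernel (bandwidth is an assumption)
sigma <- median(d[upper.tri(d)])
s <- exp(-d^2 / (2 * sigma^2))
diag(s) <- 0   # no self-loops

# Keep each flower's k most similar neighbours (k = 10 is arbitrary)
k <- 10
keep <- matrix(FALSE, nrow(s), ncol(s))
for (i in seq_len(nrow(s))) {
  keep[i, order(s[i, ], decreasing = TRUE)[1:k]] <- TRUE
}
keep <- keep | t(keep)   # symmetrise: edge if either endpoint chose the other

# Edge weights are now similarities, i.e. connection strengths
g <- graph_from_adjacency_matrix(s * keep, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

set.seed(1)
lc <- cluster_louvain(g)   # uses E(g)$weight
table(cluster = membership(lc), species = iris$Species)
```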

How to spread out community graph made by using igraph package in R

I'm trying to find communities in tweet data. The cosine similarities between words form the adjacency matrix, and I then create a graph from that adjacency matrix. The task here is visualizing that graph:
library(tm)      # DocumentTermMatrix, removeSparseTerms
library(lsa)     # cosine
library(igraph)
# Document Term Matrix
dtm = DocumentTermMatrix(tweets)
### adjust threshold here
dtms = removeSparseTerms(dtm, 0.998)
dim(dtms)
# cosine similarity matrix
t = as.matrix(dtms)
# comparing two word feature vectors
#cosine(t[,"yesterday"], t[,"yet"])
numWords = dim(t)[2]
# cosine measure between all column vectors of a matrix.
adjMat = cosine(t)
r = 3
for(i in 1:numWords)
{
highElement = sort(adjMat[i,], partial=numWords-r)[numWords-r]
adjMat[i,][adjMat[i,] < highElement] = 0
}
# build graph from the adjacency matrix
g = graph.adjacency(adjMat, weighted=TRUE, mode="undirected", diag=FALSE)
V(g)$name
# remove loop and multiple edges
g = simplify(g)
wt = walktrap.community(g, steps=5) # default steps=2
table(membership(wt))
# set vertex color & size
nodecolor = rainbow(length(table(membership(wt))))[as.vector(membership(wt))]
nodesize = as.matrix(round((log2(10*membership(wt)))))
nodelayout = layout.fruchterman.reingold(g,niter=1000,area=vcount(g)^1.1,repulserad=vcount(g)^10.0, weights=NULL)
par(mai=c(0,0,1,0))
plot(g,
layout=nodelayout,
vertex.size = nodesize,
vertex.label=NA,
vertex.color = nodecolor,
edge.arrow.size=0.2,
edge.color="grey",
edge.width=1)
I just want more space between separate clusters/communities.
As far as I know, you can't lay out vertices of the same community close to each other using igraph alone. I have implemented this function in my package NetPathMiner, but installing the whole package just for the visualization function is a bit much, so I will write a simple version of it here and explain what it does.
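layout.by.attr <- function(graph, wc, cluster.strength=1, layout=layout.auto) {
  # lightweight copy of graph without attributes, keeping vertex IDs and order
  n <- vcount(graph)
  g <- make_graph(as.vector(t(as_edgelist(graph, names = FALSE))), n = n, directed = FALSE)
  E(g)$weight <- 1
  wc <- as.vector(wc)
  comms <- sort(unique(wc))
  # one extra vertex per community, with IDs n + 1, n + 2, ...
  g <- add_vertices(g, length(comms))
  hub <- n + match(wc, comms)
  # connect every original vertex to the extra vertex of its community
  g <- add_edges(g, as.vector(rbind(seq_len(n), hub)), weight = cluster.strength)
  l <- layout(g, weights = E(g)$weight)[seq_len(n), ]
  return(l)
}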
Basically, the function adds an extra vertex that is connected to all vertices belonging to the same community. The layout is calculated based on the new graph. Since each community is now connected by a common vertex, they tend to cluster together.
As Gabor said in the comment, increasing edge weights has a similar effect. The function leverages this: increasing cluster.strength gives higher weights to the edges between the created vertices and their community members.
If this is still not enough, you can extend this principle (calculating the layout on a more connected graph) by adding edges between all vertices of the same community (forming a clique). In my experience, this is a bit of overkill.
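The edge-weighting idea can also be applied without adding extra vertices: give within-community edges a larger weight and pass the weights to the layout. A sketch, assuming igraph's crossing() to find between-community edges and Fruchterman-Reingold's documented behaviour that higher weights pull endpoints closer; the toy graph and the factor 10 are illustrative.

```r
library(igraph)

set.seed(7)
# Hypothetical toy graph: 3 dense groups of 15 vertices, 2 edges between groups
g <- sample_islands(3, 15, 0.5, 2)
wt <- cluster_walktrap(g, steps = 5)

# crossing() is TRUE for edges whose endpoints lie in different communities
between <- crossing(wt, g)

# Heavier within-community edges attract their endpoints more strongly
# in Fruchterman-Reingold (the factor 10 is an arbitrary choice)
E(g)$weight <- ifelse(between, 1, 10)
l <- layout_with_fr(g, weights = E(g)$weight)

plot(g, layout = l, vertex.color = as.integer(membership(wt)),
     vertex.label = NA, vertex.size = 6)
```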

Minimum spanning tree with Kruskal's algorithm

How can I calculate a minimum spanning tree with Kruskal's algorithm in R (3.0.0, Linux x32)?
I create a weighted complete graph with the igraph (0.6.5) library as follows:
set.seed(1234567890)
g <- graph.full(n = 20)
E(g)$weight <- round(runif(ecount(g)), 2) * 100
I am able to calculate the minimum spanning tree with Prim's algorithm (igraph):
mstPrim <- minimum.spanning.tree(g, algorithm = "prim")
Unfortunately, Kruskal's algorithm is not implemented in igraph.
I can represent my generated graph as a data.frame:
edgeMatrix <- data.frame(cbind(get.edgelist(g), E(g)$weight))
names(edgeMatrix) <- c("from", "to", "weight")
Is there a simple way to calculate an MST with Kruskal's algorithm in R?
A small workaround with the RBGL package:
library(RBGL)  # also attaches the graph package
# convert to a graphBAM object (graph package) and calculate the MST
mstKruskalBAM <- mstree.kruskal(graphBAM(edgeMatrix))
# build a new data frame with the result
mstKruskalDF <- data.frame(cbind(t(mstKruskalBAM$edgeList),
t(mstKruskalBAM$weight)))
names(mstKruskalDF) <- c("from", "to", "weight")
# convert back to an igraph object
mstKruskal <- graph.data.frame(mstKruskalDF, directed=FALSE)
Now it is possible to plot and compare the results of both algorithms with the same layout algorithm:
plot(mstPrim, layout = layout.kamada.kawai, edge.label = E(mstPrim)$weight)
plot(mstKruskal, layout = layout.kamada.kawai, edge.label = E(mstKruskal)$weight)
I think the mst function in the ape package implements this.
http://cran.r-project.org/web/packages/ape/ape.pdf
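For comparison, Kruskal's algorithm itself is short enough to write in plain R on top of igraph. This is a hand-written sketch with a simple union-find, not the RBGL or ape implementation, and kruskal_mst is a hypothetical helper name. Because the total weight of a minimum spanning tree is unique, its total should match Prim's even if ties lead to different edges.

```r
library(igraph)

# Kruskal: scan edges from lightest to heaviest and keep an edge whenever
# it joins two components that are not yet connected (union-find)
kruskal_mst <- function(g, weights = E(g)$weight) {
  el <- as_edgelist(g, names = FALSE)
  parent <- seq_len(vcount(g))   # each vertex starts as its own component

  find_root <- function(v) {
    while (parent[v] != v) v <- parent[v]
    v
  }

  keep <- logical(nrow(el))
  for (e in order(weights)) {
    ru <- find_root(el[e, 1])
    rv <- find_root(el[e, 2])
    if (ru != rv) {        # the edge connects two different components
      parent[ru] <- rv     # merge the components
      keep[e] <- TRUE
    }
  }
  delete_edges(g, which(!keep))   # all vertices are kept
}

# Same example graph as in the question
set.seed(1234567890)
g <- make_full_graph(20)
E(g)$weight <- round(runif(ecount(g)), 2) * 100

mstKruskal <- kruskal_mst(g)
mstPrim <- mst(g, algorithm = "prim")
c(kruskal = sum(E(mstKruskal)$weight), prim = sum(E(mstPrim)$weight))
```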
