The plotting function in R igraph

I use igraph in R to visualize chemical compounds. In this approach, nodes correspond to atoms and edges correspond to bonds between atoms. A molecule represented this way is known as a molecular graph or an H-depleted graph. For instance, to represent 2-methylbutane (2mb for short), a simple organic molecule whose carbon skeleton consists of five atoms, I use the following R code:
molecular.graph.2mb = graph.formula(1-2,2-3,3-4,2-5)
To visualize this molecular graph I use the following matrix m as the layout:
m = matrix(c(1,0,2,0,3,0,4,0,2,1), nrow=5, byrow=TRUE)
This matrix contains the coordinates of the vertices of my graph. Then I plot the graph:
plot(molecular.graph.2mb, layout=m)
I obtain a plot in which the geometric distance between vertices 2 and 5 is larger than the distances between vertices 1 and 2, 2 and 3, and 3 and 4.
My question is: how can I force igraph in R to plot the graph so that all of these geometric distances are equal?

The main trick is to set the parameter rescale=FALSE and adjust xlim. Doing that alone made the nodes very small, so I needed to adjust the node size as well. I think this is what you are looking for.
library(igraph)
## Your graph
molecular.graph.2mb = graph.formula(1-2,2-3,3-4,2-5)
m = matrix(c(1,0,2,0,3,0,4,0,2,1), nrow=5, byrow=TRUE)
## Modified plot
plot(molecular.graph.2mb, layout=m, rescale=FALSE,
     vertex.size=30, xlim=c(0.5,4.5))
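If you also want explicit control over the vertical extent, a hedged variant of the same call sets ylim as well; igraph's plotting default asp = 1 keeps the x and y units equal on screen, so the unit-length bonds stay equally long:
## same graph and layout, with an explicit ylim as well (asp=1 is the default)
plot(molecular.graph.2mb, layout=m, rescale=FALSE,
     vertex.size=30, xlim=c(0.5,4.5), ylim=c(-0.5,1.5), asp=1)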

Related

Plotting graph visualization of weighted Adjacency Matrix in Julia

I have a weighted directed adjacency matrix and I am trying to plot a graph visualization using PlotRecipes.jl. I can't seem to find a way to show both the nodes with labels and somehow represent the edge weights on the edges. I have tried...
graphplot(G, weights=weights,names=names)
where G is the adjacency matrix, weights are the edge weights and names is a list of names for the nodes, but I get this output
using just
graphplot(G,names=names)
I get the same output.
Is there any way to do this with PlotRecipes.jl or is there possibly another library?
You can represent the weights with color (using the line_z keyword), but not with line thickness at the moment; cf. https://github.com/JuliaPlots/PlotRecipes.jl/issues/39

revealing clusters of interaction in igraph

I have an interaction network, and I used the following code to build an adjacency matrix, calculate the dissimilarity between the nodes of the network, and then cluster them to form modules:
ADJ1=abs(adjacent-mat)^6
dissADJ1<-1-ADJ1
hierADJ<-hclust(as.dist(dissADJ1), method = "average")
Now I would like those modules to be visible when I plot the graph with igraph.
g<-simplify(graph_from_adjacency_matrix(adjacent-mat, weighted=T))
plot.igraph(g)
However, the only thing I have found so far to translate hclust output into a graph is the following tutorial: http://gastonsanchez.com/resources/2014/07/05/Pretty-tree-graph/
phylo_tree = as.phylo(hierADJ)
graph_edges = phylo_tree$edge
graph_net = graph.edgelist(graph_edges)
plot(graph_net)
which is useful for showing hierarchical lineage, but instead I just want the nodes that interact closely to cluster together, as follows:
Can anyone recommend how to use a command such as components from igraph to get these clusters to show?
igraph provides a bunch of different layout algorithms which are used to place nodes in the plot.
A good one to start with for a weighted network like this is the force-directed layout (implemented by layout.fruchterman.reingold in igraph).
Below is an example of the force-directed layout using some simple simulated data.
First, we create some mock data and clusters, along with some "noise" to make it more realistic:
library('dplyr')
library('igraph')
library('RColorBrewer')
set.seed(1)
# generate a couple clusters
nodes_per_cluster <- 30
n <- 10
nvals <- nodes_per_cluster * n
# cluster 1 (increasing)
cluster1 <- matrix(rep((1:n)/4, nodes_per_cluster) +
                     rnorm(nvals, sd=1),
                   nrow=nodes_per_cluster, byrow=TRUE)
# cluster 2 (decreasing)
cluster2 <- matrix(rep((n:1)/4, nodes_per_cluster) +
                     rnorm(nvals, sd=1),
                   nrow=nodes_per_cluster, byrow=TRUE)
# noise cluster
noise <- matrix(sample(1:2, nvals, replace=TRUE) +
                  rnorm(nvals, sd=1.5),
                nrow=nodes_per_cluster, byrow=TRUE)
dat <- rbind(cluster1, cluster2, noise)
colnames(dat) <- paste0('n', 1:n)
rownames(dat) <- c(paste0('cluster1_', 1:nodes_per_cluster),
                   paste0('cluster2_', 1:nodes_per_cluster),
                   paste0('noise_', 1:nodes_per_cluster))
Next, we can use Pearson correlation to construct our adjacency matrix:
# create correlation matrix
cor_mat <- cor(t(dat))
# shift to [0,1] to separate positive and negative correlations
adj_mat <- (cor_mat + 1) / 2
# get rid of low correlations and self-loops
adj_mat <- adj_mat^3
adj_mat[adj_mat < 0.5] <- 0
diag(adj_mat) <- 0
Cluster the data using hclust and cutree:
# convert to dissimilarity matrix and cluster using hclust
dissim_mat <- 1 - adj_mat
dend <- dissim_mat %>%
  as.dist %>%
  hclust
clusters = cutree(dend, h=0.65)
# color the nodes
pal = colorRampPalette(brewer.pal(11,"Spectral"))(length(unique(clusters)))
node_colors <- pal[clusters]
Finally, create an igraph graph from the adjacency matrix and plot it using the fruchterman.reingold layout:
# create graph
g <- graph.adjacency(adj_mat, mode='undirected', weighted=TRUE)
# set node color and plot using a force-directed layout (fruchterman-reingold)
V(g)$color <- node_colors
coords_fr = layout.fruchterman.reingold(g, weights=E(g)$weight)
# igraph plot options
igraph.options(vertex.size=8, edge.width=0.75)
# plot network
plot(g, layout=coords_fr, vertex.color=V(g)$color)
In the above code, I generated two "clusters" of correlated rows, and a third group of "noise".
Hierarchical clustering (hclust + cutree) is used to assign the data points to clusters, and they are colored based on cluster membership.
The result looks like this:
For some more examples of clustering and plotting graphs with igraph, check out: http://michael.hahsler.net/SMU/LearnROnYourOwn/code/igraph.html
You haven't shared any toy data for us to play with and suggest code improvements on, but your question states that you are only interested in plotting your clusters distinctly - that is, in the graphical presentation.
Although igraph comes with some nice force-directed layout algorithms, such as layout.fruchterman.reingold, layout_with_kk, etc., the resulting plots can, in the presence of a large number of nodes, quickly become difficult to interpret and make sense of.
Like this:
With these traditional methods of visualising networks:
- the layout algorithm, rather than the data, determines the visualisation
- similar networks may end up being visualised very differently
- a large number of nodes makes the visualisation difficult to interpret
Instead, I find Hive Plots to be better at displaying important network properties, which, in your case, are the clusters and the edges.
In your case, you can:
- plot each cluster on a different straight line
- order the placement of nodes intelligently, so that nodes with certain properties are placed at the very end or start of each straight line
- colour the edges to identify the direction of each edge
To achieve this you will need to:
- use the ggnetwork package to turn your igraph object into a data frame
- map your clusters to the nodes present in this data frame
- generate coordinates for the straight lines and map these to each cluster
- use ggplot to visualise (see the sketch below)
There is also a HiveR package in R, should you wish to use a packaged solution. You might also find another visualisation technique for graphs very useful: BioFabric.
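As a minimal sketch of that recipe, here is one way to lay the clusters out on straight lines using plain igraph plus ggplot2 instead of ggnetwork; the g and clusters objects are assumed to come from the force-directed example in the previous answer, and the axis angles and radial positions are purely illustrative choices:
library(igraph)
library(ggplot2)
# one straight "axis" per cluster, spread evenly around the origin
axes <- data.frame(cluster = sort(unique(clusters)))
axes$angle <- 2 * pi * (seq_len(nrow(axes)) - 1) / nrow(axes)
# node table: cluster membership plus a position along the cluster's axis
nodes <- data.frame(name = V(g)$name, cluster = clusters[V(g)$name])
nodes <- merge(nodes, axes, by = "cluster")
nodes$r <- ave(seq_len(nrow(nodes)), nodes$cluster, FUN = seq_along)
nodes$x <- nodes$r * cos(nodes$angle)
nodes$y <- nodes$r * sin(nodes$angle)
# edge table with start/end coordinates looked up from the node table
edges <- igraph::as_data_frame(g, what = "edges")
edges <- merge(edges, nodes[, c("name", "x", "y")], by.x = "from", by.y = "name")
edges <- merge(edges, nodes[, c("name", "x", "y")],
               by.x = "to", by.y = "name", suffixes = c("", "end"))
# draw edges as segments between the axes, with nodes coloured by cluster
ggplot() +
  geom_segment(data = edges, aes(x = x, y = y, xend = xend, yend = yend),
               colour = "grey70", alpha = 0.5) +
  geom_point(data = nodes, aes(x = x, y = y, colour = factor(cluster)), size = 2) +
  theme_void()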

How to create a legend for edge colors when plotting networks in R?

I have generated a connectivity matrix representing a network of geographical points connected by ocean currents. Each point releases particles that are received by the others. The number of particles released and received by each point is summarized in this square matrix. For example, the element Aij of the matrix corresponds to the number of particles emitted by the ith point and received by the jth.
My purpose is to plot this as a network such that each point constitutes a vertex and the connection between two points constitutes an edge. I would like those edges to have different colors according to the number of particles exchanged, and the edges have to be marked with arrows.
I can plot the points according to their geographic coordinates and I can plot the edges the way I want. My only remaining concern is how to add a legend relating the color of the edges to the number of particles they represent.
Can anyone help me with that? Here is my code so far:
library(ggplot2)
library(plyr)
library(sp)
library(statnet)
connectivityMatrix <- as.matrix(read.table(file='settlementMatrix001920.dat'))
coordinates <- as.matrix(read.table(file='NoTakeReefs_center_LonLat.dat'))
net <- as.network(connectivityMatrix, matrix.type = "adjacency", directed = TRUE)
minX<-min(coordinates[,1])#-0.5
maxX<-max(coordinates[,1])#+0.5
minY<-min(coordinates[,2])#-0.5
maxY<-max(coordinates[,2])#+0.5
p<-plot(net, coord=coordinates,xlim=c(minX,maxX),ylim=c(minY,maxY),edge.col=connectivityMatrix,object.scale=0.01)
Without your real data, here is a sample example:
matrixValues<-matrix(c(0,1,2,3,
                       0,0,0,0,
                       0,0,0,0,
                       0,0,0,0),ncol=4)
net<-as.network(matrixValues)
plot(net,edge.col=matrixValues)
# plot legend using non-zero values from matrix
legend(1,1,fill = unique(as.vector(matrixValues[matrixValues>0])),
       legend=unique(as.vector(matrixValues[matrixValues>0])))
You may have to adjust the first two coordinate values in legend to draw it where you need it on the plot. You could also construct your network slightly differently so that the values are loaded in from the matrix (see the ignore.eval argument to as.network()), in which case you would use edge.col='myValueName' in the plot command and get.edge.attribute(net,'myValueName') to feed the values into legend, as sketched below.
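A hedged sketch of that ignore.eval route, reusing the toy matrix from above (the attribute name "weight" is just an illustrative choice):
library(network)
# load the matrix values onto the edges as an attribute instead of discarding them
net2 <- as.network(matrixValues, matrix.type = "adjacency",
                   ignore.eval = FALSE, names.eval = "weight")
# colour the edges by the stored attribute and build the legend from it
plot(net2, edge.col = "weight")
vals <- sort(unique(get.edge.attribute(net2, "weight")))
legend("topleft", fill = vals, legend = vals)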

visualization for high-dimensional points in R

I have a centroid, e.g., A, and 100 other points. All of these points are high-dimensional, e.g., 1000 dimensions. Is there a way to visualize these points in a two-dimensional space in terms of their distance from A?
A common (though simple) way to visualize high-dimensional points in low dimensional space is to use some form of multi-dimensional scaling:
dat <- matrix(runif(1000*99),99,1000)
#Combine with "special" point
dat <- rbind(rep(0.1,1000),dat)
out <- cmdscale(dist(dat),k = 2)
#Plot everything, highlighting our "special" point
plot(out)
points(out[1,1],out[1,2],col = "red")
You can also check out isoMDS or sammon in the MASS package for other implementations in R.
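For instance, a hedged variant of the example above using sammon (reusing dat and highlighting the same first point):
library(MASS)
# Sammon mapping: an alternative MDS criterion to classical cmdscale
out_sam <- sammon(dist(dat), k = 2)
plot(out_sam$points)
points(out_sam$points[1, 1], out_sam$points[1, 2], col = "red")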
The distance (by which I assume you mean the norm of the difference vector) is only 1 value, so you can calculate these norms and show them on a 1D plot, but for 2D you'll need a second parameter.
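A tiny sketch of that one-dimensional idea, with a made-up centroid A and 100 made-up points:
# Euclidean norms of the differences from an assumed centroid A
A <- rep(0.1, 1000)
pts <- matrix(runif(100 * 1000), nrow = 100)
d <- sqrt(rowSums(sweep(pts, 2, A)^2))
# show all 100 distances along a single axis
stripchart(d, method = "jitter", pch = 20, xlab = "distance from A")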

What techniques exist in R to visualize a "distance matrix"?

I wish to present a distance matrix in an article I am writing, and I am looking for a good visualization for it.
So far I have come across balloon plots (I used them here, but I don't think they will work in this case), heatmaps (here is a nice example, but they don't allow presenting the numbers in the table, correct me if I am wrong. Maybe half the table in colors and half with numbers would be cool), and lastly correlation ellipse plots (here is some code and an example - using a shape is cool, but I am not sure how to apply it here).
There are also various clustering methods, but they will aggregate the data (which is not what I want), whereas I want to present all of the data.
Example data:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dist(nba[1:20, -1])
I am open to ideas.
You could also use force-directed graph drawing algorithms to visualize a distance matrix, e.g.
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dist_m <- as.matrix(dist(nba[1:20, -1]))
dist_mi <- 1/dist_m # one over, as qgraph takes similarity matrices as input
library(qgraph)
jpeg('example_forcedraw.jpg', width=1000, height=1000, unit='px')
qgraph(dist_mi, layout='spring', vsize=3)
dev.off()
Tal, this is a quick way to overlay text on a heatmap. Note that this relies on image rather than heatmap, as the latter offsets the plot, making it more difficult to put the text in the correct position.
To be honest, I think this graph shows too much information, making it a bit difficult to read... you may want to show only specific values.
Also, another quick option is to save your graph as a PDF, import it into Inkscape (or similar software), and manually add the text where needed.
Hope this helps
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
dst <- dist(nba[1:20, -1])
dst <- data.matrix(dst)
dim <- ncol(dst)
image(1:dim, 1:dim, dst, axes = FALSE, xlab="", ylab="")
axis(1, 1:dim, nba[1:20,1], cex.axis = 0.5, las=3)
axis(2, 1:dim, nba[1:20,1], cex.axis = 0.5, las=1)
text(expand.grid(1:dim, 1:dim), sprintf("%0.1f", dst), cex=0.6)
A Voronoi Diagram (a plot of a Voronoi Decomposition) is one way to visually represent a Distance Matrix (DM).
They are also simple to create and plot using R--you can do both in a single line of R code.
If you're not familiar with this aspect of computational geometry, the relationship between the two (VD & DM) is straightforward, though a brief summary might be helpful.
A distance matrix--i.e., a 2D matrix showing the distance between each point and every other point--is an intermediate output during kNN computation (i.e., k-nearest neighbors, a machine learning algorithm which predicts the value of a given data point based on the weighted average value of its 'k' closest neighbors, distance-wise, where 'k' is some integer, usually between 3 and 5).
kNN is conceptually very simple--each data point in your training set is in essence a 'position' in some n-dimensional space, so the next step is to calculate the distance between each point and every other point using some distance metric (e.g., Euclidean, Manhattan, etc.). While the training step--i.e., constructing the distance matrix--is straightforward, using it to predict the value of new data points is practically encumbered by the data retrieval--finding the closest 3 or 4 points from among several thousand or several million scattered in n-dimensional space.
Two data structures are commonly used to address that problem: kd-trees and Voronoi decompositions (aka "Dirichlet tessellation").
A Voronoi decomposition (VD) is uniquely determined by a distance matrix--i.e., there's a 1:1 map; so indeed it is a visual representation of the distance matrix, although again, that's not its purpose--its primary purpose is the efficient storage of the data used for kNN-based prediction.
Beyond that, whether it's a good idea to represent a distance matrix this way probably depends most of all on your audience. To most, the relationship between a VD and the antecedent distance matrix will not be intuitive. But that doesn't make it incorrect--if someone without any statistics training wanted to know if two populations had similar probability distributions and you showed them a Q-Q plot, they would probably think you haven't engaged their question. So for those who know what they are looking at, a VD is a compact, complete, and accurate representation of a DM.
So how do you make one?
A Voronoi decomp is constructed by selecting (usually at random) a subset of points from within the training set (this number varies by circumstances, but if we had 1,000,000 points, then 100 is a reasonable number for this subset). These 100 data points are the Voronoi centers ("VC").
The basic idea behind a Voronoi decomp is that rather than having to sift through the 1,000,000 data points to find the nearest neighbors, you only have to look at these 100; then once you find the closest VC, your search for the actual nearest neighbors is restricted to just the points within that Voronoi cell. Next, for each data point in the training set, calculate the VC it is closest to. Finally, for each VC and its associated points, calculate the convex hull--conceptually, just the outer boundary formed by that VC's assigned points that are farthest from the VC. This convex hull around the Voronoi center forms a "Voronoi cell." A complete VD is the result of applying those three steps to each VC in your training set. This gives you a perfect tessellation of the surface (see the diagram below).
To calculate a VD in R, use the tripack package. The key function is 'voronoi.mosaic', to which you pass the x and y coordinates separately--the raw data, not the DM--and then you can pass its result to 'plot'.
library(tripack)
plot(voronoi.mosaic(runif(100), runif(100), duplicate="remove"))
You may want to consider looking at a 2-d projection of your matrix (Multi Dimensional Scaling). Here is a link to how to do it in R.
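For example, a quick hedged sketch of that 2-d projection with cmdscale on the question's example data (assuming the first column holds the player names):
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
mds2 <- cmdscale(dist(nba[1:20, -1]), k = 2)
# plot the 20 rows in 2D, labelled by name instead of dots
plot(mds2, type = "n", xlab = "", ylab = "")
text(mds2, labels = nba[1:20, 1], cex = 0.7)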
Otherwise, I think you are on the right track with heatmaps. You can add in your numbers without too much difficulty. For example, building off of Learn R:
library(ggplot2)
library(plyr)
library(arm)
library(reshape2)
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
nba$Name <- with(nba, reorder(Name, PTS))
nba.m <- melt(nba)
nba.m <- ddply(nba.m, .(variable), transform,
               rescale = rescale(value))
(p <- ggplot(nba.m, aes(variable, Name)) +
   geom_tile(aes(fill = rescale), colour = "white") +
   scale_fill_gradient(low = "white", high = "steelblue") +
   geom_text(aes(label = round(rescale, 1))))
A dendrogram based on a hierarchical cluster analysis can be useful:
http://www.statmethods.net/advstats/cluster.html
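For instance, a small hedged sketch on the question's example data:
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
# average-linkage clustering of the same 20 rows used in the question
hc <- hclust(dist(nba[1:20, -1]), method = "average")
plot(hc, labels = nba[1:20, 1], cex = 0.7)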
A 2-D or 3-D multidimensional scaling analysis in R:
http://www.statmethods.net/advstats/mds.html
If you want to go into 3+ dimensions, you might want to explore ggobi / rggobi:
http://www.ggobi.org/rggobi/
In the book "Numerical Ecology" by Borcard et al. (2011) they used a function called coldiss.r.
You can find it here: http://ichthyology.usm.edu/courses/multivariate/coldiss.R
It color-codes the distances and even orders the records by dissimilarity.
Another good package would be the seriation package (see the sketch after the reference below).
Reference:
Borcard, D., Gillet, F. & Legendre, P. (2011) Numerical Ecology with R. Springer.
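And a hedged sketch of the seriation route mentioned above, assuming its dissplot function with default settings (it reorders the dissimilarity matrix and displays it as an image):
library(seriation)
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")
d <- dist(nba[1:20, -1])
# reorder the records by dissimilarity and plot the matrix
dissplot(d)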
A solution using Multidimensional Scaling
data = read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep = ",")
dst = tcrossprod(as.matrix(data[,-1]))
dst = matrix(rep(diag(dst), 50L), ncol = 50L, byrow = TRUE) +
      matrix(rep(diag(dst), 50L), ncol = 50L, byrow = FALSE) - 2*dst
library(MASS)
mds = isoMDS(dst)
#remove {type = "n"} to see dots
plot(mds$points, type = "n", pch = 20, cex = 3, col = adjustcolor("black", alpha = 0.3), xlab = "X", ylab = "Y")
text(mds$points, labels = rownames(data), cex = 0.75)
