arranging matrix - network graphs - r

I was trying to make a network graph, using the function gplot from library(sna). The graph would represent the links between different fields.
I have the following data:
MTM <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
FI <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
MCLI <- c(0,0,1,0,0,1,1,1,0,0,0,0,1,0,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1)
mat1 <- data.frame(MTM,FI,MCLI)
mat1 <- as.matrix(mat1)
Where "MTM", "FI" and "MCLI" are the "fields of interest" and every row is a different project that has some/any/none of the fields in common.
How could I transform these data to look like this?
matx:
     MTM FI MCLI
MTM   10  0    1
FI     0  1    1
MCLI  10  1   17
I am interested in representing -in a network graph- the fields as "nodes", and the connections as "edges". This could be helpful in representing the most "popular" and interconnected fields. Is it possible with these data?
Thanks in advance!
EDIT: I came across this solution, which could be OK for what I want:
library(igraph)
G<-graph.incidence(as.matrix(mat1),weighted=TRUE,directed=FALSE)
summary(G)
plot(G)

Here is one way to make a network graph from your data where each node is a "field of interest". Note that I have made a symmetrical adjacency matrix from your original data that doesn't entirely match your desired matrix output.
library(igraph)
# Use matrix multiplication to create symmetrical adjacency matrix.
adj_mat = t(mat1) %*% (mat1)
# Two ways to show edge weights.
png("igraphs.png", width=10, height=5, units="in", res=200)
par(mfrow=c(1, 2))
g1 = graph.adjacency(adj_mat, mode="undirected", diag=FALSE, weighted=TRUE)
plot(g1, edge.width=E(g1)$weight, vertex.size=50)
g2 = graph.adjacency(adj_mat, mode="undirected", diag=FALSE)
plot(g2, vertex.size=50)
dev.off()
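As a minimal sketch of what the matrix multiplication above computes, here is the same product on a small made-up incidence matrix (the matrix `toy` and the field names A, B, C are hypothetical, not the asker's data). The diagonal counts how many projects mention each field, and each off-diagonal entry counts the projects that two fields share, which is why the result is always symmetric.

```r
# Toy project-by-field incidence matrix (hypothetical data):
# rows are projects, columns are fields, 1 = the project covers the field.
toy <- matrix(c(1, 0, 1,
                1, 1, 0,
                0, 0, 1,
                1, 0, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(NULL, c("A", "B", "C")))

# crossprod(toy) is equivalent to t(toy) %*% toy.
co <- crossprod(toy)

# Diagonal: number of projects per field.
# Off-diagonal: number of projects shared by a pair of fields.
print(co)
```

Passing `diag=FALSE` to the graph constructor, as in the answer, drops the diagonal so that the per-field counts do not show up as self-loops.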

Related

How to draw graph of relationships from binary matrix

I'm trying to create a graph with relationships of people involved in the 9/11 attacks, but I don't understand the input very much. I use loops to group the hijackers (hijacker1 knows hijacker2; hijacker5 knows hijacker3 etc.) but it doesn't work for me.
The result of my work should be a relationship graph as on this page: LINK
I use data in csv format: Data to download
The data schema looks like the screenshots below. There are three files available, but if I understand correctly, the first file contains enough data for what I want (?)
Hijackers ASSOCIATES
Hijackers ATTR
Hijackers PRIORITY_CONTACT
       HName1 HName2 HName3
HName1      0      1      0
HName2      1      0      1
HName3      0      1      0
...
I would like to draw a relationship diagram and extract information about which of the hijackers had the most relationships (Should I use betweenness() from igraph library?).
Here's an approach with igraph:
First, let's grab the data and make it into an adjacency matrix:
temp <- tempfile(fileext = ".zip")
download.file("https://sites.google.com/site/ucinetsoftware/datasets/covert-networks/911hijackers/9%2011%20Hijackers%20CSV.zip?attredirects=0&d=1",
temp,
mode = "wb")
data <- read.csv(unz(temp,"CSV/9_11_HIJACKERS_ASSOCIATES.csv"))
my.rownames <- data$X
data2 <- sapply(data[,-1], as.numeric)
rownames(data2) <- my.rownames
Adj <- as.matrix(data2)
Now the easy parts. We can convert the adjacency matrix into an igraph graph, compute the vertex degree and add that data to the graph.
library(igraph)
Graph <- graph_from_adjacency_matrix(Adj)
V(Graph)$vertex_degree <- degree(Graph)
Finally we can plot the graph with the vertex size being proportional to the degree:
plot.igraph(Graph,
vertex.size = V(Graph)$vertex_degree,
layout=layout.fruchterman.reingold, main="Hijacker Relationships")
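On the question of which hijacker had the most relationships: for an undirected 0/1 adjacency matrix, the number of direct relationships is the row sum (the vertex degree), whereas betweenness measures how often a vertex lies on shortest paths between other vertices. A minimal base-R sketch, assuming a matrix shaped like the three-row excerpt in the question (HName1 to HName3 are the placeholder names from that excerpt, and `Adj_toy` is a hypothetical stand-in for the real data):

```r
# Placeholder adjacency matrix copied from the excerpt in the question.
hn <- c("HName1", "HName2", "HName3")
Adj_toy <- matrix(c(0, 1, 0,
                    1, 0, 1,
                    0, 1, 0),
                  nrow = 3, byrow = TRUE, dimnames = list(hn, hn))

# Degree of each vertex = number of 1s in its row.
deg <- rowSums(Adj_toy)

# Vertex (or vertices, in case of ties) with the most direct relationships.
most_connected <- names(deg)[deg == max(deg)]
print(most_connected)
```

If the goal is instead to find people who bridge otherwise separate groups, betweenness would be the more appropriate measure.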

revealing clusters of interaction in igraph

I have an interaction network and I used the following code to make an adjacency matrix and subsequently calculate the dissimilarity between the nodes of the network and then cluster them to form modules:
ADJ1 <- abs(adjacent_mat)^6
dissADJ1 <- 1 - ADJ1
hierADJ <- hclust(as.dist(dissADJ1), method = "average")
Now I would like those modules to appear when I plot the graph with igraph.
g <- simplify(graph_from_adjacency_matrix(adjacent_mat, weighted = TRUE))
plot.igraph(g)
However the only thing that I have found thus far to translate hclust output to graph is as per the following tutorial: http://gastonsanchez.com/resources/2014/07/05/Pretty-tree-graph/
phylo_tree = as.phylo(hierADJ)
graph_edges = phylo_tree$edge
graph_net = graph.edgelist(graph_edges)
plot(graph_net)
which is useful for hierarchical lineage but rather I just want the nodes that closely interact to cluster as follows:
Can anyone recommend how to use a command such as components from igraph to get these clusters to show?
igraph provides a bunch of different layout algorithms which are used to place nodes in the plot.
A good one to start with for a weighted network like this is the force-directed layout (implemented by layout.fruchterman.reingold in igraph).
Below is an example of the force-directed layout applied to some simple simulated data.
First, we create some mock data and clusters, along with some "noise" to make it more realistic:
library('dplyr')
library('igraph')
library('RColorBrewer')
set.seed(1)
# generate a couple clusters
nodes_per_cluster <- 30
n <- 10
nvals <- nodes_per_cluster * n
# cluster 1 (increasing)
cluster1 <- matrix(rep((1:n)/4, nodes_per_cluster) +
rnorm(nvals, sd=1),
nrow=nodes_per_cluster, byrow=TRUE)
# cluster 2 (decreasing)
cluster2 <- matrix(rep((n:1)/4, nodes_per_cluster) +
rnorm(nvals, sd=1),
nrow=nodes_per_cluster, byrow=TRUE)
# noise cluster
noise <- matrix(sample(1:2, nvals, replace=TRUE) +
rnorm(nvals, sd=1.5),
nrow=nodes_per_cluster, byrow=TRUE)
dat <- rbind(cluster1, cluster2, noise)
colnames(dat) <- paste0('n', 1:n)
rownames(dat) <- c(paste0('cluster1_', 1:nodes_per_cluster),
paste0('cluster2_', 1:nodes_per_cluster),
paste0('noise_', 1:nodes_per_cluster))
Next, we can use Pearson correlation to construct our adjacency matrix:
# create correlation matrix
cor_mat <- cor(t(dat))
# shift to [0,1] to separate positive and negative correlations
adj_mat <- (cor_mat + 1) / 2
# get rid of low correlations and self-loops
adj_mat <- adj_mat^3
adj_mat[adj_mat < 0.5] <- 0
diag(adj_mat) <- 0
Cluster the data using hclust and cutree:
# convert to dissimilarity matrix and cluster using hclust
dissim_mat <- 1 - adj_mat
dend <- dissim_mat %>%
as.dist %>%
hclust
clusters = cutree(dend, h=0.65)
# color the nodes
pal = colorRampPalette(brewer.pal(11,"Spectral"))(length(unique(clusters)))
node_colors <- pal[clusters]
Finally, create an igraph graph from the adjacency matrix and plot it using the fruchterman.reingold layout:
# create graph
g <- graph.adjacency(adj_mat, mode='undirected', weighted=TRUE)
# set node color and plot using a force-directed layout (fruchterman-reingold)
V(g)$color <- node_colors
coords_fr = layout.fruchterman.reingold(g, weights=E(g)$weight)
# igraph plot options
igraph.options(vertex.size=8, edge.width=0.75)
# plot network
plot(g, layout=coords_fr, vertex.color=V(g)$color)
In the above code, I generated two "clusters" of correlated rows, and a third group of "noise".
Hierarchical clustering (hclust + cutree) is used to assign the data points to clusters, and they are colored based on cluster membership.
The result looks like this:
For some more examples of clustering and plotting graphs with igraph, check out: http://michael.hahsler.net/SMU/LearnROnYourOwn/code/igraph.html
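To make the clustering step more concrete, here is a small sketch of how cutting the dendrogram at a height `h` turns a dissimilarity matrix into cluster labels. The four-point matrix `d_toy` is made up for illustration and uses only base R (the stats package); the cut height of 0.5 is an arbitrary choice for this toy data, not a recommendation.

```r
# Hypothetical dissimilarities: points 1-2 are close, points 3-4 are close,
# and the two pairs are far apart from each other.
d_toy <- as.dist(matrix(c(0.0, 0.1, 0.9, 0.9,
                          0.1, 0.0, 0.9, 0.9,
                          0.9, 0.9, 0.0, 0.2,
                          0.9, 0.9, 0.2, 0.0),
                        nrow = 4, byrow = TRUE))

# Average-linkage clustering, then cut the tree at height 0.5:
# merges that happen above this height are not applied.
tree <- hclust(d_toy, method = "average")
cl <- cutree(tree, h = 0.5)
print(cl)
```

A lower `h` produces more and smaller clusters, a higher `h` fewer and larger ones, so the value usually needs tuning for the data at hand.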
You haven't shared some toy data for us to play with and suggest improvements to code, but your question states that you are only interested in plotting your clusters distinctly - that is, graphical presentation.
Although igraph comes with some nice force-directed layout algorithms, such as layout.fruchterman.reingold, layout_with_kk, etc., their output can quickly become difficult to interpret in the presence of a large number of nodes.
Like this:
With these traditional methods of visualising networks:
- the layout algorithms, rather than the data, determine the visualisation
- similar networks may end up being visualised very differently
- a large number of nodes will make the visualisation difficult to interpret
Instead, I find Hive Plots to be better at displaying important network properties, which, in your instance, are the cluster and the edges.
In your case, you can:
- plot each cluster on a different straight line
- order the placement of nodes intelligently, so that nodes with certain properties are placed at the very end or start of each straight line
- colour the edges to identify the direction of each edge
To achieve this you will need to:
- use the ggnetwork package to turn your igraph object into a dataframe
- map your clusters to the nodes present in this dataframe
- generate coordinates for the straight lines and map these to each cluster
- use ggplot to visualise
There is also a hiveR package in R, should you wish to use a packaged solution. You might also find another visualisation technique for graphs very useful: BioFabric
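The coordinate-generation step can be sketched in base R as follows. This is only an illustration under stated assumptions: the node table `nodes`, its columns, and the choice of degree as the ordering property are all hypothetical, and cluster labels are assumed to be the integers 1 to k. Each cluster gets its own axis at an evenly spaced angle, and nodes are placed along the axis by their rank within the cluster.

```r
# Hypothetical node table: one row per node, with a cluster label and a degree.
nodes <- data.frame(name    = paste0("n", 1:6),
                    cluster = c(1, 1, 2, 2, 3, 3),
                    degree  = c(4, 1, 3, 2, 5, 2))

# One straight line (axis) per cluster, evenly spaced around the origin.
k <- length(unique(nodes$cluster))
angle <- 2 * pi * (nodes$cluster - 1) / k

# Distance from the origin: rank of the node's degree within its cluster,
# so the best-connected node of each cluster sits at the outer end.
nodes$radius <- ave(nodes$degree, nodes$cluster, FUN = rank)

# Convert polar coordinates to x/y for plotting.
nodes$x <- nodes$radius * cos(angle)
nodes$y <- nodes$radius * sin(angle)
print(nodes)
```

The resulting x and y columns could then be joined to an edge table and drawn with any plotting system; the ggnetwork and ggplot steps mentioned above are not shown here.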

How to randomly select 2 vertices from a graph in R?

I'm new to R, and I'm trying to randomly select 2 vertices from a graph.
What I've done so far is:
First, set up a graph
edgePath <- "./project1/data/smalledges.csv"
edgesMatrix <- as.matrix(read.csv(edgePath, header = TRUE, colClasses = "character"))
graph <- graph.edgelist(edgesMatrix)
The file smalledges.csv looks like this:
from to
4327231 2587908
Then I get all the vertices from the graph into a list:
vList <- as.list(get.data.frame(graph, what = c("vertices")))
After that, I try to use:
sample(vList, 2)
But what I've got is an error:
cannot take a sample larger than the population when 'replace = FALSE'
I guess it's because R thinks what I want is 2 random lists, so I tried this:
sample(vList, 2, replace = TRUE)
And then I've got 2 large lists... BUT THAT'S NOT WHAT I WANTED! So guys, how can I randomly select 2 vertices from my graph? Thanks!
Not clear from your question whether you want just the vertices, or a sub-graph containing those vertices. Here's an example of both.
library(igraph)
set.seed(1) # for reproducible example
g <- erdos.renyi.game(10, 0.3)
par(mfrow=c(1,3), mar=c(1,1,1,1))
set.seed(1) # for reproducible plot
plot(g)
# random sample of vertices
smpl <- sample(1:vcount(g),5)
V(g)[smpl] # 5 random vertices
# Vertex sequence:
# [1] 9 5 7 2 4
# change the color of only those vertices
V(g)[smpl]$color="lightgreen" # make them light green
set.seed(1) # for reproducible plot
plot(g)
# create a sub-graph with only those vertices, retaining edge structure
sub.g <- induced.subgraph(g,V(g)[smpl])
plot(sub.g)
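As for the error in the question, a likely explanation (an assumption, since the original data is not available) is that the vertex data frame has a single column of names, so converting it to a list yields a list with one element, and `sample()` cannot draw two items from a population of one. The sketch below imitates that situation in base R with a hypothetical `vList` and samples from the vector inside the list instead.

```r
# Hypothetical stand-in for as.list(get.data.frame(graph, what = "vertices")):
# a list holding ONE element, the character vector of vertex names.
vList <- list(name = c("4327231", "2587908", "1234567", "7654321"))

# The population that sample(vList, 2) would see has size 1.
print(length(vList))

# Sampling from the vector of names gives two distinct vertices.
set.seed(42)
picked <- sample(vList$name, 2)
print(picked)
```

With a real igraph object the same idea applies to the vertex sequence itself, as the answer above shows with `sample(1:vcount(g), 5)`.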

Generating network from two matrix with different color in R

I have two matrices m1 and m2 as follows:
m1<-matrix(sample(0:1,36,replace=TRUE),nc=6);
diag(m1)<-0
row.names(m1) <- c(paste0("X",1:6))
colnames(m1) <- c(paste0("X",1:6))
m2<-matrix(sample(0:1,36,replace=TRUE),nc=6);
diag(m2)<-0
row.names(m2) <- c(paste0("X",1:6))
colnames(m2) <- c(paste0("X",1:6))
where value 1 represent a connection between corresponding variables.
I can easily generate a network for one such matrix using packages like igraph. However, I am interested in generating a differential network from both matrices such that the edges are colored. It would be like overlapping two networks such that 'green' represents edges that are in both m1 and m2, 'blue' represents edges that are in m2 but not in m1, and 'black' represents edges that are in m1 but not in m2. Is there some command for generating such a network in R, similar to 'lines/points' for a simple overlapping plot?
First, we create an edge list from your adjacency matrices. We do this by adding them together in such a way that those edges only in m1 have a value of 1, those only in m2 have a value of 2, and those in both have a value of 3.
m3<-m1+2*m2
el<-subset(as.data.frame(as.table({m3[lower.tri(m3)]<-0; m3})), Freq>0)
names(el)<-c("V1","V2","CON")
Now we can create a graph object and color in the edges using the values we encoded above
library(igraph)
gg<-graph.data.frame(el, directed=F)
plot(gg, edge.color=c("black","blue","green")[E(gg)$CON])
This will give you the following plot
library(igraph)
ml <- list(m1,m2)
gl <- lapply(ml,graph.adjacency, weighted=T)
g <- do.call("+",gl)
cols <- c("black","blue","green")
colors <- cols[as.numeric(!is.na(E(g)$weight_1)) + 2*(!is.na(E(g)$weight_2))]
set.seed(1) # for reproducible graph
plot(g, edge.color=colors)
Your matrices m1 and m2 are not symmetrical, which implies a directed graph. So in this case the graph shows that m1 has a connection from X2 to X4, whereas m2 has a connection from X4 to X2.
This approach takes advantage of the fact that you can add graphs, and that the result has edges with attributes from both of the original graphs (here, weight_1 and weight_2). It also deals with the possibility that the weights are not 0 or 1.
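Both answers rely on the same encoding: an edge gets code 1 if it is only in m1, 2 if it is only in m2, and 3 if it is in both, and that code is then used as an index into the colour vector. A minimal base-R sketch of just this lookup, using two small fixed matrices (`a1` and `a2` are hypothetical, chosen so the result is easy to check by hand):

```r
# Two small directed adjacency matrices (made-up data).
a1 <- matrix(c(0, 1, 1,
               0, 0, 0,
               0, 1, 0), nrow = 3, byrow = TRUE)
a2 <- matrix(c(0, 1, 0,
               0, 0, 1,
               0, 1, 0), nrow = 3, byrow = TRUE)

# 1 = only in a1, 2 = only in a2, 3 = in both, 0 = no edge.
code <- a1 + 2 * a2

# Index into the colour vector with the non-zero codes
# (taken in column-major order, as R stores matrices).
cols <- c("black", "blue", "green")
edge_cols <- cols[code[code > 0]]
print(edge_cols)
```

This encoding only works as written when both matrices contain nothing but 0 and 1; the second answer avoids that restriction by testing for the presence of a weight rather than using its value.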

2nd Degree Connections in igraph

I think I have this working correctly, but I am looking to mimic something similar to Facebook's friend suggestions. Simply put, I am looking to find 2nd degree connections (friends of your friends that you do not have a connection with). I do want to keep this as a directed graph and identify the 2nd degree outward connections (the people your friends connect to).
I believe my dummy code is achieving this, but since the reference is on indices and not vertex labels, I was hoping you could help me modify the code to return useable names.
### create some fake data
library(igraph)
from <- sample(LETTERS, 50, replace=T)
to <- sample(LETTERS, 50, replace=T)
rel <- data.frame(from, to)
head(rel)
### lets plot the data
g <- graph.data.frame(rel)
summary(g)
plot(g, vertex.label=V(g)$name, edge.arrow.size=.1)
## find the 2nd degree connections
d1 <- unlist(neighborhood(g, 1, nodes="F", mode="out"))
d2 <- unlist(neighborhood(g, 2, nodes="F", mode="out"))
d1;d2;
setdiff(d2,d1)
Returns
> setdiff(d2,d1)
[1] 13
Any help you can provide will be great. Obviously I am looking to stay within R.
You can index back into the graph vertices like:
> V(g)[setdiff(d2,d1)]
Vertex sequence:
[1] "B" "W" "G"
Also check out ?V for ways to get at this type of info through direct indexing.
You can use the adjacency matrix G of the graph g. One property of the adjacency matrix is that entry (i, j) of its nth power gives the number of walks of length n from vertex i to vertex j.
G <- get.adjacency(g)
G2 <- G %*% G # G2 contains 2-walks
diag(G2) <- 0 # take out loops
G2[G2!=0] <- 1 # normalize G2, not interested in multiplicity of walks
g2 <- graph.adjacency(G2)
An edge in graph g2 represents a "friend-of-a-friend" bond.
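The same idea can be sketched in base R on a tiny directed graph, with one extra step that the code above leaves out: removing pairs that are already directly connected, since the question asks for friends of friends "that you do not have a connection with". The four-vertex matrix `A` below is made up for illustration.

```r
# Hypothetical directed graph: A->B, A->D, B->C, B->D.
v <- LETTERS[1:4]
A <- matrix(0, 4, 4, dimnames = list(v, v))
A["A", "B"] <- 1
A["A", "D"] <- 1
A["B", "C"] <- 1
A["B", "D"] <- 1

# Entry (i, j) of A %*% A counts the walks of length 2 from i to j.
A2 <- A %*% A
diag(A2) <- 0   # ignore walks that return to the starting vertex

# Keep pairs reachable in two steps that are NOT already directly connected.
suggest <- (A2 > 0) & (A == 0)

# Suggestions for vertex "A": D is reachable via B but is already a direct
# connection, so only C remains.
fof_A <- names(which(suggest["A", ]))
print(fof_A)
```

Because the matrix carries vertex names as dimnames, the result comes back as names rather than indices, which was the original concern in the question.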
