arranging matrix - network graphs - r
I was trying to make a network graph, using the function gplot from library(sna). The graph would represent the links between different fields.
I have the following data:
MTM <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1)
FI <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
MCLI <- c(0,0,1,0,0,1,1,1,0,0,0,0,1,0,1,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1)
mat1 <- data.frame(MTM,FI,MCLI)
mat1 <- as.matrix(mat1)
Where "MTM", "FI" and "MCLI" are the "fields of interest" and every row is a different project that has some/any/none of the fields in common.
How could I transform these data to look like this?
matx:
     MTM FI MCLI
MTM   10  0    1
FI     0  1    1
MCLI  10  1   17
I would like to represent the fields as "nodes" and the connections between them as "edges" in a network graph, which could help show the most "popular" and interconnected fields. Is this possible with these data?
Thanks in advance!
EDIT: I came across this solution, which could be OK for what I want:
library(igraph)
G<-graph.incidence(as.matrix(mat1),weighted=TRUE,directed=FALSE)
summary(G)
plot(G)
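Here is one way to make a network graph from your data where each node is a "field of interest". I build a symmetric adjacency matrix from your original data: the diagonal counts the projects in each field, and each off-diagonal entry counts the projects that two fields share. Because this matrix is symmetric and your desired matrix is not, the two don't entirely match.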
library(igraph)
# Use matrix multiplication to create symmetrical adjacency matrix.
adj_mat = t(mat1) %*% (mat1)
# Two ways to show edge weights.
png("igraphs.png", width=10, height=5, units="in", res=200)
par(mfrow=c(1, 2))
g1 = graph.adjacency(adj_mat, mode="undirected", diag=FALSE, weighted=TRUE)
plot(g1, edge.width=E(g1)$weight, vertex.size=50)
g2 = graph.adjacency(adj_mat, mode="undirected", diag=FALSE)
plot(g2, vertex.size=50)
dev.off()
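To see what the matrix product is counting, here is a minimal base R sketch on a made-up incidence matrix. The field names A, B, C and the four projects are hypothetical, not the data from the question; crossprod(inc) is just a shorthand for t(inc) %*% inc.

```r
# Hypothetical incidence matrix: rows are projects, columns are fields.
inc <- matrix(c(1, 0, 1,
                1, 0, 0,
                0, 1, 1,
                1, 0, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("project", 1:4), c("A", "B", "C")))

# Field-by-field co-occurrence counts; same as t(inc) %*% inc.
co <- crossprod(inc)
co
#   A B C
# A 3 0 2
# B 0 1 1
# C 2 1 3

# Diagonal: number of projects per field.
# Off-diagonal: number of projects shared by two fields.
```

Passing diag=FALSE when building the graph, as in the answer above, simply ignores the per-field totals on the diagonal so that they are not drawn as self-loops.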
Related
How to draw graph of relationships from binary matrix
I'm trying to create a graph with relationships of people involved in the 9/11 attacks, but I don't understand the input very well. I use loops to group the hijackers (hijacker1 knows hijacker2; hijacker5 knows hijacker3 etc.) but it doesn't work for me. The result of my work should be a relationship graph as on this page: LINK
I use data in csv format: Data to download
The data schema looks like the screenshots below. There are three files available, but if I understand correctly, the data from the first file should be enough to get what I want (?)
Hijackers ASSOCIATES
Hijackers ATTR
Hijackers PRIORITY_CONTACT
       HName1 HName2 HName3
HName1      0      1      0
HName2      1      0      1
HName3      0      1      0
...
I would like to draw a relationship diagram and extract information about which of the hijackers had the most relationships (should I use betweenness() from the igraph library?).
Here's an approach with igraph. First, let's grab the data and make it into an adjacency matrix:
temp <- tempfile(fileext = ".zip")
download.file("https://sites.google.com/site/ucinetsoftware/datasets/covert-networks/911hijackers/9%2011%20Hijackers%20CSV.zip?attredirects=0&d=1", temp, mode = "wb")
data <- read.csv(unz(temp,"CSV/9_11_HIJACKERS_ASSOCIATES.csv"))
my.rownames <- data$X
data2 <- sapply(data[,-1], as.numeric)
rownames(data2) <- my.rownames
Adj <- as.matrix(data2)
Now the easy parts. We can convert the adjacency matrix into an igraph graph, compute vertex degree and add that data to the graph.
library(igraph)
Graph <- graph_from_adjacency_matrix(Adj)
V(Graph)$vertex_degree <- degree(Graph)
Finally we can plot the graph with the vertex size being proportional to the degree:
plot.igraph(Graph, vertex.size = V(Graph)$vertex_degree, layout=layout.fruchterman.reingold, main="Hijacker Relationships")
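On degree versus betweenness: "most relationships" is a count of direct ties, which is degree; betweenness instead measures how often a node lies on shortest paths between other nodes. As a rough sketch, assuming the adjacency matrix is symmetric and 0/1 like the three-row excerpt in the question, the degree is just a row sum and can be checked in base R:

```r
# The 3 x 3 excerpt from the question, used here as toy data.
ids <- c("HName1", "HName2", "HName3")
adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# For a symmetric 0/1 matrix, each row sum is that person's number of direct ties.
deg <- rowSums(adj)
deg
# HName1 HName2 HName3
#      1      2      1

# Person with the most direct relationships (first one in case of ties).
names(which.max(deg))
# "HName2"
```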
revealing clusters of interaction in igraph
I have an interaction network and I used the following code to make an adjacency matrix and subsequently calculate the dissimilarity between the nodes of the network and then cluster them to form modules:
ADJ1=abs(adjacent-mat)^6
dissADJ1<-1-ADJ1
hierADJ<-hclust(as.dist(dissADJ1), method = "average")
Now I would like those modules to appear when I plot the igraph.
g<-simplify(graph_from_adjacency_matrix(adjacent-mat, weighted=T))
plot.igraph(g)
However the only thing that I have found thus far to translate hclust output to graph is as per the following tutorial: http://gastonsanchez.com/resources/2014/07/05/Pretty-tree-graph/
phylo_tree = as.phylo(hierADJ)
graph_edges = phylo_tree$edge
graph_net = graph.edgelist(graph_edges)
plot(graph_net)
which is useful for hierarchical lineage, but I just want the nodes that closely interact to cluster together. Can anyone recommend how to use a command such as components from igraph to get these clusters to show?
igraph provides a bunch of different layout algorithms which are used to place nodes in the plot. A good one to start with for a weighted network like this is the force-directed layout (implemented by layout.fruchterman.reingold in igraph). Below is an example of using the force-directed layout with some simple simulated data.
First, we create some mock data and clusters, along with some "noise" to make it more realistic:
library('dplyr')
library('igraph')
library('RColorBrewer')
set.seed(1)
# generate a couple clusters
nodes_per_cluster <- 30
n <- 10
nvals <- nodes_per_cluster * n
# cluster 1 (increasing)
cluster1 <- matrix(rep((1:n)/4, nodes_per_cluster) + rnorm(nvals, sd=1), nrow=nodes_per_cluster, byrow=TRUE)
# cluster 2 (decreasing)
cluster2 <- matrix(rep((n:1)/4, nodes_per_cluster) + rnorm(nvals, sd=1), nrow=nodes_per_cluster, byrow=TRUE)
# noise cluster
noise <- matrix(sample(1:2, nvals, replace=TRUE) + rnorm(nvals, sd=1.5), nrow=nodes_per_cluster, byrow=TRUE)
dat <- rbind(cluster1, cluster2, noise)
colnames(dat) <- paste0('n', 1:n)
rownames(dat) <- c(paste0('cluster1_', 1:nodes_per_cluster), paste0('cluster2_', 1:nodes_per_cluster), paste0('noise_', 1:nodes_per_cluster))
Next, we can use Pearson correlation to construct our adjacency matrix:
# create correlation matrix
cor_mat <- cor(t(dat))
# shift to [0,1] to separate positive and negative correlations
adj_mat <- (cor_mat + 1) / 2
# get rid of low correlations and self-loops
adj_mat <- adj_mat^3
adj_mat[adj_mat < 0.5] <- 0
diag(adj_mat) <- 0
Cluster the data using hclust and cutree:
# convert to dissimilarity matrix and cluster using hclust
dissim_mat <- 1 - adj_mat
dend <- dissim_mat %>% as.dist %>% hclust
clusters = cutree(dend, h=0.65)
# color the nodes
pal = colorRampPalette(brewer.pal(11,"Spectral"))(length(unique(clusters)))
node_colors <- pal[clusters]
Finally, create an igraph graph from the adjacency matrix and plot it using the fruchterman.reingold layout:
# create graph
g <- graph.adjacency(adj_mat, mode='undirected', weighted=TRUE)
# set node color and plot using a force-directed layout (fruchterman-reingold)
V(g)$color <- node_colors
coords_fr = layout.fruchterman.reingold(g, weights=E(g)$weight)
# igraph plot options
igraph.options(vertex.size=8, edge.width=0.75)
# plot network
plot(g, layout=coords_fr, vertex.color=V(g)$color)
In the above code, I generated two "clusters" of correlated rows, and a third group of "noise". Hierarchical clustering (hclust + cutree) is used to assign the data points to clusters, and they are colored based on cluster membership. The result looks like this:
For some more examples of clustering and plotting graphs with igraph, check out: http://michael.hahsler.net/SMU/LearnROnYourOwn/code/igraph.html
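The step that ties hclust output to node colors is small enough to check by hand. The following is a minimal base R sketch on a hypothetical 4-node similarity matrix (the values, node names and the two-color palette are made up for illustration): cutree returns one cluster id per node, and indexing a palette by those ids gives one color per node.

```r
# Hypothetical similarity matrix: a and b are close, c and d are close.
sim <- matrix(c(1.0, 0.9, 0.1, 0.2,
                0.9, 1.0, 0.2, 0.1,
                0.1, 0.2, 1.0, 0.8,
                0.2, 0.1, 0.8, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(letters[1:4], letters[1:4]))

# Convert similarity to dissimilarity and cluster, as in the question.
dend <- hclust(as.dist(1 - sim), method = "average")

# One integer cluster id per node; here we ask for two clusters.
clusters <- cutree(dend, k = 2)

# Index a palette by cluster id to get one color per node.
pal <- c("tomato", "steelblue")
node_colors <- pal[clusters]
names(node_colors) <- names(clusters)
node_colors
```

The resulting vector is in the same order as the rows of the matrix, so it can be assigned to the vertex colors as long as the graph was built from that same matrix.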
You haven't shared any toy data for us to play with and suggest improvements to code, but your question states that you are only interested in plotting your clusters distinctly, that is, in graphical presentation. Although igraph comes with some nice force-directed layout algorithms, such as layout.fruchterman.reingold, layout_with_kk, etc., they can, in the presence of a large number of nodes, quickly become difficult to interpret and make sense of at all. Like this:
With these traditional methods of visualising networks,
- the layout algorithms, rather than the data, determine the visualisation
- similar networks may end up being visualised very differently
- a large number of nodes will make the visualisation difficult to interpret
Instead, I find Hive Plots to be better at displaying important network properties, which, in your instance, are the clusters and the edges. In your case, you can:
- plot each cluster on a different straight line
- order the placement of nodes intelligently, so that nodes with certain properties are placed at the very end or start of each straight line
- colour the edges to identify the direction of each edge
To achieve this you will need to:
- use the ggnetwork package to turn your igraph object into a dataframe
- map your clusters to the nodes present in this dataframe
- generate coordinates for the straight lines and map these to each cluster
- use ggplot to visualise
There is also a hiveR package in R, should you wish to use a packaged solution. You might also find another visualisation technique for graphs very useful: BioFabric
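The coordinate-generation step can be sketched in base R. This is only an illustration under stated assumptions: the node table, the cluster ids (assumed to run from 1 to the number of clusters) and the use of degree as the ordering property are all hypothetical, and the ggnetwork and ggplot steps are left out.

```r
# Hypothetical node table: cluster membership plus a property used for ordering.
nodes <- data.frame(
  name    = paste0("n", 1:6),
  cluster = c(1, 1, 2, 2, 3, 3),
  degree  = c(4, 1, 2, 5, 3, 3)
)

# One straight line (axis) per cluster, evenly spaced around the origin.
n_axes <- length(unique(nodes$cluster))
nodes$angle <- 2 * pi * (nodes$cluster - 1) / n_axes

# Position along the axis: rank of the ordering property within each cluster.
nodes$radius <- ave(nodes$degree, nodes$cluster,
                    FUN = function(d) rank(d, ties.method = "first"))

# Polar to Cartesian coordinates, ready to be joined onto an edge table.
nodes$x <- nodes$radius * cos(nodes$angle)
nodes$y <- nodes$radius * sin(nodes$angle)
nodes
```

Each edge can then be drawn as a segment or curve between the (x, y) positions of its two endpoints, coloured by direction.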
How to randomly select 2 vertices from a graph in R?
I'm new to R, and I'm trying to randomly select 2 vertices from a graph. What I've done so far is:
First, set up a graph
edgePath <- "./project1/data/smalledges.csv"
edgesMatrix <- as.matrix(read.csv(edgePath, header = TRUE, colClasses = "character"))
graph <- graph.edgelist(edgesMatrix)
The smalledges.csv file looks like this:
from to
4327231 2587908
Then I get all the vertices from the graph into a list:
vList <- as.list(get.data.frame(graph, what = c("vertices")))
After that, I try to use:
sample(vList, 2)
But what I've got is an error:
cannot take a sample larger than the population when 'replace = FALSE'
I guess it's because R thinks what I want is 2 random lists, so I tried this:
sample(vList, 2, replace = TRUE)
And then I've got 2 large lists... BUT THAT'S NOT WHAT I WANTED! So guys, how can I randomly select 2 vertices from my graph? Thanks!
It's not clear from your question whether you want just the vertices, or a sub-graph containing those vertices. Here's an example of both.
library(igraph)
set.seed(1) # for reproducible example
g <- erdos.renyi.game(10, 0.3)
par(mfrow=c(1,3), mar=c(1,1,1,1))
set.seed(1) # for reproducible plot
plot(g)
# random sample of vertices
smpl <- sample(1:vcount(g),5)
V(g)[smpl] # 5 random vertices
# Vertex sequence:
# [1] 9 5 7 2 4
# change the color of only those vertices
V(g)[smpl]$color="lightgreen" # make them light green
set.seed(1) # for reproducible plot
plot(g)
# create a sub-graph with only those vertices, retaining edge structure
sub.g <- induced.subgraph(g,V(g)[smpl])
plot(sub.g)
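As for why the original sample(vList, 2) failed: converting a one-column data frame with as.list gives a list of length 1 (the whole column), so there is only one element to draw from. A minimal base R sketch, assuming the vertex data frame has a single name column (the third vertex id below is made up so that there is something to sample):

```r
# Stand-in for get.data.frame(graph, what = "vertices"): one column of names.
vertices <- data.frame(name = c("4327231", "2587908", "1234567"),
                       stringsAsFactors = FALSE)

vList <- as.list(vertices)
length(vList)   # 1: the list holds a single element, the whole name column

# Sample from the column itself rather than from the list that wraps it.
set.seed(1)     # for a reproducible draw
picked <- sample(vList$name, 2)
picked
```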
Generating network from two matrix with different color in R
I have two matrices m1 and m2 as follows:
m1<-matrix(sample(0:1,36,replace=TRUE),nc=6); diag(m1)<-0
row.names(m1) <- c(paste0("X",1:6))
colnames(m1) <- c(paste0("X",1:6))
m2<-matrix(sample(0:1,36,replace=TRUE),nc=6); diag(m2)<-0
row.names(m2) <- c(paste0("X",1:6))
colnames(m2) <- c(paste0("X",1:6))
where a value of 1 represents a connection between the corresponding variables. I can easily generate a network for one such matrix using packages like igraph. However, I am interested in generating a differential network from both matrices in which the edges are colored. It would be like overlaying two networks such that green represents edges that are in both m1 and m2, blue represents edges that are in m2 but not in m1, and black represents edges that are in m1 but not in m2. Is there some command in R for generating such a network, similar to 'lines/points' for a simple overlaid plot?
First, we create an edge list from your adjacency matrices. We do this by adding them together in such a way that edges only in m1 have a value of 1, those only in m2 have a value of 2, and those in both have a value of 3.
m3<-m1+2*m2
el<-subset(as.data.frame(as.table({m3[lower.tri(m3)]<-0; m3})), Freq>0)
names(el)<-c("V1","V2","CON")
Now we can create a graph object and color in the edges using the values we encoded above:
library(igraph)
gg<-graph.data.frame(el, directed=F)
plot(gg, edge.color=c("black","blue","green")[E(gg)$CON])
This will give you the following plot
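The encoding m1 + 2*m2 can be checked on a small case. Here is a minimal base R sketch with two hypothetical symmetric 3 x 3 matrices (not the random ones from the question), showing that each edge ends up with code 1, 2 or 3 and that the code indexes directly into the color vector:

```r
ids <- paste0("X", 1:3)

# Hypothetical networks: m1 has edges X1-X2 and X1-X3, m2 has X1-X2 and X2-X3.
m1 <- matrix(c(0, 1, 1,
               1, 0, 0,
               1, 0, 0), nrow = 3, byrow = TRUE, dimnames = list(ids, ids))
m2 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# 1 = only in m1, 2 = only in m2, 3 = in both.
m3 <- m1 + 2 * m2

# Keep each undirected edge once by looking at the upper triangle only.
idx   <- which(upper.tri(m3) & m3 > 0, arr.ind = TRUE)
codes <- m3[idx]
edges <- data.frame(V1  = ids[idx[, "row"]],
                    V2  = ids[idx[, "col"]],
                    CON = codes,
                    col = c("black", "blue", "green")[codes])
edges
#   V1 V2 CON   col
# 1 X1 X2   3 green
# 2 X1 X3   1 black
# 3 X2 X3   2  blue
```

Discarding one triangle is only safe when the matrices are symmetric; with asymmetric matrices such as the randomly generated ones in the question, it drops edges that appear in only one direction.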
library(igraph)
ml <- list(m1,m2)
gl <- lapply(ml,graph.adjacency, weighted=T)
g <- do.call("+",gl)
cols <- c("black","blue","green")
colors <- cols[as.numeric(!is.na(E(g)$weight_1)) + 2*(!is.na(E(g)$weight_2))]
set.seed(1) # for reproducible graph
plot(g, edge.color=colors)
Your matrices m1 and m2 are not symmetrical, which implies a directed graph. So in this case the graph shows that m1 has a connection from X2 to X4, whereas m2 has a connection from X4 to X2. This approach takes advantage of the fact that you can add graphs, and that the result has edges with attributes from both of the original graphs (here, weight_1 and weight_2). It also deals with the possibility that the weights are not 0 or 1.
2nd Degree Connections in igraph
I think I have this working correctly, but I am looking to mimic something similar to Facebook's friend suggestions. Simply, I am looking to find 2nd degree connections (friends of your friends that you do not have a connection with). I do want to keep this as a directed graph and identify the 2nd degree outward connections (the people your friends connect to). I believe my dummy code is achieving this, but since the reference is on indices and not vertex labels, I was hoping you could help me modify the code to return usable names.
### create some fake data
library(igraph)
from <- sample(LETTERS, 50, replace=T)
to <- sample(LETTERS, 50, replace=T)
rel <- data.frame(from, to)
head(rel)
### lets plot the data
g <- graph.data.frame(rel)
summary(g)
plot(g, vertex.label=LETTERS, edge.arrow.size=.1)
## find the 2nd degree connections
d1 <- unlist(neighborhood(g, 1, nodes="F", mode="out"))
d2 <- unlist(neighborhood(g, 2, nodes="F", mode="out"))
d1;d2;
setdiff(d2,d1)
Returns
> setdiff(d2,d1)
[1] 13
Any help you can provide will be great. Obviously I am looking to stay within R.
You can index back into the graph vertices like:
> V(g)[setdiff(d2,d1)]
Vertex sequence:
[1] "B" "W" "G"
Also check out ?V for ways to get at this type of info through direct indexing.
You can use the adjacency matrix $G$ of the graph $g$ (no latex here?). One of the properties of the adjacency matrix is that its nth power gives you the number of $n$-walks (walks of length n) between each pair of vertices.
G <- get.adjacency(g)
G2 <- G %*% G # G2 contains 2-walks
diag(G2) <- 0 # take out loops
G2[G2!=0] <- 1 # normalize G2, not interested in multiplicity of walks
g2 <- graph.adjacency(G2)
An edge in graph g2 represents a "friend-of-a-friend" bond.
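The 2-walk property is easy to verify on a tiny directed graph. Below is a base R sketch using an ordinary dense matrix and a hypothetical 4-person network; the last step, which is an addition not present in the code above, also removes people who are already direct connections, since the question asks for friends of friends that you are not yet connected to.

```r
people <- LETTERS[1:4]

# Hypothetical directed network: A->B, A->C, B->C, C->D.
A <- matrix(0, nrow = 4, ncol = 4, dimnames = list(people, people))
A["A", "B"] <- 1
A["A", "C"] <- 1
A["B", "C"] <- 1
A["C", "D"] <- 1

# Entry [i, j] of the squared matrix counts walks of length 2 from i to j.
A2 <- A %*% A
A2["A", "C"]   # 1: the walk A -> B -> C
A2["A", "D"]   # 1: the walk A -> C -> D

# Friend-of-a-friend suggestions: reachable in two steps, not already
# a direct connection, and not the person themselves.
suggest <- (A2 > 0) & (A == 0)
diag(suggest) <- FALSE
names(which(suggest["A", ]))
# "D"
```

Because the result keeps the row and column names of the matrix, the suggestions come back as vertex labels rather than indices.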