R: How to Efficiently Visualize a Large Graph Network - r

I simulated some graph network data (~10,000 observations) in R and tried to visualize it using the visNetwork library in R. However, the data is very cluttered and is very difficult to analyze visually (I understand that in real life, network data is meant to be analyzed using graph query language).
For the time being, is there anything I can do to improve the visualization of the graph network I created (so I can explore some of the linkages and nodes that are all piled on top of each other)?
Can libraries such as 'networkD3' and 'diagrammeR' be used to better visualize this network?
I have attached my reproducible code below:
library(igraph)
library(dplyr)
library(visNetwork)
#create file from which to sample from
x5 <- sample(1:10000, 10000, replace=T)
#convert to data frame
x5 = as.data.frame(x5)
#create first file (take a random sample from the created file)
a = sample_n(x5, 9000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 9000)
#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")
graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)
graph
plot(graph)
library(visNetwork)
nodes <- data.frame(id = V(graph)$name, title = V(graph)$name)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(graph, what="edges")[1:2]
visNetwork(nodes, edges) %>% visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
visInteraction(navigationButtons = TRUE)
Thanks

At the request of the OP, I am applying the method used in a previous answer
Visualizing the result of dividing the network into communities to this problem.
The network in the question was not created with a specified random seed.
Here, I specify the seed for reproducibility.
## reproducible version of OP's network
library(igraph)
library(dplyr)
set.seed(1234)
#create file from which to sample from
x5 <- sample(1:10000, 10000, replace=T)
#convert to data frame
x5 = as.data.frame(x5)
#create first file (take a random sample from the created file)
a = sample_n(x5, 9000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 9000)
#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")
graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)
As noted by the OP, a simple plot is a mess. The referenced previous answer
broke this into two parts:
Plot all of the small components
Plot the giant component
1. Small components
Different components get different colors to help separate them.
## Visualize the small components separately
SmallV = which(components(graph)$membership != 1)
SmallComp = induced_subgraph(graph, SmallV)
LO_SC = layout_components(SmallComp, layout=layout_with_graphopt)
plot(SmallComp, layout=LO_SC, vertex.size=9, vertex.label.cex=0.8,
vertex.color=rainbow(18, alpha=0.6)[components(graph)$membership[SmallV]])
More could be done with this, but that is fairly easy and not the substance of the question, so I will leave this as the representation of the small components.
2. Giant component
Simply plotting the giant component is still hard to read. Here are two
approaches to improving the display. Both rely on grouping the vertices.
For this answer, I will use cluster_louvain to group the nodes, but you
could try other community detection methods. cluster_louvain produces 47
communities.
## Now try for the giant component
GiantV = which(components(graph)$membership == 1)
GiantComp = induced_subgraph(graph, GiantV)
GC_CL = cluster_louvain(GiantComp)
max(GC_CL$membership)
[1] 47
Giant method 1 - grouped vertices
Create a layout that emphasizes the communities
GC_Grouped = GiantComp
E(GC_Grouped)$weight = 1
for(i in unique(membership(GC_CL))) {
GroupV = which(membership(GC_CL) == i)
GC_Grouped = add_edges(GC_Grouped, combn(GroupV, 2), attr=list(weight=6))
}
set.seed(1234)
LO = layout_with_fr(GC_Grouped)
colors <- rainbow(max(membership(GC_CL)))
par(mar=c(0,0,0,0))
plot(GC_CL, GiantComp, layout=LO,
vertex.size = 5,
vertex.color=colors[membership(GC_CL)],
vertex.label = NA, edge.width = 1)
This provides some insight, but the many edges make it a bit hard to read.
Giant method 2 - contracted communities
Plot each community as a single vertex. The size of the vertex
reflects the number of nodes in that community. The color represents
the degree of the community node.
## Contract the communities in the giant component
CL.Comm = simplify(contract(GiantComp, membership(GC_CL)))
D = unname(degree(CL.Comm))
set.seed(1234)
par(mar=c(0,0,0,0))
plot(CL.Comm, vertex.size=sqrt(sizes(GC_CL)),
vertex.label=1:max(membership(GC_CL)), vertex.cex = 0.8,
vertex.color=round((D-29)/4)+1)
This is much cleaner, but loses any internal structure of the communities.

Just a tip for 'real-life'. The best way to deal with large graphs is to either 1) filter the edges you are using by some measure, or 2) use some related variable as weight.

Related

group variables and color by type R igraph

I have a graph where I need the vertices to be in name, with a different color due to the type of data (stock, forex and commodities)... I don't understand how to do it...
in this post igraph group vertices based on community something similar is done... I need not circles, only letters and that they have a different color according to the type of data that is...
library(Hmisc) # For correlation matrix
library(corrplot) # For correlation matrix
library("Spillover")
library(readxl)
library("xts")
library("zoo") #
library("ggplot2")
library("nets")
library("MASS")
library("igraph")
library("reshape") # For "melt" function/cluster network
library("writexl")
# We compute the dynamic interdependence. Direct edges denoted as Granger causality linkages
# We compute the contemporaneous interdependence. Indirect edges denoted as Partial correlation linkages
stock <- table[1:55, 1:55]
forex <- table[56:95, 56:95]
commodities <- table[96:116, 96:116]
dim(stock); dim(forex); dim(commodities)
View(stock);
View(forex);
View(commodities)
# Full network
network.spill <- graph.adjacency(table, mode='directed')
degree <- degree(network.spill) # number of adjacent edges
between <- betweenness(network.spill)
close <- closeness(network.spill, mode = "all")
autorsco <- authority.score(network.spill)$vector
eccentry <- eccentricity(network.spill, mode = "all")
measures <- cbind(as.matrix(degree), as.matrix(between), as.matrix(close), as.matrix(autorsco), as.matrix(eccentry))
View(measures)
V(network.spill)[which(network.spill_forex)]$color="red"
V(network.spill)$size <- round((degree-min(degree))/(max(degree)-min(degree))) # To create vertex
V(network.spill)$shape <- "sphere"
E(network.spill)$name=1:116
# For stock returns layout_nicely(network.spill)
par(mfcol = c(1, 1))
plot( network.spill, layout = layout_nicely(network.spill), vertex.color = c("gold"), vertex.label.cex=0.6,
vertex.size = autorsco*2,edge.curved = 0.2, edge.arrow.mode=0.5,edge.arrow.size=1.5) # To make the chart

Calculate alters density in a large graph in R

I'm working with a large social network in R (560120 ties). I want to calculate the local density of nodes, as well as the density of their alters.
I achieved the former with the following code snippet, using the package igraph.
g <- graph_from_data_frame(edgelist, directed = FALSE)
egonet_list <- make_ego_graph(g)
dat <- data.frame(
id = names(V(g),
egonet_density = lapply(egonet_list, graph.density) %>% unlist()
)
However, I run into memory troubles when I try to calculate the network of ego's alters. I try to run the following:
alter_list <- make_ego_graph(g, order = 2, mindist = 1)
It does work for smaller graphs, but with my network setup, it is eating up all of my RAM (>110GB) and crashing.
Does anyone have a suggestion how to solve this issue in a memory-friendly way?
You can calculate the local density for one node at a time without saving the alter graphs.
library(igraph)
library(purrr)
make_ego_graph(g, order = 2, nodes = 1, mindist = 1)
V(g) %>%
map_dbl(~ make_ego_graph(g, order = 2, nodes = .x, mindist = 1)[[1]] %>%
graph.density())
You can take the code within map and write a function called get_alter_density() and use lapply if you prefer.

How to subset a network graph keeping the top n components?

I have a disconnected network with many small components.
I would like to keep only those components are above the 75th percentile in size.
In using decompose a list of network is produced that cannot be plotted as one.
library(igraph)
set.seed(123)
g <- erdos.renyi.game(100, 0.02, directed = FALSE, loops = TRUE)
components(g)$csize
components <- which(components(g)$csize>=quantile(components(g)$csize,.75))
g_final <- igraph::decompose(g, max.comps = length(components), min.vertices = 2)
I think that what you really want to use is induced_subgraph.
I really dislike that you used the name of the function components as the name of a variable, so I have changed it here to Components.
Components <- which(components(g)$csize>=quantile(components(g)$csize,.75))
BigComp <-induced_subgraph(g,
which(components(g)$membership %in% Components))
plot(BigComp)

Phylogenetic tree ape too small

I am building a phylogenetic tree using data from NCBI taxonomy. The tree is quite simple, its aims is to show the relationship among a few Arthropods.
The problem is the tree looks very small and I can't seem to make its branches longer. I would also like to color some nodes (Ex: Pancrustaceans) but I don't know how to do this using ape.
Thanks for any help!
library(treeio)
library(ape)
treeText <- readLines('phyliptree.phy')
treeText <- paste0(treeText, collapse="")
tree <- read.tree(text = treeText) ## load tree
distMat <- cophenetic(tree) ## generate dist matrix
plot(tree, use.edge.length = TRUE,show.node.label = T, edge.width = 2, label.offset = 0.75, type = "cladogram", cex = 1, lwd=2)
Here are some pointers using the ape package. I am using a random tree as we don't have access to yours, but these examples should be easily adaptable to your problem. If your provide a reproducible example of a specific question, I could take another look.
First me make a random tree, add some species names, and plot it to show the numbers of nodes (both terminal and internal)
library(ape)
set.seed(123)
Tree <- rtree(10)
Tree$tip.label <- paste("Species", 1:10, sep="_")
plot.phylo(Tree)
nodelabels() # blue
tiplabels() # yellow
edgelabels() # green
Then, to color any node or edge of the tree, we can create a vector of colors and provide it to the appropriate *labels() function.
# using numbers for colors
node_colors <- rep(5:6, each=5)[1:9] # 9 internal nodes
edge_colors <- rep(3:4, each=9) # 18 branches
tip_colors <- rep(c(11,12,13), 4)
# plot:
plot.phylo(Tree, edge.color = edge_colors, tip.color = tip_colors)
nodelabels(pch = 21, bg = node_colors, cex=2)
To label just one node and the clade descending from it, we could do:
Nnode(Tree)
node_colors <- rep(NA, 9)
node_colors[7] <- "green"
node_shape <- ifelse(is.na(node_colors), NA, 21)
edge_colors <- rep("black", 18)
edge_colors[13:18] <- "green"
plot(Tree, edge.color = edge_colors, edge.width = 2, label.offset = .1)
nodelabels(pch=node_shape, bg=node_colors, cex=2)
Without your tree, it is harder to tell how to adjust the branches. One way is to reduce the size of the tip labels, so they take up less space. Another way might be to play around when saving the png or pdf.
There are other ways of doing these embellishments of trees, including the ggtree package.

R Indexing a matrix to use in plot coordinates

I'm trying to plot a temporal social network in R. My approach is to create a master graph and layout for all nodes. Then, I will subset the graph based on a series of vertex id's. However, when I do this and layout the graph, I get completely different node locations. I think I'm either subsetting the layout matrix incorrectly. I can't locate where my issue is because I've done some smaller matrix subsets and everything seems to work fine.
I have some example code and an image of the issue in the network plots.
library(igraph)
# make graph
g <- barabasi.game(25)
# make graph and set some aestetics
set.seed(123)
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sample(1:25, 15, F)
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph')
The vertices in the second plot should match layout of those in the first.
Unfortunately, you set the random seed after you generated the graph,
so we cannot exactly reproduce your result. I will use the same code but
with set.seed before the graph generation. This makes the result look
different than yours, but will be reproducible.
When I run your code, I do not see exactly the same problem as you are
showing.
Your code (with set.seed moved and scales added)
library(igraph)
library(scales) # for rescale function
# make graph
set.seed(123)
g <- barabasi.game(25)
# make graph and set some aestetics
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
## V(g)$names = 1:25
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sort(sample(1:25, 15, F))
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph', vertex.label=V(sub_g)$names)
When I run your code, both graphs have nodes in the same
positions. That is not what I see in the graph in your question.
I suggest that you run just this code and see if you don't get
the same result (nodes in the same positions in both graphs).
The only difference between the two graphs in my version is the
node labels. When you take the subgraph, it renumbers the nodes
from 1 to 15 so the labels on the nodes disagree. You can fix
this by storing the node labels in the graph before taking the
subgraph. Specifically, add V(g)$names = 1:25 immediately after
your statement E(g)$arrow.size = .1. Then run the whole thing
again, starting at set.seed(123). This will preserve the
original numbering as the node labels.
The graph looks slightly different because the new, sub-graph
does not take up all of the space and so is stretched to use
up the empty space.
Possible fast way around: draw the same graph, but color nodes and vertices that you dont need in color of your background. Depending on your purposes it can suit you.

Resources