How to subset a network graph keeping the top n components? - r

I have a disconnected network with many small components.
I would like to keep only those components that are above the 75th percentile in size.
Using decompose produces a list of networks that cannot be plotted as one.
library(igraph)
set.seed(123)
g <- erdos.renyi.game(100, 0.02, directed = FALSE, loops = TRUE)
components(g)$csize
components <- which(components(g)$csize>=quantile(components(g)$csize,.75))
g_final <- igraph::decompose(g, max.comps = length(components), min.vertices = 2)

I think that what you really want to use is induced_subgraph.
I really dislike that you used the name of the function components as the name of a variable, so I have changed it here to Components.
Components <- which(components(g)$csize>=quantile(components(g)$csize,.75))
BigComp <-induced_subgraph(g,
which(components(g)$membership %in% Components))
plot(BigComp)
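As a minimal sketch of the same idea, the component decomposition can be computed once and reused, and the result can be checked afterwards. The names comp, cutoff, keep_ids and big are illustrative only, and sample_gnp() is used here as a stand-in for erdos.renyi.game() without self-loops.

```r
library(igraph)

set.seed(123)
g <- sample_gnp(100, 0.02)                 # random graph, same model as in the question

comp     <- components(g)                  # compute the decomposition once
cutoff   <- quantile(comp$csize, 0.75)     # 75th percentile of component sizes
keep_ids <- which(comp$csize >= cutoff)    # ids of the components to keep

# keep every vertex whose component id is in keep_ids
big <- induced_subgraph(g, which(comp$membership %in% keep_ids))

# every surviving component should be at least as large as the cutoff
all(components(big)$csize >= cutoff)
```

Because whole components are kept, the result is a single igraph object that can be passed to plot() directly.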

Related

Defining Node shape in a Network plot with an additional attribute table in R

I am working on plotting a Network and it contains two different types of Nodes which I want to visualise with different shapes. For that I made an additional table in which I specified which structure is which type using a binary system. Now I want to specify in my plot function that the structures with 1 are to be triangles and the ones with 0 as circles.
My data for the Network is in the format of an adjacency matrix (I use igraph) and I am using ggnet2 for the plotting of it.
This is how I imported the data:
am <- as.matrix(read.csv2("mydata.csv", header = T, row.names = 1))
g <- graph_from_adjacency_matrix(am, mode = "undirected")
attr <- read.csv2("myattributes.csv", header = T, row.names = 1)
This is how I would plot it, but I don't know how to specify the shape argument:
ggnet2(g, size = "degree", node.color = "darkgreen", shape = ??????)
Thanks in advance for your help!
Note that the package-requirements for plotting igraphs with ggnet2 include ggplot2, sna and network as well as intergraph as a bridge.
ggnet2 is prettier, sure, but the igraph-way is this:
g <- erdos.renyi.game(100,100,'gnm')
V(g)$shape <- sample(c('csquare','circle'), 100, replace=T)
plot(g, vertex.label = NA)
Note that I added two igraph-style shapes as vertex attributes to g above. In ggnet2 you can provide a vector of shapes, and the values can be anything (even a factor) or numbers (the usual gray circle is 19). Try the following to plot in ggnet2:
ggnet2(g, shape=19)
ggnet2(g, shape=10+round(1:100/10))
ggnet2(g, shape=factor(V(g)$shape))
V(g)$shape <- sample(c('One shape','Another shape'), 100, replace=T)
ggnet2(g, shape=V(g)$shape, size = "degree", node.color = "darkgreen")
Note that if you add attributes to your vertices after loading the attribute data separately (as you do above), the row order of that data matters. Make sure your table import actually works as intended, with the correct attribute being assigned to the correct vertex. I find it good practice to store all values as attributes on the igraph object (edge and vertex attributes alike) rather than letting the network data live in separate data frames or loose vectors that have to be combined correctly before the network can be visualised.
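One way to guard against the ordering problem is to look attributes up by vertex name instead of by position. The sketch below uses a made-up 3-node adjacency matrix and attribute table (am, attr_tbl, type and shape_label are all hypothetical names, not the asker's data).

```r
library(igraph)

# hypothetical adjacency matrix with named rows and columns
am <- matrix(c(0, 1, 1,
               1, 0, 0,
               1, 0, 0), nrow = 3, byrow = TRUE,
             dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
g <- graph_from_adjacency_matrix(am, mode = "undirected")

# hypothetical attribute table, deliberately in a different row order
attr_tbl <- data.frame(type = c(1, 0, 0), row.names = c("C", "A", "B"))

# look up each vertex by name rather than relying on row order
V(g)$type <- attr_tbl[V(g)$name, "type"]
V(g)$shape_label <- ifelse(V(g)$type == 1, "triangle", "circle")

data.frame(name = V(g)$name, type = V(g)$type, shape = V(g)$shape_label)
```

The resulting vector could then be passed in the same way as in the answer, for example shape = V(g)$shape_label. The attribute is deliberately not called shape here, since igraph's own plot() expects that attribute to hold one of its built-in shape names.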

Calculate alters density in a large graph in R

I'm working with a large social network in R (560120 ties). I want to calculate the local density of nodes, as well as the density of their alters.
I achieved the former with the following code snippet, using the package igraph.
g <- graph_from_data_frame(edgelist, directed = FALSE)
egonet_list <- make_ego_graph(g)
dat <- data.frame(
id = names(V(g)),
egonet_density = lapply(egonet_list, graph.density) %>% unlist()
)
However, I run into memory troubles when I try to calculate the network of ego's alters. I try to run the following:
alter_list <- make_ego_graph(g, order = 2, mindist = 1)
It does work for smaller graphs, but with my network setup, it is eating up all of my RAM (>110GB) and crashing.
Does anyone have a suggestion how to solve this issue in a memory-friendly way?
You can calculate the local density for one node at a time without saving the alter graphs.
library(igraph)
library(purrr)
make_ego_graph(g, order = 2, nodes = 1, mindist = 1)
V(g) %>%
map_dbl(~ make_ego_graph(g, order = 2, nodes = .x, mindist = 1)[[1]] %>%
graph.density())
You can take the code within map and write a function called get_alter_density() and use lapply if you prefer.
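A rough sketch of that suggestion, using base R only. The function name get_alter_density() comes from the sentence above, but its body and the small random test graph are assumptions made for illustration. edge_density() is the current name of graph.density().

```r
library(igraph)

# density of the order-2 neighbourhood of one vertex, excluding the ego itself
# (the same make_ego_graph() call as above, wrapped in a function)
get_alter_density <- function(graph, v) {
  alter_graph <- make_ego_graph(graph, order = 2, nodes = v, mindist = 1)[[1]]
  edge_density(alter_graph)
}

set.seed(1)
g <- sample_gnp(50, 0.1)   # small stand-in for the real network

# one vertex at a time, so only one alter graph exists in memory at once
alter_density <- vapply(seq_len(vcount(g)),
                        function(v) get_alter_density(g, v),
                        numeric(1))
head(alter_density)
```

Vertices with fewer than two alters give NaN, because density is undefined for such a graph.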

R: How to Efficiently Visualize a Large Graph Network

I simulated some graph network data (~10,000 observations) in R and tried to visualize it using the visNetwork library in R. However, the data is very cluttered and is very difficult to analyze visually (I understand that in real life, network data is meant to be analyzed using graph query language).
For the time being, is there anything I can do to improve the visualization of the graph network I created (so I can explore some of the linkages and nodes that are all piled on top of each other)?
Can libraries such as 'networkD3' and 'diagrammeR' be used to better visualize this network?
I have attached my reproducible code below:
library(igraph)
library(dplyr)
library(visNetwork)
#create file from which to sample from
x5 <- sample(1:10000, 10000, replace=T)
#convert to data frame
x5 = as.data.frame(x5)
#create first file (take a random sample from the created file)
a = sample_n(x5, 9000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 9000)
#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")
graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)
graph
plot(graph)
library(visNetwork)
nodes <- data.frame(id = V(graph)$name, title = V(graph)$name)
nodes <- nodes[order(nodes$id, decreasing = F),]
edges <- get.data.frame(graph, what="edges")[1:2]
visNetwork(nodes, edges) %>% visIgraphLayout(layout = "layout_with_fr") %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
visInteraction(navigationButtons = TRUE)
Thanks
At the request of the OP, I am applying the method used in a previous answer,
"Visualizing the result of dividing the network into communities", to this problem.
The network in the question was not created with a specified random seed.
Here, I specify the seed for reproducibility.
## reproducible version of OP's network
library(igraph)
library(dplyr)
set.seed(1234)
#create file from which to sample from
x5 <- sample(1:10000, 10000, replace=T)
#convert to data frame
x5 = as.data.frame(x5)
#create first file (take a random sample from the created file)
a = sample_n(x5, 9000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 9000)
#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")
graph <- graph.data.frame(c, directed=F)
graph <- simplify(graph)
As noted by the OP, a simple plot is a mess. The referenced previous answer
broke this into two parts:
1. Plot all of the small components
2. Plot the giant component
1. Small components
Different components get different colors to help separate them.
## Visualize the small components separately
SmallV = which(components(graph)$membership != 1)
SmallComp = induced_subgraph(graph, SmallV)
LO_SC = layout_components(SmallComp, layout=layout_with_graphopt)
plot(SmallComp, layout=LO_SC, vertex.size=9, vertex.label.cex=0.8,
vertex.color=rainbow(18, alpha=0.6)[components(graph)$membership[SmallV]])
More could be done with this, but that is fairly easy and not the substance of the question, so I will leave this as the representation of the small components.
2. Giant component
Simply plotting the giant component is still hard to read. Here are two
approaches to improving the display. Both rely on grouping the vertices.
For this answer, I will use cluster_louvain to group the nodes, but you
could try other community detection methods. cluster_louvain produces 47
communities.
## Now try for the giant component
GiantV = which(components(graph)$membership == 1)
GiantComp = induced_subgraph(graph, GiantV)
GC_CL = cluster_louvain(GiantComp)
max(GC_CL$membership)
[1] 47
Giant method 1 - grouped vertices
Create a layout that emphasizes the communities
GC_Grouped = GiantComp
E(GC_Grouped)$weight = 1
for(i in unique(membership(GC_CL))) {
GroupV = which(membership(GC_CL) == i)
GC_Grouped = add_edges(GC_Grouped, combn(GroupV, 2), attr=list(weight=6))
}
set.seed(1234)
LO = layout_with_fr(GC_Grouped)
colors <- rainbow(max(membership(GC_CL)))
par(mar=c(0,0,0,0))
plot(GC_CL, GiantComp, layout=LO,
vertex.size = 5,
vertex.color=colors[membership(GC_CL)],
vertex.label = NA, edge.width = 1)
This provides some insight, but the many edges make it a bit hard to read.
Giant method 2 - contracted communities
Plot each community as a single vertex. The size of the vertex
reflects the number of nodes in that community. The color represents
the degree of the community node.
## Contract the communities in the giant component
CL.Comm = simplify(contract(GiantComp, membership(GC_CL)))
D = unname(degree(CL.Comm))
set.seed(1234)
par(mar=c(0,0,0,0))
plot(CL.Comm, vertex.size=sqrt(sizes(GC_CL)),
vertex.label=1:max(membership(GC_CL)), vertex.label.cex = 0.8,
vertex.color=round((D-29)/4)+1)
This is much cleaner, but loses any internal structure of the communities.
Just a tip for 'real-life'. The best way to deal with large graphs is to either 1) filter the edges you are using by some measure, or 2) use some related variable as weight.
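As an illustration of the first option, here is a minimal sketch that keeps only the strongest edges of a random graph. The weight attribute is simulated and the 90% cutoff is arbitrary; both are assumptions for the example, not something taken from the question's data.

```r
library(igraph)

set.seed(42)
g <- sample_gnm(200, 1000)
E(g)$weight <- runif(ecount(g))          # hypothetical edge measure

# keep only the strongest 10% of edges
cutoff   <- quantile(E(g)$weight, 0.9)
g_strong <- delete_edges(g, which(E(g)$weight < cutoff))

# drop vertices that are left without any edge
g_strong <- delete_vertices(g_strong, which(degree(g_strong) == 0))

c(edges_before = ecount(g), edges_after = ecount(g_strong))
```

The filtered graph can then be plotted or passed to a community detection method in the same way as the full graph.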

R Indexing a matrix to use in plot coordinates

I'm trying to plot a temporal social network in R. My approach is to create a master graph and layout for all nodes. Then, I will subset the graph based on a series of vertex ids. However, when I do this and lay out the graph, I get completely different node locations. I think I'm subsetting the layout matrix incorrectly, but I can't locate where my issue is because I've done some smaller matrix subsets and everything seems to work fine.
I have some example code and an image of the issue in the network plots.
library(igraph)
# make graph
g <- barabasi.game(25)
# make layout and set some aesthetics
set.seed(123)
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sample(1:25, 15, F)
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph')
The vertices in the second plot should match layout of those in the first.
Unfortunately, you set the random seed after you generated the graph,
so we cannot exactly reproduce your result. I will use the same code but
with set.seed before the graph generation. This makes the result look
different than yours, but will be reproducible.
When I run your code, I do not see exactly the same problem as you are
showing.
Your code (with set.seed moved and scales added)
library(igraph)
library(scales) # for rescale function
# make graph
set.seed(123)
g <- barabasi.game(25)
# make layout and set some aesthetics
l <- layout_nicely(g)
V(g)$size <- rescale(degree(g), c(5, 20))
V(g)$shape <- 'none'
V(g)$label.cex <- .75
V(g)$label.color <- 'black'
E(g)$arrow.size = .1
## V(g)$names = 1:25
# plot graph
dev.off()
par(mfrow = c(1,2),
mar = c(1,1,5,1))
plot(g, layout = l,
main = 'Entire\ngraph')
# use index & induced subgraph
v_ids <- sort(sample(1:25, 15, F))
sub_l <- l[v_ids, c(1,2)]
sub_g <- induced_subgraph(g, v_ids)
# plot second graph
plot(sub_g, layout = sub_l,
main = 'Sub\ngraph', vertex.label=V(sub_g)$names)
When I run your code, both graphs have nodes in the same
positions. That is not what I see in the graph in your question.
I suggest that you run just this code and see if you don't get
the same result (nodes in the same positions in both graphs).
The only difference between the two graphs in my version is the
node labels. When you take the subgraph, it renumbers the nodes
from 1 to 15 so the labels on the nodes disagree. You can fix
this by storing the node labels in the graph before taking the
subgraph. Specifically, add V(g)$names = 1:25 immediately after
your statement E(g)$arrow.size = .1. Then run the whole thing
again, starting at set.seed(123). This will preserve the
original numbering as the node labels.
The graph looks slightly different because the new, sub-graph
does not take up all of the space and so is stretched to use
up the empty space.
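A sketch of a variant that does not depend on the order of v_ids: store the original ids as vertex names, name the rows of the layout matrix the same way, and look coordinates up by name. As far as I know, induced_subgraph keeps the vertices in their original order rather than in the order of v_ids, which would explain why an unsorted v_ids mismatches the layout rows while the sorted version above lines up. sample_pa() is used as the current name for barabasi.game().

```r
library(igraph)

set.seed(123)
g <- sample_pa(25)
V(g)$name <- as.character(1:25)    # store the original ids as names
l <- layout_nicely(g)
rownames(l) <- V(g)$name           # index the layout by vertex name

v_ids <- sample(1:25, 15)          # deliberately left unsorted
sub_g <- induced_subgraph(g, v_ids)

# look up coordinates by name, so the order of v_ids no longer matters
sub_l <- l[V(sub_g)$name, ]

par(mfrow = c(1, 2), mar = c(1, 1, 5, 1))
plot(g, layout = l, main = "Entire\ngraph")
plot(sub_g, layout = sub_l, main = "Sub\ngraph")
```

Because the name attribute is set, the original numbers are also used as vertex labels in both plots.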
A possible quick workaround: draw the same graph, but color the nodes and edges that you don't need in your background color. Depending on your purpose, this may suit you.

Create graph (network analysis R)?

I'm quite new to R and having trouble with the following:
I'm researching politicians in Belgium on Twitter, and would like to see if any networks form within political parties on Twitter.
I have two data files:
1. The matrix file that contains whether or not politicians are linked (politicixpolitici.csv)
2. The file that contains all the politicians with their respective first name, name, political party, Twitter handle and parliament (data.csv)
I want to create a graph that shows the network, but with the nodes colored by their political party (this variable is called 'fractie' in the data.csv file).
I've tried doing this as follows:
First, I've tried to combine the files as follows:
rownames(politicicsv) <- politicicsv[,'TwitterHandle']
test <- cbind(politicixpolitici,
politicicsv[, "Fractie"][match(rownames(politicixpolitici),
rownames(politicicsv))])
=> I've plotted this network, but it comes out very sloppy and the names are on there which makes it very hard to see + the nodes are obviously not coloured according to the party.
Then, I've tried it using statnet, but when I wanted to create the graph, I had trouble with the creation of the vertex attribute:
fractie <- get.vertex.attribute(politicicsv, "Fractie")
Error in get.vertex.attribute(politicicsv, "Fractie") :
get.vertex.attribute requires an argument of class network.
Can someone help me in plotting this network, with the nodes colored according to the political party ("Fractie") they belong to?
Files can be found here
Thank you, this would help me with my thesis.
Can someone help me in plotting this network, with the nodes colored
according to the political party ("Fractie") they belong to?
You could do it like this
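df <- read.csv("data.csv", stringsAsFactors = TRUE) # Fractie must be a factor for nlevels()/levels() below (not the default since R 4.0)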
m <- as.matrix(read.csv2("politicixpolitici.csv", row.names = 1))
library(igraph)
g <- simplify(graph_from_adjacency_matrix(m))
# Color palette:
(pal <- setNames(
colorRampPalette(categorical_pal(8))(nlevels(df$Fractie)),
levels(df$Fractie)) )
# CD&V Ecolo-Groen Groen N-VA Onafhankelijke
# "#E69F00" "#81ADA3" "#33ABB9" "#18A56E" "#C0D64B"
# Open vld Open Vld sp.a VB Vlaams Belang
# "#77AB7A" "#2A6D8E" "#BF5F11" "#CF6E64" "#BC82A2"
# Vuye&Wouters
# "#999999"
V(g)$color <- pal[df$Fractie[match(V(g)$name, df$TwitterHandle)]]
set.seed(1); coords <- layout_with_fr(g)
plot(g,
layout=coords, vertex.label.cex=.2, vertex.size=2,
edge.arrow.size=0, edge.lty="blank", asp = 0)
or try an interactive plot:
library(visNetwork)
visIgraph(g) %>%
visIgraphLayout(layout="layout.norm", layoutMatrix = coords, type = "full")
All in all, I'd recommend exporting your graph to Gephi and experimenting with other layouts and visualizations there interactively.
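To see the coloring step in isolation, here is a small self-contained sketch with made-up data (four hypothetical handles and three parties standing in for data.csv and the adjacency matrix). It indexes the palette by party name rather than by factor code, which is a slight variation on the code above.

```r
library(igraph)

# hypothetical stand-ins for data.csv and politicixpolitici.csv
df <- data.frame(TwitterHandle = c("a", "b", "c", "d"),
                 Fractie = factor(c("N-VA", "Groen", "N-VA", "sp.a")))
m <- matrix(c(0, 1, 0, 1,
              1, 0, 1, 0,
              0, 1, 0, 0,
              1, 0, 0, 0), 4, 4, byrow = TRUE,
            dimnames = list(df$TwitterHandle, df$TwitterHandle))
g <- graph_from_adjacency_matrix(m, mode = "undirected")

# one color per party, named by party
pal <- setNames(categorical_pal(nlevels(df$Fractie)), levels(df$Fractie))

# match each vertex to its row in df, then pick the color by party name
party <- as.character(df$Fractie[match(V(g)$name, df$TwitterHandle)])
V(g)$color <- unname(pal[party])

plot(g, vertex.label = V(g)$name)
legend("topleft", legend = names(pal), fill = pal, bty = "n")
```

Vertices a and c share a party and therefore a color. The legend call is optional and only there to make the mapping visible.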
