How can I reduce the nodes in a ggraph arc graph? - r

I'm trying to create an arc graph showing relationships between nonprofits focusing on a subgraph centered on one of the nonprofits. There are so many nonprofits in this subgraph, I need to reduce the number of nodes in the arc graph to only focus on the strongest connections.
I've successfully filtered out edges below a weight of 50. But when I create the graph, the nodes are still remaining even though the edges have disappeared. How do I filter the unwanted nodes from the arc graph?
Here's my code, starting from the creation of the igraph object.
# Create an igraph object
NGO_igraph <- graph_from_data_frame(d = edges, vertices = nodes, directed = TRUE)
# Create a subgraph centered on a node
# Start by entering the node ID
nodes_of_interest <- c(48)
# Build the graph
selegoV <- ego(NGO_igraph, order=1, nodes = nodes_of_interest, mode = "all", mindist = 0)
selegoG <- induced_subgraph(NGO_igraph,unlist(selegoV))
# Reducing the graph based on edge weight
smaller <- delete.edges(selegoG, which(E(selegoG)$weight < 50))
# Plotting an arc graph
ggraph(smaller, layout = "linear") +
geom_edge_arc(aes(width = weight), alpha = 0.8) +
scale_edge_width(range = c(0.2, 2)) +
geom_node_text(aes(label = label)) +
labs(edge_width = "Interactions") +
theme_graph()
And here's the result I'm getting:

If you are only interested in omitting zero degree vertices or isolates (meaning vertices which have no incoming or outgoing edge) you could simply use the following line:
g <- induced.subgraph(g, degree(g) > 0)
However, this will delete all isolates. So if you are for some reason set on specificly deleting those vertices connected by edges smaller than 50 (and exempt other 'special' isolates), then you will need to clearly identify which those are:
special_vertex <- 1
v <- ends(g, which(E(g) < 50))
g <- delete.vertices(g, v[v != special_vertex])
You could also skip the delete.edges part by considering the strength of a vertex:
g <- induced.subgraph(g, strength(g) > 50)

Without any sample data I created this basic sample:
#define graph
g <- make_ring(10) %>%
set_vertex_attr("name", value = LETTERS[1:10])
g
V(g)
#delete edges going to and from vertice C
g<-delete.edges(g, E(g)[2:3])
#find the head and tails of each edge in graph
heads<-head_of(g, E(g))
tails<-tail_of(g, E(g))
#list of all used vetrices
combine<-unique(c(heads, tails))
#collect an vertices
v<-V(g)
#find vertices not in found set
toremove<-setdiff(v, combine)
#remove unwanted vertices
delete_vertices(g, toremove)
The basic process is to identify the start and end of all of the edges of interest, then compare this unique list with all of the edges and remove the ones not in the unique list.
From your code above the graph "smaller" would be used to find the vertices.
Hope this helps.

Related

How do I see the two nodes of most weighted egdes in R igraph?

I have an igraph network object constructed in R and generated weight information for each edge. I want to see the nodes of the most weighted edges (descending). What codes should I use to do that? Thank you!
# create an igraph project of user interaction network and check descriptives.
library(igraph)
#edge list
EL = read.csv("(file path omitted)user_interaction_structure.csv")
head(EL)
#node list: I do not have a node list
#construct an igraph oject
g <- graph_from_data_frame(EL, directed = TRUE, vertices = NULL)
#check the edge and node number of the network
gsize(g)
vcount(g)
#check nodes based on degree (descending)
deg <- igraph::degree(g)
dSorted <-sort.int(deg,decreasing=TRUE,index.return=FALSE)
dSorted
#check edges based on weight
E(g)
#the network will contain loop edges and multiple edges
#simplify multiple edges
g_simple <- graph.adjacency(get.adjacency(g),weighted=TRUE)
#check edge weight
E(g_simple)$weight
#igraph can generate a matrix
g_simple[]
Then I wanted to see who were interacting heavily with whom (the nodes of the edges with the largest weight),so I tried
e_top_weights <- order(order(E(g_simple))$weight, decreasing=TRUE)
but it did not work.
I think what you want is the igraph function strength(), which gives the sum of the weights of the edges incident to each node. Here's an example:
library(igraph)
# A small graph we can visualize
g <- make_ring(5)
# Assign each edge an increasing weight, to make things
# easy
edgeweights<- 1:ecount(g)
E(g)$weight <- edgeweights
# The strength() function sums the weights of edges incident
# to each node
strengths <- strength(g)
# We can collect the top two strengths by sorting the
# strengths vector, then asking for which elements of the
# strengths vector are equal to or greater than the second
# largest element.
toptwo <- which(strengths >= sort(strengths, decreasing = TRUE)[2])
## [1] 4 5
# Assign nodes a color blue that is more saturated when nodes
# have greater strength.
cr <- colorRamp(c(rgb(0,0,1,.1), rgb(0,0,1,1)), alpha = TRUE)
colors <- cr(strengths/max(strengths))
V(g)$color <- apply(colors, 1, function(row) rgb(row[1], row[2], row[3], row[4], maxColorValue = 255))
# Plot to confirm
plot(g, edge.width = edgeweights)
Edit
Here are two different ways to find the two nodes (the "from" node and the "to" node) which are the ends of the edge with the maximum weight:
## 1
edge_df <- as_data_frame(g, "edges")
edge_df[which(edge_df$weight == max(edge_df$weight)), c("from", "to")]
## 2
max_weight_edge <- E(g)[which(E(g)$weight == max(E(g)$weight))]
ends(g, es = max_weight_edge)

how can I extract a sublayout to match a subgraph in R

I'm using igraph in R, trying to extract a subgraph that comprises only the largest cliques in a graph. I want to plot the graph and the subgraph (1) without moving the vertices and (2) while maintaining numbering. I can't seem to make it work. I tried storing the layout coordinates directly in the vertices but plot seems to rescale things. Setting rescale=FALSE didn't work either. Here's what I have, first plotting the random graph, then highlighting the largest cliques, and finally displaying only the largest cliques:
# plot random graph
g <- sample_gnp(n=30, p=.1)
l=layout_with_fr(g)
V(g)$x <- l[,1]
V(g)$y <- l[,2]
V(g)$id <- 1:vcount(g)
plot(g,vertex.size=6,vertex.label.dist=1,vertex.label=V(g)$id,main="a random network",sub="where are the cliques?")
# highlight largest cliques
lc=unlist(largest_cliques(g))
vcol <- rep("grey80", vcount(g))
vcol[unlist(lc)] <- "gold"
plot(g, vertex.size=6,vertex.color=vcol,vertex.label.dist=1,vertex.label=V(g)$id,main="here they are!",
layout=l)
# plot only the largest cliques, without changing position or vertex numbers
sg = induced_subgraph(g,lc)
sl = cbind(V(sg)$x,V(sg)$y)
plot(sg,vertex.size=6,vertex.label.dist=1,vertex.label=V(sg)$id,vertex.color="gold",
layout=sl)
First of all, you did not set the random seed, so each time you run this code you will get a different graph and layout. I am setting the seed for reproducibility.
# plot random graph
set.seed(2021)
g <- sample_gnp(n=30, p=.1)
l=layout_with_fr(g)
V(g)$x <- l[,1]
V(g)$y <- l[,2]
V(g)$id <- 1:vcount(g)
# highlight largest cliques
lc=unlist(largest_cliques(g))
vcol <- rep("grey80", vcount(g))
vcol[unlist(lc)] <- "gold"
There are other ways to do this, but I will do it using rescale=F.
When you do that, you have to adjust xlim and ylim to the ranges of the layout. Also, it changes the scale, so you need to change the vertex size.
Putting all of that together, I get:
Rx =range(l[,1])
Ry =range(l[,2])
par(mfrow=c(1,2))
plot(g, vertex.size=45,vertex.color=vcol,vertex.label.dist=1,
vertex.label=V(g)$id,main="here they are!", layout=l,
rescale=F, xlim=Rx, ylim=Ry)
# plot only the largest cliques, without changing position or vertex numbers
sg = induced_subgraph(g,lc)
sl = cbind(V(sg)$x,V(sg)$y)
plot(sg,vertex.size=45, vertex.label.dist=1,vertex.label=V(sg)$id,vertex.color="gold",
layout=sl, rescale=F, xlim=Rx, ylim=Ry, main="Unmoved")
you can use set-vertex_attr to set the name attribute for graph g, and then subset the layout lc like below
# plot random graph
set.seed(2021)
g <- sample_gnp(n = 30, p = .1)
l <- layout_with_fr(g)
# highlight largest cliques
lc <- unlist(largest_cliques(g))
g %>%
set_vertex_attr(name = "name", value = seq(vcount(.))) %>%
induced.subgraph(lc) %>%
plot(layout = l[as.integer(names(V(.))), ], vertex.color = "gold")
and you will get

How to calculate the number of vertices contracted into one graph?

I have a few large igraph objects that represent social networks. All nodes have various attributes, among them sector which is a factor variable. I have contracted this large network into a small where vertices represent groups and edges have the sum of individual edges in the original network. The label attribute in the second network represents the sector attribute in the first.
groupnet <- contract(g, as.integer(as.factor(V(g)$sector)), "ignore")
E(groupnet)$weight <- 1
groupnet <- simplify(groupnet, edge.attr.comb = list(weight = "sum"))
V(groupnet)$label <- levels(as.factor(V(g)$sector))
I would like to add another attribute to the second object V(groupnet)$groupsize that represents the number of original vertices that were contracted into groupnet. I have tried it with the following code but it did not work:
V(groupnet)$groupsize <- length(V(g)$sector[V(g)$sector == V(groupnet)$label])
How can I do this properly?
table() could be helpful here. Try out:
set.seed(1234)
library(igraph)
g <- make_ring(1000)
V(g)$sector <- factor(sample(LETTERS, 100, replace = T))
V(g)$sector
## contracted network
groupnet <- contract(g, as.integer(as.factor(V(g)$sector)), "ignore")
E(groupnet)$weight <- 1
V(groupnet)$label <- levels(as.factor(V(g)$sector))
## number of original vertices that were contracted into groupnet
# the tip is to see that table(V(g)$sector) provides the number of vertices per sector and
# its output is also arranged like V(groupnet)
table(V(g)$sector)
V(groupnet)
# solution
V(groupnet)$groupsize <- as.numeric(table(V(g)$sector))

cluster walktrap returns three communities, but when plotting they are all on top of each other, with no visible clustering

I've been following documentation tutorials and even lecture tutorials step by step. But for some reason the output of my plot is like this:
The output doesn't make any sense to me. There clearly is no structure, or communities in this current plot, as you can see that the bigger circles are all overlapping. Shouldn't this, in this case, return only a single community? Additionally the modularity of my network is ~0.02 which would again, suggest there is no community structure. But why does it return 3 communities?
this is my code: (exactly same as in documentation, with different dataset)
m <- data.matrix(df)
g <- graph_from_adjacency_matrix(m, mode = "undirected")
#el <- get.edgelist(g)
wc <- cluster_walktrap(g)
modularity(wc)
membership(wc)
plot(wc,g)
my data set looks is a 500x500 adjacency matrix in the form of a csv, with a 1-500 column and index names corresponding to a person.
I tried understanding the community class and using different types of variables for the plot, e.g. membership(wc)[2] etc. My thought is that the coloring is simply wrong, but nothing Ive tried so far seems to fix the issue.
You can have inter-community connections. You're working with a graph of 500 nodes and they can have multiple connections. There will be a large number of connections between nodes of different communities, but if you conduct a random walk you're most likely to traverse connections between nodes of the same community.
If you separate the communities in the plot (using #G5W's code (igraph) Grouped layout based on attribute) you can see the different groups.
set.seed(4321)
g <- sample_gnp(500, .25)
plot(g, vertex.label = '', vertex.size = 5)
wc <- cluster_walktrap(g)
V(g)$community <- membership(wc)
E(g)$weight = 1
g_grouped = g
for(i in unique(V(g)$community)){
groupV = which(V(g)$community == i)
g_grouped = add_edges(g_grouped, combn(groupV, 2), attr=list(weight = 2))
}
l <- layout_nicely(g_grouped)
plot( wc,g, layout = l, vertex.label = '', vertex.size = 5, edge.width = .1)
Red edges are intercommunity connections and black edges are intracommunity edges

Solving Chinese Postman algorithm with eulerization

I'm would like to solve Chinese Postman problem in a graph where an eulerian cycle does not exist. So basically I'm looking for a path in a graph which visits every edge exactly once, and starts and ends at the same node. A graph will have an euler cycle if and only if every node has same number of edges entering into and going out of it. Obviously my graph doesn't .
I found out that Eulerization (making a graph Eulerian) could solve my question LINK. Can anyone suggest a script to add duplicate edges to a graph so that the resulting graph has no vertices of odd degree (and thus does have an Euler Circuit)?
Here is my example:
require(igraph)
require(graph)
require(eulerian)
require(GA)
g1 <- graph(c(1,2, 1,3, 2,4, 2,5, 1,5, 3,5, 4,7, 5,7, 5,8, 3,6, 6,8, 6,9, 9,11, 8,11, 8,10, 8,12, 7,10, 10,12, 11,12), directed = FALSE)
mat <- get.adjacency(g1)
mat <- as.matrix(mat)
rownames(mat) <- LETTERS[1:12]
colnames(mat) <- LETTERS[1:12]
g2 <- as(graphAM(adjMat=mat), "graphNEL")
hasEulerianCycle(g2)
Fun problem.
The graph you sugest in the code above, can be made to have duplicates that enable a eulerian cycle to be created. The function I provide below tries to add the minimum amount of duplicate edges, but also readily breaks the graph structure by adding new links if it has to.
You can run:
eulerian.g1 <- make.eulerian(g1)$graph
Check what the function did to your graph with:
make.eulerian(g1)$info
Bare in mind that:
This is not the only graph structure where duplicates added to the original g1 graph can form an eulerian cycle. Imagine for example my function looping the vertices of the graph backwards instead.
Your graph already has an uneven number of vertices with uneven degree, and all of the vertices that are, have neighbours with uneven degrees to pair them with. This function therefore works well four your particular example data.
The function could fail to produce a graph using only duplicates even in graphs where eulerian cycles are possible with correctly added duplicates. This is since it always goes for connecting a node with the first of its neighbours with uneven degree. If this is something that you'd absolutely like to get around, an MCMC-approach would be the way to go.
See also this excellent answer on probability calculation:
Here's my function in a full script that you can source out-of-the-box:
library(igraph)
# You asked about this graph
g1 <- graph(c(1,2, 1,3, 2,4, 2,5, 1,5, 3,5, 4,7, 5,7, 5,8, 3,6, 6,8, 6,9, 9,11, 8,11, 8,10, 8,12, 7,10, 10,12, 11,12), directed = FALSE)
# Make a CONNECTED random graph with at least n nodes
connected.erdos.renyi.game <- function(n,m){
graph <- erdos.renyi.game(n,m,"gnm",directed=FALSE)
graph <- delete_vertices(graph, (degree(graph) == 0))
}
# This is a random graph
g2 <- connected.erdos.renyi.game(n=12, m=16)
make.eulerian <- function(graph){
# Carl Hierholzer (1873) had explained how eulirian cycles exist for graphs that are
# 1) connected, and 2) contain only vertecies with even degrees. Based on this proof
# the posibility of an eulerian cycle existing in a graph can be tested by testing
# on these two conditions.
#
# This function assumes a connected graph.
# It adds edges to a graph to ensure that all nodes eventuall has an even numbered. It
# tries to maintain the structure of the graph by primarily adding duplicates of already
# existing edges, but can also add "structurally new" edges if the structure of the
# graph does not allow.
# save output
info <- c("broken" = FALSE, "Added" = 0, "Successfull" = TRUE)
# Is a number even
is.even <- function(x){ x %% 2 == 0 }
# Graphs with an even number of verticies with uneven degree will more easily converge
# as eulerian.
# Should we even out the number of unevenly degreed verticies?
search.for.even.neighbor <- !is.even(sum(!is.even(degree(graph))))
# Loop to add edges but never to change nodes that have been set to have even degree
for(i in V(graph)){
set.j <- NULL
#neighbors of i with uneven number of edges are good candidates for new edges
uneven.neighbors <- !is.even(degree(graph, neighbors(graph,i)))
if(!is.even(degree(graph,i))){
# This node needs a new connection. That edge e(i,j) needs an appropriate j:
if(sum(uneven.neighbors) == 0){
# There is no neighbor of i that has uneven degree. We will
# have to break the graph structure and connect nodes that
# were not connected before:
if(sum(!is.even(degree(graph))) > 0){
# Only break the structure if it's absolutely nessecary
# to force the graph into a structure where an euclidian
# cycle exists:
info["Broken"] <- TRUE
# Find candidates for j amongst any unevenly degreed nodes
uneven.candidates <- !is.even(degree(graph, V(graph)))
# Sugest a new edge between i and any node with uneven degree
if(sum(uneven.candidates) != 0){
set.j <- V(graph)[uneven.candidates][[1]]
}else{
# No candidate with uneven degree exists!
# If all edges except the last have even degrees, thith
# function will fail to make the graph eulerian:
info["Successfull"] <- FALSE
}
}
}else{
# A "structurally duplicated" edge may be formed between i one of
# the nodes of uneven degree that is already connected to it.
# Sugest a new edge between i and its first neighbor with uneven degree
set.j <- neighbors(graph, i)[uneven.neighbors][[1]]
}
}else if(search.for.even.neighbor == TRUE & is.null(set.j)){
# This only happens once (probably) in the beginning of the loop of
# treating graphs that have an uneven number of verticies with uneven
# degree. It creates a duplicate between a node and one of its evenly
# degreed neighbors (if possible)
info["Added"] <- info["Added"] + 1
set.j <- neighbors(graph, i)[ !uneven.neighbors ][[1]]
# Never do this again if a j is correctly set
if(!is.null(set.j)){search.for.even.neighbor <- FALSE}
}
# Add that a new edge to alter degrees in the desired direction
# OBS: as.numeric() since set.j might be NULL
if(!is.null(set.j)){
# i may not link to j
if(i != set.j){
graph <- add_edges(graph, edges=c(i, set.j))
info["Added"] <- info["Added"] + 1
}
}
}
# return the graph
(list("graph" = graph, "info" = info))
}
# Look at what we did
eulerian <- make.eulerian(g1)
eulerian$info
g <- eulerian$graph
par(mfrow=c(1,2))
plot(g1)
plot(g)

Resources