extract a connected subgraph from a subset of vertices with igraph - r

I have a graph G(V,E) unweighted, undirected and connected graph with 12744 nodes and 166262 edges. I have a set of nodes (sub_set) that is a subset of V. I am interested in extracting the smallest connected subgraph where sub_set is a part of this new graph. I have managed to get a subgraph where my subset of nodes is included but I would like to know if there is a way to minimise the graph.
Here is my code (adapted from http://sidderb.wordpress.com/2013/07/16/irefr-ppi-data-access-from-r/)
library('igraph')
g <- erdos.renyi.game(10000, 0.003) #graph for illustrating my propose
sub_set <- sample(V(g), 80)
order <- 1
edges <- get.edges(g, 1:(ecount(g)))
neighbours.vid <- unique(unlist(neighborhood(g, order, which(V(g) %in% sub_set))))
rel.vid <- edges[intersect(which(edges[,1] %in% neighbours.vid), which(edges[,2] %in% neighbours.vid)),]
rel <- as.data.frame(cbind(V(g)[rel.vid[,1]], V(g)[rel.vid[,2]]), stringsAsFactors=FALSE)
names(rel) <- c("from", "to")
subgraph <- graph.data.frame(rel, directed=F)
subgraph <- simplify(subgraph)
I have read this post
minimum connected subgraph containing a given set of nodes, so I guess that my problem could be "The Steiner Tree problem", is there any way to try to find a suboptimal solution using igraph?

Not sure if that's what you meant but
subgraph<-minimum.spanning.tree(subgraph)
produces a graph with the minimum number of edges in which all nodes stay connected in one component.

Related

How do I see the two nodes of most weighted egdes in R igraph?

I have an igraph network object constructed in R and generated weight information for each edge. I want to see the nodes of the most weighted edges (descending). What codes should I use to do that? Thank you!
# create an igraph project of user interaction network and check descriptives.
library(igraph)
#edge list
EL = read.csv("(file path omitted)user_interaction_structure.csv")
head(EL)
#node list: I do not have a node list
#construct an igraph oject
g <- graph_from_data_frame(EL, directed = TRUE, vertices = NULL)
#check the edge and node number of the network
gsize(g)
vcount(g)
#check nodes based on degree (descending)
deg <- igraph::degree(g)
dSorted <-sort.int(deg,decreasing=TRUE,index.return=FALSE)
dSorted
#check edges based on weight
E(g)
#the network will contain loop edges and multiple edges
#simplify multiple edges
g_simple <- graph.adjacency(get.adjacency(g),weighted=TRUE)
#check edge weight
E(g_simple)$weight
#igraph can generate a matrix
g_simple[]
Then I wanted to see who were interacting heavily with whom (the nodes of the edges with the largest weight),so I tried
e_top_weights <- order(order(E(g_simple))$weight, decreasing=TRUE)
but it did not work.
I think what you want is the igraph function strength(), which gives the sum of the weights of the edges incident to each node. Here's an example:
library(igraph)
# A small graph we can visualize
g <- make_ring(5)
# Assign each edge an increasing weight, to make things
# easy
edgeweights<- 1:ecount(g)
E(g)$weight <- edgeweights
# The strength() function sums the weights of edges incident
# to each node
strengths <- strength(g)
# We can collect the top two strengths by sorting the
# strengths vector, then asking for which elements of the
# strengths vector are equal to or greater than the second
# largest element.
toptwo <- which(strengths >= sort(strengths, decreasing = TRUE)[2])
## [1] 4 5
# Assign nodes a color blue that is more saturated when nodes
# have greater strength.
cr <- colorRamp(c(rgb(0,0,1,.1), rgb(0,0,1,1)), alpha = TRUE)
colors <- cr(strengths/max(strengths))
V(g)$color <- apply(colors, 1, function(row) rgb(row[1], row[2], row[3], row[4], maxColorValue = 255))
# Plot to confirm
plot(g, edge.width = edgeweights)
Edit
Here are two different ways to find the two nodes (the "from" node and the "to" node) which are the ends of the edge with the maximum weight:
## 1
edge_df <- as_data_frame(g, "edges")
edge_df[which(edge_df$weight == max(edge_df$weight)), c("from", "to")]
## 2
max_weight_edge <- E(g)[which(E(g)$weight == max(E(g)$weight))]
ends(g, es = max_weight_edge)

how to create a complete list of diads form a vertex list

how can I create a complete list of dyads from a vertex list?
I have a list (1, 2, 3...) and I need to generate a list containing all possible dyads from that list (1-1, 1-2, 1-3, 2-1, 2-2,...).
I've tried with get.edgelist, but it doesn't work, because the graph is not fully connected (all nodes are connected among them).
Thanks
Using igraph, you can grab all edges of a graph using E(g). If you'd want all possible edges, you can apply it on a complete graph (a graph that is fully connected). If the vertices in your graph are indeed in sequence from 1 to n, you can use make_full_graph() to make a Kn - that is to say a fully connected graph. In this example, the graph has 14 vertices.
g <- make_full_graph(14, directed=F)
el <- as_edgelist(g)
edges <- E(g)
edges_list <- split(el, rep(1:nrow(el), each = ncol(el)))
edges_vert <- unlist(list(t(el)))
edges will be the igraph-object, but I think what you're after is a list in R, like edges_list.
As you see, length(edges_list) is 91 since it is an undirected graph, and the number of edges in complete graphs is a function of the number of vertices.
A complete graph with n vertices is commonly written Kn and has these many edges:
Note that in igraph dyads are called edges and nodes are called vertices.

R iGraph: degree in the case of bidirectional edges

I have noticed that the function degree in iGraph doesn't straighforwardly allow to calculate the degree of the undirected skeleton graph of a directed graph, whenever bidirectional edges are involved.
For example,
g <-graph_from_literal( a-+b,a++c,d-+a,a-+e,a-+f )
d1 <- degree(g,v='a',mode="all")
# 6
nn <- unique(neighbors(g,'a',mode='all'))
d2 <- length(nn)
# 5
As I wanted d2, instead of d1, I have used a different route based on finding the neighbors of the considered vertex.
My question is: is there a better/faster way to do this, maybe using some other iGraph function that I'm not aware of?
Create an undirected copy of the graph, collapse the multiple edges in the undirected graph into a single edge, and then calculate the degree on that:
> g2 <- as.undirected(g, mode="collapse")
> degree(g2)

After clustering in R (iGraph, etc), can you maintain nodes+edges from a cluster to do individual cluster analysis?

Basically I have tried a few different ways of clustering. I can usually get to a point in iGraph where each node is labeled with a cluster. I can then identify all the nodes within a single cluster. However, this loses their edges.
I'd have to re-iterate back over the original dataset for all the nodes in cluster 1 to get only those where both nodes+the edge are within the cluster. I'd have to do this for every cluster.
This seems like a painfully long process and there is probably a shortcut my google-fu is missing.
So, is there an easy way to, after clustering or performing community detection processes, to maintain an individual cluster/community as its own smaller graph -- that is, retaining all nodes AND edges between them?
You can use delete.vertices() to create a subgraph. Example:
library(igraph)
set.seed(123)
# create random graph
g <- barabasi.game(100, directed = F)
plot(g, layout=layout.fruchterman.reingold)
# do community detection
wc <- multilevel.community(g)
V(g)$community <- membership(wc)
# make community 1 subgraph
g_sub <- delete.vertices(g, V(g)[community != 1])
plot(g_sub, layout=layout.fruchterman.reingold)
An alternative:
#Create random network
d <- sample_gnm(n=50,m=40)
#Identify the communities
dc <- cluster_walktrap(d)
#Induce a subgraph out of the first community
dc_1 <- induced.subgraph(d,dc[[1]])
#plot that specific community
plot(dc_1)

Calculate degree, closeness and betweenness in R

I have a data table which consists of names of users who post in the same thread in a forum, it looks like that:
X1 X2
1. g79 kian
2. g79 greyracer
3. g79 oldskoo1 ...
I need to calculate degree, closeness and betweenness. I'm using the following code:
library(igraph)
setwd("/Volumes/NATASHKA/api/R files")
load("edgelist_one_mode.rda")
load("map.rda")
load ("result.rda")
el <- as.matrix(whatwewant)
el[,1] <- as.character(el[,1])
el[,2] <- as.character(el[,2])
g <- graph.data.frame(el, directed=FALSE)
plot(g, edge.arrow.size=.5)
indegreeG <- degree(g, mode="in")
outdegreeG <- degree(g, mode="out")
totaldegreeG <- degree(g)
inclosenessG <- closeness(g, mode='in')
outclosenessG <- closeness(g, mode='out')
totalclosenessG <- closeness(g)
betweennessG <- betweenness(g)
forumG <- data.frame(V(g)$name, indegreeG, outdegreeG, totaldegreeG, inclosenessG, outclosenessG, totalclosenessG, betweennessG)
write.table(forumG,file="forumG.csv",sep=";")
The question is why do I get the same values for in-degree, out-degree and total-degree, the same for closeness? Besides, at the beginning I have 41213 users, but after analysis (when I calculate degree, etc..) I only have 37874. How could I lose so many observations? Please tell me if I have a mistake in the code.
Thanks
The reason you get the same value for in-degree, out-degree and total degree is because you are creating an undirected network with the graph.data.frame(el, directed=FALSE).
In an undirected network, the number of links from a node and to a node are the same and they are both equal to the global degree.
If you want a directed network, you will need to do graph.data.frame(el, directed=TRUE).
It will create a directed network in which the id in the first column of your dataframe is the id of the node sending the tie and the id in the second column indicates the node receiving that tie.
As for loosing nodes, my guess would be that you have some individuals who never interact with anyone and therefore are lost when you transform your two-mode network into one-mode (I assume you do this but don't show us how you do it because of your line:load("edgelist_one_mode.rda"))
Short of a reproducible example, I think that is all I can deduce from your code.

Resources