Problems with igraph, statnet and Gephi? - networking

I'm working with some graph models in Gephi, Python and R, and by chance I decided to compare the results they gave me.
I ran into the following problem. When calculating betweenness centrality with Gephi and with R (using igraph and statnet), I get different results (igraph and statnet are not very different from each other). Since I am working with a very large network, I decided to take a small network and do the calculation by hand. The network is shown in the figure below.
[Figure: the small example network]
Using the edge list:
source target
1 2
1 3
1 4
2 3
3 4
4 5
4 6
5 6
5 8
5 7
6 8
6 7
7 8
7 9
I then compared my hand calculation with the results from R and Gephi, and found that Gephi gives me the same results:
[Figure: betweenness centrality values computed by Gephi]
But R does not (neither igraph nor statnet):
> library('igraph')
> data <- read.csv(file.choose())
> set.seed(123456)
> graph_1<-graph.data.frame(data)
> summary(graph_1)
IGRAPH cfa51db DN-- 9 14 --
+ attr: name (v/c)
> graph_1
IGRAPH cfa51db DN-- 9 14 --
+ attr: name (v/c)
+ edges from cfa51db (vertex names):
[1] 1->2 1->3 1->4 2->3 3->4 4->5 4->6 5->6 5->8 5->7 6->8 6->7 7->8 7->9
> betweenness(graph_1)
1 2 3 4 5 6 7 8 9
0 0 6 15 6 6 6 0 0
> detach("package:igraph", unload=TRUE)
> library(statnet)
> library(intergraph)
> graph_2<-asNetwork(graph_1)
> betweenness(graph_2)
[1] 0 0 6 15 6 6 6 0 0
Am I doing something wrong in my R code, or is R using a different algorithm to calculate betweenness centrality?
Thank you :)

You are computing two different things.
First, to make your example reproducible, here is code that anyone can run to build your graph.
library(igraph)
EL = matrix(c(1,2, 1,3, 1,4, 2,3, 3,4, 4,5, 4,6, 5,6, 5,8,
              5,7, 6,8, 6,7, 7,8, 7,9), ncol=2, byrow=TRUE)
graph_1 = graph_from_edgelist(EL)  # creates a directed graph by default
Now, using your code, I get the same result.
betweenness(graph_1)
[1] 0 0 6 15 6 6 6 0 0
However,
betweenness(graph_1, directed=F)
[1] 3 0 3 15 6 6 7 0 0
gives the same result as you got from Gephi.
The help page ?betweenness says:
directed
Logical, whether directed paths should be considered while
determining the shortest paths.
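So Gephi computed betweenness on the undirected version of your graph, whereas igraph's betweenness() follows the edge directions of a directed graph by default (directed = TRUE), and, judging by your output, so does statnet.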
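To see where both sets of numbers come from, here is a minimal base-R sketch of Brandes' algorithm for unweighted graphs. It is an illustration, not the code that igraph, statnet or Gephi actually run, and `brandes_betweenness()` is a hypothetical helper name. Following each edge only from source to target reproduces igraph's default output; storing each edge in both directions (and halving the totals, since every pair is then counted from both ends) reproduces the undirected values that Gephi reported.

```r
# Sketch of Brandes' algorithm for unweighted graphs, base R only.
# brandes_betweenness() is a hypothetical helper, not an igraph function.
brandes_betweenness <- function(edges, n, directed = TRUE) {
  # Adjacency lists; an undirected edge is stored in both directions
  adj <- vector("list", n)
  for (k in seq_len(nrow(edges))) {
    u <- edges[k, 1]
    v <- edges[k, 2]
    adj[[u]] <- c(adj[[u]], v)
    if (!directed) adj[[v]] <- c(adj[[v]], u)
  }
  cb <- numeric(n)
  for (s in seq_len(n)) {
    sigma <- numeric(n); sigma[s] <- 1   # number of shortest paths from s
    dist  <- rep(-1, n); dist[s] <- 0    # BFS distance from s (-1 = not reached)
    pred  <- vector("list", n)           # predecessors on shortest paths
    stack <- c()                         # vertices in order of discovery
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      stack <- c(stack, v)
      for (w in adj[[v]]) {
        if (dist[w] < 0) {               # first time w is reached
          dist[w] <- dist[v] + 1
          queue <- c(queue, w)
        }
        if (dist[w] == dist[v] + 1) {    # v -> w lies on a shortest path
          sigma[w] <- sigma[w] + sigma[v]
          pred[[w]] <- c(pred[[w]], v)
        }
      }
    }
    delta <- numeric(n)
    for (w in rev(stack)) {              # farthest vertices first
      for (v in pred[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  if (!directed) cb <- cb / 2            # each unordered pair was counted twice
  cb
}

EL <- matrix(c(1,2, 1,3, 1,4, 2,3, 3,4, 4,5, 4,6, 5,6, 5,8,
               5,7, 6,8, 6,7, 7,8, 7,9), ncol = 2, byrow = TRUE)
brandes_betweenness(EL, 9, directed = TRUE)   # 0 0 6 15 6 6 6 0 0
brandes_betweenness(EL, 9, directed = FALSE)  # 3 0 3 15 6 6 7 0 0
```

The only difference between the two calls is whether an edge can be walked backwards. For example, vertex 1 gets credit only when 2 may travel 2-1-4, which is impossible in the directed graph because the edge is 1->2.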

Related

BFS father attribute in igraph is wrong

I am using the igraph package, and I am not sure whether this is a bug, but the $father output sometimes makes no sense, specifically when I rename the vertices (set the name attribute).
h<-make_tree(10)
#with normal vertex attributes
graph.bfs(h, root="1", neimode='out', order=TRUE, father=TRUE,unreachable=FALSE) #father output seems correct
plot(h,layout=layout_as_tree)
#with renamed vertex attributes
set.seed(1)
h<-set.vertex.attribute(h, "name", value=sample(1:10,10))
plot(h,layout=layout_as_tree)
graph.bfs(h, root="3", neimode='out', order=TRUE, father=TRUE,unreachable=FALSE) #father output seems wrong
I get the following output:
#with normal vertex attributes
$order
+ 10/10 vertices, from ff55a96:
[1] 1 2 3 4 5 6 7 8 9 10
$rank
NULL
$father
+ 10/10 vertices, from ff55a96:
[1] NA 1 1 2 2 3 3 4 4 5
#with renamed vertex attributes
$order
+ 10/10 vertices, named, from 170f7a0:
[1] 3 4 5 7 2 8 9 6 10 1
$rank
NULL
$father
+ 10/10 vertices, named, from 170f7a0:
[1] 3 4 5 7 2 8 9 6 10 1
I do not understand why the father output is wrong when the vertices are renamed. For example, the first element should be NA, but it's not.
Can someone explain what is happening? And how can I fix this so that the father elements reflect something similar to the first case?
It's a bit strange, but for some reason the bfs function directly assigns the vertex names to the names of the father vector. See lines 54-55 of the source code:
if (father)
names(res$father) <- V(graph)$name
This simply overwrites the names of res$father with the vector of vertex names in the graph, so the printed father sequence shows V(graph)$name instead of the father of each vertex. Note that this conditional statement also requires the option igraph_opt("add.vertex.names") to be TRUE.
So we can avoid this behavior by setting the global option for adding vertex names to FALSE.
> igraph_options()$add.vertex.names
[1] TRUE
> igraph_options(add.vertex.names=F)
> igraph_options()$add.vertex.names
[1] FALSE
Now it should work:
h<-make_tree(10)
set.seed(1)
h<-set_vertex_attr(h, "name", value=sample(1:10,10))
bfs(h, root=1, neimode='out', order=TRUE, rank=TRUE, father=TRUE,unreachable=FALSE)
Output:
$root
[1] 1
$neimode
[1] "out"
$order
+ 10/10 vertices, named:
[1] 3 4 5 7 2 8 9 6 10 1
$rank
[1] 1 2 3 4 5 6 7 8 9 10
$father
+ 10/10 vertices, named:
[1] <NA> 3 3 4 4 5 5 7 7 2
$pred
NULL
$succ
NULL
$dist
NULL
It might be worth raising this on the igraph GitHub, since this seems (at least to me) like undesirable behavior.
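For comparison, here is a minimal base-R sketch (not igraph's implementation) of what the father vector is meant to contain: father[i] is the id of the vertex from which BFS first reached vertex i, and renaming vertices should only change how those ids are displayed. It assumes make_tree(10) links vertex i to 2i and 2i + 1, which matches the father vector in the first output, and it hard-codes the vertex names printed in the answer's $order rather than calling sample(). `bfs_father()` is a hypothetical helper name.

```r
# Hypothetical helper: BFS over a directed edge list, recording each vertex's father
bfs_father <- function(edges, n, root) {
  # Out-neighbours of every vertex 1..n (integer(0) for leaves)
  adj <- split(edges[, 2], factor(edges[, 1], levels = seq_len(n)))
  father <- rep(NA_integer_, n)
  seen <- rep(FALSE, n); seen[root] <- TRUE
  visit_order <- integer(0)
  queue <- root
  while (length(queue) > 0) {
    v <- queue[1]; queue <- queue[-1]
    visit_order <- c(visit_order, v)
    for (w in adj[[v]]) {
      if (!seen[w]) {
        seen[w] <- TRUE
        father[w] <- v          # w was first reached from v
        queue <- c(queue, w)
      }
    }
  }
  list(order = visit_order, father = father)
}

# Assumed shape of make_tree(10): vertex i has children 2i and 2i + 1
child <- 2:10
tree_edges <- cbind(child %/% 2L, child)
res <- bfs_father(tree_edges, 10, root = 1L)
res$father               # NA 1 1 2 2 3 3 4 4 5 (vertex ids)

# Names are a lookup applied to the ids, never a replacement for them
vnames <- c(3, 4, 5, 7, 2, 8, 9, 6, 10, 1)  # names as printed in the answer's $order
vnames[res$order]        # 3 4 5 7 2 8 9 6 10 1
vnames[res$father]       # NA 3 3 4 4 5 5 7 7 2
```

The last line matches the father output the answer gets once add.vertex.names is turned off.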

adding several edges simultaneously to a graph using igraph

I would like to pass igraph a matrix of edges to add between nodes. The edges are undirected. However, it is not adding the edges I want it to add.
My graph looks like this:
mygraph
IGRAPH U--- 30 11 --
+ attr: color (v/c), color (e/c)
+ edges:
[1] 3-- 4 3-- 9 4-- 5 4-- 6 6--10 12--14 15--20 16--21 25--27 25--30 26--29
I now want to add these undirected edges (each row of m is one edge, e.g. 12--13 is an edge, 9--13 is an edge, etc.). Repeated edges should be removed, since the graph is undirected (23--20 is the same as 20--23).
m
value L1
[1,] 6 2
[2,] 4 5
[3,] 6 5
[4,] 2 6
[5,] 12 13
[6,] 9 13
[7,] 23 20
[8,] 20 23
When I do
add_edges(mygraph, m)
I get the following. Note that the total number of edges is correct, but the endpoints are not: for example, 12--13 does not exist, and instead 12--9 is formed, which was not specified in m. It seems like add_edges pairs the values vertically (down the columns) to make 12--9 instead of horizontally (across the rows) to make 12--13.
IGRAPH U--- 30 19 --
+ attr: color (v/c), color (e/c)
+ edges:
[1] 3-- 4 3-- 9 4-- 5 4-- 6 6--10 12--14 15--20 16--21 25--27 25--30 26--29 4-- 6 2-- 6 9--12 20--23 2-- 5 5-- 6 13--13
[19] 20--23
How can I add edges row by row from a matrix to a graph using igraph?
You need to transpose your edge matrix before adding it to your graph. The reason is that R stores matrix data column by column, and add_edges() does not treat a matrix as a row-wise edge list: it reads it as a plain vector and interprets each consecutive pair of elements as a new edge.
Take a look at this simple example:
library(igraph)
mygraph <- graph(c(1,2,3,4,5,6))
mygraph
IGRAPH D--- 6 3 --
+ edges:
[1] 1->2 3->4 5->6
m <- matrix(c(6,2,4,5), byrow = TRUE, ncol = 2)
m
[,1] [,2]
[1,] 6 2
[2,] 4 5
If I add m directly to the graph object:
add_edges(mygraph, m)
IGRAPH D--- 6 5 --
+ edges:
[1] 1->2 3->4 5->6 6->4 2->5
The edges 6 -> 4 and 2 -> 5 are added because:
as.vector(m)
# [1] 6 4 2 5
So consecutive elements of the vector are interpreted as edges. If you transpose m before adding it, you get the correct result:
add_edges(mygraph, t(m))
IGRAPH D--- 6 5 --
+ edges:
[1] 1->2 3->4 5->6 6->2 4->5
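The column-major layout is easy to check in base R with the m from the question; it reproduces exactly the unwanted edges shown there. The question also asks for repeated undirected edges to be dropped: one way (a sketch, not something add_edges() does for you) is to sort each pair with pmin()/pmax() so that 23--20 and 20--23 become the same row, then keep unique rows.

```r
m <- matrix(c(6, 2,  4, 5,  6, 5,  2, 6,  12, 13,  9, 13,  23, 20,  20, 23),
            ncol = 2, byrow = TRUE)

# Column-major: consecutive pairs are 6-4, 6-2, 12-9, 23-20, 2-5, 5-6, 13-13, 20-23,
# which are the unexpected edges in the question's output
as.vector(m)
# Transposing first reads the matrix row by row: 6-2, 4-5, 6-5, ...
as.vector(t(m))

# Undirected duplicates: put the smaller endpoint first, then drop repeated rows
canon <- cbind(pmin(m[, 1], m[, 2]), pmax(m[, 1], m[, 2]))
m_unique <- unique(canon)
m_unique   # 6 rows: 2-6, 4-5, 5-6, 12-13, 9-13, 20-23
```

With igraph loaded, add_edges(mygraph, t(m_unique)) then adds one edge per remaining row. This does not catch edges that already exist in mygraph (4--5 here); igraph's simplify() can remove such multi-edges afterwards.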

How to get the edge list of a strongly connected components in a graph?

I have a weighted directed multigraph with a few cycles. With the clusters function in the igraph package, I can get the nodes belonging to each strongly connected component. But I need the path/order of the nodes that form a cycle.
EDIT after @josilber's response:
I have a very dense graph, with 30 nodes and around 2000 edges, so graph.get.subisomorphisms.vf2 takes too long to run in my case.
I'm not familiar with graph algorithms, but I'm thinking that running a DFS on the original or reversed graph and using order or order.out might work; I'm not sure.
Any other ideas to make this run faster are welcome!
Example
library(igraph)
set.seed(123)
graph <-graph(c(1,2,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,8,10,9,10,9,10,10,11,10,5,11,12,12,13,13,14,14,15,14,20,15,16, 16,17,17,18,18,19,19,20,20,21,21,1,22,23,22,23,23,22),directed=T)
E(graph)$weight= runif(ecount(graph),0,10)
> clusters(graph, "strong")
$membership
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1
$csize
[1] 2 21
$no
[1] 2
How do I get the edge list of a cycle with the highest weight here? Thanks!
Assuming that all nodes in each strongly connected component form a single cycle, and that you're only interested in that cycle (e.g. in your example, the cycle with nodes 1:21 and the cycle with nodes 22:23), you can extract the node order that forms the cycle, grab the weights on its edges, and compute the total weight of the cycle.
# Compute the node order of the cycle for each component by performing an
# isomorphism with a same-sized directed ring graph
clusts <- clusters(graph, "strong")
(cycles <- lapply(1:clusts$no, function(x) {
sg <- induced.subgraph(graph, clusts$membership == x)
n <- sum(clusts$membership == x)
col <- rep(c(1, 0), c(1, n-1)) # Used to grab isomorphism starting at 1
sg.idx <- graph.get.subisomorphisms.vf2(sg, graph.ring(n, directed=TRUE), col, col)[[1]]
which(clusts$membership == x)[sg.idx]
}))
# [[1]]
# [1] 22 23
#
# [[2]]
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Now you can grab the sum of the edge weights for each cycle:
sapply(cycles, function(x) sum(graph[from=x, to=c(tail(x, -1), x[1])]))
# [1] 8.833018 129.959437
Note that this problem is NP-hard in general, because finding a Hamiltonian cycle in a general graph is NP-hard. Therefore the graph.get.subisomorphisms.vf2 call could be quite slow for large graphs.
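The weight sum relies on pairing each node of the cycle with its successor and wrapping around from the last node back to the first, which is what c(tail(x, -1), x[1]) does. Here is the same idea in base R on a small weighted adjacency matrix; the matrix and its weights are made up for illustration, and `cycle_weight()` is a hypothetical helper. For a multigraph like yours, parallel edges would first have to be combined into one matrix entry (for example by keeping the heaviest one), which this sketch does not do.

```r
# Hypothetical helper: total weight of the directed cycle x[1] -> x[2] -> ... -> x[1]
cycle_weight <- function(W, x) {
  nxt <- c(x[-1], x[1])     # successor of each node, wrapping around to the start
  sum(W[cbind(x, nxt)])     # two-column matrix indexing picks W[x[i], nxt[i]]
}

# Made-up weights for the cycle 1 -> 2 -> 3 -> 1 (0 means "no edge")
W <- matrix(0, 3, 3)
W[1, 2] <- 2
W[2, 3] <- 3.5
W[3, 1] <- 1
cycle_weight(W, c(1, 2, 3))   # 6.5
cycle_weight(W, c(2, 3, 1))   # 6.5: the starting node does not matter
```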

Betweenness of a vertex and edge in a graph

I created a graph using the following code:
g <- graph.ring(10)
and then used the betweenness function, which returned 8 for every vertex and 12.5 for every edge. However, from what I understand this function does, the answer should be 7 for every vertex.
Can someone explain to me why it is 8, and 12.5 for the edges?
Assume the following graph:
str(g)
# IGRAPH U--- 10 10 -- Ring graph
# + attr: name (g/c), mutual (g/l), circular (g/l)
# + edges:
# [1] 1-- 2 2-- 3 3-- 4 4-- 5 5-- 6 6-- 7 7-- 8 8-- 9 9--10 1--10
Let's take vertex 2. It lies on the following ten shortest paths:
1-2-3
1-2-3-4
1-2-3-4-5
1-2-3-4-5-6 (non-unique, 1-10-9-8-7-6)
3-2-1-10-9-8 (non-unique, 3-4-5-6-7-8)
3-2-1-10-9
3-2-1-10
4-3-2-1-10-9 (non-unique, 4-5-6-7-8-9)
4-3-2-1-10
5-4-3-2-1-10 (non-unique, 5-6-7-8-9-10)
Altogether, that is six pairs whose unique shortest path passes through 2, plus four pairs that each count as 1/2, because each has an equally short alternative path that avoids 2. So the betweenness of vertex 2 is 6 + 4 × 1/2 = 8.
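The same counting can be done for every vertex and every edge at once, which also explains the 12.5. In the base-R sketch below (`ring_betweenness()` is a hypothetical helper, not igraph code), each pair of vertices on an n-ring walks the shorter arc between them, or both arcs with weight 1/2 when they tie, and credits every interior vertex and every edge along the way. For n = 10 there are 45 pairs: a path of length d has d - 1 interior vertices and d edges, so the interior vertices sum to 80 and the edges to 125. Split evenly over 10 vertices and 10 edges, that gives 8 and 12.5.

```r
# Hypothetical helper: vertex and edge betweenness of an n-vertex ring, by enumeration
ring_betweenness <- function(n) {
  vb <- numeric(n)   # vb[v]: betweenness of vertex v
  eb <- numeric(n)   # eb[i]: betweenness of the edge i -- (i %% n) + 1
  for (s in 1:(n - 1)) {
    for (t in (s + 1):n) {
      k <- t - s                                          # length of arc s, s+1, ..., t
      arcs <- list()
      if (k <= n - k) arcs <- c(arcs, list(c(s, k)))      # walk forward from s to t
      if (n - k <= k) arcs <- c(arcs, list(c(t, n - k)))  # walk forward from t, wrapping to s
      w <- 1 / length(arcs)                               # tied paths share the credit
      for (a in arcs) {
        path <- ((a[1] - 1 + 0:a[2]) %% n) + 1            # vertices along the arc
        inner <- path[-c(1, length(path))]                # endpoints get no credit
        vb[inner] <- vb[inner] + w
        steps <- path[-length(path)]                      # edge from path[j] to path[j + 1]
        eb[steps] <- eb[steps] + w
      }
    }
  }
  list(vertex = vb, edge = eb)
}

res <- ring_betweenness(10)
res$vertex   # 8 for every vertex
res$edge     # 12.5 for every edge
```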

wrong labels on a directed graph in R

Hi,
I have yet another question about R, and I do not know what I am doing wrong. In an earlier thread I asked how to read a directed graph, and user1317221_G's answer worked well.
Now I've deleted the edge 6->7 from the directed graph and read it like this:
library(igraph)
graph2 <- read.table("Graph_2.txt")
graph2 <- graph.data.frame(graph2)
This is what Graph_2.txt looks like:
1 2
1 3
2 5
3 4
3 5
4 5
5 6
5 10
7 8
7 9
7 12
8 9
9 10
9 11
9 12
10 7
10 11
11 7
11 12
But the plot shows (again, like in the other thread) a different directed graph:
As you can see in the file, there are no edges 5->9 or 10->12, for example. So my question, again, is: how can I read the directed graph correctly? What am I doing wrong?
Thank you!
You can set the vertex labels when you create the graph with graph.data.frame, via its vertices argument:
graph2 <- graph.data.frame(graph2, vertices = data.frame(symbols = 1:12,
label = 1:12))
plot(graph2, layout = layout.fruchterman.reingold)
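The mislabeled edges are consistent with graph.data.frame numbering the vertices in order of first appearance (the whole source column first, then any new names from the target column) and with the plot labeling vertices by numeric id instead of by name. Assuming that ordering, this base-R sketch computes the id each name ends up with: the edges 5->10 and 11->12 become 5->9 and 10->12, the two edges the question reports. Passing vertices in the order 1:12, as the answer does, makes ids and names coincide.

```r
# Edge list from Graph_2.txt (source, target)
edges <- matrix(c(1,2, 1,3, 2,5, 3,4, 3,5, 4,5, 5,6, 5,10, 7,8, 7,9, 7,12,
                  8,9, 9,10, 9,11, 9,12, 10,7, 10,11, 11,7, 11,12),
                ncol = 2, byrow = TRUE)

# Assumed numbering: first appearance, scanning all sources, then all targets
vertex_names <- unique(c(edges[, 1], edges[, 2]))
vertex_names   # 1 2 3 4 5 7 8 9 10 11 6 12, i.e. vertex id 6 is named "7"

# The same edges written as vertex ids instead of names
ids <- matrix(match(edges, vertex_names), ncol = 2)
ids[edges[, 1] == 5 & edges[, 2] == 10, ]    # 5 9
ids[edges[, 1] == 11 & edges[, 2] == 12, ]   # 10 12
```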
