Is eigenvector centrality in igraph wrong?

I am trying to improve my understanding of eigenvector centrality. This overview from the University of Washington was very helpful, especially when read in conjunction with this R code. However, when I use evcent(graph_from_adjacency_matrix(A)), the result differs.
The code below walks through the computation:
library(matrixcalc)
library(igraph)
# specify the adjacency matrix
A <- matrix(c(0,1,0,0,0,0,
              1,0,1,0,0,0,
              0,1,0,1,1,1,
              0,0,1,0,1,0,
              0,0,1,1,0,1,
              0,0,1,0,1,0), 6, 6, byrow = TRUE)
EV <- eigen(A) # compute eigenvalues and eigenvectors
max(EV$values) # find the maximum eigenvalue
centrality <- data.frame(EV$vectors[,1])
names(centrality) <- "Centrality"
print(centrality)
B <- A + diag(6)  # add self-loops
EVB <- eigen(B)   # compute eigenvalues and eigenvectors
# same eigenvectors as A (the eigenvalues are shifted by +1)
c <- matrix(c(2,3,5,3,4,3))  # degree of each node + self-loop

ck <- function(k) {
  n <- k - 2
  B_K <- B  # B is the original adjacency matrix, w/ self-loops
  for (i in 1:n) {
    B_K <- B_K %*% B
    # print(B_K)
  }
  c_k <- B_K %*% c
  return(c_k)
}
# derive EV centrality as k -> infinity
# k = 100
ck(100)/frobenius.norm(ck(100)) # .09195198, .2487806, .58115487, .40478177, .51401731, .40478177
# Does igraph match?
evcent(graph_from_adjacency_matrix(A))$vector # No: 0.1582229 0.4280856 1.0000000 0.6965127 0.8844756 0.6965127
The ranking of the vertices matches igraph's, but it is still bothersome that the values are not the same. What is going on?

The result returned by igraph is not wrong, but note that there are subtleties to defining eigenvector centrality, and not all implementations handle self-loops in the same way.
One way to define eigenvector centrality is simply as "the leading eigenvector of the adjacency matrix". But this is imprecise without specifying what the adjacency matrix is, in particular what its diagonal elements should be when self-loops are present. Depending on the application, the diagonal entries of the adjacency matrix of an undirected graph are sometimes defined as the number of self-loops, and sometimes as twice that number. igraph uses the second definition when computing eigenvector centrality. This is the source of the difference you see.
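To see the two conventions side by side, here is a small sketch (the 2-vertex graph and the matrices M1 and M2 are my own toy example, not anything produced by igraph):
library(igraph)

# vertex 1 has a self-loop and is joined to vertex 2
g <- make_graph(c(1,1, 1,2), directed = FALSE)

# diagonal = number of self-loops (M1) vs. twice that number (M2)
M1 <- matrix(c(1, 1,
               1, 0), 2, 2, byrow = TRUE)
M2 <- matrix(c(2, 1,
               1, 0), 2, 2, byrow = TRUE)

scale_max <- function(v) v / max(v)     # igraph rescales so the largest entry is 1
scale_max(abs(eigen(M1)$vectors[, 1]))  # leading eigenvector, "count once" convention
scale_max(abs(eigen(M2)$vectors[, 1]))  # leading eigenvector, "count twice" convention
eigen_centrality(g)$vector              # should match the M2 result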
A more intuitive definition of eigenvector centrality is that the centrality of each vertex is proportional to the sum of its neighbours' centralities. The details of the computation therefore hinge on who the neighbours are. Consider a single vertex with a self-loop: it is its own neighbour, but how many times? We can traverse the self-loop in both directions, so it is reasonable to say that it is its own neighbour twice. Indeed, its degree is conventionally taken to be 2, not 1.
You will find that different software packages treat self-loops differently when computing eigenvector centrality. In igraph, we made a choice based on the intuitive interpretation of eigenvector centrality, rather than rigidly following a formal definition with no regard for the motivation behind it.
Note: the above refers to how the eigenvector centrality computation works internally, not to what as_adjacency_matrix() returns. as_adjacency_matrix() adds one (not two) to the diagonal for each self-loop.
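To check that behaviour on your own installation (a one-line sketch; per the note above, the diagonal entry should come out as 1, not 2):
library(igraph)
g <- make_graph(c(1,1, 1,2), directed = FALSE)  # one self-loop at vertex 1
as_adjacency_matrix(g)  # the note above says the diagonal gets 1 per self-loop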

Related

Decay Centrality for Bipartite Graphs

How is decay centrality defined for a bipartite graph? I am unable to find a clear definition. All I found is https://www.centiserver.org/centrality/Decay_Centrality/, which wasn't really helpful.
Also, is there a nice implementation of decay centrality for graphs in Python? I only managed to find networkx (https://networkx.org/documentation/stable/index.html), and it does not have decay centrality, though it does have all the other centrality measures like degree, closeness, betweenness, and eigenvector centrality.
The definition of decay centrality given on the website you linked should work for bipartite and nonbipartite graphs alike.
Here’s how decay centrality is computed. Given a node v whose centrality you’re interested in, your first step is to pick a parameter δ between 0 and 1. The closer δ is to zero, the more emphasis you’ll place on nodes close to v. The closer δ is to 1, the more emphasis you place on nodes further from v.
Next, compute the distance from v to each other node x in the graph. (I’m not specifically familiar with networkx, but this should be easy to compute via breadth-first search if the graph doesn’t have edge weights, Dijkstra’s algorithm if the graph has edge weights and they’re nonnegative, or the Bellman-Ford algorithm if the graph has edge weights which can be negative.) Notationally, let’s have d(v, x) denote the distance from v to x that you computed.
Finally, for each node x other than v, compute δ^d(v, x), i.e. δ raised to the power of the distance, and add up all those values. That final number is the decay centrality of v.
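Since networkx has no ready-made routine, here is a minimal sketch of those three steps using igraph in R (the helper name decay_centrality is my own; translating it to Python with networkx's shortest_path_length is mechanical):
library(igraph)

# decay centrality of vertex v with decay parameter delta (hypothetical helper)
decay_centrality <- function(g, v, delta = 0.5) {
  d <- distances(g, v = v)[1, ]  # step 2: distance from v to every node
  d <- d[-v]                     # exclude v itself
  sum(delta^d)                   # step 3: sum of delta^d(v, x); unreachable nodes add delta^Inf = 0
}

g <- make_graph("Zachary")       # small test graph bundled with igraph
decay_centrality(g, 1, delta = 0.5)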

How do you find betweenness centrality with igraph and what do the output numbers mean?

Hi, so basically let's say I have a network A and I want to find its betweenness centrality.
I used: centr_betw(graph, directed = FALSE, normalized = TRUE)
This returned every node with the value:
[1] 1.827102e+04 3.554450e+04 5.000000e-01 9.524383e+04
[5] 0.000000e+00 0.000000e+00 1.078184e+05 4.768125e+04
I really want to know what these numbers mean.
It also shows the betweenness centralization of the whole network and a max value. Let's say network A as a whole has a betweenness centrality of 0.04. What can you say about network A when it is compared to a random network with a betweenness centrality of 0.001?
MUCH THANKS GUYS
Quite a bit of information can be found simply by typing ?centr_betw. In particular, centr_betw returns a list with three components: res, centralization, and theoretical_max.
Each element of res is the betweenness centrality of the corresponding vertex i. Specifically, for a pair of vertices j and k (both different from i), i is considered more central the more of the shortest paths between j and k pass through it. Summing, over all possible pairs j and k, the fraction of shortest paths between j and k that include i gives the betweenness centrality of i.
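In symbols, this is the standard shortest-path betweenness:
B(i) = \sum_{j \ne i,\, k \ne i,\, j < k} \frac{\sigma_{jk}(i)}{\sigma_{jk}}
where \sigma_{jk} is the number of shortest paths between j and k, and \sigma_{jk}(i) is the number of those paths that pass through i.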
Further, centralization and theoretical_max concern the Freeman centralization. centralization is C_x, which measures how central the network's most central vertex is relative to how central all the other vertices are. theoretical_max is the denominator of C_x: the maximal possible value of the numerator across all networks with the same number of vertices.
So, if network A has Freeman centralization 0.04 and network B has 0.001, then we may say that the most central vertex of A is considerably more central than the most central vertex of B. If B is random (i.e., Erdos-Renyi), that makes sense, because in a large enough random network all vertices should play fairly similar roles.
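To make the three components concrete, here is a small sketch on a star graph (my own toy example; a star is the most centralized network possible, so its centralization should come out as 1):
library(igraph)

g <- make_star(5, mode = "undirected")  # vertex 1 is the centre
cb <- centr_betw(g, normalized = TRUE)

cb$res              # raw betweenness: the centre lies on every shortest path, the leaves on none
cb$centralization   # 1 for a star: the theoretical maximum
cb$theoretical_max  # the denominator used in the normalization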

Computing the un-weighted length of a weighted shortest path

I have started investigating whether igraph would be a more efficient method for calculating the length of a least-cost path. Using the package gdistance, it is straightforward to supply a cost surface and generate least-cost paths between two (or many) points. The function costDistance returns the actual length of the paths as the sum of all the segment lengths (i.e. not the cumulative COST of the least-cost path).
My question is whether there is a way to do this in igraph so that I can compare computation times. Using get.shortest.paths, I can obtain the length of the shortest path between vertices, but, when edge weights are provided, the path length is reported as the weighted path length.
In short: I would like to find shortest paths on a weighted network but have the lengths reported in terms of edge length, not weighted edge length.
Note: I can see how this is possible by looping through each shortest path and then writing some extra code to add up the unweighted edge lengths, but I fear this will cancel out my original need for increased efficiency of pairwise distance calculations over massive networks.
In get.shortest.paths, there is a weights argument! If you read ?get.shortest.paths you will find that weights is
Possibly a numeric vector giving edge weights. If this is NULL and the graph has a weight edge attribute, then the attribute is used. If this is NA then no weights are used (even if the graph has a weight attribute).
So you should set weights = NA. See below for an example:
require(igraph)
# make a reproducible example
el <- matrix(nc = 3, byrow = TRUE,
             c(1, 2, .5,
               1, 3, 2,
               2, 3, .5))
g2 <- add.edges(graph.empty(3), t(el[, 1:2]), weight = el[, 3])
# calculate the weighted shortest path between vertices 1 and 3
get.shortest.paths(g2, 1, 3)
# calculate the unweighted shortest path between vertices 1 and 3
get.shortest.paths(g2, 1, 3, weights = NA)
I'm not sure whether I completely understand what "edge length" and "weighted edge length" mean in your post (I guess that "edge length" is simply "the number of edges along the path" and "weighted edge length" is "the total weight of the edges along the path"), but if I'm right, your problem boils down to "finding shortest paths where edges are weighted by one particular criterion, and then returning for each path the sum of some other property of the edges involved".
If this is the case, you can pass the output="epath" parameter to get.shortest.paths; in this case, igraph will report the IDs of the edges along the weighted shortest path between two nodes. You can then use these IDs as indices into a vector containing the values of that other property that you wish to use when the lengths are calculated. E.g.:
> g <- grg.game(100, 0.2)
> E(g)$weight <- runif(ecount(g), min=1, max=20)
> E(g)$length <- runif(ecount(g), min=1, max=20)
> path <- get.shortest.paths(g, from=1, to=100, output="epath")$epath[[1]]
> sum(E(g)$length[path])
This will give you the sum of the length attributes of the edges involved in the shortest path between nodes 1 and 100, while the shortest paths are calculated using the weight attribute (which is the default for get.shortest.paths, but you can also override it with the weights=... argument).
If you are simply interested in the number of edges on the path, you can either use a constant 1 for the lengths, or simply call length(path) in the last line.

Should out-linking nodes be in rows or columns for calculating eigenvector centrality of a directed graph using igraph in R?

I've got a probabilistic adjacency matrix (probability that i knows j), and I want to calculate eigenvector centrality for all i. The graph is directed.
Because the graph is directed, the adjacency matrix isn't symmetric. Because the adjacency matrix isn't symmetric, the result depends on whether the matrix is transposed. I suppose that one is the adjacency matrix for being linked to, and the other is the adjacency matrix for linking to others. Which is which?
Here's a dummy example demonstrating the issue:
set.seed(333)
N <- 4
adj <- matrix(runif(N^2), N)
diag(adj) <- 0
A <- graph.adjacency(adj, weighted = TRUE)
evcent(A, directed = TRUE)$vector
A <- graph.adjacency(t(adj), weighted = TRUE)
evcent(A, directed = TRUE)$vector
For directed graphs, matrix element A[i,j] represents the edge from vertex i to vertex j; in other words, out-linking nodes are in the rows. See also http://en.wikipedia.org/wiki/Adjacency_matrix
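A quick way to convince yourself of the orientation (a 2-vertex toy matrix of my own):
library(igraph)

# a single 1 in row 1, column 2 should encode the edge 1 -> 2
m <- matrix(c(0, 1,
              0, 0), 2, 2, byrow = TRUE)
g <- graph.adjacency(m)
E(g)  # prints 1->2: the row index is the source, the column index the target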

Calculating the trace of a matrix to the power k

I need to calculate the trace of a matrix to the power of 3 and 4 and it needs to be as fast as it can get.
The matrix here is an adjacency matrix of a simple graph, therefore it is square, symmetric, its entries are always 1 or 0 and the diagonal elements are always 0.
Optimization is trivial for the trace of the matrix to the power of 2:
We only need the diagonal entries (i,i) for the trace, skip all others
As the matrix is symmetric these entries are just the entries of the i-th row squared and summed up
And as the entries are just 1 or 0 the square-operation can be skipped
Another idea I found on Wikipedia was summing up all elements of the Hadamard product, i.e. entry-wise multiplication, but I don't know how to extend this method to the powers of 3 and 4.
See http://en.wikipedia.org/wiki/Trace_(linear_algebra)#Properties
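A quick check of those power-2 shortcuts (sketched in R for brevity, although the final target is C++; the random 0/1 matrix is just a stand-in):
# random symmetric 0/1 adjacency matrix with zero diagonal
set.seed(42)
A <- matrix(rbinom(36, 1, 0.4), 6)
A <- ((A + t(A)) > 0) * 1
diag(A) <- 0

sum(diag(A %*% A))  # trace(A^2) by direct multiplication
sum(A * A)          # Hadamard form: sum over the entry-wise product of A with itself
sum(A)              # entries are 0/1, so squaring is a no-op and the plain sum agrees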
Maybe I'm just blind but I can't think of a simple solution.
In the end I need a C++ implementation, but I think that's not important to the question.
Thanks in advance for any help.
The trace is the sum of the eigenvalues, and the eigenvalues of a matrix power are just the eigenvalues raised to that power.
That is, if l_1, ..., l_n are the eigenvalues of your matrix, then trace(M^p) = l_1^p + l_2^p + ... + l_n^p.
Depending on your matrix you may want to go with computing the eigenvalues and then summing. If your matrix has low rank (or can be well approximated with a low rank matrix) you can compute the eigenvalues very cheaply (a partial eigendecomposition has complexity O(n*k^2) where k is the rank).
Edit: You mention in the comments that it's 1600x1600 in which case finding all the eigenvalues should be no problem. Here's one of many C++ codes that you can use for this http://code.google.com/p/redsvd/
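A numerical sanity check of that identity (again in R for brevity; the matrix is an arbitrary 0/1 symmetric example):
set.seed(1)
M <- matrix(rbinom(36, 1, 0.5), 6)
M <- ((M + t(M)) > 0) * 1
diag(M) <- 0

sum(diag(M %*% M %*% M))                  # trace(M^3) by direct multiplication
sum(eigen(M, symmetric = TRUE)$values^3)  # sum of cubed eigenvalues: the same value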
Ok, I just figured this one out myself.
The important thing I did not know was this:
If A is the adjacency matrix of the directed or undirected graph G, then the matrix A^n (i.e., the matrix product of n copies of A) has an interesting interpretation: the entry in row i and column j gives the number of (directed or undirected) walks of length n from vertex i to vertex j. This implies, for example, that the number of triangles in an undirected graph G is exactly the trace of A^3 divided by 6.
(Copied from http://en.wikipedia.org/wiki/Adjacency_matrix#Properties)
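As an illustration (in R for brevity; the 6-node matrix is the one from the eigenvector-centrality question above, which contains exactly two triangles):
A <- matrix(c(0,1,0,0,0,0,
              1,0,1,0,0,0,
              0,1,0,1,1,1,
              0,0,1,0,1,0,
              0,0,1,1,0,1,
              0,0,1,0,1,0), 6, 6, byrow = TRUE)

sum(diag(A %*% A %*% A)) / 6  # trace(A^3)/6 = 2 triangles ({3,4,5} and {3,5,6})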
Retrieving the number of walks of a given length from node i back to i, for all n nodes, can essentially be done in O(n) when dealing with sparse graphs and using adjacency lists instead of matrices.
Nevertheless, thanks for your answers!
