How do we define betweenness centrality when weights have a positive meaning?

I've read that betweenness centrality is defined via the number of times a vertex lies on the shortest paths between other pairs of nodes.
However, when weights have a positive meaning (i.e. the higher the weight of an edge, the better), how does one define betweenness centrality?
In this case, is there another way to calculate betweenness centrality? Or is it simply interpreted in a different way?

Computing the betweenness centrality of a vertex v relies on the following fraction, for any u and w: s(u,w,v) / s(u,w), where s(u,w,v) is the number of shortest paths between u and w that involve v, and s(u,w) is the total number of shortest paths between u and w.
With positive edge weights, I would suggest that you count each shortest path with its own weight: replace s(u,w,v) by the sum of weights of shortest paths between u and w that involve v; and s(u,w) by the sum of weights of all shortest paths between u and w.
Then, you have to define the weight of paths, and this depends on what you have in mind. You may for instance consider the sum of edge weights, their product, their minimum or maximal value, etc.
Warning: this definition still relies on unweighted shortest paths; longer paths with higher weights are ignored, meaning the graph's unweighted structure takes precedence over the weights. This may not be satisfactory.
Note: if edges have integer weights and a path's weight is the product of its edge weights, this approach is somewhat equivalent to using the classical definition on a multigraph (an unweighted graph where several edges may exist between the same two vertices).
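For concreteness, here is a minimal R/igraph sketch of that idea, assuming an undirected graph with a weight edge attribute and taking a path's weight to be the product of its edge weights (weighted_betweenness is a hypothetical helper, not an igraph function):
library(igraph)
weighted_betweenness <- function(g) {
  n  <- vcount(g)
  bc <- numeric(n)
  for (u in 1:(n - 1)) for (w in (u + 1):n) {
    # unweighted shortest paths (weights = NA ignores the weight attribute)
    paths <- all_shortest_paths(g, from = u, to = w, weights = NA)$res
    if (length(paths) == 0) next
    # weight of each path: here, the product of its edge weights
    pw <- sapply(paths, function(p) {
      pv <- as.integer(p)
      e  <- get.edge.ids(g, c(rbind(pv[-length(pv)], pv[-1])))
      prod(E(g)$weight[e])
    })
    for (i in seq_along(paths)) {
      inner <- setdiff(as.integer(paths[[i]]), c(u, w))  # vertices strictly inside the path
      bc[inner] <- bc[inner] + pw[i] / sum(pw)
    }
  }
  bc
}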

Related

Decay Centrality for Bipartite Graphs

How is decay centrality defined for a bipartite graph? I am unable to find a clear definition. All I found is https://www.centiserver.org/centrality/Decay_Centrality/, which wasn't really helpful.
Also, is there a nice implementation of decay centrality for graphs in Python? I only managed to find networkx (https://networkx.org/documentation/stable/index.html), which does not have decay centrality, though it does have the other common centrality measures such as degree, closeness, betweenness, and eigenvector centrality.
The definition of decay centrality given on the website you linked should work for bipartite and nonbipartite graphs alike.
Here’s how decay centrality is computed. Given a node v whose centrality you’re interested in, your first step is to pick a parameter δ between 0 and 1. The closer δ is to zero, the more emphasis you’ll place on nodes close to v. The closer δ is to 1, the more emphasis you place on nodes further from v.
Next, compute the distance from v to each other node x in the graph. (I’m not specifically familiar with networkx, but this should be easy to compute via breadth-first search if the graph doesn’t have edge weights, Dijkstra’s algorithm if the graph has edge weights and they’re nonnegative, or the Bellman-Ford algorithm if the graph has edge weights which can be negative.) Notationally, let’s have d(v, x) denote the distance from v to x that you computed.
Finally, for each node x other than v, compute δ^d(v, x), and add up all those values. That final number is the decay centrality of v.
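Once you have the distance vector, this is only a few lines in any library. Here is a minimal R/igraph sketch (decay_centrality is a hypothetical helper; in networkx the same two steps would use shortest_path_length):
library(igraph)
decay_centrality <- function(g, v, delta = 0.5) {
  d <- distances(g, v = v)[1, ]  # shortest-path distances from v to every node
  d <- d[-v]                     # drop v itself (it would contribute delta^0 = 1)
  sum(delta ^ d)                 # unreachable nodes contribute delta^Inf = 0
}
# example on a small bipartite graph
g <- make_full_bipartite_graph(3, 2)
sapply(seq_len(vcount(g)), function(v) decay_centrality(g, v, delta = 0.5))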

Is eigenvector centrality in igraph wrong?

I am trying to improve my understanding of eigenvector centrality. This overview from the University of Washington was very helpful, especially when read in conjunction with this R code. However, when I use evcent(graph_from_adjacency_matrix(A)), the result differs.
The below code
library(matrixcalc)
library(igraph)
# specify the adjacency matrix
A <- matrix(c(0,1,0,0,0,0,
              1,0,1,0,0,0,
              0,1,0,1,1,1,
              0,0,1,0,1,0,
              0,0,1,1,0,1,
              0,0,1,0,1,0), 6, 6, byrow = TRUE)
EV <- eigen(A) # compute eigenvalues and eigenvectors
max(EV$values) # find the maximum eigenvalue
centrality <- data.frame(EV$vectors[,1])
names(centrality) <- "Centrality"
print(centrality)
B <- A + diag(6) # Add self loops
EVB <- eigen(B) # compute eigenvalues and eigenvectors
# the eigenvectors are the same as those of A (the eigenvalues shift by 1)
c <- matrix(c(2,3,5,3,4,3)) # Degree of each node + self loop
ck <- function(k) {
  n <- k - 2
  B_K <- B  # B is the original adjacency matrix, w/ self-loops
  for (i in 1:n) {
    B_K <- B_K %*% B
    # print(B_K)
  }
  c_k <- B_K %*% c
  return(c_k)
}
# derive EV centrality as k -> infinity
# k = 100
ck(100)/frobenius.norm(ck(100)) # .09195198, .2487806, .58115487, .40478177, .51401731, .40478177
# Does igraph match?
evcent(graph_from_adjacency_matrix(A))$vector # No: 0.1582229 0.4280856 1.0000000 0.6965127 0.8844756 0.6965127
The ranking is the same, but it is still bothersome that the values are not. What is going on?
The result returned by igraph is not wrong, but note that there are subtleties to defining eigenvector centrality, and not all implementations handle self-loops in the same way.
Please see what I wrote here.
One way to define eigenvector centrality is simply as "the leading eigenvector of the adjacency matrix". But this is imprecise without specifying what the adjacency matrix is, in particular what its diagonal elements should be when self-loops are present. Depending on the application, the diagonal entries of the adjacency matrix of an undirected graph are sometimes defined as the number of self-loops, and sometimes as twice the number of self-loops. igraph uses the second definition when computing eigenvector centrality. This is the source of the difference you see.
A more intuitive definition of eigenvector centrality is that the centrality of each vertex is proportional to the sum of its neighbours' centralities. Thus the details of the computation hinge on who the neighbours are. Consider a single vertex with a self-loop: it is its own neighbour, but how many times? We can traverse the self-loop in both directions, so it is reasonable to say that it is its own neighbour twice. Indeed, its degree is conventionally taken to be 2, not 1.
You will find that different software packages treat self-loops differently when computing eigenvector centrality. In igraph, we made a choice based on the intuitive interpretation of eigenvector centrality rather than by rigidly following a formal definition with no regard for the motivation behind it.
Note: what I wrote above refers to how the eigenvector centrality computation works internally, not to what as_adjacency_matrix() returns. as_adjacency_matrix() adds one (not two) to the diagonal for each self-loop.
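As a quick sanity check of that convention (a sketch relying on the behaviour described above), take a triangle with one self-loop on vertex 1 and compare the two diagonal conventions:
library(igraph)
g  <- add_edges(make_ring(3), c(1, 1))    # triangle plus a self-loop on vertex 1
A1 <- as.matrix(as_adjacency_matrix(g))   # diagonal entry = number of self-loops
A2 <- A1 + diag(c(1, 0, 0))               # diagonal entry = twice the self-loops
ev <- function(M) { x <- abs(eigen(M)$vectors[, 1]); x / max(x) }
ev(A1)            # does not match igraph's result
ev(A2)            # matches evcent(g)$vector
evcent(g)$vector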

Generate random cycle that approximates a given weight

Is there a graph algorithm for solving the following problem:
Given a weighted undirected graph G (all weights are positive), a start node N, and a target total weight W*, generate a random cycle through the graph, starting and ending at node N, whose total weight (the summed weight of all its edges) approximates the given weight W*.
One could see this as generating the cycle that best approximates W*, but generating a cycle that approximates W* within some margin of error is also fine.
If you want a simple cycle, you want an approximation algorithm for the travelling salesman problem. I believe there are known hardness results, indicating that this is NP-hard for general graphs, but there is a wide range of heuristics; you can check the literature.
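If an answer within a margin of error is acceptable, a naive randomized search is easy to sketch. The following is only an illustration, not an established algorithm (random_cycle is a made-up helper, and a weight edge attribute is assumed); it rejection-samples random walks from N:
library(igraph)
random_cycle <- function(g, N, W_star, tol = 0.1 * W_star, tries = 10000) {
  for (t in seq_len(tries)) {
    path <- N; w <- 0; cur <- N
    repeat {
      nbrs <- as.integer(neighbors(g, cur))
      nxt  <- nbrs[sample.int(length(nbrs), 1)]      # random step
      w    <- w + E(g)$weight[get.edge.ids(g, c(cur, nxt))]
      if (nxt == N && length(path) >= 3 && abs(w - W_star) <= tol)
        return(list(cycle = c(path, N), weight = w)) # acceptable cycle found
      if (nxt %in% path || w > W_star + tol) break   # not simple, or too heavy
      path <- c(path, nxt); cur <- nxt
    }
  }
  NULL  # nothing found within the tolerance
}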

How do you find betweenness centrality with igraph, and what does the output number mean?

Hi, so basically let's say I have a network (A) and I want to find its betweenness centrality.
I used: centr_betw(graph, directed = FALSE, normalized = TRUE)
This returned a value for every node:
[1] 1.827102e+04 3.554450e+04 5.000000e-01 9.524383e+04
[5] 0.000000e+00 0.000000e+00 1.078184e+05 4.768125e+04
I really want to know what these numbers mean.
It also shows the betweenness centralization of the whole network and a max value. Let's say network (A) as a whole has a betweenness centrality of 0.04. What can you say about network (A) when it is compared to a random network with a betweenness centrality of 0.001?
Quite a bit of information can be found simply by typing ?centr_betw. In particular, centr_betw returns a list of three components: res, centralization, and theoretical_max.
Each element of res is the betweenness centrality of the corresponding vertex i. Specifically, given a shortest path between two vertices j and k (both distinct from i), i is considered more central if this shortest path passes through i; going over all possible pairs j and k gives the betweenness centrality of i.
Further, centralization and theoretical_max concern the Freeman centralization. centralization is C_x, which measures how central the network's most central vertex is relative to how central all the other vertices are. theoretical_max is the denominator of C_x, the maximal possible value of the numerator across all networks with the same number of vertices.
So, if network A has Freeman centralization 0.04 and network B has 0.001, then we may say that the most central vertex of A is significantly more central than the most central vertex of B. If B is random (i.e., Erdos-Renyi), that makes sense, because in a big enough random network all vertices should play a pretty similar role.
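As a small illustration of the centralization score: a star network (where everything is routed through one hub) attains the theoretical maximum of 1, while a dense Erdos-Renyi graph scores near zero. For example:
library(igraph)
centr_betw(make_star(10, mode = "undirected"), directed = FALSE)$centralization  # 1
set.seed(1)
centr_betw(sample_gnp(10, 0.5), directed = FALSE)$centralization                 # close to 0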

Computing the un-weighted length of a weighted shortest path

I have started investigating whether igraph would be a more efficient method for calculating the length of a least cost path. Using the package gdistance it is straightforward to supply a cost surface and generate least cost paths between two (or many) points. The function costDistance returns the actual length of the paths as the sum of all the segment lengths (i.e. not the cumulative COST of the least cost path).
My question is whether there is a way to do this in igraph so that I can compare computation times. Using get.shortest.paths, I can obtain the length of the shortest path between vertices, but, when edge weights are provided, the path length is reported as the weighted path length.
In short: I would like to find shortest paths on a weighted network but have the lengths reported in terms of edge length, not weighted edge length.
Note: I can see how this is possible by looping through each shortest path and then writing some extra code to just add up the unweighted edge lengths, but I fear this would cancel out my original need for increased efficiency of pairwise distance calculations over massive networks.
In get.shortest.paths, there is a weights argument! If you read ?get.shortest.paths you will find that weights is
Possibly a numeric vector giving edge weights. If this is NULL and the graph has a weight edge attribute, then the attribute is used. If this is NA then no weights are used (even if the graph has a weight attribute).
So you should set weights = NA. See below for an example:
require(igraph)
# make a reproducible example
el <- matrix(nc = 3, byrow = TRUE,
             c(1,2,.5, 1,3,2, 2,3,.5))
g2 <- add.edges(graph.empty(3), t(el[,1:2]), weight = el[,3])
# find the weighted shortest path between vertices 1 and 3
get.shortest.paths(g2, 1, 3)
# find the unweighted shortest path between vertices 1 and 3
get.shortest.paths(g2, 1, 3, weights = NA)
I'm not sure I completely understand what "edge length" and "weighted edge length" mean in your post (I guess "edge length" is simply "the number of edges along the path" and "weighted edge length" is "the total weight of the edges along the path"), but if I'm right, your problem simply boils down to finding shortest paths where edges are weighted by one particular criterion and then returning, for each path, the sum of some other property of the edges involved.
If this is the case, you can pass the output="epath" parameter to get.shortest.paths; in this case, igraph will report the IDs of the edges along the weighted shortest path between two nodes. You can then use these IDs as indices into a vector containing the values of that other property that you wish to use when the lengths are calculated. E.g.:
> g <- grg.game(100, 0.2)
> E(g)$weight <- runif(ecount(g), min=1, max=20)
> E(g)$length <- runif(ecount(g), min=1, max=20)
> path <- get.shortest.paths(g, from=1, to=100, output="epath")$epath[[1]]
> sum(E(g)$length[path])
This will give you the sum of the length attributes of the edges involved in the shortest path between nodes 1 and 100, while the shortest paths are calculated using the weight attribute (which is the default for get.shortest.paths, but you can also override it with the weights=... argument).
If you are simply interested in the number of edges on the path, you can either use a constant 1 for the lengths, or simply call length(path) in the last line.
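For example, reusing path from the snippet above:
> length(path)   # the number of edges on the weighted shortest path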
