How to calculate betweenness using the random walk algorithm? - r

The igraph package calculates betweenness using shortest paths between nodes.
http://igraph.sourceforge.net/doc/R/betweenness.html
Now I want to calculate betweenness based on random walks instead, following:
A measure of betweenness centrality based on random walks, M. E. J. Newman, Social Networks 27, 39-54 (2005).
I know that NetworkX in Python implements this measure, but it runs out of memory on the large network I am using.
Are there any suggestions on how to calculate betweenness based on random walks?
Thanks!

After running for three days and nights, NetworkX finally produced the result for random-walk betweenness.
The graph I used consists of about six thousand nodes and five million edges; the machine has 16 GB of RAM.
The solver was set to 'full' (which uses the most memory) rather than the default 'lu'.
This link also mentions the run-time problem when using NetworkX to calculate random-walk betweenness.
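For reference, igraph has no built-in random-walk (current-flow) betweenness, but Newman's 2005 definition can be sketched directly from the Moore-Penrose pseudoinverse of the graph Laplacian. This is only a minimal illustration for small, unweighted, connected graphs; as written it is O(n^4) and will not solve the memory/run-time problem above:

    library(igraph)
    library(MASS)  # for ginv()

    # Random-walk betweenness per Newman (2005), for a small unweighted graph g.
    current_flow_betweenness <- function(g) {
      A  <- as.matrix(as_adjacency_matrix(g))
      n  <- vcount(g)
      Tm <- ginv(diag(degree(g)) - A)  # pseudoinverse of the graph Laplacian
      b  <- numeric(n)
      for (s in 1:(n - 1)) {
        for (t in (s + 1):n) {
          V <- Tm[, s] - Tm[, t]  # node potentials: unit current in at s, out at t
          # throughflow of each node = half the sum of absolute edge currents
          flow <- 0.5 * rowSums(A * abs(outer(V, V, "-")))
          flow[c(s, t)] <- 1      # source and sink carry the full unit current
          b <- b + flow
        }
      }
      b / (n * (n - 1) / 2)       # average over all source-sink pairs
    }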

Related

Use modularity maximization in undirected, non-binary network?

I have a network model of trade between nodes, where the edge weights are the total trade flows between node pairs and are therefore undirected.
I would like to cluster the network using modularity maximization, but the required binary (0/1) adjacency matrix does not adequately represent the different edge weights. Is there a way to deal with this, or is modularity maximization simply the wrong method?
Happy for any hints!
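One hint: modularity is also defined for weighted graphs, and igraph's modularity-based methods accept edge weights directly, so binarizing is not necessary. A minimal sketch with made-up trade flows, assuming the weights live in an edge attribute named weight:

    library(igraph)

    # Hypothetical undirected trade network; 'weight' holds the total trade flow
    trade <- data.frame(from   = c("A", "A", "B", "C"),
                        to     = c("B", "C", "C", "D"),
                        weight = c(5, 1, 3, 8))
    g <- graph_from_data_frame(trade, directed = FALSE)

    # Louvain modularity maximization using the weights
    cl <- cluster_louvain(g, weights = E(g)$weight)
    membership(cl)   # community assignment per node
    modularity(cl)   # weighted modularity of the partition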

Sum up shortest paths in a weighted network

I have a graph with 340 nodes and 700 links. As a performance indicator of the network, I want to compute the sum of all weighted shortest paths.
I tried the all_shortest_paths command from the igraph package, but my system doesn't have enough RAM to store the result.
Can someone recommend a package or code that computes the sum of all shortest paths without building that big structure?
For unweighted networks there is the command mean_distance, which does basically something similar!?
You could try the package dodgr. With
dodgr_dists(graph)
you can generate a square matrix of distances between your nodes (more info).
Note: This will only work if your graph is directed.
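An alternative sketch within igraph itself: distances() returns only the 340 x 340 matrix of weighted shortest-path lengths (not the paths themselves), which easily fits in RAM. Assuming your graph g carries a weight edge attribute:

    library(igraph)

    # d[i, j] = weighted shortest-path length from node i to node j
    d <- distances(g, weights = E(g)$weight)

    # sum over unordered pairs (the matrix is symmetric for undirected graphs);
    # is.finite() drops Inf entries from disconnected node pairs
    sum(d[is.finite(d)]) / 2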

Is PageRank always better than eigenvector or Katz centrality?

As far as I understand, there is classical eigenvector centrality, and there are variants such as Katz centrality or PageRank. I wonder if the latter is the "latest stage" in the evolution of eigenvector centrality and therefore always superior. Or are there conditions under which one should use one or the other? If so, what would those conditions be?
Might be a little bit late, but:
Eigenvector centrality assumes that nodes with important connections are themselves important. For example, people who know the president are probably important. Mathematically, the centrality scores are obtained as the eigenvector of the largest eigenvalue of the adjacency matrix.
The problem with eigenvector centrality is that it does not handle directed graphs well: nodes with no incoming edges receive a centrality of exactly zero, no matter how many outgoing edges they have, and they then contribute nothing to their neighbors. Katz centrality fixes this by adding a small bias term so that no node has strictly zero centrality, which in turn affects the centralities of the neighboring nodes as well.
However, the problem with Katz centrality is that a very central node passes its full centrality to all of its outgoing links, making all those nodes very popular. For example, even though people who know the president are important, not all of them are (the president's driver, for example). To fix this, PageRank mixes Katz centrality with the node's degree, dividing the centrality a node passes on by its out-degree.
In conclusion: if the graph is undirected, use eigenvector centrality. If the graph is directed, the choice between Katz and PageRank depends on the situation. If you want extremely central nodes to strongly influence their neighbors, use Katz; otherwise, use PageRank.
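A quick way to see the differences in igraph (a sketch; igraph has no function called "katz", but alpha_centrality implements the closely related Bonacich/Katz-style measure with the bias term described above):

    library(igraph)

    set.seed(42)
    g <- sample_gnp(100, 0.05, directed = TRUE)  # toy directed graph

    ec <- eigen_centrality(g, directed = TRUE)$vector  # exact zeros are common here
    kc <- alpha_centrality(g, alpha = 0.1)  # Katz-style; keep alpha < 1/lambda_max
    pr <- page_rank(g, damping = 0.85)$vector

    head(cbind(eigenvector = ec, katz_like = kc, pagerank = pr))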
You cannot really compare these three, because they are based on different perspectives and definitions of centrality. PageRank uses the eigenvector centrality concept to determine how important a website is (read this).
For instance: in eigenvector centrality we use the right eigenvector in the power iteration algorithm. In the PageRank algorithm we are interested in the in-links of nodes, not the out-links (directed graph), so instead of the right eigenvector we use the left eigenvector. See: Eigenvector centrality.
Also read: Katz centrality.

Find the highest-scoring pair of nodes in a graph

I'm trying to solve an optimization problem where I want to find the combination of two nodes with the highest impact/importance in a graph. Let's say I want to base this on betweenness centrality (BC). I guess the more sensible approach is to select one node (maybe one with a high BC), calculate the BC for the resulting network, and then remove the node with the highest BC value from that. My goal is to generate a list of the highest-scoring combinations of nodes when removed from the original graph. I've implemented a simplified method that picks random nodes, and if the score is higher than the previous one, one of the two nodes is reused in the next combination. I'm not sure if this approach is good enough or if the code will "get stuck" at local optima.
Any pointers to steer me in the right direction would be appreciated.
Unless there are properties of the graph and/or function that you can exploit, you have to check all pairs to be sure that the maximum is found.
Several approximate betweenness centrality calculation algorithms have been proposed.
Your general method is good, and it is somewhat similar to the one used in "Fast approximation of betweenness centrality through sampling" (Riondato and Kornaropoulos, 2015).
Quoting:
"Since exact computation in large networks is prohibitively expensive,
we present two efficient randomized algorithms for betweenness
estimation. The algorithms are based on random sampling of shortest
paths and offer probabilistic guarantees on the quality of the
approximation. [...] The first algorithm estimates the betweenness of all vertices (or edges): all approximate values are within an additive factor ε ∈ (0, 1) from the real values, with probability at least 1 − δ. The second algorithm focuses on the top-K vertices (or edges) with highest betweenness and estimate their betweenness value to within a multiplicative factor ε, with probability at least 1 − δ. This is the first algorithm that can compute such approximation for the top-K vertices (or edges). "
The time complexity for both algorithms is O(r*(|E|+|V| log |V|)), where r is the sample size (which determines the accuracy).
Their second algorithm is quite relevant for your use case (K=2):
"This is the first algorithm that can compute such approximation for
the top-K vertices (or edges)."
First, calculate the betweenness centrality value for all nodes and sort in descending order. Select the node with the highest BC value and remove it from the network. Recompute BC on the remaining network and repeat the process. This will let you pick out the nodes with the highest BC in the network.
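A minimal igraph sketch of that greedy removal idea (greedy_top_pair is a hypothetical helper; like any greedy method it finds a good pair, not a guaranteed optimum):

    library(igraph)

    # Greedily pick the top-BC node, remove it, recompute, pick the next one.
    greedy_top_pair <- function(g) {
      first  <- which.max(betweenness(g))
      g2     <- delete_vertices(g, first)
      second <- which.max(betweenness(g2))
      c(V(g)$name[first], V(g2)$name[second])
    }

    set.seed(1)
    g <- sample_gnp(50, 0.1)
    V(g)$name <- as.character(seq_len(vcount(g)))
    greedy_top_pair(g)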

Evaluation metrics for community detection using igraph in R?

I am running community detection on graphs, using the different community detection algorithms implemented in igraph, listed here:
1. edge.betweenness.community (w, -d)
2. walktrap.community (w, -d)
3. fastgreedy.community (w)
4. spinglass.community (w, d, not for unconnected graphs)
5. infomap.community (w, d)
6. label.propagation.community (w)
7. multilevel.community (w)
8. leading.eigenvector.community (w)
As I have two types of graph, one directed and weighted and the other undirected and unweighted, the ones I could use for both are four (1, 2, 4, 5). I get an error on the fourth one because my graph is unconnected, so three remain.
Now I want to compare them using the different evaluation metrics provided here: http://lab41.github.io/Circulo/. As far as I have searched, igraph provides modularity and compare.communities (the metrics listed at http://www.inside-r.org/packages/cran/igraph/docs/compare.communities are "vi", "nmi", "split.join", "rand", and "adjusted.rand").
What I am wondering is:
Is there any other algorithm implemented in igraph that is not in this list, and is there one that also gives overlapping communities?
Which of these metrics can be used for weighted and directed graphs, and is there an implementation in igraph?
Also, which metric can be used with which algorithm? As I went through one of the articles (on edge-betweenness), the metric used there was ground truth: they compared against the known community structure of the graph.
Thank you in advance.
Yes, there are many algorithms that are not in the igraph package; to name one: RG+, presented in "Cluster Cores and Modularity Maximization" (2010).
Modularity is by far the best measure to evaluate communities.
edge.betweenness simply gives you the betweenness centrality values of all the edges; it is not a measure for evaluating communities, although it can be used to find them.
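To make the metrics concrete, a minimal sketch using igraph's current cluster_* names for two of the algorithms listed above:

    library(igraph)

    g <- make_graph("Zachary")        # classic karate-club test graph

    wt <- cluster_walktrap(g)
    fg <- cluster_fast_greedy(g)

    modularity(wt)                    # modularity of each partition
    modularity(fg)

    # compare two partitions; method = "vi", "nmi", "split.join",
    # "rand", or "adjusted.rand"
    compare(membership(wt), membership(fg), method = "nmi")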
