Computing PageRank on a weighted graph with absolute weights

I am facing the same issue as expressed in this link (Networkx PageRank - Equal Ranks with Different Weights).
Essentially, I am using networkx to compute the PageRank of a graph. The problem is that the PageRank computation first converts the graph to a right stochastic matrix (the weights of all outgoing edges are normalised to sum to one).
What I need is a way to not normalise the edge weights. So, if one node has only one outgoing edge with weight 0.1 and another node has only one outgoing edge with weight 0.05, I want this information to be used in the computation of PageRank (rather than both weights being normalised to 1).
Does anyone know the right way of modifying PageRank to achieve this?
Thanks in advance,
Amit

Maybe you are thinking of what Larry and Sergey called "Personalized PageRank"? You can adjust the weighting of the nodes in the random jump part of the algorithm to create a bias. E.g.
In [1]: import networkx as nx
In [2]: G = nx.DiGraph()
In [3]: G.add_path([1,2,3,4])
In [4]: nx.pagerank_numpy(G)
Out[4]:
{1: 0.11615582303660349,
2: 0.2148882726177166,
3: 0.29881085476166286,
4: 0.370145049584017}
In [5]: nx.pagerank_numpy(G,personalization={1:1,2:10,3:1,4:1})
Out[5]:
{1: 0.031484535189871404,
2: 0.341607206810105,
3: 0.3218506609784609,
4: 0.3050575970215628}
See, for example, the discussion here http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf

I have a graph which is not right stochastic (i.e. edge weights are absolute and consistent across nodes). I changed the PageRank implementation of networkx to avoid converting the initial matrix to a right stochastic matrix, which gives me the right answer. However, this means that PageRank is no longer guaranteed to converge, since the total weight of a node's outgoing edges can be > 1; in practice the rankings are fairly consistent after 30-40 iterations.
In essence, removing this line from the networkx code (algorithms/link_analysis/pagerank_alg.py) did the job:
W = nx.stochastic_graph(D, weight=weight)
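If you would rather not patch networkx itself, the same idea can be written as a standalone power iteration that skips the normalisation step. A minimal sketch, assuming a weighted nx.DiGraph (the function name and the fixed iteration count are my own choices, not networkx API):
import networkx as nx

def pagerank_unnormalized(G, alpha=0.85, max_iter=40, weight="weight"):
    # Power iteration using the raw edge weights directly, i.e. without
    # first converting G to a right stochastic graph. Because a node's
    # outgoing weights may sum to more (or less) than 1, convergence is
    # not guaranteed, so a fixed number of iterations is run instead.
    n = G.number_of_nodes()
    x = {v: 1.0 / n for v in G}
    for _ in range(max_iter):
        x_next = {v: (1.0 - alpha) / n for v in G}
        for u, v, data in G.edges(data=True):
            x_next[v] += alpha * x[u] * data.get(weight, 1.0)
        x = x_next
    return x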

Maximum number of directed graphs

Given a set of N points, what is the maximum number of directed graphs that can be created? I'm having trouble with the isomorphism problem.
Edit (1): Only simple directed graphs without self-loops; the graph is not required to be connected.
Edit (2): Any point in the set is treated equally to every other, so the main problem here is to calculate and subtract the number of isomorphic graphs created from different sets of edges.
The number of unlabeled directed graphs with n vertices is given by OEIS A000273:
1, 1, 3, 16, 218, 9608, 1540944, 882033440, 1793359192848
There is no closed formula; an approximate value is the number of labeled graphs divided by the number of vertex permutations:
2^(n*(n-1)) / n!
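For small n these values can be checked by brute force, collapsing labeled digraphs into isomorphism classes via a canonical form. A quick Python sketch (the function name is mine):
from itertools import permutations, product

def count_unlabeled_digraphs(n):
    # Enumerate all 2^(n*(n-1)) labeled simple digraphs (no self-loops) and
    # collapse isomorphism classes by taking, for each graph, the
    # lexicographically smallest edge list over all node relabelings.
    nodes = range(n)
    arcs = [(u, v) for u in nodes for v in nodes if u != v]
    seen = set()
    for bits in product([0, 1], repeat=len(arcs)):
        edges = [a for a, keep in zip(arcs, bits) if keep]
        canon = min(tuple(sorted((p[u], p[v]) for u, v in edges))
                    for p in permutations(nodes))
        seen.add(canon)
    return len(seen)

print([count_unlabeled_digraphs(n) for n in range(1, 5)])  # [1, 3, 16, 218]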
There are n-1 possible edges from each node, so there are n(n-1) possible edges in total.
Each possible graph will either contain a particular edge, or it won't.
So the number of possible (labeled) graphs is 2^(n(n-1)); for example, n = 3 gives 2^6 = 64.
EDIT: This only applies under the assumption that there are no loops and each edge is unique.
A loop is basically coming back to the same node again, so I'm assuming double-headed arrows are not allowed. Now, if there are n nodes available, the loop-free connected graphs you can make have n-1 edges. Let m be the number of homeomorphically distinct graphs you can make out of n nodes, and let s_i be the number of symmetries present in the i-th of those m graphs. The symmetries I'm talking about are the likes of what we study in group theory for geometric figures. Now, every edge can be in one of 2 states, i.e. left head or right head.
So the total number of distinct directed graphs can be given as:
sum over i = 1..m of 2^(n-1) / s_i
Note: if these symmetries were not present, it would simply have been m * 2^(n-1).
(Edit 1) Also, this is valid for a connected graph with n nodes. If you want to include graphs that don't need to be connected, then you'll have to modify a few things in this equation, or add a few things like the number of smaller partitions of this n-noded graph you can form, and apply this formula to each of those combinations.
Permutations and combinations, group theory, symmetries, partitions: overall it's messy, so this was the simplest way I could put it.

Number of vertices in igraph in R

I'm fairly new to igraph in R.
I'm doing community detection using igraph and have already built my communities/clusters using the walktrap technique.
Next, within each cluster, I want to count the number of vertices between any two given vertices. The reason I want to do this is that, for each vertex XX, I want to list the vertices that are connected to XX via at most 3 vertices, meaning no further than 3 vertices away from XX.
Can anyone help with how this can be done in R, please?
making a random graph (for demonstration):
g <- erdos.renyi.game(100, 1/25)
plot(g,vertex.size=3)
get walktrap communities and save as vertex attribute:
V(g)$community<-walktrap.community(g, modularity = TRUE, membership = TRUE)$membership
V(g)$community
now make a subgraph containing only edges and vertices of one community, e.g. community 2:
sub<-induced.subgraph(g,v=V(g)$community==2)
plot(sub)
make a matrix containing all shortest paths:
shortestPs<-shortest.paths(sub)
now count the number of shortest paths smaller than or equal to 3.
I also exclude each node's shortest path to itself (shortestPs != 0),
and divide by two, because every node pair appears twice in the matrix for undirected graphs.
Number_of_shortest_paths_smaller_3 <- length(which(shortestPs<=3 & shortestPs!=0))/2
Number_of_shortest_paths_smaller_3
Hope that's close to what you need, good luck!

Graph Bi-partition - Normalized cuts

Can anyone explain to me how to do the cut of the graph in the normalized cuts algorithm described here: http://web.cs.ucdavis.edu/~bai/ECS231/returnsfinal/WangH.pdf (page 3, bottom)?
I have the image, the graph, the solved eigenvalue problem, and the eigenvector of the 2nd smallest eigenvalue. But I don't know how to cut the graph.
You don't need to 'cut' the graph. All you need to do is split the current image into 2 sets, A and B, with A = {i : y_1(i) > t} and B = {i : y_1(i) <= t}, where y_1 (in your algorithm) is the eigenvector of the second smallest eigenvalue (also the solution for normalized cuts, as stated in Proposition 2), and t, as Section 4.4 of the paper explains, can be 0, the mean, the median, the mode, or a value of your choice, with different results as shown in Section 5.
You only need the indexes where that happens, as if they were 2 different binary masks for your original image. Then the ncut_partitions algorithm recursively calls itself with the A and B sets separately as the new images.
Follow "Algorithm 1 Recursive two-way cut" line by line.

Finding probability of edges in a graph

I have a random graph G(n, p) with n = 5000 vertices and an edge probability of p = 0.004.
I wonder what the expected number of edges in the graph would be, but I don't have much knowledge of probability theory.
Can anyone help me?
Thank you so much!
EDIT:
If pE is the number of possible edges in the graph, wouldn't I have to calculate 0.004 * pE to get the expected number of edges in the graph?
First, ask yourself the maximum number of possible edges in the graph. This maximum is reached when every vertex is connected to every other vertex, giving nC2 = n(n-1)/2 possible edges (assuming this is an undirected graph without self-loops).
If each possible edge has a likelihood of 0.004, and the number of possible edges is n(n-1)/2, then the expected number of edges is 0.004 * n(n-1)/2. For n = 5000 that is 0.004 * 12497500 = 49990.
The expected number of edges depends on the number of nodes and the edge probability, as in E = p * n(n-1)/2.
The total number of possible edges in your graph is n(n-1) if any i is allowed to be linked to any j as both i->j and j->i (I am your friend, you are mine). If the graph is undirected (and an edge only means that we are friends), the total number of edges drops by half to n(n-1)/2, since i->j and j->i are the same.
Multiplying by p gives the expected number of edges, since every possible edge has become real or not depending on the probability. p = 1 gives n(n-1)/2 edges, since every possible edge actually happened. For graphs with p < 1, the actual edge count will (obviously) differ from run to run if you actually generate random graphs using the p and n of your choice. The expected edge count will, however, be the most common observed edge count if you were to generate an infinite number of random graphs. NetLogo is a very pedagogical tool if you want to generate random graphs and get a feel for how network measurements arise from random graphs of different structures.
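A quick sanity check with networkx (my own sketch, not from the answers above):
import networkx as nx

n, p = 5000, 0.004
expected = p * n * (n - 1) / 2        # 49990.0
G = nx.gnp_random_graph(n, p)         # one random draw of G(n, p)
print(expected, G.number_of_edges())  # the actual count fluctuates around 49990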

Trying to find networking metrics with R

I have created a directed network in R. I have to find the average degree (which I think I have), the diameter, and the maximum/minimum clustering. The diameter is the longest of the shortest distances between any two nodes. If this makes sense to anyone, please point me in the right direction. What I have coded so far is below.
library(igraph)
ghw <- graph.formula(1-+4:5:9:12:14, 2-+11:16:17, 3-+4:5:7,
4-+1:3:6:7:8, 5-+1:3:6:7, 6-+4:5:8,
7-+3:4:5:8:13, 8-+4:6:7, 9-+10:12:14:15,
10-+9:12:14, 11-+2:16:17, 12-+1:9:10:14,
13-+7:15:18, 14-+1:9:10:12, 15-+13:16:18,
16-+2:11:15:17:18, 17-+2:11:16:18, 18-+13:15:16:17)
plot(ghw)
get.adjacency(ghw)
Total number of directed edges:
numdeg <- ecount(ghw)
Average number of edges per node:
avgdeg <- numdeg / 18
How about looking at the documentation?
diameter(ghw)
I am not sure what you mean by maximum/minimum clustering, but maybe this:
range(transitivity(ghw, type="local"))
Btw., your average number of edges per node is wrong, because every edge belongs to two nodes; the average degree is 2 * numdeg / 18, i.e. mean(degree(ghw)).
