Explanation of network indices normalization - math

Could someone explain in fairly simple words why many network analysis indices (graph theory) are normalized by n(n - 1), where n is the graph size (number of vertices)? Why do we need to take the (n - 1) into account?

Most network measures that focus on counting edges (e.g. clustering coefficient) are normalized by the total number of possible edges. Since every edge connects a pair of vertices, we need to know how many possible pairs of vertices we can make. There are n possible vertices we could choose as the source of our edge, and therefore there are n - 1 possible vertices that could be the target of our edge (assuming no self-loops); if the graph is undirected, divide by 2, because source and target are interchangeable. Hence, you frequently encounter $n(n-1)$ or $\binom{n}{2}$.
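As a quick illustration, here is a minimal Python sketch (the function names are my own) of this normalization:

```python
# Number of possible edges in a simple graph with n vertices (no self-loops):
# n choices for the source, n - 1 for the target; halve if undirected,
# since source and target are interchangeable.
def possible_edges(n, directed=True):
    return n * (n - 1) if directed else n * (n - 1) // 2

# A density-style normalization: observed edges / possible edges.
def density(num_edges, n, directed=True):
    return num_edges / possible_edges(n, directed)
```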

Related

finding maximum weight subgraph

My graph is as follows:
I need to find a maximum weight subgraph.
The problem is as follows:
There are n vertex clusters, and each vertex cluster contains some vertices. Any two vertices in different clusters are joined by a weighted edge, and there are no edges between vertices within the same cluster. Now I want to find a maximum weight subgraph by selecting exactly one vertex from each cluster. The total weight is computed by adding up the weights of all edges between the selected vertices. I added a picture to explain the problem. I know how to model this problem with the ILP method; however, I do not know how to solve it with an approximation algorithm, or how to obtain its approximation ratio.
Could you give some solutions and suggestions?
Thank you very much. If any point in this description is unclear,
please feel free to ask.
I do not think you can find an alpha-approximation for this problem, for any alpha. That is because if such an approximation existed, it would also prove that the Unique Games Conjecture (UGC) is false. And disproving (or proving) the UGC is a rather big feat :-)
(and I'm actually among the UGC believers, so I'd say it's impossible :p)
The reduction is quite straightforward, since any UGC instance can be described as your problem, with weights of 0 or 1 on edges.
What I can see as a polynomial approximation is a 1/k-approximation (k being the number of clusters), using a maximum weight perfect matching (PM) algorithm (we suppose the number of clusters is even; if it is odd, just add a 'useless' cluster with one vertex and zero weights everywhere).
First, you need to build a new graph, with one vertex per cluster. The edge (u, v) gets weight max w(e) over all edges e going from cluster u to cluster v. Run a max weight PM algorithm on this graph.
You then can select one vertex per cluster, the one that corresponds to the edge selected in the PM.
The total weight of the solution extracted from the PM is at least as big as the weight of the PM (since it contains the edges of the PM + other edges).
And then you can conclude that this is a 1/k approx, because if there exists a solution to the problem that is more than k times bigger than the PM weight, then the PM was not maximal.
The explanation is quite short (terse, I'd say); tell me if there is any part you don't follow or disagree with.
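A minimal Python sketch of this 1/k-approximation on a toy instance (the cluster data and weights are made up, and the perfect matching is brute-forced for clarity; a real implementation would use a polynomial max weight PM algorithm such as Blossom):

```python
from itertools import permutations  # (not strictly needed; kept minimal below)

# Toy instance (hypothetical data): 4 clusters, 2 vertices each.
clusters = {0: ["a0", "a1"], 1: ["b0", "b1"], 2: ["c0", "c1"], 3: ["d0", "d1"]}
# Weights on cross-cluster vertex pairs (symmetric); unlisted pairs weigh 0.
w = {("a0", "b0"): 5, ("a1", "c0"): 4, ("b1", "d1"): 3,
     ("c1", "d0"): 2, ("a0", "d0"): 1}

def weight(u, v):
    return w.get((u, v), w.get((v, u), 0))

# Step 1: contract each cluster to one node; edge (i, j) gets the maximum
# cross-cluster weight, remembering which vertex pair achieved it.
ids = sorted(clusters)
best = {}
for i in ids:
    for j in ids:
        if i < j:
            best[(i, j)] = max(
                ((weight(u, v), u, v)
                 for u in clusters[i] for v in clusters[j]),
                key=lambda t: t[0])

# Step 2: maximum weight perfect matching on the cluster graph
# (brute force here, only for this tiny sketch).
def matchings(nodes):
    if not nodes:
        yield []
        return
    a, rest = nodes[0], nodes[1:]
    for k, b in enumerate(rest):
        for m in matchings(rest[:k] + rest[k + 1:]):
            yield [(a, b)] + m

pm = max(matchings(ids), key=lambda m: sum(best[e][0] for e in m))

# Step 3: select one vertex per cluster, the one from the matched edge.
pick = {}
for (i, j) in pm:
    _, u, v = best[(i, j)]
    pick[i], pick[j] = u, v

# Total weight of the selection: at least the PM weight, since it
# contains the PM edges plus all other cross-cluster edges.
total = sum(weight(pick[i], pick[j]) for i in ids for j in ids if i < j)
```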
Edit: Equivalence with UGC: unique label cover explained.
Think of a UGC instance. Every node in the UGC instance is represented by a cluster, with as many vertices in the cluster as there are colors in the UGC instance. Then create an edge with weight 0 if the pair does not correspond to an edge in the UGC instance, or if it corresponds to a 'bad color match'. If it corresponds to a good color match, give it weight 1.
Then, if you find the optimal solution to an instance of your problem, it means it corresponds to an optimal solution to the corresponding UGC instance.
So, if UGC holds, it means it is NP-hard to approximate your problem.
Introduce a new graph G' = (V', E') as follows, and then solve (or approximate) the maximum stable set problem on G'.
Corresponding to each edge a-b in E(G), introduce a vertex v_ab in V'(G'), whose weight is equal to the weight of the edge a-b.
Connect all vertices of V'(G') to each other, except for the following pairs.
The vertex v_ab is not connected to the vertex v_ac, where vertices b and c are in different clusters in G. In this manner, we can select both of these vertices in a stable set of G' (hence, we can select both of the corresponding edges in G).
The vertex v_ab is not connected to the vertex v_cd, where vertices a, b, c and d are in different clusters in G. In this manner, we can select both of these vertices in a stable set of G' (hence, we can select both of the corresponding edges in G).
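A short Python sketch of the compatibility rule behind this construction (the cluster labels are made up): two edge-vertices of G' are left unconnected, i.e. may appear together in a stable set, exactly when their unshared endpoints lie in pairwise different clusters, which covers both rules above.

```python
# Hypothetical cluster assignment: a1, a2 share a cluster; b1, c1 are alone.
cluster = {"a1": 0, "a2": 0, "b1": 1, "c1": 2}

def compatible(e1, e2):
    # Edges are vertex pairs of G. Drop any shared endpoint, then check
    # that the remaining endpoints lie in pairwise different clusters.
    shared = set(e1) & set(e2)
    rest = [v for v in e1 + e2 if v not in shared]
    labels = [cluster[v] for v in rest]
    return len(set(labels)) == len(labels)

# In G', v_e1 and v_e2 are connected iff not compatible(e1, e2).
```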
Finally, I think you can find an alpha-approximation for this problem. In other words, in my opinion the Unique Games Conjecture is wrong due to the 1.999999-approximation algorithm which I proposed for the vertex cover problem.

algorithm for 'generalized' matching in complete graphs

My problem is a generalization of a task solved by the Blossom algorithm of Edmonds. The original task is the following: given a complete graph with weighted undirected edges, find a set of edges such that
1) every vertex of the graph is adjacent to exactly one edge from this set (i.e. vertices are grouped into pairs),
2) the sum of the weights of the edges in this set is minimal.
Now, I would like to modify the first goal into
1') vertices are grouped into sets of 3 vertices (or, in general, d vertices), and leave condition 2) unchanged.
My questions:
Do you know if this 'generalised' problem has a name?
Do you know of an algorithm solving it in a number of steps polynomial in the number of vertices (like the Blossom algorithm for the original problem)? I don't see a straightforward generalisation of the Blossom algorithm, as it is based on looking for augmenting paths on a graph compressed to a bipartite graph (and uses the Hungarian algorithm there). But augmenting paths do not seem to lead to groups of vertices other than pairs.
Best regards,
Paweł

The number of connected (!) subgraphs is exponential?

I want to show that, for an example graph family, the number of connected subgraphs grows exponentially with n.
That is easy to show for a complete graph, because a complete graph has
n(n-1)/2 = (n choose 2)
edges. Each edge is either in the subgraph or not. Therefore, every subgraph can be encoded as a binary string of length (n choose 2), so there are
2^(n choose 2)
subgraphs, and because it is a complete graph, every subgraph is connected.
But let's assume, for example, that we want to show that the number of connected subgraphs in a 3- or 4-regular graph also grows exponentially. We can enumerate the subgraphs in the same manner, but then we have to exclude a lot of them, because they are not connected.
How can we do that? Is there a way to distinguish all connected subgraphs from the not connected ones?
Greetings and thanks for your thoughts
This idea is easy to prove for certain families of graphs, in particular families of graphs with a high "edge-connectivity" (see https://en.wikipedia.org/wiki/K-edge-connected_graph).
If the edge-connectivity is greater than k, you can remove any set of at most k edges and still have a connected graph. Hence, you get at least the sum over j = 1 .. k of (E choose j) connected subgraphs, where E is the total number of edges. Let k > (E/m) for some constant m.
Then indeed, the number of subgraphs will grow exponentially.
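For small graphs you can distinguish connected from disconnected subgraphs by brute force: enumerate every edge subset and test connectivity of the touched vertices with union-find. A Python sketch (only feasible for tiny graphs, since the number of subsets is exponential):

```python
from itertools import combinations

def count_connected_subgraphs(edges):
    # Count the nonempty edge subsets whose edge-induced subgraph
    # (vertices = endpoints of the chosen edges) is connected.
    count = 0
    for r in range(1, len(edges) + 1):
        for subset in combinations(edges, r):
            verts = {v for e in subset for v in e}
            parent = {v: v for v in verts}

            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]  # path halving
                    x = parent[x]
                return x

            for u, v in subset:
                parent[find(u)] = find(v)  # union
            if len({find(v) for v in verts}) == 1:
                count += 1
    return count

# Edges of the complete graph K4.
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
```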

R: Find cliques of special sizes efficiently using igraph

I have a large graph (100,000 nodes) and I want to find its cliques of size 5.
I use this command for this goal:
cliques(graph, min=5, max=5)
It takes a lot of time to perform this operation. It seems that it first tries to find all the maximal cliques of the graph and then picks the cliques of size 5; I guess this because of the huge difference in run time between these two commands, although both of them do the same job:
adjacent.triangles (graph) # takes about 30s
cliques(graph, min=3, max=3) # takes more than an hour
I am looking for a command like adjacent.triangles that finds cliques of size 5 efficiently.
Thanks
There is a huge difference between adjacent.triangles() and cliques(). adjacent.triangles() only needs to count the triangles, while cliques() needs to store them all. This could easily account for the time difference if there are lots of triangles. (Another factor is that the algorithm in cliques() has to be generic and not limited to triangles; it could be the case that adjacent.triangles() contains some optimizations, since we know that we are interested in triangles only.)
For what it's worth, cliques() does not find all the maximal cliques; it starts from 2-cliques (i.e. edges) and then merges them into 3-cliques, 4-cliques etc until it reaches the maximum size that you specified. But again, if you have lots of 3-cliques in your graph, this could easily become a bottleneck as there is one point in the algorithm where all the 3-cliques have to be stored (even if you are not interested in them) since we need them to find the 4-cliques.
You are probably better off calling maximal.cliques() first to get a rough idea of how large the maximal cliques are in your graph. The idea here is that if you have a maximal clique of size k ≥ 5, then all its subsets of size 5 are 5-cliques. This means that it is enough to search for the maximal cliques, keep the ones of size at least 5, and then enumerate all their subsets of size 5. But then you have a different problem: some cliques may be counted more than once.
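A small Python sketch of that subset-enumeration step (the maximal-clique list here is a made-up example; in R you would get it from maximal.cliques()): generate each maximal clique's 5-subsets and deduplicate with a set, since overlapping maximal cliques can share 5-cliques.

```python
from itertools import combinations

def cliques_of_size(maximal_cliques, k=5):
    # Every k-subset of a maximal clique is a k-clique; sorting each
    # subset and storing tuples in a set removes duplicates that arise
    # from overlapping maximal cliques.
    found = set()
    for clique in maximal_cliques:
        if len(clique) >= k:
            for sub in combinations(sorted(clique), k):
                found.add(sub)
    return found

# Hypothetical maximal cliques of some graph: two overlapping 6-cliques
# and one 2-clique (too small to contain a 5-clique).
maximal = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7], [8, 9]]
five = cliques_of_size(maximal, k=5)
```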
Update: I have checked the source code of adjacent.triangles, and basically all it does is loop over all vertices; for each vertex v it enumerates all the (u, w) pairs of its neighbors and checks whether u and w are connected. If so, there is a triangle adjacent on vertex v. This is an O(nd^2) operation if you have n vertices and the average degree is d, but it does not generalize to groups of vertices of arbitrary size (because you would need to hardcode k-1 nested for loops in the code for a group of size k).
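The loop just described can be sketched in a few lines of Python (the adjacency data is a made-up example):

```python
from itertools import combinations

def adjacent_triangles(adj):
    # adj maps each vertex to the set of its neighbors.
    # For each vertex v, check every pair (u, w) of its neighbors;
    # if u and w are adjacent, a triangle is adjacent on v.
    # O(n * d^2) for n vertices of average degree d.
    return {v: sum(1 for u, w in combinations(neigh, 2) if w in adj[u])
            for v, neigh in adj.items()}

# A triangle 0-1-2 plus a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
counts = adjacent_triangles(adj)
```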

Representing a graph where nodes are also weights

Consider a graph that has weights on each of its nodes instead of between two nodes. Therefore the cost of traveling to a node would be the weight of that node.
1- How can we represent this graph?
2- Is there a minimum spanning path algorithm for this type of graph (or could we modify an existing algorithm)?
For example, consider a matrix. What path, when traveling from a certain number to another, would produce a minimum sum? (Keep in mind the graph must be directed)
If one doesn't want to adjust existing algorithms and prefers edge-oriented approaches, one can transform the node weights into edge weights: for every incoming edge of a node v, assign the weight of v to that edge. That's the representation.
With the transformation from 1., this is now easy to do with well-known algorithms like MST.
You could also represent the graph as it is and keep the weight at the node. The algorithm simply wouldn't use Weight w = edge.weight(); it would use Weight w = edge.target().weight() instead.
Simply done; no big adjustments are necessary.
If you have to use an adjacency matrix, you need a second array with the node weights, and the adjacency matrix itself just contains 0 (no edge) or 1 (edge).
Hope that helped.
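A minimal Python sketch of that transformation, using Dijkstra's shortest-path algorithm as the well-known edge-oriented algorithm (the graph and node weights are made-up examples): the cost of traversing an edge (u, v) is simply the weight of its target node v.

```python
import heapq

# Hypothetical node weights and directed adjacency lists.
node_weight = {"A": 0, "B": 4, "C": 1, "D": 2}
adj = {"A": ["B", "C"], "B": ["D"], "C": ["B", "D"], "D": []}

def edge_weight(u, v):
    return node_weight[v]  # cost of traveling *to* v

def dijkstra(source, target):
    # Standard Dijkstra; the only node-weight-specific parts are
    # edge_weight() above and paying for the starting node itself.
    dist = {source: node_weight[source]}
    pq = [(dist[source], source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v in adj[u]:
            nd = d + edge_weight(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")
```

Here the cheapest A-to-D route goes through C, because avoiding the heavy node B is what matters, not the number of hops.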
