Which community detection algorithms to choose in igraph?

I am trying to cluster this network. The vertices are tags and the edges depict the co-occurrence of these tags. The edge widths show the number of times a tag pair occurs; 'energy' and 'electricity' occur together most often.
I tried using the community detection algorithms in R, especially edge.betweenness.community, which gives a modularity of 0.35 on this network. fastgreedy.community does not work on this weighted graph. Is there any other algorithm somebody could suggest for this specific case? I am a novice in both graph theory and R.
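For reference, several igraph methods do accept edge weights. Below is a minimal sketch, assuming g holds the tag network with a weight edge attribute (current underscore-style function names are shown; the older dotted names behave the same way):

library(igraph)
# Sketch: a few community detection methods that accept edge weights.
# Assumes `g` is the tag co-occurrence network with a `weight` edge attribute.
cw <- cluster_walktrap(g, weights = E(g)$weight)    # based on short random walks
cl <- cluster_louvain(g, weights = E(g)$weight)     # greedy modularity (Louvain)
cs <- cluster_spinglass(g, weights = E(g)$weight)   # spinglass; needs a connected graph
# Compare the modularity each method achieves.
sapply(list(walktrap = cw, louvain = cl, spinglass = cs), modularity)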

Related

Optimal number of graph vertices when conducting an experiment

I want to conduct an experiment about graph drawing algorithms, and for this purpose I have to generate graphs, but I don't know what the optimal number of vertices to generate is: 100 or 200? What is the largest number of vertices that humans can understand and comprehend? How can I decide that? Do you have any ideas, or papers that would be useful to me? I searched for this topic in Google Scholar and several other paper search engines, but did not find anything.
Thanks in advance
This is a very broad question. The size and type of graphs may depend on the research focus.
The GDToolkit (with which I am not affiliated) publishes several graph-drawing test-case collections from the academic literature, which might be a starting point.
In general, graph drawing gets more interesting as the number of vertices grows, especially once labelling comes into play.
A vertex count of up to 100 (maybe more for graphs with a structure that can be exploited geometrically) has the benefit that you can ask humans to lay out the graph and compare their results with what the tested algorithms produce.
As for the maximum number of vertices that people can 'understand', there is no fixed limit: think of a 2D or 3D lattice, where the number of vertices up to which humans can grasp the essence of the graph is virtually unlimited.
There is of course a lot of leeway in what exactly you mean by 'understand'. In general, human respondents will be able to describe non-trivial properties of the graph, or form hypotheses about such properties, if some visual pattern shows up (this might be an interesting research topic in itself [I have not checked for existing work in this domain]; think of 'distorted' drawings of lattices, or drawings of projections of higher-dimensional lattices).

Compute ALL spanning trees of a directed acyclic graph using igraph, network, or other R package

I want to compute the complete set of spanning trees for a graph. The graphs I'm working with are small (usually less than 10 nodes).
I see functionality for computing the minimum spanning tree with igraph:
library(igraph)
g <- sample_gnp(100, 3/100)
g_mst <- mst(g)
and I see a previous StackOverflow post that described how to compute a spanning tree using a breadth-first search. The code below is adapted from the accepted answer:
r <- graph.bfs(g, root=1, neimode='all', order = TRUE, father = TRUE)
h <- graph(rbind(r$order, r$father[r$order, na_ok = TRUE])[,-1], directed = FALSE)
However, I don't know how to adapt this to compute multiple spanning trees. How would one adapt this code to compute all spanning trees? I'm thinking that one piece of this would be to loop through each node to use as the "root" of each tree, but I don't think that takes me all the way there (since there could still be multiple spanning trees associated with a given root node).
EDIT
The end-goal is to compute the distortion of a graph, which is defined as follows (link, see page 5):
Consider any spanning tree T on a graph G, and compute the average distance t = E[H_T] on T between any two nodes that share a link in G. The distortion measures how T distorts links in G, i.e. it measures how many extra hops are required to go from one side of a link in G to the other, if we are restricted to using T. The distortion is defined [13] to be the smallest such average over all possible Ts. Intuitively, distortion measures how tree-like a graph is.
[13] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Willinger, "Network topology generators: degree-based vs. structural," in SIGCOMM, 2002.
I don't think you will find a function to do that in an R package.
By Cayley's formula, the complete graph on n nodes has n^{n-2} spanning trees, so even a graph with 10 nodes may have up to 10^8 = 100,000,000 different spanning trees, which is a big number.
Furthermore, while merely counting the spanning trees of a graph is easy (Kirchhoff's matrix-tree theorem reduces it to a determinant), enumerating them all necessarily takes time proportional to their number, which can be exponential in the size of the graph.
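For instance, counting (as opposed to listing) is cheap in R. A minimal sketch, assuming a connected undirected graph:

library(igraph)
# Count spanning trees via Kirchhoff's matrix-tree theorem: delete any one
# row and the matching column of the Laplacian, then take the determinant.
count_spanning_trees <- function(g) {
  L <- as.matrix(laplacian_matrix(g))
  round(det(L[-1, -1, drop = FALSE]))
}
count_spanning_trees(make_full_graph(5))  # Cayley: 5^(5-2) = 125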
If you are really willing to do that, I recommend dropping R and using C or C++, which can solve your problem much faster than any R code can.
Have a look at this paper for a survey of algorithms for computing all spanning trees of a connected graph (which I think is your case).
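That said, for graphs as small as those in the question (fewer than 10 nodes), a brute-force sketch in R is workable: a spanning tree is a set of |V|-1 edges whose subgraph is connected, so one can simply test every such edge subset.

library(igraph)
# Brute-force sketch for very small graphs: enumerate all subsets of
# |V|-1 edges and keep those whose subgraph is connected.
all_spanning_trees <- function(g) {
  n <- vcount(g)
  subsets <- combn(ecount(g), n - 1)   # all candidate edge sets
  trees <- list()
  for (i in seq_len(ncol(subsets))) {
    t <- subgraph.edges(g, subsets[, i], delete.vertices = FALSE)
    if (is_connected(t)) trees[[length(trees) + 1]] <- t
  }
  trees
}
length(all_spanning_trees(make_ring(4)))  # the 4-cycle has exactly 4 spanning trees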

Visualizing Multiple Networks in R using Igraph

I need to visualize a very large number (10k) of distinct networks, all on the same page, and label each node with a (binary) type. All but a few of the networks are relatively small. Link lengths/weights are not important for this dataset, and a high degree of overlap in the bigger networks is fine as long as density is evident.
I have gone over some of the igraph documentation and have been able to create individual graphs from a node/link list. However, I would like some insight, if possible, on how to translate this large number of networks (currently just 10k arrays with node identity and type inside) into 10k plots, and whether this is a feasible task with igraph.
Any insight is greatly appreciated.
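It is feasible; the bottleneck is the graphics device rather than igraph. A minimal sketch, assuming the networks are already igraph objects in a list called networks and each vertex has a binary type attribute (both names hypothetical):

library(igraph)
# Sketch: draw all networks into one huge PNG as a 100 x 100 grid of panels
# (a multi-page PDF works the same way). Assumes `networks` is a list of
# igraph objects whose vertices carry a binary `type` attribute.
png("all_networks.png", width = 10000, height = 10000)
par(mfrow = c(100, 100), mar = c(0, 0, 0, 0))
for (g in networks) {
  plot(g,
       vertex.size  = 2,
       vertex.label = NA,
       vertex.color = ifelse(V(g)$type == 1, "tomato", "steelblue"),
       edge.width   = 0.3)
}
dev.off()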

Clustering in Gephi (Louvain Method)

I have started to work with gephi to help me display a dataset.
The dataset contains:
tags (terms for a certain picture) as nodes
Normalized Google Similarity Distance between those tags as edges with a weight (between 0 and 1)
Every tag is connected to every other tag, as long as they both belong to the same picture. So I have one cluster of nodes and edges for every picture.
I have now imported this dataset to gephi in the following format:
nodes: id, label
edges: target, source, weight (between 0 and 1)
Around 500 nodes and 6,000 edges.
My problem is that, after importing all those nodes and edges, the graph looks rather bunched, with no real order. The cluster for each picture is mixed into the clusters of the other pictures.
Using Modularity as the partition algorithm (which should use the Louvain method), the graph gets colored, with each color representing a picture. I can then untangle this mess using the Force Atlas 2 layout.
I now have a colored graph with something like 15 clusters (every cluster represents one picture).
Now I want to cluster those clusters again, using the tags (nodes) according to their Normalized Google Distance (the edge weights), which should group together tags that are similar in meaning.
I hope you guys understand what I want to accomplish.
I can also upload a picture to clarify it.
Thanks a lot
I don't think you can do that with the standard version of Gephi. You would need to develop a plugin to implement the very last step of your process.
Gephi is good for visualizing and browsing graphs, but (for now) there are more complete tools when it comes to processing topological properties. For instance, the igraph library (available in C, R and Python) might be more appropriate for you. Note that you can use a file format compatible with both Gephi and igraph, which allows you to use both tools on the same data.
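For instance, a sketch of that last step in R with igraph, assuming the graph is exported from Gephi as GraphML (file names here are hypothetical):

library(igraph)
# Read the graph exported from Gephi (hypothetical file name).
g <- read_graph("tags.graphml", format = "graphml")
# Louvain community detection on the NGD-based edge weights.
comm <- cluster_louvain(g, weights = E(g)$weight)
table(membership(comm))                  # size of each tag community
# Write the membership back so the result can be inspected in Gephi.
V(g)$community <- membership(comm)
write_graph(g, "tags_clustered.graphml", format = "graphml")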
I was able to solve my problem. I had to import each of these 15 clusters on its own. In this way I could use the modularity method on just those few.

Code for Vertex-Disjoint Menger Problem in NON-planar graphs

I am working on an EDA analysis program. Reading some articles, I have found that my problem has a name (the Vertex-Disjoint Menger Problem). But all the articles describe algorithms for planar graphs; needless to say, I have non-planar undirected graphs.
This problem is equivalent to finding the minimum s-t vertex cut in an undirected graph.
Also, instead of high-level algorithmic descriptions, I would like functional C/C++ code. As far as I can tell, Boost has no such functionality.
As mentioned in another question, finding the minimum s–t vertex cut in an undirected graph can be reduced to finding the minimum s–t edge cut in a directed graph, for which many algorithms and implementations exist.
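Concretely, the reduction is the standard node-splitting construction. A sketch, written in R with igraph for brevity; the identical construction carries over to any C/C++ max-flow library such as Boost's:

library(igraph)
# Node-splitting reduction: vertex v becomes v_in -> v_out with capacity 1;
# undirected edge {u, v} becomes u_out -> v_in and v_out -> u_in with
# "infinite" capacity. The max flow from s_out to t_in then equals the
# minimum s-t vertex cut.
min_st_vertex_cut <- function(g, s, t) {
  n <- vcount(g)                       # vertices 1..n are the _in copies,
  edges <- c(); caps <- c()            # n+1..2n are the _out copies
  for (v in seq_len(n)) {
    edges <- c(edges, v, n + v)        # v_in -> v_out
    caps  <- c(caps, 1)
  }
  for (e in seq_len(ecount(g))) {
    uv <- ends(g, e)
    edges <- c(edges, n + uv[1], uv[2], n + uv[2], uv[1])
    caps  <- c(caps, n, n)             # capacity n acts as infinity
  }
  h <- graph(edges, n = 2 * n, directed = TRUE)
  max_flow(h, source = n + s, target = t, capacity = caps)$value
}

Note, too, that igraph already wraps this computation: in R, vertex_connectivity(g, source = s, target = t) returns the same number, and the underlying C library can be called from C/C++ directly.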
