Algorithm for splitting a connected graph into two components - graph

Suppose I am given a weighted, connected graph. I'd like to find a list of edges that can be removed from the graph leaving it split into two components and so that the sum of the weights of the removed edges is small. Ideally I'd like to have the minimal sum, but I'd settle for a reasonable approximation.
This seems like a hard problem. Are there any good algorithms for doing this?
If it helps, in my case the number of nodes is about 50 and the graph may be dense, so that most pairs of nodes will have an edge between them.

I think you are looking for a minimum cut algorithm (see Wikipedia).
Before the Edmonds-Karp algorithm came the Ford-Fulkerson algorithm. For what it's worth, the Algorithms book [Cormen, Rivest] cites these two algorithms in the chapter on graph theory.

I believe what you're looking for is an algorithm for computing the minimum cut. The Edmonds-Karp algorithm does this for flow networks (with source and sink vertices). Hao and Orlin (1994) generalize this to directed, weighted graphs. Their algorithm runs in O(nm lg(n^2/m)) for n vertices and m edges. Chekuri et al. (1997) compare several algorithms empirically, some of which have better big O's than Hao and Orlin.

I may be wrong, but the Ford-Fulkerson algorithm does not solve this problem directly, since it assumes that there are designated source and destination nodes and tries to push flow from the source to the destination. Hence, on its own, the algorithm cannot consider all possible min-cuts.
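One possible way around the source/sink requirement, sketched here under the assumption that the graph is undirected with non-negative integer weights: fix vertex 0 as the source, run a max-flow computation (Edmonds-Karp below) to every other vertex as the sink, and keep the cheapest cut found. Vertex 0 lies on one side of any cut and some other vertex lies on the other side, so the best of these n-1 source/sink cuts is a global minimum cut. The function names and the dense weight-matrix representation are illustrative choices, not taken from any of the cited papers.

```python
from collections import deque


def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp on a dense capacity matrix.

    Returns (flow value, set of vertices on the source side of a min s-t cut).
    """
    n = len(cap)
    residual = [row[:] for row in cap]
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path left: the vertices reached form the source side.
            return flow, {v for v in range(n) if parent[v] != -1}
        # Find the bottleneck capacity along the path, then augment.
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def global_min_cut(weights):
    """weights: symmetric matrix, 0 meaning "no edge".

    Returns (total cut weight, list of removed edges as (u, v) pairs).
    """
    n = len(weights)
    best = None
    for t in range(1, n):
        value, side = max_flow_min_cut(weights, 0, t)
        if best is None or value < best[0]:
            best = (value, side)
    value, side = best
    edges = [(u, v) for u in sorted(side) for v in range(n)
             if v not in side and weights[u][v] > 0]
    return value, edges


# A 4-cycle with two heavy edges (0-1, 2-3) and two light ones (1-2, 0-3).
w = [[0, 5, 0, 1],
     [5, 0, 1, 0],
     [0, 1, 0, 5],
     [1, 0, 5, 0]]
print(global_min_cut(w))  # (2, [(0, 3), (1, 2)])
```

With about 50 nodes this means 49 max-flow runs on a 50x50 matrix, which should be quick even for a dense graph. Dedicated global min-cut algorithms such as the Hao-Orlin algorithm mentioned above avoid the repeated runs.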

Related

Algorithm for efficient identification of bipartite graph

I'm looking for an algorithm to efficiently determine whether a given graph (implemented either as an adjacency matrix or as an adjacency list, whichever makes the algorithm run faster) is bipartite or not.
I'm aware that a slight modification of the BFS algorithm can suit this purpose, but I haven't been able to lower BFS's time complexity of O(|E|+|V|).
I'd expect to find an algorithm that is a bit faster, taking advantage of the fact that a bipartite graph lacks cycles with an odd number of edges.
Does anyone know of such an algorithm or have any suggestion for addressing this problem?
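For reference, the BFS modification mentioned in the question can be sketched as a two-colouring: give each newly discovered vertex the opposite colour of its parent and report failure as soon as an edge joins two vertices of the same colour, which is exactly when an odd cycle exists. The adjacency-list representation as a dict and the function name are assumptions made for this sketch.

```python
from collections import deque


def is_bipartite(adj):
    """adj: dict mapping each vertex to an iterable of its neighbours (undirected)."""
    color = {}
    for start in adj:              # loop over all vertices to cover every component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # opposite colour of the parent
                    queue.append(v)
                elif color[v] == color[u]:
                    return False              # same colour on both ends: odd cycle
    return True


square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(is_bipartite(square), is_bipartite(triangle))  # True False
```

On an adjacency list this visits each vertex and edge at most a constant number of times, which is the O(|E|+|V|) bound from the question.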

All pairs shortest paths in graph directed with non-negative weighted edges

I have a directed graph with non-negative weighted edges, where there can be multiple edges between two vertices.
I need to compute all-pairs shortest paths.
This graph is very big (20 million vertices and 100 million edges).
Is Floyd–Warshall the best algorithm? Is there a good library or tool to complete this task?
Several all-to-all shortest path algorithms exist for directed graphs without negative cycles, Floyd-Warshall probably being the most famous, but with the figures you gave, I think you will run into memory issues in any case (time could also be an issue, but you can find all-to-all algorithms that can be easily and massively parallelized).
Independently of the algorithm you use, you will have to store the result somewhere. And storing 20,000,000² = 400,000,000,000,000 path lengths (if not the full paths themselves) would use hundreds of terabytes, at the very least.
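The storage estimate can be checked with a few lines of arithmetic. The bytes-per-entry values are assumptions: one byte per path length is an optimistic lower bound, and four bytes corresponds to a 32-bit number.

```python
def distance_table_terabytes(n_vertices, bytes_per_entry):
    """Size of a full n x n table of path lengths, in decimal terabytes."""
    entries = n_vertices ** 2
    return entries * bytes_per_entry // 10 ** 12


n = 20_000_000
print(n ** 2)                          # 400000000000000
print(distance_table_terabytes(n, 1))  # 400
print(distance_table_terabytes(n, 4))  # 1600
```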
Accessing any of these results would probably take longer than calculating one shortest path (memory wall), which can be done in less than a millisecond (depending on the graph structure, you can find techniques that are much, much faster than Dijkstra or any priority-queue-based algorithm).
To be honest, I think you should look for an alternative where computing all-to-all shortest paths is not required. Alternatively, study the structure of your graph (DAG, well-structured graph that is easy to partition/cluster, geometric/geographic information ...) in order to apply different algorithms, because in the general case I do not see any way around this.
For example, with the figures you gave, an average degree of about 5 makes for a decently sparse graph, considering its dimensions. Graph partitioning approaches could then be very useful.
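As a baseline for answering single queries on demand instead of precomputing every pair, here is a minimal sketch of plain Dijkstra with a binary heap. It is the simple priority-queue approach, not one of the faster techniques alluded to above. The adjacency-list layout and the function name are assumptions; parallel edges need no special handling because each one is just another entry in the list.

```python
import heapq


def shortest_path_length(adj, source, target):
    """adj: dict vertex -> list of (neighbour, weight), weights non-negative.

    Returns the length of a shortest path, or None if target is unreachable.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, already improved
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None


# Two parallel edges a->b (weights 4 and 2); the cheaper one is used.
adj = {"a": [("b", 4), ("b", 2), ("c", 10)], "b": [("c", 3)], "c": []}
print(shortest_path_length(adj, "a", "c"))  # 5
```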

Algorithm to enumerate all spanning trees of a graph

Is there any easy way to enumerate all spanning trees of an undirected graph? This can have O(2^n) complexity. The number of nodes in the graph is always lower than 10. I know Knuth has an algorithm in Volume 4 of TAoCP but I can't find it.
There are other works in the literature; did you check them? First, an algorithm by Char, which is described in [Jayakumar et al.'84]. There's also the algorithm of [Kapoor & Ramesh'91], which is described in detail in their article, and the work of [Postnikov'94].
Also, you might have a look at this other thread: Find all spanning trees of a directed weighted graph (some answers mention undirected graphs).
Finally, note that even if the algorithmic complexity is very high, this might not be a problem in practice, on such a small graph.
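To illustrate that last point, here is a brute-force sketch (not the algorithm of Knuth, Char, or Kapoor & Ramesh): try every subset of n-1 edges and keep those that contain no cycle, checked with a small union-find. A set of n-1 acyclic edges on n vertices is necessarily a spanning tree. The function name and the edge-list representation are assumptions.

```python
from itertools import combinations


def spanning_trees(n, edges):
    """Yield each spanning tree of an undirected graph on vertices 0..n-1.

    edges: list of (u, v) pairs. Each tree is yielded as a tuple of n-1 edges.
    """
    for subset in combinations(edges, n - 1):
        parent = list(range(n))           # fresh union-find for every subset

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        acyclic = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:                  # u and v already connected: cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:
            yield subset


k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(len(list(spanning_trees(4, k4))))  # 16
```

For a complete graph on 9 vertices this examines C(36, 8), roughly 30 million, subsets, so it is only a reasonable choice because the graphs here are so small.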

What are the differences between community detection algorithms in igraph?

I have a list of about 100 igraph objects with a typical object having about 700 vertices and 3500 edges.
I would like to identify groups of vertices within which ties are more likely. My plan is to then use a mixed model to predict how many within-group ties vertices have using vertex and group attributes.
Some people may want to respond to other aspects of my project, which would be great, but the thing I'm most interested in is information about functions in igraph for grouping vertices. I've come across these community detection algorithms but I'm not sure of their advantages and disadvantages, or whether some other function would be better for my case. I saw the links here as well, but they aren't specific to igraph. Thanks for your advice.
Here is a short summary about the community detection algorithms currently implemented in igraph:
edge.betweenness.community is a hierarchical decomposition process where edges are removed in the decreasing order of their edge betweenness scores (i.e. the number of shortest paths that pass through a given edge). This is motivated by the fact that edges connecting different groups are more likely to be contained in multiple shortest paths simply because in many cases they are the only option to go from one group to another. This method yields good results but is very slow because of the computational complexity of edge betweenness calculations and because the betweenness scores have to be re-calculated after every edge removal. Your graphs with ~700 vertices and ~3500 edges are around the upper size limit of graphs that are feasible to be analyzed with this approach. Another disadvantage is that edge.betweenness.community builds a full dendrogram and does not give you any guidance about where to cut the dendrogram to obtain the final groups, so you'll have to use some other measure to decide that (e.g., the modularity score of the partitions at each level of the dendrogram).
fastgreedy.community is another hierarchical approach, but it is bottom-up instead of top-down. It tries to optimize a quality function called modularity in a greedy manner. Initially, every vertex belongs to a separate community, and communities are merged iteratively such that each merge is locally optimal (i.e. yields the largest increase in the current value of modularity). The algorithm stops when it is not possible to increase the modularity any more, so it gives you a grouping as well as a dendrogram. The method is fast and it is the method that is usually tried as a first approximation because it has no parameters to tune. However, it is known to suffer from a resolution limit, i.e. communities below a given size threshold (depending on the number of nodes and edges if I remember correctly) will always be merged with neighboring communities.
walktrap.community is an approach based on random walks. The general idea is that if you perform random walks on the graph, then the walks are more likely to stay within the same community because there are only a few edges that lead outside a given community. Walktrap runs short random walks of 3-4-5 steps (depending on one of its parameters) and uses the results of these random walks to merge separate communities in a bottom-up manner like fastgreedy.community. Again, you can use the modularity score to select where to cut the dendrogram. It is a bit slower than the fast greedy approach but also a bit more accurate (according to the original publication).
spinglass.community is an approach from statistical physics, based on the so-called Potts model. In this model, each particle (i.e. vertex) can be in one of c spin states, and the interactions between the particles (i.e. the edges of the graph) specify which pairs of vertices would prefer to stay in the same spin state and which ones prefer to have different spin states. The model is then simulated for a given number of steps, and the spin states of the particles in the end define the communities. The consequences are as follows: 1) There will never be more than c communities in the end, although you can set c to as high as 200, which is likely to be enough for your purposes. 2) There may be fewer than c communities in the end as some of the spin states may become empty. 3) It is not guaranteed that nodes in completely remote (or disconnected) parts of the network have different spin states. This is more likely to be a problem for disconnected graphs only, so I would not worry about that. The method is not particularly fast and not deterministic (because of the simulation itself), but has a tunable resolution parameter that determines the cluster sizes. A variant of the spinglass method can also take into account negative links (i.e. links whose endpoints prefer to be in different communities).
leading.eigenvector.community is a top-down hierarchical approach that optimizes the modularity function again. In each step, the graph is split into two parts in a way that the separation itself yields a significant increase in the modularity. The split is determined by evaluating the leading eigenvector of the so-called modularity matrix, and there is also a stopping condition which prevents tightly connected groups from being split further. Due to the eigenvector calculations involved, it might not work on degenerate graphs where the ARPACK eigenvector solver is unstable. On non-degenerate graphs, it is likely to yield a higher modularity score than the fast greedy method, although it is a bit slower.
label.propagation.community is a simple approach in which every node is assigned one of k labels. The method then proceeds iteratively and re-assigns labels to nodes in a way that each node takes the most frequent label of its neighbors in a synchronous manner. The method stops when the label of each node is one of the most frequent labels in its neighborhood. It is very fast but yields different results based on the initial configuration (which is decided randomly), therefore one should run the method a large number of times (say, 1000 times for a graph) and then build a consensus labeling, which could be tedious.
igraph 0.6 will also include the state-of-the-art Infomap community detection algorithm, which is based on information theoretic principles; it tries to build a grouping which provides the shortest description length for a random walk on the graph, where the description length is measured by the expected number of bits per vertex required to encode the path of a random walk.
Anyway, I would probably go with fastgreedy.community or walktrap.community as a first approximation and then evaluate other methods when it turns out that these two are not suitable for a particular problem for some reason.
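Since several of the methods above either optimize modularity or rely on it to decide where to cut a dendrogram, here is a minimal standalone sketch of how the score can be computed for a given grouping. It assumes an unweighted, undirected graph without self-loops, and it is plain Python for illustration rather than a call into igraph. For each community it takes the fraction of edges that fall inside the community and subtracts the fraction expected if edges were placed at random with the same vertex degrees.

```python
from collections import defaultdict


def modularity(edges, membership):
    """edges: list of (u, v) pairs; membership: dict vertex -> community label."""
    m = len(edges)
    intra = defaultdict(int)        # edges with both ends in the community
    degree_sum = defaultdict(int)   # total degree of the community's vertices
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            intra[membership[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
lumped = {v: "a" for v in range(6)}
print(round(modularity(edges, split), 4))   # 0.3571
print(round(modularity(edges, lumped), 4))  # 0.0
```

Comparing this score across the levels of a dendrogram is one way to pick the cut, as suggested for edge.betweenness.community and walktrap.community.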
A summary of the different community detection algorithms can be found here: http://www.r-bloggers.com/summary-of-community-detection-algorithms-in-igraph-0-6/
Notably, the InfoMAP algorithm is a recent newcomer that could be useful (it supports directed graphs too).

Cluster adjacency matrix of different sizes

I have created adjacency matrices for directed graphs of different sizes. I have around 30,000 matrices, each in a separate text file. How can I cluster them? Are there any tools available? What is the best way to represent a directed graph for clustering?
Thank you.
What exactly do you want to achieve? Group similar matrices, right?
With k-means, you will not have much fun here. The adjacency matrices are binary; interpreting them as huge vectors and computing an L-p-norm distance (e.g. Euclidean distance) on them, then computing average matrices (which is what k-means does), doesn't sound sensible to me. Plus, you will likely be bitten by the curse of dimensionality. The high number of dimensions will make all matrices appear similar.
For pretty much any clustering algorithm, the first question you as the "domain expert" will have to answer is: what makes two adjacency matrices similar? Once you have formalized this, you will be able to run many clustering algorithms, including classic single-link clustering, DBSCAN or OPTICS.
I would try k-means and Voronoi diagrams. These can be computed with a minimal spanning tree by looking for the longest edges. Then you can compute the different clusters with traditional k-means, using the MST edges as centers. Another possibility would be a hierarchical clustering, for example with a space-filling curve. See for example: https://stats.stackexchange.com/questions/1475/visualization-software-for-clustering.
You can find some ideas for graph features/statistics here:
http://networkx.lanl.gov/reference/algorithms.html
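One way to make graphs of different sizes comparable is to reduce each adjacency matrix to a fixed-length vector of statistics and cluster those vectors instead of the raw matrices. The sketch below is only an illustration: the choice of features (size, density, reciprocity, largest out- and in-degree) is an assumption, and which statistics actually capture "similar" is the domain question raised above.

```python
def graph_features(matrix):
    """matrix: square list of lists with 0/1 entries for a directed graph.

    Returns a dict of simple statistics; self-loops are ignored.
    """
    n = len(matrix)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    edges = sum(1 for i, j in pairs if matrix[i][j])
    # Ordered pairs whose edge is matched by an edge in the opposite direction.
    mutual = sum(1 for i, j in pairs if matrix[i][j] and matrix[j][i])
    out_deg = [sum(1 for j in range(n) if j != i and matrix[i][j]) for i in range(n)]
    in_deg = [sum(1 for i in range(n) if i != j and matrix[i][j]) for j in range(n)]
    return {
        "nodes": n,
        "density": edges / len(pairs) if pairs else 0.0,
        "reciprocity": mutual / edges if edges else 0.0,
        "max_out_degree": max(out_deg, default=0),
        "max_in_degree": max(in_deg, default=0),
    }


m = [[0, 1, 0],
     [1, 0, 1],
     [0, 0, 0]]
print(graph_features(m)["density"])  # 0.5
```

Because every graph maps to a vector of the same length, standard distance-based methods become applicable, although features on different scales (for example node count versus density) would probably need normalizing first.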

Resources