Algorithm for solving a shortest edges problem? - graph

Here is the problem:
For a graph with several nodes, each node must be connected to exactly one other node. How can the total edge length of this graph be minimized?
[Fig. 1 and Fig. 2: two different ways of pairing up the same nodes]
As shown above, Fig. 2 has a shorter total edge length than Fig. 1. Is there an algorithm to compute the minimum total length of the edges?

The problem is called 'minimum weight perfect matching'. Kolmogorov, V., 'Blossom V: A new implementation of a minimum cost perfect matching algorithm' presents an efficient algorithm. The C++ implementation of the algorithm in the paper is available (retrieved from here; the link given in the paper itself is no longer active).
A cursory Google search suggests that various graph-processing libraries (like LEDA) include an algorithm for solving your problem in their toolbox.
Caveat
I have not tested the implementation of the cited paper and do not know about the legal status of using it.
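If the C++ code is inconvenient, a quick alternative for small instances is NetworkX's blossom implementation in Python. The sketch below is only an illustration under my own assumptions (the point coordinates are made up): it builds a complete graph over the points, turns the minimization into a maximization by replacing each distance d with (max distance + 1) - d, and asks for a maximum-weight matching of maximum cardinality, which on a complete graph with an even number of nodes is a minimum-length perfect matching.

# Minimum-weight perfect matching via NetworkX's blossom implementation.
# Sketch only: Blossom V (the cited C++ code) is much faster, but for small
# instances this is convenient.
import itertools
import math
import networkx as nx

# Hypothetical input: 2-D points, each of which must be paired with exactly
# one other point so that the total length of the connecting segments is minimal.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
assert len(points) % 2 == 0, "a perfect matching needs an even number of nodes"

dist = {(i, j): math.dist(points[i], points[j])
        for i, j in itertools.combinations(range(len(points)), 2)}

# Replace each distance d by (max distance + 1) - d, which is positive, so a
# maximum-weight maximum-cardinality matching is a minimum-length perfect matching.
biggest = max(dist.values())
G = nx.Graph()
for (i, j), d in dist.items():
    G.add_edge(i, j, weight=biggest + 1.0 - d)

matching = nx.max_weight_matching(G, maxcardinality=True)
total = sum(dist[min(u, v), max(u, v)] for u, v in matching)
print(sorted(tuple(sorted(e)) for e in matching), total)  # [(0, 1), (2, 3)] 2.0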

Related

All-pairs shortest paths in a directed graph with non-negative weighted edges

I have a directed graph with non-negative weighted edges, where there can be multiple edges between two vertices.
I need to compute all-pairs shortest paths.
This graph is very big (20 million vertices and 100 million edges).
Is Floyd–Warshall the best algorithm? Is there a good library or tool to complete this task?
There exist several all-to-all shortest-path algorithms for directed graphs without negative cycles, Floyd-Warshall probably being the most famous, but with the figures you give I think you will run into memory issues in any case (time could also be an issue, but there are all-to-all algorithms that can be easily and massively parallelized).
Independently of the algorithm you use, you will have to store the result somewhere, and storing 20,000,000² = 400,000,000,000,000 path lengths (never mind the full paths themselves) would use hundreds of terabytes at the very least.
Accessing any of these stored results would probably take longer than computing one shortest path from scratch (the memory wall), which can be done in less than a millisecond (depending on the graph structure, there are techniques that are much, much faster than Dijkstra or any other priority-queue-based algorithm).
To be honest, I think you should look for an alternative where computing all-to-all shortest paths is not required, or study the structure of your graph (a DAG, a well-structured graph that is easy to partition/cluster, geometric/geographic information, ...) in order to apply more specialized algorithms, because in the general case I do not see any way around it.
For example, with the figures you gave, an average degree of about 5 makes for a decently sparse graph, considering its dimensions. Graph partitioning approaches could then be very useful.
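If only some of the distances are ever actually needed, one concrete alternative is to answer queries on demand with single-source Dijkstra and cache the results; at an average degree of about 5 each query is cheap. Below is a plain-Python sketch, not tied to any particular library, with a made-up toy graph.

# On-demand single-source Dijkstra instead of precomputing all pairs.
# Assumption: 'graph' is a dict mapping each vertex to a list of
# (neighbor, weight) pairs with non-negative weights.
import heapq

def dijkstra(graph, source):
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist  # shortest distance from 'source' to every reachable vertex

# Usage on a tiny example graph:
graph = {
    "a": [("b", 2.0), ("c", 5.0)],
    "b": [("c", 1.0)],
    "c": [],
}
print(dijkstra(graph, "a"))  # {'a': 0.0, 'b': 2.0, 'c': 3.0}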

Graph Shortest Paths w/Dynamic Weights (Repeated Dijkstra? Distance Vector Routing Algorithm?) in R / Python / Matlab

I have a graph of a road network with avg. traffic speed measures that change throughout the day. Nodes are locations on a road, and edges connect different locations on the same road or intersections between 2 roads. I need an algorithm that solves the shortest travel time path between any two nodes given a start time.
Clearly, the graph has dynamic weights, as the travel time for an edge i is a function of the speed of traffic at this edge, which depends on how long your path takes to reach edge i.
I have implemented Dijkstra's algorithm with
edge weights = (edge_distance / edge_speed_at_start_time)
but this ignores that edge speed changes over time.
My questions are:
Is there a heuristic way to use repeated calls to Dijkstra's algorithm to approximate the true solution?
I believe the 'Distance Vector Routing Algorithm' is the proper way to solve such a problem. Is there a way to use the Igraph library or another library in R, Python, or Matlab to implement this algorithm?
EDIT
I am currently using Igraph in R. The graph is an igraph object. The igraph object was created using the igraph command graph.data.frame(Edges), where Edges looks like this (but with many more rows):
I also have a matrix of the speed (in MPH) of every edge for each time, which looks like this (except with many more rows and columns):
Since I want to find shortest-travel-time paths, the weight for a given edge is edge_distance / edge_speed. But edge_speed changes depending on time (i.e. how long you have already been driving on this path).
The graph has 7048 nodes and 7572 edges (so it's pretty sparse).
There exists an exact algorithm that solves this problem! It is called time-dependent Dijkstra (TDD) and runs about as fast as Dijkstra itself.
Unfortunately, as far as I know, neither igraph nor NetworkX has implemented this algorithm, so you will have to do some coding yourself.
Luckily, you can implement it yourself! You only need to adapt Dijkstra in a single place.
In normal Dijkstra you compute the tentative distance as follows, with dist your current array of distances, u the node you are considering, and v its neighbor:
alt = dist[u] + travel_time(u, v)
In time-dependent Dijkstra we get the following:
current_time = start_time + dist[u]
cost = weight(u, v, current_time)
alt = dist[u] + cost
TDD was described in Dreyfus, S. E., 'An appraisal of some shortest-path algorithms', Operations Research, 17(3):395–412, 1969.
Currently, much faster heuristics are already in use. They can be found with the search term: 'Time dependent routing'.
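In case it helps, here is a minimal, hedged sketch of TDD in plain Python (not igraph): it assumes you can supply a function travel_time(u, v, t) that returns the time needed to traverse edge (u, v) when it is entered at time t, e.g. edge_distance divided by the speed looked up in your speed matrix for the time bucket containing t. TDD is exact when the network is FIFO, i.e. entering an edge later never lets you leave it earlier.

# Time-dependent Dijkstra (TDD): identical to Dijkstra except that the cost of
# an edge is evaluated at the moment the edge is entered.
# Assumptions (not igraph/NetworkX API): 'graph' maps each node to an iterable
# of neighbors, and travel_time(u, v, t) gives the traversal time of (u, v)
# when entered at time t.
import heapq

def time_dependent_dijkstra(graph, travel_time, source, target, start_time=0.0):
    dist = {source: 0.0}                 # best known travel time from source
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue                     # stale heap entry
        current_time = start_time + d    # clock time at which we stand on u
        for v in graph.get(u, ()):
            cost = travel_time(u, v, current_time)
            alt = d + cost
            if alt < dist.get(v, float("inf")):
                dist[v] = alt
                prev[v] = u
                heapq.heappush(heap, (alt, v))
    if target not in dist:
        return float("inf"), []          # target not reachable
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return dist[target], path[::-1]

You would build graph and travel_time from your igraph edge list and speed matrix; the names above are placeholders, not existing library calls.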
What about the igraph package in R? You can try the get.shortest.paths or get.all.shortest.paths function.
library(igraph)
?get.all.shortest.paths
get.shortest.paths()
get.all.shortest.paths()  # if weights are NULL then it will use Dijkstra.

Algorithm to enumerate all spanning trees of a graph

Is there any easy way to enumerate all spanning trees of an undirected graph? This can have O(2^n) complexity. The number of nodes in the graph is always lower than 10. I know Knuth has an algorithm in Volume 4 of TAoCP but I can't find it.
There are other works in the literature; did you check them? First, an algorithm by Char, which is described in [Jayakumar et al. '84]. There is also the algorithm of [Kapoor & Ramesh '91], which is described in detail in their article, and the work of [Postnikov '94].
Also, you might have a look at this other thread: Find all spanning trees of a directed weighted graph (some answers mention undirected graphs).
Finally, note that even if the algorithmic complexity is very high, this might not be a problem in practice, on such a small graph.
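Since the graph always has fewer than 10 nodes, plain brute force is also viable: try every subset of n-1 edges and keep those that form a spanning tree. The sketch below uses NetworkX for the connectivity test and is only an illustration, not one of the cited algorithms.

# Brute-force enumeration of all spanning trees of a small undirected graph:
# try every subset of n-1 edges and keep those that form a spanning tree.
# Only viable because the graph has fewer than 10 nodes.
from itertools import combinations
import networkx as nx

def all_spanning_trees(G):
    n = G.number_of_nodes()
    for edges in combinations(G.edges(), n - 1):
        T = nx.Graph(list(edges))
        # n-1 edges form a spanning tree iff they touch every node and the
        # resulting subgraph is connected (connectivity then excludes cycles).
        if set(T.nodes()) == set(G.nodes()) and nx.is_connected(T):
            yield T

# Example: a 4-cycle has exactly 4 spanning trees (remove any one edge).
print(sum(1 for _ in all_spanning_trees(nx.cycle_graph(4))))  # prints 4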

Community detection with InfoMap algorithm producing one massive module

I am using the InfoMap algorithm in the igraph package to perform community detection on a directed and non-weighted graph (34943 vertices, 206366 edges). In the graph, vertices represent websites and edges represent the existence of a hyperlink between websites.
A problem I have encountered after running the algorithm is that the majority of vertices have a membership in a single massive community (32920 or 94%). The rest of the vertices are dispersed into hundreds of other tiny communities.
I have tried different settings with the nb.trials parameter (i.e. 50, 100, and now running 500). However, this doesn't seem to change the result much.
I am feeling rather exasperated because the run-time on the algorithm is quite high, so I have to wait each time for the results (with no luck yet!!).
Many thanks.
Thanks for all the excellent comments. In the end, I got it working by downloading and running the source code for Infomap, which is available at: http://www.mapequation.org/code.html.
Due to licence issues, the latest code has not been integrated with igraph.
This solved the problem of too many nodes being 'lumped' into a single massive community.
Specifically, I used the following options from the command line: -N 10 --directed --two-level --map
Kudos to Martin Rosvall from the Infomap project for providing me with detailed help to resolve this problem.
For the interested reader, here is more information about this issue:
When a network collapses into one major cluster, it is most often because of a very dense and random link structure ... In the code for directed networks implemented in igraph, teleportation is encoded. If many nodes have no outlinks, the effect of teleportation can be significant because it randomly connects nodes. We have made new code available here: http://www.mapequation.org/code.html that can cluster networks without encoding the random teleportation necessary to make the dynamics ergodic. For details, see this paper: http://pre.aps.org/abstract/PRE/v85/i5/e056107
I was going to put this in a comment, but it ended up being too long and hard to read in that format, so this is a tangentially related answer.
One thing you should do is assess whether the algorithm is doing a good job at finding community structure. You can try to visualise your network to establish:
Is the algorithm returning community structures that make sense? Maybe there is one massive community?
If not, does the visualisation provide insight as to why?
This will help inform your next steps. Maybe the structure of the network requires a different algorithm?
One thing I find useful for large networks is plotting your edges as a heatmap. This is simple to do if you have your edges stored in an adjacency matrix.
For this, you can use the image function, passing in your matrix of edges as the argument z. Hopefully this will allow you to see by eye the community structure.
However, you also want to assess the correctness of your algorithm, so you want to sort the nodes (rows and columns of your adjacency matrix) by the community they have been assigned to.
Another thing to note is that if your edges are directed it may be more difficult to assess by eye, as edges can appear on either side of the diagonal of the heatmap. One thing you can do instead is plot the underlying undirected graph, that is, the adjacency matrix with your edges treated as undirected.
If your algorithm is doing a good job, you would expect to see square blocks along the diagonal, one for each detected community.
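The advice above is phrased for R's image function; as a hedged Python analogue with numpy and matplotlib (the data here is synthetic, and membership stands in for whatever community assignment your run produced), the same picture can be drawn like this:

# Plot the adjacency matrix as a heatmap with rows/columns sorted by the
# community each node was assigned to. With a good partition you expect to
# see dense square blocks along the diagonal.
import numpy as np
import matplotlib.pyplot as plt

# Placeholder inputs: A is the (n x n) adjacency matrix, membership[i] is the
# community id assigned to node i by your community-detection run.
rng = np.random.default_rng(0)
membership = np.repeat([0, 1, 2], 20)              # 3 fake communities
A = (rng.random((60, 60)) < 0.05).astype(int)      # sparse background noise
for c in range(3):                                 # add dense within-community blocks
    idx = np.where(membership == c)[0]
    A[np.ix_(idx, idx)] = (rng.random((20, 20)) < 0.6).astype(int)

# Symmetrize to look at the underlying undirected structure.
A_und = np.maximum(A, A.T)

order = np.argsort(membership)                     # group nodes by community
plt.imshow(A_und[np.ix_(order, order)], cmap="Greys", interpolation="nearest")
plt.title("Adjacency matrix sorted by community")
plt.show()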

Algorithm for splitting a connected graph into two components

Suppose I am given a weighted, connected graph. I'd like to find a list of edges that can be removed from the graph, leaving it split into two components, such that the sum of the weights of the removed edges is small. Ideally I'd like to have the minimal sum, but I'd settle for a reasonable approximation.
This seems like a hard problem. Are there any good algorithms for doing this?
If it helps, in my case the number of nodes is about 50 and the graph may be dense, so that most pairs of nodes will have an edge between them.
I think you are looking for a minimum cut algorithm; see the Wikipedia article on minimum cut.
Before the Edmonds-Karp algorithm came the Ford-Fulkerson algorithm. For what it's worth, the Algorithms book [Cormen, Rivest] cites these two algorithms in the chapter on graph theory.
I believe what you're looking for is an algorithm for computing the minimum cut. The Edmonds-Karp algorithm does this for flow networks (with source and sink vertices). Hao and Orlin (1994) generalize this to directed, weighted graphs. Their algorithm runs in O(nm lg(n^2/m)) for n vertices and m edges. Chekuri et al. (1997) compare several algorithms empirically, some of which have better big O's than Hao and Orlin.
I may be wrong, but the Ford-Fulkerson algorithm does not solve this problem by itself, since Ford-Fulkerson assumes there are designated source and sink nodes and tries to push material from the source to the sink. Hence, on its own it does not consider all possible min-cuts.
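For this specific setting (an undirected weighted graph with about 50 nodes, to be split into two components of no prescribed sizes), the global minimum cut can be computed directly with the Stoer-Wagner algorithm, with no source or sink to choose. Below is a hedged sketch using NetworkX's implementation, with made-up edge weights.

# Global minimum cut of an undirected weighted graph via Stoer-Wagner.
# No source/sink needed: the algorithm minimizes over all two-way splits.
import networkx as nx

G = nx.Graph()
# Hypothetical weighted edges; replace with your ~50-node graph.
G.add_weighted_edges_from([
    ("a", "b", 3.0), ("a", "c", 1.0), ("b", "c", 3.0),
    ("c", "d", 1.0), ("d", "e", 4.0), ("d", "f", 2.0), ("e", "f", 5.0),
])

cut_value, (side1, side2) = nx.stoer_wagner(G, weight="weight")
removed = [(u, v) for u, v in G.edges() if (u in side1) != (v in side1)]
print(cut_value)   # total weight of the removed edges
print(removed)     # the edges to delete to split the graph into two components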
