Lowest cost path of a graph

I am working on a problem which boils down to this:
There is a connected undirected graph. I need to visit all the nodes
without visiting any node more than once. I can start and end at any
arbitrary node.
How can I go about this? Should I apply an algorithm like Floyd-Warshall from every possible start node, or is there a better way?
Thanks.

A path that visits every node once and only once is called a Hamiltonian Path. The problem of finding one is called the Hamiltonian Path Problem.
First of all, this problem is NP-complete. An algorithm whose run time is bounded by a polynomial in the input size is called a polynomial algorithm. For example, most sorting algorithms run in O(N log N) time, which is bounded by N^2, so they are polynomial.
For NP-complete problems there is no known polynomial-time algorithm. Although no one has been able to prove it yet, most likely no polynomial-time algorithm exists for NP-complete problems. In practice this means:
the run time of any algorithm you come up with will most likely grow exponentially with the input size (e.g. if it solves the problem for 40 nodes in an hour, it may need 2 hours for 41 nodes, 4 hours for 42 nodes, and so on), which is very bad news.
The algorithm you come up with will not be fundamentally much faster than one that proceeds by trial and error.
If your input size is small, start with a simple backtracking algorithm. If you need to do better, a Google search for terms like "hamiltonian path" or "longest path" may provide an answer. Ultimately, if your input is large, you will have to lower your expectations (for example, settle for an approximation instead of an optimal solution).
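A minimal sketch of the simple backtracking approach, in Python. The function name `hamiltonian_path` and the dict-of-sets adjacency format are assumptions for illustration. It tries every vertex as a start, extends the path with unvisited neighbours, and undoes the last step at a dead end; the worst case is still exponential, so it is only meant for small graphs.

```python
from typing import Dict, List, Optional, Set


def hamiltonian_path(adj: Dict[int, Set[int]]) -> Optional[List[int]]:
    """Return a path visiting every vertex exactly once, or None if none exists.

    Plain backtracking; worst case is exponential (roughly O(n!)).
    """
    n = len(adj)
    path: List[int] = []
    visited: Set[int] = set()

    def extend(v: int) -> bool:
        path.append(v)
        visited.add(v)
        if len(path) == n:
            return True  # every vertex is on the path
        for w in sorted(adj[v]):  # sorted only to make the result deterministic
            if w not in visited and extend(w):
                return True
        # Dead end: undo this step and let the caller try another neighbour.
        path.pop()
        visited.remove(v)
        return False

    for start in sorted(adj):  # the path may start at any vertex
        if extend(start):
            return path
    return None


# Hypothetical example graph: edges 0-1, 1-2, 2-3 and 0-2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(hamiltonian_path(graph))  # [0, 1, 2, 3]
```

A star graph (one centre joined to three leaves) is a quick negative case: `hamiltonian_path` returns `None` for it.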

Related

All-pairs shortest paths in a directed graph with non-negative edge weights

I have a directed graph with non-negative edge weights, where there can be multiple edges between two vertices.
I need to compute all-pairs shortest paths.
This graph is very big (20 million vertices and 100 million edges).
Is Floyd–Warshall the best algorithm? Is there a good library or tool for this task?
There exist several all-to-all shortest-path algorithms for directed graphs without negative cycles, Floyd-Warshall probably being the most famous, but with the figures you gave, I think you will run into memory issues in any case (time could also be an issue, but you can find all-to-all algorithms that can be easily and massively parallelized).
Independently of the algorithm you use, you will have to store the result somewhere. And storing 20,000,000² = 400,000,000,000,000 path lengths (let alone the full paths themselves) would take hundreds of terabytes at the very least.
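The storage estimate can be checked with a few lines of arithmetic. The bytes per stored distance are assumptions (the answer does not fix a number format); even an unrealistically compact 1 byte per entry already lands at hundreds of terabytes.

```python
vertices = 20_000_000
pairs = vertices ** 2  # one distance per ordered (source, target) pair
print(f"{pairs:,}")    # 400,000,000,000,000

# Assumed storage per distance: 1 byte (uint8), 4 bytes (int32/float32), 8 bytes (float64).
for bytes_per_entry in (1, 4, 8):
    terabytes = pairs * bytes_per_entry / 10**12
    print(f"{bytes_per_entry} byte(s) per distance -> {terabytes:,.0f} TB")
```

With these assumptions the loop reports 400, 1,600 and 3,200 TB respectively.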
Accessing any of these results would probably take longer than computing a single shortest path on demand (because of the memory wall), which can be done in less than a millisecond (depending on the graph structure, you can find techniques that are much, much faster than Dijkstra or any priority-queue-based algorithm).
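For reference, here is a plain single-source Dijkstra with a binary heap, the kind of on-demand query the answer compares against; the faster, structure-dependent techniques it alludes to are not shown. The adjacency format (a dict mapping each vertex to a list of `(neighbour, weight)` pairs) is an assumption. Parallel edges, as in the question, are simply repeated entries.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # vertex -> [(neighbour, weight), ...]


def dijkstra(graph: Graph, source: int) -> Dict[int, float]:
    """Shortest distances from `source` to every reachable vertex.

    Requires non-negative weights. Parallel edges need no special handling:
    only the cheapest one can ever improve a distance.
    """
    dist: Dict[int, float] = {source: 0.0}
    heap: List[Tuple[float, int]] = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale entry: u was already reached more cheaply
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical example with two parallel edges 0 -> 1 (weights 5 and 2).
g: Graph = {0: [(1, 5.0), (1, 2.0), (2, 4.0)], 1: [(2, 1.0)]}
print(dijkstra(g, 0))  # {0: 0.0, 1: 2.0, 2: 3.0}
```

Unreachable vertices are simply absent from the returned dict.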
To be honest, I think you should look for an alternative that does not require computing all-to-all shortest paths. Or study the structure of your graph (DAG, well-structured graph that is easy to partition/cluster, geometric/geographic information, ...) in order to apply different algorithms, because in the general case I do not see any way around it.
For example, with the figures you gave, an average degree of about 5 makes for a fairly sparse graph, considering its size. Graph partitioning approaches could then be very useful.

Longest path in an unweighted undirected graph, starting and finishing at the same vertex

I have a problem in which I need to find the longest path in an unweighted undirected graph. Starting from a given vertex, I need to visit as many vertices as possible and finish at the same one, without visiting any of them more than once.
Most of the algorithms I found were for special cases (acyclic, directed, etc.). One idea would be to find a Hamiltonian cycle for every subset of the vertices (the subsets can be generated with backtracking). But I guess there must be a far better algorithm.
As you've discovered, finding the largest cycle involves finding Hamiltonian cycles of its subgraphs, and is thus NP-complete: unless you're working on some special class of graphs, any solution is going to be exponential in complexity.
A smart brute-force approach (e.g. a bitmask over the visited vertices) is about the best efficiency one can get for this type of problem in general.
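One way to read "bitmask" here is a Held–Karp-style dynamic program over subsets of vertices. In the sketch below (the function names and list-of-sets adjacency are hypothetical), `reach[mask]` records which end vertices can be reached by a simple path from the start that covers exactly the vertices in `mask`; a cycle exists whenever such an end vertex is adjacent to the start. It runs in O(2^n * n^2) time and O(2^n) memory, so it is only practical for roughly 20 vertices or fewer, but that is still far better than enumerating every subset and then searching each one for a Hamiltonian cycle separately.

```python
from typing import List, Set, Tuple


def longest_cycle_through(adj: List[Set[int]], start: int) -> int:
    """Vertex count of the longest simple cycle through `start` (0 if none).

    reach[mask] is a bitset of end vertices v such that some simple path
    start -> ... -> v visits exactly the vertices in `mask`.
    """
    n = len(adj)
    reach = [0] * (1 << n)
    reach[1 << start] = 1 << start
    best = 0
    # Extending a path only adds a bit, giving a larger mask, so ascending order is safe.
    for mask in range(1 << n):
        ends = reach[mask]
        if not ends:
            continue
        size = bin(mask).count("1")
        for v in range(n):
            if not (ends >> v) & 1:
                continue
            # Close the cycle back to start (needs >= 3 vertices in an undirected graph).
            if size >= 3 and start in adj[v]:
                best = max(best, size)
            for w in adj[v]:
                if not (mask >> w) & 1:
                    reach[mask | (1 << w)] |= 1 << w
    return best


def build_adjacency(n: int, edges: List[Tuple[int, int]]) -> List[Set[int]]:
    adj: List[Set[int]] = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


# Hypothetical graph: square 0-1-2-3-0, vertex 4 joined to 0 and 1,
# and a pendant vertex 5 on 2 (it cannot lie on any cycle).
adj = build_adjacency(6, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (4, 1), (2, 5)])
print(longest_cycle_through(adj, 0))  # 5  (0-4-1-2-3-0)
```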

Shortest path in a 3D maze

I'm trying to write a program to find the shortest path in a 3D maze using recursion.
I am able to write the code that finds a random path through the maze but I'm wondering how I can modify my code in order to find the shortest path.
Please note that I want to keep the recursive approach.
Can someone suggest a solution?
Here is a sample 2D maze:
s
XXXX
XX X
XXX
Xe X
One starts from s and goes to e. X is an obstacle and a blank space is the route.
It depends on the algorithm you are implementing. If you want a recursive approach, then finding a random path is a good starting point (although if the problem is too complex, a bad choice could have a huge effect on the number of attempts needed for convergence). Afterwards you need to modify the path and, for example, check whether the new path is shorter than the previous one; if it is, you continue modifying your parameters in the same direction. Otherwise you have to change direction.
The exit criterion for the algorithm/program is normally the difference between the found solution and the ideal solution. So if you know the length of the ideal path in advance (the optimal solution; you need not know the path itself, only its length), you can define an error margin of, for example, 10^-9, and once the difference between the two solutions is less than this margin, your algorithm exits.
In conclusion, this question is a mathematical optimization problem. Optimization is a field with a well-established literature, even though it is a little complex. However, if I were you, I would search for shortest path algorithms and implement the one best suited to my application (a 3D maze).
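Following the suggestion to use a dedicated shortest-path algorithm: on an unweighted grid, breadth-first search (BFS) finds a shortest route directly, with no known optimum or error margin needed. Note that this swaps in BFS rather than the iterative-improvement scheme described above, and it is iterative rather than recursive. The maze format (a list of levels, each a list of row strings, with 'X' as obstacle and 's'/'e' as start/end) is an assumption based on the sample in the question.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int, int]  # (level, row, column)


def shortest_path_3d(maze: List[List[str]]) -> Optional[int]:
    """Number of moves on the shortest s -> e route, or None if unreachable.

    maze[z][y][x]: 'X' is an obstacle; any other character is walkable.
    Moves go one step along any of the six axis directions.
    """
    start: Optional[Cell] = None
    goal: Optional[Cell] = None
    for z, level in enumerate(maze):
        for y, row in enumerate(level):
            for x, ch in enumerate(row):
                if ch == "s":
                    start = (z, y, x)
                elif ch == "e":
                    goal = (z, y, x)
    if start is None or goal is None:
        return None

    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    dist: Dict[Cell, int] = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            return dist[cell]  # BFS first reaches each cell via a shortest route
        z, y, x = cell
        for dz, dy, dx in steps:
            nz, ny, nx = z + dz, y + dy, x + dx
            if (0 <= nz < len(maze) and 0 <= ny < len(maze[nz])
                    and 0 <= nx < len(maze[nz][ny])
                    and maze[nz][ny][nx] != "X"
                    and (nz, ny, nx) not in dist):
                dist[(nz, ny, nx)] = dist[cell] + 1
                queue.append((nz, ny, nx))
    return None


# Hypothetical 2-level maze (level 0 above level 1).
maze = [
    ["s  ",
     "XX "],
    ["XXX",
     "XXe"],
]
print(shortest_path_3d(maze))  # 4
```

If recursion is a hard requirement, a depth-first search that remembers the best distance seen at each cell and abandons longer arrivals also converges to the shortest length, but it can be much slower and may hit Python's recursion limit on large mazes.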

Graph clustering for an almost-clustered graph by removing nodes (vertices)

I want to carry out graph clustering on a huge undirected graph with millions of edges and nodes. The graph is almost clustered, with different clusters joined together only by some nodes (a kind of ambiguous node that can relate to multiple clusters). There will be very few or almost no edges between two clusters. This problem is similar to finding a vertex cut set of a graph, with the one exception that the graph needs to be partitioned into many components (their number being unknown). (See this picture: https://docs.google.com/file/d/0B7_3zLD0XdtAd3ZwMFAwWDZuU00/edit?pli=1)
It's almost like different strongly connected components sharing a couple of nodes between them, and I am supposed to remove those nodes to separate those strongly connected components. Edges are weighted, but this problem is more about finding structures in a graph, so edge weights won't be relevant. (Another way to think about the problem would be to visualize solid spheres touching each other at some points, with the spheres being those strongly connected components and the touching points being those ambiguous nodes.)
I am prototyping something, so I am quite short of time to study graph clustering algorithms myself and select the best one. Plus, I need a solution that cuts nodes and not edges, since different clusters share nodes and not edges in my case.
Is there any research paper or blog that addresses this or a somewhat related problem? Or can anyone come up with a solution to this problem, however dirty?
Since millions of nodes and edges are involved, I would need a MapReduce implementation of the solution. Any inputs or links for that too?
Is there any current open-source MapReduce implementation that I can use directly?
I think this problem is analogous to Finding Communities in online social networks by removing vertices.
Your problem is not so simple. I am afraid that it is related to the clique problem, which is NP-complete, so unless you somehow quantify the statement "there are almost no edges between the clusters", your problem might still be very difficult. But what I would do in your shoes would be to try one dirty, greedy approach, namely regarding the nodes as the following kind of quasi-neural net:
I would consider each vertex to have inputs, outputs and a sigmoid activation function which converts the input value (sum of inputs) into the output value. The output value, and I consider this important, would not be cloned and sent to all the neighbors, but rather divided evenly between the neighbors. In addition to this, I would define a logarithmic decay of activity in a neuron (self-suppression, a suppressive connection to itself), controlled by a decay parameter that is global for the net.
Now, I would start the simulation with all the neurons at activity 0.5 (activity range is 0 to 1) and a very high decay parameter, which would make all the neurons quickly stabilize in the 0 state. I would then gradually decrease the decay parameter until the steady state yields the first clique with non-zero stable activity.
The question is what to do next. One possibility is to subtract the found clique from the graph and run the same process again until we find all the cliques. This greedy approach might succeed if your graph is indeed as well behaved (really almost clustered) as you say, but might lead to unexpected results otherwise. Another possibility is to give the found clique a unique clique "smell" that is repulsive (mutual suppression) to other cliques and rerun the algorithm until the second clique is found, give it a different smell repulsive to all others, and so on, until each node has its own assigned smell.
That is about as far as my big ideas go on this.
The key is that, since it is probably not possible to solve this problem in the general case (it is likely NP-complete), you need to make use of whatever special properties your graph has. That means you need to play with the parameters for a while until the algorithm solves 99% of the cases you encounter. I don't think it is possible to give a numerically precise answer to your question without long experimentation with the actual datasets you encounter.
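A much simpler baseline, and a different technique from the heuristic above: if clusters are joined through single shared nodes, those nodes are exactly the graph's articulation points (cut vertices), which one linear-time depth-first search (Hopcroft–Tarjan) finds. This is only a sketch under that assumption; clusters glued together by two or more shared nodes need a general vertex-separator or community-detection method instead. The function name and dict-of-lists adjacency are hypothetical, and the DFS is written iteratively so that large graphs do not hit Python's recursion limit. NetworkX also provides `articulation_points` if a library is preferred.

```python
from typing import Dict, Hashable, List, Set


def articulation_points(adj: Dict[Hashable, List[Hashable]]) -> Set[Hashable]:
    """Cut vertices of an undirected graph (iterative Hopcroft-Tarjan DFS).

    disc[v] is v's DFS discovery time; low[v] is the smallest discovery time
    reachable from v's DFS subtree using at most one back edge.
    """
    disc: Dict[Hashable, int] = {}
    low: Dict[Hashable, int] = {}
    cut: Set[Hashable] = set()
    timer = 0
    for root in adj:
        if root in disc:
            continue
        disc[root] = low[root] = timer
        timer += 1
        root_children = 0
        stack = [(root, None, iter(adj[root]))]  # (vertex, DFS parent, neighbour iterator)
        while stack:
            v, parent, neighbours = stack[-1]
            descended = False
            for w in neighbours:
                if w == parent:
                    continue
                if w in disc:  # already visited: back edge or finished descendant
                    low[v] = min(low[v], disc[w])
                else:          # tree edge: descend into w, resume v later
                    disc[w] = low[w] = timer
                    timer += 1
                    stack.append((w, v, iter(adj[w])))
                    descended = True
                    break
            if descended:
                continue
            stack.pop()  # all neighbours of v processed
            if parent is None:
                continue
            low[parent] = min(low[parent], low[v])
            if parent == root:
                root_children += 1
            elif low[v] >= disc[parent]:
                cut.add(parent)  # v's subtree cannot reach above parent without it
        if root_children >= 2:
            cut.add(root)  # a DFS root is a cut vertex iff it has 2+ tree children
    return cut


# Hypothetical graph: triangles {0,1,2} and {2,3,4} sharing node 2,
# plus a pendant node 5 hanging off node 4.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2), (4, 5)]
adj: Dict[Hashable, List[Hashable]] = {v: [] for e in edges for v in e}
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)
print(sorted(articulation_points(adj)))  # [2, 4]
```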
Since millions of nodes and edges are involved, I would need a MapReduce implementation of the solution. Any inputs, links for that too?
In my experience, I doubt that using Map/Reduce here would be truly advantageous. First, nodes on the order of 10^6 isn't really that large [especially in a graph that is not hyper-connected, since you are considering clustering], and the overhead of using Map/Reduce for your problem [unless you already have your hardware/software set up for it] will not be worth it.
Map/Reduce will work much better once you have solved the clustering issue and then want to run a similar analysis on each cluster: basically, when you can break your task into relatively isolated sub-tasks that can be performed in parallel. This can of course be cascaded over several layers.
In a relatively similar situation, I personally first modelled my graph in a graph database (I used Neo4j, and would highly recommend it) and then ran my analytics and queries on it. You will be surprised at how whiteboard-friendly this solution is, and even massively joined and connected queries will execute near-instantaneously, especially at the scale of only a few million nodes. For example, you can do a filtered analysis based on degrees of separation, followed by a listing of common connections.

Dijkstra's algorithm cannot deal with negative weights; when do you see negative weights in the real world?

I can't think of a concrete instance in which you'd have a negative weight. You cannot have a negative distance between two houses; you cannot go back in time. When would you have a graph with a negative edge weight?
I found that the Bellman–Ford algorithm was originally used for routing in ARPANET, but again, I can't imagine where you'd run into a route with a negative weight; it just doesn't seem possible. I could just be thinking too hard about this. What would be a simple example?
Suppose that walking a distance takes a certain amount of food. But along some paths there is food you can gather, so you might gain food by following those paths.
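The food example maps directly onto Bellman–Ford, the algorithm mentioned in the question. In this sketch each weight is the net food consumed along a trail (the place names and numbers are made up), so the trail through the orchard has a negative weight. A textbook Dijkstra that finalizes each vertex the first time it leaves the queue would lock in the direct camp-to-river cost of 3; Bellman–Ford finds the cheaper route via the orchard, and it reports when a negative cycle (unlimited food) makes "cheapest path" meaningless.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (from, to, weight)


def bellman_ford(vertices: List[str], edges: List[Edge],
                 source: str) -> Optional[Dict[str, float]]:
    """Shortest distances from `source`, allowing negative edge weights.

    Returns None if a negative cycle is reachable from `source`.
    """
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0.0
    # A shortest simple path has at most |V| - 1 edges, so |V| - 1 rounds suffice.
    for _ in range(len(vertices) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # already converged
    # Any further improvement means a reachable negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None
    return dist


# Hypothetical example: weight = net food consumed; the orchard trail gains food.
places = ["camp", "orchard", "river"]
trails = [("camp", "river", 3.0), ("camp", "orchard", 5.0), ("orchard", "river", -4.0)]
print(bellman_ford(places, trails, "camp"))  # {'camp': 0.0, 'orchard': 5.0, 'river': 1.0}
```

Adding a trail `("river", "orchard", 2.0)` creates the loop orchard -> river -> orchard with net weight -2, and the function then returns `None`.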
When doing routing, a negative weight might be assigned to a link to make it the default path. You could use this if you have a primary and a backup line and for whatever reason you don't want to load balance between them.
I guess you might get negative weights where you've already got a system with non-negative weights and a path comes along that is cheaper than all existing paths, and for some reason it's expensive to reweight the network.
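Even if there were an example, you could probably normalize it so that all weights are non-negative (though not simply by adding a constant to every edge, which penalizes paths with more edges; a potential-based reweighting like the one in Johnson's algorithm does preserve shortest paths, provided there are no negative cycles). Any actual representation of a negative weight is relative to some 0. I guess what I'm saying is that there probably isn't an application of negative weights that can't be done using exclusively non-negative weights.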
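To make the normalization idea concrete: Johnson's algorithm computes a potential h(v) with Bellman–Ford from a virtual source that has a 0-weight edge to every vertex, then replaces each weight w(u, v) with w(u, v) + h(u) - h(v). The new weights are non-negative when there is no negative cycle, and every path from s to t changes by the same amount, h(s) - h(t), so the shortest path stays the same. The function names and the food-style example (weight = net food consumed) are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (from, to, weight)


def johnson_potentials(vertices: List[str], edges: List[Edge]) -> Dict[str, float]:
    """h(v): shortest distance to v from a virtual source with a 0-weight edge
    to every vertex, computed with Bellman-Ford. Raises on a negative cycle."""
    h = {v: 0.0 for v in vertices}  # the virtual source's 0-weight edges, already relaxed
    for _ in range(len(vertices) + 1):
        changed = False
        for u, v, w in edges:
            if h[u] + w < h[v]:
                h[v] = h[u] + w
                changed = True
        if not changed:
            return h
    raise ValueError("negative cycle: no reweighting can make all weights non-negative")


def reweight(edges: List[Edge], h: Dict[str, float]) -> List[Edge]:
    """Johnson reweighting: w'(u, v) = w(u, v) + h(u) - h(v) >= 0."""
    return [(u, v, w + h[u] - h[v]) for u, v, w in edges]


vertices = ["camp", "orchard", "river"]
edges = [("camp", "river", 3.0), ("camp", "orchard", 5.0), ("orchard", "river", -4.0)]

h = johnson_potentials(vertices, edges)
print(h)                   # {'camp': 0.0, 'orchard': 0.0, 'river': -4.0}
print(reweight(edges, h))  # [('camp', 'river', 7.0), ('camp', 'orchard', 5.0), ('orchard', 'river', 0.0)]

# Original:   direct 3 vs via orchard 5 + (-4) = 1  -> orchard route wins.
# Reweighted: direct 7 vs via orchard 5 + 0    = 5  -> orchard route still wins.
# Naive "+4 on every edge": direct 7 vs via orchard 9 + 0 = 9 -> the wrong route wins.
```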
EDIT: After thinking about this a little more, I suppose you could have situations where a given path has a negative weight. In this context, assuming the negative weight is bad, you would need a situation where the only possible way to reach your desired endpoint requires at least one point in your graph where you're REQUIRED to take the negative path (as in, no other option is available to reach your goal). But if the graph hasn't been traversed, how would you know that this is true?
EDIT (AGAIN): @Jim, I think you're right. The choke point isn't really relevant. I guess I was too quick to assume that it was, because one question that pops into my mind when introducing negative edges is: if it is possible to traverse the graph without taking ANY negative edge, then what are the negative edges doing there in the first place? But this doesn't hold up very well, because, outside of hindsight, how would you ever know whether a graph could or could not be traversed without going across a negative edge?
Also worth noting, according to the Wikipedia page for Dijkstra's algorithm:
Dijkstra's algorithm, conceived by Dutch computer scientist Edsger Dijkstra in 1956 and published in 1959, is a graph search algorithm that solves the single-source shortest path problem for a graph with nonnegative edge path costs, producing a shortest path tree. This algorithm is often used in routing and as a subroutine in other graph algorithms.
So, even though this conversation is useful and thought-provoking, maybe the title of the question should be "What is the proper algorithm to use for traversing a graph with negative edges?" Dijkstra's algorithm was intended to find the shortest path. But if you introduce positive and negative weights, doesn't the goal change from finding the shortest path to finding the MOST positive one, regardless of how many edges are on your chosen path? And if it does, what is your exit condition? The only way you could know you've reached the optimal solution would be if you happened across a path that included all positive edges without any negative edges, and wouldn't this scenario only occur by chance? So, if introducing positive and negative weights changes the goal to finding the most positive (or most negative, depending on how you want to frame it) path, wouldn't this problem be doomed to O(n!) and therefore be best solved by a decision-making algorithm (like alpha/beta) that produces the best outcome given a restriction on the total number of edges you're allowed to take?
An example would be trying to find the quickest way to swim across a series of linked pools in a water park that has flumes.
