What is a Topological Sort


I have looked up numerous examples online and watched a YouTube video, but I am still a little lost on what a topological sort is. As far as I understand, you start with something like a visited and a non-visited queue, and you get the topological sort order when you are done visiting all of the children of a node?

A topological sort answers this question: given a list of jobs and a list of prerequisites, in what order should the jobs run?
jobs = [1,2,3]
prerequisites = [[1,2], [1,3], [3,2]]
result = [1,3,2] is the order in which the jobs should execute.
Here [1,2] means that job 2 cannot be started until job 1 is completed (job 1 is a prerequisite of job 2).
For this you can use a straightforward depth-first search (a graph traversal algorithm) with a custom class named JobVertex:
class JobVertex {
    int job;
    List<JobVertex> preRequisites;
    boolean inProgress;
    boolean visited;
}
Initially both the inProgress and visited flags are set to false.
The inProgress flag is used to detect cycles (a topological sort only works if the graph is a DAG).
List<Integer> result = new ArrayList<>();
result is the final list to which the ordered jobs are added.
A dfs(node, result) method can start at any node, recursively traverse its prerequisites, and update the flags on each call.
The time complexity is the same as that of DFS, O(v + e), where v is the number of vertices and e is the number of edges.
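The answer describes dfs(node, result) without showing it, so here is a minimal Python sketch of how it might look. The names order_jobs, in_progress and the ValueError on a cycle are assumptions made for illustration and are not taken from the answer.

```python
class JobVertex:
    """Python counterpart of the JobVertex class described above."""

    def __init__(self, job):
        self.job = job
        self.prerequisites = []   # vertices that must finish before this one
        self.in_progress = False  # True while this vertex is on the DFS path
        self.visited = False      # True once this vertex is in the result


def dfs(node, result):
    if node.visited:
        return
    if node.in_progress:
        # We came back to a vertex that is still being explored: a cycle.
        raise ValueError("cycle detected at job %s" % node.job)
    node.in_progress = True
    for pre in node.prerequisites:
        dfs(pre, result)
    node.in_progress = False
    node.visited = True
    # All prerequisites are already in the result, so this job can follow.
    result.append(node.job)


def order_jobs(jobs, prerequisites):
    """Hypothetical helper: builds the vertices and runs dfs on each job."""
    vertices = {job: JobVertex(job) for job in jobs}
    for before, after in prerequisites:
        vertices[after].prerequisites.append(vertices[before])
    result = []
    for job in jobs:
        dfs(vertices[job], result)
    return result
```

With the example data above, order_jobs([1, 2, 3], [[1, 2], [1, 3], [3, 2]]) returns [1, 3, 2]. Because the traversal follows prerequisite links, each job is appended only after everything it depends on, so no final reversal is needed.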

A topological sort is a linear ordering of nodes in which for every directed edge 'a -> b', 'a' comes before 'b' in the ordering. Since the edges must be directed, topological sorts must be done on directed graphs, and the graphs must also be acyclic (they can't contain cycles). This is a special subset of graphs known as a DAG (Directed Acyclic Graph).
Here is an example of what it looks like; notice that all edges (arrows) go left to right.
A common way to find a topological sort is to use DFS with a temporary stack rather than a queue. Your understanding is close, but not fully there. As DFS runs on the graph, you don't push a node onto the temporary stack until all of the children of that node have been explored as well. This is where the recursive element of the algorithm comes into play: since you don't want to push the node onto the stack until all of its children have been explored, you have to wait until the algorithm has finished running on the children. With a stack, the first node popped off the top will have no edges pointing towards it, and the last node popped off will have no edges coming from it. If we used a queue it would be the other way around.
For housekeeping purposes, you can use a second stack and remove nodes from it once they have been added to the temporary stack. Once all the nodes are on the temporary stack, print them by popping them off one by one, and you will have a topological sort of the given graph.
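The stack-based procedure described above can be sketched in Python as follows. This is only an illustration: the adjacency-dict representation and the name topological_sort are assumptions, and the sketch assumes the input really is a DAG (it does no cycle check).

```python
def topological_sort(graph):
    """graph maps each node to the list of nodes its edges point to."""
    visited = set()
    stack = []  # the temporary stack from the description above

    def visit(node):
        visited.add(node)
        for child in graph.get(node, []):
            if child not in visited:
                visit(child)
        # Pushed only after every child has been fully explored.
        stack.append(node)

    for node in graph:
        if node not in visited:
            visit(node)

    # Popping reverses the finishing order, which gives the sort.
    order = []
    while stack:
        order.append(stack.pop())
    return order


example = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
order = topological_sort(example)  # ['a', 'c', 'b', 'd']
```

In the result, 'a' (no incoming edges) is popped first and 'd' (no outgoing edges) is popped last, matching the description.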

Related

How to detect cycles in directed graph in iterative DFS?

My current project features a set of nodes with inputs and outputs. Each node can take its input values and generate some output values. Those outputs can be used as inputs for other nodes. To minimize the amount of computation needed, node dependencies are checked on application start. When updating the nodes, they are updated in the reverse order they depend on each other.
That said, the nodes resemble a directed graph. I am using iterative DFS (no recursion to avoid stack overflows in huge graphs) to work out the dependencies and create an order for updating the nodes.
I also want to avoid cycles in the graph, because cyclic dependencies will break the updater algorithm and cause an endless loop.
There are recursive approaches to finding cycles with DFS by tracking nodes on the recursion stack, but is there a way to do it iteratively? I could then embed the cycle search in the main dependency resolver to speed things up.

There are plenty of cycle-detection algorithms available online. The simplest ones are augmented versions of Dijkstra's algorithm. You maintain a list of visited nodes and the costs to get there. In your design, replace the "cost" with the path to get there.
In each iteration of the algorithm, you grab the next node on the "active" list and look at each node that follows it in the graph (i.e. each of its dependencies). If that node is already on the path you maintained in getting here, then you have a cycle, and that path shows the loop. (A node that is merely on the "visited" list but was reached along a different path does not by itself indicate a cycle in a directed graph.)
Is that enough to get you moving?
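For the iterative case asked about, one possible sketch in Python uses an explicit stack of iterators plus three node states. This is a commonly used technique swapped in for illustration, not the exact method of the answer above; the name find_cycle and the WHITE/GRAY/BLACK labels are assumptions.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished


def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic.

    graph maps each node to the list of nodes it depends on.
    """
    color = {node: WHITE for node in graph}
    for root in graph:
        if color[root] != WHITE:
            continue
        color[root] = GRAY
        path = [root]                  # nodes on the current DFS path
        stack = [iter(graph[root])]    # one neighbour iterator per path node
        while stack:
            for nxt in stack[-1]:
                state = color.get(nxt, WHITE)
                if state == GRAY:
                    # nxt is on the current path: the path closes a loop.
                    return path[path.index(nxt):] + [nxt]
                if state == WHITE:
                    color[nxt] = GRAY
                    path.append(nxt)
                    stack.append(iter(graph.get(nxt, [])))
                    break  # descend into nxt before the other neighbours
            else:
                # Every neighbour handled: this node is finished.
                color[path.pop()] = BLACK
                stack.pop()
    return None
```

Nodes are marked BLACK in finishing order, so the same loop could also collect the update order while it checks for cycles.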

Try a timestamp. Add a meta timestamp and set it to zero on your nodes.
Previous Answer (non applicable):
When you start a search, increment or grab a time() stamp. Then, when you visit a node, compare it to the current search timestamp. If it is the same, then you have found a cycle. If not, then set the stamp to current.
Next search, increment again.
Ok, this is how I'm assuming you are performing your DFS search:
Add the root node to a stack (for searching) and a vector (for updating).
Pop the stack and add the children of the current node to the stack and to the vector.
Loop until the stack is empty.
Reverse iterate the vector and update the values (by referencing child nodes).
The problem: Cycles will cause the same set of nodes to be added to the stack.
Solution 1: Use a boolean/timestamp to see if the node has been visited before adding to the DFS search stack. This will eliminate cycles, but will not resolve them. You can spit out an error and quit.
Solution 2: Use a timestamp, but increment it each time you pop the stack. If a child node has a timestamp set, and it is less than the current stamp, you have found a cycle. Here's the kicker. When iterating over the values backwards, you can check the timestamps of the child nodes to see if they are greater than the current node. If less, then you've found a cycle, but you can use a default value.
In fact, I think Solution 1 can be resolved the same way by never following more than one child when updating the value and setting all nodes to a default value on start. Solution 2 will give you a warning while evaluating the graph whereas solution 1 only gives you a warning when creating the vector.

Finding optimal order of all nodes to be visited in a graph

The following problem comes from geography, but I don't know of any GIS method to solve it. I think its solution can be found with graph analysis, but I need some guidance to think in the right direction.
There is a geographical area, say a state. It is subdivided into several quadrants, which are subdivided further, and then once again. So it's a tree structure with the state as root and 3 levels of child nodes, each parent having 4 children. But from the perspective of the underlying process it's more like a complete graph, since in theory each node is directly reachable from every other node.
The subdivisions reflect map sheet boundaries at different map scales. Each map sheet has to be reviewed by a topographer in a time span that depends on the complexity of the map contents.
While a map is being reviewed, the underlying digital data is locked in the database. And as the objects have topological relationships with objects of neighboring map sheets (e.g. roads crossing the map boundaries), all 8 surrounding map sheets are locked as well.
The question is: what is the optimal order in which the leaves (on the lowest level) should be visited to satisfy the following requirements:
each node has to be visited
we do not deal with travel times but with the timespan a worker spent at each node (map)
the time spent at a node is different
while a worker is at a node, none of the adjacent nodes can be visited; this also holds for other workers, who cannot work on a map next to a map that is already being processed
if a node has been visited, other nodes having the same parent should be preferred as the next node; this is true for all levels of parents
Finally, for a given number of nodes/maps and workers, we need an ordered series of nodes for each worker to visit that minimizes the overall time, and the time for each parent as well.
After the solution has been designed, the real work begins. We will find that the actual work may need more or less time than expected. Therefore it is necessary to replay the solution up to the current state and design a new solution with slightly different conditions, leading to another order of nodes.
Does anybody have an idea which data structure and which algorithm to use to find a solution for this kind of problem?

I don't have a ready-made algorithm, but maybe the following helps in devising one:
Your exact topology is not clear. I assume from the other remarks that you are targeting a regular structure, in your case a 4x4 square.
The restriction that working on a node blocks every adjacent node can be used to identify a starting condition for the algorithm:
Put a worker in one corner of the total area and then put the others at distance 2 from it (first in the x direction and, as soon as that side is "filled", in the y direction). This will occupy all (x,y) nodes (x,y in 0,2,..,2n where 2n <= size of grid).
With a 4x4 area this allows a maximum of 4 workers, and positions one worker per child node of each level-2 grid node.
From there, let each worker process (x,y), (x,y+1), (x+1,y+1), (x+1,y). These are the 4 nodes of a small square.
If a worker is done but cannot proceed to the next planned node, you may advance it to the next free node from the schedule.
The more workers you have, the higher the risk of contention. If you have any estimates of the expected workload per node, then you may prefer to start with the most expensive ones and arrange the processing sequence to continue with the ones that have the highest total expected costs.
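The starting layout and the per-worker square can be sketched in Python like this. It is a simplified illustration under loud assumptions: the grid side is even, every node takes the same time so all workers advance in lockstep, and the names plan and is_conflict_free are made up for the example.

```python
def plan(size):
    """Return one visiting sequence per worker for a size x size grid.

    Workers start on the even coordinates (0, 0), (0, 2), (2, 0), ...
    and each one walks the four nodes of its own 2x2 square.
    """
    offsets = [(0, 0), (0, 1), (1, 1), (1, 0)]
    starts = [(x, y) for x in range(0, size, 2) for y in range(0, size, 2)]
    return [[(x + dx, y + dy) for dx, dy in offsets] for x, y in starts]


def is_conflict_free(schedule):
    """Check that no two workers are on adjacent nodes at the same step."""
    for step in range(len(schedule[0])):
        cells = [worker[step] for worker in schedule]
        for i in range(len(cells)):
            for j in range(i + 1, len(cells)):
                (x1, y1), (x2, y2) = cells[i], cells[j]
                if max(abs(x1 - x2), abs(y1 - y2)) < 2:
                    return False
    return True


schedule = plan(4)  # 4 workers, 4 steps each, covering all 16 nodes
```

Because every worker applies the same offset at the same step, the distance of 2 between workers is preserved. With differing node times this lockstep breaks down, which is where the "advance to the next free node" rule from the answer would have to take over.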

How to compute the average(or sum) of node values in a network?

Consider a network(graph) of N nodes and each of them is holding a value, how to design a program/algorithm (for each node) that allows each node to compute the average(or sum) of all the node values in the network?
Assumptions are:
Direct communication between nodes is constrained by the graph topology, which is not a complete graph. Any other assumptions that are necessary for your algorithm are allowed. The weakest one I assume is that there's a loop in the graph that contains all the nodes.
N is finite.
N is sufficiently large that you can't store all the values and then compute their average (or sum). For the same reason, you can't "remember" whose values you've received (so you can't just redistribute the values you've received, add those you haven't seen to the buffer, and get a result).
(The tags may not be right, since I don't know which field this kind of problem belongs to, or whether it's some kind of general problem.)

That is an interesting question. Here are some assumptions I've made before I present a partial solution:
The graph is connected (in case of a directed graph, strongly connected)
The nodes only communicate with their direct neighbours
It is possible to hold and send the sum of all numbers; this means the sum either won't exceed a long or you have a sufficiently large data structure that it won't exceed
I'd go with depth-first search. Node N0 would initiate the algorithm and send its value + the count to the first neighbour (N0.1). N0.1 would add its own value + count and forward the message to the next neighbour (N0.1.1). In case the message comes back to either N0 or N0.1, they just forward it to another neighbour of theirs (N0.2 or N0.1.2).
The problem now is to know when to terminate the algorithm. Preferably you want to terminate it as soon as you've reached all nodes, and afterwards just broadcast the final message. In case you know how many nodes there are in the graph, just keep forwarding the message to the next node until every node has been reached. The last node will know that it has been reached (it can compare the count variable with the number of nodes in the graph) and broadcast the message.
If you don't know how many nodes there are, and it's an undirected graph, then it will just be a depth-first search implementation in a graph. This means that if N0.1 gets a message from anyone other than N0.1.1, it will just bounce the message back, as you can't send messages to the parent when performing depth-first search. If it is a directed graph and you don't know the number of nodes, then you either come up with a mathematical model to prove when the algorithm has finished, or you learn the number of nodes.
I've found a paper here, proposing a gossip based algorithm to count the number of nodes in a dynamic network: https://gnunet.org/sites/default/files/Gossipico.pdf maybe that will help. Maybe you can even use it to sum up the nodes.
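As a rough illustration of the depth-first token idea for the undirected case, here is a small single-process simulation in Python. It is a sketch under assumptions: the function name token_sum is made up, the "bounce" is shortcut by checking a neighbour's flag directly, and each node keeps only a visited flag and the neighbour it first received the token from.

```python
def token_sum(graph, values, start):
    """Simulate one token walking an undirected graph depth first.

    The token carries (total, count); graph maps a node to its neighbours.
    """
    visited = {start}        # per-node "I have seen the token" flags
    parent = {start: None}   # who each node first got the token from
    total, count = values[start], 1
    current = start
    while current is not None:
        # Forward the token to the first neighbour that has not seen it.
        nxt = next((n for n in graph[current] if n not in visited), None)
        if nxt is None:
            # Nothing left to explore here: hand the token back.
            current = parent[current]
        else:
            visited.add(nxt)
            parent[nxt] = current
            total += values[nxt]
            count += 1
            current = nxt
    # The token is back at the initiator with the final numbers.
    return total, count


ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
total, count = token_sum(ring, {0: 1, 1: 2, 2: 3, 3: 4}, 0)
average = total / count  # 10 / 4 = 2.5
```

When the token returns to the initiator and no unvisited neighbour remains, the traversal is complete, so the initiator can broadcast the result without knowing the number of nodes in advance.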

How to hash and check for equality of objects with circular references

I have a cyclic graph-like structure that is represented by Node objects.
A Node is either a scalar value (leaf) or a list of n>=1 Nodes (inner node).
Because of the possible circular references, I cannot simply use a recursive HashCode() function, that combines the HashCode() of all child nodes: It would end up in an infinite recursion.
While the HashCode() part at least seems doable by flagging and ignoring already visited nodes, I'm having trouble thinking of a working and efficient algorithm for Equals().
To my surprise I did not find any useful information about this, but I'm sure many smart people have thought about good ways to solve these problems...right?
Example (python):
A = [ 1, 2, None ]; A[2] = A
B = [ 1, 2, None ]; B[2] = B
A is equal to B, because it represents exactly the same graph.
BTW. This question is not targeted to any specific language, but implementing hashCode() and equals() for the described Node object in Java would be a good practical example.

I would like to know a good answer as well. So far I use a solution based on the visited set.
When computing hash, I traverse the graph structure and I keep a set of visited nodes. I do not enter the same node twice. When I hit an already visited node, the hash returns a number without recursion.
This works for the equality comparison as well. I compare the node data and recursively invoke the comparison on the children. When I hit an already visited node, the comparison returns true without recursion.

If you think about this as a graph, a leaf node is a node that has only one connection and a complex node is one with at least 2. Once you look at it that way, implement a simple BFS algorithm which applies the hash function to each node it passes and then drops the result. This way you ensure that you won't get caught in cycles or go through any node more than once.
The implementation could be very straightforward. Read about it here.

You need to walk the graphs.
Here's a question: are these graphs equal?
A = [1,2,None]; A[2] = A
B = [1,2,[1,2,None]]; B[2][2] = B
If so, you need a set of (Node, Node) tuples. Use this set to catch loops, and return 'true' when you catch a loop.
If not, you can be a bit more efficient and use a map from Node to Node. Then, when walking the graphs, build up a set of correspondences. (In the case above, A would correspond to B, A[2] would correspond to B[2], etc.) Then, when you visit a node pair, you make sure the exact pair is in the map; if it isn't, the graphs don't match.
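The first variant (a set of node pairs, returning true when a pair is seen again) can be sketched in Python using nested lists as in the question's example. The name cyclic_equal and the use of id() to identify list objects are assumptions for illustration; the recursion also assumes the structures are not deeper than Python's recursion limit.

```python
def cyclic_equal(a, b, seen=None):
    """Compare two possibly self-referencing nested lists."""
    if seen is None:
        seen = set()
    if isinstance(a, list) and isinstance(b, list):
        pair = (id(a), id(b))
        if pair in seen:
            # We are already comparing this pair further up: treat the
            # loop as matching and let the other branches decide.
            return True
        seen.add(pair)
        if len(a) != len(b):
            return False
        return all(cyclic_equal(x, y, seen) for x, y in zip(a, b))
    if isinstance(a, list) or isinstance(b, list):
        return False  # one inner node, one leaf
    return a == b


A = [1, 2, None]; A[2] = A
B = [1, 2, None]; B[2] = B
C = [1, 2, [1, 2, None]]; C[2][2] = C
```

Here cyclic_equal(A, B) is True, and under this pair-set variant cyclic_equal(A, C) is True as well, which corresponds to answering "yes" to the question above. A hash that is meant to agree with this notion of equality has to give A and C the same value, which is something to keep in mind when implementing hashCode().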

Fixed length path between two graph nodes

Is there an algorithm that will, if given two nodes on a graph, find a route between them that takes the specified number of hops? Any node can be connected to any other.
The points at the moment are located in 2D space, so I'm not sure if a graph is the best approach.

Have you tried iterative-deepening DFS?

If you have nodes and are seeking to find routes in terms of hops, then a graph is probably the right approach. I'm not sure I understand what you are trying to do and what the constraints are, though, especially with respect to "Any node can be connected to any other", which seems a bit open-ended.
Disregarding that, however; with some graph representation:
One option is to start at the first node and do a depth-first search from there, terminating a branch of the search if (a) the number of hops taken is larger than your specified number or (b) you have arrived at the second node. This will find the first (not the only) path connecting the two nodes in at most that many hops.
If it has to be exactly the specified number of hops, terminate any branch of the search once the hops have gone over, and terminate with success only if you have arrived at the second node with exactly that number of hops.
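A minimal Python sketch of the exact-hops variant follows. It assumes an adjacency dict and that a path may not visit the same node twice; both are assumptions, since the question leaves them open, and the name path_with_hops is made up for the example.

```python
def path_with_hops(graph, start, goal, hops, path=None):
    """Return a path from start to goal with exactly `hops` edges, or None."""
    if path is None:
        path = [start]
    if len(path) - 1 == hops:
        # Hop budget used up: succeed only if we are standing on the goal.
        return list(path) if path[-1] == goal else None
    for nxt in graph.get(path[-1], []):
        if nxt in path:
            continue  # assumption: no node may appear twice on a path
        path.append(nxt)
        found = path_with_hops(graph, start, goal, hops, path)
        if found is not None:
            return found
        path.pop()  # backtrack and try the next neighbour
    return None


roads = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}
two = path_with_hops(roads, "a", "d", 2)    # ['a', 'b', 'd']
three = path_with_hops(roads, "a", "d", 3)  # ['a', 'b', 'c', 'd']
```

The search returns the first matching path it encounters, not necessarily the only one.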

Dumb approach (the data structure is an array of stacks): this is basically a breadth-first search (BFS) to depth N, except that if loops are allowed (you did not clarify, but I assume they are), you don't exclude visited nodes from further searching.
Push the starting node onto the stack stored in the array at index 0 (index = depth).
For each level/index "l" from 0 to N:
For each node on the stack stored at level "l", find all its neighbors and push them onto the stack stored at level "l+1".
Important: if your task allows paths that contain loops, do NOT check whether you have already visited a node you add. If it does not allow loops, use a hash of visited nodes so that no node is added twice.
Stop when you have finished level "N-1".
Loop over all the nodes you just added to the stack at index "N" and look for your destination node. If it is found: success; if not, there is no such path.
Please note that if by "every node can be connected" you mean a FULLY CONNECTED graph, then there is a mathematical answer that does not involve actually visiting nodes (however, the formula is too long to write in the text-entry field of StackOverflow).
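The level-by-level procedure above might look like this in Python. One deliberate deviation, named plainly: each level is kept as a set instead of a stack, so that duplicates within a level do not pile up when loops are allowed. The name reachable_in_exactly and the allow_loops flag are assumptions for illustration.

```python
def reachable_in_exactly(graph, start, goal, hops, allow_loops=True):
    """Return True if goal can be reached from start in exactly `hops` hops."""
    levels = [{start}]   # levels[d] holds the nodes reachable at depth d
    visited = {start}    # only consulted when loops are not allowed
    for depth in range(hops):
        nxt = set()
        for node in levels[depth]:
            for neighbour in graph.get(node, []):
                if not allow_loops:
                    if neighbour in visited:
                        continue  # never add a node twice
                    visited.add(neighbour)
                nxt.add(neighbour)
        levels.append(nxt)
    return goal in levels[hops]


triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
back_home = reachable_in_exactly(triangle, "a", "a", 2)  # True: a -> b -> a
```

This only answers whether such a route exists; recovering the route itself would need a predecessor record per level.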
