I've been given the following exercise: There's an unweighted, directed, weakly connected graph with n nodes (n < 1,000,000). We want to traverse the whole graph, starting from as few nodes as possible. The question is: from which nodes should the traversals start? I couldn't find anything on this particular topic, but I came up with an algorithm that isn't efficient enough:
I store the graph in an adjacency list (n can be too high for a two-dimensional matrix)
I start a BFS from each node i, and store the nodes it reached in x[i][...] (x = List<List<int>>)
I check whether any x[i].Count == n
I check whether any (x[i] union x[j]).Count == n
I check whether any (x[i] union x[j] union x[k]).Count == n
... and so on: I form every union of 2, 3, 4, ... of the sets in x and check whether its count is n.
It works fine when n is small, but I need a more efficient algorithm for larger n.
Any help is appreciated (you would help me fall asleep again)! :)
Find the nodes that do not have any incoming edges. Loop over these nodes, and for each node v, begin traversing the graph. Remember which nodes you visited (by putting them in a hash table or marking them). Stop traversing when you reach a node you have already visited.
You would need an adjacency list representation, where each node has a list of incoming and a list of outgoing edges. Then do something like this:
Set roots = emptySet
for i = 1 to n:
    if incoming[i].size() == 0:
        roots.add(i)

Set visited = emptySet
for r in roots:
    Stack toVisit = [r]
    while toVisit is not empty:
        v = toVisit.pop()
        if v is not in visited:
            visit(v)
            visited.add(v)
            for u in outgoing[v]:
                toVisit.push(u)
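One caveat worth adding: the in-degree-zero rule is only complete when every node is reachable from such a node. A cycle that no edge enters from outside (for example 0 -> 1 -> 0) contains no in-degree-zero node, so it would never be traversed. A standard way to cover that case, which goes beyond the answer above, is to collapse strongly connected components (SCCs) and start one traversal from any node of each component that no edge enters from another component. The sketch below assumes nodes are numbered 0..n-1 and edges are given as (u, v) pairs; `min_start_nodes` is a hypothetical name, and both DFS passes (Kosaraju's algorithm) are iterative so they do not hit Python's recursion limit for n near 1,000,000.

```python
def min_start_nodes(n, edges):
    """Return one start node per source SCC of a directed graph on nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    radj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)

    # Pass 1: iterative DFS on the original graph, recording finish order.
    visited = [False] * n
    order = []
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, 0)]  # (node, index of next outgoing edge to try)
        while stack:
            u, i = stack[-1]
            if i < len(adj[u]):
                stack[-1] = (u, i + 1)
                v = adj[u][i]
                if not visited[v]:
                    visited[v] = True
                    stack.append((v, 0))
            else:
                stack.pop()
                order.append(u)

    # Pass 2: DFS on the reversed graph in decreasing finish time labels SCCs.
    comp = [-1] * n
    c = 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            u = stack.pop()
            for v in radj[u]:
                if comp[v] == -1:
                    comp[v] = c
                    stack.append(v)
        c += 1

    # A component is a source if no edge enters it from a different component.
    has_incoming = [False] * c
    for u, v in edges:
        if comp[u] != comp[v]:
            has_incoming[comp[v]] = True

    # Pick the lowest-numbered node of each source component.
    starts = {}
    for u in range(n):
        if not has_incoming[comp[u]] and comp[u] not in starts:
            starts[comp[u]] = u
    return sorted(starts.values())
```

For example, with edges 0 -> 1, 1 -> 0, 1 -> 2 no node has in-degree zero, yet a single traversal from node 0 covers the whole graph.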
I have been practicing graph questions lately.
https://leetcode.com/problems/course-schedule-ii/
https://leetcode.com/problems/alien-dictionary/
The way I currently detect cycles is with two hash sets: one for nodes being visited and one for fully visited nodes. During the DFS traversal, I push each finished node onto a stack.
If I ever visit a node that is currently in the visiting set, there is a cycle.
The code is verbose and long.
Can anyone please explain how I can use a more standard top-sort algorithm (Kahn's) to detect cycles and generate the top sort sequence?
I just want my method to exit or set some global variable which flags that a cycle has been detected.
Many thanks.
Kahn's algorithm with cycle detection (summary)
Step 1: Compute in-degrees: First we compute a lookup of the in-degree of every node. In this particular LeetCode problem, each node has a unique integer identifier, so we can simply store all the in-degree values in a list where indeg[i] is the in-degree of node i.
Step 2: Keep track of all nodes with in-degree zero: If a node has an in-degree of zero, it is a course that we can take right now; it depends on no other courses. We create a queue q of all these nodes. At any step of Kahn's algorithm, a node in q is guaranteed to be "safe to take" because it does not depend on any course we have not taken yet.
Step 3: Delete the node and its edges: We take one of these safe courses x from the queue q and conceptually treat everything as if we had deleted the node x and all its outgoing edges from the graph g. In practice we don't need to modify g; for Kahn's algorithm it is sufficient to decrement the in-degree of each of x's neighbours to reflect that x no longer exists.
This step is basically as if a person took and passed the exam for course x, and now we want to update the other courses' dependencies to show that they don't need to worry about x anymore.
Step 4: Repeat: Removing the edges from x decreases the in-degree of x's neighbours, which can produce more nodes with in-degree zero. Any node whose in-degree becomes zero during this step is added to q, and we repeat Step 3 to process it. Each time we remove a node from q, we append it to the final topological sort list result.
Step 5: Detecting a cycle with Kahn's algorithm: If there is a cycle in the graph, result will contain only some of the nodes, not all of them. To check for a cycle, compare the length of result with the number of nodes in the graph, n: they are equal exactly when the graph has no cycle.
Why does this work?:
Suppose there is a cycle in the graph: x1 -> x2 -> ... -> xk -> x1. None of these nodes will appear in result, because their in-degrees never reach 0 during Kahn's algorithm: a node xi can only enter q after its predecessor on the cycle, x(i-1) (xk in the case of x1), has been taken out of q, so no node on the cycle can ever be the first one to enter q.
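To see this argument in action, here is a minimal sketch (the name `unsorted_nodes` is hypothetical) that runs Kahn's algorithm on nodes 0..n-1, with edges given as (u, v) pairs, and returns the nodes that never reach in-degree zero. Besides the nodes on a cycle, any node reachable from a cycle is also left over, because one of its incoming edges is never removed.

```python
from collections import deque

def unsorted_nodes(n, edges):
    """Run Kahn's algorithm and return the nodes it never emits."""
    out = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:  # edge u -> v
        out[u].append(v)
        indeg[v] += 1
    q = deque(i for i in range(n) if indeg[i] == 0)
    emitted = [False] * n
    while q:
        x = q.popleft()
        emitted[x] = True
        for y in out[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                q.append(y)
    # Whatever is left lies on a cycle or is reachable from one.
    return [i for i in range(n) if not emitted[i]]
```

With edges 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 1, 3 -> 4, only node 0 is emitted: nodes 1, 2 and 3 form the cycle, and node 4 hangs off it.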
Full solution to Leetcode course-schedule-ii in Python 3:
from collections import defaultdict, deque

def build_graph(edges, n):
    # prerequisite pair [a, b] means b must be taken before a: edge b -> a
    g = defaultdict(list)
    for i in range(n):
        g[i] = []
    for a, b in edges:
        g[b].append(a)
    return g

def topsort(g, n):
    # -- Step 1 --
    indeg = [0] * n
    for u in g:
        for v in g[u]:
            indeg[v] += 1
    # -- Step 2 --
    q = deque()
    for i in range(n):
        if indeg[i] == 0:
            q.append(i)
    # -- Step 3 and 4 --
    result = []
    while q:
        x = q.popleft()
        result.append(x)
        for y in g[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                q.append(y)
    return result

def courses(n, edges):
    g = build_graph(edges, n)
    ordering = topsort(g, n)
    # -- Step 5 --
    has_cycle = len(ordering) < n
    return [] if has_cycle else ordering
I have to make an algorithm that finds all the topological orders (using predecessor counting) and the highest-cost paths, with their costs, between 2 pairs of vertices. My algorithm looks like this for now:
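from collections import deque

def topologicalSort(self):
    order = []
    count = {}
    q = deque()
    # count the predecessors of every vertex; vertices with none can go first
    for x in self.parseX():
        count[x] = self.innerDegree(x)
        if count[x] == 0:
            q.append(x)
    while len(q) > 0:
        x = q.popleft()
        order.append(x)
        # "remove" x: each successor now has one fewer unprocessed predecessor
        for y in self.parseNout(x):
            count[y] -= 1
            if count[y] == 0:
                q.append(y)
    return order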
It works fine, but the problem is that it finds only one topological order. So my question is: how can I make it find all the topological orders?
Your loops run in a fixed order. Different topological sorts come from processing the available vertices in different orders. So you need another level of recursion: at each step, try each vertex whose predecessor count is currently zero as the next one, recurse on the rest, and then undo that choice before trying the next vertex.
I'd elaborate, but a cursory search found several pages that appear to describe the algorithm you want (albeit in different languages):
https://www.geeksforgeeks.org/all-topological-sorts-of-a-directed-acyclic-graph/
https://www.techiedelight.com/find-all-possible-topological-orderings-of-dag/
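A minimal backtracking sketch of this idea, assuming the DAG is given as a plain dict mapping each vertex to a list of its successors (not the `parseX`/`parseNout` interface from the question; `all_topological_orders` is a hypothetical name). It keeps the same predecessor counts as the question's code, tries every vertex whose count is zero as the next one, and undoes the choice after recursing. The number of orders can grow factorially, so this is only practical for small graphs.

```python
def all_topological_orders(nodes, out):
    """Yield every topological order of a DAG.

    nodes: iterable of vertices; out: dict mapping a vertex to its successors.
    """
    nodes = list(nodes)
    count = {x: 0 for x in nodes}  # predecessor counts
    for x in nodes:
        for y in out.get(x, ()):
            count[y] += 1
    used = set()
    order = []

    def backtrack():
        if len(order) == len(nodes):
            yield list(order)
            return
        for x in nodes:
            if x not in used and count[x] == 0:
                # Choose x next and "remove" its outgoing edges.
                used.add(x)
                order.append(x)
                for y in out.get(x, ()):
                    count[y] -= 1
                yield from backtrack()
                # Undo the choice before trying the next candidate.
                for y in out.get(x, ()):
                    count[y] += 1
                order.pop()
                used.remove(x)

    yield from backtrack()
```

For example, with edges 0 -> 2 and 1 -> 2, `list(all_topological_orders([0, 1, 2], {0: [2], 1: [2]}))` gives `[[0, 1, 2], [1, 0, 2]]`. If the graph has a cycle, no order is yielded.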
First I'll explain the problem. I have a player in a closed maze filled with items that he should collect to win the game. There is also an opponent trying to do the same; the player who collects the most items wins. Suppose the opponent follows a BFS algorithm to collect the items, and we have access to all its decisions at every turn. Can we predict which items in the maze we should go for first (so that the opponent doesn't get a chance to take the ones close to it), or simply pinpoint a location where the items are denser?
It feels like randomness could also hurt this badly (for example, most of the items landing next to the opponent). What if the opponent follows an A* algorithm?
I have already implemented an A* algorithm for our player. First, I look for the heuristically closest item using Manhattan distance, then I go collect it, look for the new closest one, and so on. I feel like the "look for the closest item" method might not be very efficient; maybe pinpointing (somehow, haha) a location where the items are denser would be better, as I said.
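For reference, the "go to the closest item, then repeat" strategy can be isolated from the path-finding as a short sketch. It assumes cells are (row, col) pairs and uses Manhattan distance only, ignoring walls, so it shows the order in which items would be targeted, not the actual paths. `manhattan_distance` here is an assumed definition and `greedy_item_order` is a hypothetical helper; ties go to the item listed first.

```python
def manhattan_distance(a, b):
    """Grid distance between two (row, col) cells, ignoring walls."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_item_order(start, items):
    """Order in which items are targeted by repeatedly picking the closest one."""
    remaining = list(items)
    pos = start
    order = []
    while remaining:
        nxt = min(remaining, key=lambda item: manhattan_distance(pos, item))
        remaining.remove(nxt)
        order.append(nxt)
        pos = nxt  # continue from the item just collected
    return order
```

Comparing this order with the opponent's predicted BFS order is one way to see which items each player is likely to reach first.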
def astar(start, items, mazeMap):
    # mazeMap is a dictionary with the nodes as keys; each node maps to another
    # dictionary whose keys are its neighbors and whose values are the edge weights
    # items is a list of pairs giving the location of each item
    # closest_item, manhattan_distance and reconstruct_path are assumed to be defined elsewhere

    # Apparent goal
    # goal is a pair (closest_item, distance_to_closest_item)
    goal = closest_item(start, items)
    # Nodes that no longer need to be checked
    # closedSet = {node: [gscore, fscore]}
    closedSet = {}
    # Candidate nodes for the shortest path
    # openSet = {node: [gscore, fscore]}
    openSet = {start: [0, goal[1]]}
    # Predecessor map used to reconstruct the optimal path
    cameFrom = {}
    while openSet:
        # Pick the open node with the smallest fscore
        current = min(openSet, key=lambda node: openSet[node][1])
        # If the chosen node holds an item (a piece of cheese), we are done
        if current in items:
            return reconstruct_path(cameFrom, current)
        # The current node no longer needs to be checked
        closedSet[current] = openSet.pop(current)
        for neighbor, weight in mazeMap[current].items():
            # Skip nodes that have already been finalized
            if neighbor in closedSet:
                continue
            # Cost of reaching neighbor through current
            tentative_gscore = closedSet[current][0] + weight
            if neighbor in openSet and tentative_gscore >= openSet[neighbor][0]:
                continue
            # This new path is better than the previous one, save it!
            cameFrom[neighbor] = current
            openSet[neighbor] = [tentative_gscore,
                                 tentative_gscore + manhattan_distance(neighbor, goal[0])]
    return "Impossible"
I would like to know which node(s) I should delete to maximize the number of isolated nodes in my undirected network.
For instance in the following R script, I would like the result to be H if I delete 1 node and H & U if I delete 2 nodes and so on ...
library(igraph)
graph <- make_graph( ~ A-B-C-D-A, E-A:B:C:D,
G-H-I,
K-L-M-N-K, O-K:L:M:N,
P-Q-R-S-P,
C-I, L-T, O-T, M-S,
C-P, C-L, I-U-V,V-H,U-H,H-W)
plot(graph)
Thanks for your help.
You will want to do something like:
Compute the k-coreness of each node (called Graph.coreness in the Python bindings; I don't know the R equivalent).
Find the node with k-coreness 2 that connects to the largest number of nodes with k-coreness 1.
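For readers unfamiliar with coreness, a minimal sketch of what it computes: repeatedly remove a node of minimum remaining degree; a node's core number is the largest such minimum degree seen up to the moment it is removed. This assumes an undirected graph without self-loops, given as a dict of neighbour sets; `core_numbers` is a hypothetical name, and the simple O(n^2) loop is for illustration only (python-igraph's `Graph.coreness()` is the practical choice).

```python
def core_numbers(adj):
    """Core number of every node, by repeatedly peeling a minimum-degree node.

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    """
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: degree[u])
        k = max(k, degree[v])  # core numbers never decrease along the peeling order
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

In a triangle A-B-C with an extra leaf D attached to A, D gets core number 1 and the triangle nodes get 2, which is the "coreness 2 node with coreness 1 neighbours" pattern described above.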
Edit:
Your counter-example was spot on, so I resorted to brute force (which is still linear time in this case).
This is a brute-force Python implementation that could be optimised (only loop over nodes with k-coreness 1), but it completes in linear time and should be readable even if you don't know Python.
import numpy as np
import igraph

def maximise_damage(graph):
    coreness = graph.coreness()
    # for each node, count its neighbours with coreness 1 (these include all leaves)
    n = graph.vcount()
    number_of_leaves = np.zeros(n)
    for ii in range(n):
        if coreness[ii] == 1:
            # every neighbour of a coreness-1 node gets credit; a leaf has exactly one
            for neighbour in graph.neighbors(ii):
                number_of_leaves[neighbour] += 1
    # rank nodes by number of leaves
    order = np.argsort(number_of_leaves)
    # reverse order such that the first element has the most leaves
    order = order[::-1]
    return order, number_of_leaves[order]
EDIT 2:
Just realised this will not work in general for cases where you want to delete more than 1 node at a time. But I think the general approach would still work -- I will think about it some more.
EDIT 3:
Here we go; still linear. You will need to post-process the output a little, though: some solutions contain fewer nodes than the number you want to delete, and you then have to combine them.
import numpy as np
import igraph

def maximise_damage(graph, delete=1):
    # get vulnerability
    # nodes are vulnerable if their degree is at most
    # the number of nodes that we want to delete
    vulnerability = np.array(graph.degree())
    # create a hash table to keep track of all combinations of nodes to delete
    combinations = dict()
    # loop over vulnerable nodes
    for ii in np.where(vulnerability <= delete)[0]:
        # find neighbours of vulnerable nodes and
        # count the number of vulnerable nodes for that combination
        neighbours = tuple(sorted(graph.neighbors(int(ii))))
        combinations[neighbours] = combinations.get(neighbours, 0) + 1
    # rank combinations by the number of vulnerable nodes dangling from them
    ranked = sorted(combinations.items(), key=lambda item: item[1], reverse=True)
    combinations = [nodes for nodes, _ in ranked]
    counts = [count for _, count in ranked]
    # TODO:
    # some solutions will contain fewer nodes than the number of nodes that we want to delete;
    # combine these solutions
    return combinations, counts
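Since the answer notes that its approach still needs work when several nodes are deleted, an exhaustive check is handy for validating it on small graphs. The sketch below is plain Python rather than igraph: it assumes the undirected graph is a dict mapping each node to a set of neighbours, tries every set of k nodes (`best_deletions` is a hypothetical name), and counts the remaining nodes whose neighbours have all been deleted. Nodes that were isolated to begin with are counted too. Its cost grows with C(n, k), so it is a reference check, not a replacement for the linear approach; ties keep the first set found in sorted order.

```python
from itertools import combinations

def best_deletions(adj, k):
    """Find k nodes whose deletion leaves the most isolated nodes (exhaustive)."""
    nodes = sorted(adj)
    best, best_count = None, -1
    for removed in combinations(nodes, k):
        gone = set(removed)
        # a surviving node is isolated if all of its neighbours were deleted
        isolated = sum(
            1 for v in nodes
            if v not in gone and all(u in gone for u in adj[v])
        )
        if isolated > best_count:
            best, best_count = removed, isolated
    return best, best_count
```

On a small example where H is the hub of three leaves and also connects to a path D-E, deleting H alone isolates 3 nodes, and deleting D and H isolates 4.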
I have a graph G(V,E) with 3,500 nodes and 35,000 edges.
Is there any way I can generate an origin-destination list of all destinations within n (say 4) stops of each node?
I think the function neighborhood() does exactly what you want. Set the order argument to 4 and for each vertex you'll get a vector of vertex ids for the vertices that are at most 4 steps away from it.
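What neighborhood() computes for one vertex can be sketched as a breadth-first search that stops expanding at depth `order`. This is plain Python, not the igraph API: it assumes a dict mapping each node to a list of its out-neighbours, and `within_k_steps` is a hypothetical name. It returns each reachable node with its hop distance, including the source itself at distance 0. Running it from every node gives the full origin-destination list; with 3,500 nodes and 35,000 edges that means 3,500 BFS runs over a small graph.

```python
from collections import deque

def within_k_steps(adj, source, k):
    """Map each node reachable from source in at most k steps to its hop distance."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        if dist[u] == k:
            continue  # do not expand beyond depth k
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist
```

One design note: this follows edge directions. R's neighborhood() ignores direction unless mode = "out" is given, so for a directed origin-destination list that argument matters.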
I figured it out:
Use this property of the adjacency matrix A: the entry in row i and column j of A^k gives the number of (directed or undirected) walks of length k from vertex i to vertex j. So for n stops, construct the n matrices A_1, A_2, ..., A_n with A_k = A^k. Then the union of their nonzero entries (equivalently, the nonzero entries of A_1 + A_2 + ... + A_n) gives the matrix of destinations reachable within n stops from each origin.
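The matrix-power idea can be sketched with NumPy. Walk counts in A^k grow quickly, so this version (`within_n_stops` is a hypothetical name) keeps only the nonzero pattern: after each multiplication by A it converts back to booleans and ORs the result in, which is exactly the union of A_1 through A_n. It assumes a dense square 0/1 adjacency matrix; each int64 copy of a 3,500 x 3,500 matrix takes about 98 MB, and four stops cost three matrix products.

```python
import numpy as np

def within_n_stops(A, n):
    """R[i, j] is True iff j can be reached from i by a walk of 1..n edges.

    A is a square 0/1 adjacency matrix. Only the nonzero pattern of A^k is
    kept, so walk counts never overflow.
    """
    A_int = (np.asarray(A) != 0).astype(np.int64)
    frontier = A_int > 0          # nonzero pattern of A^1
    reach = frontier.copy()
    for _ in range(n - 1):
        # nonzero pattern of A^(k+1), computed from the pattern of A^k
        frontier = (frontier.astype(np.int64) @ A_int) > 0
        reach |= frontier
    return reach
```

Note that R[i, i] is True only if i lies on a cycle of length at most n; zero stops are not counted.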