I would like to know which node(s) I should delete if I want to maximize the number of isolated nodes in my undirected network.
For instance, in the following R script, I would like the result to be H if I delete 1 node, H & U if I delete 2 nodes, and so on...
library(igraph)
graph <- make_graph( ~ A-B-C-D-A, E-A:B:C:D,
                       G-H-I,
                       K-L-M-N-K, O-K:L:M:N,
                       P-Q-R-S-P,
                       C-I, L-T, O-T, M-S,
                       C-P, C-L, I-U-V, V-H, U-H, H-W)
plot(graph)
Thanks for your help.
You will want to do something like:
Compute the k-coreness of each node (the method is called Graph.coreness in the Python bindings; in R it is coreness).
Find the node with k-coreness 2 that connects to the largest number of nodes with k-coreness 1.
EDIT:
Your counter-example was spot on, so I resorted to brute force (which is still linear time in this case).
Below is a brute-force Python implementation. It could be optimised further (e.g. by looping only over nodes with k-coreness 1), but it completes in linear time and should be readable even if you don't know Python.
import numpy as np
import igraph

def maximise_damage(graph):
    coreness = graph.coreness()

    # find the number of leaves for each node
    n = graph.vcount()
    number_of_leaves = np.zeros(n)
    for ii in range(n):
        if coreness[ii] == 1:
            neighbour = graph.neighbors(ii)  # list of length 1
            number_of_leaves[neighbour] += 1

    # rank nodes by number of leaves
    order = np.argsort(number_of_leaves)
    # reverse the order such that the first element has the most leaves
    order = order[::-1]
    return order, number_of_leaves[order]
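For reference, here is one way to try this on the graph from the question (my own hand translation of the R make_graph call into python-igraph, with the function above in scope):

g = igraph.Graph.TupleList([
    ("A","B"), ("B","C"), ("C","D"), ("D","A"),
    ("E","A"), ("E","B"), ("E","C"), ("E","D"),
    ("G","H"), ("H","I"),
    ("K","L"), ("L","M"), ("M","N"), ("N","K"),
    ("O","K"), ("O","L"), ("O","M"), ("O","N"),
    ("P","Q"), ("Q","R"), ("R","S"), ("S","P"),
    ("C","I"), ("L","T"), ("O","T"), ("M","S"),
    ("C","P"), ("C","L"), ("I","U"), ("U","V"),
    ("V","H"), ("U","H"), ("H","W"),
])
order, leaves = maximise_damage(g)
# H should come out on top with 2 leaves (G and W), matching the expected answer
print(g.vs[int(order[0])]["name"], leaves[0])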
EDIT 2:
Just realised this will not work in general for cases where you want to delete more than 1 node at a time. But I think the general approach would still work -- I will think about it some more.
EDIT 3:
Here we go; still linear. You will need to post-process the output a little, though -- some solutions contain fewer nodes than the number of nodes you want to delete, and then you have to combine them (see the sketch after the code).
import numpy as np
import igraph

def maximise_damage(graph, delete=1):
    # get vulnerability:
    # nodes are vulnerable if their degree is at most
    # the number of nodes that we want to delete
    vulnerability = np.array(graph.degree())

    # create a hash table to keep track of all combinations of nodes to delete
    combinations = dict()

    # loop over vulnerable nodes
    for ii in np.where(vulnerability <= delete)[0]:
        # find the neighbours of each vulnerable node and
        # count the vulnerable nodes dangling from that combination
        neighbours = tuple(graph.neighbors(ii))
        if neighbours in combinations:
            combinations[neighbours] += 1
        else:
            combinations[neighbours] = 1

    # rank the combinations by the number of vulnerable nodes dangling from them
    ranked = sorted(combinations, key=combinations.get, reverse=True)
    counts = [combinations[c] for c in ranked]

    # TODO:
    # some solutions will contain fewer nodes than the number of nodes that
    # we want to delete; combine these solutions
    return ranked, counts
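To illustrate the TODO on the same example graph g as above: with delete=2 the top candidates should come out as (H,) with 2 dangling nodes and (U, H) with 1, and those overlap, so merging them gives H & U with 3 isolated nodes. Below is a naive way to do that combination step; it is exponential in the number of candidate tuples (fine here, since those lists are short), and best_deletion_set is my own name for the sketch, not part of the code above:

from itertools import combinations as choose

def best_deletion_set(ranked, counts, delete):
    # try every union of candidate tuples that fits within `delete` nodes
    # and keep the union that isolates the most vulnerable nodes
    best, best_count = None, -1
    for r in range(1, len(ranked) + 1):
        for chosen in choose(range(len(ranked)), r):
            union = set().union(*(ranked[i] for i in chosen))
            count = sum(counts[i] for i in chosen)
            if len(union) <= delete and count > best_count:
                best, best_count = union, count
    return best, best_count

ranked, counts = maximise_damage(g, delete=2)
nodes, isolated = best_deletion_set(ranked, counts, delete=2)
print([g.vs[int(i)]["name"] for i in nodes], isolated)  # ['H', 'U'] (in some order), 3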
I recently began working with R for social network analysis. Everything went well, and up until now I found answers to my questions here or on Google. But not this time!
I am trying to find a way to calculate "vertex reciprocity" (the percentage of reciprocal edges of each actor in the network). In igraph, reciprocity(g) works fine to calculate the reciprocity of the whole network, but it doesn't give me the score per actor. Does anybody know what I could do?
Thank you!
I am going to assume that you have a simple graph, that is, no loops and no multiple links between nodes. In that case it is fairly easy to compute this. What does it mean for a link to be reciprocated? When there is a link from a to b, there is a link back from b to a. That means that there is a path of length two from a to itself: a->b->a.
How many such paths are there? If A is the adjacency matrix, then the entries of AA give the number of paths of length two. We only want the ones from a node to itself, so we want the diagonal of AA. This counts a->b->a as only one path, but you want to count it twice: once for the link a->b and once for b->a. So for each node you can get the number of reciprocated links from 2*diag(AA). To turn that into a percentage, divide by the total number of links to and from a, which is just the degree.
Let me show the computation with an example. Since you do not provide any data, I will use the Enron email data that is available in the igraphdata package. It has loops and multiple links, which I will remove. It also has a few isolated vertices, which I will also remove. That will leave us with a connected, directed graph with no loops.
library(igraph)
library(igraphdata)
data(enron)
enron = simplify(enron)
## remove two isolated vertices
enron = delete_vertices(enron, c(72,118))
Now the reciprocity computation is easy.
EnronAM = as.matrix(as_adjacency_matrix(enron))
Path2 = diag(EnronAM %*% EnronAM)
degree(enron)
VertRecip = 2*Path2 / degree(enron)
Let's check it by walking through one node in detail. I will use node number 1.
degree(enron,1)
[1] 10
ENDS = ends(enron, E(enron))
E(enron)[which(ENDS[,1] == 1)]
+ 6/3010 edges from b72ec54:
[1] 1-> 10 1-> 21 1-> 49 1-> 91 1->104 1->151
E(enron)[which(ENDS[,2] == 1)]
+ 4/3010 edges from b72ec54:
[1] 10->1 21->1 105->1 151->1
Path2[1]
[1] 3
Node 1 has degree 10: 6 edges out and 4 edges in. Path2[1] shows that there are three paths of length 2 from node 1 back to itself:
1->10->1
1->21->1
1->151->1
That makes 6 reciprocated links and 4 unreciprocated links. The vertex reciprocity should be 6/10 = 0.6, which agrees with what we computed above.
VertRecip[1]
[1] 0.6
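For what it's worth, the same diag(AA) trick is just as short in Python with numpy (a sketch of the identical computation, assuming A is the 0/1 adjacency matrix of a simple directed graph with no isolated vertices):

import numpy as np

def vertex_reciprocity(A):
    # number of reciprocated partners of each node: the diagonal of A x A
    path2 = np.diag(A @ A)
    # total degree of each node: in-degree plus out-degree
    degree = A.sum(axis=0) + A.sum(axis=1)
    return 2 * path2 / degree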
I have been practicing graph questions lately.
https://leetcode.com/problems/course-schedule-ii/
https://leetcode.com/problems/alien-dictionary/
The current way I detect cycles is to use two hash sets: one for nodes currently being visited and one for fully visited nodes. I push the result onto a stack during a DFS traversal.
If I ever visit a node that is currently in the visiting set, then there is a cycle.
The code is pretty verbose and long.
Can anyone please explain how I can use a more standard top-sort algorithm (Kahn's) to detect cycles and generate the top sort sequence?
I just want my method to exit or set some global variable which flags that a cycle has been detected.
Many thanks.
Kahn's algorithm with cycle detection (summary)
Step 1: Compute in-degrees: First we compute a lookup table for the in-degree of every node. In this particular Leetcode problem, each node has a unique integer identifier, so we can simply store all the in-degree values in a list where indegree[i] tells us the in-degree of node i.
Step 2: Keep track of all nodes with in-degree of zero: If a node has an in-degree of zero, it is a course that we can take right now; there are no other courses that it depends on. We create a queue q of all these nodes. At any step of Kahn's algorithm, if a node is in q then it is guaranteed to be "safe to take this course", because it does not depend on any courses that we have not taken yet.
Step 3: Delete the node and its edges: We take one of these special safe courses x from the queue q and conceptually treat everything as if we had deleted the node x and all its outgoing edges from the graph g. In practice we don't need to update the graph g: for Kahn's algorithm it is sufficient to update the in-degree values of its neighbours to reflect that this node no longer exists.
This step is basically as if a person took and passed the exam for course x, and now we want to update the other courses' dependencies to show that they don't need to worry about x anymore.
Step 4: Repeat: When we remove these edges from x, we decrease the in-degree of x's neighbours; this can introduce more nodes with an in-degree of zero. If any node's in-degree becomes zero during this step, it is added to q, and we repeat step 3 to process it. Each time we remove a node from q, we add it to the final topological sort list result.
Step 5: Detecting a cycle with Kahn's algorithm: If there is a cycle in the graph, then result will not include all the nodes in the graph; it will contain only some of them. To check for a cycle, you just need to check whether the length of result equals the number of nodes in the graph, n.
Why does this work?
Suppose there is a cycle in the graph: x1 -> x2 -> ... -> xn -> x1. Then none of these nodes will appear in result, because their in-degrees will never reach 0 during Kahn's algorithm: each node xi in the cycle can't be put into the queue q, because there is always some other predecessor node x_(i-1) with an edge going from x_(i-1) to xi preventing this from happening.
Full solution to Leetcode course-schedule-ii in Python 3:
from collections import defaultdict

def build_graph(edges, n):
    g = defaultdict(list)
    for i in range(n):
        g[i] = []
    for a, b in edges:
        g[b].append(a)
    return g

def topsort(g, n):
    # -- Step 1 --
    indeg = [0] * n
    for u in g:
        for v in g[u]:
            indeg[v] += 1
    # -- Step 2 --
    q = []
    for i in range(n):
        if indeg[i] == 0:
            q.append(i)
    # -- Step 3 and 4 --
    result = []
    while q:
        x = q.pop()
        result.append(x)
        for y in g[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                q.append(y)
    return result

def courses(n, edges):
    g = build_graph(edges, n)
    ordering = topsort(g, n)
    # -- Step 5 --
    has_cycle = len(ordering) < n
    return [] if has_cycle else ordering
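A quick sanity check on two made-up prerequisite lists (the exact ordering printed can vary with the queue discipline):

print(courses(4, [[1, 0], [2, 0], [3, 1], [3, 2]]))  # one valid order, e.g. [0, 2, 1, 3]
print(courses(2, [[0, 1], [1, 0]]))                  # [] -- the cycle is detected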
First I'll explain the problem. I have a player in a closed maze filled with items that he should collect to win the game. We also have an opponent that tries to do just the same. The player with the largest number of items collected wins. Suppose the opponent follows a BFS algorithm to collect the items, and we have access to all its decisions every turn. Can we make some prediction about which items in the maze we should go for first (so it doesn't get a chance to grab the ones close to it), or just pinpoint a location where the items are denser?
It feels like randomness could also affect this very badly (most of the items land next to the opponent, for example). What about if the opponent follows an A* algorithm?
I have already implemented an A* algorithm for our player. First, I look for the closest item heuristically using Manhattan distance, then I go collect it and look for the new closest one again, and so on. I feel like the "look for the closest item" method might not be that efficient; maybe pinpointing (somehow, haha) a location where the items are denser is better, as I said (there is a rough sketch of that idea after the code).
def astar(start, items, mazeMap):
    # mazeMap is a dictionary keyed by node; the value associated with
    # every node is another dictionary containing the neighbours as keys
    # and the weights of the edges to them as values
    # items is a list of pairs giving the location of each item

    # Apparent goal
    # goal is a pair (closest_item, distance_to_closest_item)
    goal = closest_item(start, items)

    # Set of nodes that no longer need to be checked
    # closedSet = {node: [gscore, fscore]}
    closedSet = {}

    # Set of potential short-path nodes
    # openSet = {node: [gscore, fscore]}
    openSet = {start: [0, goal[1]]}

    # Mapping used to reconstruct the optimal path
    cameFrom = {}

    while len(openSet) > 0:
        # Look for the node with the smallest fscore
        current = next(iter(openSet))
        for node, scores in openSet.items():
            if scores[1] < openSet[current][1]:
                current = node

        # If the chosen node is an item of cheese, we are done
        if current in items:
            return reconstruct_path(cameFrom, current)

        # The current node no longer needs to be checked
        closedSet[current] = openSet[current]
        del openSet[current]

        for neighbour, weight in mazeMap[current].items():
            # We don't need to check the node if it's already been done
            if neighbour in closedSet:
                continue
            # Calculate the tentative gscore
            tentative_gscore = closedSet[current][0] + weight
            if neighbour not in openSet:
                openSet[neighbour] = [0, 0]
            elif tentative_gscore >= openSet[neighbour][0]:
                continue
            # This new path is better than the previous one, save it!
            cameFrom[neighbour] = current
            openSet[neighbour][0] = tentative_gscore
            openSet[neighbour][1] = tentative_gscore + manhattan_distance(neighbour, goal[0])

    return "Impossible"
I have a directed graph (grafopri1fase1). The graph has no loops and it has a tree structure (not a binary tree).
I have an array of nodes (meterdiretti) that I have extracted from the graph (grafopri1fase1) by matching a condition.
I would like to know, for each node of meterdiretti, how many nodes lie under it.
The result I would like to have is a matrix with the following format:
first column        second column
meterdiretti[1]     total number of nodes reachable starting from meterdiretti[1]
meterdiretti[2]     total number of nodes reachable starting from meterdiretti[2]
...
meterdiretti[n]     total number of nodes reachable starting from meterdiretti[n]
Taking a punt at what you want -- it would be good if you could add a reproducible example to your question.
I think what you want is to count the descendants of a node. You can do this with neighborhood.size and the mode="out" argument.
library(igraph)
# create a random graph
g <- graph.tree(17, children = 2)
plot(g, layout=layout.reingold.tilford)
# test on a single node
neighborhood.size(g, vcount(g), 1, "out") - 1
# [1] 16

# apply over a few nodes
neighborhood.size(g, vcount(g), c(1, 4, 7), "out") - 1
# [1] 16  4  2
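In case anyone needs the same thing from the Python bindings: python-igraph exposes the equivalent call as neighborhood_size. A sketch on the same example (the directed "out" tree mirrors the R graph.tree call; vertex ids are 0-based here):

import igraph

# same 17-node, two-children example tree, explicitly directed root -> leaves
g = igraph.Graph.Tree(17, 2, "out")
sizes = g.neighborhood_size(order=g.vcount(), mode="out")
descendants = [s - 1 for s in sizes]  # subtract the node itself
print(descendants[0], descendants[3], descendants[6])  # 16 4 2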
I've been given the following exercise: there's an unweighted, directed, weakly connected graph with n nodes (n < 1 000 000). We want to traverse the whole graph, starting the traversals from the fewest possible nodes. The question is: from which nodes do I start? I couldn't find any content on this particular topic, but I managed to come up with an algorithm; it's just not efficient enough:
I store the graph in an adjacency list (n can be too high for a two-dimensional matrix)
I start a BFS from each node i, and store the nodes it reached in x[i][...] (x = List<List<int>>)
I check whether any x[i].Count == n
I check whether any (x[i] union x[j]).Count == n
I check whether any (x[i] union x[j] union x[k]).Count == n
... So I make all possible unions of 2, 3, 4... subsets of x, and check whether their count is n.
It works all right if n is not too high, but I would need a more efficient algorithm for bigger n.
Any help is appreciated (you would make me be able to fall asleep again)! :)
Find the nodes that do not have any incoming edges. Loop over these nodes, and for each node v, begin traversing the graph. Remember which nodes you visited (by putting them in a hash table or marking them). Stop traversing when you reach a node you have already visited.
You would need an adjacency list representation, where each node has a list of incoming and a list of outgoing edges. Then do something like this (written out in Python here):
def traversal_starts(incoming, outgoing, n):
    # every node without incoming edges has to be a starting point
    nodes_to_visit = [v for v in range(n) if len(incoming[v]) == 0]
    starts = list(nodes_to_visit)

    # traverse from all starting points, stopping at already-visited nodes
    visited = set()
    while nodes_to_visit:
        v = nodes_to_visit.pop()
        if v not in visited:
            visited.add(v)  # "visit" v
            for u in outgoing[v]:
                nodes_to_visit.append(u)
    return starts, visited
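A quick check on a toy graph (the adjacency lists here are made up for illustration):

incoming = {0: [], 1: [0], 2: [1, 3], 3: [], 4: [2]}
outgoing = {0: [1], 1: [2], 2: [4], 3: [2], 4: []}

starts, visited = traversal_starts(incoming, outgoing, 5)
print(starts)        # [0, 3] -- start the traversals from these nodes
print(len(visited))  # 5 -- the whole graph gets covered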