Get node descendants in a tree graph - r

I have a directed graph (grafopri1fase1). The graph has no loops and has a tree structure (not a binary tree).
I have an array of nodes (meterdiretti) that I have extracted from the graph grafopri1fase1 by matching a condition.
For each node in meterdiretti, I would like to know how many nodes lie below it.
The result I would like is a matrix in the following format:
first column ------------ second column
meterdiretti[1] -------- total number of nodes reachable from meterdiretti[1]
meterdiretti[2] -------- total number of nodes reachable from meterdiretti[2]
....
meterdiretti[n] -------- total number of nodes reachable from meterdiretti[n]

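Taking a punt at what you want; it would be good if you could add a reproducible example to your question.
I think what you want is to count the descendants of a node. You can do this with neighborhood.size and the mode="out" argument. Setting the order to vcount(g) guarantees that the neighborhood reaches every descendant, however deep, and subtracting 1 removes the node itself from the count.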
library(igraph)
# create an example tree: 17 vertices, each with (up to) 2 children
g <- graph.tree(17, children = 2)
plot(g, layout=layout.reingold.tilford)
# test on a single node
neighborhood.size(g, vcount(g), 1, "out") - 1
# [1] 16
# apply over a few nodes
neighborhood.size(g, vcount(g), c(1,4,7), "out") - 1
# [1] 16  4  2
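As a language-neutral illustration of what neighborhood.size(..., "out") - 1 counts, here is a minimal Python sketch using only the standard library. It assumes the tree is given as a hypothetical `children` dictionary mapping each vertex to its child vertices (edges point from parent to child); `count_descendants` is an illustrative name, not an igraph function. Each query node is expanded breadth-first and the start node is excluded from its own count, which yields the two-column result asked for in the question.

```python
from collections import deque


def count_descendants(children, nodes):
    """Return (node, number of descendants) for each node in `nodes`.

    `children` maps a vertex to its child vertices (edges point parent -> child).
    The `seen` set ensures each vertex is counted at most once per query node.
    """
    result = []
    for start in nodes:
        seen = {start}
        queue = deque([start])
        while queue:  # breadth-first walk over everything below `start`
            v = queue.popleft()
            for child in children.get(v, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        result.append((start, len(seen) - 1))  # exclude `start` itself
    return result


# Same shape as graph.tree(17, children = 2): vertex i has children 2i and 2i + 1
tree = {i: [c for c in (2 * i, 2 * i + 1) if c <= 17] for i in range(1, 18)}
print(count_descendants(tree, [1, 4, 7]))  # [(1, 16), (4, 4), (7, 2)]
```

Each pair plays the role of one row of the requested matrix: the node in the first column and its descendant count in the second.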

Related

Vertex reciprocity - social network analysis on r

I recently began using R for social network analysis. Everything has gone well, and until now I have found answers to my questions here or on Google. But not this time!
I am trying to find a way to calculate "vertex reciprocity" (the percentage of reciprocated edges for each actor in the network). In igraph, reciprocity(g) works fine for the reciprocity of the whole network, but it doesn't give me a score per actor. Does anybody know what I could do?
Thank you!
I am going to assume that you have a simple graph, that is, one with no loops and no multiple links between nodes. In that case it is fairly easy to compute this. What does it mean for a link to be reciprocated? When there is a link from a to b, there is also a link back from b to a. That means there is a path of length two from a back to itself: a->b->a. How many such paths are there? If A is the adjacency matrix, then the entries of the matrix product A %*% A give the number of paths of length two. We only want the paths from a node back to itself, so we want the diagonal of A %*% A. This counts a->b->a as only one path, but you want to count it twice: once for the link a->b and once for the link b->a. So for each node, the number of reciprocated links is 2*diag(A %*% A). You then divide by the total number of links to and from a, which is just its total (in plus out) degree.
Let me show the computation with an example. Since you do not provide any data, I will use the Enron email data that is available in the 'igraphdata' package. It has loops and multiple links, which I will remove. It also has a few isolated vertices, which I will also remove. That will leave us with a connected, directed graph with no loops.
library(igraph)
library(igraphdata)
data(enron)
enron = simplify(enron)
## remove two isolated vertices
enron = delete_vertices(enron, c(72,118))
Now the reciprocity computation is easy.
EnronAM = as.matrix(as_adjacency_matrix(enron))
Path2 = diag(EnronAM %*% EnronAM)
degree(enron)
VertRecip = 2*Path2 / degree(enron)
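The same per-vertex quantity can also be computed directly from the edge list, without forming the adjacency matrix. This is a minimal Python sketch, assuming a simple directed graph given as (tail, head) pairs (duplicates are collapsed and self-loops skipped, as simplify() would do); `vertex_reciprocity` is a hypothetical helper, not part of igraph. Every edge whose reverse is also present credits both of its endpoints, so each mutual pair adds 2 to each endpoint, matching 2*diag(A %*% A).

```python
def vertex_reciprocity(edges):
    """Share of each vertex's (in + out) edges that are reciprocated.

    Assumes a simple directed graph given as (tail, head) pairs; duplicate
    pairs are collapsed and self-loops are ignored.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    degree = {}
    reciprocated = {}
    for u, v in edge_set:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if (v, u) in edge_set:
            # u->v has a partner v->u: this edge counts as reciprocated for both ends
            reciprocated[u] = reciprocated.get(u, 0) + 1
            reciprocated[v] = reciprocated.get(v, 0) + 1
    return {x: reciprocated.get(x, 0) / degree[x] for x in degree}


# Only the edges touching node 1 in the Enron walk-through below
node1_edges = [(1, 10), (1, 21), (1, 49), (1, 91), (1, 104), (1, 151),
               (10, 1), (21, 1), (105, 1), (151, 1)]
print(vertex_reciprocity(node1_edges)[1])  # 0.6
```

The set lookup makes each reciprocity check constant time on average, so the whole computation is linear in the number of edges.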
Let's check it by walking through one node in detail. I will use node number 1.
degree(enron,1)
[1] 10
ENDS = ends(enron, E(enron))
E(enron)[which(ENDS[,1] == 1)]
+ 6/3010 edges from b72ec54:
[1] 1-> 10 1-> 21 1-> 49 1-> 91 1->104 1->151
E(enron)[which(ENDS[,2] == 1)]
+ 4/3010 edges from b72ec54:
[1] 10->1 21->1 105->1 151->1
Path2[1]
[1] 3
Node 1 has degree 10: 6 edges out and 4 edges in. Path2 shows that there are three paths of length 2 from 1 back to itself:
1->10->1
1->21->1
1->151->1
That makes 6 reciprocated links and 4 unreciprocated links. The vertex reciprocity should be 6/10 = 0.6 which agrees with what we computed above.
VertRecip[1]
[1] 0.6

Maximize the number of isolated nodes in a network

I would like to know which node(s) I should delete if I want to maximize the number of isolated nodes in my undirected network.
For instance in the following R script, I would like the result to be H if I delete 1 node and H & U if I delete 2 nodes and so on ...
library(igraph)
graph <- make_graph( ~ A-B-C-D-A, E-A:B:C:D,
G-H-I,
K-L-M-N-K, O-K:L:M:N,
P-Q-R-S-P,
C-I, L-T, O-T, M-S,
C-P, C-L, I-U-V,V-H,U-H,H-W)
plot(graph)
Thanks for your help.
You will want to do something like:
Compute the k-coreness of each node (called Graph.coreness() in the Python bindings; I don't know about R).
Find the node with k-coreness 2 that connects to the largest number of nodes with k-coreness 1.
Edit:
Your counter-example was spot on, so I resorted to brute force (which is still linear time in this case).
This is a brute-force Python implementation that could be optimised (only loop over nodes with k-coreness 1), but it completes in linear time and should be accessible even if you don't know Python.
import numpy as np
import igraph

def maximise_damage(graph):
    coreness = graph.coreness()
    # find number of leaves for each node
    n = graph.vcount()
    number_of_leaves = np.zeros(n)
    for ii in range(n):
        # a leaf has coreness 1 and exactly one neighbour
        if coreness[ii] == 1 and graph.degree(ii) == 1:
            neighbour = graph.neighbors(ii)  # list of length 1
            number_of_leaves[neighbour] += 1
    # rank nodes by number of leaves
    order = np.argsort(number_of_leaves)
    # reverse order such that the first element has the most leaves
    order = order[::-1]
    return order, number_of_leaves[order]
EDIT 2:
Just realised this will not work in general for cases where you want to delete more than 1 node at a time. But I think the general approach would still work -- I will think about it some more.
EDIT 3:
Here we go; still linear. You will need to process the output a little, though: some solutions contain fewer nodes than the number of nodes you want to delete, and then you have to combine them.
import numpy as np
import igraph

def maximise_damage(graph, delete=1):
    # get vulnerability:
    # nodes are vulnerable if their degree is at most
    # the number of nodes that we want to delete
    vulnerability = np.array(graph.degree())
    # create a hash table to keep track of all combinations of nodes to delete
    combinations = dict()
    # loop over vulnerable nodes
    for ii in np.where(vulnerability <= delete)[0]:
        # find neighbours of vulnerable nodes and
        # count the number of vulnerable nodes for that combination
        neighbours = tuple(sorted(graph.neighbors(int(ii))))
        combinations[neighbours] = combinations.get(neighbours, 0) + 1
    # rank combinations by number of vulnerable nodes dangling from them
    ranked = sorted(combinations.items(), key=lambda item: item[1], reverse=True)
    combinations = [combo for combo, _ in ranked]
    counts = [count for _, count in ranked]
    # TODO:
    # some solutions will contain fewer nodes than the number of nodes that we want to delete;
    # combine these solutions
    return combinations, counts
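For small graphs it can help to check the heuristic above against an exhaustive search. The sketch below is a swapped-in technique, not the answer's method: it tries every set of `k` vertices with itertools.combinations and counts the surviving vertices that are left with no surviving neighbours. Its cost grows with the number of k-subsets, so it is only practical for small graphs and small k. The function names are hypothetical, and the edge list is the graph from the question transcribed by hand.

```python
from itertools import combinations


def count_isolated(adj, deleted):
    """Count surviving vertices whose neighbours have all been deleted."""
    deleted = set(deleted)
    return sum(1 for v, nbrs in adj.items() if v not in deleted and not (nbrs - deleted))


def best_deletion(edges, k):
    """Try every k-subset of vertices; return (best subset, isolated count)."""
    adj = {}
    for u, v in edges:  # undirected: record the edge in both directions
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    best, best_count = None, -1
    for combo in combinations(sorted(adj), k):
        count = count_isolated(adj, combo)
        if count > best_count:  # ties keep the first subset found
            best, best_count = combo, count
    return best, best_count


# The graph from the question, written out as an undirected edge list
QUESTION_EDGES = [
    ("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"),
    ("E", "A"), ("E", "B"), ("E", "C"), ("E", "D"),
    ("G", "H"), ("H", "I"),
    ("K", "L"), ("L", "M"), ("M", "N"), ("N", "K"),
    ("O", "K"), ("O", "L"), ("O", "M"), ("O", "N"),
    ("P", "Q"), ("Q", "R"), ("R", "S"), ("S", "P"),
    ("C", "I"), ("L", "T"), ("O", "T"), ("M", "S"),
    ("C", "P"), ("C", "L"), ("I", "U"), ("U", "V"),
    ("V", "H"), ("U", "H"), ("H", "W"),
]

print(best_deletion(QUESTION_EDGES, 1))  # (('H',), 2)
print(best_deletion(QUESTION_EDGES, 2))  # (('H', 'U'), 3)
```

On the example this reproduces the answers the question expects: deleting H isolates G and W, and deleting H and U additionally isolates V.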

How to create graph with large number of points in R?

I have a large dataset containing more than 25,000 nodes, stored in a .csv file. The structure is similar to the following:
node freq
a 3
b 2
c 5
I want to create a graph from these nodes in which the edges between nodes are determined by a function of the freq column. I have used the rgraph function from the sna package, as follows:
num_nodes <- length(data$node)
pLink = data$freq/10
# create 1 graph with nodes and link probability, graph loops = FALSE
graph_adj= rgraph(num_nodes,1,pLink,"graph",FALSE)
graph <- graph.adjacency(graph_adj, mode="undirected")
The code above works for a small number of nodes, but with a large number of nodes the R session aborts with the following error:
Error: C stack usage 19924416 is too close to the limit
Is there another way to create a graph with these properties: a large number of nodes, with edges created at random according to these probabilities?

Generate directed lattice with equal number of neighbours for each node

I would like to generate a lattice with 100 nodes but would like to ensure that all nodes have the same number of neighbours.
However when I do:
d=graph.lattice(100,0,nei=10,directed=TRUE,circular=TRUE)
get.edgelist(d)
then I can see that many of the nodes do not have the same number of neighbours.
Is there any way to ensure that every node has the same number of connections assuming that the first column represents nodes and the second column connections?
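This is because the default edge directions that graph.lattice uses are not well suited to directed graphs. Instead, you can create an undirected graph and then convert it to a directed one. With its default mode ("mutual"), as.directed replaces each undirected edge with a pair of opposite directed edges, so every node ends up with in-degree and out-degree 20: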
d <- as.directed(graph.lattice(100, 0, nei=10, directed=FALSE, circular=TRUE))
unique(degree(d, mode="in"))
# [1] 20
unique(degree(d, mode="out"))
# [1] 20
If you want non-mutual edges, then the easiest (but somewhat less readable) solution is
d <- graph(sapply(1:100, function(i) {
rbind(i, ((i+1):(i+10)-1) %% 100 + 1)
}))
unique(degree(d, mode="in"))
# [1] 10
unique(degree(d, mode="out"))
# [1] 10
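The sapply expression above links each vertex i to the next 10 vertices around a ring, wrapping from 100 back to 1. A minimal Python sketch of the same construction (standard library only; `circular_directed_lattice` is a hypothetical name) makes the index arithmetic explicit and checks the degrees. It assumes k < n, so there are no self-loops or duplicate edges.

```python
from collections import Counter


def circular_directed_lattice(n, k):
    """Edges (i, j) where each vertex i in 1..n points to the next k vertices on a ring.

    Mirrors the R expression ((i+1):(i+k) - 1) %% n + 1, so vertex n wraps to 1.
    """
    return [(i, (i + j - 1) % n + 1) for i in range(1, n + 1) for j in range(1, k + 1)]


edges = circular_directed_lattice(100, 10)
out_degree = Counter(u for u, _ in edges)
in_degree = Counter(v for _, v in edges)
print(set(out_degree.values()), set(in_degree.values()))  # {10} {10}
```

Every vertex receives edges from exactly the k vertices that precede it on the ring, which is why the in-degrees come out equal as well as the out-degrees.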
Alternatively, you can create an edge list and build the graph from that. Assuming that you only count the nodes a node links to (its out-neighbours), you could do something like:
el <- do.call(rbind,
lapply(1:100,
function(e) {cbind(rep(e,10),
sample(setdiff(1:100, e),10))}))
d <- graph.edgelist(el)
This links each node to 10 randomly chosen other nodes, so every node has exactly 10 outgoing edges; the in-degrees, however, will vary.

How to copy a vertex with its respective edges (all/in/out) from a directed graph g to a new directed graph g1?

Is there a method or class in igraph that does this quickly and efficiently?
Let's assume that your graph is in g and the set of vertices to be used is in sampled (which is a vector consisting of zero-based vertex IDs).
First, we select the set of edges where at least one endpoint is in sampled:
all.vertices <- (1:vcount(g)) - 1
es <- E(g)[ sampled %--% all.vertices ]
es is now an "edge sequence" object that consists of the edges of interest. Next, we take the edge list of the graph (which is an m x 2 matrix) and select the rows corresponding to the edges:
el <- get.edgelist(g)[as.vector(es)+1, , drop=FALSE]
Here, as.vector(es) converts the edge sequence into a vector of the IDs of the edges it contains, and we use those IDs to select the corresponding rows of the edge list. Note that we have to add 1 to the edge IDs because R vectors are indexed from 1, whereas igraph edge IDs start from zero.
Next, we construct the result from the edge list. graph() expects the edges as a single vector of endpoint pairs, so we transpose the matrix before flattening it:
g1 <- graph(as.vector(t(el)), n=vcount(g), directed=is.directed(g))
Note that g1 will contain exactly as many vertices as g. You can take the subgraph consisting of the sampled vertices as follows:
g1 <- subgraph(g1, sampled)
Note to users of igraph 0.6 and above: igraph 0.6 will switch to 1-based indexing instead of 0-based, so there is no need to subtract 1 from all.vertices and there is no need to add 1 to as.vector(es). Furthermore, igraph 0.6 will contain a function called subgraph.edges, so one could simply use this:
g1 <- subgraph.edges(g, es)
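The edge selection itself is just a filter on the edge list. Below is a minimal Python sketch (standard library only), assuming the directed graph is a list of (tail, head) pairs; `incident_edges` and its `mode` parameter are hypothetical names chosen to mirror the all/in/out wording of the question, not an igraph API.

```python
def incident_edges(edges, sampled, mode="all"):
    """Edges of a directed graph that touch the sampled vertices.

    mode="out": tail is sampled; mode="in": head is sampled; mode="all": either.
    """
    sampled = set(sampled)
    keep = {
        "out": lambda u, v: u in sampled,
        "in": lambda u, v: v in sampled,
        "all": lambda u, v: u in sampled or v in sampled,
    }[mode]
    return [(u, v) for u, v in edges if keep(u, v)]


edges = [(0, 1), (1, 2), (2, 0), (3, 1)]
print(incident_edges(edges, [1], "out"))  # [(1, 2)]
print(incident_edges(edges, [1], "in"))   # [(0, 1), (3, 1)]
print(incident_edges(edges, [1]))         # [(0, 1), (1, 2), (3, 1)]
```

Keeping only the edges whose endpoints are both sampled would instead correspond to the induced subgraph that subgraph(g1, sampled) produces.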
