I am new to R/igraph. I would like to remove N nodes randomly from a graph. However, I could not find the right way to do that. I have generated an Erdős-Rényi graph with 400 vertices using the igraph package.
igraph provides a way to delete vertices, but not at random.
For example: delete.vertices(graph, v).
I referred to this documentation.
I also searched the web and previous questions on Stack Overflow, but could not get the right answer.
Can anyone please tell me, or refer me to documentation on, how to remove N (let's say N = 100) random nodes?
Basically you just need to draw 100 distinct vertex indices from 1 to 400. sample() draws whole numbers without replacement, so no vertex is picked twice:
random.deletes <- sample(1:400, 100)
And then apply it:
my.new.graph <- delete.vertices(graph, random.deletes)
Of course, both can be done at once but you'd lose track of the deleted nodes:
my.new.graph <- delete.vertices(graph, sample(1:400, 100))
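As a self-contained illustration of the steps above, here is a minimal sketch that builds a 400-vertex Erdős-Rényi graph and removes 100 vertices at random. The edge probability 0.05, the seed and the variable n.remove are assumptions made for the example; sample_gnp() and delete_vertices() are the newer igraph names for erdos.renyi.game() and delete.vertices().

```r
library(igraph)

set.seed(1)                                   # assumed seed, for reproducibility only
graph <- sample_gnp(400, 0.05)                # assumed edge probability
n.remove <- 100

# sample() draws distinct integer vertex ids, so each vertex is removed at most once
random.deletes <- sample(vcount(graph), n.remove)
my.new.graph <- delete_vertices(graph, random.deletes)

vcount(my.new.graph)                          # 300 vertices remain
```

Note that igraph renumbers the remaining vertices from 1 after a deletion, so set a name attribute beforehand if you need to trace them back to the original graph.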
I have generated an undirected regular graph with an even number of nodes with the same degree, e.g. k, by using the function k.regular.game of the R package igraph.
Now I need to iteratively add one edge to each node, so that in each iteration the degree remains constant for every node and it is equal to k + i, where i is the number of iterations performed.
In addition, I want connections to be preserved in each iteration, that is: the set of neighbors of agent j for iteration i + 1 should be the same as the set of neighbors of agent j for iteration i, plus one new connection: e.g., if j is connected to w and y when k = 2, j must be connected to w, y and z when k = 3.
My final goal is to obtain (n-1) graphs, where n is equal to the number of nodes in the regular graph. As a result, I will obtain that the first generated graph has k = 1 and the last generated graph has k = (n-1).
Any suggestion on how to do this?
This is a nice network problem. Two partial solutions follow.
Let's imagine there is a function which would bring a graph g from all degrees being 1 to all degrees being 2. It would have to be a graph with an even number of nodes.
increment.k <- function(g){}
It follows that increment.k will increase the degree of each node by one by adding |V|/2 edges to the graph - one edge for each two nodes in the graph. From what I understand of your problem specification, none of those edges may connect two nodes that are already connected. This makes increment.k() a puzzle in which a random edge between two nodes might close off the possibility for all nodes to reach the new degree k. What if a graph has k=1 and we start adding edges at random, only to find at the last edge that the only two nodes still with degree 1 are already connected?!
I cannot tell intuitively whether there are graphs that cannot be incremented at all, because no combination of edges yields |V|/2 new edges between previously unconnected nodes. But I can imagine that such graphs exist.
I've done this example on a graph with 20 nodes (which consequently can have a k between 1 and 19):
library(igraph)
g <- k.regular.game(no.of.nodes=20, k=1, directed=FALSE)
What if you were to generate random k.regular.games with a higher k until you found one whose edges include all the edges of your graph? It should be spectacularly slow.
The problem, of course, is that you don't want to allow duplicate edges. If they were allowed, the solution would be quite simple:
increase.k.allowing.duplicates <- function(graph){
  if(length(V(graph))%%2!=0){
    stop("k can only be incremented for graphs with an even number of nodes.")
  }
  # Add random edges to the graph and allow dual edges just to increase k
  graph %>% add_edges(as.numeric(sample(1:length(V(graph)), length(V(graph)))))
}
The above code would solve the problem if duplicate edges were allowed. It returns graphs of ever higher k and lets k grow without bound, since with duplicate edges the number of nodes no longer caps the degree.
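To see whether a particular random pairing actually produced duplicate edges, the result can be inspected with igraph's multi-edge helpers. This is a minimal sketch under assumptions: the seed, the 20-node size and the variable names are chosen for illustration, and sample_k_regular() is used as the newer name for k.regular.game().

```r
library(igraph)

set.seed(42)                                  # assumed seed, for reproducibility only
g <- sample_k_regular(20, k = 1)              # 20 nodes, every degree equal to 1

# Pair all nodes up at random and add one edge per pair
pairing <- sample(vcount(g))
g1 <- add_edges(g, pairing)

all(degree(g1) == 2)                          # every degree went up by one
any_multiple(g1)                              # TRUE if some pair is now connected twice
which(which_multiple(g1))                     # ids of the duplicate edges, if any

# simplify() merges duplicates, which lowers the degree of the affected nodes
g1.simple <- simplify(g1)
table(degree(g1.simple))
```

If any_multiple() returns TRUE, the graph is 2-regular only when counted with multiplicity, which is exactly the case the stricter function has to avoid.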
I have come up with the Monte Carlo approach below. To increase k by one, edges are added one at a time between nodes that are 1) not yet connected and 2) not already incremented to the higher degree. If the loop runs out of such pairs, the attempt is abandoned and the process starts over from the original graph. The number of restarts is capped by maximum.tries.
increase.k <- function(graph, maximum.tries=200){
  if(length(V(graph))%%2!=0){
    stop("k can only be incremented for graphs with an even number of nodes.")
  }
  k <- mean(degree(graph))
  # An integer mean is not enough: every single node must have degree k
  if(any(degree(graph) != k)){
    stop("Nodes in graph do not have the same degree")
  }
  if(k >= length(V(graph))-1 ) {
    stop("This graph is complete")
  }
  # Here we must lay the puzzle. If we run into a one-way street with the edges we add, we'll have to start afresh
  original.graph <- graph
  for(it in 1:maximum.tries){
    # We might need many tries to get the puzzle right by brute-forcing
    # For each try we increment in a loop to avoid duplicate links
    for(e_ij in 1:(length(V(graph))/2)){
      # Note that while(mean(degree(graph)) < k + 1){} is a logical possibility, but less safe
      # Add a new edge between two nodes of degree k. i is any such node and j is any such node not already connected to i
      i <- sample(as.numeric(V(graph)[degree(graph)==k]), 1)
      # Candidates for j: degree still k, not i itself and not already a neighbor of i
      js <- as.numeric(V(graph)[degree(graph) == k & !(as.numeric(V(graph)) %in% c(as.numeric(neighbors(graph,i)), i))])
      # Abandon this try if no node unconnected to i and with degree == k exists
      if(length(js)==0){break}
      # sample() on a single number n would draw from 1:n, so handle that case separately
      j <- if(length(js)==1) js else sample(js, 1)
      graph <- graph %>% add_edges(c(i,j))
    }
    # Did we lay the puzzle to completion, successfully creating a random graph with a higher k?
    if(mean(degree(graph)) == k+1){
      # Success
      print(paste("Succeeded at iteration ", it))
      break
    } else {
      # Failure, let's try again
      graph <- original.graph
      print("Failed")
    }
  }
  # If every try failed, this is the unchanged original graph
  graph
}
# Compare the two approaches
g1 <- increase.k.allowing.duplicates(g)
g2 <- increase.k(g)
degree(g1) == degree(g2)
l <- layout_with_gem(g2)
par(mfrow=c(1,2))
plot(g1, layout=l, vertex.label="")
plot(g2,layout=l, vertex.label="")
dev.off()
# Note that increase.k() can be run incrementally up until a complete graph:
is.complete <- function(graph){mean(degree(graph)) >= (length(V(graph))-1)}
while(!is.complete(g)){
  print(mean(degree(g)))
  g <- increase.k(g)
}
# and that increase.k() cannot increase k in already complete graphs.
g <- increase.k(g)
The above code has solved the problem for some graphs. The larger the graph, the more tries are needed to lay the puzzle. In this example with only 20 nodes, each k-level from 1 to 19 can be generated relatively quickly, and I did manage to get 19 separate networks from k=1 to k=19. But I have also managed to get stuck in the loop, which I take as evidence that there are network structures whose k cannot be incremented. The outcome depends on the random choices: the same starting specification sometimes gets stuck and sometimes reaches the complete graph.
To test the function, I set maximum.tries to 25 and tried 100 times to go from k=1 to k=19. It never worked. The higher the k, the more difficult it is to lay the puzzle and find edges that fit, even though the next-to-last iteration is faster before a collapse. The risk of hitting the cap of 25 increased between the 15th and 18th iteration, and most graphs only made it to k=17.
It is possible to imagine this method being performed backwards: start at a complete graph and use a Monte Carlo process that removes edges until all degrees are at k-1. It should run into similar problems, though.
The code above is really an attempt to brute-force this problem without going into the underlying mathematics of graphs of this type. I am not a mathematician and lack the skills, but maybe the creation of a fail-safe k.increment()-function is a real and unsolved mathematical problem. If any graph-theoreticians come by this post, please enlighten us.
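On the question of a fail-safe construction: if the sequence does not have to start from a given random graph, a classical alternative is the round-robin (circle) method, which splits the complete graph on an even number n of nodes into n-1 perfect matchings. Adding one matching per step gives nested graphs with k = 1, ..., n-1 and never gets stuck. The sketch below is that swapped-in technique, not the Monte Carlo function above; round.robin.matchings() is a hypothetical helper name, and the resulting graphs are highly structured rather than random.

```r
library(igraph)

# Hypothetical helper: the n-1 perfect matchings of the circle method.
# Node n stays fixed; the other n-1 nodes are treated as residues modulo n-1.
round.robin.matchings <- function(n) {
  stopifnot(n %% 2 == 0, n >= 2)
  m <- n - 1
  lapply(0:(m - 1), function(r) {
    i <- seq_len(n / 2 - 1)
    a <- c(n, (r + i) %% m + 1)          # one endpoint of every pair
    b <- c(r %% m + 1, (r - i) %% m + 1) # the matching other endpoint
    rbind(a, b)                          # 2 x (n/2) matrix, one column per edge
  })
}

n <- 20
matchings <- round.robin.matchings(n)
g <- make_empty_graph(n, directed = FALSE)
graphs <- vector("list", n - 1)
for (k in seq_len(n - 1)) {
  # as.vector() reads the matrix column by column, i.e. edge by edge
  g <- add_edges(g, as.vector(matchings[[k]]))
  graphs[[k]] <- g                       # graphs[[k]] is k-regular
}
```

Because edges are only ever added, each node keeps its earlier neighbors from one graph to the next. Relabeling the nodes with a random permutation, or shuffling the order of the matchings, would add some randomness without breaking the guarantee.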
I have a large transition matrix that I want to plot as a graph in R. I have chosen the markovchain package to do this, which allows me to turn this matrix into a markovchain object and then plot it as follows:
library(markovchain)
tMat = matrix(c(.2,.7,.1,.3,.4,.3,.1,.4,.5),3,3,byrow=TRUE) # each row sums to 1
mc = new("markovchain",transitionMatrix = tMat)
plot(mc)
which produces the following output:
Of course, this is just an example, and as I mentioned before the real transition matrix is much hairier.
My question is: how can I plot only edges that have values greater than some minimum threshold? If I try to "zero out" all values below a certain threshold, markovchain complains that the rows do not sum to one (because it is then no longer a row-stochastic matrix). But for a very complicated graph, it is less important that the edges leaving a vertex sum to 1, and more important that the graph remains readable. Is there any way to do this?
I know that the plot function is built on top of igraph.plot, so I am hoping that there is some option there that might help?
Any suggestions would be much appreciated!
-Paul
Whoops: I answered my own question. Just wanted to leave this here in case other people encounter the same problem: you can simply create the markovchain object, and then go into its transitionMatrix slot and edit the values directly:
mc@transitionMatrix[mc@transitionMatrix<.2] = 0
which produces:
Now a good follow-up question, which actually gets at the original problem and would be a better solution, is: how to only suppress the numbers in the graph output rather than deleting the lines altogether? Deleting them leads to ugly situations where previously connected nodes/vertices become islands. I think this would involve going into the part of the igraph.plot object that stores these values, which I don't know how to do even after researching quite a bit.
how to only suppress the numbers in the graph output rather than
deleting the lines altogether?
Coerce the markovchain object to an igraph object; now you have all the flexibility you need:
library(markovchain)
statesNames=c("a","b","c")
mc<-new("markovchain", states=statesNames, transitionMatrix=
matrix(c(0.2,0.5,0.3,
0,1,0,
0.1,0.8,0.1),nrow=3, byrow=TRUE, dimnames=list(statesNames,statesNames)
))
g <- as(mc, "igraph")
min <- 0.5
plot(g, edge.label=ifelse(E(g)$prob>=min, E(g)$prob, NA))
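If the markovchain object is not needed for anything else, the same thresholding can be sketched with igraph alone, which sidesteps the row-sum check entirely. This is a minimal sketch under assumptions: the matrix values, the state names and the 0.2 threshold are illustrative, and the edge attribute is called weight here because that is what graph_from_adjacency_matrix() creates with weighted = TRUE.

```r
library(igraph)

# Illustrative transition matrix; each row sums to 1
states <- c("a", "b", "c")
tMat <- matrix(c(.2, .7, .1,
                 .3, .4, .3,
                 .1, .4, .5),
               nrow = 3, byrow = TRUE, dimnames = list(states, states))
threshold <- 0.2                                # assumed cut-off

# Non-zero entries become directed edges carrying a "weight" attribute
g <- graph_from_adjacency_matrix(tMat, mode = "directed", weighted = TRUE)

# Option 1: keep every edge, but label only those at or above the threshold
E(g)$label <- ifelse(E(g)$weight >= threshold, E(g)$weight, NA)

# Option 2: drop the weak edges from the graph itself, leaving tMat untouched
g.strong <- delete_edges(g, E(g)[E(g)$weight < threshold])

if (interactive()) {
  plot(g)          # all edges drawn, weak ones unlabeled
  plot(g.strong)   # weak edges removed
}
```

Option 1 keeps weakly connected states attached to the rest of the graph, while option 2 gives the sparser picture at the cost of possibly isolating some vertices.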
I'm fairly new to igraph in R.
I'm doing community detection using igraph and have already built my communities/clusters using the walktrap technique.
Next, within each cluster, I want to count the number of vertices between any two given vertices. The reason I want to do this is that, for each vertex XX, I want to list the vertices that are connected to XX via at most 3 steps, meaning no further than 3 vertices away from XX.
Can anyone help with how this can be done in R, please?
making a random graph (for demonstration):
g <- erdos.renyi.game(100, 1/25)
plot(g,vertex.size=3)
get walktrap communities and save as vertex attribute:
V(g)$community<-walktrap.community(g, modularity = TRUE, membership = TRUE)$membership
V(g)$community
now make a subgraph containing only edges and vertices of one community, e.g. community 2:
sub<-induced.subgraph(g,v=V(g)$community==2)
plot(sub)
make a matrix containing all shortest paths:
shortestPs<-shortest.paths(sub)
now count the number of shortest paths of length 3 or less.
I also exclude shortest paths from each node to itself (shortestPs != 0),
and divide by two because every node pair appears twice in the matrix for undirected graphs.
Number_of_shortest_paths_smaller_3 <- length(which(shortestPs<=3 & shortestPs!=0))/2
Number_of_shortest_paths_smaller_3
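To list the vertices within 3 steps of one particular vertex, rather than counting pairs, igraph's ego() can be used, or one row of the distance matrix. This is a minimal sketch under assumptions: a 10-vertex ring stands in for a community subgraph so that the result is easy to check by hand, and xx and max.steps are hypothetical names.

```r
library(igraph)

sub <- make_ring(10)     # stand-in for one community subgraph (assumption)
xx <- 1                  # the vertex of interest
max.steps <- 3

# ego() with mindist = 1 returns the vertices 1 to max.steps steps away, excluding xx
near <- ego(sub, order = max.steps, nodes = xx, mindist = 1)[[1]]
within.3 <- sort(as.integer(near))
within.3                 # 2 3 4 8 9 10 on the 10-vertex ring

# The same set read off one row of the distance matrix
d <- distances(sub, v = xx)
same.set <- which(d[1, ] > 0 & d[1, ] <= max.steps)
```

Calling ego() with nodes = V(sub) returns one such list per vertex, which covers the "for each vertex XX" part of the question.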
Hope that's close to what you need, good luck!
I have 4 undirected graphs, each with 1000 vertices, and with 176672, 150994, 193477 and 236060 edges respectively. I am trying to see the interaction between a specific set of nodes (16 in number) in each graph. Visualizing this in tkplot is not feasible, as 1000 vertices is already far too many for it. I was wondering whether there is some way to extract the interaction of these 16 nodes from the parent graph and view it separately, which would then be easier to handle and work with in tkplot. I don't want to lose information about which node(s) lie on the path of interaction when they are not among the 16 pre-specified nodes. Is there a way to achieve this?
In such a dense graph, if you only take the shortest paths connecting each pair of these 16 vertices, you will still get a graph too large for tkplot, or even to see anything meaningful in a Cairo PDF plot.
However, if you aim to do it, this is one possible way:
require(igraph)
g <- erdos.renyi.game(n = 1000, p = 0.1)
set <- sample(1:vcount(g), 16)
in.shortest.paths <- NULL
for(v in set){
  in.shortest.paths <- c(in.shortest.paths,
    unlist(get.all.shortest.paths(g, from = v, to = set)$res))
}
subgraph <- induced.subgraph(g, unique(in.shortest.paths))
In this example, subgraph will include approx. half of all the vertices.
After this, I think you should consider finding some way other than visualization to investigate the relationships between your vertices of interest. It could be some topological metric, but it really depends on the aims of your analysis.
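If a plot is still wanted, one way to shrink the extracted subgraph is to keep a single shortest path per pair instead of all of them. This is a minimal sketch, assuming a smaller random stand-in graph and a seed chosen for the example; keep and sub are hypothetical names, and which of several equally short paths igraph returns is not something to rely on.

```r
library(igraph)

set.seed(7)                                     # assumed seed
g <- sample_gnp(200, 0.1)                       # smaller stand-in graph (assumed size)
V(g)$name <- as.character(seq_len(vcount(g)))   # remember original ids after subsetting
set <- sample(vcount(g), 16)

keep <- set
for (v in set) {
  # shortest_paths() returns one path per target, as a list of vertex sequences
  paths <- shortest_paths(g, from = v, to = set)$vpath
  keep <- c(keep, as.integer(unlist(paths)))
}
sub <- induced_subgraph(g, unique(keep))

vcount(sub)                                     # compare with vcount(g)
```

The name attribute lets the intermediate vertices in sub be traced back to the parent graph, which addresses the concern about losing that information.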
I'm using the igraph package in R. I have a connected graph G=(V,E), how can I randomly remove some edges (say, n < |E|) but without disconnecting the given graph. In other words, I mustn't remove any bridges. Any hints on how that could be done?
A simple approach would be to keep randomly selecting and removing sets of n edges until you find a set whose removal doesn't increase the number of components of the graph:
remove.edges <- function(g, n) {
  num.tries <- 0
  while (TRUE) {
    num.tries <- num.tries + 1
    g2 <- delete.edges(g, E(g)[sample(seq_along(E(g)), n)])
    if (no.clusters(g2) == no.clusters(g)) {
      print(paste("Total number of tries:", num.tries))
      return(g2)
    }
  }
}
Let's try it out with a sample graph:
library(igraph)
set.seed(144)
g <- erdos.renyi.game(10, 0.4)
g2 <- remove.edges(g, 5)
# [1] "Total number of tries: 3"
This could be terribly inefficient for a large, sparse graph coupled with a large n value. In that case, you'll probably want to run something like Tarjan's Bridge-Finding Algorithm at each iteration and limit your random selections to not be bridges. Unfortunately, I can't find an implementation of that algorithm in R, so you'd probably need to do some implementation to get that to work.
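A middle ground that needs no bridge-finding code is to remove the edges one at a time and keep a removal only if the graph stays connected. This is a minimal sketch, not a tuned implementation: remove.edges.one.by.one() is a hypothetical name, the complete graph is only a test input, and the resulting edge set is not drawn uniformly from all valid sets of n edges.

```r
library(igraph)

remove.edges.one.by.one <- function(g, n) {
  # A connected graph needs at least |V| - 1 edges, so at most |E| - (|V| - 1) can go
  stopifnot(is_connected(g), n <= ecount(g) - (vcount(g) - 1))
  removed <- 0
  while (removed < n) {
    e <- sample(ecount(g), 1)        # random edge id in the current graph
    g2 <- delete_edges(g, e)
    if (is_connected(g2)) {          # e was not a bridge, so accept the removal
      g <- g2
      removed <- removed + 1
    }
  }
  g
}

set.seed(144)                        # assumed seed
g <- make_full_graph(10)             # 45 edges
g2 <- remove.edges.one.by.one(g, 20)
ecount(g2)                           # 25 edges left
```

The loop always terminates under the stated precondition, because a connected graph with more than |V| - 1 edges contains a cycle and therefore at least one edge that is not a bridge.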
A simple technique is to find a cycle in the graph and remove an edge from this cycle. To find a cycle I would do a depth first search until you find a node you have previously seen on the search.
For example, if you are at node x while performing the DFS and you discover a node y in x's neighborhood that already exists in the DFS tree (and is not x's parent), you have found a cycle. At this point you can safely remove any edge on this cycle without risk of it being a bridge. This should run pretty quickly if the graph isn't very sparse.
Note that in practice this DFS technique will often just resemble a random walk around the graph until encountering a previously seen node.