R igraph: find all cycles

I have a directed igraph and want to fetch all the cycles. The girth function works, but it only returns the smallest cycle. Is there a way in R to fetch all the cycles in a graph of length greater than 3 (no vertex pointing to itself, and no loops)?

This is not directly available as a function in igraph, but of course you can code it up. To find a cycle, you start at some node, go to a neighboring node, and then find a simple path back to the original node. Since you did not provide any sample data, I will illustrate with a simple example.
Sample data
## Sample graph
library(igraph)
set.seed(1234)
g = erdos.renyi.game(7, 0.29, directed=TRUE)
plot(g, edge.arrow.size=0.5)
Finding Cycles
Let me start with just one node and one neighbor. Node 2 connects to Node 4. So some cycles may look like 2 -> 4 -> (Nodes other than 2 or 4) -> 2. Let's get all of the paths like that.
v1 = 2
v2 = 4
lapply(all_simple_paths(g, v2,v1, mode="out"), function(p) c(v1,p))
[[1]]
[1] 2 4 2
[[2]]
[1] 2 4 3 5 7 6 2
[[3]]
[1] 2 4 7 6 2
We see that there are three cycles starting at 2 with 4 as the second node. (I know that you said length greater than 3. I will come back to that.)
Now we just need to do that for every node v1 and every neighbor v2 of v1.
Cycles = NULL
for(v1 in V(g)) {
    for(v2 in neighbors(g, v1, mode="out")) {
        Cycles = c(Cycles,
            lapply(all_simple_paths(g, v2, v1, mode="out"), function(p) c(v1, p)))
    }
}
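A quick length() check shows how many paths were collected:
length(Cycles)
# [1] 17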
So there are 17 cycles in the whole graph. There are two issues, though, that you may need to look at depending on how you want to use this. First, you said that you wanted cycles of length greater than 3, so I assume that you do not want the cycles that look like 2 -> 4 -> 2. These are easy to get rid of.
LongCycles = Cycles[which(sapply(Cycles, length) > 3)]
LongCycles has 13 cycles, having eliminated the 4 short cycles:
2 -> 4 -> 2
4 -> 2 -> 4
6 -> 7 -> 6
7 -> 6 -> 7
But that list points out the other problem. There are still some cycles that you might think of as duplicates. For example:
2 -> 7 -> 6 -> 2
7 -> 6 -> 2 -> 7
6 -> 2 -> 7 -> 6
You might want to weed these out. To get just one copy of each cycle, you can always choose the vertex sequence that starts with the smallest vertex number. Thus,
LongCycles[sapply(LongCycles, min) == sapply(LongCycles, `[`, 1)]
[[1]]
[1] 2 4 3 5 7 6 2
[[2]]
[1] 2 4 7 6 2
[[3]]
[1] 2 7 6 2
This gives just the distinct cycles.
Addition regarding efficiency and scalability
I am providing a much more efficient version of the code that I originally provided. However, it is primarily for the purpose of arguing that, except for very simple graphs, you will not be able to produce all cycles.
Here is some more efficient code. It eliminates checking many cases that either cannot produce a cycle or would be eliminated as redundant cycles. To make it easy to run the tests that I want, I made it into a function.
## More efficient version
FindCycles = function(g) {
    Cycles = NULL
    for(v1 in V(g)) {
        if(degree(g, v1, mode="in") == 0) { next }
        GoodNeighbors = neighbors(g, v1, mode="out")
        GoodNeighbors = GoodNeighbors[GoodNeighbors > v1]
        for(v2 in GoodNeighbors) {
            TempCyc = lapply(all_simple_paths(g, v2, v1, mode="out"), function(p) c(v1, p))
            TempCyc = TempCyc[which(sapply(TempCyc, length) > 3)]
            TempCyc = TempCyc[sapply(TempCyc, min) == sapply(TempCyc, `[`, 1)]
            Cycles = c(Cycles, TempCyc)
        }
    }
    Cycles
}
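As a quick sanity check, running the function on the sample graph from above should reproduce the three distinct cycles found earlier (a sketch; the exact output assumes the same seed used when generating g):
FindCycles(g)
## should list the same three cycles as before:
## 2 4 3 5 7 6 2, 2 4 7 6 2, and 2 7 6 2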
However, except for very simple graphs, there is a combinatorial explosion of possible paths, and so finding all possible cycles is completely impractical. I will illustrate this with graphs much smaller than the one that you mention in the comments.
First, I will start with some small graphs where the number of edges
is approximately twice the number of vertices. Code to generate my
examples is below but I want to focus on the number of cycles, so I
will just start with the results.
## ecount ~ 2 * vcount
Nodes  Edges  Cycles
   10     21      15
   20     41      18
   30     65      34
   40     87     424
   50    108    3433
   55    117   22956
But you report that your data has approximately 5 times as
many edges as vertices. Let's look at some examples like that.
## ecount ~ 5 * vcount
Nodes  Edges  Cycles
   10     48    3511
   12     61   10513
   14     71  145745
With the number of cycles growing like this, using 10K nodes with 50K edges seems to be out of the question. BTW, it took several minutes to compute the example with 14 vertices and 71 edges.
For reproducibility, here is how I generated the above data.
set.seed(1234)
g10 = erdos.renyi.game(10, 0.2, directed=TRUE)
ecount(g10)
length(FindCycles(g10))
set.seed(1234)
g20 = erdos.renyi.game(20, 0.095 , directed=TRUE)
ecount(g20)
length(FindCycles(g20))
set.seed(1234)
g30 = erdos.renyi.game(30, 0.056 , directed=TRUE)
ecount(g30)
length(FindCycles(g30))
set.seed(1234)
g40 = erdos.renyi.game(40, 0.042 , directed=TRUE)
ecount(g40)
length(FindCycles(g40))
set.seed(1234)
g50 = erdos.renyi.game(50, 0.038 , directed=TRUE)
ecount(g50)
length(FindCycles(g50))
set.seed(1234)
g55 = erdos.renyi.game(55, 0.035 , directed=TRUE)
ecount(g55)
length(FindCycles(g55))
##########
set.seed(1234)
h10 = erdos.renyi.game(10, 0.55, directed=TRUE)
ecount(h10)
length(FindCycles(h10))
set.seed(1234)
h12 = erdos.renyi.game(12, 0.46, directed=TRUE)
ecount(h12)
length(FindCycles(h12))
set.seed(1234)
h14 = erdos.renyi.game(14, 0.39, directed=TRUE)
ecount(h14)
length(FindCycles(h14))
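If you want to gauge the cost on your own machine, you can wrap the worst case above in system.time(); a minimal sketch (timings will of course vary with hardware):
system.time(length(FindCycles(h14)))
# expect this to take on the order of minutes, per the discussion above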

Related

How to simulate a martingale process problem in R?

100 people are watching a show at a theater. At the end of the show, all of them visit the cloakroom to collect their coats. The attendant working the cloakroom hands the coats back completely at random. The participants who receive their own coat leave. The others, who received a wrong coat, hand it back, and the attendant again hands out the coats at random. The process ends when every customer of the theater has their own coat back.
I want to simulate this martingale process in R in order to find the expected time until the process ends.
But I don't know how. Any help?
Something like:
# 100 customers
x = seq(1,100,by=1);x
# random sample from x
y = sample(x,100,replace=FALSE)
x==y
# for the next iteration, exclude those that are TRUE and run it again until everyone is TRUE
The expected time is how many iterations were needed.
Or something like this:
n = 100
X = seq(1, 100, by=1)
martingale = rep(NA, n)
iterations = 0
accept = 0
while (X != n) {
    iterations = iterations + 1
    y = sample(1:100, 100, replace=FALSE)
    if (X == y) {
        accept = accept + 1
        X = X + 1
        martingale[X] = y
    }
}
accept
iterations
One way to do this is as follows (using 10 people as an example; the print statement is unnecessary, just there to show what happens in each iteration):
set.seed(0)
x <- 1:10
count <- 0
while(length(x) > 0){
    x <- x[x != sample(x)]
    print(x)
    count <- count + 1
}
# [1] 1 2 3 4 5 6 7 9 10
# [1] 3 4 5 6 7 9
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 4 5 6 7
# [1] 3 6
#
count
# [1] 10
At each step of the loop, it removes the values of x where the customers have been randomly allocated their own coat, until there are none left.
To use this code to get the expected time taken for 100 people, you could extend it to:
set.seed(0)
nits <- 1000 #simulate the problem 1000 times
count <- 0
for (i in 1:nits){
    x <- 1:100
    while(length(x) > 0){
        x <- x[x != sample(x)]
        count <- count + 1/nits
    }
}
count
# [1] 99.901
I hypothesise without proof that the expected time for n people is n iterations; it comes out pretty close when I try with 50, 100, or 200 people.
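One intuition for why n is plausible: in each round the coats form a uniformly random assignment among the k people still waiting, and a random permutation of k items has exactly 1 fixed point in expectation, so roughly one person leaves per round. A small sketch to test this across group sizes, wrapping the simulation above in a hypothetical helper expected_rounds():
expected_rounds <- function(n, nits = 1000) {
    count <- 0
    for (i in 1:nits) {
        x <- 1:n
        while (length(x) > 0) {
            x <- x[x != sample(x)] # drop people whose coat came back correctly
            count <- count + 1/nits
        }
    }
    count
}
sapply(c(50, 100, 200), expected_rounds)
# each estimate should land near its n (roughly 50, 100, 200)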
I didn't follow your discussion above and I'm not entirely sure if this is what you want, but my rationale was as follows:
You have N people and queue them.
In the first round, the first person has a chance of 1/N to get their clothes right.
At this point there are two cases: either person 1 gets their clothes right or not.
If person 1 gets their clothes right, then person 2 has a chance of 1/(N-1) to get their clothes right. If person 1 didn't get the correct clothes, person 1 remains in the pool (at the end), and person 2 still has a 1/N probability to get their clothes right.
You continue to assign these probabilities until all N persons have seen the clerk once. Then you sort out those who have the right clothes and repeat at step 1 until everyone has their clothes right.
For simulation purposes, you'd of course repeat the whole thing 1000 or 10000 times.
If I understand you correctly, you are interested in the number of iterations, i.e. how often the clerk has to go through the whole queue (or what remains of it) until everyone has their clothes.
library(tidyverse)
people <- 100
results <- data.frame(people = 1:people,
                      iterations = NA)
counter <- 0
finished <- 0
while (finished < people)
{
    loop_people <- results %>%
        filter(is.na(iterations)) %>%
        pull(people)
    loop_prob <- 1/length(loop_people)
    loop_correct <- 0
    for (i in 1:length(loop_people))
    {
        correct_clothes_i <- sample(c(0,1), size = 1, prob = c(1-loop_prob, loop_prob))
        if (correct_clothes_i == 1)
        {
            results[loop_people[i], 2] <- counter + 1
            loop_correct <- loop_correct + 1
            loop_prob <- 1/(length(loop_people) - loop_correct)
        }
    }
    counter <- counter + 1
    finished <- length(which(!is.na(results$iterations)))
}
max(results$iterations)
[1] 86
head(results)
  people iterations
1      1          7
2      2         42
3      3         86
4      4         67
5      5          2
6      6          9
The results$iterations column contains the iteration number where each person has gotten their clothes right, thus max(results$iterations) gives you the total number of loops.
I have no proof, but empirically and intuitively the number of required iterations should approach N: with k people left, each pass matches about one person on average, so roughly one person leaves the queue per iteration.

How to repeatedly sample without replacement when sample size is greater than the population size

This seems like it must be a duplicate but I can't find a solution, probably because I don't know exactly what to search for.
Say I have a bucket of 8 numbered marbles and 10 people who will each sample 1 marble from the bucket.
How can I write a sampling procedure where each person draws a marble from the bucket without replacement until the bucket is empty, at which point all the marbles are put back into the bucket, and sampling continues without replacement? Is there a name for this kind of sampling?
For instance, a hypothetical result from this sampling with our 10 people and bucket of 8 marbles would be:
   person marble
1       1      1
2       2      2
3       3      3
4       4      4
5       5      5
6       6      6
7       7      7
8       8      8
9       9      1
10     10      2
Note that the marbles are drawn randomly, so not necessarily in numerical order. This is just an example output to get my point across.
Building on the answer from MånsT, here is a function to do this programmatically. I put in functionality to smoothly handle cases where the number of samples taken is less than the population size (and return the expected behavior).
sample_more <- function(pop, n, seed = NULL) {
    set.seed(seed) # for reproducibility, if desired; defaults to NULL, which is no seed
    m <- length(pop)
    if(n <= m) { # handles the case when n is no larger than the population size
        sample(pop, n, replace = FALSE)
    } else { # handles the case when n is greater than the population size
        # note: as written, this covers n up to 2*m, which is enough for the example below
        c(sample(pop, m, replace = FALSE), sample(pop, n-m, replace = FALSE))
    }
}
marbles <- 1:8
sample_more(marbles, 10, seed = 1)
[1] 1 4 8 2 6 3 7 5 2 3
sample_more(marbles, 3, seed = 1)
[1] 1 4 8
Not sure if there is a name for this, but a simple solution would be to just use sample several times:
# Create a bucket with 8 marbles:
bucket <- 1:8
# First draw 8 marbles without replacement, then refill the bucket and draw 2 more:
marbles <- c(sample(bucket, 8, replace = FALSE), sample(bucket, 2, replace = FALSE))
marbles
You can dynamically use sample in a for loop to generate the list, for marbles 1:8 and n people.
Example:
bucket <- 1:8
n <- 100
marbleList <- c()
for(i in 1:ceiling(n / length(bucket))){
    marbleList <- c(marbleList, sample(bucket))
}
marbleList <- marbleList[1:n]
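The same idea also fits in a single expression with replicate, which stacks full-bucket shuffles column by column and then trims to n; a minimal sketch equivalent to the loop above:
marbleList <- as.vector(replicate(ceiling(n / length(bucket)), sample(bucket)))[1:n]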

Recover the maximum bipartite matching from maxFlowFordFulkerson in R

I want to find the maximum bipartite matching, so I'll use the Ford-Fulkerson max-flow algorithm, as explained here.
When I run the function, though, I only get the value of the maximum flow; what interests me is the flow itself, so that I can recover the matching.
Can anybody help me?
I used the function maxFlowFordFulkerson in R.
There is no way to do that using only the output of the function you've found. Besides the value of the maximum flow, it also provides a minimum cut, which gives some additional information, but still not what you're looking for.
Using the example from the page you refer to (reproduced below for ease of reference):
> library("optrees")
> vertices <- 1:14
> edges <- matrix(c(1,2,1, 1,3,1, 1,4,1, 1,5,1, 1,6,1, 1,7,1, 2,9,1, 2,10,1, 4,8,1, 4,11,1, 5,10,1, 6,11,1, 7,13,1, 8,14,1, 9,14,1, 10,14,1, 11,14,1, 12,14,1, 13,14,1), byrow = TRUE, ncol = 3)
> maxFlowFordFulkerson(vertices, edges, source.node = 1, sink.node = 14)
$s.cut
[1] 1 3
$t.cut
[1] 2 4 5 6 7 8 9 10 11 12 13 14
$max.flow
[1] 5
Here, the vertices in the two partitions are 2:7 and 8:13 respectively, so this tells us that vertex 3, i.e. the second vertex from the top in the left partition, remains unmatched, but other than that it tells you nothing about the matching.
If you want to stick to igraph, you can use maximum.bipartite.matching to get what you want. As this one operates on bipartite graphs directly, we don't have to mess with the auxiliary source/sink vertices at all. With the example from above:
> library("igraph")
> A <- matrix(c(0,1,1,0,0,0, 0,0,0,0,0,0, 1,0,0,1,0,0, 0,0,1,0,0,0, 0,0,1,1,0,0, 0,0,0,0,0,1), byrow = T, ncol = 6)
> g <- graph.incidence(A)
> maximum.bipartite.matching(g)
$matching_size
[1] 5
$matching_weight
[1] 5
$matching
[1] 8 NA 7 9 10 12 3 1 4 5 NA 6
Here, the left partition is represented by 1:6, and the right partition by 7:12. From $matching, we read that the 6 vertices in the left partition are matched with 8, nothing, 7, 9, 10, and 12 respectively.
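If you want the matching as explicit pairs, you can post-process the $matching vector yourself; a small sketch using the left-partition indices 1:6 from the example above:
> m <- maximum.bipartite.matching(g)$matching
> data.frame(left = 1:6, right = m[1:6])
  left right
1    1     8
2    2    NA
3    3     7
4    4     9
5    5    10
6    6    12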

R: Calculating adjacent vertex after deletion of nodes

I'm very new to R and am trying to calculate the adjacent vertices of a graph, which is obtained by deleting certain nodes from an original graph.
However, the output doesn't match the plot of the graph.
For example:
library(igraph)
g <- make_ring(8)
g <- add_edges(g, c(1,2, 2,7, 3,6, 4,5, 8,2, 6,2))
V(g)$label <- 1:8
plot(g)
h <- delete.vertices(g, c(1,2))
plot(h)
If I compute:
adjacent_vertices(h, 6)
I get 5 as the answer. However, I want the output to be 3, 5, 7, as the plot shows. The problem lies in the fact that the function doesn't know I'm trying to find the adjacent vertices of the node labelled 6.
Could someone please help. Thanks.
The issue here is that when you delete the vertices, the indices for the remaining vertices are shifted down to [1,6]:
> V(h)
+ 6/6 vertices:
[1] 1 2 3 4 5 6
To find the neighbors using the original vertex names, you could then simply offset the values by the number of vertices removed, e.g.:
> offset <- 2 # two vertices were deleted
> neighbors(h, 6 - offset) + offset
+ 3/6 vertices:
[1] 3 5 7
A better approach, however, would be to refer to the vertex labels instead of using the indices:
> V(g)$label
[1] 1 2 3 4 5 6 7 8
> V(h)$label
[1] 3 4 5 6 7 8
> V(h)[V(h)$label == 6]
+ 1/6 vertex:
[1] 4
To get the neighbors of your vertex of interest, you can modify your code to look like:
> vertex_of_interest <- V(h)[V(h)$label == 6]
> neighbors(h, vertex_of_interest)$label
[1] 3 5 7
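Another option, if it suits your workflow, is to give the vertices persistent names; igraph lets you index vertices by their name attribute, and names survive deletion. A sketch assuming the same graph as above:
> V(g)$name <- as.character(1:8)
> h <- delete.vertices(g, c("1", "2"))
> neighbors(h, "6")
+ 3/6 vertices, named:
[1] 3 5 7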

Using R as a game simulator

I am trying to simulate a simple game where you spin a spinner, labeled 1-5, and then progress until you pass the finish line (spot 50). I am a bit new to R and have been working on this for a while, searching for answers. When I run the code below, it doesn't add the numbers in sequence; it returns a list of my 50 random spins and their values. How do I get this to add the spins on top of each other, then stop once >= 50?
SpacesOnSpinner <- (seq(1,5,by=1))
N <- 50
L1 <- integer(N)
for (i in 1:N){
    takeaspin <- sample(SpacesOnSpinner, 1, replace=TRUE)
    L1[i] <- L1[i] + takeaspin
}
This is a good use-case for replicate. I'm not sure if you have to use a for loop, but you could do this instead (replicate is a loop too):
SpacesOnSpinner<-(seq(1,5,by=1))
N<-10
cumsum( replicate( N , sample(SpacesOnSpinner,1,replace=TRUE) ) )
#[1] 5 10 14 19 22 25 27 29 30 33
However, since you have a condition you want to break on, perhaps the other answer with a while condition is exactly what you need in this case (people will tell you while loops are bad in R, but they have their uses). Using this method, you can see how many spins it took to get past 50 with a simple subset afterwards. You will not know in advance how many spins it will take, but at most it will be 50 (if every spin comes up 1):
N<-50
x <- cumsum( replicate( N , sample(5,1) ) )
# Value of the accumulator at each round before it reaches 50
x[ x < 50 ]
#[1] 5 6 7 8 12 16 21 24 25 29 33 34 36 38 39 41 42 44 45 49
# Number of spins before the total reaches 50
length(x[x < 50])
# [1] 20
Here is another interesting way to simulate your game, using a recursive function.
spin <- function(outcomes = 1:5, start = 0L, end = 50L)
    if (start <= end)
        c(got <- sample(outcomes, 1), Recall(outcomes, start + got, end))
spin()
# [1] 5 4 4 5 1 5 3 2 3 4 4 1 5 4 3
Although elegant, it won't be as fast as an improved version of @Simon's solution that makes a single call to sample, as suggested by @Viktor:
spin <- function(outcomes = 1:5, end = 50L) {
    max.spins <- ceiling(end / min(outcomes))
    x <- sample(outcomes, max.spins, replace = TRUE)
    head(x, match(TRUE, cumsum(x) >= end))
}
spin()
# [1] 3 5 2 3 5 2 2 5 1 2 1 5 5 5 2 4
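As a rough check of either version: the average spin is 3, so the expected number of spins to pass 50 should land a little above 50/3 (about 17). A quick simulation sketch using the spin() function defined above:
mean(replicate(10000, length(spin())))
# should come out around 17-18 spins on average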
For your ultimate goal (find the probability of one person being in the lead for the entire game), it is debatable whether while will be more efficient or not: a while loop is certainly slower, but you may benefit from the possibility of exiting early as the lead switches from one player to the other. Both approaches are worth testing.
You can use a while statement and a variable total for keeping track of the sum:
total <- 0
while(total <= 50){
    takeaspin <- sample(SpacesOnSpinner, 1, replace=TRUE)
    total <- takeaspin + total
}
print(total)
