Recover maximum bipartite matching from maxFlowFordFulkerson in R

I want to find a maximum bipartite matching, so I'm using the Ford-Fulkerson maximum flow algorithm, as explained here.
But when I run the function, I only get the value of the maximum flow. What interests me is the flow itself, so that I can recover the matching.
Can anybody help me?
I used the function maxFlowFordFulkerson in R.

There is no way to do that using only the output of the function you've found. Besides the value of the maximum flow, it does also provide a minimum cut, which provides some additional information but still not what you're looking for.
Using the example from the page you refer to (reproduced below for ease of reference):
> library("optrees")
> vertices <- 1:14
> edges <- matrix(c(1,2,1, 1,3,1, 1,4,1, 1,5,1, 1,6,1, 1,7,1, 2,9,1, 2,10,1, 4,8,1, 4,11,1, 5,10,1, 6,11,1, 7,13,1, 8,14,1, 9,14,1, 10,14,1, 11,14,1, 12,14,1, 13,14,1), byrow = TRUE, ncol = 3)
> maxFlowFordFulkerson(vertices, edges, source.node = 1, sink.node = 14)
$s.cut
[1] 1 3
$t.cut
[1] 2 4 5 6 7 8 9 10 11 12 13 14
$max.flow
[1] 5
Here, the vertices in the two partitions are 2:7 and 8:13 respectively, so this tells us that vertex 3, i.e. the second vertex from the top in the left partition, remains unmatched, but other than that it tells you nothing about the matching.
If you want to stick to igraph, you can use maximum.bipartite.matching to get what you want. As this one operates on bipartite graphs directly, we don't have to mess with the auxiliary source/sink vertices at all. With the example from above:
> library("igraph")
> A <- matrix(c(0,1,1,0,0,0, 0,0,0,0,0,0, 1,0,0,1,0,0, 0,0,1,0,0,0, 0,0,1,1,0,0, 0,0,0,0,0,1), byrow = T, ncol = 6)
> g <- graph.incidence(A)
> maximum.bipartite.matching(g)
$matching_size
[1] 5
$matching_weight
[1] 5
$matching
[1] 8 NA 7 9 10 12 3 1 4 5 NA 6
Here, the left partition is represented by 1:6, and the right partition by 7:12. From $matching, we read that the 6 vertices in the left partition are matched with 8, nothing, 7, 9, 10, and 12 respectively.
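To turn the $matching vector into an explicit list of matched pairs, something like the following sketch may help. It hard-codes the vector printed above rather than calling igraph, and the names matching, left and pairs are illustrative only.

```r
# The $matching vector from the output above, copied by hand.
matching <- c(8, NA, 7, 9, 10, 12, 3, 1, 4, 5, NA, 6)

# Vertices 1:6 form the left partition in this example.
left <- 1:6

# Pair each left vertex with its partner and drop the unmatched ones.
pairs <- data.frame(left = left, right = matching[left])
pairs <- pairs[!is.na(pairs$right), ]
pairs
```

Each remaining row is one edge of the matching. Left vertex 2 drops out because its entry is NA.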


Change order of vector of nodes from level order to infix order in r

I have a vector of nodes taken from a binary regression tree. These are in level order, for example, 1,2,4,5,10,11. I would like to place them in infix order like so: 4,2,10,5,11,1. Thanks to Alistaire I have a solution that uses recursion. But as they point out, "There has to be a better way". I was hoping someone might be able to help me out with a non-recursive approach. The recursive version is very slow for vectors of any reasonable length. I have also tried creating a binary tree using igraph and data.tree but I cannot seem to get the ordering I want from these.
Yes, it's possible to do this without recursion since you are dealing with a binary tree, which has a fixed numbering: node k has children 2k and 2k + 1, so each node's depth and horizontal position follow from its number alone.
Suppose we have a vector of your nodes:
nodes <- c(1, 2, 4, 5, 10, 11)
First of all, we only want a binary tree that is of a suitable depth to accommodate your largest node. We can get the required depth by doing:
depth <- floor(log2(max(nodes))) + 1  # floor + 1 so that a largest node of 2, 4, 8, ... still fits
And a data frame that gives the node number, depth and 'leftness' of a sufficiently large binary tree like this:
df <- data.frame(node = seq(2^(depth) - 1),
                 depth = rep(seq(depth), times = 2^(seq(depth) - 1)),
                 leftness = unlist(sapply(2^seq(depth) - 1,
                   function(x) (seq(x)[seq(x) %% 2 == 1])/(x + 1))))
However, we only need the subset of this tree that matches your nodes:
df <- df[match(nodes, df$node),]
df
#> node depth leftness
#> 1 1 1 0.5000
#> 2 2 2 0.2500
#> 4 4 3 0.1250
#> 5 5 3 0.3750
#> 10 10 4 0.3125
#> 11 11 4 0.4375
And we can sort the nodes in order according to leftness:
df$node[order(df$leftness)]
#> [1] 4 2 10 5 11 1
Which is your expected result.
To generalize this, just put the above steps in a function:
sort_left <- function(nodes) {
  # floor + 1 so that a largest node of 2, 4, 8, ... still fits in the tree
  depth <- floor(log2(max(nodes))) + 1
  df <- data.frame(node = seq(2^(depth) - 1),
                   depth = rep(seq(depth), times = 2^(seq(depth) - 1)),
                   leftness = unlist(sapply(2^seq(depth) - 1,
                     function(x) (seq(x)[seq(x) %% 2 == 1])/(x + 1))))
  df <- df[match(nodes, df$node),]
  df$node[order(df$leftness)]
}
So we can do:
sort_left( c(1, 2, 4, 5, 10, 11))
#> [1] 4 2 10 5 11 1
Or, given the example in your original question,
sort_left(c(1,2,4,5,10,11,20,21))
#> [1] 4 2 20 10 21 5 11 1
Which was the desired result. All without recursion.
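If the full data frame becomes large for deep trees, the leftness can also be computed per node without building the tree. This is only a sketch, assuming the usual numbering in which node k has children 2k and 2k + 1. The name sort_left_direct is made up for illustration.

```r
# Sketch: in-order ("infix") ordering of tree nodes, assuming the root is 1
# and node k has children 2k and 2k + 1.
sort_left_direct <- function(nodes) {
  level <- floor(log2(nodes))                 # 0 for the root, 1 for nodes 2:3, ...
  pos <- nodes - 2^level                      # 0-based position within the level
  leftness <- (2 * pos + 1) / 2^(level + 1)   # horizontal position in (0, 1)
  nodes[order(leftness)]
}

sort_left_direct(c(1, 2, 4, 5, 10, 11))
sort_left_direct(c(1, 2, 4, 5, 10, 11, 20, 21))
```

For the two example vectors this should give the same ordering as the data frame approach, while working only on the nodes actually supplied.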

How to repeatedly sample without replacement when sample size is greater than the population size

This seems like it must be a duplicate but I can't find a solution, probably because I don't know exactly what to search for.
Say I have a bucket of 8 numbered marbles and 10 people who will each sample 1 marble from the bucket.
How can I write a sampling procedure where each person draws a marble from the bucket without replacement until the bucket is empty, at which point all the marbles are put back into the bucket, and sampling continues without replacement? Is there a name for this kind of sampling?
For instance, a hypothetical result from this sampling with our 10 people and bucket of 8 marbles would be:
person marble
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 1
10 10 2
Note that the marbles are drawn randomly, so not necessarily in numerical order. This is just an example output to get my point across.
Building on the answer from MånsT, here is a function to do this programmatically. It also handles the case where the number of samples taken is less than the population size (and returns the expected behavior).
sample_more <- function(pop, n, seed = NULL) {
  set.seed(seed) # For reproducibility, if desired. Defaults to NULL, which re-seeds randomly
  m <- length(pop)
  if (n <= m) { # handles case when n is no larger than the population size
    sample(pop, n, replace = FALSE)
  } else { # handles case when n is greater than the population size (one refill only, so n <= 2 * m)
    c(sample(pop, m, replace = FALSE), sample(pop, n - m, replace = FALSE))
  }
}
marbles <- 1:8
sample_more(marbles, 10, seed = 1)
[1] 1 4 8 2 6 3 7 5 2 3
sample_more(marbles, 3, seed = 1)
[1] 1 4 8
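Note that sample_more refills the bucket only once, so it fails when n is more than twice the population size. A possible generalization to any number of refills is sketched below. The name sample_refill is hypothetical, and the seed handling is left to the caller.

```r
# Sketch: draw n items, emptying the bucket completely before each refill.
sample_refill <- function(pop, n) {
  m <- length(pop)
  rounds <- ceiling(n / m)                    # number of buckets needed
  # Each round is one full random permutation of the bucket.
  draws <- unlist(lapply(seq_len(rounds), function(i) pop[sample.int(m)]))
  draws[seq_len(n)]                           # cut the last round short if needed
}

set.seed(42)
x <- sample_refill(1:8, 20)
x
```

Indexing with sample.int(m) rather than calling sample(pop) avoids the surprise that sample() treats a single number as 1:pop.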
Not sure if there is a name for this, but a simple solution would be to just use sample several times:
# Create a bucket with 8 marbles:
bucket <- 1:8
# First draw 8 marbles without replacement, then refill the bucket and draw 2 more:
marbles <- c(sample(bucket, 8, replace = FALSE), sample(bucket, 2, replace = FALSE))
marbles
You can call sample repeatedly in a for loop to build the list for any number of people.
Example for marbles 1:8 and n people:
bucket <- 1:8
n <- 100
marbleList <- c()
for(i in seq_len(ceiling(n / length(bucket)))){
  marbleList <- c(marbleList, sample(bucket))
}
marbleList <- marbleList[1:n]

r igraph find all cycles

I have a directed igraph graph and want to fetch all the cycles. The girth function works but only returns the smallest cycle. Is there a way in R to fetch all the cycles of length greater than 3 in a graph (no vertex pointing to itself and no loops)?
It is not directly a function in igraph, but of course you can code it up. To find a cycle, you start at some node, go to some neighboring node and then find a simple path back to the original node. Since you did not provide any sample data, I will illustrate with a simple example.
Sample data
## Sample graph
library(igraph)
set.seed(1234)
g = erdos.renyi.game(7, 0.29, directed=TRUE)
plot(g, edge.arrow.size=0.5)
Finding Cycles
Let me start with just one node and one neighbor. Node 2 connects to Node 4. So some cycles may look like 2 -> 4 -> (Nodes other than 2 or 4) -> 2. Let's get all of the paths like that.
v1 = 2
v2 = 4
lapply(all_simple_paths(g, v2,v1, mode="out"), function(p) c(v1,p))
[[1]]
[1] 2 4 2
[[2]]
[1] 2 4 3 5 7 6 2
[[3]]
[1] 2 4 7 6 2
We see that there are three cycles starting at 2 with 4 as the second node. (I know that you said length greater than 3. I will come back to that.)
Now we just need to do that for every node v1 and every neighbor v2 of v1.
Cycles = NULL
for(v1 in V(g)) {
  for(v2 in neighbors(g, v1, mode="out")) {
    Cycles = c(Cycles,
      lapply(all_simple_paths(g, v2,v1, mode="out"), function(p) c(v1,p)))
  }
}
This gives 17 cycles in the whole graph. There are two issues though that you may need to look at depending on how you want to use this. First, you said that you wanted cycles of length greater than 3, so I assume that you do not want the cycles that look like 2 -> 4 -> 2. These are easy to get rid of.
LongCycles = Cycles[which(sapply(Cycles, length) > 3)]
LongCycles has 13 cycles having eliminated the 4 short cycles
2 -> 4 -> 2
4 -> 2 -> 4
6 -> 7 -> 6
7 -> 6 -> 7
But that list points out the other problem. There are still some cycles that you might think of as duplicates. For example:
2 -> 7 -> 6 -> 2
7 -> 6 -> 2 -> 7
6 -> 2 -> 7 -> 6
You might want to weed these out. To get just one copy of each cycle, you can always choose the vertex sequence that starts with the smallest vertex number. Thus,
LongCycles[sapply(LongCycles, min) == sapply(LongCycles, `[`, 1)]
[[1]]
[1] 2 4 3 5 7 6 2
[[2]]
[1] 2 4 7 6 2
[[3]]
[1] 2 7 6 2
This gives just the distinct cycles.
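The same idea can be written as a small helper that rotates any cycle so that its smallest vertex comes first, after which unique() removes the duplicates. This is only a sketch in base R. The helper name canonical_cycle and the short test list are made up for illustration, using cycles of the form shown above.

```r
# Sketch: rotate a cycle (first vertex repeated at the end) so that it
# starts at its smallest vertex. The direction of travel is preserved.
canonical_cycle <- function(cyc) {
  body <- cyc[-length(cyc)]                   # drop the repeated closing vertex
  k <- which.min(body)                        # position of the smallest vertex
  rotated <- c(body[k:length(body)], body[seq_len(k - 1)])
  c(rotated, rotated[1])                      # close the cycle again
}

cycles <- list(c(2, 7, 6, 2), c(7, 6, 2, 7), c(6, 2, 7, 6), c(2, 4, 7, 6, 2))
distinct <- unique(lapply(cycles, canonical_cycle))
distinct
```

The three rotations of 2 -> 7 -> 6 -> 2 collapse into one entry, leaving two distinct cycles in this small list.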
Addition regarding efficiency and scalability
I am providing a much more efficient version of the code that I
originally provided. However, it is primarily for the purpose of
arguing that, except for very simple graphs, you will not be able
to produce all cycles.
Here is some more efficient code. It eliminates checking many
cases that either cannot produce a cycle or will be eliminated
as a redundant cycle. In order to make it easy to run the tests
that I want, I made it into a function.
## More efficient version
FindCycles = function(g) {
  Cycles = NULL
  for(v1 in V(g)) {
    if(degree(g, v1, mode="in") == 0) { next }
    GoodNeighbors = neighbors(g, v1, mode="out")
    GoodNeighbors = GoodNeighbors[GoodNeighbors > v1]
    for(v2 in GoodNeighbors) {
      TempCyc = lapply(all_simple_paths(g, v2,v1, mode="out"), function(p) c(v1,p))
      TempCyc = TempCyc[which(sapply(TempCyc, length) > 3)]
      TempCyc = TempCyc[sapply(TempCyc, min) == sapply(TempCyc, `[`, 1)]
      Cycles = c(Cycles, TempCyc)
    }
  }
  Cycles
}
However, except for very simple graphs, there is a combinatorial
explosion of possible paths and so finding all possible cycles is
completely impractical. I will illustrate this with graphs much smaller
than the one that you mention in the comments.
First, I will start with some small graphs where the number of edges
is approximately twice the number of vertices. Code to generate my
examples is below but I want to focus on the number of cycles, so I
will just start with the results.
## ecount ~ 2 * vcount
Nodes  Edges  Cycles
   10     21      15
   20     41      18
   30     65      34
   40     87     424
   50    108    3433
   55    117   22956
But you report that your data has approximately 5 times as
many edges as vertices. Let's look at some examples like that.
## ecount ~ 5 * vcount
Nodes  Edges  Cycles
   10     48    3511
   12     61   10513
   14     71  145745
With this as the growth of the number of cycles, using 10K nodes
with 50K edges seems to be out of the question. BTW, it took several
minutes to compute the example with 14 vertices and 71 edges.
For reproducibility, here is how I generated the above data.
set.seed(1234)
g10 = erdos.renyi.game(10, 0.2, directed=TRUE)
ecount(g10)
length(FindCycles(g10))
set.seed(1234)
g20 = erdos.renyi.game(20, 0.095 , directed=TRUE)
ecount(g20)
length(FindCycles(g20))
set.seed(1234)
g30 = erdos.renyi.game(30, 0.056 , directed=TRUE)
ecount(g30)
length(FindCycles(g30))
set.seed(1234)
g40 = erdos.renyi.game(40, 0.042 , directed=TRUE)
ecount(g40)
length(FindCycles(g40))
set.seed(1234)
g50 = erdos.renyi.game(50, 0.038 , directed=TRUE)
ecount(g50)
length(FindCycles(g50))
set.seed(1234)
g55 = erdos.renyi.game(55, 0.035 , directed=TRUE)
ecount(g55)
length(FindCycles(g55))
##########
set.seed(1234)
h10 = erdos.renyi.game(10, 0.55, directed=TRUE)
ecount(h10)
length(FindCycles(h10))
set.seed(1234)
h12 = erdos.renyi.game(12, 0.46, directed=TRUE)
ecount(h12)
length(FindCycles(h12))
set.seed(1234)
h14 = erdos.renyi.game(14, 0.39, directed=TRUE)
ecount(h14)
length(FindCycles(h14))

R: Calculating adjacent vertex after deletion of nodes

I'm very new to R and trying to calculate the adjacent vertices of a graph, which is obtained from deleting certain nodes from an original graph.
However, the output of the result doesn't match with the plot of the graph.
For example:
library(igraph)
g <- make_ring(8)
g <- add_edges(g, c(1,2, 2,7, 3,6, 4,5, 8,2, 6,2))
V(g)$label <- 1:8
plot(g)
h <- delete.vertices(g, c(1,2))
plot(h)
If I compute:
adjacent_vertices(h,6)= 5
However, I want the output to be 3,5,7 as the plot shows. The problem lies in the fact that it doesn't know I'm trying to find the adjacent vertices of node labelled 6.
Could someone please help. Thanks.
The issue here is that when you delete the vertices, the remaining vertices are renumbered, so their indices now run from 1 to 6:
> V(h)
+ 6/6 vertices:
[1] 1 2 3 4 5 6
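To find the neighbors using the original vertex numbers, you could then simply offset the values by the number of vertices removed (this only works because the removed vertices, 1 and 2, are the lowest-numbered ones), e.g.:
> offset <- 2
> neighbors(h, 6 - offset) + offset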
+ 3/6 vertices:
[1] 3 5 7
A better approach, however, would be to refer to the vertex labels instead of using the indices:
> V(g)$label
[1] 1 2 3 4 5 6 7 8
> V(h)$label
[1] 3 4 5 6 7 8
> V(h)[V(h)$label == 6]
+ 1/6 vertex:
[1] 4
To get the neighbors of your vertex of interest, you can modify your code to look like:
> vertex_of_interest <- V(h)[V(h)$label == 6]
> neighbors(h, vertex_of_interest)$label
[1] 3 5 7
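The renumbering can be illustrated without igraph at all. The sketch below only mimics the bookkeeping for this example, with made-up names labels_g, labels_h and idx, and uses match() to translate between labels and positions.

```r
# Labels before and after deleting vertices 1 and 2 (as in the example).
labels_g <- 1:8
labels_h <- setdiff(labels_g, c(1, 2))   # 3 4 5 6 7 8

# Label -> index in h: the vertex labelled 6 sits at position 4.
idx <- match(6, labels_h)
idx

# Index -> label: positions 1, 3 and 5 in h carry the labels 3, 5 and 7.
labels_h[c(1, 3, 5)]
```

This is why asking for vertex 6 in h returns the neighbors of the vertex labelled 8. Index 6 is simply the sixth remaining vertex.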

Assign numbers to each letter so that r calculates the sum of the letters in a word

I'm trying to create a tool in R that will calculate the atomic composition (i.e. number of carbon, hydrogen, nitrogen and oxygen atoms) of a peptide chain that is input in single letter amino acid code. For example, the peptide KGHLY consists of the amino acids lysine (K), glycine (G), histidine (H), leucine (L) and tyrosine (Y). Lysine is made of 6 carbon, 14 hydrogen, 2 nitrogen and 2 oxygen. Glycine is made of 2 carbon, 5 hydrogen, 1 nitrogen and 2 oxygen, and so on.
I would like the R code to either read the peptide string (KGHLY) from a data frame or take input from the keyboard using readline().
I am new to R and new to programming. I am able to make objects for each amino acid, e.g. G <- c(2, 5, 1, 2) or build a data frame containing all 20 amino acids and their respective atomic compositions.
The bit that I am struggling with is that I don't know how to get R to index from a data frame in response to a string of letters. I have a feeling the solution is probably very simple but so far I have not been able to find a function that is suited to this task.
There are two main components to take care of here: the choice of
a method for storing the basic data, and the algorithm that
computes the result you desire.
For the computation, it might be preferable to have your data
stored in a matrix, due to the way R recycles the shorter vector
when multiplying two vectors. This recycling also kicks in if you
want to multiply a matrix with a vector, since a matrix is a
vector with some additional attributes (that is to say, dimension
and dimension-names). Consider the example below to see how it
works
test_matrix <- matrix(data = 1:12, nrow = 3)
test_vec <- c(3, 0, 1)
test_matrix
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
test_matrix * test_vec
[,1] [,2] [,3] [,4]
[1,] 3 12 21 30
[2,] 0 0 0 0
[3,] 3 6 9 12
Based on this observation, it's possible to deduce that a solution
where each amino acid has one row in a matrix might be a good way
to store the lookup data; when we have a counting vector
specifying the desired amount of contribution from each row, it
will be sufficient to multiply our matrix with our counting
vector, and then sum the columns, with the last part handled by
colSums.
colSums(test_matrix * test_vec)
[1] 6 18 30 42
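As a cross-check, the same numbers should come out of an ordinary matrix product, since multiplying each row by its count and summing the columns is what a row vector times a matrix does. The names by_recycling and by_product below are illustrative only.

```r
test_matrix <- matrix(data = 1:12, nrow = 3)
test_vec <- c(3, 0, 1)

# Recycling approach: scale each row by its count, then sum the columns.
by_recycling <- colSums(test_matrix * test_vec)

# Matrix product: test_vec is treated as a 1 x 3 row vector here.
by_product <- drop(test_vec %*% test_matrix)

all.equal(by_recycling, by_product)
```

Either form works. The recycling version relies on the counting vector having exactly one entry per row of the matrix.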
It's in general a "pain" to store this kind of information in a
matrix, since it might be a "lot of work" to update the
information later on. However, I guess it's not that often it's
required to add new amino acids, so that might not be an issue in
this case.
So let's create a matrix for the five amino acids needed
for the peptide you mentioned in your example. The numbers were
found on Wikipedia, and hopefully I didn't mess up when I copied
them. Just follow suit to add all the other amino acids too.
amino_acids <- rbind(
G = c(C = 2, H = 5, N = 1, O = 2),
L = c(C = 6, H = 13, N = 1, O = 2),
H = c(C = 6, H = 9, N = 3, O = 2),
K = c(C = 6, H = 14, N = 2, O = 2),
Y = c(C = 9, H = 11, N = 1, O = 3))
amino_acids
C H N O
G 2 5 1 2
L 6 13 1 2
H 6 9 3 2
K 6 14 2 2
Y 9 11 1 3
This matrix contains the information we want, but it might be
preferable to have them in lexicographic order - and it would be
nice to ensure that we haven't by mistake added the same row
twice. The code below takes care of both of these issues.
amino_acids <-
amino_acids[sort(unique(rownames(amino_acids))), ]
amino_acids
C H N O
G 2 5 1 2
H 6 9 3 2
K 6 14 2 2
L 6 13 1 2
Y 9 11 1 3
The next part is to figure out how to deal with the peptides. This
will here be done by first using strsplit to split the string
into separate characters, and then use a table-solution upon the
result to get the vector that we want to multiply with the matrix.
peptide <- "KGHLY"
peptide_2 <- unlist(strsplit(x = peptide, split = ""))
peptide_2
[1] "K" "G" "H" "L" "Y"
Using table upon peptide_2 gives us
table(peptide_2)
peptide_2
G H K L Y
1 1 1 1 1
This can thus be used to define a vector to play the role of test_vec in the first example. However, in general the resulting vector will contain fewer components than the rows of the matrix amino_acids; so a restriction must be performed first, in order to get the correct format we want for our computation.
Several options are available, and the simplest one might be to use the names from the table to subset the required rows from amino_acids, such that the computation can proceed without any further fuss.
peptide_vec <- table(peptide_2)
colSums(amino_acids[names(peptide_vec), , drop = FALSE] * as.vector(peptide_vec))  # drop = FALSE keeps a matrix even when only one amino acid occurs
C H N O
29 52 8 11
This outlines one possible solution for the core of your problem,
and this can be collected into a function that takes care of all
the steps for us.
peptide_function <- function(peptide, amino_acids) {
  peptide_vec <- table(
    unlist(strsplit(x = peptide, split = "")))
  ## Compute the result and return it to the work flow.
  ## drop = FALSE keeps a matrix even when only one amino acid occurs.
  colSums(
    amino_acids[names(peptide_vec), , drop = FALSE] *
      as.vector(peptide_vec))
}
And finally a test to see that we get the same answer as before.
peptide_function(peptide = "GHKLY",
amino_acids = amino_acids)
C H N O
29 52 8 11
What next? Well that depends on how you have stored your
peptides, and what you would like to do with the result. If for
example you have the peptides stored in a vector, and would like
to have the result stored in a matrix, then it might e.g. be
possible to use vapply as given below.
data_vector <- c("GHKLY", "GGLY", "HKLGL")
result <- t(vapply(
X = data_vector,
FUN = peptide_function,
FUN.VALUE = numeric(4),
amino_acids = amino_acids))
result
C H N O
GHKLY 29 52 8 11
GGLY 19 34 4 9
HKLGL 26 54 8 10
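A shorter variant is also possible, because indexing a matrix by row names may repeat rows, so the counting step with table can be skipped. The sketch below is one way to write it, with a check for letters that have no row in the matrix. The function name peptide_composition is made up, and the matrix repeats the five amino acids from above.

```r
amino_acids <- rbind(
  G = c(C = 2, H = 5,  N = 1, O = 2),
  H = c(C = 6, H = 9,  N = 3, O = 2),
  K = c(C = 6, H = 14, N = 2, O = 2),
  L = c(C = 6, H = 13, N = 1, O = 2),
  Y = c(C = 9, H = 11, N = 1, O = 3))

peptide_composition <- function(peptide, amino_acids) {
  residues <- strsplit(peptide, split = "")[[1]]
  # Fail early on letters that have no row in the lookup matrix.
  unknown <- setdiff(residues, rownames(amino_acids))
  if (length(unknown) > 0) {
    stop("no data for amino acid(s): ", paste(unknown, collapse = ", "))
  }
  # One row per residue (repeats included), then sum each column.
  colSums(amino_acids[residues, , drop = FALSE])
}

peptide_composition("KGHLY", amino_acids)
```

For KGHLY this should reproduce the totals computed earlier. The explicit check turns a cryptic "subscript out of bounds" error into a readable message.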
