How to calculate a maximum-bottleneck path with igraph? - r

Given a capacity network with a single source and a single sink, how can I calculate the maximum-bottleneck path (also known as the widest path or maximum capacity path problem) using igraph?
I've read (e.g. here, or even with pseudocode there) that it is possible with some modifications to Dijkstra's algorithm, but I would rather use igraph than dive into algorithm development.
Example
library(igraph)
set.seed(21)
nodes = cbind(
'id' = c('Fermenters', 'Methanogens', 'carbs', 'CO2', 'H2', 'other', 'CH4', 'H2O')
)
from <- c('carbs', rep('Fermenters', 3), rep('Methanogens', 2), 'CO2', 'H2')
to <- c('Fermenters', 'other', 'CO2', 'H2', 'CH4', 'H2O', rep('Methanogens', 2))
weight <- sample(1 : 20, 8)
links <- data.frame(from, to, weight, stringsAsFactors = FALSE)
net = graph_from_data_frame(links, vertices = nodes, directed = T)
## Calculate max-bottleneck here !
# # disabled because just vis
# plot(net, edge.width = E(net)$weight)
# require(networkD3)
# require(tidyverse)
#
# d3net <- igraph_to_networkD3(net, group = rep(1, 8))
# forceNetwork(
# Links = mutate(d3net$links, weight = E(net)$weight), Nodes = d3net$nodes,
# Source = 'source', Target = 'target',
# NodeID = 'name', Group = "group", Value = "weight",
# arrows = TRUE, opacity = 1, opacityNoHover = 1
# )
So with respect to the example, how would I calculate the maximum capacity path from carbs to H2O?

I don't know how efficient this would be, but you could use igraph to find all "simple" paths, then calculate the minimum edge weight of each, then choose the max...
library(tibble)
library(igraph)
# tibble() replaces the deprecated data_frame()
nodes = tibble('id' = c('A', "B", "C", "D"))
links = tribble(
~from, ~to, ~weight,
"A" , "B", 10,
"B", "C", 10,
"C", "D", 6,
"A", "D", 4,
)
net = graph_from_data_frame(links, vertices = nodes, directed = T)
simple_paths <- all_simple_paths(net, "A", "D")
simple_paths[which.max(
sapply(simple_paths, function(path) {
min(E(net, path = path)$weight)
})
)]
# [[1]]
# + 4/4 vertices, named, from 061ab8d:
# [1] A B C D

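You could try the same idea as in IGRAPH IN R: Find the path between vertices that maximizes the product of edge attributes. Divide the weights by their total so that every weight is < 1 (which keeps the negated log-weights positive), find the shortest path under those negated log-weights, and take the minimum edge weight along it. Note that this maximizes the product of the normalized weights, which is not guaranteed to coincide with the maximum-bottleneck path: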
x<-shortest_paths(net,3,8, weights=-log(E(net)$weight/sum(E(net)$weight)), output="epath")[[2]]
E(net)[x[[1]]]
min(E(net)$weight[x[[1]]])
which gives
+ 4/8 edges from 57589bc (vertex names):
[1] carbs ->Fermenters Fermenters ->H2 H2 ->Methanogens Methanogens->H2O
[1] 10
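If an exact widest path is needed without hand-writing a modified Dijkstra, one option that stays within igraph is to threshold the edge weights: try each distinct weight from largest to smallest as a candidate bottleneck, drop all narrower edges, and stop at the first candidate for which the sink is still reachable. The following is a minimal sketch under that assumption; the helper name `widest_path` is hypothetical (not part of igraph), and the small A to D graph is the one from the all-simple-paths answer.

```r
library(igraph)

# Small example graph: A->B (10), B->C (10), C->D (6), A->D (4)
links <- data.frame(
  from   = c("A", "B", "C", "A"),
  to     = c("B", "C", "D", "D"),
  weight = c(10, 10, 6, 4),
  stringsAsFactors = FALSE
)
net <- graph_from_data_frame(links, directed = TRUE)

# Hypothetical helper: maximum-bottleneck path found by thresholding weights
widest_path <- function(g, from, to) {
  # Candidate bottleneck values, widest first
  for (w in sort(unique(E(g)$weight), decreasing = TRUE)) {
    # Keep only the edges that are at least as wide as the candidate
    sub <- delete_edges(g, which(E(g)$weight < w))
    # A finite hop count means `to` is still reachable from `from`
    d <- distances(sub, v = from, to = to, mode = "out", weights = NA)
    if (is.finite(d[1, 1])) {
      vp <- shortest_paths(sub, from = from, to = to,
                           mode = "out", weights = NA)$vpath[[1]]
      return(list(bottleneck = w, path = as_ids(vp)))
    }
  }
  NULL  # `to` cannot be reached from `from` at all
}

res <- widest_path(net, "A", "D")
res$bottleneck
res$path
```

This runs one reachability check per distinct weight, so it avoids enumerating every simple path; a binary search over the sorted weights would reduce the number of checks further.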

Related

Automatically categorize and add annotations using pheatmap in R

I have a data frame made up of the school grades of some students in different subjects. The students are also characterized by their gender (F or M), which is included as a suffix in their names (e.g. Anne_F, Albert_M, etc.).
With these data I have created a heatmap with the package pheatmap, in this way:
library(pheatmap)
Anne_F <- c(9,7,6,10,6)
Carl_M <- c(6,7,9,5,7)
Albert_M <- c(8,8,8,7,9)
Kate_F <- c(10,5,10,9,5)
Emma_F <- c(6,8,10,8,7)
matrix <- cbind(Anne_F, Carl_M, Albert_M, Kate_F, Emma_F)
rownames(matrix) <- c("Math", "Literature", "Arts", "Science", "Music")
print(matrix)
heatmap <- pheatmap(
mat = matrix,
cluster_rows = F,
cluster_cols = F,
cellwidth = 30,
cellheight = 30,
)
heatmap
Which gives this matrix
and the relative plot:
Now I would like to automatically recognize if a student is Male or Female and add this as a column annotation in the heatmap, in order to have a graph like this:
I thought of creating two vectors, one with the names of the students:
name <- c("Anne", "Carl", "Albert", "Kate", "Emma") and one with the respective genders: gender <- c("F", "M", "M", "F", "F"), but I can't figure out how to associate the names with the genders and show them on the heatmap.
I don't mean to manually associate one-name to one-gender (as Anne to F, Albert to M, etc,). I need to take the entire vector of names and associate it with the corresponding vector of genders (and then annotate them on the heatmap), because their number will increase in the future.
Many thanks in advance for your help.
You need to use the annotation_col option in pheatmap.
library(pheatmap)
library(stringr) # provides str_split_fixed()
# split the column names into "Name" and "Gender"
name_gender_matrix <- str_split_fixed(colnames(matrix), "_", 2)
# The row names of the annotation data frame must match the column names of the heatmap matrix
annot_col <- data.frame(row.names = name_gender_matrix[, 1], Gender = name_gender_matrix[, 2])
# Rename the matrix columns so that they match the annotation row names
colnames(matrix) <- rownames(annot_col)
heatmap <- pheatmap(
mat = matrix,
cluster_rows = F,
cluster_cols = F,
cellwidth = 30,
cellheight = 30,
annotation_col = annot_col
)
With the given data, you could achieve your desired output like this:
Gender <- sapply(colnames(matrix), function(x) strsplit(x, "_")[[1]][2])
df <- as.data.frame(Gender)
pheatmap(
mat = matrix,
cluster_rows = F,
cluster_cols = F,
cellwidth = 30,
cellheight = 30,
annotation_col = df,
annotation_colors = list(Gender = c(M = "#6ef88a", F = "#d357fe"))
)

Extract single linkage clusters from very large pairs list

I have a very large pairs list that I need to break down into single linkage communities. So far I have been able to do this entirely in R just fine. But I need to prepare for the eventuality that the entire list may be too large to hold in memory, or for igraph's R implementation to handle. A very simple version of this task looks like:
library(igraph)
df <- data.frame("p1" = c("a", "a", "d", "d"),
"p2" = c("b", "c", "e", "f"),
"val" = c(0.5, 0.75, 0.25, 0.35))
g <- graph_from_data_frame(d = df,
directed = FALSE)
sg <- groups(components(g))
sg <- sapply(sg,
function(x) induced_subgraph(graph = g,
vids = x),
USE.NAMES = FALSE,
simplify = FALSE)
If df is incredibly large, on the scale of hundreds of millions to tens of billions of rows, is there a way for me to extract individual positions of sg without having to build g in its entirety? It's relatively easy for me to store representations of df outside of R, either as a compressed txt file or as a SQLite database.
To address the problem with igraph's R implementation (assuming the dataset still fits in RAM; otherwise see Paul Brodersen's answer):
The solution below works by specifying one element of the graph and then going over all connections until no further edges are found. It therefore creates the subgraph without building the whole graph. It looks a bit hacky compared to a recursive function but scales better.
library(igraph)
reduce_graph <- function(df, element) {
  done <- FALSE # avoids shadowing base::stop()
  elements_to_inspect <- element
  rows_graph <- 0
  while (!done) {
    # All rows that touch any element collected so far
    graph_parts <- df[df$p1 %in% elements_to_inspect |
                        df$p2 %in% elements_to_inspect, ]
    elements_to_inspect <- unique(c(unique(graph_parts$p1),
                                    unique(graph_parts$p2)))
    if (nrow(graph_parts) == rows_graph) {
      # No new edges were found, so the component is complete
      done <- TRUE
    } else {
      rows_graph <- nrow(graph_parts)
    }
  }
  return(graph_parts)
}
df <- data.frame("p1" = c("a", "a", "d", "d","o"),
"p2" = c("b", "c", "e", "f","u"),
"val" = c(100, 0.75, 0.25, 0.35,1))
small_graph <- reduce_graph(df, "f")
g <- graph_from_data_frame(d = small_graph,
directed = FALSE)
sg <- groups(components(g))
sg <- sapply(sg,
function(x) induced_subgraph(graph = g,
vids = x),
USE.NAMES = FALSE,
simplify = FALSE)
One can test the speed on a bigger dataset.
##larger dataset with lots of sparse graphs.
set.seed(100)
p1 <- as.character(sample(1:10000000, 1000000, replace=T))
p2 <- as.character(sample(1:10000000, 1000000, replace=T))
val <- rep(1, 1000000)
df <- data.frame("p1" = p1,
"p2" = p2,
"val" = val)
small_graph <- reduce_graph(df, "9420672") #has 3 pairwise connections
g <- graph_from_data_frame(d = small_graph,
directed = FALSE)
sg <- groups(components(g))
sg <- sapply(sg,
function(x) induced_subgraph(graph = g,
vids = x),
USE.NAMES = FALSE,
simplify = FALSE)
Building groups and subgraph takes one second, compared to multiple minutes for the whole graph on my machine. This of course depends on how sparsely connected the graphs are.
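For the case where the pairs list is streamed from disk in chunks, one approach that never builds the graph is a union-find (disjoint-set) structure, which only stores one parent pointer per vertex. The sketch below is illustrative only: the names `uf_parent`, `find_root`, `union_pair` and `process_chunk` are hypothetical, the chunks are simulated by splitting a small data frame, and a row-by-row R loop would likely be far too slow for billions of rows, so a compiled implementation or a database-side approach may be needed in practice.

```r
# One parent pointer per vertex, stored in a hashed environment
uf_parent <- new.env(hash = TRUE)

# Return the representative (root) of the set containing x
find_root <- function(x) {
  if (is.null(uf_parent[[x]])) assign(x, x, envir = uf_parent)  # first sighting
  root <- x
  while (uf_parent[[root]] != root) root <- uf_parent[[root]]
  # Path compression: point every vertex on the way directly at the root
  while (uf_parent[[x]] != root) {
    nxt <- uf_parent[[x]]
    assign(x, root, envir = uf_parent)
    x <- nxt
  }
  root
}

# Merge the sets containing a and b
union_pair <- function(a, b) {
  ra <- find_root(a)
  rb <- find_root(b)
  if (ra != rb) assign(ra, rb, envir = uf_parent)
  invisible(NULL)
}

# Process one chunk of the pairs list (columns p1 and p2, as in the question)
process_chunk <- function(chunk) {
  for (i in seq_len(nrow(chunk))) union_pair(chunk$p1[i], chunk$p2[i])
}

df <- data.frame(p1 = c("a", "a", "d", "d"),
                 p2 = c("b", "c", "e", "f"),
                 stringsAsFactors = FALSE)

# Simulate reading the list in two chunks
for (chunk in split(df, c(1, 1, 2, 2))) process_chunk(chunk)

# Group every vertex seen so far by its root
ids <- ls(uf_parent)
membership <- vapply(ids, find_root, character(1))
clusters <- unname(split(ids, membership))
clusters
```

Once the membership is known, the rows belonging to a single cluster can be filtered out of the stored pairs list and passed to graph_from_data_frame() on their own.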

manually create a dendrogram r

I am trying to create a dendrogram from similarity scores that I did not obtain through hclust or any similar function. I have two branches and just want to draw them out according to how similar they are and then have them branch off.
A and B are 0.5 similar
A is 0.2 unique
B is 0.3 unique
So the total height of A is 0.7 and the total height of B is 0.8, where 0.5 of their branches are shared.
The following just makes two branches without a long branch connecting the two leaves. There is this similar question, but it doesn't quite help!
x <- list(1, 2)
## attach "leaf" and "label" attributes to leaf nodes
attr(x[[1]], "leaf") <- TRUE
attr(x[[2]], "leaf") <- TRUE
attr(x[[1]], "label") <- "A"
attr(x[[2]], "label") <- "B"
## set "height" attributes for all nodes
attr(x, "height") <- 1
attr(x[[1]], "height") <- (1-0.7)
attr(x[[2]], "height") <- (1-0.8)
## set "midpoints" attributes for all nodes
attr(x, "midpoint") <- 1
attr(x[[1]], "midpoint") <- 0.5
attr(x[[2]], "midpoint") <- 0.5
## set "members" attributes for all nodes
attr(x, "members") <- 2
attr(x[[1]], "members") <- 1
attr(x[[2]], "members") <- 1
## set class as "dendrogram"
class(x) <- "dendrogram"
x
plot(x)
You can write a small helper that attaches attributes to a node and use it to build the leaves. Here n and n1 are the leaves for A and B, with their unique parts (0.2 and 0.3) as heights; n2 combines the two leaves under a root of height 1 and is converted to a dendrogram by setting its class.
Attr = function(o, plus_) {
if (!missing(plus_)) for (n in names(plus_)) { attr(o, n) = plus_[[n]]; }
o
}
n = Attr("A", list(label = "A", members = 1, height = 0.2, leaf = T));
n1 = Attr("B", list(label = "B", members = 1, height = 0.3, leaf = T));
n2 = Attr(list(n, n1), list(members = 2, height = 1, midpoint = 0.5));
class(n2) = 'dendrogram';
plot(n2)

Subset graph based on edges weight

I have a graph, G=(V,E) with several attributes including an edge weight attribute. I'm trying to create a subgraph based on a condition where weights are higher than x.
I've tried the standard R subsetting option with g <- E(g)[weight > max(weight)*.10], but I always get a vector.
I'm not sure what I'm doing wrong here.
Maybe you want something like this
library(igraph)
set.seed(1)
m <- matrix(sample(c(.5, 2, 5), 100, replace = TRUE, prob = c(.6, .3, .1)), ncol = 10, dimnames = rep(list(letters[1:10]), 2))
g <- graph_from_adjacency_matrix(m, weighted = TRUE, diag = FALSE, mode = "undirected")
coords <- layout.auto(g)
par(mfrow = c(1,3))
plot(g, layout=coords, edge.width = E(g)$weight)
s1 <- subgraph.edges(g, E(g)[E(g)$weight>2], delete.vertices = FALSE)
plot(s1, layout=coords, edge.width = E(s1)$weight)
s2 <- delete_vertices(s1, degree(s1, mode = "in")==0)
plot(s2, layout=coords[V(g)$name%in%V(s2)$name,], edge.width = E(s2)$weight)
That would be because you replaced your graph g with just subsetted edges. If you want to remove edges below the threshold weight, you can use:
g_sub <- delete.edges(g, E(g)[weight <= max(weight)*.10])

igraph use of %>% as continuation

I am using igraph with R and trying to create a graph with labeled vertices.
The igraph docs for "paths" give the example code
g <- make_empty_graph(directed = FALSE, n = 10) %>%
set_vertex_attr("name", value = letters[1:10])
plot(g) # labels the vertices with letters a through j
However
g <- make_empty_graph(directed = FALSE, n = 10)
set_vertex_attr(g, "name", value = letters[1:10])
plot(g) # now labels the vertices as numbers 1:10
Why?
The igraph docs for "set_vertex_attr" use
set_vertex_attr(g, "name", value = letters[1:10])
to set vertex attributes.
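A likely explanation is that set_vertex_attr() does not modify the graph in place: like most R functions, it returns a modified copy. In the piped version that copy is what gets assigned to g, whereas in the second version the result is discarded and g stays unnamed, so plot() falls back to the vertex indices. A minimal sketch of the non-piped equivalent, assuming the same ten-vertex graph:

```r
library(igraph)

g <- make_empty_graph(directed = FALSE, n = 10)

# Calling set_vertex_attr() without assigning the result leaves g unchanged
set_vertex_attr(g, "name", value = letters[1:10])
name_before <- V(g)$name   # still NULL: no "name" attribute on g

# Assign the returned graph back to g to keep the attribute
g <- set_vertex_attr(g, "name", value = letters[1:10])
name_after <- V(g)$name    # letters a through j

# plot(g) would now label the vertices with the letters
```

The same effect can be had with the replacement form V(g)$name <- letters[1:10], which assigns back to g implicitly.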
