Shortest Paths based on edge attribute with igraph - r

I'm trying to get the shortest paths of a graph but based on its edge ids.
So having the following graph:
library(igraph)
set.seed(45)
g <- erdos.renyi.game(25, 1/10, directed = TRUE)
E(g)$id <- sample(1:3, length(E(g)), replace = TRUE)
The shortest_paths(g, 1, V(g)) function finds all the shortest paths from node 1 to all the other nodes. However, I would like to calculate this not just by geodesic distance, but by a mix of geodesic distance and the minimum number of edge id changes.
For example, suppose this were a train network and the edge ids represented trains. I would like to calculate how to get from node A to all the other nodes using the shortest path while changing trains as few times as possible.

OK, I think I have a working solution, although the code is a little ugly. The basic algorithm (let's call it gs(i, j)) goes like this. If we want to find the shortest train journey from i to j (gs(i, j)), we:
1) find the shortest path from i to j considering all trains. If this path is length 0 or 1, return it (there is either no path or a path on 1 train)
2) split the graph up by 'trains' (subset graph by edges) so as to consider each train network separately, and find the shortest path between i and j in each individual train network
3) if a single train will get you from i to j, return the train route with the fewest stops between i and j, else
4) if no single train runs from i to j, call gs(i, j-1), where (j-1) is the stop before j in the shortest path between i and j on the full network.
So basically, we look to see if a single train can do it, and if it can't, we call the function recursively to see whether a single train can get you to the stop before the last stop, and so on.
library(igraph)
# First your data
set.seed(45)
g <- erdos.renyi.game(25, 1/10, directed = TRUE)
E(g)$id <- sample(1:3, length(E(g)), replace = TRUE)
plot(g, edge.color = E(g)$id)
# The function takes as arguments the graph, and the id of the vertex
# you want to go from/to. It should work for a vector of
# destinations but I have not rigorously tested it so proceed with
# caution!
get.shortest.routes <- function(g, from, to){
  # One subgraph per train id; keep all vertices so vertex ids stay aligned
  train.routes <- lapply(unique(E(g)$id), function(id){
    subgraph.edges(g, eids = which(E(g)$id==id), delete.vertices = FALSE)
  })
  # Shortest paths on the full network and on each single-train network
  target.sp <- shortest_paths(g, from = from, to = to, output = 'vpath')$vpath
  single.train.paths <- lapply(train.routes, function(gs){
    shortest_paths(gs, from = from, to = to, output = 'vpath')$vpath
  })
  # seq_along() visits every destination, not only the last one
  for (i in seq_along(target.sp)){
    if (length(target.sp[[i]]) > 1) {
      cands <- lapply(single.train.paths, function(l){l[[i]]})
      if (sum(unlist(lapply(cands, length)))!=0) {
        # At least one single train reaches the target: keep the shortest
        cands <- cands[lapply(cands, length)!=0]
        cands <- cands[lapply(cands, length)==min(unlist(lapply(cands, length)))]
        target.sp[[i]] <- cands[[1]]
      } else {
        # No single train: route to the stop before the last, then on to the last
        n <- length(target.sp[[i]])
        target.sp[[i]] <- c(get.shortest.routes(g, from = as.numeric(target.sp[[i]][1]),
                                                to = as.numeric(target.sp[[i]][n - 1]))[[1]],
                            get.shortest.routes(g, from = as.numeric(target.sp[[i]][n - 1]),
                                                to = as.numeric(target.sp[[i]][n]))[[1]][-1])
      }
    }
  }
  target.sp
}
OK, now let's run some tests. If you squint at the graph above you can see that the path from vertex 5 to vertex 21 has length 2 if you take two trains, but that you can get there on 1 train if you pass through an extra station. Our new function should return the longer path:
shortest_paths(g, 5, 21)$vpath
#> [[1]]
#> + 3/25 vertices, from b014eb9:
#> [1] 5 13 21
get.shortest.routes(g, 5, 21)
#> Warning in shortest_paths(gs, from = from, to = to, output = "vpath"): At
#> structural_properties.c:745 :Couldn't reach some vertices
#> Warning in shortest_paths(gs, from = from, to = to, output = "vpath"): At
#> structural_properties.c:745 :Couldn't reach some vertices
#> [[1]]
#> + 4/25 vertices, from c22246c:
#> [1] 5 13 15 21
Let's make a really easy graph where we are sure what we want to see. Here we should get 1-2-4-5 instead of 1-3-5:
df <- data.frame(from = c(1, 1, 2, 3, 4), to = c(2, 3, 4, 5, 5))
g1 <- graph_from_data_frame(df)
E(g1)$id <- c(1, 2, 1, 3, 1)
plot(g1, edge.color = E(g1)$id)
get.shortest.routes(g1, 1, 5)
#> Warning in shortest_paths(gs, from = from, to = to, output = "vpath"): At
#> structural_properties.c:745 :Couldn't reach some vertices
#> Warning in shortest_paths(gs, from = from, to = to, output = "vpath"): At
#> structural_properties.c:745 :Couldn't reach some vertices
#> [[1]]
#> + 4/5 vertices, named, from c406649:
#> [1] 1 2 4 5
I'm sure there is a more rigorous solution, and you'll probably want to optimize the code a bit. For instance, I just realized that I don't stop the function immediately if the shortest path on the full graph has only two nodes -- doing so would avoid some needless computations! This was a fun problem, and I hope some other answers get posted.
Created on 2018-05-11 by the reprex package (v0.2.0).

Here is my take on the problem. A few notes:
1) all_simple_paths will not scale well with large or highly connected graphs
2) I favored fewest changes above all else, which means a path with two changes and a dist of 40 will beat a path with three changes and a dist of 3.
4) I can imagine an even faster approach if the priority between number of changes and distance were switched whenever there is no path on a single id
library(igraph)
# First your data
set.seed(45)
g <- erdos.renyi.game(25, 1/10, directed = TRUE)
E(g)$id <- sample(1:3, length(E(g)), replace = TRUE)
plot(g, edge.color = E(g)$id)
##Option 1:
rst <- all_simple_paths(g, from = 1, to = 18, mode = "out")
rst <- lapply(rst, as_ids)
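# Expand each vertex path into consecutive (from, to) pairs for get.edge.ids;
# head/tail also handles two-vertex paths, where 2:(length(x)-1) would run backwards
rst1 <- lapply(rst, function(x) as.vector(rbind(head(x, -1), tail(x, -1))))
rst2 <- lapply(rst1, function(x) data.frame(eid = get.edge.ids(graph=g, vp = x),
                                            train=E(g)$id[get.edge.ids(graph=g, vp = x)]))
# changes counts runs of equal train ids (changes + 1), which ranks paths identically
rst3 <- data.frame(pathID=seq_along(rst),
                   changes=sapply(rst2, function(x) length(rle(x$train)$lengths)),
                   dist=sapply(rst2, nrow))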
spath <- rst3[order(rst3$changes, rst3$dist), ][1,1]
#Vertex IDs
rst[[spath]]
#[1] 1 23 8 18
plot(g, edge.color = E(g)$id, vertex.color=ifelse(V(g) %in% rst[[spath]], "firebrick", "gray80"),
edge.arrow.size=0.5)
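The ranking rule in note 2 (fewest changes first, distance only as a tie-breaker) can be seen in isolation with a small base R sketch. The helper name count_changes and the toy edge id vectors are made up for illustration and are not taken from the graph above:

```r
# Hypothetical helper: number of id changes along a path,
# given the edge ids in travel order.
count_changes <- function(edge_ids) {
  if (length(edge_ids) == 0) return(0L)
  length(rle(edge_ids)$lengths) - 1L
}

# Toy candidate paths (invented edge ids, one vector per path)
paths <- list(c(1, 1, 1, 1), c(2, 3), c(1, 2, 1))

summary_df <- data.frame(
  pathID  = seq_along(paths),
  changes = sapply(paths, count_changes),
  dist    = sapply(paths, length)
)

# Sort by changes first, then by distance, and take the top row
best <- summary_df[order(summary_df$changes, summary_df$dist), ][1, "pathID"]
best
```

Here the four-edge path wins because it needs no change at all, even though the two-edge path is shorter.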

Related

Does igraph have a function that generates sub-graphs limited by weights? dfs, random_walk

I have a weighted graph in the igraph R environment.
I need to obtain sub-graphs recursively, starting from any random node. The sum of weights in each sub-graph has to be less than a given number.
The depth-first search algorithm seems to deal with this problem, as does the random walk function.
Does anybody know which igraph function could tackle this?
This iterative function finds the sub-graph, grown from the given vertex of any undirected graph, that contains the biggest possible weight-sum below the value specified in limit.
A challenge in finding such a graph is the computational load of evaluating the weight-sum of every possible sub-graph. Consider this example, where one iteration has found a sub-graph A-B with a weight-sum of 1.
The shortest path to any new vertex is A-C (with a weight of 3), a sub-graph of A-B-D has a weight-sum of 6, while A-B-C would have a weight-sum of 12 because of the inclusion of the edge B-C in the sub-graph.
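A minimal sketch of that example, to make the effect of induced edges concrete. Only the A-B and A-C weights are stated in the text; the B-D and B-C weights (5 and 8) are assumptions chosen so that the sums match the 6 and 12 quoted above, and subgraph_ws is a hypothetical helper name:

```r
library(igraph)

# A-B = 1 and A-C = 3 come from the text; B-D = 5 and B-C = 8 are assumed
edges <- data.frame(
  from   = c("A", "A", "B", "B"),
  to     = c("B", "C", "D", "C"),
  weight = c(1, 3, 5, 8)
)
g_toy <- graph_from_data_frame(edges, directed = FALSE)

# Weight-sum of the sub-graph induced by a set of vertices
subgraph_ws <- function(graph, vs) sum(E(induced_subgraph(graph, vs))$weight)

ws_ab  <- subgraph_ws(g_toy, c("A", "B"))       # only edge A-B
ws_abd <- subgraph_ws(g_toy, c("A", "B", "D"))  # A-B and B-D
ws_abc <- subgraph_ws(g_toy, c("A", "B", "C"))  # A-B, A-C and the induced B-C
```

Adding C looks cheap when judged by the edge A-C alone, but the induced sub-graph also picks up B-C, so D is the better next vertex.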
The function below looks ahead and evaluates iterative steps by choosing to gradually enlarge the sub-graph by including the next vertex that would result in the lowest sub-graph weight-sum rather than that vertex which has the shortest direct paths.
In terms of optimisation, this leaves something to be desired, but I think it does what you requested in your first question.
find_maxweight_subgraph_from <- function(graph, vertex, limit=0, sub_graph=c(vertex), current_ws=0){
# Keep a shortlist of possible edges to go next
shortlist = data.frame(k=integer(0),ws=numeric(0))
limit <- min(limit, sum(E(graph)$weight))
while(current_ws < limit){
# To find the next possible vertexes to include, a listing of
# potential candidates is computed to be able to choose the most
# efficient one.
# Each iteration chooses amongst vertices that are connected to the sub-graph:
adjacents <- as.vector(adjacent_vertices(graph, vertex, mode="all")[[1]])
# A shortlist of possible enlargements of the sub-graph is kept to be able
# to compare each potential enlargement of the sub-graph and always choose
# the one which results in the smallest increase of sub-graph weight-sum.
#
# The shortlist is enlarged by vertices that are:
# 1) adjacent to the latest added vertex
# 2) not already IN the sub-graph
new_k <- adjacents[!adjacents %in% sub_graph]
shortlist <- rbind(shortlist[!is.na(shortlist$k),],
data.frame(k = new_k,
ws = rep(Inf, length(new_k)) )
)
# The addition to the weight-sum is NOT calculated from the weight on individual
# edges leading to vertices on the shortlist BUT from the ACTUAL weight-sum of
# the sub-graph that would result from adding a vertex `k` to the sub-graph.
shortlist$ws <- sapply(shortlist$k, function(x) sum( E(induced_subgraph(graph, c(sub_graph,x)))$weight ) )
# We choose the vertex with the lowest impact on weight-sum:
shortlist <- shortlist[order(shortlist$ws),]
vertex <- shortlist$k[1]
current_ws <- shortlist$ws[1]
shortlist <- shortlist[2:nrow(shortlist),]
# Each iteration adds a new vertex to the sub-graph
if(current_ws <= limit){
sub_graph <- c(sub_graph, vertex)
}
}
(induced_subgraph(graph, sub_graph))
}
# Test function using a random graph
g <- erdos.renyi.game(16, 30, type="gnm", directed=F)
E(g)$weight <- sample(1:1000/100, length(E(g)))
sum(E(g)$weight)
plot(g, edge.width = E(g)$weight, vertex.size=2)
sg <- find_maxweight_subgraph_from(g, vertex=12, limit=60)
sum(E(sg)$weight)
plot(sg, edge.width = E(sg)$weight, vertex.size=2)
# Test function using your example code:
g <- make_tree(10, children = 2, mode = c("undirected"))
s <- seq(1:10)
g <- set_edge_attr(g, "weight", value= s)
plot(g, edge.width = E(g)$weight)
sg <- find_maxweight_subgraph_from(g, 2, 47)
sum(E(sg)$weight)
plot(sg, edge.width = E(sg)$weight)
It is done here below, however, it does not seem to be effective.
#######Example code
g <- make_tree(10, children = 2, mode = c("undirected"))
s <- seq(1:19)
g <- set_edge_attr(g, "weight", value= s)
plot(g)
is_weighted(g)
E(g)$weight
threshold <- 5
eval <- function(r){
#r <- 10
Vertice_dfs <- dfs(g, root = r)
Sequencia <- as.numeric(Vertice_dfs$order)
for (i in 1:length(Sequencia)) {
#i <- 2
# function callback by vertice to dfs
f.in <- function(graph, data, extra) {
data[1] == Sequencia[i]-1
}
# DFS algorithm to the function
dfs <- dfs(g, root = r,in.callback=f.in)
# Vertices resulted from DFS
dfs_eges <- na.omit(as.numeric(dfs$order))
# Resulting subgraph
g2 <- induced_subgraph(g, dfs_eges)
# Total weight subgraph g2
T_W <- sum(E(g2)$weight)
if (T_W > threshold) {
print(T_W)
return(T_W)
break
}
}
}
#search by vertice
result <- lapply(1:length(V(g)),eval)

Igraph - A way to extract which nodes got into which communities

I found this code online here: https://blog.revolutionanalytics.com/2015/08/contracting-and-simplifying-a-network-graph.html
library(igraph)
# Download prepared igraph file from github
gs <- readRDS("pdb/depGraph-CRAN.rds")
set.seed(42)
# Compute communities (clusters)
cl <- walktrap.community(gs, steps = 5)
cl$degree <- (degree(gs)[cl$names])
# Assign node with highest degree as name for each cluster
cl$cluster <- unname(ave(cl$degree, cl$membership,
FUN=function(x)names(x)[which.max(x)])
)
V(gs)$name <- cl$cluster
# Contract graph ----------------------------------------------------------
# Contract vertices
E(gs)$weight <- 1
V(gs)$weight <- 1
gcon <- contract.vertices(gs, cl$membership,
vertex.attr.comb = list(weight = "sum", name = function(x)x[1], "ignore"))
# Simplify edges
gcon <- simplify(gcon, edge.attr.comb = list(weight = "sum", function(x)length(x)))
gcc <- induced.subgraph(gcon, V(gcon)$weight > 20)
V(gcc)$degree <- unname(degree(gcc))
# ------------------------------------------------------------------------
set.seed(42)
par(mar = rep(0.1, 4))
g.layout <- layout.kamada.kawai(gcc)
plot.igraph(gcc, edge.arrow.size = 0.1, layout = g.layout, vertex.size = 0.5 * (V(gcc)$degree))
This code contracts nodes and simplifies edges. It reduces my graph from over 500 nodes to around 39, which is great! However, I want to know which nodes ended up in which clusters in order to check if the procedure makes sense.
I also get this error when using the code:
> V(gs)$name <- cl$cluster
Warning message:
In length(vattrs[[name]]) <- vc : length of NULL cannot be changed
> (degree(gs)[cl$names])
numeric(0) <-- there seems to be nothing?
> unname(ave(cl$degree, cl$membership,
+ FUN=function(x)names(x)[which.max(x)]))
numeric(0) <-- there seems to be nothing?
Is this causing my problem or can I find my answer somewhere else?
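One way to list which nodes fall into which community is to split the vertex names by the membership vector. The sketch below uses the built-in Zachary karate club graph as a stand-in, since the original RDS file is not available here; the names g_demo, cl_demo and members, and the "v1", "v2", ... labels, are illustrative assumptions. The numeric(0) output above suggests that cl$names is empty, which would be the case if the graph had no name vertex attribute when the communities were computed, so the sketch sets names first:

```r
library(igraph)

# Stand-in graph (assumption: any graph with a `name` attribute works the same way)
g_demo <- make_graph("Zachary")
V(g_demo)$name <- paste0("v", seq_len(vcount(g_demo)))

set.seed(42)
cl_demo <- cluster_walktrap(g_demo, steps = 5)

# One list element per community, holding the names of its member vertices
members <- split(V(g_demo)$name, as.vector(membership(cl_demo)))
members
```

Doing this split before contract.vertices is called keeps a record of the members of each contracted node.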

Labels on only root and terminal vertices in igraph (R)?

inst2 = c(2, 3, 4, 5, 6)
motherinst2 = c(7, 8, 2, 10, 11)
km = c(20, 30, 40, 25, 60)
df2 = data.frame(inst2, motherinst2)
df2 = cbind(df2, km)
g2 = graph_from_data_frame(df2)
tkplot(g2)
How would I approach adding labels exclusively to the root and terminal vertices in a graph? I know it would involve this argument, but how would you set it up? Assume the graph object is just called 'g', or something obvious.
vertex.label =
The solution from @eipi10 is good, but the OP says "I'm finding it difficult to apply to my large data set effectively." I suspect that the issue is finding which are the intermediate nodes whose names should be blanked out. I will continue the example of @eipi10. Since my answer is based on his, if you upvote my answer, please upvote his as well.
You can use the neighbors function to determine which points are sources and sinks. Everything else is an intermediate node.
## original graph from eipi10
g = graph_from_edgelist(cbind(c(rep(1,10),2:11), c(2:21)))
## Identify which nodes are intermediate
SOURCES = which(sapply(V(g), function(x) length(neighbors(g, x, mode="in"))) == 0)
SINKS = which(sapply(V(g), function(x) length(neighbors(g, x, mode="out"))) == 0)
INTERMED = setdiff(V(g), c(SINKS, SOURCES))
## Fix up the node names and plot
V(g)$name = V(g)
V(g)$name[INTERMED] = ""
plot(g)
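As a possible variant, sources and sinks can also be found with degree(), and the labels can be passed through the vertex.label argument mentioned in the question, which leaves the vertex names untouched. This is a sketch under the assumption that vertex ids are acceptable as labels; is_root, is_leaf and labels are illustrative names:

```r
library(igraph)

# Same example graph as above
g <- graph_from_edgelist(cbind(c(rep(1, 10), 2:11), c(2:21)))

# Roots have no incoming edges, terminal vertices have no outgoing edges
is_root <- degree(g, mode = "in") == 0
is_leaf <- degree(g, mode = "out") == 0

# Label only roots and terminals; intermediate vertices get an empty label
labels <- ifelse(is_root | is_leaf, as.character(seq_len(vcount(g))), "")
plot(g, vertex.label = labels)
```

Because degree() is computed for all vertices in one call, this avoids a neighbors() call per vertex on a large graph.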
Using your example graph, we'll identify the root and terminal vertices and remove the labels for other vertices. Here's what the initial graph looks like:
set.seed(2)
plot(g2)
Now let's identify and remove the names of the intermediate vertices
# Get all edges
e = get.edgelist(g2)
# Root vertices are in first column but not in second column
root = setdiff(e[,1],e[,2])
# Terminal vertices are in second column but not in first column
terminal = setdiff(e[,2], e[,1])
# Vertices to remove are not in root or terminal vertices
remove = setdiff(unique(c(e)), c(root, terminal))
# Remove names of intermediate vertices
V(g2)$name[V(g2)$name %in% remove] = ""
set.seed(2)
plot(g2)
Original Answer
You can use set.vertex.attribute to change the label names. Here's an example:
library(igraph)
# Create a graph to work with
g = graph_from_edgelist(cbind(c(rep(1,10),2:11), c(2:21)))
plot(g)
Now we can remove the labels from the intermediate vertices:
g = set.vertex.attribute(g, "name", value=c(1,rep("", length(2:11)),12:21))
plot(g)

How to calculate the edge attributes as the path length in igraph?

Pretend the dataframe below is an edgelist (relation between inst2 and motherinst2), and that km is an attribute I want to calculate as a path that's been assigned to the edges. I'm too new at coding to make a reproducible edge list.
inst2 = c(2, 3, 4, 5, 6)
motherinst2 = c(7, 8, 9, 10, 11)
km = c(20, 30, 40, 25, 60)
df2 = data.frame(inst2, motherinst2)
edgelist = cbind(df2, km)
g = graph_from_data_frame(edgelist)
I know how to calculate the path length of vertices in a graph, but I have some attributes attached to the edges that I want to sum up as path lengths. They are simple attributes (distance in km, time in days, and speed as km/day).
This is how I was calculating the path of vertices (between roots and terminals/leaves):
roots = which(sapply(sapply(V(g),
function(x) neighbors(g, x, mode = 'in')), length) == 0)
#slight tweaking this piece of code will also calculate 'terminal' nodes (or leaves). (11):
terminals = which(sapply(sapply(V(g),
function(x) neighbors(g, x, mode = 'out')), length) == 0)
paths= lapply(roots, function(x) get.all.shortest.paths(g, from = x, to = terminals, mode = "out")$res)
named_paths= lapply(unlist(paths, recursive=FALSE), function(x) V(g)[x])
I just want to do essentially exactly as I did above, but summing up the distance, time, and rate (which I will compute the mean of) incurred between each of those paths. If it helps to know how the edges have been added as attributes, I've used cbind like so:
edgelist_df = cbind(edgelist_df, time, dist, speed)
and my graph object (g) is set up like this:
g <- graph_from_data_frame(edgelist_df, vertices = vattrib_df)
vattrib_df is the attributes of the vertices, which is not of interest to us here.
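One possible approach, sketched on a small made-up tree rather than the real data: ask shortest_paths for edge paths (output = "epath") and sum the edge attribute along each one. The edge list, the km values and the names edgelist_demo, g_demo and km_per_path are assumptions for illustration only:

```r
library(igraph)

# Toy tree: 1 -> 2 -> 3 and 1 -> 2 -> 4, with a km attribute on each edge
edgelist_demo <- data.frame(
  from = c(1, 2, 2),
  to   = c(2, 3, 4),
  km   = c(20, 30, 40)
)
g_demo <- graph_from_data_frame(edgelist_demo)

# Roots have no incoming edges, terminals have no outgoing edges
roots     <- unname(which(degree(g_demo, mode = "in") == 0))
terminals <- unname(which(degree(g_demo, mode = "out") == 0))

# For each root, get the edge path to every terminal and sum km along it
km_per_path <- unlist(lapply(roots, function(r) {
  ep <- shortest_paths(g_demo, from = r, to = terminals,
                       mode = "out", output = "epath")$epath
  sapply(ep, function(p) sum(p$km))
}))
km_per_path
```

The same pattern should work for the other attributes, for example mean(p$speed) for the average rate along a path. For plain sums, distances(g_demo, v = roots, to = terminals, mode = "out", weights = E(g_demo)$km) gives the totals directly, although it then picks the path with the smallest km total rather than the fewest edges, which only matters when several paths exist.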

Neighbor groups based on cluster assignment is slow

I am doing some analysis using iGraph in R, and I am currently doing a calculation that is very expensive. I need to do it across all of the nodes in my graph, so if someone knows a more efficient way to do it, I would appreciate it.
I start out with a graph, g. I first do some community detection on the graph
library(igraph)
adj_matrix <- matrix(rbinom(10 * 5, 1, 0.5), ncol = 8000, nrow = 8000)
g <- graph_from_adjacency_matrix(adj_matrix, mode = 'undirected', diag = FALSE)
c <- cluster_louvain(g)
Then, I basically assign each cluster to 1 of 2 groups
nc <- length(c)
assignments <- rbinom(nc, 1, .5)
Now, for each node, I want to find out what percentage of its neighbors are in a given group (as defined by the cluster assignments). I currently do this in the following way:
pct_neighbors_1 <- function(g, vertex, c, assignments) {
sum(
ifelse(
assignments[membership(c)[neighbors(g, vertex)]] == 1, 1, 0)
)/length(neighbors(g, vertex))
}
And then, given that I have a dataframe with each row corresponding to one vertex in the graph, I do this for all vertices with
data$pct_neighbors_1 <- sapply(1:nrow(data),
                               pct_neighbors_1,
                               g = g, c = c,
                               assignments = assignments)
Is there somewhere in here that I can make things more efficient? Thanks!
This should be faster:
library(igraph)
# for reproducibility's sake
set.seed(1234)
# create a random 1000 vertices graph
nverts <- 1000
g <- igraph::random.graph.game(nverts,0.1,type='gnp',directed=FALSE)
# clustering
c <- cluster_louvain(g)
# assignments
nc <- length(c)
assignments <- rbinom(nc, 1, .5)
# precalculate if a vertex belongs to the assigned communities
vertsInAssignments <- membership(c) %in% which(assignments==1)
# compute probabilities
probs <- sapply(1:vcount(g),FUN=function(i){
neigh <- neighbors(g,i)
sum(vertsInAssignments[neigh]) / length(neigh)
})
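If the per-vertex loop is still too slow, the same ratio can be written as a single matrix product: multiplying the adjacency matrix by the 0/1 group indicator counts, for every vertex, the neighbors that are in the group. This is a sketch on a tiny ring graph with a hand-picked indicator, both invented for illustration; for an undirected simple graph it should agree with the sapply version above:

```r
library(igraph)

# Toy graph: a ring of 6 vertices, each with exactly two neighbors
g_demo <- make_ring(6)

# Assumed indicator: TRUE if the vertex's community was assigned to group 1
in_group <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)

# Row i of A %*% in_group is the number of neighbors of i that are in the group
A <- as_adjacency_matrix(g_demo, sparse = FALSE)
probs_vec <- as.vector(A %*% as.numeric(in_group)) / degree(g_demo)
probs_vec
```

A dense matrix is used here only to keep the sketch simple; for a graph with thousands of vertices the default sparse matrix is likely the better choice. Isolated vertices give 0/0, i.e. NaN, just as in the loop version.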
