How to calculate the edge attributes as the path length in igraph? - r

Pretend the dataframe below is an edgelist (relation between inst2 and motherinst2), and that km is an attribute I want to calculate as a path that's been assigned to the edges. I'm too new at coding to make a reproducible edge list.
inst2 = c(2, 3, 4, 5, 6)
motherinst2 = c(7, 8, 9, 10, 11)
km = c(20, 30, 40, 25, 60)
df2 = data.frame(inst2, motherinst2)
edgelist = cbind(df2, km)
g = graph_from_data_frame(edgelist)
I know how to calculate the path length of vertices in a graph, but I have some attributes attached to the edges that I want to sum up as path lengths. They are simple attributes (distance in km, time in days, and speed as km/day).
This is how I was calculating the path of vertices (between roots and terminals/leaves):
roots = which(sapply(sapply(V(g),
function(x) neighbors(g, x, mode = 'in')), length) == 0)
#slight tweaking this piece of code will also calculate 'terminal' nodes (or leaves). (11):
terminals = which(sapply(sapply(V(g),
function(x) neighbors(g, x, mode = 'out')), length) == 0)
paths= lapply(roots, function(x) get.all.shortest.paths(g, from = x, to = terminals, mode = "out")$res)
named_paths= lapply(unlist(paths, recursive=FALSE), function(x) V(g)[x])
I just want to do essentially exactly as I did above, but summing up the distance, time, and rate (which I will compute the mean of) incurred between each of those paths. If it helps to know how the edges have been added as attributes, I've used cbind like so:
edgelist_df = cbind(edgelist_df, time, dist, speed)
and my graph object (g) is set up like this:
g <- graph_from_data_frame(edgelist_df, vertices = vattrib_df)
vattrib_df is the attributes of the vertices, which is not of interest to us here.

Related

Shortest Paths based on edge attribute with igraph

I'm trying to get the shortest paths of a graph but based on its edge ids.
So having the following graph:
library(igraph)
set.seed(45)
g <- erdos.renyi.game(25, 1/10, directed = TRUE)
E(g)$id <- sample(1:3, length(E(g)), replace = TRUE)
The shortest_paths(g, 1, V(g)) function finds all the shortest paths from node 1 to all the other nodes. However, I would like to calculate this, not just by following the geodesic distance, but a mix between the geodesic distance, and the minimum of edge id changes.
For example if this would be a train network, and the edge ids would represent trains. I would like to calculate how to get from node A to all the other nodes using the shortest path, but while changing the least amount of time of trains.
OK I think I have a working solution, although the code is a little ugly. The basic algorithm (lets call it gs(i, j)) goes like this: If we want to find the shortest train journey from i to j (gs(i, j)) we:
find the shortest path from i to j considering all trains. if this path is length 0 or 1 return it (there is either no path or a path on 1 train)
split the graph up by 'trains' (subset graph by edges) so as to consider each train network separately, and find the shortest path between i and j in each individual train network
if a single train will get you from i to j, return the train route with the fewest stops between i and j, else
if no single train runs from i to j then call gs(i, j-1) where (j-1) is the stop before j in the shortest path between i and j on the full network.
So basically, we look to see if a single train can do it, and if it can't we call the function recursively looking if a single train can get you to the stop before the last stop, etc. etc.
library(igraph)
# First your data
set.seed(45)
g <- erdos.renyi.game(25, 1/10, directed = TRUE)
E(g)$id <- sample(1:3, length(E(g)), replace = TRUE)
plot(g, edge.color = E(g)$id)
# The function takes as arguments the graph, and the id of the vertex
# you want to go from/to. It should work for a vector of
# destinations but I have not rigorously tested it so proceed with
# caution!
get.shortest.routes <- function(g, from, to){
train.routes <- lapply(unique(E(g)$id), function(id){subgraph.edges(g, eids = which(E(g)$id==id), delete.vertices = F)})
target.sp <- shortest_paths(g, from = from, to = to, output = 'vpath')$vpath
single.train.paths <- lapply(train.routes, function(gs){shortest_paths(gs, from = from, to = to, output = 'vpath')$vpath})
for (i in length(target.sp)){
if (length(target.sp[[i]]>1)) {
cands <- lapply(single.train.paths, function(l){l[[i]]})
if (sum(unlist(lapply(cands, length)))!=0) {
cands <- cands[lapply(cands, length)!=0]
cands <- cands[lapply(cands, length)==min(unlist(lapply(cands, length)))]
target.sp[[i]] <- cands[[1]]
} else {
target.sp[[i]] <- c(get.shortest.routes(g, from = as.numeric(target.sp[[i]][1]),
to = as.numeric(target.sp[[i]][(length(target.sp[[i]]) - 1)]))[[1]],
get.shortest.routes(g, from = as.numeric(target.sp[[i]][(length(target.sp[[i]]) - 1)]),
to = as.numeric(target.sp[[i]][length(target.sp[[i]])]))[[1]][-1])
}
}
}
target.sp
}
OK now lets run some tests. If you squint at the graph above you can see that the path from vertex 5 to vertex 21 is length-2 if you take two trains, but that you can get there on 1 train if you pass through an extra station. Our new function should return the longer path:
shortest_paths(g, 5, 21)$vpath
#> [[1]]
#> + 3/25 vertices, from b014eb9:
#> [1] 5 13 21
get.shortest.routes(g, 5, 21)
#> Warning in shortest_paths(gs, from = from, to = to, output = "vpath"): At
#> structural_properties.c:745 :Couldn't reach some vertices
#> Warning in shortest_paths(gs, from = from, to = to, output = "vpath"): At
#> structural_properties.c:745 :Couldn't reach some vertices
#> [[1]]
#> + 4/25 vertices, from c22246c:
#> [1] 5 13 15 21
Lets make a really easy graph where we are sure what we want to see: here we should get 1-2-4-5 instead of 1-3-5:
df <- data.frame(from = c(1, 1, 2, 3, 4), to = c(2, 3, 4, 5, 5))
g1 <- graph_from_data_frame(df)
E(g1)$id <- c(1, 2, 1, 3, 1)
plot(g1, edge.color = E(g1)$id)
get.shortest.routes(g1, 1, 5)
#> Warning in shortest_paths(gs, from = from, to = to, output = "vpath"): At
#> structural_properties.c:745 :Couldn't reach some vertices
#> Warning in shortest_paths(gs, from = from, to = to, output = "vpath"): At
#> structural_properties.c:745 :Couldn't reach some vertices
#> [[1]]
#> + 4/5 vertices, named, from c406649:
#> [1] 1 2 4 5
I'm sure there is a more rigorous solution, and you'll probably want to optimize the code a bit. For instance, I just realized that I don't stop the function immediately if the shortest path on the full graph has only two nodes -- doing so would avoid some needless computations! This was a fun problem, I hope some other answers gets posted.
Created on 2018-05-11 by the reprex package (v0.2.0).
Here is my take on the problem. A few notes:
1) all_simple_paths will not scale well with large or highly connected graphs
2) I favored fewest changes above all else, which means a path with two changes and a dist of 40 will beat a path with three changes and a dist of 3.
4) I can imagine an even faster approach if # of changes and distance change priority if there is no path on one id
library(igraph)
# First your data
set.seed(45)
g <- erdos.renyi.game(25, 1/10, directed = TRUE)
E(g)$id <- sample(1:3, length(E(g)), replace = TRUE)
plot(g, edge.color = E(g)$id)
##Option 1:
rst <- all_simple_paths(g, from = 1, to = 18, mode = "out")
rst <- lapply(rst, as_ids)
rst1 <- lapply(rst, function(x) c(x[1], rep(x[2:(length(x)-1)],
each=2), x[length(x)]))
rst2 <- lapply(rst1, function(x) data.frame(eid = get.edge.ids(graph=g, vp = x),
train=E(g)$id[get.edge.ids(graph=g, vp = x)]))
rst3 <- data.frame(pathID=seq_along(rst),
changes=sapply(rst2, function(x) length(rle(x$train)$lengths)),
dist=sapply(rst2, nrow))
spath <- rst3[order(rst3$changes, rst3$dist), ][1,1]
#Vertex IDs
rst[[spath]]
#[1] 1 23 8 18
plot(g, edge.color = E(g)$id, vertex.color=ifelse(V(g) %in% rst[[spath]], "firebrick", "gray80"),
edge.arrow.size=0.5)

Labels on only root and terminal vertices in igraph (R)?

inst2 = c(2, 3, 4, 5, 6)
motherinst2 = c(7, 8, 2, 10, 11)
km = c(20, 30, 40, 25, 60)
df2 = data.frame(inst2, motherinst2)
df2 = cbind(df2, km)
g2 = graph_from_data_frame(df2)
tkplot(g2)
how would I approach adding labels to exclusively my root and terminal vertices in a graph? I know it would involve this function, but how would you set it up? Assuming the graph object is just called 'g', or something obvious.
vertex.label =
The solution from #eipi1o is good, but the OP says "I'm finding it difficult to apply to my large data set effectively." I suspect that the issue is finding which are the intermediate nodes whose name should be blanked out. I will continue the example of #eipi10. Since my answer is based on his, if you upvote my answer, please upvote his as well.
You can use the neighbors function to determine which points are sources and sinks. Everything else is an intermediate node.
## original graph from eipi10
g = graph_from_edgelist(cbind(c(rep(1,10),2:11), c(2:21)))
## Identify which nodes are intermediate
SOURCES = which(sapply(V(g), function(x) length(neighbors(g, x, mode="in"))) == 0)
SINKS = which(sapply(V(g), function(x) length(neighbors(g, x, mode="out"))) == 0)
INTERMED = setdiff(V(g), c(SINKS, SOURCES))
## Fix up the node names and plot
V(g)$name = V(g)
V(g)$name[INTERMED] = ""
plot(g)
Using your example graph, we'll identify the root and terminal vertices and remove the labels for other vertices. Here's what the initial graph looks like:
set.seed(2)
plot(g2)
Now let's identify and remove the names of the intermediate vertices
# Get all edges
e = get.edgelist(g2)
# Root vertices are in first column but not in second column
root = setdiff(e[,1],e[,2])
# Terminal vertices are in second column but not in first column
terminal = setdiff(e[,2], e[,1])
# Vertices to remove are not in root or terminal vertices
remove = setdiff(unique(c(e)), c(root, terminal))
# Remove names of intermediate vertices
V(g2)$name[V(g2)$name %in% remove] = ""
set.seed(2)
plot(g2)
Original Answer
You can use set.vertex.attribute to change the label names. Here's an example:
library(igraph)
# Create a graph to work with
g = graph_from_edgelist(cbind(c(rep(1,10),2:11), c(2:21)))
plot(g)
Now we can remove the labels from the intermediate vertices:
g = set.vertex.attribute(g, "name", value=c(1,rep("", length(2:11)),12:21))
plot(g)

R - computer vision image matching - nearest neighbour for SURF descriptor

I am using https://github.com/bnosac/image
and the image.dlib package for image matching.
I am trying to match two objects for similarity using SURF.. Also successfully got feature vector.
But in the nearest neighbour comparison (rflann package), I get varying results.
Do I sum up the abs distances and compare?
Also I am getting varying number of rows of surf points for each image (64 columns are constant)
Attached are reference and candidate images.
Reference should match with c1 ideally. Here's my try.
library(image.dlib)
f <- "D:\\<path to my ref image>\\ref.bmp"
f1 <- "D:\\<path to my candidate images>\\c1.bmp"
f2 <- "D:\\<path to my candidate images>\\c2.bmp"
f3 <-"D:\\<path to my candidate images>\\c3.bmp"
surf_blobs <- image_surf(f, max_points = 1000, detection_threshold = 30)
surf_blobs1 <- image_surf(f1, max_points = 1000, detection_threshold = 30)
surf_blobs2 <- image_surf(f2, max_points = 1000, detection_threshold = 30)
surf_blobs3 <- image_surf(f3, max_points = 1000, detection_threshold = 30)
library(rflann)
kn1 <- rflann::Neighbour(as.matrix(surf_blobs1$surf),as.matrix(surf_blobs$surf),k = 10, build = "kmeans")
s1 <- sum(abs(kn1$distances))
kn2 <- rflann::Neighbour(as.matrix(surf_blobs2$surf),as.matrix(surf_blobs$surf),k = 10, build = "kmeans")
s2 <- sum(abs(kn2$distances))
kn3 <- rflann::Neighbour(as.matrix(surf_blobs3$surf),as.matrix(surf_blobs$surf),k = 10, build = "kmeans")
s3 <- sum(abs(kn3$distances))
ref image
Candidate images
The links are given below
Reference image
https://i.imgur.com/jvHcA9c.png
Candidate images
Candidate image 1: https://image.ibb.co/jL6xNw/c1.png
Candidate image 2: https://image.ibb.co/khhF9b/c2.png
Candidate image 3: https://image.ibb.co/bNTYvG/c3.png
Additional candidates to test robustness:
https://image.ibb.co/mHqyfG/c2.jpg
https://image.ibb.co/de0W0G/c3.jpg

Extracting simplified graph in igraph

I have a network that looks like this
library(igraph)
library(igraphdata)
data("kite")
plot(kite)
I run a community detection and the result looks like this
community <- cluster_fast_greedy(kite)
plot(community,kite)
Now I want to extract a network based on the communities. The edge weight should be the number of ties between communities (how strong are communities connected to each other), the vertex attribute should be the number of nodes in the community (called numnodes).
d <- data.frame(E=c(1, 2, 3),
A=c(2, 3, 1))
g2 <- graph_from_data_frame(d, directed = F)
E(g2)$weight <- c(5, 1, 1)
V(g2)$numnodes <- c(4,3,3)
plot.igraph(g2,vertex.label=V(g2)$name, edge.color="black",edge.width=E(g2)$weight,vertex.size=V(g2)$numnodes)
The graph should look like this
One node is larger than the others, one edge has a lot of weight in comparison to the others.
As far as I know, igraph doesn't have method to count edges connecting groups of vertices. Therefore to count the edges connecting communities you need to iterate over each pairs of communities. To count the members for each community, you can use the sizes method.
library(igraph)
library(igraphdata)
data("kite")
plot(kite)
community <- cluster_fast_greedy(kite)
plot(community,kite)
cedges <- NULL
for(i in seq(1,max(community$membership) - 1)){
for(j in seq(i + 1, max(community$membership))){
imembers <- which(community$membership == i)
jmembers <- which(community$membership == j)
weight <- sum(
mapply(function(v1) mapply(
function(v2) are.connected(kite, v1, v2),
jmembers),
imembers)
)
cedges <- rbind(cedges, c(i, j, weight))
}
}
cedges <- as.data.frame(cedges)
names(cedges)[3] <- 'weight'
cgraph <- graph_from_data_frame(cedges, directed = FALSE)
V(cgraph)$numnodes <- sizes(community)
plot.igraph(cgraph,
vertex.label = V(cgraph)$name,
edge.color = "black",
edge.width = E(cgraph)$weight,
vertex.size = V(cgraph)$numnodes)

Define Edge labels based on matching vertex type in R

I have a graph net with two different types (1 and 2) of vertices, appearing n1 and n2 times, respectively:
net %v% "type" <- c(rep("1", n1), rep("2", n2))
We have some edges which were generated randomly with probabilities ps and pd, where ps is the edge probability with a same type (1-1 or 2-2) and pd with a different type (1-2).
I would like to plot this graph such that the edges between same types (i.e. 1-1 or 2-2) have a different color than edges between different types (1-2).
How do I do this?
I tried playing around with the %e% operator of the network package, but I'm confused about how to grab the type of the end node of each edge.
Thank you!
Do you want that?
from <- sample(1:2, 10, replace = T)
to <- sample(1:2, 10, replace = T)
node <- cbind(from, to)
library(igraph)
net <- graph_from_edgelist(node, directed = F)
edge_color <- function(from_to){
from_node <- from_to[1]
to_node <- from_to[2]
ifelse(from_node == to_node, return("red"), return("blue"))
}
color<- apply(node, 1, edge_color)
plot(net, edge.color=color)

Resources