Solving Chinese Postman algorithm with eulerization - r

I'm would like to solve Chinese Postman problem in a graph where an eulerian cycle does not exist. So basically I'm looking for a path in a graph which visits every edge exactly once, and starts and ends at the same node. A graph will have an euler cycle if and only if every node has same number of edges entering into and going out of it. Obviously my graph doesn't .
I found out that Eulerization (making a graph Eulerian) could solve my question LINK. Can anyone suggest a script to add duplicate edges to a graph so that the resulting graph has no vertices of odd degree (and thus does have an Euler Circuit)?
Here is my example:
require(igraph)
require(graph)
require(eulerian)
require(GA)
g1 <- graph(c(1,2, 1,3, 2,4, 2,5, 1,5, 3,5, 4,7, 5,7, 5,8, 3,6, 6,8, 6,9, 9,11, 8,11, 8,10, 8,12, 7,10, 10,12, 11,12), directed = FALSE)
mat <- get.adjacency(g1)
mat <- as.matrix(mat)
rownames(mat) <- LETTERS[1:12]
colnames(mat) <- LETTERS[1:12]
g2 <- as(graphAM(adjMat=mat), "graphNEL")
hasEulerianCycle(g2)

Fun problem.
The graph you sugest in the code above, can be made to have duplicates that enable a eulerian cycle to be created. The function I provide below tries to add the minimum amount of duplicate edges, but also readily breaks the graph structure by adding new links if it has to.
You can run:
eulerian.g1 <- make.eulerian(g1)$graph
Check what the function did to your graph with:
make.eulerian(g1)$info
Bare in mind that:
This is not the only graph structure where duplicates added to the original g1 graph can form an eulerian cycle. Imagine for example my function looping the vertices of the graph backwards instead.
Your graph already has an uneven number of vertices with uneven degree, and all of the vertices that are, have neighbours with uneven degrees to pair them with. This function therefore works well four your particular example data.
The function could fail to produce a graph using only duplicates even in graphs where eulerian cycles are possible with correctly added duplicates. This is since it always goes for connecting a node with the first of its neighbours with uneven degree. If this is something that you'd absolutely like to get around, an MCMC-approach would be the way to go.
See also this excellent answer on probability calculation:
Here's my function in a full script that you can source out-of-the-box:
library(igraph)
# You asked about this graph
g1 <- graph(c(1,2, 1,3, 2,4, 2,5, 1,5, 3,5, 4,7, 5,7, 5,8, 3,6, 6,8, 6,9, 9,11, 8,11, 8,10, 8,12, 7,10, 10,12, 11,12), directed = FALSE)
# Make a CONNECTED random graph with at least n nodes
connected.erdos.renyi.game <- function(n,m){
graph <- erdos.renyi.game(n,m,"gnm",directed=FALSE)
graph <- delete_vertices(graph, (degree(graph) == 0))
}
# This is a random graph
g2 <- connected.erdos.renyi.game(n=12, m=16)
make.eulerian <- function(graph){
# Carl Hierholzer (1873) had explained how eulirian cycles exist for graphs that are
# 1) connected, and 2) contain only vertecies with even degrees. Based on this proof
# the posibility of an eulerian cycle existing in a graph can be tested by testing
# on these two conditions.
#
# This function assumes a connected graph.
# It adds edges to a graph to ensure that all nodes eventuall has an even numbered. It
# tries to maintain the structure of the graph by primarily adding duplicates of already
# existing edges, but can also add "structurally new" edges if the structure of the
# graph does not allow.
# save output
info <- c("broken" = FALSE, "Added" = 0, "Successfull" = TRUE)
# Is a number even
is.even <- function(x){ x %% 2 == 0 }
# Graphs with an even number of verticies with uneven degree will more easily converge
# as eulerian.
# Should we even out the number of unevenly degreed verticies?
search.for.even.neighbor <- !is.even(sum(!is.even(degree(graph))))
# Loop to add edges but never to change nodes that have been set to have even degree
for(i in V(graph)){
set.j <- NULL
#neighbors of i with uneven number of edges are good candidates for new edges
uneven.neighbors <- !is.even(degree(graph, neighbors(graph,i)))
if(!is.even(degree(graph,i))){
# This node needs a new connection. That edge e(i,j) needs an appropriate j:
if(sum(uneven.neighbors) == 0){
# There is no neighbor of i that has uneven degree. We will
# have to break the graph structure and connect nodes that
# were not connected before:
if(sum(!is.even(degree(graph))) > 0){
# Only break the structure if it's absolutely nessecary
# to force the graph into a structure where an euclidian
# cycle exists:
info["Broken"] <- TRUE
# Find candidates for j amongst any unevenly degreed nodes
uneven.candidates <- !is.even(degree(graph, V(graph)))
# Sugest a new edge between i and any node with uneven degree
if(sum(uneven.candidates) != 0){
set.j <- V(graph)[uneven.candidates][[1]]
}else{
# No candidate with uneven degree exists!
# If all edges except the last have even degrees, thith
# function will fail to make the graph eulerian:
info["Successfull"] <- FALSE
}
}
}else{
# A "structurally duplicated" edge may be formed between i one of
# the nodes of uneven degree that is already connected to it.
# Sugest a new edge between i and its first neighbor with uneven degree
set.j <- neighbors(graph, i)[uneven.neighbors][[1]]
}
}else if(search.for.even.neighbor == TRUE & is.null(set.j)){
# This only happens once (probably) in the beginning of the loop of
# treating graphs that have an uneven number of verticies with uneven
# degree. It creates a duplicate between a node and one of its evenly
# degreed neighbors (if possible)
info["Added"] <- info["Added"] + 1
set.j <- neighbors(graph, i)[ !uneven.neighbors ][[1]]
# Never do this again if a j is correctly set
if(!is.null(set.j)){search.for.even.neighbor <- FALSE}
}
# Add that a new edge to alter degrees in the desired direction
# OBS: as.numeric() since set.j might be NULL
if(!is.null(set.j)){
# i may not link to j
if(i != set.j){
graph <- add_edges(graph, edges=c(i, set.j))
info["Added"] <- info["Added"] + 1
}
}
}
# return the graph
(list("graph" = graph, "info" = info))
}
# Look at what we did
eulerian <- make.eulerian(g1)
eulerian$info
g <- eulerian$graph
par(mfrow=c(1,2))
plot(g1)
plot(g)

Related

How do I see the two nodes of most weighted egdes in R igraph?

I have an igraph network object constructed in R and generated weight information for each edge. I want to see the nodes of the most weighted edges (descending). What codes should I use to do that? Thank you!
# create an igraph project of user interaction network and check descriptives.
library(igraph)
#edge list
EL = read.csv("(file path omitted)user_interaction_structure.csv")
head(EL)
#node list: I do not have a node list
#construct an igraph oject
g <- graph_from_data_frame(EL, directed = TRUE, vertices = NULL)
#check the edge and node number of the network
gsize(g)
vcount(g)
#check nodes based on degree (descending)
deg <- igraph::degree(g)
dSorted <-sort.int(deg,decreasing=TRUE,index.return=FALSE)
dSorted
#check edges based on weight
E(g)
#the network will contain loop edges and multiple edges
#simplify multiple edges
g_simple <- graph.adjacency(get.adjacency(g),weighted=TRUE)
#check edge weight
E(g_simple)$weight
#igraph can generate a matrix
g_simple[]
Then I wanted to see who were interacting heavily with whom (the nodes of the edges with the largest weight),so I tried
e_top_weights <- order(order(E(g_simple))$weight, decreasing=TRUE)
but it did not work.
I think what you want is the igraph function strength(), which gives the sum of the weights of the edges incident to each node. Here's an example:
library(igraph)
# A small graph we can visualize
g <- make_ring(5)
# Assign each edge an increasing weight, to make things
# easy
edgeweights<- 1:ecount(g)
E(g)$weight <- edgeweights
# The strength() function sums the weights of edges incident
# to each node
strengths <- strength(g)
# We can collect the top two strengths by sorting the
# strengths vector, then asking for which elements of the
# strengths vector are equal to or greater than the second
# largest element.
toptwo <- which(strengths >= sort(strengths, decreasing = TRUE)[2])
## [1] 4 5
# Assign nodes a color blue that is more saturated when nodes
# have greater strength.
cr <- colorRamp(c(rgb(0,0,1,.1), rgb(0,0,1,1)), alpha = TRUE)
colors <- cr(strengths/max(strengths))
V(g)$color <- apply(colors, 1, function(row) rgb(row[1], row[2], row[3], row[4], maxColorValue = 255))
# Plot to confirm
plot(g, edge.width = edgeweights)
Edit
Here are two different ways to find the two nodes (the "from" node and the "to" node) which are the ends of the edge with the maximum weight:
## 1
edge_df <- as_data_frame(g, "edges")
edge_df[which(edge_df$weight == max(edge_df$weight)), c("from", "to")]
## 2
max_weight_edge <- E(g)[which(E(g)$weight == max(E(g)$weight))]
ends(g, es = max_weight_edge)

Add iteratively one edge in a graph generated by a k regular game maintaining past connections

I have generated an undirected regular graph with an even number of nodes with the same degree, e.g. k, by using the function k.regular.game of the R package igraph.
Now I need to iteratively add one edge to each node, so that in each iteration the degree remains constant for every node and it is equal to k + i, where i is the number of iterations performed.
In addition, I want connections to be preserved in each iteration, that is: the set of neighbors of agent j for iteration i should be the same of the set of neighbors of agent j for iteration i + 1 except for one connection: e.g., if j is connected to w and y when k = 2, j must be connected to w, y and z when k = 3.
My final goal is to obtain (n-1) graphs, where n is equal to the number of nodes in the regular graph. As a result, I will obtain that the first generated graph has k = 1 and the last generated graph has k = (n-1).
Any suggestion on how to do this?
This is a nice network problem solved with two partial solutions below.
Let's imagine there is a function which would bring a graph g from all degrees being 1 to all degrees being 2. It would have to be a graph with an even number of nodes.
increment.k <- function(g){}
It follows that increment.k will increase the degree of each node by one by adding |V|/2 edges to it - one edge for each two nodes in the graph. From what I understand from your problem specification, any of those edges must not connect agin two nodes that are already connected. This makes increment.k() a puzzle in which a random edge between two nodes might close the possibility for all nodes to reach the new k-value of degrees. What if a graph has k=1 and we start adding edges at random only to arrive at the last edge only to find that the only two nodes still with degree 1 are already connected?!
I cannot intuitively grasp if this allows for the possibility of graphs that cannot be incremented since no combination of random edges allows for the creation of |V|/2 edges between previously unconnected nodes. But I can imagine that such graphs exist.
I've done this example on a graph with 20 nodes (which consequently can have a k between 1 and 19):
g <- k.regular.game(no.of.nodes=20, k=1, directed=F)
What if you were to generate random k.regular.games with a higher k until you found a graph where the edges of your graph is a subset of the edges of the higher-k random graph? It should be spectacularly slow.
The problem, of course, is that you don't want to allow for duplicated arches. If not, the solution would be quite simple:
increase.k.allowing.duplicates <- function(graph){
if(length(V(graph))%%2!=0){
stop("k can only be incremented for graphs with an even number of nodes.")
}
# Add random edges to the graph and allow dual edges just to increase k
graph %>% add_edges(as.numeric(sample(1:length(V(graph)), length(V(graph)))))
}
The above code would solve the problem if double arches were allowed. This would return graphs of ever higher k, and would let k go towards infinity since the number of nodes of the graph don't set any maximum average degree of the graph.
I have come up with this Montecarlo approach below. To increase k by one, a given number of edges is added one by one between nodes, but if the loop runs out of alternatives when placing arches between nodes that are 1) not connected and 2) not already incremented to the higher k/degree, the process of creating a new graph with a higher k starts over. The function has a maximum number of tries start over in maximum.tries.
increase.k <- function(graph, maximum.tries=200){
if(length(V(graph))%%2!=0){
stop("k can only be incremented for graphs with an even number of nodes.")
}
k <- mean(degree(graph))
if(k != round(k) ){
stop("Nodes in graph do not have the same degree")
}
if(k >= length(V(graph))-1 ) {
stop("This graph is complete")
}
# each node has the following available arches before starting the iteration:
#posisble.arches <- lapply(neighbors(graph,1), function(x) setdiff(V(graph), x[2:length(x)]))
# Here we must lay the puzzle. If we run into a one-way street with the edges we add, we'll have to start afresh
original.graph <- graph
for(it in 1:maximum.tries){
# We might need many tries to get the puzzle right by brute-forcing
# For each try we increment in a loop to avoid duplicate links
for(e_ij in 1:(length(V(graph))/2)){
# Note that while(mean(degree(graph)) < k + 1){} is a logical posibility, but less safe
# Add a new edge between two nodes of degree k. i is any such node and j is any such node not already connected to i
i <- sample(as.numeric(V(graph)[degree(graph)==k]), 1)
js <- as.numeric(V(graph)[degree(graph) == k * !V(graph) %in% c(as.numeric(neighbors(graph,i)), i)])
# Abandon this try if no node unconnected to i and with degree == k exists
if(length(js)==0){break}
j <- sample(c(js), 1); if(length(js)==1){j<-js}
graph <- graph %>% add_edges(c(i,j))
}
# Did we lay the puzzle to completion successfully crating a random graph with a higher k?
if(mean(degree(graph)) == k+1){
# Success
print(paste("Succeded at iteration ", it))
break
} else {
# Failure, let's try again
graph <- original.graph
print("Failed")
}
}
(graph)
}
# Compare the two approaches
g1 <- increase.k.allowing.duplicates(g)
g2 <- increase.k(g)
degree(g1) == degree(g2)
l <- layout_with_gem(g2)
par(mfrow=c(1,2))
plot(g1, layout=l, vertex.label="")
plot(g2,layout=l, vertex.label="")
dev.off()
# Note that increase.k() can be run incrementally up untill a complete graph:
is.complete <- function(graph){mean(degree(graph)) >= (length(V(graph))-1)}
while(!is.complete(g)){
print(mean(degree(g)))
g <- increase.k(g)
}
# and that increase.k() cannot increase k in already complete graphs.
g <- increase.k(g)
The above code has solved the problem for some graphs. More iterations are needed to lay the puzzle the larger the graph is. In this example with only 20 nodes, each k-level can be generated from 1-19 relatively quickly. I did manage to get 19 separate networks from k=1 to k=19. But I have managed to get stuck in the loop also, which I take as evidence for the existing network structures of which k cannot be successfully incremented. Particularly since the same starting specification can get stuck sometimes, but manage to arrive at a complete graph on other occasions.
To test the function, I set the maximum.tries to 25 and tried to go from k=1 to 19 100 times. It never worked. The higher the k, the more difficult it is to lay the puzzle and find arches that fit, even though the next-to-last iteration is faster before a collapse. The risk of hitting the cap of 25 increased between the 15th and 18th iteration, and most graphs only made it to k=17.
It is possible to imagine this method being performed backwards starting at a complete graph, removing edges within a Montecarlo process which tries to remove edges to achieve a graph with all degrees at k-1. It should run into similar problems, though.
The code above is really an attempt to brute-force this problem without going into the underlying mathematics of graphs of this type. I am not a mathematician and lack the skills, but maybe the creation of a fail-safe k.increment()-function is a real and unsolved mathematical problem. If any graph-theoreticians come by this post, please enlighten us.

Why igraph::cluster_walktrap gives a different result for non directed isomorphic graphs?

I'm trying to use igraph::cluster_walktrap in R to look for communities inside of a graph, however I noticed a weird behaviour (or at least, a behaviour I am not able to explain).
Suppose you are given an undirected graph by defining a list of its edges. Say
a,b
c,d
e,f
...
Then, if I define another graph by swapping randomly selected vertices in the edge list definition:
a,b
d,c
e,f
...
I expect the two graphs to be isomorphic and the difference between the two graph to be empty. This is exactly what happens in R in my toy example. Following this line of reasoning, calling cluster_walktrap on the two graphs (using set.seed appropriately) should yield the same result since the two graphs are the same. This is not happening and the only explanation I can give is that the starting point of each random walk is not the same for the two graphs. Why is this?
You can follow my reasoning in the toy example below. I don't understand why the last two objects are not identical.
require(igraph)
# Number of vertices
verteces <- 50
# Swap randomly some elements in the edges definition
set.seed(20)
row_swapped <- sample(1:verteces,25,replace=F)
m_values <- sample(letters, verteces*2, replace=T) #1:100
# Build edge lists
m1 <- matrix(m_values, verteces, 2)
m1
a <- m1
colS <- seq(round(ncol(m1)*0.3))
m1[row_swapped, 2:1] <- m1[row_swapped, 1:2]
m1
b <- m1
# Define the two graphs
ag <- igraph::graph_from_edgelist(a, directed = F)
bg <- igraph::graph_from_edgelist(b, directed = F)
# Another way of building an isomorphic graph for testing
#bg <- permute(ag, sample(vcount(ag)))
# Should be empty: ok
difference(ag, bg)
# Should be TRUE: ok
isomorphic(ag,bg)
# I expect it to be TRUE but it isn't...
identical(ag, bg)
# Vertices
V(ag)
ag
V(bg)
bg
# Calculate community
set.seed(100)
ac1 <- cluster_walktrap(ag)
set.seed(100)
bc1 <- cluster_walktrap(bg)
# I expect all to be TRUE, however
# merges is different
# membership is different
# names are different
identical(ac1$merges, bc1$merges)
identical(ac1$modularity, bc1$modularity)
identical(ac1$membership, bc1$membership)
identical(ac1$names, bc1$names)
identical(ac1$vcount, bc1$vcount)
identical(ac1$algorithm, bc1$algorithm)
The results are not different. You have two things going on which is making your graphs not identical but isoporphic. I emphasize identical because it has a very strict definition.
1) identical(ag, bg) is not identical because the vertices and edges are not in the same order between the two graphs. Exactly, the same nodes and edges exist but they are not in the exact same place or orientation. For, example if I shuffle the rows of a and make a new graph...
a1 <- a[sample(1:nrow(a)), ]
a1g <- igraph::graph_from_edgelist(a1, directed = F)
identical(ag, a1g)
#[1] FALSE
2) This goes for edges as well. An edge is stored as node1, node2 and a flag if the edge is directed or not. so when you swap rows the representation at the "byte level" (I use this term loosely) is different even though the relationship is the same. Edge 44 represents the same relationship but is stored based on how it was constructed.
E(ag)[44]
# + 1/50 edge from 6318240 (vertex names):
# [1] q--d
E(bg)[44]
# + 1/50 edge from 38042e0 (vertex names):
# [1] d--q
So onto your cluster_walktrap, first, the function returns the index of the vertices, not the name which can be misleading. Which means the reason the objects aren't identical is because ag and bg have different ordering of nodes in the object.
If I reorder the membership by node name the two become identical.
identical(membership(bc1)[order(names(membership(bc1)))], membership(ac1)[order(names(membership(ac1)))])
#[1] TRUE

Clustering based on connectivity of points

I have 1 million records of lat long [5 digits precision] and Route. I want to cluster those data points.
I dont want to use standard k-means clustering as I am not sure how many clsuters [tried Elbow method but not convinced].
Here is my Logic -
1) I want to reduce width of lat long from 5 digits to 3 digits.
2) Now lat longs which are in range of +/- 0.001 are to be clustered in once cluster. Calculate centroid of cluster.
But in doing so I am unable to find good algorithm and R Script to execute my thought code.
Can any one please help me in above problem.
Thanks,
Clustering can be done based on connected components.
All points that are in +/-0.001 distance to each other can be connected so we will have a graph that contains subgraphs that each may be a single poin or a series of connected points(connected components)
then connected components can be found and their centeroid can be calculated.
Two packages required for this task :
1.deldir to form triangulation of points and specify which points are adaject to each other and to calculate distances between them.
2 igraph to find connected components.
library(deldir)
library(igraph)
coords <- data.frame(lat = runif(1000000),long=runif(1000000))
#round to 3 digits
coords.r <- round(coords,3)
#remove duplicates
coords.u <- unique(coords.r)
# create triangulation of points. depends on the data may take a while an consume more memory
triangulation <- deldir(coords.u$long,coords.u$lat)
#compute distance between adjacent points
distances <- abs(triangulation$delsgs$x1 - triangulation$delsgs$x2) +
abs(triangulation$delsgs$y1 - triangulation$delsgs$y2)
#remove edges that are greater than .001
edge.list <- as.matrix(triangulation$delsgs[distances < .0011,5:6])
if (length(edge.list) == 0) { #there is no edge that its lenght is less than .0011
coords.clustered <- coords.u
} else { # find connected components
#reformat list of edges so that if the list is
# 9 5
# 5 7
#so reformatted to
# 3 1
# 1 2
sorted <- sort(c(edge.list), index.return = TRUE)
run.length <- rle(sorted$x)
indices <- rep(1:length(run.length$lengths),times=run.length$lengths)
edge.list.reformatted <- edge.list
edge.list.reformatted[sorted$ix] <- indices
#create graph from list of edges
graph.struct <- graph_from_edgelist(edge.list.reformatted, directed = FALSE)
# cluster based on connected components
clust <- components(graph.struct)
#computation of centroids
coords.connected <- coords.u[run.length$values, ]
centroids <- data.frame(lat = tapply(coords.connected$lat,factor(clust$membership),mean) ,
long = tapply(coords.connected$long,factor(clust$membership),mean))
#combine clustered points with unclustered points
coords.clustered <- rbind(coords.u[-run.length$values,], centroids)
# round the data and remove possible duplicates
coords.clustered <- round(coords.clustered, 3)
coords.clustered <- unique(coords.clustered)
}

Extracting the outer-boundary of a set of grid points with some missing grids (holes)

This question is about generalization of this question. The mentioned question is working well for the point set with no hole. In the present question I want to get the perimeter (outer boundary) of a subset of a near-regular grid of points where some of the grid point with in the polygon are missing (i.e., polygon with hole).
The sample data set on grids is available here.
I used the R-code as suggested as an answer in the above mentioned question (with no holes).
The following is the output of using those codes:
Now I want it ignore holes in inside the point set and want to consider the outer boundary of the point set as the required polygon.
Any suggestion!! Thanks.
This slight variant on my previous code works if there's holes by finding all the loops and taking only the one with the largest X coordinate, which must be the outside loop. Unless the loops touch... Strictly perhaps it should take the loop with the largest area... Note also the need to use X and Y in one of the functions because of a bug (I've reported) in the igraph package.
perimeterGrid <- function(pts, maxdist=6000, mindist=1){
g = edgeP(makegrid(pts, maxdist=maxdist, mindist=mindist))
## there might be holes. Find the loop with the largest X coordinate.
parts = components(g)
outer = which.max(tapply(V(g)$x, parts$membership, function(x){max(x)}))
g = induced_subgraph(g, which(parts$membership==outer))
loop = graph.dfs(minimum.spanning.tree(g),1)$order
cbind(V(g)$x, V(g)$y)[loop,]
}
# haversine distance matrix
dmat <- function(pts){
n=nrow(pts)
do.call(rbind,lapply(1:n,function(i){distHaversine(pts[i,],pts)}))
}
# make the grid cells given a maxdist (and a mindist to stop self-self edges)
makegrid <- function(pts, maxdist=6000, mindist=1){
dists = dmat(pts)
g = graph.adjacency(dists<maxdist & dists>mindist,
mode="undirected")
## use X and Y here not x and y due to igraph bug
## these get copied to the output later...
V(g)$X=pts[,1]
V(g)$Y=pts[,2]
g
}
# clever function that does the grid edge count etc
edgeP <- function(g){
# find all the simple squares
square=graph.ring(4)
subs = graph.get.subisomorphisms.vf2(g,square)
# expand all the edges
subs = do.call(rbind, lapply(subs, function(s){
rbind(s[1:2], s[2:3], s[3:4], s[c(4,1)])
}))
# make new graph of the edges of all the squares
e = graph.edgelist(subs,directed=FALSE)
# add the weight as the edge count
E(e)$weight=count.multiple(e)
# copy the coords from the source back
V(e)$x=V(g)$X
V(e)$y=V(g)$Y
# remove multiple edges
e=simplify(e)
# internal edges now have weight 256.
e = e - edges(which(E(e)$weight==256))
# internal nodes how have degree 0
e = e - vertices(degree(e)==0)
return(e)
}

Resources