How to iterate when the number of edges and nodes changes dynamically? - r

I have a random graph with 10 nodes, 4 of which have degree zero.
The goal is to obtain a connected graph by: 1) selecting the zero-degree node and the edge with the minimal feature value (for example, a random number from a uniform distribution, computed for each node-edge pair), then connecting the node to the graph by creating two edges from the node to the endpoints of that edge and deleting the edge itself; 2) repeating step 1 for all zero-degree nodes.
The original graph is on the left, the resulting one on the right.
My attempt is:
library(igraph)
######################################################################
set.seed(5)
g <- sample_gnm(10, 4)
xy <- cbind(runif(10), runif(10))
par(mfrow=c(1,2))
plot(g, vertex.size=5, layout=xy)
num_point <- length(V(g)[degree(g)==0])
for(k in 1:num_point){
points = V(g)[degree(g)==0]
for(i in 1:length(E(g))) { # loop over all edges
head <- get.edgelist(g)[i,][1]; h <- c(V(g)[head]$x, V(g)[head]$y)
tail <- get.edgelist(g)[i,][2]; t <- c(V(g)[tail]$x, V(g)[tail]$y)
d <- NULL
# loop over all points
for(j in points) d <- c(d, runif(1))
E(g)[i]$d <- min(d) # local min
E(g)[i]$p <- points[which(d == min(d))]
} # i
ei = which.min(E(g)$d) # edge with the global min
vi = E(g)[ei]$p
# head and tail of edge with global min
head <- get.edgelist(g)[E(g)[ei],][1]; tail <- get.edgelist(g)[E(g)[ei],][2]
g <- add_edges(g, c(head, V(g)[vi],
V(g)[vi],
tail));
g <- delete_edges(g, get.edge.ids(g, c(head, tail) ))
}
plot(g, vertex.size=5, layout=xy)
Question. How should I organize the loop over all edges when the number of edges increases by 1 and the number of points decreases by 1 at every step? As one can see, I don't use the variable k explicitly.

Instead of a for loop, I think you can use repeat with a termination condition, i.e., stop as soon as there are no isolated vertices left:
repeat {
points <- V(g)[degree(g) == 0]
for (i in 1:length(E(g))) { # loop over all edges
head <- get.edgelist(g)[i, ][1]
h <- c(V(g)[head]$x, V(g)[head]$y)
tail <- get.edgelist(g)[i, ][2]
t <- c(V(g)[tail]$x, V(g)[tail]$y)
d <- NULL
# loop over all points
for (j in points) d <- c(d, runif(1))
E(g)[i]$d <- min(d) # local min
E(g)[i]$p <- points[which(d == min(d))]
} # i
ei <- which.min(E(g)$d) # edge with the global min
vi <- E(g)[ei]$p
# head and tail of edge with global min
head <- get.edgelist(g)[E(g)[ei], ][1]
tail <- get.edgelist(g)[E(g)[ei], ][2]
g <- add_edges(g, c(
head, V(g)[vi],
V(g)[vi],
tail
))
g <- delete_edges(g, get.edge.ids(g, c(head, tail)))
if (sum(degree(g) == 0) == 0) {
break
}
}
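The same idea can be written more compactly with a while loop, since the number of edges and isolated vertices is simply recomputed on every pass. The following is a minimal sketch, not the asker's exact procedure: it assumes the "feature" is one uniform random number per (edge, isolated vertex) pair, and the names d, best and n_start are introduced here for illustration only.

```r
library(igraph)

set.seed(5)
g <- sample_gnm(10, 4)
n_start <- sum(degree(g) == 0)   # isolated vertices at the start

# Keep splicing isolated vertices into existing edges until none remain.
while (any(degree(g) == 0) && ecount(g) > 0) {
  isolated <- which(degree(g) == 0)
  # Assumed feature: one random number per (edge, isolated vertex) pair
  d <- matrix(runif(ecount(g) * length(isolated)), nrow = ecount(g))
  best <- which(d == min(d), arr.ind = TRUE)[1, ]   # global minimum
  ei <- best[["row"]]               # id of the edge to split
  vi <- isolated[best[["col"]]]     # isolated vertex to insert
  ep <- as.integer(ends(g, ei, names = FALSE))
  g <- delete_edges(g, ei)
  g <- add_edges(g, c(ep[1], vi, vi, ep[2]))
}
```

Each pass removes one edge and adds two, so the edge count grows by one while the number of isolated vertices shrinks by one; no explicit counter such as k is needed. Note that this only removes isolated vertices; if the starting graph has several non-trivial components, they stay separate.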

I would recommend using recursion for this and dropping the for loop; recursion usually makes tree and graph structures much easier to work with.
Answer:
Maintain a stack of all the leaf nodes.
On every iteration, empty the stack by matching the leaf node values.
If there is a new value and the stack count differs from the old count, iterate again.

How to define the mapping for a vertex contraction?

I have a graph $G=(V, E)$. I need to perform a vertex contraction according to the following rule:
Find all articulation vertices with degree greater than 2 and contract into each of them all vertices that can only be reached through it.
The original graph is on the left, the expected graph on the right. Note that instead of the leaves "14", "8" and "3" there can be subgraphs with more than one vertex.
First, I found the bridge edges (red) and two types of articulation vertices: a) on a chain (red), b) not on a chain (green). The criterion for classifying an articulation vertex is its degree (2 or not 2).
My attempt is:
library(igraph)
set.seed(44)
n = 20
m = 35
G <- sample_gnm(n=n, m=m)
V(G)$group <- 1:n
V(G)$color <- "black"
E(G)$color <- "black"
ind <- articulation.points(G)
V(G)$color[ind] <- ifelse(degree(G, V(G)[ind])==2, "red", "green")
V(G)$group[as.integer(ind)[degree(G, ind) == 2]] <- 0 # if() cannot take a vector condition, so index instead
num_comp <- length(decompose.graph(G))
for (i in 1:m) {
G_sub <- delete.edges(G, i)
if (length(decompose.graph(G_sub)) > num_comp) E(G)$color[i] <- "red"
}
plot(G, layout = layout.fruchterman.reingold,
vertex.size = 15, vertex.color= V(G)$color,
vertex.label.color = "white" )
g2 <- contract(G, mapping = factor(V(G)$group),
vertex.attr.comb=toString)
plot(g2, layout = layout.fruchterman.reingold,
vertex.size = 15, vertex.color= V(G)$color,
vertex.label.color = "white" )
Question. How to define the mapping?
Edit. After ThomasIsCoding's answer, I am adding a figure for the remark above: instead of the leaves "14", "8" and "3" there can be subgraphs with more than one vertex. For instance, I can have the case:
set.seed(44)
n <- 20
m <- 35
G <- sample_gnm(n = n, m = m) %>%
add_vertices(1) %>%
add_vertices(1) %>%
add_edges(c(3,21, 3,22, 21,22))
plot(G)
In the figure below one can see five bridges. The degrees of vertices 8 and 14 are equal to one, but vertex 3 is not a leaf now.
My problem is how to distinguish the chain case from the non-chain case.
for (k in ind) {
nbs <- neighbors(G, k)
if (degree(G, k) == 2) # chain
V(G)$group <- replace(V(G)$group,
match(nbs[degree(G, nbs) == 1], V(G)), match(k, V(G)))
else # no chain
V(G)$group <- ...
}
Another weak point: to which subgraph (A or B) should the vertex contraction be applied? In the original case only single vertices were contracted. The original task comes from simplifying a big graph for later analysis, and I think I can base the simplification on bridges and cut vertices. But now I am thinking about how to select the subgraph for the vertex contraction. My current view: apply the vertex contraction to the subgraph with the minimal geodesic spanning tree.
You can try the code below to produce the mapping argument (see the for loop):
library(igraph)
set.seed(44)
n <- 20
m <- 35
G <- sample_gnm(n = n, m = m)
V(G)$group <- 1:n
ind <- articulation.points(G)
for (k in ind) {
nbs <- neighbors(G, k)
V(G)$group <- replace(V(G)$group, match(nbs[degree(G, nbs) == 1], V(G)), match(k, V(G)))
}
g2 <- contract(G, mapping = factor(V(G)$group))
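For the case raised in the edit, where a whole subgraph rather than a single leaf hangs off an articulation vertex, one possible way to build the mapping is to delete each articulation vertex in turn and look at the resulting components. The sketch below rests on an assumption that is not in the question: the largest remaining component is treated as the "main" graph, and every other component is contracted into the articulation vertex. The toy graph and the names group, rest and small are illustrative only.

```r
library(igraph)

# Toy graph: 4-cycle 1-2-3-4, a triangle 3-5-6 hanging off vertex 3,
# and a leaf 7 hanging off vertex 1.
g <- make_graph(c(1,2, 2,3, 3,4, 4,1,  3,5, 5,6, 6,3,  1,7), directed = FALSE)

group <- seq_len(vcount(g))
for (k in as.integer(articulation_points(g))) {
  rest <- seq_len(vcount(g))[-k]            # original ids of the other vertices
  comp <- components(delete_vertices(g, k)) # components once k is removed
  # Assumption: everything outside the largest component hangs off k
  small <- rest[comp$membership != which.max(comp$csize)]
  group[small] <- group[k]
}

# contract() keeps loops and multi-edges, so simplify() the result
g2 <- simplify(contract(g, mapping = as.integer(factor(group))))
```

Here vertices 5 and 6 are merged into 3 and vertex 7 into 1, leaving the 4-cycle. Whether "largest component" is the right selection rule depends on the application; it is only one way to answer the A-or-B question.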

Optimizing spatial query in R - crown-distributed biomass in forest

I am trying to calculate the area of tree crowns overlapping square grid cells in a forest plot. Here is a reproducible example:
# A. Define objects
require(sp)
require(raster)
require(rgdal)
require(rgeos)
require(dismo)
radius=25 # max search radius around 10 x 10 m cells
res <- vector() # where to store results
# Create a fake set of trees with x,y coordinates and trunk diameter (=dbh)
set.seed(0)
survey <- data.frame(x=sample(99,1000,replace=T),y=sample(99,1000,replace=T),dbh=sample(100,1000,replace=T))
coordinates(survey) <- ~x+y
# Define 10 x 10 subplots
grid10 <- SpatialGrid(GridTopology(c(5,5),c(10,10),c(10,10)))
survey$subplot <- over(survey,grid10)
# B. Now find fraction of tree crown overlapping each subplot
for (i in 1:100) {
# Extract centroïd of each the ith cell
centro <- expand.grid(x=seq(5,95,10),y=seq(5,95,10))[i,]
corner <- data.frame(x=c(centro$x-5,centro$x+5,centro$x+5,centro$x-5),y=c(centro$y-5,centro$y-5,centro$y+5,centro$y+5))
# Find trees in a max radius (define above)
tem <- survey[which((centro$x-survey$x)^2+(centro$y-survey$y)^2<=radius^2),]
# Define tree crown based on tree diameter
tem$crownr <- exp(-.438+.658*log(tem$dbh/10)) # crown radius in meter
# Compute the distance from each tree to cell's borders
pDist <- vector()
for (k in 1:nrow(tem)) {
pDist[k] <- gDistance(tem[k,],SpatialPolygons(list(Polygons(list(Polygon(corner)),1))))
}
# Keeps only trees whose crown is lower than the above distance (=overlap)
overlap.trees <- tem[which(pDist<=tem$crownr),]
overlap.trees$crowna <-overlap.trees$crownr^2*pi # compute crown area
# Creat polygons from overlapping crowns
c1 <- circles(coordinates(overlap.trees),overlap.trees$crownr, lonlat=F, dissolve=F)
crown <- polygons(c1)
Crown <- SpatialPolygonsDataFrame(polygons(c1),data=data.frame(dbh=overlap.trees$dbh,crown.area=overlap.trees$crowna))
# Create a fine grid points to retrieve the fraction of overlapping crowns
max.dist <- ceiling(sqrt(which.max((centro$x - overlap.trees$x)^2 + (centro$y - overlap.trees$y)^2))) # max distance to narrow search
finegrid <- as.data.frame(expand.grid(x=seq(centro$x-max.dist,centro$x+max.dist,1),y=seq(centro$y-max.dist,centro$y+max.dist,1)))
coordinates(finegrid) <- ~ x+y
A <- extract(Crown,finegrid)
Crown@data$ID <- seq(1,length(crown),1)
B <- as.data.frame(table(A$poly.ID))
if (nrow(B)>0) {
B <- merge(B,Crown@data,by.x="Var1",by.y="ID",all.x=T)
B$overlap <- B$Freq/B$crown.area
B$overlap[B$overlap>1] <- 1
res[i] <- sum(B$overlap) } else {
res[i] <- 0 }
}
# C. Check the result
res # sum of crown fraction overlapping each cell (works fine)
This algorithm takes about 3 min to run for 100 cells. I have a large data set with 35000 cells, so 350*3=1050 minutes, or 17.5 hours.
Any hints to speed up and/or optimize this algorithm?
After a quick profiling run with the profvis package, it seems some improvement is possible just by changing a few lines. This wasn't an exhaustive optimization and I'm sure more improvements are still possible.
I changed
pDist <- vector()
for (k in 1:nrow(tem)) {
pDist[k] <- gDistance(tem[k,],SpatialPolygons(list(Polygons(list(Polygon(corner)),1))))
}
to
pDist <- rep(NA, nrow(tem))
my.poly <- SpatialPolygons(list(Polygons(list(Polygon(corner)),1)))
for (k in 1:nrow(tem)) {
pDist[k] <- gDistance(tem[k,], my.poly)
}
because there's no need to create the SpatialPolygons object on every iteration. This can be expensive, as seen in the profiling images below (top is optimized).
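Going one step further, the inner loop can be avoided altogether in this particular setup, because the distance from a point to an axis-aligned square has a closed form. The following is a sketch in base R under that assumption (square cells with half-width 5, aligned with the axes); the function name dist_to_cell is made up here and is not part of sp or rgeos.

```r
# Distance from points (x, y) to an axis-aligned square cell centred at
# (cx, cy) with half-width `half`. Points inside the cell get distance 0.
dist_to_cell <- function(x, y, cx, cy, half = 5) {
  dx <- pmax(abs(x - cx) - half, 0)  # horizontal gap to the cell, 0 if none
  dy <- pmax(abs(y - cy) - half, 0)  # vertical gap to the cell, 0 if none
  sqrt(dx^2 + dy^2)
}

# Three trees against the cell centred at (5, 5):
# one inside, one to the right, one beyond the upper-right corner
pd <- dist_to_cell(x = c(5, 20, 13), y = c(5, 5, 14), cx = 5, cy = 5)
```

Inside the loop over cells this could stand in for the gDistance calls, for instance with the tree coordinates taken from coordinates(tem) and centro$x, centro$y as the cell centre, provided the cells really are axis-aligned squares.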
Here is some code which should run in parallel.
# load only necessary package for code until parSapplyLB
# LB is load-balancing, which means it will distribute task to cores
# which are idle. This is great if jobs take an uneven amount of time
# to run.
library(parallel)
library(sp)
system.time({
# prepare the cluster; the default type is PSOCK (the only option on Windows), but FORK can be used on *nix
cl <- makeCluster(4)
# worker is just a new instance of fresh vanilla R so you need to load the
# necessary libraries to all the workers
clusterEvalQ(cl = cl, library(sp))
clusterEvalQ(cl = cl, library(raster))
clusterEvalQ(cl = cl, library(rgdal))
clusterEvalQ(cl = cl, library(rgeos))
clusterEvalQ(cl = cl, library(dismo))
radius <- 25 # max search radius around 10 x 10 m cells
# res <- rep(NA, 100) # where to store results
# Create a fake set of trees with x,y coordinates and trunk diameter (=dbh)
set.seed(0)
survey <- data.frame(x=sample(99,1000,replace=T),y=sample(99,1000,replace=T),dbh=sample(100,1000,replace=T))
coordinates(survey) <- ~x+y
# Define 10 x 10 subplots
grid10 <- SpatialGrid(GridTopology(c(5,5),c(10,10),c(10,10)))
survey$subplot <- over(survey,grid10)
# Export needed variables to workers
clusterExport(cl = cl, varlist = c("survey", "radius"))
# this function is your former for() loop, increase X = 1:100 to suit your needs
res <- parSapplyLB(cl = cl, X = 1:100, FUN = function(i, survey) {
# B. Now find fraction of tree crown overlapping each subplot
# Extract centroïd of each the ith cell
centro <- expand.grid(x=seq(5,95,10),y=seq(5,95,10))[i,]
corner <- data.frame(x=c(centro$x-5,centro$x+5,centro$x+5,centro$x-5),y=c(centro$y-5,centro$y-5,centro$y+5,centro$y+5))
# Find trees in a max radius (define above)
tem <- survey[which((centro$x-survey$x)^2+(centro$y-survey$y)^2<=radius^2),]
# Define tree crown based on tree diameter
tem$crownr <- exp(-.438+.658*log(tem$dbh/10)) # crown radius in meter
# Compute the distance from each tree to cell's borders
pDist <- vector()
my.poly <- SpatialPolygons(list(Polygons(list(Polygon(corner)),1)))
for (k in 1:nrow(tem)) {
pDist[k] <- gDistance(tem[k,], my.poly)
}
# Keeps only trees whose crown is lower than the above distance (=overlap)
overlap.trees <- tem[which(pDist<=tem$crownr),]
overlap.trees$crowna <-overlap.trees$crownr^2*pi # compute crown area
# Creat polygons from overlapping crowns
c1 <- circles(coordinates(overlap.trees),overlap.trees$crownr, lonlat=F, dissolve=F)
crown <- polygons(c1)
Crown <- SpatialPolygonsDataFrame(polygons(c1),data=data.frame(dbh=overlap.trees$dbh,crown.area=overlap.trees$crowna))
# Create a fine grid points to retrieve the fraction of overlapping crowns
max.dist <- ceiling(sqrt(which.max((centro$x - overlap.trees$x)^2 + (centro$y - overlap.trees$y)^2))) # max distance to narrow search
finegrid <- as.data.frame(expand.grid(x=seq(centro$x-max.dist,centro$x+max.dist,1),y=seq(centro$y-max.dist,centro$y+max.dist,1)))
coordinates(finegrid) <- ~ x+y
A <- extract(Crown,finegrid)
Crown@data$ID <- seq(1,length(crown),1)
B <- as.data.frame(table(A$poly.ID))
if (nrow(B)>0) {
B <- merge(B,Crown@data,by.x="Var1",by.y="ID",all.x=T)
B$overlap <- B$Freq/B$crown.area
B$overlap[B$overlap>1] <- 1
res <- sum(B$overlap) } else {
res <- 0 }
}, survey = survey)
stopCluster(cl = cl)
})
For those interested in trees, crowns and biomass: a faster way of computing crown-distributed biomass in a forest stand was suggested to me (thanks to H. Muller-Landau). One needs to think on a stem-by-stem basis and use a 1x1 m grid. The code below takes 6 min to run vs. a couple of hours for the previous code. Hope this is of interest!
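# Create a fake 1-ha forest stand (agb = above-ground biomass; the values are placeholders, assumed here because the loop below reads trees$agb):
trees <- data.frame(x=sample(99.5,1000,replace=T),y=sample(99.5,1000,replace=T),dbh=sample(100,1000,replace=T),agb=sample(100,1000,replace=T))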
# Create a 1x1m cell matrix where to store the result
cdagb=matrix(0,nrow=100,ncol=100)
#Calculate the crownradius for every stem (fake proportion)
trees$crownradius = 2*trees$dbh^0.5
#Calculate the index of the 1x1 m quadrat in which the tree stem falls
trees$quadx=ceiling(trees$x)
trees$quady=ceiling(trees$y)
# Run the algo stem-by-stem
for (i in 1:nrow(trees)) {
# xdisp and ydisp are the integer cell position differences in x and y that should be checked to see if the crown of the focal tree overlaps
xdisp=seq(ceiling(trees$quadx[i]-trees$crownradius[i]),floor((trees$quadx[i]+trees$crownradius[i])),1)
nxc <- nrow(cdagb); nyc <- ncol(cdagb) # plot size in cells (100 x 100 here)
xdisp[xdisp>=nxc] <- nxc +(nxc - xdisp[xdisp>=nxc]) # mirror values on edges onto adjacent cells
xdisp[xdisp<1] <- -xdisp[xdisp<1] + 1 # avoid XY to be 0
ydisp=seq(ceiling(trees$quady[i]-trees$crownradius[i]),floor((trees$quady[i]+trees$crownradius[i])),1)
ydisp[ydisp>=nyc] <- nyc +(nyc - ydisp[ydisp>=nyc])
ydisp[ydisp<1] <- -ydisp[ydisp<1] + 1
# Calculate the square of the x and y distances from the focal tree to the center of each of these cells
xdistsqr=(xdisp-trees$quadx[i])^2
ydistsqr=(ydisp-trees$quady[i])^2
nx=length(xdisp)
ny=length(ydisp)
# Calculate the distance from the center of each cell in the neighborhood to the focal tree
distmatrix=matrix(sqrt(rep(xdistsqr,each=ny)+rep(ydistsqr,nx)),nrow=nx,ncol=ny)
# includes only trees that overlap the grid cells
incmatrix=ifelse(distmatrix<trees$crownradius[i],1,0)
ncells=sum(incmatrix)
agbpercell=trees$agb[i]/ncells # divide the biomass by cell
addagbmatrix=incmatrix*agbpercell # reallocate biomass by cell
# add the biomass divided in square meter to each grid point
cdagb[xdisp,ydisp] = cdagb[xdisp,ydisp] + addagbmatrix
}

Netting a weighted matrix in igraph (R)

Is there a simple way that I'm missing to create a netted version of a simple, weighted, directed network in igraph? This is where mutual edges are replaced with a single edge carrying the difference between the weights and the direction is such that the weight is always positive. A simple example:
gGross <- graph_from_literal(A++B)
E(gGross)$weight <- c(12, 20)
gNet <- graph_from_literal(A+-B)
E(gNet)$weight <- c(8)
In matrix notation this would be
N_{ij} = (A_{ij} - A_{ji})_+
where the + means keep positive elements.
A good starting point would be something similar to which_mutual that gives the index of the return edge. I can imagine writing a long function in R but that seems likely to be slow.
I feel I can do better than this, but I do at least have a working method. Improvements most welcome. Edited to reflect Tamás' comments and allow use of any attribute
nettedGraph <- function(g, col="weight") {
if(!(col %in% edge_attr_names(g))) {
stop(col, " is not an edge attribute")
}
# Get a matrix of mutual edges (by vertex ids)
me <- ends(g, E(g)[which_mutual(g)], names=FALSE)
# Only keep one of the mutual edges
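me <- me[me[,1] < me[,2], , drop=FALSE] # drop=FALSE keeps a matrix even with a single mutual pair
toDel <- vector(mode="integer", length=nrow(me))
for (i in seq_len(nrow(me))) { # seq_len() also handles the case of no mutual edges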
# Get edge ids going each way
e1 <- get.edge.ids(g, c(me[i,1],me[i,2]))
e2 <- get.edge.ids(g, c(me[i,2],me[i,1]))
weightDiff <- edge_attr(g,col, e1) - edge_attr(g,col, e2)
if(weightDiff > 0) {
# Update the edge we're keeping
edge_attr(g,col, e1) <- weightDiff
# Delete the one we're not
toDel[i] <- e2
} else {
# Update the edge we're keeping
edge_attr(g,col, e2) <- -weightDiff
# Delete the one we're not
toDel[i] <- e1
}
}
# Now delete all the unneeded edges
g <- g - E(g)[toDel]
return(g)
}
set.seed(123)
g <- graph_from_literal(A-+B, B++C, A++C)
E(g)$weight <- round(runif(ecount(g),1,20))
weight0 <- E(g)$weight
gNet <- nettedGraph(g)
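The matrix formula from the question, N_{ij} = (A_{ij} - A_{ji})_+, can also be applied directly to the weighted adjacency matrix, which avoids the explicit loop. This is a sketch under a few assumptions: the graph has no multi-edges, only the netted attribute needs to survive (other edge attributes are dropped), and pairs that net to exactly zero may disappear (the loop version keeps an edge of weight 0). The function name net_by_matrix is made up for this example.

```r
library(igraph)

net_by_matrix <- function(g, attr = "weight") {
  # Dense weighted adjacency matrix: A[i, j] is the weight of edge i -> j
  A <- as_adjacency_matrix(g, attr = attr, sparse = FALSE)
  # Keep only the positive part of the antisymmetric difference
  N <- pmax(A - t(A), 0)
  graph_from_adjacency_matrix(N, mode = "directed", weighted = attr)
}

# The example from the question: 12 one way, 20 the other
gGross <- graph_from_literal(A++B)
E(gGross)$weight <- c(12, 20)
gNet2 <- net_by_matrix(gGross)
```

For large sparse graphs the dense matrix may be too big; the same arithmetic should carry over to a sparse matrix from the Matrix package, but that variant is not shown here.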

Using igraph, how to force curvature when arrows point in opposite directions

autocurve.edges does an amazing job of curving edges in igraph plots so that they don't overlap when they point in the same direction. However, when they point in opposite directions, no curvature is applied.
d <- data.frame(start=c("a","a","b","c"),end=c("b","b","c","b"))
graph <- graph.data.frame(d, directed=T)
plot(graph,
vertex.color="white")
The issue is for the arrows between b and c (or c and b).
Other than specifying curvature manually, any suggestions?
I would use the edge.curved option with the same seq call that autocurve.edges uses.
plot(graph,
vertex.color="white", edge.curved=seq(-0.5, 0.5, length = ecount(graph)))
EDIT:
As Étienne pointed out, this solution also curves edges for unique observations. The solution is then to modify the autocurve.edges function. This is my modified function called autocurve.edges2. Basically, it generates a vector, which curves only non-unique edges.
autocurve.edges2 <-function (graph, start = 0.5)
{
cm <- count.multiple(graph)
mut <-is.mutual(graph) #are connections mutual?
el <- apply(get.edgelist(graph, names = FALSE), 1, paste,
collapse = ":")
ord <- order(el)
res <- numeric(length(ord))
p <- 1
while (p <= length(res)) {
m <- cm[ord[p]]
mut.obs <-mut[ord[p]] #are the connections mutual for this point?
idx <- p:(p + m - 1)
if (m == 1 && !mut.obs) { #no mutual conn = no curve
r <- 0
}
else {
r <- seq(-start, start, length = m)
}
res[ord[idx]] <- r
p <- p + m
}
res
}
And here's the result when adding a single, non-mutual edge (C->D):
library(igraph)
d <- data.frame(start=c("a","a","b","c","c"),end=c("b","b","c","b","d"))
graph <- graph.data.frame(d, directed=T)
curves <-autocurve.edges2(graph)
plot(graph, vertex.color="white", edge.curved=curves)
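A shorter variant that avoids copying the function is to start from the curvature igraph already computes and then bend only the mutual edges that were left straight. This is a sketch, assuming that curve_multiple() is the current name of autocurve.edges and that which_mutual() flags both edges of a reciprocal pair; the curvature value 0.3 is arbitrary.

```r
library(igraph)

d <- data.frame(start = c("a", "a", "b", "c", "c"),
                end   = c("b", "b", "c", "b", "d"))
graph <- graph_from_data_frame(d, directed = TRUE)

curves <- curve_multiple(graph)     # curves parallel edges only
mutual <- which_mutual(graph)       # TRUE for b->c and c->b
curves[mutual & curves == 0] <- 0.3 # bend mutual edges left straight

plot(graph, vertex.color = "white", edge.curved = curves)
```

Giving both edges of a reciprocal pair the same curvature separates them, because each edge is bent relative to its own direction.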

Ordering 1:17 by perfect square pairs

There was an interesting question on R-help:
"Take the numbers one up to 17. Can you write them out in a line so that every pair of numbers that are next to each other, adds up to give a square number?"
My solution is below and not particularly special. I'm curious about a more elegant and/or robust solution. Maybe a solution that can take an arbitrary string of numbers and order them like this if possible?
sq.test <- function(a, b) {
## test for number pairs that sum to squares.
sqrt(sum(a, b)) == floor(sqrt(sum(a, b)))
}
ok.pairs <- function(n, vec) {
## given n as a member of vec,
## which other members of vec satisfy sq.test
vec <- vec[vec!=n]
vec[sapply(vec, sq.test, b=n)]
}
grow.seq <- function(y) {
## given a starting point (y) and a pairs list (pl)
## grow the squaring sequence.
ly <- length(y)
if(ly == y[1]) return(y)
## this line is the one that breaks down on other number sets...
y <- c(y, max(pl[[y[ly]]][!pl[[y[ly]]] %in% y]))
y <- grow.seq(y)
return(y)
}
## start vector
x <- 1:17
## get list of possible pairs
pl <- lapply(x, ok.pairs, vec=x)
## pick start at max since few combinations there.
y <- max(x)
grow.seq(y)
You can use outer to compute the allowable pairs.
The resulting matrix is the adjacency matrix of a graph,
and you just want a Hamiltonian path on it.
# Allowable pairs form a graph
p <- outer(
1:17, 1:17,
function(u,v) round(sqrt(u + v),6) == floor(sqrt(u+v))
)
rownames(p) <- colnames(p) <- 1:17
image(p, col=c(0,1))
# Read the solution on the plot
library(igraph)
g <- graph.adjacency(p, "undirected")
V(g)$label <- V(g)$name
plot(g, layout=layout.fruchterman.reingold)
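To get the ordering programmatically rather than reading it off the plot, the Hamiltonian path can be found with a small backtracking search on the same adjacency matrix. The following is a sketch in base R, assuming the input is a vector of distinct positive integers; the function name square_path is made up here, and the search is exponential in the worst case, so it is only meant for short vectors.

```r
# Depth-first search with backtracking for an ordering of `x` in which
# every adjacent pair sums to a perfect square. Returns NULL if none exists.
square_path <- function(x) {
  is_square <- function(s) { r <- round(sqrt(s)); r * r == s }
  n <- length(x)
  ok <- outer(x, x, function(u, v) is_square(u + v))  # allowable pairs
  diag(ok) <- FALSE                                   # no self-pairs
  extend <- function(path) {
    if (length(path) == n) return(path)
    last <- path[length(path)]
    # Unused neighbours of the current end of the path
    for (nxt in setdiff(which(ok[last, ]), path)) {
      res <- extend(c(path, nxt))
      if (!is.null(res)) return(res)
    }
    NULL  # dead end: backtrack
  }
  # Try starting points with the fewest allowable partners first
  for (start in order(rowSums(ok))) {
    res <- extend(start)
    if (!is.null(res)) return(x[res])
  }
  NULL
}

sol <- square_path(1:17)
```

Unlike grow.seq, this does not rely on always picking the largest available partner, so it also works on other number sets, returning NULL when no valid ordering exists.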
