How to use the convex_hull() function in the loop? - r

In my previous question I used the convex_hull() function. We have a graph in which all nodes have zero degree and (x, y) coordinates. We need to build the graph as a sequence of convex hulls. The loop can stop when only one node with zero degree remains.
My attempt is:
library(igraph)
######################################################################
set.seed(5)
n=15
g <- graph.empty(n)
xy <- cbind(runif(n), runif(n))
in_points <- V(g)[degree(g)==0]
repeat {
cp <- convex_hull(xy[in_points, ])$resverts+1
g <- as.undirected(add_edges(g, c(t(embed(cp, 2)), cp[1], cp[length(cp)])))
in_points <- V(g)[degree(g)==0]
if (length(in_points)==1) {break}
}
plot(g, vertex.size=10, layout=xy)
One can see that node 3 is used twice in the repeat loop.
Expected result is:
Question: how do I correctly use the output of the convex_hull() function to add edges sequentially in the loop?

You can iteratively use chull and add_edges. You just need to make sure that the input to add_edges is in the correct format and keep track of which nodes are not yet part of an outer convex hull.
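xy <- cbind(seq_len(n), xy)  # column 1 keeps the original node id of each point
while(nrow(xy) > 1){
  # row indices of the hull, relative to the points still remaining
  current_hull <- chull(xy[,2], xy[,3])
  # repeat the first index so that the polygon is closed
  current_hull <- c(current_hull, current_hull[[1]])
  # map the rows back to node ids and add an edge for each consecutive pair
  g <- add_edges(g, as.vector(t(embed(xy[,1][current_hull], 2)[,2:1])))
  # drop the nodes that are now part of a hull
  xy <- xy[-current_hull,,drop = FALSE]
}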
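To see the index bookkeeping on its own, here is a minimal sketch of the same peeling idea in base R only (chull comes with grDevices). The function name peel_hulls and the returned layers vector are made up for this illustration and are not part of igraph. The resulting edge matrix could then be passed to add_edges as as.vector(t(res$edges)), assuming a graph with the same node numbering.

```r
# Peel convex hull layers; returns a two-column edge matrix of original node ids.
peel_hulls <- function(xy) {
  ids <- seq_len(nrow(xy))                 # original ids of the points not yet used
  edges <- matrix(integer(0), ncol = 2)
  layers <- integer(nrow(xy))              # 0 means "never placed on a hull"
  layer <- 0L
  while (length(ids) > 1) {
    layer <- layer + 1L
    h <- chull(xy[ids, , drop = FALSE])    # indices relative to the remaining points
    hull_ids <- ids[h]                     # map back to the original node ids
    layers[hull_ids] <- layer
    if (length(hull_ids) > 2) {
      # consecutive hull nodes, plus the closing edge back to the first node
      edges <- rbind(edges, cbind(hull_ids, c(hull_ids[-1], hull_ids[1])))
    } else if (length(hull_ids) == 2) {
      edges <- rbind(edges, hull_ids)      # two points left: a single edge
    }
    ids <- ids[-h]                         # remove the nodes that were just used
  }
  list(edges = unname(edges), layers = layers)
}

set.seed(5)
xy <- cbind(runif(15), runif(15))
res <- peel_hulls(xy)
res$edges
```

The key point is that chull returns positions within the subset it was given, so they have to be translated through ids before they are used as node ids.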

Related

foreach doesn't change value of raster cell in R

I'm trying to simulate herding behavior in R.
Here's the code
library(raster)
library(sp)
library(foreach)
K=100
sig=0.2
G=0.3
x <- raster(ncol=2000,nrow=2000)
values(x) <- sign(rnorm(4000000,mean=0,sd=0.3))
y <- raster(ncol=2000,nrow=2000)
values(y) <- sign(rnorm(4000000,mean=0,sd=0.3))
#plot(x)
ei <- rnorm(4000000)
j=0
while(j < 30) {
for(i in 1:4000000){
ad <- adjacent(x,cell=c(i))[,2]
y[i] <- sign(K*sum(x[ad])+sig*ei[i]+G)
}
x <- y
plot(x)
j = j+1
}
The classic loop approach is too slow.
If I use a foreach loop instead of a classic for loop it doesn't change the values of y in every iteration.
I can't fix it at all.
Can someone please help about this?
Thank you
You have a dynamic model in which the output of each (time) step is input for the next step. It is not possible to do that in parallel. But that does not mean you cannot make the model run faster.
Looping over raster cells in R is always going to be slow, so we need to avoid that. Normally a problem like this could be solved with focal (see the code at the bottom), but in this case it is difficult because you effectively use two rasters (x and ei). I will look at implementing multi-layer focal operations in the terra package.
Here is an approach with getValuesFocal. It is much faster (I use Sys.sleep to slow it down a bit).
library(raster)
set.seed(0)
x <- raster(ncol=200, nrow=200)
values(x) <- sign(rnorm(ncell(x),mean=0,sd=0.3))
y <- raster(x)
values(y) <- sign(rnorm(ncell(x),mean=0,sd=0.3))
ei <- rnorm(ncell(x))
K=100
sig=0.2
G=0.3
for (j in 1:29) {
# with large rasters, you may need to do the below in chunks
v <- getValuesFocal(x, 1, nrow(x), c(3,3))
# only keep the rook neighbors
v <- v[, c(2,4,6,8)]
v <- rowSums(v, na.rm=TRUE)
values(x) <- sign(K*v+sig*ei+G)
plot(x)
Sys.sleep(0.1)
}
This is how you could use focal in similar cases:
w <- matrix(c(0,1,0,1,0,1,0,1,0), 3, 3)
y <- focal(x, w, fun=function(i)sign(K*sum(i)+sig+G))
Also see the cellular automata examples in ?focal
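For intuition, the same vectorised update can be sketched with a plain matrix standing in for the raster. This is only an illustration: rook_sum is a hypothetical helper, the grid is 50 x 50 instead of 2000 x 2000, and the edges are zero-padded, which may differ from how adjacent() treats the borders of a raster.

```r
# Sum of the four rook neighbours of every cell, without looping over cells.
rook_sum <- function(m) {
  nr <- nrow(m); nc <- ncol(m)
  p <- matrix(0, nr + 2, nc + 2)     # zero padding (assumption: no wrap-around)
  p[2:(nr + 1), 2:(nc + 1)] <- m
  p[1:nr,       2:(nc + 1)] +        # neighbour above
  p[3:(nr + 2), 2:(nc + 1)] +        # neighbour below
  p[2:(nr + 1), 1:nc] +              # neighbour to the left
  p[2:(nr + 1), 3:(nc + 2)]          # neighbour to the right
}

set.seed(0)
K <- 100; sig <- 0.2; G <- 0.3
x <- matrix(sign(rnorm(50 * 50, sd = 0.3)), 50, 50)
ei <- matrix(rnorm(50 * 50), 50, 50)
for (j in 1:30) {
  # each step uses the complete previous state, as in the model above
  x <- sign(K * rook_sum(x) + sig * ei + G)
}
```

Each time step is still computed from the previous one, so the steps stay sequential; only the work inside a step is vectorised.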

Is there any way I can optimize this R code?

This is code I'm trying to run in RStudio. I know the iterations take far too long. Is there a faster way to do this? It has been running for 4+ hours and doesn't look like it will finish any time soon.
I'm trying to make a distance matrix between 415 cities and 3680126 monuments. To optimize, I am only comparing those monuments with cities which are present in the same country.
for(x in 1:3680126){
for(y in 1:415){
if(list2_cities$Country[y]==list1_POI$Country[x]){
distance_matrix [x,y] <- ({POI$Longitude[x]-cities$Longitude[y]}^2)+({POI$Latitude[x]-cities$Latitude[y]}^2)
}
else{
distance_matrix [x,y] <- 0
}
}
}
Maybe you can try distm from package geosphere
library(geosphere)
d <- distm(list1_POI[c("Longitude","Latitude")],list2_cities[c("Longitude","Latitude")])
m <- +(outer(list1_POI$Country,list2_cities$Country,`==`))
res <- d*m
where
the distm part gives all pairwise distances between the POIs and the cities
the outer part provides a mask such that values for pairs from different countries are set to 0
If your desired matrix is sparse, here is another option
common <- intersect(list1_POI$Country,list2_cities$Country)
rl <- match(common,list1_POI$Country)
cl <- match(common,list2_cities$Country)
d <- diag(distm(list1_POI[rl,c("Longitude","Latitude")],list2_cities[cl,c("Longitude","Latitude")]))
res <- matrix(0,length(list1_POI$Country),length(list2_cities$Country))
res[cbind(rl,cl)] <- d
where you only need to locate the matched cities and calculate their distances.
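If a full matrix is not required, a different route (not the method above) is to build only the same-country pairs in long format with base R's merge. The sketch below uses small made-up data frames in place of list1_POI and list2_cities, and the poi_id and city_id columns are hypothetical names added for the illustration.

```r
# Hypothetical toy data standing in for list1_POI and list2_cities
poi <- data.frame(Country = c("FR", "FR", "DE"),
                  Longitude = c(2.29, 2.34, 13.38),
                  Latitude = c(48.86, 48.85, 52.52))
cities <- data.frame(Country = c("FR", "DE", "DE"),
                     Longitude = c(2.35, 13.40, 11.58),
                     Latitude = c(48.86, 52.52, 48.14))

# Row ids so that every pair can be traced back to its POI and its city
poi$poi_id <- seq_len(nrow(poi))
cities$city_id <- seq_len(nrow(cities))

# merge() on Country pairs each POI only with the cities of its own country
pairs <- merge(poi, cities, by = "Country", suffixes = c("_poi", "_city"))

# Same squared coordinate difference as in the question's loop
pairs$d2 <- (pairs$Longitude_poi - pairs$Longitude_city)^2 +
            (pairs$Latitude_poi - pairs$Latitude_city)^2
pairs[, c("poi_id", "city_id", "d2")]
```

Pairs from different countries are never created, so there are no zeros to store. Whether this fits in memory for the real data depends on how many cities each country has.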

Writing a graph and keeping the vertex names

I'm trying to create a graph and then write it with the function write.graph (package igraph). Hence, I create the distance matrix
require(vegan)
data(dune)
dis <- vegdist(dune)
and then I define explicitly the rownames:
x <- c("dune1")
for (i in 1: 20){
n <- paste("dune", i, sep="")
x <- append(x, n)
}
rownames(dune) <- x
With the following procedure I create an undirected graph through the minimum spanning tree algorithm.
gg <- graph.adjacency(as.matrix(dis), weighted=TRUE)
gg_mst <- as.undirected(mst(gg))
At this point I want to represent it such to open it with pajek. In order to do that I use write.graph:
write.graph(gg_mst, "graph.net", format="pajek")
obtaining the following graph:
The names are lost!
Nevertheless, if I use the same function with a different format:
write.graph(gg_mst, "graph.txt", format="ncol")
I obtain a file keeping the rownames:
dune1 dune3 0.448275862068966
dune2 dune3 0.341463414634146
dune2 dune10 0.294117647058824
dune3 dune4 0.270588235294118
... ... ...
Is it a bug related to the use of write.graph with the format "pajek"?
You need to assign id attributes to the vertices so that the vertex names are shown in a pajek viewer such as this one http://vlado.fmf.uni-lj.si/pub%20/networks/pajek/default.htm or gephi. You need to modify a few lines of your code, as follows:
dis <- vegdist(dune)
x <- c()
for (i in 1: 20){
n <- paste("dune", i, sep="")
x <- append(x, n)
}
gg <- graph.adjacency(as.matrix(dis), weighted=TRUE)
gg_mst <- as.undirected(mst(gg))
V(gg_mst)$id <- x # assign the ids
write.graph(gg_mst, "graph.net", format="pajek")
Opening with pajek shows the vertex ids correctly.
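As a side note, the loop that builds the names can be replaced by one vectorised call. A minimal sketch, assuming the same 20 names as above:

```r
# paste0 is vectorised, so no loop or append() is needed
x <- paste0("dune", seq_len(20))
head(x, 3)
```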

R-raster extraction along SpatialLine: relate extracted values to actual distance

When extracting values of a raster along a SpatialLine in R, how to relate these values to the actual distance along this line?
Suppose I want to extract the value of the R logo along the following line:
library(raster)
r <- raster(system.file("external/rlogo.grd", package="raster"))
x=c(5, 95)
y=c(20, 50)
line = SpatialLines(list(Lines(Line(cbind(x,y)), ID="a")))
plot(r)
plot(line, add=TRUE)
I can extract the values and plot them - but how to replace the x values (1:length(vals) below) by the actual distance (starting e.g. at 0 from the left side of the line)?
vals <- extract(r, line)[[1]]
plot(1:length(vals), vals, type='o')
I could combine the extraction of the cells with xyFromCell to get the coordinates of the extracted cells as suggested here, but it is not clear to me how to go further.
I'm not sure exactly what you're asking, but if you are looking for the distances between the leftmost coordinate of the line segment and the centres of the cells that the line passes through, then you can find them like this:
x <- extract(r, line, cellnumbers=TRUE)[[1]]
xy <- xyFromCell(r, x[,1]) # get cell coordinates where the line passes
start <- xy[which.min(xy[,1]),] # leftmost coordinate of the line
d <- apply(xy, 1, function(x, start) sqrt(sum((x-start)^2)), start=start) # find distances between the line segment start and the cells
plot(1:length(d), d, type='o')
Here is a solution (partly based on @jvj's input) that computes the orthogonal projections of the cell centres returned by raster::extract onto the line and then computes the distances along the line.
(This is an R beginner's script and can likely be improved, but it seems to work. It is of course only valid for rasters whose projection preserves distances.)
vals <- extract(r, line, cellnumbers=TRUE)[[1]]
cellsxy <- xyFromCell(r, vals[,1]) # coordinates of intersected cells (likely not ON the line)
linexy = spsample(line, 1000, "regular") # get the line as points
linexy <- matrix(cbind(linexy$x, linexy$y), ncol=2) # easier than Spatial object for later
orthoproj <- c() # to store the orthogonal projections of cells centres on the line
for (i in 1:nrow(cellsxy)) {
xypt = cellsxy[i,]
min.index <- which.min(spDistsN1(linexy, xypt))
orthopt <- linexy[min.index, ] # orthogonal projection = nearest sampled point on the line
orthoproj <- c(orthoproj, c(orthopt[1], orthopt[2]))
}
orthoproj <- matrix(orthoproj, ncol=2, byrow=T)
orthoproj <- data.frame(x=orthoproj[,1], y=orthoproj[,2])
orthoproj <- orthoproj[order(orthoproj[,1]),] # reorder with increasing distance
orthoproj <- data.frame(x=orthoproj$x, y=orthoproj$y)
start <- linexy[which.min(linexy[,1]),] # leftmost coordinate of the line
dists <- apply(orthoproj, 1,
function(xy, start) sqrt(sum((xy-start)^2)),
start=start) # distances between 'start' and the orthogonal projections
plot(dists, rev(vals[,2]), type='o') # !! beware: order of 'vals' and 'dists'
# depending on the order in which cellnumbers are returned
# in raster::extract and the shape of your line !!
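For a line that is a single straight segment, the projection can also be computed in closed form instead of sampling 1000 points. The following is a minimal base R sketch under that assumption. The function name dist_along and the three test points are made up for illustration, while a and b are the endpoints of the line from the question.

```r
# Distance from a, measured along the segment a -> b, of the orthogonal
# projection of each row of p onto that segment.
dist_along <- function(p, a, b) {
  ab <- b - a
  # fraction of the way from a (0) to b (1) for each projected point
  t <- as.vector((sweep(p, 2, a) %*% ab) / sum(ab^2))
  t <- pmin(pmax(t, 0), 1)        # clamp projections that fall outside the segment
  t * sqrt(sum(ab^2))
}

a <- c(5, 20); b <- c(95, 50)                 # endpoints of the line in the question
p <- rbind(c(5, 20), c(50, 35), c(95, 50))    # made-up points: start, midpoint, end
dist_along(p, a, b)
```

With the objects above, dist_along(cellsxy, a, b) would give one distance per extracted cell in the same order as vals, so no reordering should be needed before plotting. This would not apply to a line with several segments.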

Choose n most evenly spread points across point dataset in R

Given a set of points, I am trying to select a subset of n points that are most evenly distributed across this set of points. In other words, I am trying to thin out the dataset while still evenly sampling across space.
So far, I have the following, but this approach likely won't do well with larger datasets. Maybe there is a more intelligent way to choose the subset of points in the first place...
The following code randomly chooses a subset of the points, and seeks to minimize the distance between the points within this subset and the points outside of this subset.
Suggestions appreciated!
evenSubset <- function(xy, n) {
bestdist <- NA
bestSet <- NA
alldist <- as.matrix(dist(xy))
diag(alldist) <- NA
alldist[upper.tri(alldist)] <- NA
for (i in 1:1000){
subset <- sample(1:nrow(xy),n)
subdists <- alldist[subset,-subset]
distsum <- sum(subdists,na.rm=T)
if (distsum < bestdist | is.na(bestdist)) {
bestdist <- distsum
bestSet <- subset
}
}
return(xy[bestSet,])
}
xy <- cbind(rnorm(1000),rnorm(1000))
xy2 <- evenSubset(xy=xy, n=20)
plot(xy)
points(xy2,col='blue',cex=1.5,pch=20)
Following @Spacedman's suggestion, I used Voronoi tessellation to identify and drop those points that were closest to other points.
Here, the fraction of points to drop is passed to the function. This appears to work quite well, except that it is slow with large datasets.
library(tripack)
voronoiFilter <- function(occ,drop) {
n <- round(x=(nrow(occ) * drop),digits=0)
subset <- occ
dropped <- vector()
for (i in 1:n) {
v <- voronoi.mosaic(x=subset[,'Longitude'],y=subset[,'Latitude'],duplicate='error')
info <- cells(v)
areas <- unlist(lapply(info,function(x) x$area))
smallest <- which(areas == min(areas,na.rm=TRUE))
dropped <- c(dropped,which(paste(occ[,'Longitude'],occ[,'Latitude'],sep='_') == paste(subset[smallest,'Longitude'],subset[smallest,'Latitude'],sep='_')))
subset <- subset[-smallest,]
}
return(occ[-dropped,])
}
xy <- cbind(rnorm(500),rnorm(500))
colnames(xy) <- c('Longitude','Latitude')
xy2 <- voronoiFilter(xy, drop=0.7)
plot(xy)
points(xy2,col='blue',cex=1.5,pch=20)
