Grouping polygons and neighbours based on distance criteria - r

I would like to group polygons together based on a distance criterion:
Any polygon within a certain distance (1200 metres or less) of an origin polygon is grouped with it.
If other polygons are within the same distance (1200 metres or less) of these 'neighbouring' polygons, they are added to the same group.
The process for this group continues until no further polygons are added (because all remaining polygons are more than 1200 metres away).
The next ungrouped polygon is then selected and the process repeats for a new group.
Polygons with no neighbour within 1200 metres are assigned to a group of their own.
A polygon should only belong to one group.
The final output would be a table with the polygon ID (UID), the ID of the group it belongs to (GrpID), and the average distance between the polygons in that group.
I am sure a distance matrix from st_distance makes this possible, but I'm just not getting it.
library(sf)
library(dplyr)
# mode = "wb" keeps the binary zip file intact on Windows
download.file("https://drive.google.com/uc?export=download&id=1-I4F2NYvFWkNqy7ASFNxnyrwr_wT0lGF", destfile = "ProximityAreas.zip", mode = "wb")
unzip("ProximityAreas.zip")
Proximity_Areas <- st_read("Proximity_Areas.gpkg")
Dist_Matrix <- Proximity_Areas %>%
  st_distance(., by_element = FALSE)

This function uses sf and igraph package functions:
group_polygons <- function(polys, distance){
  ## get distance matrix
  dist_matrix = st_distance(polys, by_element = FALSE)
  ## this object has units, so get rid of them:
  class(dist_matrix) = NULL
  ## make a binary (TRUE/FALSE) matrix: TRUE if two polys are within the distance threshold
  ## (<= so that "1200 metres or less" counts as connected)
  connected = dist_matrix <= distance
  ## make a graph
  g = igraph::graph_from_adjacency_matrix(connected)
  ## igraph is not attached, so components() also needs the igraph:: prefix
  return(igraph::components(g)$membership)
}
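If igraph is not available, the same grouping can be sketched in base R as a breadth-first flood fill, which follows the question's steps literally: start from the next ungrouped polygon, add everything within the threshold, then keep expanding from the newly added polygons. This is a minimal sketch assuming `d` is already a numeric distance matrix (for example from st_distance); `group_by_threshold` is a hypothetical helper, not part of sf or igraph.

```r
# Hypothetical helper: flood-fill grouping on a distance matrix (base R only).
group_by_threshold <- function(d, threshold) {
  d <- unclass(d)                        # drop the "units" class if present
  n <- nrow(d)
  adj <- d <= threshold                  # TRUE where two polygons are within the threshold
  group <- rep(NA_integer_, n)
  next_id <- 0L
  for (start in seq_len(n)) {
    if (!is.na(group[start])) next       # already belongs to a group
    next_id <- next_id + 1L              # start a new group from this polygon
    group[start] <- next_id
    queue <- start
    while (length(queue) > 0) {
      i <- queue[1]
      queue <- queue[-1]
      # ungrouped polygons within the threshold of i join the group and are explored later
      nb <- which(adj[i, ] & is.na(group))
      group[nb] <- next_id
      queue <- c(queue, nb)
    }
  }
  group
}

# Toy example: five "polygons" on a line, positions in metres
d <- as.matrix(dist(c(0, 1000, 2100, 5000, 20000)))
g <- group_by_threshold(d, 1200)
g  # 1 1 1 2 3
```

Applied to `st_distance(Proximity_Areas)` with a threshold of 1200, this should give the same groups as group_polygons, possibly with the groups numbered differently.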
You can use it like this:
Proximity_Areas$Group = group_polygons(Proximity_Areas, 1200)
Let's make a category for mapping:
Proximity_Areas$FGroup = factor(Proximity_Areas$Group)
plot(Proximity_Areas[,"FGroup"])
There are three clusters here: the big one, one with 3 regions on the right, and one singleton region on the left. All the orange regions could be connected together by bridges that are less than 1200 m long.
If you want to compute the average distance without re-computing the distance matrix, you can do this within the function by subsetting according to the membership value from the components function. The key here is computing the binary 0/1 matrix and using igraph to compute the connectivity of that as an adjacency matrix.
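The subsetting idea can be sketched in base R, producing the UID / GrpID / average-distance table the question asks for. This assumes "average distance" means the mean of the distinct pairwise distances within a group (each pair counted once, diagonal excluded) and gives singleton groups NA; `group_summary` is a hypothetical helper, and `membership` could be the vector returned by `igraph::components(g)$membership`.

```r
# Hypothetical helper: per-polygon table with its group and the group's mean pairwise distance.
group_summary <- function(d, membership, uid = seq_along(membership)) {
  d <- unclass(d)                                  # drop the "units" class if present
  groups <- sort(unique(membership))
  per_group <- vapply(groups, function(g) {
    idx <- which(membership == g)
    if (length(idx) < 2) return(NA_real_)          # singleton group: no pairs to average
    sub <- d[idx, idx, drop = FALSE]               # distances within this group only
    mean(sub[upper.tri(sub)])                      # each pair once, diagonal excluded
  }, numeric(1))
  data.frame(UID = uid, GrpID = membership,
             MeanDist = per_group[match(membership, groups)])
}

# Toy example: five "polygons" on a line, positions in metres
d <- as.matrix(dist(c(0, 1000, 2100, 5000, 20000)))
res <- group_summary(d, membership = c(1, 1, 1, 2, 3))
res  # MeanDist is 1400 for group 1 and NA for the two singletons
```

Reusing the matrix this way avoids a second call to st_distance; note that st_distance between polygons measures the gap between their boundaries, not between their centroids.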

Related

Calculating minimum and maximum distance between points in meters, R

I have a data frame with two columns containing coordinates in meters (about 45,000 locations).
What I want to do is calculate the minimum and maximum distances between the locations. I have tried to calculate the minimum distance as follows:
library(sf)
xco<-c(320963.6,421813.6,315423.6,405733.6,365603.6)
yco<-c(172137.7,165287.7,232197.7,138917.7,183697.7)
mydata<-data.frame(xco,yco)
mydata_sf<-st_as_sf(mydata, coords = c("coords.x1", "coords.x2"), crs = 2100)
dist_df<-as.data.frame(st_distance(mydata_sf))
min(dist_df[dist_df> 0])
However, that gives me a value which I cannot see in my data.
Can anyone suggest a faster and better way to do this?
Thank you!
You have an error in your code. The fifth line should be
mydata_sf <- st_as_sf(mydata, coords = c("xco", "yco"), crs = 2100)
Then
dist_df <- as.data.frame(st_distance(mydata_sf))
Gives you a labeled distance matrix. Since the matrix is symmetric and its diagonal is zero, you only need the upper or lower triangle:
range(dist_df[lower.tri(dist_df)])
# [1] 30885.97 129834.72
The first value is the minimum distance and the second is the maximum.
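With about 45,000 locations the full matrix has roughly 2 billion entries, which is likely what makes the st_distance approach slow. A base R sketch that never builds the matrix compares each point only with the points after it (the same triangle idea as above) and keeps a running minimum and maximum. `pairwise_range` is a hypothetical helper; it assumes projected coordinates in meters (as with EPSG:2100 here), so plain Euclidean distance applies.

```r
# Hypothetical helper: min and max pairwise Euclidean distance with O(n) memory.
pairwise_range <- function(x, y) {
  n <- length(x)
  lo <- Inf
  hi <- -Inf
  for (i in seq_len(n - 1)) {
    j <- (i + 1):n                                  # only pairs (i, j) with j > i
    d <- sqrt((x[j] - x[i])^2 + (y[j] - y[i])^2)    # distances from point i to later points
    lo <- min(lo, d)
    hi <- max(hi, d)
  }
  c(min = lo, max = hi)
}

xco <- c(320963.6, 421813.6, 315423.6, 405733.6, 365603.6)
yco <- c(172137.7, 165287.7, 232197.7, 138917.7, 183697.7)
res <- pairwise_range(xco, yco)
res  # should agree with range(dist_df[lower.tri(dist_df)]): 30885.97 129834.72
```

Duplicate locations give a minimum of 0; to mirror the question's `dist_df > 0` filter, apply `d[d > 0]` before updating the minimum. For the maximum alone it is enough to check the convex hull vertices returned by `chull(xco, yco)`, since the farthest pair of points always lies on the hull.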

Create centroid tag within a radius for longitude-latitude data in R

I have longitude-latitude data (220,000 observations, duplicates included) and I want to create two new columns giving the number of points within a 5 km and a 10 km radius of each point.
My problem is the size of the data: distm builds the full 220,000 x 220,000 distance matrix (about 4.8 x 10^10 values), so the algorithm does not run on data this large.
Here is my code to create the first column, radius_5km.
long <- runif(n=220000,min=-153,max=174)
lat <- runif(n=220000, min=-42,max=67)
df <- data.frame(long,lat)
library(geosphere)
dfgesphe <- cbind(df,Radius_5km=rowSums(distm(df[,c("long","lat")],
fun = distHaversine) / 1000 <= 5)) # number of points within distance 5 km
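One way around the n x n matrix is to count neighbours point by point, so memory stays proportional to n. The sketch below is plain base R rather than geosphere: `haversine_m` reimplements the haversine formula with the same default radius as distHaversine (6378137 m), and `count_within` is a hypothetical helper. Because the great-circle distance is never shorter than the north-south separation, sorting by latitude lets each point skip everything outside a narrow latitude band. As in the question's code, each point counts itself.

```r
# Haversine distance in metres (same default radius as geosphere::distHaversine).
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: number of points (including itself) within radius_km of each point.
count_within <- function(long, lat, radius_km, r = 6378137) {
  n <- length(lat)
  ord <- order(lat)
  lat_s <- lat[ord]
  long_s <- long[ord]
  # great-circle distance >= r * |dlat| (radians), so only this latitude band can qualify
  band_deg <- (radius_km * 1000 / r) * 180 / pi
  counts <- integer(n)
  for (k in seq_len(n)) {
    lo <- findInterval(lat_s[k] - band_deg, lat_s, left.open = TRUE) + 1L  # first lat >= lower bound
    hi <- findInterval(lat_s[k] + band_deg, lat_s)                         # last lat <= upper bound
    j <- lo:hi
    d <- haversine_m(long_s[k], lat_s[k], long_s[j], lat_s[j], r)
    counts[k] <- sum(d <= radius_km * 1000)
  }
  out <- integer(n)
  out[ord] <- counts       # back to the original row order
  out
}

# Toy example: the second point is about 3.3 km north of the first, the third about 111 km
count_within(c(0, 0, 0), c(0, 0.03, 1), 5)  # 2 2 1
```

For the question's data this would be `df$Radius_5km <- count_within(df$long, df$lat, 5)` and likewise with 10; the run time still depends on how many points fall in each latitude band.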

Calculate nearest neighbor distance between data points within a previously identified Kmeans cluster in R

I would like to use nndist.ppx() to calculate the distance to the nearest neighbor within a given Kmeans cluster (df$cluster is a factor). The clusters are first identified using kmeans(df,2); I then cbind the cluster vector to the original df and convert it to class ppx using ppx(df,simplify=F), because df is 3D (xyz) and nndist() requires class ppx.
The problem is that I can only get nndist.ppx to calculate the distance to all the points in the df, regardless of cluster. This question is close to what I'm looking for in that distance is being calculated with a restraint.
Start with practice data, which is a list with 2 elements of class data.frame:
library(spatstat)
library(stats)
df_a1 <- data.frame(X = c(9,9,10,10,17,20,22,25,40,40,42),
Y=c(10,10,11,11,105,106,108,109,112,113,114), Z=c(1,1,1,1,3,4,4,6,8,8,8))
df_a2 <- data.frame(X = c(9,9,10,10,15,22,26,30,40,40,42),
Y=c(10,10,11,11,105,106,108,109,112,113,114), Z=c(1,1,1,1,5,5,4,5,7,7,8))
list_a <- list(df_a1,df_a2)
df_a_list_names<-c("control", "variable")
Run kmeans clustering:
Here is my Kmeans fxn which also cbinds the Kmeans cluster vector to the original df. I then lapply the kmeans_fxn over list of dfs. The output is stored in a new list.
kmeans_fxn<-function(x){
kmeans(x,(3))->results
results$cluster->cluster
cluster->x$cluster
as.factor(x$cluster)->x$cluster
return(x)
}
lapply(list_a, kmeans_fxn)->kmean_results_list
Calculate distance of nearest neighbor:
Here is the fxn I wrote to calculate the distance between each data point and its top 2 nearest neighbors. I then lapply the fxn to the previously created list:
distance_fxn<-function(x){
x<-ppx(x, simplify=F)->df.ppx
nndist.ppx(df.ppx,k=2)->x
as.data.frame(x)->x
return(x)
}
lapply(kmean_results_list, distance_fxn)->nearest_list
The output is the distance to the nearest neighbor within the entire df, regardless of cluster (I repeated it without the cluster column and the output was the same... not shown).
Also, I tried this
kmeans_results_list[[1]]->fob
ppx(fob, simplify=F)->fob.ppx
by(fob.ppx[[1]], cluster, function(x) nndist.ppx(fob.ppx, k=2))
and this, but neither worked:
by(fob.ppx, fob.ppx[[1]], function(x) nndist.ppx(fob.ppx, k=2))
Instead of treating the cluster label as a coordinate, treat it as a mark. Use as.ppp to convert your data frame to a two-dimensional point pattern (class ppp) with categorical marks. Then divide this pattern X into a list of patterns using Y <- split(X). Then compute nearest neighbour distances within each cluster with D <- lapply(Y, nndist). If you want the distances in their original order, use unsplit(D, marks(X)).
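The same split / lapply / unsplit pattern also works without spatstat if the Z coordinate should be kept. A minimal base R sketch, assuming plain Euclidean distance in x, y and z; `nn_within` is a hypothetical helper that returns the distance to the nearest other point in the same cluster (the k = 1 case, not the k = 2 used in the question), with NA for single-point clusters.

```r
# Hypothetical helper: nearest-neighbour distance within each cluster, in 3D.
nn_within <- function(xyz, cluster) {
  xyz <- as.matrix(xyz)
  cluster <- as.factor(cluster)
  idx <- split(seq_len(nrow(xyz)), cluster)              # row numbers per cluster
  d <- lapply(idx, function(i) {
    if (length(i) < 2) return(rep(NA_real_, length(i)))  # no other point in the cluster
    m <- as.matrix(dist(xyz[i, , drop = FALSE]))         # Euclidean distances in x, y, z
    diag(m) <- Inf                                       # ignore each point's distance to itself
    unname(apply(m, 1, min))
  })
  unsplit(d, cluster)                                    # back to the original row order
}

# Toy example: cluster "a" has two points 5 apart; cluster "b" lies on a vertical line
pts <- data.frame(x = c(0, 100, 3, 100, 100),
                  y = c(0, 0, 4, 0, 0),
                  z = c(0, 0, 0, 1, 3))
nn <- nn_within(pts, c("a", "b", "a", "b", "b"))
nn  # 5 1 5 1 2

# with the question's data: df$nn_dist <- nn_within(df[, c("X", "Y", "Z")], df$cluster)
```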

How to create correct spatial lag variables for a raster in R?

lag.listw creates incorrect spatial lag values when I use the function spdep::cell2nb. I have a raster file and want to create a new raster where each cell has the average value of its neighboring cells (spatial lag value).
The code below creates a new raster from scratch and calculates the neighbor list with cell2nb. nb2listw then constructs the weights assigned to each neighbor value, lag.listw creates the vector of lagged neighbor values, and finally I use this vector to create a new raster.
Code:
library(raster)
library(spdep)
##raster
r<-raster(nrows=7, ncols=8)
##raster values
v<-rep(0,ncell(r))
i<-sample(1:ncell(r),1)
v[i]<-1
values(r)<-v
plot(r)
##neighbor values
#neighbor list
nb<-cell2nb(nrow=nrow(r),ncol=ncol(r),type="queen")
#spatial weights matrix
nb.w<-nb2listw(nb,style="W", zero.policy=T)
#lagged values
nb.v<-lag.listw(nb.w,values(r),zero.policy=T,NAOK=T)
##new raster
nb.r<-r
values(nb.r)<-nb.v
plot(nb.r)
The first raster looks like:
The new raster with neighbor values is:
Comparing the two images, it is clear that the lagged values are misplaced and therefore wrong.
The above code works only if the raster (cell matrix) has equal numbers of rows and columns. Test:
##raster
r<-raster(nrows=8, ncols=8)
##raster values
v<-rep(0,ncell(r))
i<-sample(1:ncell(r),1)
v[i]<-1
values(r)<-v
plot(r)
##neighbor values
#neighbor list
nb<-cell2nb(nrow=nrow(r),ncol=ncol(r),type="queen")
#spatial weights matrix
nb.w<-nb2listw(nb,style="W", zero.policy=T)
#lagged values
nb.v<-lag.listw(nb.w,values(r),zero.policy=T,NAOK=T)
##new raster
nb.r<-r
values(nb.r)<-nb.v
plot(nb.r)
Use the focal function from the raster package itself. It computes statistics from the neighboring cell values and the cell's own value. To exclude the cell's own value from this calculation you have to 1) give it a zero weight and 2) adapt the mean function to count one observation less.
##create base raster
r<-raster(nrows=7, ncols=8)
extent(r)<-c(-60,60,-50,50) #avoid touching cells at the west and south edges
r[]<-0
r[4,5]<-1 #value at the center
r[1,3]<-1 #value at the upper edge
r[5,1]<-1 #value at the left edge
plot(r)
The weight matrix must have a zero weight for the cell's own value:
nb.w<-matrix(c(1,1,1,1,0,1,1,1,1),ncol=3)
To take the average of all neighboring cells (without the cell's own value) you can create your own function:
mean.B.style<-function(x){sum(x,na.rm=T)/(ncell(nb.w)-1)}
# sum of all values divided by 8 (nr. of neighbors)
# B style, in reference to the spdep::nb2listw function
To take into account that edge cells have fewer neighbors, you can divide by the number of non-missing neighbors instead:
mean.W.style<-function(x){sum(x,na.rm=T)/(length(x[!is.na(x)])-1)}
# W style, in reference to the spdep::nb2listw function
With either of these functions you can now create a new raster containing the spatial lags:
nb.r<-focal(r,nb.w,pad=T,NAonly=F,fun=mean.B.style)
plot(nb.r)
[Plots: base raster; spatial lag raster]
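To check the expected result independently of both spdep and raster, the queen-neighbour lag can be sketched on a plain matrix in base R. This is a minimal sketch, not the raster or spdep implementation: `queen_lag` is a hypothetical helper, cells are addressed by [row, col] throughout, and the "W" and "B" styles mirror mean.W.style and mean.B.style above (the W version divides by the number of non-missing neighbours). Using a 7 x 8 grid shows that the result lines up with the input even when the numbers of rows and columns differ.

```r
# Hypothetical helper: mean of the 8 queen neighbours, with zero weight for the cell itself.
queen_lag <- function(m, style = c("W", "B")) {
  style <- match.arg(style)
  nr <- nrow(m); nc <- ncol(m)
  # pad with NA so that edge cells simply have fewer non-missing neighbours
  p <- matrix(NA_real_, nr + 2, nc + 2)
  p[2:(nr + 1), 2:(nc + 1)] <- m
  total <- matrix(0, nr, nc)
  count <- matrix(0, nr, nc)
  for (dr in -1:1) for (dc in -1:1) {
    if (dr == 0 && dc == 0) next                     # skip the cell itself
    shifted <- p[(2:(nr + 1)) + dr, (2:(nc + 1)) + dc, drop = FALSE]
    total <- total + ifelse(is.na(shifted), 0, shifted)
    count <- count + !is.na(shifted)
  }
  # W: divide by the actual number of neighbours; B: always divide by 8
  if (style == "W") total / count else total / 8
}

# 7 x 8 grid (nrow != ncol) with a single 1 at row 4, column 5
m <- matrix(0, nrow = 7, ncol = 8)
m[4, 5] <- 1
lagged <- queen_lag(m)
lagged[3:5, 4:6]  # 0.125 everywhere except 0 at the centre cell [4, 5]
```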

DistanceFromPoints with multiple XY coordinates

I'm trying to use distanceFromPoints function in raster package as:
distanceFromPoints(object,xy,...)
where object is a raster and xy is a matrix of x and y coordinates.
Now, if my raster has, for example, 1000 cells and xy represents one point, I get 1000 values representing the distances between xy and each raster cell. My problem is when xy has multiple coordinates, e.g., 10 points. The function description indicates that xy can contain multiple points, but when I run the function with multiple xy points I still get only 1000 values, while I'm expecting 1000 values for each coordinate in xy. How does this work?
Thanks!
Using distanceFromPoints on multiple points gives a single value for each raster cell: the distance from that cell to the nearest point.
To create raster layers giving the distance to each point separately, you can use apply.
A reproducible example:
library(raster)
r = raster(matrix(nrow = 10, ncol = 10))
p = data.frame(x=runif(5), y=runif(5))
# apply passes each row of p (a named x/y vector) to distanceFromPoints
dp = apply(p, 1, function(pt) distanceFromPoints(r, pt))
This gives a list of raster layers, each having the distance to one point
# for example, 1st raster in the list has the distance to the 1st point
plot(dp[[1]])
points(p[1,])
For convenience, you can convert this list into a raster stack
st = stack(dp)
plot(st)
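The relationship between the two results can be sketched in base R without the raster package. The sketch assumes a 10 x 10 grid on the unit square (assumed here to match the default extent of raster(matrix(...)) above), cells numbered row by row from the top left as raster does, and plain Euclidean distance. Each column of `per_point` then corresponds to one layer of the stack, and the row-wise minimum `nearest` corresponds to the single layer that one multi-point distanceFromPoints call returns, as described in the answer.

```r
# Cell-centre coordinates of a 10 x 10 grid on the unit square, numbered row by row from the top left
nr <- 10; nc <- 10
cx <- rep((seq_len(nc) - 0.5) / nc, times = nr)       # x varies fastest within a row
cy <- rep(1 - (seq_len(nr) - 0.5) / nr, each = nc)    # first row is the top row

set.seed(1)
p <- data.frame(x = runif(5), y = runif(5))

# one column per point: the information held by the list / stack of per-point layers
per_point <- sapply(seq_len(nrow(p)), function(k) sqrt((cx - p$x[k])^2 + (cy - p$y[k])^2))

# one value per cell: the distance to the nearest point
nearest <- apply(per_point, 1, min)

dim(per_point)   # 100 5
length(nearest)  # 100
```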
A final word of caution:
It should be noted that the raster objects thus created do not really contain any more information than the list of points from which they are generated. As such, they are a computationally- and memory-expensive way to store that information. I can't easily think of any situation in which this would be a sensible way to solve a specific question. Therefore, it may be worth thinking again about the reasons you want these raster layers, and asking whether there may be a more efficient way to solve your overall problem.
