Minimum distance between two sets of co-ordinates - r

I have two data frames of different sizes containing geocodes. The first (df) has 12,000 observations and the second (schools) 3,000.
The first contains geocodes for properties in a country and the second for schools in the country.
I want to find the distance of the nearest school for each property. Using the geosphere package I'm currently working with the following:
library(geosphere)
for(i in 1:length(df$longitude)){
df$dist2[i] <- distm(c(schools[1, 3], schools[1, 2]), c(df$longitude[i], df$latitude[i]), fun = distHaversine) *0.001
}
where schools[, 3] and schools[, 2] are the longitude and latitude columns of that data frame respectively.
The above calculates the distance (in km) between all observations in df and the first school in schools.
I want to calculate the distance between each observation and all schools, saving only the smallest distance as that value for df$dist2[i].

In the following example, I make up the longitude/latitude data on the points and the school.
library(tidyverse)
library(geosphere)
df_points <- data.frame(lon = rnorm(10, mean =4, sd = 0.5), lat = rnorm(10, mean = 50, sd= 0.1))
df_schools <- data.frame(lon = rnorm( 3, mean =4, sd = 0.5), lat = rnorm( 3, mean = 50, sd= 0.1))
distm(df_points, df_schools, fun = distHaversine ) %>%
as.data.frame() %>%
rownames_to_column(var = "point_id") %>%
mutate(point_id = as.numeric(point_id)) %>%
gather(key = school, value = distance, -point_id) %>%
group_by(point_id) %>%
summarise(smallest_distance = min(distance))
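If only the minimum is needed, the reshaping step can be skipped by taking the row-wise minimum of the distance matrix directly. The sketch below is a base-R illustration, not the geosphere implementation: haversine_km is a hypothetical helper, and the 6371 km mean Earth radius is an assumption (distHaversine defaults to 6378137 m), so the numbers will differ slightly from distm().

```r
# Hypothetical helper: great-circle distance in km (haversine formula).
# Assumes a spherical Earth with a mean radius of 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Made-up coordinates, in the same spirit as the example above.
set.seed(1)
pts <- data.frame(lon = rnorm(10, 4, 0.5), lat = rnorm(10, 50, 0.1))
sch <- data.frame(lon = rnorm(3, 4, 0.5), lat = rnorm(3, 50, 0.1))

# Distance matrix: rows are points, columns are schools.
d <- outer(seq_len(nrow(pts)), seq_len(nrow(sch)),
           function(i, j) haversine_km(pts$lon[i], pts$lat[i],
                                       sch$lon[j], sch$lat[j]))

pts$nearest_km <- apply(d, 1, min)            # smallest distance per point
pts$nearest_school <- apply(d, 1, which.min)  # row index of that school in sch
```

For 12,000 properties and 3,000 schools the full matrix has 36 million entries (roughly 290 MB as doubles), so processing the properties in chunks may be preferable.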

Here is an approach using sp class objects. You can coerce data.frame objects to SpatialPointsDataFrame objects with something along the lines of coordinates(x) <- ~lon+lat. The idea is to derive a distance matrix between the two point feature classes and then pull the distance and ID based on the column name (assigned from the school data). This returns not only the distance but also a unique identifier for each school, making it easy to look up the closest school for any given property feature.
First, add the required libraries and create some example data.
library(sp)
library(raster)
e <- as(raster::extent(-180, 180, -90, 90), "SpatialPolygons")
properties <- spsample(e, 1000, type="random")
proj4string(properties) <- "+proj=longlat +ellps=WGS84"
schools <- spsample(e, 100, type="random")
proj4string(schools) <- "+proj=longlat +ellps=WGS84"
schools$ids <- paste0("school", 1:length(schools))
Now we can create the distance matrix and add a unique identifier from schools to the column names of the matrix. The rows are properties and the columns are schools, so the matrix has no self-distance diagonal that would need to be set to NA.
d <- spDists(x = properties, y = schools, longlat = TRUE)
colnames(d) <- schools$ids
There are certainly more elegant ways to do this but, for simplicity's sake, we will use a for loop to populate two vectors representing distances and IDs. We use which.min to pull the index of the minimum distance in row i. The iterator is based on the matrix rows because they represent the property features.
sdist <- rep(NA, nrow(d))
sid <- rep(NA, nrow(d))
for(i in 1:nrow(d)) {
srow <- d[i,]
sdist[i] <- srow[which.min(srow)]
sid[i] <- names(srow)[which.min(srow)]
}
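One more compact alternative to the loop is a single apply() call followed by matrix indexing. A minimal sketch on a made-up 4 x 3 distance matrix (the values and school names are illustrative only, not taken from the example above):

```r
# Toy distance matrix: 4 properties (rows) by 3 schools (columns).
d <- matrix(c(5,  2, 9,
              1,  7, 3,
              4,  4, 8,
              6, 10, 0.5),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("school1", "school2", "school3")))

# Column index of the nearest school for every row (ties go to the first).
idx <- apply(d, 1, which.min)

sid <- colnames(d)[idx]                    # ID of the nearest school
sdist <- d[cbind(seq_len(nrow(d)), idx)]   # distance to that school
```

Indexing with a two-column matrix of (row, column) pairs picks one element per row, which avoids calling which.min twice per property.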
We can then assign the resulting vectors to the properties SpatialPointsDataFrame. Now we have columns in the @data slot data.frame that represent the distance to the nearest school as well as the school IDs.
properties$school <- sid
properties$dist <- sdist
Here we can plot the results.
par(mfrow=c(2,1))
plot(properties, pch=19, cex=0.5)
plot(schools, pch=19, col="red", add=TRUE)
plot(e, add=TRUE)
title("random properties (black) and schools (red)", cex=0.5)
plot(properties, col="white")
plot(properties[1,], pch=19, cex=2, add=TRUE)
plot(schools[which(schools$ids %in% properties[1,]$school),],
pch=19, cex=2, col="red", add=TRUE)
plot(e, add=TRUE)
title("Property 1 (black) and closest school (red)", cex=0.5)
sidx <- which(schools$ids %in% properties[1,]$school)
text(coordinates(schools[sidx,]),
label = schools[sidx,]$ids, col="blue", cex=1)

Related

How to calculate the distance of raster to the nearest polygon?

I have an empty raster file (r1, RasterLayer) and I want to calculate, for each of the non-NA cells, the Euclidean distance to the nearest polygon (S1, SpatialPolygonsDataFrame). Both r1 and S1 are projected in UTM. Is there any R package that can do that?
You can first rasterize, and then use distance.
library(raster)
p1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60), c(-180,-20))
pols <- spPolygons(p1)
r <- raster(ncol=90, nrow=45)
r <- rasterize(pols, r)
d <- distance(r)
(The strange looking pattern is because the raster has lon/lat coordinates that wrap around the date line)
You can use the distance function from library(raster). For all cells that are NA, it returns the distance to the nearest cell that is not NA:
install.packages('raster')
library(raster)
distance(r1, S1, filename = 'raster_distance', doEdge= TRUE)
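To make the "distance to the nearest non-NA cell" idea concrete, here is a brute-force base-R sketch on a tiny grid. It is a conceptual illustration under the assumption of planar coordinates and unit cell size, not how the raster package computes it, and the object names (g, cells, dist_grid) are made up.

```r
# Toy grid: 4 rows x 5 columns; the non-NA cells play the role of the
# rasterized polygons.
g <- matrix(NA_real_, nrow = 4, ncol = 5)
g[2, 2] <- 1
g[4, 5] <- 1

# Cell-centre coordinates, in the same column-major order as as.vector(g).
cells <- expand.grid(row = seq_len(nrow(g)), col = seq_len(ncol(g)))
is_feature <- !is.na(as.vector(g))

# Distance from every cell to every feature cell, then the row-wise minimum.
dx <- outer(cells$col, cells$col[is_feature], "-")
dy <- outer(cells$row, cells$row[is_feature], "-")
nearest <- apply(sqrt(dx^2 + dy^2), 1, min)

# Back to grid shape; feature cells themselves get a distance of 0.
dist_grid <- matrix(nearest, nrow = nrow(g))
```

This all-pairs approach scales with the number of cells times the number of feature cells, so it is only suitable for small grids.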
This is something I would also like to find in a one-off function: similar to raster::distance, but telling you which polygon each cell is closest to, via an ID field or some such.
Anyhow, for now:
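library(raster)    # raster(), spPolygons(), distance(), stack(), zonal()
library(sf)        # st_as_sf()
library(fasterize) # fasterize()
# make a dummy raster with some dummy data.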
r <- raster(xmn = 50, xmx = 80, ymn = -7, ymx = 10, res=0.5)
r[] <- sample(1:10,ncell(r), replace=T)
# make some dummy polygons that you want to know the min distance to on your raster
p1 <- rbind(c(55,-5), c(65,-5), c(65, 0), c(55,0), c(55,-5))
p2 <- rbind(c(70,-5), c(75,-5), c(75, 2), c(70,2), c(70,-5))
p3 <- rbind(c(55,4), c(75,4), c(75, 7), c(55,7), c(55,4))
pols <- st_as_sf(spPolygons(p1,p2,p3))
pols$ID <- 1:3
# let's look
plot(r)
plot(pols$geometry,add=T)
# to make a raster layer of min euclidean distance to a polygon (any),
# loop through the sf ID's and `fasterize` (v. quick) and run the
# raster::distance function to find out the distance from itself
st <- stack()
for(i in pols$ID){
r_pol <- fasterize(pols[pols$ID==i,], r, field="ID")
rd <- distance(r_pol)
st <- stack(st,rd)
}
# To produce a simple min distance from any polygon, just select the min from the stack
r_min_dis_any_pol <- min(st)
plot(r_min_dis_any_pol)
# to go a bit further and let you know to which polygon the min distance refers to
r_min_which_pol <- which.min(st)
plot(r_min_which_pol)
# if you were interested in a zonal sum of the raster data, dependent on its nearest polygon
zonal_sum <- zonal(r, r_min_which_pol, fun="sum")
identical(sum(zonal_sum[,2]), cellStats(r,sum))
[1] TRUE

How to efficiently count the number of spatial points within a certain distance around raster cells in R?

I would like to count the number of spatial points (of a SpatialPointsDataFrame object) within a certain distance to every cell of a RasterLayer in R. The resulting value should replace the original value of that particular raster cell.
Here is a reproducible example:
# load library
library(raster)
# generate raster
ras <- raster(nrow=18, ncol=36)
values(ras) <- NA
# create SpatialPointsDataFrame
x <- c(-160,-155,-153,-150, 30, -45, -44, -42, -40, 100, 110, 130)
y <- c(-75,-73,-71,-60, 0, 30, 35, 40, 41, 10, -10, 60)
z <- c(seq(1, 12, 1))
df <- data.frame(x,y,z)
spdf <- SpatialPointsDataFrame(coords=df[,c(1,2)],
data=as.data.frame(df[,3]),
proj4string=CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
# visualize
plot(ras)
plot(spdf, add=T)
# loop over all raster cells
for(r in 1:nrow(ras)){
for(c in 1:ncol(ras)){
# duplicate raster for subsequent modification
ras_x <- ras
# define cell for which to count the number of surrounding points
ras_x[r,c] <- nrow(spdf) # some value that is impossible to be true, this is only a temporary placeholder
ras_x[ras_x != nrow(spdf)] <- NA
# convert raster cell to spatial point
spatial_point <- rasterToPoints(ras_x, spatial=T)
# calculate distance around raster cell
ras_dist <- distanceFromPoints(ras_x, spatial_point)
ras_dist <- ras_dist / 1000000 # rescale from metres to thousands of kilometres
# define circular zone by setting distance threshold (raster only with values 1 or NA)
ras_dist[ras_dist > 2] <- NA
ras_dist[ras_dist <= 2] <- 1
# create empty vector to count number of spatial points located within zone around the particular raster cell
empty_vec <- c()
# loop to check which value every point of SpatialPointsDataFrame corresponds to
for (i in 1:nrow(spdf)){
point <- extract(ras_dist, spdf[i,])
empty_vec[i] <- point
}
# sum of resulting vector is the number of points within surrounding zone around predefined raster cell
val <- sum(na.omit(empty_vec))
val
ras[r,c] <- val
# print for progress monitoring
print(paste0("sum of points within radius around cell row ", r, " and column ", c, " is ", val))
print(paste0("finished ", r, " out of ", nrow(ras)))
print(paste0("finished ", c, " out of ", ncol(ras)))
# both plots are just for visualization and progress monitoring
plot(ras)
plot(spdf, add=T)
}
}
plot(ras)
plot(spdf, add=T)
The resulting raster is exactly what I want but my way of checking the underlying raster values for each point of the SpatialPointsDataFrame seems inefficient. My real data consists of a RasterLayer with 2160, 4320, 9331200 (nrow, ncol, ncell) and a SpatialPointsDataFrame with 2664 features.
Is there a way to generate the raster of simply counting how many points are located within a certain distance around every raster cell more efficiently?
If you can work with projected coordinates this can be done fairly easily with the spatstat package.
This requires you to project your points (and grid) with e.g. sf::st_transform() and will not work
on a global scale.
Load spatstat and make 2000 random points to test against:
library(spatstat)
W <- square(1)
set.seed(42)
Y <- runifpoint(2000) # Random points in the unit square
plot(Y, main = "Random points in unit square")
Make 3000x3000 grid of points (9 million points):
xy <- gridcenters(W, 3000, 3000) # Grid of points in the unit square
X <- ppp(xy$x, xy$y, window = W, check = FALSE, checkdup = FALSE)
For each of the 9 million grid points count the number of other points within
radius 0.01 (timed on my reasonably fast laptop with 16GB RAM):
system.time(counts <- crosspaircounts(X, Y, r = .01))
#> user system elapsed
#> 1.700 0.228 1.928
Convert to spatstat’s im-format (raster type format – can be converted with maptools) and plot:
rslt <- as.im(data.frame(x = xy$x, y = xy$y, counts))
plot(rslt, main = "Point counts in raster cells")
Overlaying the points on the counts shows that we have done the right thing:
plot(rslt, main = "Point counts in raster cells")
plot(Y, add = TRUE, col = rgb(1,1,1,.7), pch = 3)
I’m sure you can also do something elegant and fast with raster, but I’m not the right one to ask there.
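For comparison, the same count can be sketched in base R with planar Euclidean distances, processing the grid centres in chunks so that the intermediate distance matrices stay small. This is only an illustration under the assumption of projected coordinates; count_within is a hypothetical helper and is not the algorithm spatstat uses.

```r
set.seed(42)
pts <- cbind(x = runif(200), y = runif(200))   # random points in the unit square
centres <- as.matrix(expand.grid(x = (1:20 - 0.5) / 20,
                                 y = (1:20 - 0.5) / 20))  # 20 x 20 cell centres
radius <- 0.1

# Hypothetical helper: number of points within `radius` of each centre.
count_within <- function(centres, pts, radius, chunk = 100) {
  counts <- numeric(nrow(centres))
  for (start in seq(1, nrow(centres), by = chunk)) {
    idx <- start:min(start + chunk - 1, nrow(centres))
    dx <- outer(centres[idx, 1], pts[, 1], "-")
    dy <- outer(centres[idx, 2], pts[, 2], "-")
    # Compare squared distances to avoid taking square roots.
    counts[idx] <- rowSums(dx^2 + dy^2 <= radius^2)
  }
  counts
}

counts <- count_within(centres, pts, radius)
count_grid <- matrix(counts, nrow = 20)  # rows index x, columns index y
```

The chunk size trades memory for the number of loop iterations; each chunk allocates matrices of chunk x nrow(pts) values.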

R - Best Way to Perform Geospatial Calculations

I am working on a project where I pull crime data from an API, and essentially calculate the density of crime per predefined grid unit. I do this now by putting lat and lon into a data.frame and then calculating the count of points within a radius of a point center. This is computationally expensive as there are thousands of points in the predefined grid and thousands of crime points.
I'm wondering if there is a better way to calculate crime density; I've heard that raster may be valuable?
Some sample data:
# Create a predefined grid of coordinates
predef.grid <- data.frame(lat = seq(from = 2.0, to = 4.0, by = 0.1),lon = seq(from = 19.0, to = 21.0, by = 0.1))
predef.grid <- expand.grid(predef.grid)
# Create random sample of crime incidents
crime.incidents <- data.frame(lat = rnorm(10, 4),lon = rnorm(10,20))
# each row of crime.incidents is already one incident, so no expand.grid() is needed here
# Need to count number of crimes within radius of every point in predef.grid
Thanks!
library(raster)
library(sp)
# predefined raster
predef.grid <- raster(xmn=19, # xmin (longitude, as in the question's grid)
ymn=2, # ymin (latitude)
xmx=21, # xmax
ymx=4, # ymax
res=0.1, # spatial resolution, matching the 0.1 step of the question's grid
vals = 1) # cell value
plot(predef.grid)
# Create random sample of crime incidents
# points should be a Spatial object of some form, point, etc.
crime.incidents <- spsample(x = as(extent(predef.grid), 'SpatialPolygons'),
n = 100,
type = 'random')
# plot points over grid
points(crime.incidents, pch = 20)
# count points per cell
density <- rasterize(crime.incidents, predef.grid, fun='count')
# plot the density
plot(density)
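The per-cell count that rasterize() produces here can also be sketched in base R by binning each coordinate and cross-tabulating. The coordinates and the 0.5-degree cell size below are made up for illustration, and note that this counts incidents per cell, not within a radius of each grid point.

```r
set.seed(1)
# Made-up incident coordinates inside lon 19-21, lat 2-4.
lon <- runif(100, 19, 21)
lat <- runif(100, 2, 4)

# Cell edges for a 0.5-degree grid (the resolution is an arbitrary choice).
lon_breaks <- seq(19, 21, by = 0.5)
lat_breaks <- seq(2, 4, by = 0.5)

# Bin each coordinate, then cross-tabulate: one count per grid cell.
counts <- table(lat = cut(lat, lat_breaks, include.lowest = TRUE),
                lon = cut(lon, lon_breaks, include.lowest = TRUE))
```

Rows of counts correspond to latitude bands and columns to longitude bands; dividing by the cell area would turn the counts into a density.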

Create square polygons from single centre coordinates and area in R

I am having issues plotting pixels true to their geographic extent in R. The files come with a list of daily single coordinates and pixel size (area). There is also a Z element separate from this. The data structure looks this way:
X <- c(1,3,6,7)
Y <- c(3,2,7,8)
Z <- c(38,23,12,12)
Area <- c(32,23,45,67)
The X and Y are in degrees longitude and latitude while the area is in square kilometres. I create the point features easily using:
library(sp)
A <- cbind(X,Y,Z,Area)
B <- SpatialPoints(A)
I plot these easily using the area values to determine the "cex" for plotting. The Z column is intensity and I use these values to determine the colours. How do I create spatial polygon features using the areas for each point in R? I would be using these points to create gridded rasters.
This should do the trick:
library(rgeos) ## for gBuffer()
library(raster) ## for bind()
ww <- sqrt(B$Area)/2 ## Widths of buffers needed to produce desired areas
pp <- list()
for(i in seq_along(B)) {
pp[i] <- gBuffer(B[i], width=ww[i], quadsegs=1, capStyle="SQUARE")
}
PP <- do.call(bind, pp)
## Check that it worked
plot(PP)
plot(B, add=TRUE)
text(B, labels=1:4, adj=c(-1,0), col="red")
Just to update this a bit, here is an sf solution.
Function to create the square buffers
bSquare <- function(x, a) {
a <- sqrt(a)/2
return( sf::st_buffer(x, dist = a, nQuadSegs=1,
endCapStyle = "SQUARE") )
}
Some example data
library(sf)
xy <- st_as_sf(data.frame(x = c(1,3,6,7),
y = c(3,2,7,8), z = c(38,23,12,12),
area = c(32,23,45,67)),
coords = c("x", "y"),
agr = "constant")
With variable buffer
sb.var <- bSquare(xy, xy$area)
plot(st_geometry(sb.var))
plot(st_geometry(xy), pch=20, add=TRUE)
With fixed buffer
sb <- bSquare(xy, 32)
plot(st_geometry(sb))
plot(st_geometry(xy), pch=20, add=TRUE)
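Both answers rely on the same geometry: a square of area A has side sqrt(A), so the buffer distance from the centre is sqrt(A)/2. A small base-R check of that relationship follows; square_corners and ring_area are hypothetical helpers, and the sketch assumes that coordinates and area share the same planar units, which is not the case for degree coordinates and areas in square kilometres unless the points are projected first.

```r
# Hypothetical helper: corners of an axis-aligned square with the given area,
# centred on (x, y). Returned as a closed ring (first corner repeated).
square_corners <- function(x, y, area) {
  h <- sqrt(area) / 2  # half the side length, i.e. the buffer distance
  cbind(x = x + c(-h, h, h, -h, -h),
        y = y + c(-h, -h, h, h, -h))
}

# Shoelace formula, used here only to verify the area of the ring.
ring_area <- function(xy) {
  n <- nrow(xy)
  abs(sum(xy[-n, "x"] * xy[-1, "y"] - xy[-1, "x"] * xy[-n, "y"])) / 2
}

sq <- square_corners(x = 1, y = 3, area = 32)
ring_area(sq)  # matches the requested area up to floating-point error
```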

pairwise distance matrix of cell centroids in a raster grid in R

I want to get a pairwise distance matrix between the centroids of every cell with every other cell in a raster layer. As well, I'm trying to figure out how to get the pairwise matrix of differences between values within the raster layer.
I tried looking at the {gdistance} and {spatstat} packages but it doesn't seem like there is a function for this.
Here is an example:
library(raster)
df <- data.frame(x = rnorm(100, 5, 3), y = rnorm(100, 10, 6))
rast <- raster()
ncol(rast) <- 20
nrow(rast) <- 20
test <- rasterize(df, rast, fun = mean)
plot(test, xlim=c(min(df$x), max(df$x)), ylim=c(min(df$y), max(df$y)))
Any ideas?
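One possible starting point, sketched in base R on a toy grid rather than a RasterLayer: build the cell-centre coordinates, pass them to dist() for the pairwise distances, and use outer() for the pairwise value differences. The unit cell size and planar distances are assumptions; with the raster package, the centres could presumably be taken from xyFromCell() and the values from values() instead.

```r
# Toy 3 x 4 grid of cell values; unit cell size is an assumption.
vals <- matrix(1:12, nrow = 3, ncol = 4)

# Cell-centre coordinates, in the same column-major order as as.vector(vals).
centres <- expand.grid(row = seq_len(nrow(vals)), col = seq_len(ncol(vals)))

# Pairwise Euclidean distances between centroids (ncell x ncell matrix).
dist_mat <- as.matrix(dist(centres[, c("col", "row")]))

# Pairwise value differences: entry [i, j] is value i minus value j.
v <- as.vector(vals)
diff_mat <- outer(v, v, "-")
```

Both matrices have ncell x ncell entries, so this is practical for a 20 x 20 raster (400 cells) but grows quickly for larger grids.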
