How to calculate the distance of raster to the nearest polygon? - r

I have an empty raster (r1, RasterLayer) and I want to calculate, for each of the non-NA cells, the Euclidean distance to the nearest polygon (S1, SpatialPolygonsDataFrame). Both r1 and S1 are projected in UTM. Is there any R package that can do that?

You can first rasterize, and then use distance.
library(raster)
p1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60), c(-180,-20))
pols <- spPolygons(p1)
r <- raster(ncol=90, nrow=45)
r <- rasterize(pols, r)
d <- distance(r)
(The strange looking pattern is because the raster has lon/lat coordinates that wrap around the date line)
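If rasterizing first is too coarse for your purpose (raster::distance then measures between cell centres rather than to the polygon outline), the distance can also be taken from each non-NA cell centre to the polygon geometry itself. A minimal sketch, assuming projected coordinates in metres and using the sf package, which the answer above does not use; the raster, the square polygon, and the names r1, S1 and out are made up for illustration:

```r
library(raster)
library(sf)

# Hypothetical projected raster: 10 x 10 cells of 100 m, all non-NA
r1 <- raster(xmn = 0, xmx = 1000, ymn = 0, ymx = 1000, res = 100)
r1[] <- 1

# Hypothetical polygon layer: a single 200 m square
sq <- st_polygon(list(rbind(c(200, 200), c(400, 200), c(400, 400),
                            c(200, 400), c(200, 200))))
S1 <- st_sfc(sq)

# Centres of the non-NA cells as sf points
cells <- which(!is.na(values(r1)))
pts <- st_as_sf(as.data.frame(xyFromCell(r1, cells)), coords = c("x", "y"))

# Distance matrix (cells x polygons), then the minimum per cell
d <- st_distance(pts, S1)
d <- matrix(as.numeric(d), nrow = nrow(d))

# Write the minimum distances back into a raster with the same geometry
out <- raster(r1)
values(out) <- NA
out[cells] <- apply(d, 1, min)
```

A SpatialPolygonsDataFrame can be converted with st_as_sf() first. The distance matrix has one row per cell, so for a large raster this would need to be done in chunks of cells.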

You can use the distance function from library(raster). This function returns, for all cells that are NA, the distance to the nearest cell that is not NA:
install.packages('raster')
library(raster)
distance(r1, S1, filename = 'raster_distance', doEdge= TRUE)

This is something I would also like to find as a single function: similar to raster::distance, but telling you which polygon each cell is closest to, via an ID field or some such.
Anyhow, for now:
library(raster); library(sf); library(fasterize)
# make a dummy raster with some dummy data.
r <- raster(xmn = 50, xmx = 80, ymn = -7, ymx = 10, res=0.5)
r[] <- sample(1:10,ncell(r), replace=T)
# make some dummy polygons that you want to know the min distance to on your raster
p1 <- rbind(c(55,-5), c(65,-5), c(65, 0), c(55,0), c(55,-5))
p2 <- rbind(c(70,-5), c(75,-5), c(75, 2), c(70,2), c(70,-5))
p3 <- rbind(c(55,4), c(75,4), c(75, 7), c(55,7), c(55,4))
pols <- st_as_sf(spPolygons(p1,p2,p3))
pols$ID <- 1:3
# let's look
plot(r)
plot(pols$geometry,add=T)
# to make a raster layer of min euclidean distance to a polygon (any),
# loop through the sf ID's and `fasterize` (v. quick) and run the
# raster::distance function to find out the distance from itself
st <- stack()
for(i in pols$ID){
r_pol <- fasterize(pols[pols$ID==i,], r, field="ID")
rd <- distance(r_pol)
st <- stack(st,rd)
}
# To produce a simple min distance from any polygon, just select the min from the stack
r_min_dis_any_pol <- min(st)
plot(r_min_dis_any_pol)
# to go a bit further and let you know to which polygon the min distance refers to
r_min_which_pol <- which.min(st)
plot(r_min_which_pol)
# if you were interested in a zonal sum of the raster data, dependent on its nearest polygon
zonal_sum <- zonal(r, r_min_which_pol, fun="sum")
identical(sum(zonal_sum[,2]), cellStats(r,sum))
[1] TRUE
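For the "which polygon is closest" part on its own, sf has st_nearest_feature(), which returns, for each feature in the first argument, the index of the nearest feature in the second. A small sketch under the assumption that cell centres are a good enough stand-in for cells; the raster and the two squares are hypothetical and no CRS is set, so distances are planar:

```r
library(raster)
library(sf)

# Hypothetical raster and two hypothetical square polygons with an ID field
r <- raster(xmn = 0, xmx = 10, ymn = 0, ymx = 10, res = 1)
r[] <- 1
p1 <- st_polygon(list(rbind(c(1, 1), c(3, 1), c(3, 3), c(1, 3), c(1, 1))))
p2 <- st_polygon(list(rbind(c(7, 7), c(9, 7), c(9, 9), c(7, 9), c(7, 7))))
pols <- st_sf(ID = 1:2, geometry = st_sfc(p1, p2))

# All cell centres as sf points
pts <- st_as_sf(as.data.frame(xyFromCell(r, 1:ncell(r))), coords = c("x", "y"))

# Index of the nearest polygon per cell centre, mapped to the ID field
nearest <- raster(r)
values(nearest) <- pols$ID[st_nearest_feature(pts, pols)]
```

The resulting layer can be used as the zone raster in zonal(), in the same way as r_min_which_pol above.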

Related

Repeat for loop for all rows of a spatial points data frame

I want to calculate the shortestPath distance (using the gdistance package) between a set of geographic coordinates, using a transition layer of the ocean to prevent 'movement' across land.
Here is how I created the transition layer:
library(raster); library(gdistance); library(maptools); library(rgdal); library(sp)
mapcrs <- "+proj=longlat +datum=WGS84 +no_defs"
data(wrld_simpl)
world <- wrld_simpl
worldshp <- spTransform(world, mapcrs)
ras <- raster(nrow=300,ncol=300)
crs(ras) <- crs(worldshp)
extent(ras) <- extent(worldshp)
landmask <- rasterize(worldshp, ras)
landras <- is.na(landmask)
tr <- transition(landras, transitionFunction = mean, directions = 8, symm = FALSE)
tr = geoCorrection(tr, scl=FALSE)
I then want to calculate the shortestPath distance between every coordinate in my dataset i.e. location 1 to location n, location 2 to location n etc.
Let's produce some hypothetical geographic coordinates and convert to spatial points
x <- rnorm(10, mean = -40, sd=5)
y <- rnorm(10, mean = 20, sd=5)
xy <- cbind(x,y); colnames(xy) <- c("lon","lat")
xy <- SpatialPoints(xy); projection(xy) <- projection(mapcrs)
Using the shortestPath function in gdistance, I can calculate the distance from the first coordinate (i.e. xy[1]) to all other xy coordinates, like so.
dist <- shortestPath(tr, origin = xy, goal = xy, output="SpatialLines")
I then tried to apply a for loop to sequentially calculate distance from location 1 to all other locations, and then calculating distance from location 2 to all other locations etc., which I wrote as follows:
for(i in seq_along(xy)){
AtoB <- shortestPath(tr, origin = xy[i,], goal=xy, output="SpatialLines")
i <- i+1
}
This, however, still only calculates the distances relative to the first xy spatial point and does not 'loop' for all subsequent rows. I don't know what I'm doing wrong. It's probably super-easy, but I'm struggling. Any help would be appreciated.
Thanks in advance,
Tony
---- UPDATE ----
We have come up with a bit of a work around (thanks Charley Clubley) but it still won't produce outputs for every spatial line. This will generate a matrix of distances.
The work around is as follows:
Using xy as a matrix, not spatial points
distances <- matrix(ncol=nrow(xy), nrow=nrow(xy))
xy_b <- xy ## Coords needs to be as a matrix (not spatial points)
## This generates an error indicating there are no more rows to delete once complete, but the computation works
for (i in 1:nrow(xy_b)) {
AtoB <-shortestPath(tr, xy_b, xy, output="SpatialLines")
length <- SpatialLinesLengths(AtoB)
distances[i, ] <- length
xy_b <- xy_b[-1,]
}
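One thing worth noting about the first loop in the question: AtoB is overwritten on every pass, so only one result survives, and the manual i <- i+1 is not needed because for() sets i itself. A minimal sketch of keeping one result per origin in a list; the all-passable transition layer and the three locations are hypothetical stand-ins for the ocean layer and the real coordinates:

```r
library(raster)
library(gdistance)
library(sp)

# Hypothetical stand-in for the ocean transition layer: every cell passable
r <- raster(nrows = 20, ncols = 20, xmn = 0, xmx = 20, ymn = 0, ymx = 20)
r[] <- 1
tr <- transition(r, transitionFunction = mean, directions = 8)
tr <- geoCorrection(tr, type = "c", scl = FALSE)

# Three hypothetical locations, in the same CRS as the raster
xy <- SpatialPoints(cbind(lon = c(2.5, 10.5, 17.5), lat = c(3.5, 12.5, 5.5)),
                    proj4string = crs(r))

# One list element per origin, so nothing is overwritten
n <- length(xy)
paths <- vector("list", n)
for (i in seq_len(n)) {
  paths[[i]] <- shortestPath(tr, origin = xy[i, ], goal = xy,
                             output = "SpatialLines")
}
```

paths[[i]] then holds the lines from location i to every location, and can be passed on to whatever length calculation you prefer.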

How to efficiently count the number of spatial points within a certain distance around raster cells in R?

I would like to count the number of spatial points (of a SpatialPointsDataFrame object) within a certain distance to every cell of a RasterLayer in R. The resulting value should replace the original value of that particular raster cell.
Here is a reproducible example:
# load library
library(raster)
# generate raster
ras <- raster(nrow=18, ncol=36)
values(ras) <- NA
# create SpatialPointsDataFrame
x <- c(-160,-155,-153,-150, 30, -45, -44, -42, -40, 100, 110, 130)
y <- c(-75,-73,-71,-60, 0, 30, 35, 40, 41, 10, -10, 60)
z <- c(seq(1, 12, 1))
df <- data.frame(x,y,z)
spdf <- SpatialPointsDataFrame(coords=df[,c(1,2)],
data=as.data.frame(df[,3]),
proj4string=CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
# visualize
plot(ras)
plot(spdf, add=T)
# loop over all raster cells
for(r in 1:nrow(ras)){
for(c in 1:ncol(ras)){
# duplicate raster for subsequent modification
ras_x <- ras
# define cell for which to count the number of surrounding points
ras_x[r,c] <- nrow(spdf) # some value that is impossible to be true, this is only a temporary placeholder
ras_x[ras_x != nrow(spdf)] <- NA
# convert raster cell to spatial point
spatial_point <- rasterToPoints(ras_x, spatial=T)
# calculate distance around raster cell
ras_dist <- distanceFromPoints(ras_x, spatial_point)
ras_dist <- ras_dist / 1000000 # scale values
# define circular zone by setting distance threshold (raster only with values 1 or NA)
ras_dist[ras_dist > 2] <- NA
ras_dist[ras_dist <= 2] <- 1
# create empty vector to count number of spatial points located within zone around the particular raster cell
empty_vec <- c()
# loop to check which value every point of SpatialPointsDataFrame corresponds to
for (i in 1:nrow(spdf)){
point <- extract(ras_dist, spdf[i,])
empty_vec[i] <- point
}
# sum of resulting vector is the number of points within surrounding zone around predefined raster cell
val <- sum(na.omit(empty_vec))
val
ras[r,c] <- val
# print for progress monitoring
print(paste0("sum of points within radius around cell row ", r, " and column ", c, " is ", val))
print(paste0("finished ", r, " out of ", nrow(ras)))
print(paste0("finished ", c, " out of ", ncol(ras)))
# both plots are just for visualization and progress monitoring
plot(ras)
plot(spdf, add=T)
}
}
plot(ras)
plot(spdf, add=T)
The resulting raster is exactly what I want but my way of checking the underlying raster values for each point of the SpatialPointsDataFrame seems inefficient. My real data consists of a RasterLayer with 2160, 4320, 9331200 (nrow, ncol, ncell) and a SpatialPointsDataFrame with 2664 features.
Is there a way to generate the raster of simply counting how many points are located within a certain distance around every raster cell more efficiently?
If you can work with projected coordinates this can be done fairly easily with the spatstat package.
This requires you to project your points (and grid) with e.g. sf::st_transform() and will not work
on a global scale.
Load spatstat and make 2000 random points to test against:
library(spatstat)
W <- square(1)
set.seed(42)
Y <- runifpoint(2000) # Random points in the unit square
plot(Y, main = "Random points in unit square")
Make 3000x3000 grid of points (9 million points):
xy <- gridcenters(W, 3000, 3000) # Grid of points in the unit square
X <- ppp(xy$x, xy$y, window = W, check = FALSE, checkdup = FALSE)
For each of the 9 million grid points count the number of other points within
radius 0.01 (timed on my reasonably fast laptop with 16GB RAM):
system.time(counts <- crosspaircounts(X, Y, r = .01))
#> user system elapsed
#> 1.700 0.228 1.928
Convert to spatstat’s im-format (raster type format – can be converted with maptools) and plot:
rslt <- as.im(data.frame(x = xy$x, y = xy$y, counts))
plot(rslt, main = "Point counts in raster cells")
The points overlaid on the counts show that we have done the right thing:
plot(rslt, main = "Point counts in raster cells")
plot(Y, add = TRUE, col = rgb(1,1,1,.7), pch = 3)
I’m sure you can also do something elegant and fast with raster, but I’m not the right one to ask there.
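For what it's worth, a raster-only route that stays in lon/lat is to compute the full cell-by-point distance matrix with pointDistance() and count the entries under the threshold. This is only a sketch on the toy data from the question, assuming the 2,000,000 m radius implied by the scaling there; for 9 million cells the matrix would not fit in memory, so the cells would have to be processed in chunks:

```r
library(raster)

# Toy raster and points from the question
ras <- raster(nrow = 18, ncol = 36)
x <- c(-160, -155, -153, -150, 30, -45, -44, -42, -40, 100, 110, 130)
y <- c(-75, -73, -71, -60, 0, 30, 35, 40, 41, 10, -10, 60)
pts <- cbind(x, y)

# Distances in metres between every cell centre (rows) and every point (columns)
centres <- xyFromCell(ras, 1:ncell(ras))
d <- pointDistance(centres, pts, lonlat = TRUE, allpairs = TRUE)

# Number of points within the radius of each cell centre
values(ras) <- rowSums(d <= 2e6)
plot(ras)
points(pts)
```

This replaces the three nested loops with one matrix operation, which is where most of the time in the original code goes.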

How to perform focal operation (mean) on raster using 3x3 window in R? I have lat/long values

I have a raster and lat/long values, and I want to perform a focal operation on these points using a 3x3 window/kernel. I am new to R.
Here is a workflow to compute the mean of a raster in 3x3 zones centered at defined lat/lon coordinates.
Rasterize the points and dilate the resulting raster to create the 3x3 zones
library(mmand)
library(raster)
# rasterize points based on lat/lon coordinates
z <- rasterize(pts[,2:3], r, field = pts$id)
# dilate z using a 3x3 box
kern <- shapeKernel(c(3,3), type="box")
z[,] <- dilate(as.matrix(z), kern)
plot(z)
Compute the mean of r values in each zone
# raster::zonal function (zonal statistics) is used
# -Inf corresponds to NA values and should not be taken into account
# you can replace "mean" with the statistic you would like to compute
zonal(r, z, fun = "mean")
# zone mean
#[1,] -Inf 5.563607
#[2,] 1 5.000000
#[3,] 2 3.444444
#[4,] 3 5.222222
Sample data
library(raster)
set.seed(1)
# Generate raster of random values
r <- raster(crs = CRS("+proj=robin +datum=WGS84"), resolution = c(10, 10))
r[] <- round(runif(ncell(r), 1, 10))
# Generate data frame with lat/lon coordinates
pts <- data.frame(id = 1:3, lon = c(-100, 40, 120), lat = c(-45, 5, 35))
plot(r)
points(pts$lon, pts$lat, pch = 20, cex = 2)
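If a literal moving-window mean is what you are after, another route is raster::focal() over the whole raster followed by extract() at the points. A minimal sketch, assuming the points do not fall in the outermost rows or columns (where focal() returns NA by default); the raster here is a plain lon/lat stand-in for the sample data, and the column name mean3x3 is made up:

```r
library(raster)
set.seed(1)

# Stand-in raster of random values on a 10-degree lon/lat grid
r <- raster(resolution = c(10, 10))
r[] <- round(runif(ncell(r), 1, 10))

# Points given as lon/lat
pts <- data.frame(id = 1:3, lon = c(-100, 40, 120), lat = c(-45, 5, 35))

# 3x3 moving-window mean for every cell (weights of 1, mean as the function)
f <- focal(r, w = matrix(1, 3, 3), fun = mean)

# Value of the smoothed raster at each point
pts$mean3x3 <- extract(f, pts[, c("lon", "lat")])
pts
```

This computes the window for every cell, which is wasteful for a handful of points on a large raster; the zonal approach above only touches the zones it needs.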

Calculating the distance between points in R

I looked through the questions that have been asked about dealing with coordinates but couldn't find anything that helps with my problem.
I have a dataset that contains ID, Speed, Time, and a list of Latitude & Longitude values (the dataset can be found at the link below).
https://drive.google.com/file/d/1MJUvM5WEhua7Rt0lufCyugBdGSKaHMGZ/view?usp=sharing
I want to measure the distance between each point of Latitude & Longitude.
For example;
Latitude has: x1 ,x2 ,x3 ,...x1000
Longitude has: y1 ,y2 ,y3 ,..., y100
I want to measure the distance between (x1,y1) to all the points , and (x2,y2) to all the points, and so on.
The reason I'm doing this is to know which points are close to which, and to assign an index to each location based on the distance.
If (x1, y1) is close to (x4, y4), then (x1, y1) will get the index A, for example, and (x4, y4) will be labeled B; that is, sort the points in order based on distance.
I tried the gDistance function, but it showed the error message: "package ‘gDistance’ is not available (for R version 3.4.3)",
and if I change the version to 3.3, library(rgeos) won't work!
Any suggestions?
Here's what I tried:
#requiring necessary packages:
library(sp) # vector data
library(rgeos) # geometry ops
#Read the data and transform them to spatial objects
d <- read.csv("ReadyData.csv")
sp.ReadyData <- d
coordinates(sp.ReadyData) <- ~Longitude + Latitude
d <- gDistance(sp.ReadyData, byid= TRUE)
Here's an update on my solution. I created a spatial object and made a spatial data frame as follows:
#Create spatial object:
lonlat <- cbind(spatial$Longitude, spatial$Latitude)
#Create a SpatialPoints object:
library(sp)
pts <- SpatialPoints(lonlat)
crdref <- CRS('+proj=longlat +datum=WGS84')
pts <- SpatialPoints(lonlat, proj4string=crdref)
# make spatial data frame
ptsdf <- SpatialPointsDataFrame(pts, data=spatial)
Now I'm trying to measure the distance for longitude/latitude coordinates. I tried the dist method, but it does not seem to work for me, so I tried the pointDistance method:
gdis <- pointDistance(pts, lonlat=TRUE)
It is still not clear to me how this function measures the distance. I need to figure out the distances so I can locate the point in the middle and assign a number to each point based on its location relative to the middle point.
You can use raster::pointDistance or geosphere::distm, among other functions.
Part of your example data (please avoid files in your questions):
d <- read.table(sep=",", text='
"OBU ID","Time Received","Speed","Latitude","Longitude"
"1",20,1479171686325,0,38.929596,-77.2478813
"2",20,1479171686341,0,38.929596,-77.2478813
"3",20,1479171698485,1.5,38.9295887,-77.2478945
"4",20,1479171704373,1,38.9295048,-77.247922
"5",20,1479171710373,0,38.9294865,-77.2479055
"6",20,1479171710373,0,38.9294865,-77.2479055
"7",20,1479171710373,0,38.9294865,-77.2479055
"8",20,1479171716373,2,38.9294773,-77.2478712
"9",20,1479171716374,2,38.9294773,-77.2478712
"10",20,1479171722373,1.32,38.9294773,-77.2477417')
Solution:
library(raster)
m <- pointDistance(d[, c("Longitude", "Latitude")], lonlat=TRUE)
To get the nearest point to each point, you can do
mm <- as.matrix(as.dist(m))
diag(mm) <- NA
i <- apply(mm, 1, which.min)
The point pairs
p <- cbind(1:nrow(mm), i)
To get the distances, you can do:
mm[p]
Or do this:
apply(mm, 1, min, na.rm=TRUE)
Note that rgeos::gDistance is for planar data, not for longitude/latitude data.
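On how a distance can be measured for longitude/latitude data at all: geosphere::distm uses distHaversine by default, i.e. the great-circle distance on a sphere. Below is a base R sketch of the textbook haversine formula, only to show the idea; it is not claimed to be what pointDistance does internally, and the function name and the radius default (6378137 m) are choices made for this example:

```r
# Great-circle distance in metres between two lon/lat points (degrees),
# using the haversine formula on a sphere of radius r
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Distance between the first and third sample points above
haversine(-77.2478813, 38.929596, -77.2478945, 38.9295887)
```

The function is vectorised, so whole columns of coordinates can be passed in.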
Here is a similar question/answer with some illustration.
Your data set is too large to make a single distance matrix. You can process your data in chunks to deal with that. Here I am showing that with a rather small chunk size of 4 rows. Make this number much bigger to speed up processing time.
library(geosphere)
chunk <- 4 # rows
start <- seq(1, nrow(d), chunk)
# last row of each chunk, so that chunks do not overlap
end <- pmin(start + chunk - 1, nrow(d))
x <- d[, c("Longitude", "Latitude")]
r <- list()
for (i in 1:length(start)) {
  y <- x[start[i]:end[i], , drop=FALSE]
  m <- distm(y, x)
  # mask each point's zero distance to itself
  m[cbind(1:nrow(m), start[i]:end[i])] <- NA
  r[[i]] <- apply(m, 1, which.min)
}
r <- unlist(r)
r
# [1] 2 1 1 5 6 5 5 9 8 8
So for your data:
d <- read.csv("ReadyData.csv")
chunk <- 100 # rows
# etc
This will take a long time.
An alternative approach:
library(spdep)
x <- as.matrix(d[, c("Longitude", "Latitude")])
k <- as.vector(knearneigh(x, k=1, longlat=TRUE)$nn)
Assuming you have p1 as SpatialPoints of x and p2 as SpatialPoints of y, to get the index of the nearest other point:
ReadyData$cloDist <- apply(gDistance(p1, p2, byid=TRUE), 1, which.min)
If the same coordinate appears in both lists, you will get the index of the point itself, since the closest place to a point is itself. An easy trick to avoid that is to use the second-smallest distance as the reference, with a quick function:
f_which.min <- function(vec, idx) sort(vec, index.return = TRUE)$ix[idx]
ReadyData$cloDist2 <- apply(gDistance(p1, p2, byid=TRUE), 1, f_which.min,
idx = 2)

Finding minimum distance between two raster layer pixels in R

I have two thematic raster layers, r1 and r2, for the same area, each following the same classification scheme with 16 classes. I need to find the minimum distance between a cell of r1 and a cell of r2 with the same value. E.g. the nth cell in r1 has value 10 and coordinates x1,y1. And in r2, there are 2 cells with value 10 and coordinates x1+2,y1+2 and x1-0.5,y1-0.5. Thus the value that I need for this cell would be 0.5,0.5.
I tried distance from the raster package, but it gives the distance, for all cells that are NA, to the nearest cell that is not NA. I am confused as to how I can include the second raster layer in this.
You can use knn from the class package to find, for each cell of r1, the index of the nearest cell of r2 with the same category:
library(class)
library(raster)
#example of two rasters
r1 <- raster(ncol = 600, nrow = 300)
r2 <- raster(ncol = 600, nrow = 300)
#fill each with categories that range from 1 to 16
r1[] <- sample(1:16, ncell(r1), T)
r2[] <- sample(1:16, ncell(r2), T)
# coordinates of cells extracted
xy = xyFromCell(r1, 1:ncell(r1))
#multiply values of raster with a relatively large number so cells that belong
#to each category have smaller distance with regard to other categories.
v1 = values(r1) * 1000000
v2 = values(r2) * 1000000
# the function returns indices of nearest cells
out = knn(cbind(v2, xy) ,cbind(v1, xy) ,1:ncell(r1), k=1)
So, use rasterToPoints to extract SpatialPoints object for unique thematic class. Then use the sp::spDists function to find the distance between your points.
library(raster)
r1 <- raster( nrow=10,ncol=10)
r2 <- raster( nrow=10,ncol=10)
set.seed(1)
r1[] <- ceiling(runif(100,0,10))
r2[] <- ceiling(runif(100,0,10))
dist.class <- NULL
for(i in unique(values(r1))){
p1 <- rasterToPoints(r1, fun=function(xx) xx==i, spatial=T)
p2 <- rasterToPoints(r2, fun=function(xx) xx==i, spatial=T)
dist.class[i] <- min(spDists(p1,p2))
}
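# dist.class is indexed by class value, so pair it with 1, 2, ... rather than
# with unique(values(r1)), which is in order of appearance
cbind(class = seq_along(dist.class), dist.class)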
The loop may not be efficient for you. If that's a problem, wrap it into a function and lapply it. Also, be careful with your classes: if they aren't 1:10, my loop won't work. If your projection is in degrees, you will probably need the geosphere package to get accurate results. But the best option in that case, I think, is to use a projection in meters.
A memory safe approach using the raster-package would be to use the layerize() function to split up your raster value into a stack of binary rasters (16 in your case) and then use the distance() function to compute distances in the layers of r2, masking them with the respective layers of r1. Something like this:
layers1 <- layerize(r1, falseNA=TRUE)
layers2 <- layerize(r2, falseNA=TRUE)
# now you can loop over the layers (use foreach loop if you want
# to speed things up using parallel processing)
dist.stack <- layers1
for (i in 1:nlayers(layers1)) {
  # distance to the nearest cell of class i in r2
  dist.i <- distance(layers2[[i]])
  # keep it only where r1 has class i
  dist.mask.i <- mask(dist.i, layers1[[i]])
  dist.stack[[i]] <- dist.mask.i
}
# if you want pairwise distances for all classes in one layer, simply
# combine them using sum()
dist.combine <- sum(dist.stack, na.rm=TRUE)
