pairwise distance matrix of cell centroids in a raster grid in R - r

I want to get a pairwise distance matrix between the centroids of every cell with every other cell in a raster layer. As well, I'm trying to figure out how to get the pairwise matrix of differences between values within the raster layer.
I tried looking at the {gdistance} and {spatstat} packages but it doesn't seem like there is a function for this.
Here is an example:
df <- data.frame(x = rnorm(100, 5, 3), y = rnorm(100, 10, 6))
rast <- raster()
ncol <- 20
nrow <- 20
test <- rasterize(df, rast, FUN=mean)
plot(test, xlim=c(min(df$x), max(df$x)), ylim=c(min(df$y), max(df$x)))
Any ideas?

Related

How to calculate the distance of raster to the nearest polygon?

I have an empty raster file (r1, Rasterlayer) and I want to calculate for each of the non-NA cells, the euclidian distance to the nearest polygons (S1, SpatialPolygonsDataFrame). Both r1 and S1 are projected in utm. Is there any R package that can do that?
You can first rasterize, and then use distance.
library(raster)
p1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60), c(-180,-20))
pols <- spPolygons(p1)
r <- raster(ncol=90, nrow=45)
r <- rasterize(pols, r)
d <- distance(r)
(The strange looking pattern is because the raster has lon/lat coordinates that wrap around the date line)
You can use distance function from library(raster), this function returns the distance, for all cells that are NA, to the nearest cell that is not NA:
install.packages('raster')
library(raster)
distance(r1, S1, filename = 'raster_distance', doEdge= TRUE)
this is something i would like to find in a one off function also - similar to raster::distance but telling you which polygon each cell is closest to, via an ID field or some such.
Annyhoo, for now;
# make a dummy raster with some dummy data.
r <- raster(xmn = 50, xmx = 80, ymn = -7, ymx = 10, res=0.5)
r[] <- sample(1:10,ncell(r), replace=T)
# make some dummy polygons that you want to know the min distance to on your raster
p1 <- rbind(c(55,-5), c(65,-5), c(65, 0), c(55,0), c(55,-5))
p2 <- rbind(c(70,-5), c(75,-5), c(75, 2), c(70,2), c(70,-5))
p3 <- rbind(c(55,4), c(75,4), c(75, 7), c(55,7), c(55,4))
pols <- st_as_sf(spPolygons(p1,p2,p3))
pols$ID <- 1:3
# let's look
plot(r)
plot(pols$geometry,add=T)
# to make a raster layer of min euclidean distance to a polygon (any),
# loop through the sf ID's and `fasterize` (v. quick) and run the
# raster::distance function to find out the distance from itself
st <- stack()
for(i in pols$ID){
r_pol <- fasterize(pols[pols$ID==i,], r, field="ID")
rd <- distance(r_pol)
st <- stack(st,rd)
}
# To produce a simple min distance from any polygon, just select the min from the stack
r_min_dis_any_pol <- min(st)
plot(r_min_dis_any_pol)
# to go a bit further and let you know to which polygon the min distance refers to
r_min_which_pol <- which.min(st)
plot(r_min_which_pol)
# if you were interested in a zonal sum of the raster data, dependent on its nearest polygon
zonal_sum <- zonal(r, r_min_which_pol, fun="sum")
identical(sum(zonal_sum[,2]), cellStats(r,sum))
[1] TRUE

Calculate majority using circular moving window on categorical data in R

I'm trying to calculate the majority value in a categorical raster data set in R, for example using land cover data. This would be similar to the focal statistics tool in ArcGIS using the majority statistic. I am able to calculate the majority land cover type using a rectangular moving window and the modal function:
library(raster)
# create data
r <- raster(nrows = 120, ncol = 120, xmn=0)
r[] <- sample(3, ncell(r), replace=TRUE)
a<-focal(r, w=matrix(1,3,3), fun=modal) # 3x3 moving window
plot(a)
However, when I apply a focal weight to define my circular moving window, the output values appear to be treated as continuous numbers and not discrete values, even if I convert the raster to a factor:
#convert to a factor factor
r.f<-as.factor(r)
#set up window
fw <- focalWeight(r.f, 4.5, type='circle')
#apply focal fxn
r.f.focal<-focal(r.f, w=fw, fun=modal, na.rm=TRUE)
It appears zeros might be added into the analysis and that is creating the problem. Can anyone steer me in the right direction?
I think it will be fixed if d (which is now 4.5) is an integer. Your defined radius should be based on cell counts (1,2,3,...,n). However, still, the output will be in floating format as each xij in the kernel is a floating point with sum of 1. To achieve an integer output there is also a third option.
library(raster)
set.seed(070319)
# create data
r <- raster(nrows = 120, ncol = 120, xmn=0)
r[] <- sample(3, ncell(r), replace=TRUE)
a<-focal(r, w=matrix(1,3,3), fun=modal) # 3x3 moving window
par(mfrow=c(1,2))
plot(r)
plot(a)
#set up window
fw <- focalWeight(r, 4, type='circle')
#apply focal fxn
r.f.focal<-focal(r, w=fw, fun=modal, na.rm=TRUE)
par(mfrow=c(1,2))
plot(r)
plot(r.f.focal)
for integer output you can also do this:
#set up window
fw <- ceiling(focalWeight(r, 4, type='circle'))#for integer output
#apply focal fxn
r.f.focal<-focal(r, w=fw, fun=modal, na.rm=TRUE)
par(mfrow=c(1,2), oma=c(0,0,0,1))
plot(r)
plot(r.f.focal)

Minimum distance between two sets of co-ordinates

I have two data frames of different sizes containing geocodes. The first (df) has 12,000 observations and the second (schools) 3,000.
The first contains geocodes for properties in a country and the second for schools in the country.
I want to find the distance of the nearest school for each property. Using the geosphere package I'm currently working with the following:
library(geosphere)
for(i in 1:length(df$longitude)){
df$dist2[i] <- distm(c(schools[1, 3], schools[1, 2]), c(df$longitude[i], df$latitude[i]), fun = distHaversine) *0.001
}
where schools[, 3] and schools[, 2] are the longitude and latitude columns of that data frame respectively.
The above calculates the distance (in km) between all observations in df and the first school in schools.
I want to calculate the distance between each observation and all schools, saving only the smallest distance as that value for df$dist2[i].
In the following example, I make up the longitude/latitude data on the points and the school.
library(tidyverse)
library(geosphere)
df_points <- data.frame(lon = rnorm(10, mean =4, sd = 0.5), lat = rnorm(10, mean = 50, sd= 0.1))
df_schools <- data.frame(lon = rnorm( 3, mean =4, sd = 0.5), lat = rnorm( 3, mean = 50, sd= 0.1))
distm(df_points, df_schools, fun = distHaversine ) %>%
as.data.frame() %>%
rownames_to_column(var = "point_id") %>%
mutate(point_id = as.numeric(point_id)) %>%
gather(key = school, value = distance, -point_id) %>%
group_by(point_id) %>%
summarise(smalles_distance = min(distance))
Here is an approach using sp class objects. You can coerce data.frame objects to SpatialPointsDataFrame objects using something along the lines of: coordinates(x) <- ~lon+lat The idea here is to derive a distance matrix between the two point feature classes and then pull the distance and ID based on the column name (assigned from the school data). This not only returns the distance but also a unique identifier for each school making it easy to query the actual closest school to any given property feature.
First, add the required libraries and create some example data.
library(sp)
library(raster)
e <- as(raster::extent(-180, 180, -90, 90), "SpatialPolygons")
properties <- spsample(e, 1000, type="random")
proj4string(properties) <- "+proj=longlat +ellps=WGS84"
schools <- spsample(e, 100, type="random")
proj4string(schools) <- "+proj=longlat +ellps=WGS84"
schools$ids <- paste0("school", 1:length(schools))
Now, we can create the distance matrix, assign the diagonal to NA and add a unique identifier from schools to the column names of the matrix.
d <- spDists(x = properties, y = schools, longlat = TRUE)
diag(d) <- NA
colnames(d) <- schools$ids
There are certainly more elegant ways to do this but, for simplicity sake we will use a for loop to populate two vectors representing distance and ID's. We use which.min to pull the index for the minimum distance at row i. The iterator is based on the matrix rows because they represent the property features.
sdist <- rep(NA, nrow(d))
sid <- rep(NA, nrow(d))
for(i in 1:nrow(d)) {
srow <- d[i,]
sdist[i] <- srow[which.min(srow)]
sid[i] <- names(srow)[which.min(srow)]
}
We can then assign the resulting vectors to the properties SpatialPointsDataFrame. Now we have columns in the #data slot data.frame that represent distance to nearest school as well as the school ID's.
properties$school <- sid
properties$dist <- sdist
Here we can plot the results.
par(mfrow=c(2,1))
plot(properties, pch=19, cex=0.5)
plot(schools, pch=19, col="red", add=TRUE)
plot(e, add=TRUE)
title("random properties (black) and schools (red)", cex=0.5)
plot(properties, col="white")
plot(properties[1,], pch=19, cex=2, add=TRUE)
plot(schools[which(schools$ids %in% properties[1,]$school),],
pch=19, cex=2, col="red", add=TRUE)
plot(e, add=TRUE)
title("Property 1 (black) and closest school (red)", cex=0.5)
sidx <- which(schools$ids %in% properties[1,]$school)
text(coordinates(schools[sidx,]),
label = schools[sidx,]$ids, col="blue", cex=1)

R: Bilinear interpolation to fill gaps in R

I have a grid that contains gaps (NAs) that I want to fill using interpolation. My grid shows autocorrelation in the x and y dimensions, so I would like to try bilinear interpolation. Most of the solutions I have found are focused on 'upsampling' (interpolation for the purpose of increasing number of samples/size of grid), but I do not want/need to change the grid size. I just want to fill NAs using interpolation. Other potential solutions do not seem to handle NAs for the input grid of values (the 'z matrix'), or are neighborhood-based solutions rather than bilinear interpoloation, or simply have no answer.
I found that with the raster package, I can input a grid (as a raster) that contains NAs, and use the 'resample' command to output a grid of the same size. However, the results look like nearest neighbor interpolation rather than bilinear interpolation.
Am I missing something such that there is a way to do bilinear interpolation with the raster package? Or is there a better way to do bilinear interpolation simply to fill NAs?
library(raster)
# raster containing gap
r <- raster(nrow=10, ncol=10)
r[] <- 1:ncell(r)
r[25] <- NA
# The s raster is the same size as the r raster
s <- raster(nrow=10, ncol=10)
s <- resample(r, s, method='bilinear')
plot(r)
plot(s)
s[25]
s[35]
# s[25] appears to have been filled with neighbor s[35]
UPDATE
The Akima package seems like a promising alternative to the raster approach above, but I'm having trouble if there are NAs in the input grid of values (the Z matrix). Here's an example parallel to the example above to demonstrate. (Again, I'm interpolating to a grid the same size as the original).
library(akima)
# Use bilinear interpolation (no NAs in input)
rmat<-matrix(seq(1,100,1), nrow = 10, ncol = 10, byrow = T)
x <- seq(1,10,1)
y <- seq(1,10,1)
smat <- bilinear.grid(x, y, rmat, nx = 10, ny = 10) # works
plot(raster(rmat), main = "original")
plot(raster(smat$z), main = "interpolated")
# Try using bilinear interpolation but with an NA
rmat<-matrix(seq(1,100,1), nrow = 10, ncol = 10, byrow = T)
rmat[3,5] <- NA
x <- seq(1,10,1)
y <- seq(1,10,1)
smat <- bilinear.grid(x, y, rmat, nx = 10, ny = 10) # Error about NAs
UPDATE2
There was a great question from #Robert Hijmans about why not use a moving window average with the focal() command in the raster package. The reason is that I want to try bilinear interpolation, and I don't think a moving window average always gives the same answer as bilinear interpolation. However, this was not clear in the example I posted (in that example moving window and bilinear interp do give the same answer), so I'll demonstrate in a new example below. Note that the bilinear interpolation solution should be 8 for the example below (here is a handy calculator for tests).
library(raster)
r <- raster(nrow=10, ncol=10)
# Different grid values than earlier examples
values(r) <- c(rep(1:5, 4), rep(4:8, 4), rep(1:5, 4), rep(4:8, 4), rep(1:5, 4))
r[25] <- NA
plot(r)
# See what the mean of the moving window produces
f <- focal(r, w=matrix(1,nrow=3, ncol=3), fun=mean, NAonly=TRUE, na.rm=TRUE)
f[25] # Moving window gives 5 but bilinear interp gives 8
# Note that this seems to be how the moving window works with equal weights
window_test <- c(r[14:16], r[24:26], r[34:36])
mean(window_test, na.rm = T)
Am I missing something here? Maybe there is something clever with the weights argument of focal() that can produce a bilinear interpolation solution?
Let's use equal distance cells to avoid differences because of cell size variation with lon/lat data
library(raster)
r <- raster(nrow=10, ncol=10, crs='+proj=utm +zone=1 +datum=WGS84', xmn=0, xmx=1, ymn=0, ymx=1)
For this example, you might use focal
values(r) <- 1:ncell(r)
r[25] <- NA
f <- focal(r, w=matrix(1,nrow=3, ncol=3), fun=mean, NAonly=TRUE, na.rm=TRUE)
I see that you dismiss "neighborhood-based solutions rather than bilinear interpoloation". But the question is why. In this case, you may want a neighborhood-based solution.
Update. Then again, in case of cells that are not approximately square, bilinear would be preferable.
values(r) <- c(rep(1:5, 4), rep(4:8, 4), rep(1:5, 4), rep(4:8, 4), rep(1:5, 4))
r[25] <- NA
The problem with bilinear interpolation normally uses 4 contiguous cells, but in this case, where you want the value for the center of a cell, the appropriate cell would be the value of the cell itself, because the distance to that cell is zero, and thus that is where the interpolation ends up. For example, for cell 23
extract(r, xyFromCell(r, 23))
#6
extract(r, xyFromCell(r, 23), method='bilinear')
#[1] 6
In this case the focal cell is NA, so you get the average of the focal cell and 3 more cells. The question is which three? It is arbitrary, but to make it work, the NA cell must get a value. The raster algorithm assigns the value below the NA cell to that cell (also 8 here). This works well, I think, to deal with NA values at edges (e.g. land/ocean), but perhaps not in this case.
`
extract(r, xyFromCell(r, 25))
#NA
extract(r, xyFromCell(r, 25), method='bilinear')
#[1] 8
That is also what resample gives
resample(r, r)[25]
# 8
Is this what the on-line calculator suggests too?
This is very sensitive to small changes
extract(r, xyFromCell(r, 25)+0.0001, method='bilinear')
#[1] 4.998997
What I would really want in this case is the mean of the rook-neighbors
mean(r[adjacent(r, 25, pairs=FALSE)])
[1] 6
Or, more generally, the local inverse distance weighted average. You can compute
that by setting up a weights matrix with focal
# compute weights matrix
a <- sort(adjacent(r, 25, 8, pairs=F, include=TRUE))
axy <- xyFromCell(r, a)
d <- pointDistance(axy, xyFromCell(r, 25), lonlat=F)
w <- matrix(d, 3, 3)
w[2,2] <- 0
w <- w / sum(w)
# A simpler approach could be:
# w <- matrix(c(0,.25,0,.25,0,.25,0,.25,0), 3, 3)
foc <- focal(r, w, na.rm=TRUE, NAonly=TRUE)
foc[25]
In this example this is fine; but it would not be correct if there were multiple NA values in the focal area (as the sum of weights would no longer be 1). We can correct for that by computing the sum of weights
x <- as.integer(r/r)
sum_weights <- focal(x, w, na.rm=TRUE, NAonly=TRUE)
fw <- foc/sum_weights
done <- cover(r, fw)
done[25]

Retreiving output raster comparing two rater layer in R

I have two raster layer of dimension (7801, 7651). I want to compare each pixel of one raster layer with the other and create a new raster which has the minimum pixel value among the initial two raster. That is, if any i,j pixel of raster 1 has value 25 and same i,j pixel of raster 2 has value 20, thus in the output raster the i,j pixel should be 20.
You can just use min with two raster layers.
Let's start with a reproducible example:
library(raster)
r1 <- raster(ncol = 5, nrow = 5)
r1[] <- 1:ncell(r1)
plot(r1)
r2 <- raster(ncol = 5, nrow = 5)
r2[] <- ncell(r2):1
par(mfrow = c(1,3))
plot(r1)
plot(r2)
Now we calculate the min of each overlapping cell within the two raster layers very easily with the implemented cell statistics:
r3 <- min(r2, r1)
plot(r3)
Furthermore, you can also apply statistics like mean, max, etc.
If the implemented statistics somehow fail, or you want to use your own statistics, you can also directly access the data per pixel. That is, you first copy one of the raster layers.
r3 <- r1
Afterwards, you can apply a function over the values.
r3[] <- apply(cbind(r1[], r2[]), 1, min)
Using #loki's example, you have three more options to calculate minimum value for both layers:
library(raster)
calc(stack(r1,r2),fun=min,na.rm=T)
stackApply(stack(r1,r2),indices = c(1,1),fun='min',na.rm=T)
overlay(r1,r2,fun=min,na.rm=T)

Resources