Avoid a for loop in raster::extract(rst, shp) in R

I am working in R to extract the mean and maximum value of a raster within a 3 meter buffer around some buildings.
For this, I have created a for loop that iterates through each building to extract these two values. My current code looks as follows:
for (b in c(1:nrow(buildings_shp))){
  building <- buildings_shp[b, ]
  buffered <- st_buffer(building, 3)
  raster_cropped <- crop(raster, extent(buffered))
  mean <- extract(raster_cropped, buffered, fun = mean, na.rm = TRUE)
  max <- extract(raster_cropped, buffered, fun = max, na.rm = TRUE)
  buildings_shp[b, "mean"] <- mean
  buildings_shp[b, "max"] <- max
}
This loop, however, takes a considerable amount of time (~17 minutes for 1500 buildings), and the steps that seem to take the most time are the two extract lines. I would like to know if there are ways to speed up this process by:
a) avoiding the use of a loop - the reason for the loop is that I fear that if I apply st_buffer to the entire dataset, buildings closer than 3 meters to each other would generate overlapping geometries, which may cause an error.
b) parallelizing the for loop (I have tried the raster clustering feature, but it did not speed up the process, probably because it parallelized the extract function rather than the loop itself)
c) using a function other than raster::extract. I have seen some posts recommending the velox package, but it seems that this package has been removed from CRAN.
Some dummy data (copied from the referenced question above)
library(raster)
library(sf)
library(truncnorm)
raster <- raster(ncol=1000, nrow=1000, xmn=2001476, xmx=11519096, ymn=9087279, ymx=17080719)
raster[] <- rtruncnorm(n=ncell(raster), a=0, b=10, mean=5, sd=2)
crs(raster) <- "+proj=utm +zone=51 +ellps=WGS84"
x1 <- runif(100, 2001476, 11519096)
y1 <- runif(100, 9087279, 17080719)
buildings_shp <- st_buffer(st_sfc(st_point(c(x1[1], y1[1]), dim="XY"), crs=32651), 200000)

You do not need a loop. Example data from ?raster::extract
library(raster)
r <- raster(ncol=36, nrow=18, vals=1:(18*36))
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
buildings <- spPolygons(cds1, cds2)
Get the buffers and extract. As you want to compute two statistics, it is easier to not use a summarizing function in this case.
b <- buffer(buildings, width=3, dissolve=FALSE)
e <- extract(r, b)
And now compute the statistics
sapply(e, mean, na.rm=TRUE)
#[1] 379.4167 330.0741
sapply(e, max, na.rm=TRUE)
#[1] 507 498
This should be faster with terra
library(terra)
v <- vect(b)
x <- rast(r)
ee <- extract(x, v)
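To get the mean and maximum from the terra result as well, a minimal sketch (assuming the x, v and ee objects created above; not part of the original answer):
emean <- extract(x, v, fun = mean, na.rm = TRUE)
emax <- extract(x, v, fun = max, na.rm = TRUE)
# or summarize the raw extraction by polygon ID
tapply(ee[, 2], ee$ID, mean, na.rm = TRUE)
tapply(ee[, 2], ee$ID, max, na.rm = TRUE)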


How can I perform neighborhood analysis in terra or raster and keep the same NA cells of the input?

I want to perform a neighborhood analysis in R to smooth the layer I have, while keeping all the NAs of the input raster intact.
However, when I do, for instance, the following, the calculation "propagates" over the NA values, which is undesirable behavior in my case.
library(terra)
library(dplyr)
# load example raster in metric system
f <- system.file("ex/elev.tif", package="terra")
r <- rast(f) %>%
terra::project("EPSG:32631")
# focal
neigh <- terra::focal(r, w = 7, fun = "mean")
# plot
plot(c(r, neigh))
Update:
Following the suggestion made by @dww below, I could use terra::mask. A way to deal with this, then, would be:
# focal
neigh <- terra::focal(r, w = 7, fun = "mean") %>%
terra::mask(mask = r)
# plot
plot(c(r, neigh))
Is there another way to avoid the propagation of values into the NA cells within focal?
(Here I use a simple example of a square filter to calculate the mean, but I am looking for something that would be useful for all types of filters, e.g. any matrix defined by terra::focalMat().)
Should I deal with that when defining the weight matrix?
With terra the focal method has an argument na.policy that can be set to one of "all", "only" or "omit".
library(terra)
#terra 1.5.6
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
r[45:50, 45:50] <- NA
f1 <- focal(r, 7, "mean", na.policy="omit", na.rm=TRUE)
plot(f1, fun=lines(v))
This is equivalent to, but possibly more efficient than, using focal and mask:
f2 <- focal(r, 7, "mean", na.rm=TRUE) |> mask(r)
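As a quick sanity check (a sketch, not part of the original answer), the two results should agree cell by cell:
# both leave the input NA cells as NA and compute the same mean elsewhere
all(values(f1) == values(f2), na.rm = TRUE)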

Streamlining binary rasterization in R

I have a few very small country-level polygon and point shapefiles that I would like to rasterize in R. The final product should be one global binary raster (indicating whether grid cell center is covered by a polygon / point lies within cell or not). My approach is to loop over the shapefiles and do the following for each shapefile:
# load shapefile
shp = sf::read_sf(shapefile_path)
# create a global raster template with resolution 0.0083
ext = extent(-180.0042, 180.0042, -65.00417, 75.00417)
gridsize = 0.008333333
r = raster(ext, res = gridsize)
# rasterize polygon or point shapefile to raster
rr = rasterize(shp, r, background = 0) #all grid cells that are not covered get 0
# convert to binary raster
values(rr)[values(rr)>0] = 1
Here, rr is the raster file where the polygons / points in shp are coded as 1 and all other grid cells are coded as 0. Afterwards, I take the sum over all rr to arrive at one global binary raster file including all polygons / points.
The final two steps are incredibly slow. In addition, I run into RAM problems when I try to replace all positive values in rr with 1, as the cell count is very large due to the fine resolution. I was wondering whether it is possible to come up with a smarter solution for what I'd like to achieve.
I have already found the fasterize package that has a speedy implementation of rasterize which works fine. I think it would be of great help if someone has a solution where rasterize directly returns a binary raster.
This is how you can do this better with raster. Note the field=1 argument, and also that I changed your specification of the extent, as what you had is probably not correct.
library(raster)
v <- shapefile(shapefile_path)
ext <- extent(-180, 180, -65, 75)
r <- raster(ext, res = 1/120)
rr <- rasterize(v, r, field=1, background=0)
There is no need for your last step, but you could have done
rr <- clamp(rr, 0, 1)
# or
rr <- rr > 0
# or
rr <- reclassify(rr, cbind(1, Inf, 1))
raster::calc is not very efficient for simple arithmetic like this
It should be much faster to rasterize all vector data in one step, rather than in a loop, especially with large rasters like this (for which the program may need to write a temp file for each iteration).
To illustrate this solution with example data
library(raster)
cds1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60))
cds2 <- rbind(c(-10,0), c(140,60), c(160,0), c(140,-55))
cds3 <- rbind(c(-125,0), c(0,60), c(40,5), c(15,-45))
v <- spLines(cds1, cds2, cds3)
r <- raster(ncols=90, nrows=45)
r <- rasterize(v, r, field=1)
To speed things up, you can use terra (the replacement for raster)
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- as.lines(vect(f))
r <- rast(v, ncol=75, nrow=100)
x <- rasterize(v, r, field=1)
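If the goal is one binary layer covering several shapefiles, a minimal sketch of the one-step idea (the repeated example file below stands in for your own list of shapefile paths):
# combine all vector layers into a single SpatVector, then rasterize once
f <- system.file("ex/lux.shp", package="terra")
vv <- do.call(rbind, lapply(c(f, f), vect))
xall <- rasterize(vv, r, field=1, background=0)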
Something that seems to work computationally and significantly improves computation time is to:
1. Create one large shapefile shp instead of working with individual rasterized shapefiles.
2. Use the fasterize package to rasterize the merged shapefile.
3. Use raster::calc to avoid memory problems.
library(raster)
library(fasterize)
ext <- extent(-180.0042, 180.0042, -65.00417, 75.00417)
gridsize <- 0.008333333
r <- raster(ext, res = gridsize)
rr <- fasterize(shp, r, background = 0) # all grid cells that are not covered get 0
# convert to binary raster
fun <- function(x) { x[x > 0] <- 1; return(x) }
r2 <- raster::calc(rr, fun)

How do you temporally interpolate a large RasterStack object to a higher periodicity (weekly to daily)

In R, I am trying to interpolate stacks that were created at a weekly time interval to a daily time interval. The interpolation method can be nearest neighbor or linear interpolation.
I have seen this can be done for time series using na.approx or a spline.
Also, I would like to keep the object as a stack (not a data frame) if possible.
#Dummy example
#---#
library(raster)
# Create date sequence
idx <- seq(as.Date("2000/1/1"), as.Date("2000/12/31"), by = "week")
# Create raster stack and assign dates
r <- raster(ncol=20, nrow=20)
s <- stack(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
s <- setZ(s, idx)
# Do interpolation to daily resolution
# (Perhaps it should be done one by one, perhaps all at once...)
# ...
Say my actual stack has dimensions c(20,20,52), the result would have dimensions c(20,20,366).
Thanks for your help.
You need to write a function f that does this for a vector (a cell), say s[1], and then apply this function using calc, as in calc(s, f).
Here is a simple example that uses approx, which can be replaced by spline or other interpolators.
library(raster)
idx <- seq(as.Date("2000/1/1"), as.Date("2000/12/31"), by = "week")
dr <- seq(as.Date("2000/1/1"), as.Date("2000/12/31"), by = "day")
r <- raster(ncol=20, nrow=20)
s <- stack(lapply(1:length(idx), function(x) setValues(r, runif(ncell(r)))))
f <- function(x) approx(idx, x, dr, rule=2)$y
# test <- f(s[1])
x <- calc(s, f)
Results for one cell
plot(dr, as.vector(x[1]), pch="+")
points(idx, as.vector(s[1]), pch=20, col="red", cex=2)
lines(idx, as.vector(s[1]), col="blue")
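To keep the dates on the interpolated object, as in the original stack, you could assign the daily index afterwards (a small addition, not part of the original answer):
x <- setZ(x, dr)
names(x) <- as.character(dr)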

Transferring a conditional command from the Raster Calculator of ArcGIS to R

I would like to transfer the following code from the Raster Calculator of ArcGIS to R.
Con("diff_canopy" >= 1, "diff_canopy")
This creates a new raster that only contains the data from diff_canopy where diff_canopy is greater than or equal to 1.
To solve this, I followed and adapted the code proposed in this post:
test <- raster (extent(canopy_sjd), nrows=nrow(canopy_sjd), ncols=ncol(canopy_sjd))
test[canopy_sjd[]>=1] <- canopy_sjd[canopy_sjd[] >=1]
The code works fine; however, when I compare the raster obtained with the R code to the raster obtained directly with the ArcGIS calculator, I get different values:
From the ArcGIS calculator: min 1.01598, max 10.0271
From the adapted R code: min 1.01598, max 11.7207
My questions are the following:
1) Does the adapted code match the raster calculator statement?
2) If it matches, why do the max values of the output rasters differ?
3) If it does not match, are there any other suggestions to fix the error?
Always include some example data:
library(raster)
canopy <- raster(nrow=4, ncol=4, xmn=0, xmx=1, ymn=0, ymx=1, crs='+proj=utm +zone=1')
values(canopy) <- 1:ncell(canopy)
canopy <- canopy - 5
Here is a simple solution:
x <- reclassify(canopy, cbind(-Inf, 1, NA), right=FALSE)
An alternative:
y <- mask(canopy, canopy >=1, maskvalue=0)
One more:
z <- calc(canopy, function(i){ i[i<1] <- NA; i})
For small data sets, it is possible to use your solution (but not recommended). I would do it like this:
a <- raster(canopy)
i <- which(values(canopy) >= 1)
a[i] <- canopy[i]
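For completeness, a hedged terra equivalent (not part of the original answer), using ifel on the same example raster:
library(terra)
ca <- rast(canopy)  # convert the RasterLayer to a SpatRaster
w <- ifel(ca >= 1, ca, NA)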

Function for resizing matrices in R

I was wondering if there is a function in the R statistical software that scales down matrices exactly as image resizing does. The imresize() function in MATLAB is exactly what I'm looking for (I believe it takes the average of the surrounding points, but I am not sure of this), and I am wondering if there is an R equivalent.
This question has been posted before on this forum, but with reference to MATLAB, not R:
Matlab "Scale Down" a Vector with Averages
The post starting with "Any reason why you can't use the imresize() function?" is exactly what I am looking for, but in R, not MATLAB.
Say I have a latitude-longitude grid of temperatures around the world, and let's say this is represented by a 64*128 matrix of temperatures. Now let's say I would like to have the same data contained in a new matrix, but I would like to rescale my grid to make it a 71*114 matrix of temperatures around the world. A function that would allow me to do so is what I'm looking for (again, the imresize() function, but in R, not MATLAB)
Thank you.
Steve
One way to do this is by using the function resample(), from the raster package.
I'll first show how you could use it to rescale your grid, and then give an easier-to-inspect example of its application to smaller raster objects
Use resample() to resize matrices
library(raster)
m <- matrix(seq_len(64*128), nrow=64, ncol=128, byrow=TRUE)
## Convert matrix to a raster with geographical coordinates
r <- raster(m)
extent(r) <- extent(c(-180, 180, -90, 90))
## Create a raster with the desired dimensions, and resample into it
s <- raster(nrow=71, ncol=114)
s <- resample(r,s)
## Convert resampled raster back to a matrix
m2 <- as.matrix(s)
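A quick check (sketch) that the resampled matrix has the requested dimensions:
dim(m2)
# [1]  71 114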
Visually confirm that resample() does what you'd like:
library(raster)
## Original data (4x4)
rr <- raster(ncol=4, nrow=4)
rr[] <- 1:16
## Resize to 5x5
ss <- raster(ncol=5, nrow=5)
ss <- resample(rr, ss)
## Resize to 3x3
tt <- raster(ncol=3, nrow=3)
tt <- resample(rr, tt)
## Plot for comparison
par(mfcol=c(2,2))
plot(rr, main="original data")
plot(ss, main="resampled to 5-by-5")
plot(tt, main="resampled to 3-by-3")
The answer posted by Josh O'Brien is OK and it helped me as a starting point, but this approach was too slow since I had a huge list of data. The method below is a good alternative: it uses the fields package and works much faster.
Functions
rescale <- function(x, newrange=range(x)){
  xrange <- range(x)
  mfac <- (newrange[2]-newrange[1])/(xrange[2]-xrange[1])
  newrange[1]+(x-xrange[1])*mfac
}
ResizeMat <- function(mat, ndim=dim(mat)){
  if(!require(fields)) stop("`fields` required.")
  # input object
  odim <- dim(mat)
  obj <- list(x=1:odim[1], y=1:odim[2], z=mat)
  # output object
  ans <- matrix(NA, nrow=ndim[1], ncol=ndim[2])
  ndim <- dim(ans)
  # rescaling
  ncord <- as.matrix(expand.grid(seq_len(ndim[1]), seq_len(ndim[2])))
  loc <- ncord
  loc[,1] <- rescale(ncord[,1], c(1, odim[1]))
  loc[,2] <- rescale(ncord[,2], c(1, odim[2]))
  # interpolation
  ans[ncord] <- interp.surface(obj, loc)
  ans
}
Let's look at how it works:
## Original data (4x4)
rr <- matrix(1:16, ncol=4, nrow=4)
ss <- ResizeMat(rr, c(5,5))
tt <- ResizeMat(rr, c(3,3))
## Plot for comparison
par(mfcol=c(2,2), mar=c(1,1,2,1))
image(rr, main="original data", axes=FALSE)
image(ss, main="resampled to 5-by-5", axes=FALSE)
image(tt, main="resampled to 3-by-3", axes=FALSE)
