How to fill NA gaps by IDW using focal (R raster/terra)?

I have a big raster with some NA cells that need to be filled. I want to fill them by Inverse Distance Weighting (IDW), considering the 9 nearest valid cells. I used the idw function from the gstat package; it works, but it takes ages to complete the task (my original raster has 6,232,186 cells that go into the gstat call, and about 14,000 gaps to fill). As I have to repeat this task with several rasters, I'm looking for a faster solution. Does anyone have a suggestion?
I was thinking about using focal from the raster or terra packages, but to be honest I didn't quite understand how to set up a matrix of weights that gives an IDW-like result. I would also like to use the nearest valid cells, so if focal finds no valid cells within its square window, it should look further away until it finds enough.
Just to give an example, suppose that in the following raster I need to fill cells 310 and 330:
library(raster)
r <- raster(nrow = 20, ncol = 20)
r[1:300] <- sample(1:4, size = 300, replace = TRUE)
plot(r)
gaps <- xyFromCell(r, c(310, 330))
points(gaps)
Using focal with a 3x3 square I would only get the mean for cell 310 (and without inverse-distance weighting, and without being guaranteed 9 valid cells):
filled <- raster::focal(r, matrix(1, nrow = 3, ncol = 3), fun = mean, NAonly = TRUE, na.rm = TRUE)
plot(filled); points(gaps)
I appreciate any help/suggestion!

One approach would be to use a while loop to increase the window/matrix of the focal function until all NA cells are filled.
With terra it would be like this:
library(terra)
r <- rast(nrow = 20, ncol = 20)
r[1:300] <- sample(1:4, size = 300, replace = TRUE)
gaps <- xyFromCell(r, c(310, 330))
w <- 1
filled <- r # just in case you want to keep the original
to_fill <- any(is.na(values(filled)))
# for big rasters you could use (same inside loop)
# to_fill <- global(filled, function(x) any(is.na(x)))[, 1]
while (to_fill) {
  w <- w + 2
  filled <- focal(filled, w = w, fun = mean, na.policy = "only", na.rm = TRUE)
  to_fill <- any(is.na(values(filled)))
}
plot(filled)
points(gaps)
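The loop above fills gaps with a plain (unweighted) mean. If you specifically want inverse-distance weights, one option (a sketch, not the answer's method) is to build the weight matrix yourself and run two focal calls, one for the weighted sum and one for the sum of weights over valid cells, so NA neighbours are handled correctly. This relies on focal computing a weighted sum when w is a weights matrix; idw_fill, the window size and the power p are illustrative choices.
library(terra)
idw_fill <- function(x, size = 3, p = 2) {
  offs <- (1:size) - (size + 1) / 2              # cell offsets from the window centre
  d <- sqrt(outer(offs^2, offs^2, "+"))          # distances in cell units
  w <- ifelse(d == 0, 0, 1 / d^p)                # inverse-distance weights, centre cell excluded
  x0 <- ifel(is.na(x), 0, x)                     # values with NA replaced by 0
  v  <- ifel(is.na(x), 0, 1)                     # validity mask (1 = valid cell)
  num <- focal(x0, w = w, fun = "sum", fillvalue = 0)  # sum of weight * value over valid cells
  den <- focal(v,  w = w, fun = "sum", fillvalue = 0)  # sum of weights over valid cells
  cover(x, num / den)                            # keep original values, fill only the NA cells
}
filled_idw <- idw_fill(r, size = 3, p = 2)
# cells with no valid neighbour inside the window stay NaN; enlarge `size`
# (or wrap this in a loop like the one above) to reach further out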

Related

A function in R for plotting locations with consistent high values in multiple raster data

I have four raster files of the same extent. The pattern of low and high values differs in each raster. I would like to plot the areas in the extent (boundary) with values greater than x (where x is a number). Can anyone help me with an R function to do this? Please find below sample code for the raster data. In this example, say I want to plot and identify cells with values greater than 0.4 in all four rasters. Instead of four separate images I want one image that shows the cells with values greater than 0.4 in all of them, i.e. overlaying the rasters and identifying the cells that exceed 0.4 in every image.
library(raster)
r1 <- raster(nrows = 1, ncols = 1, res = 0.5, xmn = -1.5, xmx = 1.5, ymn = -1.5, ymx = 1.5, vals = 0.3)
rr <- lapply(1:4, function(i) setValues(r1,runif(ncell(r1))))
par(mfrow = c(2,2))
plot(rr[[1]])
plot(rr[[2]])
plot(rr[[3]])
plot(rr[[4]])
Thank you!
You can combine raster images with &. First threshold each individual raster:
threshold <- 0.4  # the question's example cutoff
r2 = lapply(rr, `>`, threshold)
Then combine them, retaining only the cells that are greater than the threshold in every raster:
summary = Reduce(`&`, r2)
plot(summary)
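Since the combined raster is 0/1, a small follow-up (not part of the original answer) is to count how many cells pass the threshold in all four rasters:
# number of cells that exceed the threshold in every raster
cellStats(summary, sum)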
Here is a simple alternative that is easy to generalize to a different number of input rasters:
myplot <- function(input, threshold) {
  input <- lapply(input, function(x) x > threshold)
  par(mfrow = c(2, ceiling(length(input) / 2)))
  for (x in input) plot(x)
}
myplot(rr, 0.4)

Simulate matrices of data in R as bands of satellite imagery from scratch

I am trying to:
1) Simulate a matrix of data using R (in effect an image of numbers, where each cell in the matrix holds a value on the 0-255 scale of 8-bit data)
2) Map the simulated data using mapping tools
3) Classify the image into 8-10 classes
The idea is to use a simple function to generate an image with three bands (red, green and blue) simulating multispectral satellite imagery, i.e. a composite of three different matrices.
Then classify the composite by colour into 8 or 10 classes
Any help would be much appreciated.
Based on your comments, here is an approach that samples the values as a gradient.
imagerows <- 100
imagecols <- 100
cuts <- 8
(imagecols * imagerows) %% cuts == 0  # must be TRUE
colorgroups <- as.integer(cut(0:255, cuts))
colors <- c("red", "green", "blue")
result <- lapply(colors, function(y) {
  unlist(
    lapply(seq(1, cuts), function(x) {
      sample((0:255)[colorgroups == x],
             size = (imagerows * imagecols) / cuts,
             replace = TRUE)
    })
  )
})
result is now a list of length 3; each element is a vector of 100 * 100 random values between 0 and 255, drawn in cuts increasing groups (which becomes a gradient once shaped into a matrix).
We can then control the direction of the gradient with the byrow argument of matrix() and by applying rev() to the data.
red.matrix <- matrix((result[[1]]),nrow=imagerows,ncol=imagecols,byrow = TRUE)
green.matrix <- matrix((result[[2]]),nrow=imagerows,ncol=imagecols,byrow = FALSE)
blue.matrix <- matrix(rev(result[[3]]),nrow=imagerows,ncol=imagecols,byrow = FALSE)
Then we put the colors together with rgb() which outputs a vector. We can coerce it back into a matrix by assigning dimensions. Then just plot with grid.raster().
library(grid)
rgb.matrix <- rgb(red.matrix,green.matrix,blue.matrix,maxColorValue = 255)
dim(rgb.matrix) <- c(imagerows,imagecols)
grid.newpage()
grid.raster(rgb.matrix,interpolate = FALSE)
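The question's third step (classifying the composite into 8-10 classes) is not covered above. A minimal sketch, assuming an unsupervised k-means clustering of the three simulated bands is acceptable (the class count of 8 and the object names below are illustrative):
# illustrative only: cluster the simulated pixels on their red, green and blue values
rgb.df <- data.frame(red = as.vector(red.matrix),
                     green = as.vector(green.matrix),
                     blue = as.vector(blue.matrix))
set.seed(1)
km <- kmeans(rgb.df, centers = 8)
class.matrix <- matrix(km$cluster, nrow = imagerows, ncol = imagecols)
image(class.matrix, col = rainbow(8), axes = FALSE)  # map of the 8 classes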

Calculate raster with each cell equals the mean of all adjacent cells

I am working on an ecological problem, involving species distribution models. I have a raster which is essentially a landscape of probabilities of presence per cell, so to speak. I want to calculate a new raster, based on the old one, where each cell is equal to the mean of itself and all 8 adjacent cells. This is not the same as aggregating the cells by mean, which results in the border between the newly aggregated cells being calculated incorrectly.
I can do this with the bit of code provided, but the raster I am working with is way, way too big to run this calculation, as it uses too much memory. If I subdivide the raster, it will still take days to do. Does anyone have a more efficient way of calculating this? I have created a small version of the raster as an example, albeit somewhat clumsily:
require(raster)
## create raster called "ras" rather clumsily
# (UTM coordinates and a probability value for each cell, not really
# important)
s.x = seq(249990, by = 30, length.out = 20)
s.y = seq(6189390, by = 30, length.out = 20)
x.l = lapply(1:20, function(x){
rep(s.x[x], 20)
})
x.l2 = as.vector(c(x.l[[1]], x.l[[2]], x.l[[3]], x.l[[4]], x.l[[5]],
x.l[[6]], x.l[[7]], x.l[[8]], x.l[[9]], x.l[[10]],
x.l[[11]], x.l[[12]], x.l[[13]], x.l[[14]], x.l[[15]],
x.l[[16]],x.l[[17]], x.l[[18]], x.l[[19]], x.l[[20]]))
df = as.data.frame(cbind(x.l2, rep(s.y, 20), rnorm(20*20, 0.5, 0.2)))
colnames(df) = c("x", "y", "P")
coordinates(df) <- ~ x + y
gridded(df) <- TRUE
ras = raster(df)
# for each cell, make a vector of the values at
# the cell and all <=8 adjacent cells:
vl = lapply(1:length(ras), function(x){
extract(ras,
(c(x,(adjacent(ras, x, directions=8, pairs=F, sorted=F)))))
})
# find the mean for each cell
vm = sapply(1:length(ras), function(x){
as.vector(mean(vl[[x]], na.rm = T))
})
# create raster template
templ = ras/ras
# multiply into template for new raster
ras = vm*templ
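Not part of the original post, but for reference: the same neighbourhood mean can be computed much more efficiently (and with far less memory) with focal(), which works through large rasters in chunks and can write the result straight to disk. A minimal sketch with the 3x3 window (the cell plus its up-to-8 neighbours) described above:
library(raster)
# mean of each cell and its (up to) 8 adjacent cells, NA-aware at the edges
w <- matrix(1, nrow = 3, ncol = 3)
ras_mean <- focal(ras, w = w, fun = mean, na.rm = TRUE, pad = TRUE, padValue = NA)
# for very large rasters, write directly to disk (the filename is illustrative):
# ras_mean <- focal(ras, w = w, fun = mean, na.rm = TRUE, pad = TRUE,
#                   padValue = NA, filename = "ras_mean.tif")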

Calculate all distances between two set of points using st_distance

I have two sets of points stored in R as sf objects. Point object x contains 204,467 points and point object y contains 5,297 points.
In theory, I would want to calculate the distance from all points in x to all points in y. I understand that this would create a beast of a matrix, but it is doable using st_distance(x, y, by_element=FALSE) in the sf package in about 40 minutes on my i7 desktop.
What I want to do is calculate the distance from all of the points in x to all of the points in y, and then convert the result into a data.frame that contains all variables for the respective pair of x and y points. This gives me flexibility for aggregation with dplyr; for instance, I want to find the number of points in y that are within 10, 50, or 100 km of each point in x and where x$year < y$year.
I successfully created the distance matrix, which has around 1,083,061,699 cells. I know this is a very inefficient way of doing this, but it gives flexibility in terms of aggregation. Other suggestions are welcome.
Below is code to create two sf point objects, and measure the distance between them. Next, I would want to convert this into a data.frame with all variables from x and y, but this is where I fail to proceed.
If my suggested workflow is unfeasible, can someone provide an alternative solution to measure distance to all points within a predefined radius, and create a data.frame of the result with all variables from x and y?
# Create two sf point objects
set.seed(123)
library(sf)
pts1 <- st_as_sf(x = data.frame(id=seq(1,204467,1),
year=sample(seq(from = 1990, to = 2018, by = 1), size = 204467, replace = TRUE),
xcoord=sample(seq(from = -180, to = 180, by = 1), size = 204467, replace = TRUE),
ycoord=sample(seq(from = -90, to = 90, by = 1), size = 204467, replace = TRUE)),
coords=c("xcoord","ycoord"),crs=4326)
pts2 <- st_as_sf(x = data.frame(id=seq(1,5297,1),
year=sample(seq(from = 1990, to = 2018, by = 1), size = 5297, replace = TRUE),
xcoord=sample(seq(from = -180, to = 180, by = 1), size = 5297, replace = TRUE),
ycoord=sample(seq(from = -90, to = 90, by = 1), size = 5297, replace = TRUE)),
coords=c("xcoord","ycoord"),crs=4326)
distmat <- st_distance(pts1,pts2,by_element = FALSE)
I would consider approaching this differently. Once you have your distmat matrix, you can do the types of calculation you describe without needing a data.frame. You can use standard subsetting to find which points meet your specified criteria.
For example, to find the combinations of points where pts1$year is greater than pts2$year we can do:
subset_points = outer(pts1$year, pts2$year, `>`)
Then, to find how many of these pairs are also separated by more than 100 km, we can do:
library(units)
sum(distmat[subset_points] > (100 * as_units("km")))
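As a follow-up (not in the original answer), the same matrices also give the per-point aggregation the question asks about, e.g. how many pts2 points lie within 50 km of each pts1 point while also satisfying the year condition encoded in subset_points; note this still keeps the full matrices in memory, which is what the note below addresses.
# logical matrix: TRUE where a pair of points is closer than 50 km
within_50km <- distmat < (50 * as_units("km"))
# for each point in pts1, count the pts2 points meeting both conditions
counts <- rowSums(within_50km & subset_points)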
A note on memory usage
However you approach this with sf or data.frame objects, the chances are that you will start to bump up against RAM limits with 1e9 floating points in each matrix or column of a data.table. You might think about instead converting your distance matrix to a raster. Then the raster can be stored on disk rather than in memory, and you can utilise the memory-safe functions in the raster package to crunch your way through.
How we might use rasters to work from disk and save RAM
We can use memory-safe raster operations for the very large matrices like this, for example:
library(raster)
# convert our matrices to rasters, so we can work on them from disk
r = raster(matrix(as.numeric(distmat), length(pts1$id), length(pts2$id)))
s = raster(subset_points)
remove('distmat', 'subset_points')
# now create a raster equal to r, but with zeroes in the cells we wish to exclude from calculation
rs = overlay(r,s,fun=function(x,y){x*y}, filename='out1.tif')
# find which cells have value greater than x (1e6 in the example)
Big_cells = reclassify(rs, matrix(c(-Inf, 1e6, 0, 1e6, Inf, 1), ncol=3, byrow=TRUE), 'out.tiff', overwrite=T)
# and finally count the cells
N = cellStats(Big_cells, sum)

Spatial correlogram using the raster package

Dear Crowd
Problem
I tried to calculate a spatial correlogram with the packages ncf, pgirmess, SpatialPack and spdep. However, I had trouble defining the start and end points of the distance bins: I'm only interested in the spatial autocorrelation at smaller distances, but there with finer bins. Additionally, as the raster is quite large (1.8 megapixels), I ran into memory trouble with all of these packages except SpatialPack.
So I tried to write my own code, using the function Moran from the raster package. But I must have made some error, as the result for the complete dataset differs somewhat from the one produced by the other packages. If there is no error in my code, it might at least help others with similar problems.
Question
I'm not sure whether my focal matrix is erroneous. Could you please tell me whether the central pixel needs to be included? Using the test data I can't show the differences between the methods, but on my complete dataset there are visible differences, as shown in the image below. However, the bins are not exactly the same (50 m vs. 69 m), so this might explain part of the differences, though at the first bin this explanation does not seem plausible to me. Or might the irregular shape of my raster, and different ways of handling NAs, cause the difference?
(Image: comparison of the own method with the one from SpatialPack)
Runnable Example
Testdata
The code for calculating the testdata is taken from http://www.petrkeil.com/?p=1050#comment-416317
# packages used for the data generation
library(raster)
library(vegan) # will be used for PCNM
# empty matrix and spatial coordinates of its cells
side=30
my.mat <- matrix(NA, nrow=side, ncol=side)
x.coord <- rep(1:side, each=side)*5
y.coord <- rep(1:side, times=side)*5
xy <- data.frame(x.coord, y.coord)
# all pairwise euclidean distances between the cells
xy.dist <- dist(xy)
# PCNM axes of the dist. matrix (from 'vegan' package)
pcnm.axes <- pcnm(xy.dist)$vectors
# using the 8th PCNM axis as my artificial z variable
z.value <- pcnm.axes[,8]*200 + rnorm(side*side, 0, 1)
# plotting the artificial spatial data
r <- rasterFromXYZ(xyz = cbind(xy,z.value))
plot(r, axes=F)
Own Code
library(raster)
sp.Corr <- matrix(nrow = 0, ncol = 2)
formerBreak <- 0  # needed for the first run
for (i in seq(10, 200, 10)) {  # calculate Moran's I for these bins
  cat(paste0("..", i))  # print the bin that is currently being calculated
  w <- focalWeight(r, d = i, type = 'circle')
  wTemp <- w  # temporarily save the weight matrix
  if (formerBreak > 0) {  # skip this on the first run
    midpoint <- ceiling(ncol(w) / 2)  # get the midpoint
    idx <- (midpoint - formerBreak):(midpoint + formerBreak)
    # set the focal weights already covered by the previous (smaller) window to 0
    w[idx, idx] <- w[idx, idx] * (wOld == 0)
    w <- w * (1 / sum(w))  # normalize so the weights sum to 1
  }
  wOld <- wTemp  # save this weight matrix for the next run
  mor <- Moran(r, w = w)
  sp.Corr <- rbind(sp.Corr, c(Moran = mor, Distance = i))
  formerBreak <- i / res(r)[1]  # divide the break by the raster resolution to translate it into focal-window cells
}
plot(x = sp.Corr[, 2], y = sp.Corr[, 1], type = "l",
     ylab = "Moran's I", xlab = "Upper bound of distance")
Other methods to calculate the Spatial Correlogram
library(SpatialPack)
sp.Corr <- summary(modified.ttest(z.value, z.value, coords = xy, nclass = 21))
plot(x = sp.Corr$coef[, 1], y = sp.Corr$coef[, 4], type = "l", ylab = "Moran's I", xlab = "Upper bound of distance")
library(ncf)
ncf.cor <- correlog(x.coord, y.coord, z.value,increment=10, resamp=1)
plot(ncf.cor)
In order to compare the correlogram results in your case, two things should be considered: (i) your code only works for bins proportional to the resolution of your raster, and a small difference in the bins can include or exclude a considerable number of pairs; (ii) the irregular shape of the raster has a strong impact on which pairs are considered when computing the correlation for a given distance interval. So your code should deal with both: allow any value for the bin length and take the irregular shape of the raster into account. A small modification of your code to tackle those problems is below.
# SpatialPack correlation
library(SpatialPack)
test <- modified.ttest(z.value,z.value,coords = xy,nclass = 21)
# Own correlation
bins <- test$upper.bounds
library(raster)
sp.Corr <- matrix(nrow = 0,ncol = 2)
for (i in bins) {
cat(paste0("..",i)) #print the bin, which is currently calculated
w = focalWeight(r,d = i,type = 'circle')
wTemp <- w # temporarily save the weight matrix
if (i > bins[1]) {
midpoint <- ceiling(dim(w)/2) # get the midpoint
half_range <- floor(dim(wOld)/2)
w[(midpoint[1] - half_range[1]):(midpoint[1] + half_range[1]),
(midpoint[2] - half_range[2]):(midpoint[2] + half_range[2])] <-
w[(midpoint[1] - half_range[1]):(midpoint[1] + half_range[1]),
(midpoint[2] - half_range[2]):(midpoint[2] + half_range[2])]*(wOld==0)
w <- w * (1/sum(w)) #normalizes the vector to sum the weights to 1
}
wOld <- wTemp #save this weight matrix for the next run
mor <- Moran(r,w=w)
sp.Corr <- rbind(sp.Corr,c(Moran =mor,Distance = i))
}
# Comparing
plot(x=test$upper.bounds, test$imoran[,1], col = 2,type = "b",ylab = "Moran's I",xlab="Upper bound of distance", lwd = 2)
lines(x=sp.Corr[,2],y = sp.Corr[,1], col = 3)
points(x=sp.Corr[,2],y = sp.Corr[,1], col = 3)
legend('topright', legend = c('SpatialPack', 'Own code'), col = 2:3, lty = 1, lwd = 2:1)
The image shows that the results of the SpatialPack package and of the (modified) own code are the same.
