I have a grid that contains gaps (NAs) that I want to fill using interpolation. My grid shows autocorrelation in the x and y dimensions, so I would like to try bilinear interpolation. Most of the solutions I have found focus on 'upsampling' (interpolating to increase the number of samples or the size of the grid), but I do not need to change the grid size; I just want to fill the NAs. Other potential solutions do not seem to handle NAs in the input grid of values (the 'z matrix'), are neighborhood-based rather than bilinear interpolation, or simply have no answer.
I found that with the raster package, I can input a grid (as a raster) that contains NAs, and use the 'resample' command to output a grid of the same size. However, the results look like nearest neighbor interpolation rather than bilinear interpolation.
Am I missing something such that there is a way to do bilinear interpolation with the raster package? Or is there a better way to do bilinear interpolation simply to fill NAs?
library(raster)
# raster containing gap
r <- raster(nrow=10, ncol=10)
r[] <- 1:ncell(r)
r[25] <- NA
# The s raster is the same size as the r raster
s <- raster(nrow=10, ncol=10)
s <- resample(r, s, method='bilinear')
plot(r)
plot(s)
s[25]
s[35]
# s[25] appears to have been filled with neighbor s[35]
UPDATE
The Akima package seems like a promising alternative to the raster approach above, but I'm having trouble if there are NAs in the input grid of values (the Z matrix). Here's an example parallel to the example above to demonstrate. (Again, I'm interpolating to a grid the same size as the original).
library(akima)
# Use bilinear interpolation (no NAs in input)
rmat<-matrix(seq(1,100,1), nrow = 10, ncol = 10, byrow = T)
x <- seq(1,10,1)
y <- seq(1,10,1)
smat <- bilinear.grid(x, y, rmat, nx = 10, ny = 10) # works
plot(raster(rmat), main = "original")
plot(raster(smat$z), main = "interpolated")
# Try using bilinear interpolation but with an NA
rmat<-matrix(seq(1,100,1), nrow = 10, ncol = 10, byrow = T)
rmat[3,5] <- NA
x <- seq(1,10,1)
y <- seq(1,10,1)
smat <- bilinear.grid(x, y, rmat, nx = 10, ny = 10) # Error about NAs
UPDATE2
There was a great question from @Robert Hijmans about why I don't use a moving window average with the focal() command in the raster package. The reason is that I want to try bilinear interpolation, and I don't think a moving window average always gives the same answer as bilinear interpolation. This was not clear in the example I posted (there, the moving window and bilinear interpolation do give the same answer), so I'll demonstrate with a new example below. Note that the bilinear interpolation solution should be 8 for the example below (here is a handy calculator for tests).
library(raster)
r <- raster(nrow=10, ncol=10)
# Different grid values than earlier examples
values(r) <- c(rep(1:5, 4), rep(4:8, 4), rep(1:5, 4), rep(4:8, 4), rep(1:5, 4))
r[25] <- NA
plot(r)
# See what the mean of the moving window produces
f <- focal(r, w=matrix(1,nrow=3, ncol=3), fun=mean, NAonly=TRUE, na.rm=TRUE)
f[25] # Moving window gives 5 but bilinear interp gives 8
# Note that this seems to be how the moving window works with equal weights
window_test <- c(r[14:16], r[24:26], r[34:36])
mean(window_test, na.rm = T)
Am I missing something here? Maybe there is something clever with the weights argument of focal() that can produce a bilinear interpolation solution?
Let's use equal distance cells to avoid differences because of cell size variation with lon/lat data
library(raster)
r <- raster(nrow=10, ncol=10, crs='+proj=utm +zone=1 +datum=WGS84', xmn=0, xmx=1, ymn=0, ymx=1)
For this example, you might use focal
values(r) <- 1:ncell(r)
r[25] <- NA
f <- focal(r, w=matrix(1,nrow=3, ncol=3), fun=mean, NAonly=TRUE, na.rm=TRUE)
I see that you dismiss "neighborhood-based solutions rather than bilinear interpolation". But the question is why. In this case, you may want a neighborhood-based solution.
Update. Then again, in case of cells that are not approximately square, bilinear would be preferable.
values(r) <- c(rep(1:5, 4), rep(4:8, 4), rep(1:5, 4), rep(4:8, 4), rep(1:5, 4))
r[25] <- NA
The problem is that bilinear interpolation normally uses 4 contiguous cells, but in this case, where you want the value for the center of a cell, the appropriate value is that of the cell itself: the distance to that cell is zero, so that is where the interpolation ends up. For example, for cell 23
extract(r, xyFromCell(r, 23))
#6
extract(r, xyFromCell(r, 23), method='bilinear')
#[1] 6
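To make the point about zero distance concrete, here is a minimal base-R sketch of the standard bilinear formula on a unit square. The helper name `bilinear` and its arguments are illustrative, not part of raster; the four values are those of cells 23, 24, 33 and 34 in the example above.

```r
# Bilinear interpolation on the unit square spanned by four cell centers.
# q00, q10, q01, q11 are the values at (0,0), (1,0), (0,1), (1,1);
# tx and ty are the fractional offsets of the query point, each in [0, 1].
bilinear <- function(q00, q10, q01, q11, tx, ty) {
  (1 - tx) * (1 - ty) * q00 + tx * (1 - ty) * q10 +
    (1 - tx) * ty * q01 + tx * ty * q11
}

# Query exactly at the first cell center: all weight falls on q00
at_center <- bilinear(6, 7, 6, 7, tx = 0, ty = 0)

# Query halfway between the four centers: plain average of the four values
midpoint <- bilinear(6, 7, 6, 7, tx = 0.5, ty = 0.5)
```

With `tx = 0` and `ty = 0` the other three weights vanish, which is why interpolating at a cell center simply returns that cell's own value.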
For cell 25, however, the focal cell is NA, so you get the average of the focal cell and 3 more cells. The question is which three? It is arbitrary, but to make it work, the NA cell must get a value. The raster algorithm assigns the value below the NA cell to that cell (also 8 here). This works well, I think, to deal with NA values at edges (e.g. land/ocean), but perhaps not in this case.
extract(r, xyFromCell(r, 25))
#NA
extract(r, xyFromCell(r, 25), method='bilinear')
#[1] 8
That is also what resample gives
resample(r, r)[25]
# 8
Is this what the on-line calculator suggests too?
This is very sensitive to small changes
extract(r, xyFromCell(r, 25)+0.0001, method='bilinear')
#[1] 4.998997
What I would really want in this case is the mean of the rook-neighbors
mean(r[adjacent(r, 25, pairs=FALSE)])
#[1] 6
Or, more generally, the local inverse distance weighted average. You can compute that by setting up a weights matrix for focal
# compute weights matrix
a <- sort(adjacent(r, 25, 8, pairs=F, include=TRUE))
axy <- xyFromCell(r, a)
d <- pointDistance(axy, xyFromCell(r, 25), lonlat=F)
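# inverse distance, so nearer cells weigh more; the center (d = 0) becomes Inf and is zeroed next
w <- matrix(1 / d, 3, 3)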
w[2,2] <- 0
w <- w / sum(w)
# A simpler approach could be:
# w <- matrix(c(0,.25,0,.25,0,.25,0,.25,0), 3, 3)
foc <- focal(r, w, na.rm=TRUE, NAonly=TRUE)
foc[25]
In this example this is fine; but it would not be correct if there were multiple NA values in the focal area (as the sum of weights would no longer be 1). We can correct for that by computing the sum of weights
x <- as.integer(r/r)
sum_weights <- focal(x, w, na.rm=TRUE, NAonly=TRUE)
fw <- foc/sum_weights
done <- cover(r, fw)
done[25]
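The renormalisation step can be checked by hand on a single window. The sketch below uses plain matrices rather than raster objects; `win` holds the 3x3 neighborhood of cell 25 from the example, and `w` is the simpler rook weights matrix mentioned in the comment above. The variable names are illustrative only.

```r
# 3x3 neighborhood of cell 25 (the center is the gap)
win <- matrix(c(4, 5, 1,
                7, NA, 4,
                7, 8, 4), nrow = 3, byrow = TRUE)

# Rook weights: the four edge-sharing neighbors, 0.25 each
w <- matrix(c(0, .25, 0, .25, 0, .25, 0, .25, 0), 3, 3)

# One NA in the window: the usable weights still sum to 1
ok <- !is.na(win)
one_gap <- sum(win[ok] * w[ok]) / sum(w[ok])

# A second NA among the rook neighbors: the usable weights sum to 0.75
win2 <- win
win2[1, 2] <- NA
ok2 <- !is.na(win2)
unscaled <- sum(win2[ok2] * w[ok2])                 # biased low
rescaled <- sum(win2[ok2] * w[ok2]) / sum(w[ok2])   # divided by the sum of weights
```

Here `one_gap` is 6, matching the rook-neighbor mean above, while with two gaps the unscaled sum drops to 4.75 and dividing by the sum of weights restores the mean of the three remaining neighbors.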
I want to perform a neighborhood analysis in R to smooth the layer I have but keeping all the NAs of the input raster intact.
However, when I do, for instance, the following, the calculation "propagates" over the NA values, which is undesirable behavior in my case.
library(terra)
library(dplyr)
# load example raster in metric system
f <- system.file("ex/elev.tif", package="terra")
r <- rast(f) %>%
terra::project("EPSG:32631")
# focal
neigh <- terra::focal(r, w = 7, fun = "mean")
# plot
plot(c(r, neigh))
Update:
Following the suggestion made by @dww below, I could use terra::mask. One way to deal with this, then, would be:
# focal
neigh <- terra::focal(r, w = 7, fun = "mean") %>%
terra::mask(mask = r)
# plot
plot(c(r, neigh))
Is there another way to avoid the propagation of values to NA cells within focal?
(This is a simple example of a square filter that calculates the mean, but I am looking for something that would be useful for all types of filter, e.g. any matrix defined by terra::focalMat().)
Should I deal with that when defining the weight matrix?
With terra the focal method has an argument na.policy that can be set to one of "all", "only" or "omit".
library(terra)
#terra 1.5.6
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
r[45:50, 45:50] <- NA
f1 <- focal(r, 7, "mean", na.policy="omit", na.rm=TRUE)
plot(f1, fun=lines(v))
This is equivalent to, but possibly more efficient than, using focal and mask:
f2 <- focal(r, 7, "mean", na.rm=TRUE) |> mask(r)
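The focal-then-mask idea can be illustrated without terra on a plain matrix. This is only a sketch: `focal_mean` is a hypothetical helper, and it truncates the window at the edges, which is an assumption of the sketch and not a statement about how terra handles borders.

```r
# Moving-window mean over a matrix, ignoring NA values inside the window
focal_mean <- function(m, size = 3) {
  h <- size %/% 2
  out <- matrix(NA_real_, nrow(m), ncol(m))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      rows <- max(1, i - h):min(nrow(m), i + h)
      cols <- max(1, j - h):min(ncol(m), j + h)
      out[i, j] <- mean(m[rows, cols], na.rm = TRUE)
    }
  }
  out
}

m <- matrix(as.numeric(1:25), 5, 5)
m[3, 3] <- NA

smoothed <- focal_mean(m)   # the gap is filled from its neighbors
kept <- smoothed
kept[is.na(m)] <- NA        # mask step: restore the original NA cells
```

`smoothed[3, 3]` holds the mean of the eight surrounding values, while `kept[3, 3]` is NA again, which is the behavior the question asks for.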
library(raster)
r <- raster(ncol=10, nrow=10)
s <- stack(lapply(1:5, function(i) setValues(r, runif(100, -1, 1))))
I want to do two things:
replace negative values with NA in each raster layer
scale each cell using the mean and standard deviation for that cell across the 5 layers
for(i in 1:5){
s[[i]][s[[i]] < 0] <- NA
}
For the second task, how can I scale each individual cell using its mean and standard deviation
across the raster layers? For example, is the following implementation correct?
scale(s, center = TRUE, scale = TRUE)
I think what it does is take a layer, calculate the layer mean and sd, and then
normalise each cell using that mean and sd. What I want is to normalise
each cell using its own mean and sd across the 5 layers.
To replace negative values with NA you can use reclassify
Example data
library(raster)
r <- raster(ncol=10, nrow=10)
s <- stack(lapply(1:5, function(i) setValues(r, runif(100, -1, 1))))
Solution
x <- reclassify(s, cbind(-Inf, 0, NA))
To apply scale, or a similar function, to each cell, you can use calc
z <- calc(x, scale)
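To see what per-cell scaling across layers means, here is a small base-R sketch on a 3-D array (rows x columns x layers) instead of a raster stack. The array `a` and its values are made up for illustration.

```r
# Two cells (1 row, 2 columns), five layers each
a <- array(NA_real_, dim = c(1, 2, 5))
a[1, 1, ] <- c(1, 2, 3, 4, 5)
a[1, 2, ] <- c(-1, 2, 4, -3, 6)

# Step 1: negative values become NA
a[a < 0] <- NA

# Step 2: standardise every cell with its own mean and sd across layers
z <- apply(a, c(1, 2), function(v) {
  (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
})
# apply() puts the layer dimension first; move it back to the end
z <- aperm(z, c(2, 3, 1))
```

The second cell keeps NA where its input was negative, and its remaining values 2, 4, 6 become -1, 0, 1. Cells with fewer than two non-NA values would come out as NA or NaN, because sd is undefined there.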
I'm trying to calculate the majority value in a categorical raster data set in R, for example using land cover data. This would be similar to the focal statistics tool in ArcGIS using the majority statistic. I am able to calculate the majority land cover type using a rectangular moving window and the modal function:
library(raster)
# create data
r <- raster(nrows = 120, ncol = 120, xmn=0)
r[] <- sample(3, ncell(r), replace=TRUE)
a<-focal(r, w=matrix(1,3,3), fun=modal) # 3x3 moving window
plot(a)
However, when I apply a focal weight to define my circular moving window, the output values appear to be treated as continuous numbers and not discrete values, even if I convert the raster to a factor:
#convert to a factor
r.f<-as.factor(r)
#set up window
fw <- focalWeight(r.f, 4.5, type='circle')
#apply focal fxn
r.f.focal<-focal(r.f, w=fw, fun=modal, na.rm=TRUE)
It appears zeros might be added into the analysis and that is creating the problem. Can anyone steer me in the right direction?
I think it will be fixed if d (currently 4.5) is an integer; the radius you define should be based on cell counts (1, 2, 3, ..., n). However, the output will still be floating point, because each weight in the kernel is a fraction and the weights sum to 1. To achieve an integer output there is also a third option.
library(raster)
set.seed(070319)
# create data
r <- raster(nrows = 120, ncol = 120, xmn=0)
r[] <- sample(3, ncell(r), replace=TRUE)
a<-focal(r, w=matrix(1,3,3), fun=modal) # 3x3 moving window
par(mfrow=c(1,2))
plot(r)
plot(a)
#set up window
fw <- focalWeight(r, 4, type='circle')
#apply focal fxn
r.f.focal<-focal(r, w=fw, fun=modal, na.rm=TRUE)
par(mfrow=c(1,2))
plot(r)
plot(r.f.focal)
For integer output you can also do this:
#set up window
fw <- ceiling(focalWeight(r, 4, type='circle'))#for integer output
#apply focal fxn
r.f.focal<-focal(r, w=fw, fun=modal, na.rm=TRUE)
par(mfrow=c(1,2), oma=c(0,0,0,1))
plot(r)
plot(r.f.focal)
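The arithmetic behind the non-integer output can be reproduced on a single window in base R. This sketch assumes, as I understand it, that the window values are multiplied by the weights before the function is applied; `mode_of`, `vals` and `mask` are illustrative names, not raster objects.

```r
# Hypothetical helper: most frequent value in a numeric vector
mode_of <- function(v) as.numeric(names(which.max(table(v))))

# Class codes in a 3x3 window, and a cross-shaped ("circular") window
vals <- matrix(c(1, 3, 3, 2, 3, 1, 2, 3, 3), 3, 3)
mask <- matrix(c(0, 1, 0, 1, 1, 1, 0, 1, 0), 3, 3)

# Fractional weights that sum to 1: the products are no longer class codes
weighted <- vals * (mask / sum(mask))
mode_weighted <- mode_of(weighted)

# Selecting only the cells inside the window keeps the original codes
mode_masked <- mode_of(vals[mask == 1])
```

In this window the weighted products are dominated by the zeros from outside the cross, so `mode_weighted` is 0, whereas `mode_masked` returns class 3. The same zeros appear with 0/1 weights, so it may be worth checking whether they affect the result of the ceiling approach.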
Is there a way to extract the grid position or (preferably, for rasters with an explicit extent) the point/centroid coordinates of the pixels that match a particular value? I currently have a pretty inefficient workflow that converts to a matrix and uses which(mtrx == max(mtrx), arr.ind = TRUE) to get the matrix position(s), but this (a) loses geospatial information and (b) rotates the data 90 degrees in the matrix conversion, both of which require extra code and slow the computation significantly. Is there an equivalent raster workflow anyone is aware of?
Example data:
library(raster)
set.seed(0)
r <- raster(ncols=10, nrows=10)
r[] <- sample(50, 100, replace=T)
Now do:
p <- rasterToPoints(r, function(x) x == 11)
To get
x y layer
[1,] 18 81 11
[2,] -126 63 11
[3,] -90 45 11
[4,] 54 -63 11
If you want the cell(s) with the maximum value
vmax = maxValue(r)
# vmax comes from the raster itself, so an exact, vectorized comparison is safe
p <- rasterToPoints(r, function(x) x == vmax)
(do not use @data@max)
I do not understand why you would coerce to a matrix? Perhaps I do not understand your question but, if I get you correctly, you could just query the raster values and then coerce to points to get the geographic position(s).
require(raster)
r <- raster(ncols=100, nrows=100)
r[] <- runif(ncell(r), 0,1)
# Coerce < max to NA and coerce result to points
rMax <- r
m = maxValue(r)
rMax[rMax != m] <- NA
( r.pts <- rasterToPoints (rMax) )
# You could also use the raster specific Which or which.max functions.
i <- which.max(r)
xy.max <- xyFromCell(r, i)
plot(r)
points(xy.max, pch=19, col="black")
# Or for a more general application of Which
i <- Which(r >= 0.85, cells=TRUE)
xy.max <- xyFromCell(r, i)
plot(r)
points(xy.max, pch=19, col="black")
# If you prefer a raster object set cells=FALSE
i <- Which(r >= 0.85, cells=FALSE)
plot(i)
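For readers curious how a cell number maps to a coordinate, here is a base-R sketch of the arithmetic, assuming cells are numbered row by row from the top-left corner and that coordinates refer to cell centers. `xy_from_cell` is a hypothetical stand-in written for this example, not the raster function itself.

```r
# Cell number -> center coordinates for a regular grid
xy_from_cell <- function(cell, nrow, ncol, xmin, xmax, ymin, ymax) {
  row <- (cell - 1) %/% ncol + 1
  col <- (cell - 1) %% ncol + 1
  xres <- (xmax - xmin) / ncol
  yres <- (ymax - ymin) / nrow
  cbind(x = xmin + (col - 0.5) * xres,
        y = ymax - (row - 0.5) * yres)   # rows count downward from the top
}

# A 10 x 10 global grid stored as a plain vector in cell order
v <- rep(0, 100)
v[c(6, 12)] <- 11

cells <- which(v == 11)
xy <- xy_from_cell(cells, nrow = 10, ncol = 10,
                   xmin = -180, xmax = 180, ymin = -90, ymax = 90)
```

Working with cell numbers in this way avoids the matrix conversion, so nothing is rotated and the extent is never lost.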
There are multiple raster functions that let you pass custom or base functions to them. You may want to take a look at "focal", which is a local operator, or "calc". You may also want to read through the raster help pages.
To extend Jeffrey's answer, you can select the last instance of the lowest raster value with the following:
r <- raster(ncols=12, nrows=12)
set.seed(0)
r[] <- round(runif(ncell(r))*0.7 )
rc <- clump(r)
rc[12,8]<-1
plot(rc)
xy.min<-data.frame(xyFromCell(rc,max(which.min(rc))))
xy.min$dat<-1
coordinates(xy.min)<-~x+y
points(xy.min,lwd=2)
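The same "last instance of the minimum" idea on a plain vector looks like this in base R, where which.min() returns only the first matching index. The vector `v` is made up for illustration.

```r
v <- c(3, 1, 4, 1, 5, 1)

first_min <- which.min(v)             # index of the first minimum
all_min <- which(v == min(v))         # every index holding the minimum
last_min <- max(all_min)              # index of the last minimum
```

For data with NA values, `min(v, na.rm = TRUE)` would be needed in the comparison.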
I was wondering if there was a function that scales down matrices in R statistical software exactly like with image resizing. The function imresize() in MATLAB is exactly what I'm looking for (I believe it takes the average of the surrounding points, but I am not sure of this), but I am wondering if there is an R equivalent for this function.
This question has been posted before on this forum, but with reference to MATLAB, not R:
Matlab "Scale Down" a Vector with Averages
The post starting with "Any reason why you can't use the imresize() function?" is exactly what I am looking for, but in R, not MATLAB.
Say I have a latitude-longitude grid of temperatures around the world, and let's say this is represented by a 64*128 matrix of temperatures. Now let's say I would like to have the same data contained in a new matrix, but I would like to rescale my grid to make it a 71*114 matrix of temperatures around the world. A function that would allow me to do so is what I'm looking for (again, the imresize() function, but in R, not MATLAB)
Thank you.
Steve
One way to do this is by using the function resample(), from the raster package.
I'll first show how you could use it to rescale your grid, and then give an easier-to-inspect example of its application to smaller raster objects
Use resample() to resize matrices
library(raster)
m <- matrix(seq_len(64*128), nrow=64, ncol=128, byrow=TRUE)
## Convert matrix to a raster with geographical coordinates
r <- raster(m)
extent(r) <- extent(c(-180, 180, -90, 90))
## Create a raster with the desired dimensions, and resample into it
s <- raster(nrow=71, ncol=114)
s <- resample(r,s)
## Convert resampled raster back to a matrix
m2 <- as.matrix(s)
Visually confirm that resample() does what you'd like:
library(raster)
## Original data (4x4)
rr <- raster(ncol=4, nrow=4)
rr[] <- 1:16
## Resize to 5x5
ss <- raster(ncol=5, nrow=5)
ss <- resample(rr, ss)
## Resize to 3x3
tt <- raster(ncol=3, nrow=3)
tt <- resample(rr, tt)
## Plot for comparison
par(mfcol=c(2,2))
plot(rr, main="original data")
plot(ss, main="resampled to 5-by-5")
plot(tt, main="resampled to 3-by-3")
The answer posted by Josh O'Brien is fine and gave me a starting point, but it was too slow for my large data set. The method below is a good alternative: it uses the fields package and runs much faster.
Functions
rescale <- function(x, newrange=range(x)){
xrange <- range(x)
mfac <- (newrange[2]-newrange[1])/(xrange[2]-xrange[1])
newrange[1]+(x-xrange[1])*mfac
}
ResizeMat <- function(mat, ndim=dim(mat)){
if(!require(fields)) stop("`fields` required.")
# input object
odim <- dim(mat)
obj <- list(x= 1:odim[1], y=1:odim[2], z= mat)
# output object
ans <- matrix(NA, nrow=ndim[1], ncol=ndim[2])
ndim <- dim(ans)
# rescaling
ncord <- as.matrix(expand.grid(seq_len(ndim[1]), seq_len(ndim[2])))
loc <- ncord
loc[,1] = rescale(ncord[,1], c(1,odim[1]))
loc[,2] = rescale(ncord[,2], c(1,odim[2]))
# interpolation
ans[ncord] <- interp.surface(obj, loc)
ans
}
Let's look at how it works
## Original data (4x4)
rr <- matrix(1:16, ncol=4, nrow=4)
ss <- ResizeMat(rr, c(5,5))
tt <- ResizeMat(rr, c(3,3))
## Plot for comparison
par(mfcol=c(2,2), mar=c(1,1,2,1))
image(rr, main="original data", axes=FALSE)
image(ss, main="resampled to 5-by-5", axes=FALSE)
image(tt, main="resampled to 3-by-3", axes=FALSE)
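If installing fields is not an option, the same bilinear resizing can be sketched in base R. This is a minimal version under stated assumptions: `resize_bilinear` is a hypothetical name, the input needs at least two rows and two columns, the corner values are mapped onto the corners of the output, and NA values are not handled.

```r
resize_bilinear <- function(mat, new_dim) {
  nr <- nrow(mat); nc <- ncol(mat)
  # Target positions expressed in source index coordinates
  ri <- seq(1, nr, length.out = new_dim[1])
  ci <- seq(1, nc, length.out = new_dim[2])
  # Lower neighbor index and fractional offset along each axis
  r0 <- pmin(floor(ri), nr - 1); wr <- ri - r0
  c0 <- pmin(floor(ci), nc - 1); wc <- ci - c0
  # Offsets expanded to the output shape
  Wr <- matrix(wr, nrow = length(ri), ncol = length(ci))
  Wc <- matrix(wc, nrow = length(ri), ncol = length(ci), byrow = TRUE)
  # Weighted sum of the four surrounding source values
  (1 - Wr) * (1 - Wc) * mat[r0, c0, drop = FALSE] +
    Wr * (1 - Wc) * mat[r0 + 1, c0, drop = FALSE] +
    (1 - Wr) * Wc * mat[r0, c0 + 1, drop = FALSE] +
    Wr * Wc * mat[r0 + 1, c0 + 1, drop = FALSE]
}

rr <- matrix(as.numeric(1:16), ncol = 4, nrow = 4)
same <- resize_bilinear(rr, c(4, 4))   # unchanged when the size is kept
big <- resize_bilinear(rr, c(7, 7))    # upsampled to 7-by-7
```

Because the test matrix is linear in both directions, the interpolated values fall exactly between the originals, e.g. `big[4, 4]` is 8.5.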