Finding the peak of a mountain in R

I've combined those two rasters into one DEM raster which contains the elevation values:
dem1 = read_stars("srtm_43_06.tif")
dem2 = read_stars("srtm_44_06.tif")
pol = st_read("israel_borders.shp")
dem = st_mosaic(dem1, dem2)
dem = dem[, 5687:6287, 2348:2948]
names(dem) = "elevation"
dem = st_warp(src = dem, crs = 32636, method = "near", cellsize = 90)
Now I need to calculate a point geometry for the peak of the mountain by finding the centroid of the pixel with the highest elevation in the image. Does anyone know which functions I can use?

Building on Grzegorz Sapijaszko's example, here is an alternative path to the top of the mountain.
library(terra)
f <- system.file("ex/elev.tif", package="terra")
x <- rast(f)
If there is a single maximum, you can do
g <- global(x, which.max)
xyFromCell(x, g[,1])
# x y
#[1,] 6.020833 50.17917
Now, consider a situation with multiple maxima. I add three more cells with the maximum value.
x[c(1000, 2500, 5000)] <- 547
We can find the four highest peaks with:
g <- global(x, which.max)[[1]]
v <- x[g] |> unlist()
y <- ifel(x == v, v, NA)
p <- as.points(y)
crds(p)
#[1,] 6.020833 50.17917
#[2,] 6.154167 50.10417
#[3,] 5.987500 49.97083
#[4,] 6.237500 49.75417
You should not warp (project with terra) the raster data first because that changes the cell values and potentially the location of the highest peak. You should find the peaks with the original data, but then you can transform the results like this.
pp <- project(p, "EPSG:32636")
crds(pp)
# x y
#[1,] -1411008 5916157
#[2,] -1404896 5904422
#[3,] -1422145 5894509
#[4,] -1413735 5864236
With your files, you could start with something like
ff <- c("srtm_43_06.tif", "srtm_44_06.tif")
v <- vrt(ff)
g <- global(v, which.max)
And then continue as in the examples above.
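For instance, continuing from the VRT (a minimal sketch; it assumes the two tiles cover the peak you are after and reuses v and g from just above):
# coordinates (lon/lat for SRTM tiles) of the cell with the highest value
peak_xy <- xyFromCell(v, g[,1])
# turn it into a point and project it, e.g. to UTM 36N as in the question
peak <- project(vect(peak_xy, crs = crs(v)), "EPSG:32636")
crds(peak)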

Let's use terra; a similar approach can be applied with the raster package as well. For testing purposes we will use the raster supplied with the terra package:
library(terra)
#> terra 1.5.12
f <- system.file("ex/elev.tif", package="terra")
v <- rast(f)
plot(v)
You can check the details of your raster by just typing the raster object name and pressing Enter; you can check the min and max values with the minmax() function from terra:
minmax(v)
#> elevation
#> [1,] 141
#> [2,] 547
Let's create another raster from the original one by testing whether each value equals the maximum elevation value:
w <- v == minmax(v)[2]
plot(w)
Let's create a substitution matrix, and substitute all FALSE with NA and TRUE with 1:
mx <- matrix(c(FALSE, NA, TRUE, 1), ncol = 2, byrow = TRUE)
w <- classify(w, mx)
plot(v)
plot(as.polygons(w), add=TRUE)
Let's find centroids of those polygon(s):
pts <- centroids(as.polygons(w))
plot(pts, add=TRUE)
Let's see our coordinates:
as.data.frame(pts, geom = "WKT")
#> elevation geometry
#> 1 1 POINT (6.020833 50.179167)
Created on 2022-01-29 by the reprex package (v2.0.1)
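If you prefer to stay in the sf world used in the question, the terra result can be converted afterwards. A minimal sketch, assuming a reasonably recent sf version that provides an st_as_sf() method for SpatVector objects:
library(sf)
pts_sf <- st_as_sf(pts)                # SpatVector points -> sf
pts_sf <- st_transform(pts_sf, 32636)  # project to UTM 36N, as in the question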

Related

sf: Generate random points with maximal distance condition

I'd like to generate 100 random points but impose a maximal distance condition between points, using st_buffer() of size 1000 meters around each point and eliminating any offending points. But in my example:
library(sf)
library(magrittr) # for the %>% pipe used below
library(ggplot2)  # for the plot at the end
# Data set creation
set.seed(1)
df <- data.frame(
gr = c(rep("a",5),rep("b",5)),
x = rnorm(10),
y = rnorm(10)
)
df <- st_as_sf(df,coords = c("x","y"),remove = F, crs = 4326)
df.laea = st_transform(df,
crs = "+proj=laea +x_0=4600000 +y_0=4600000 +lon_0=0.13 +lat_0=0.24 +datum=WGS84 +units=m")
st_bbox(df.laea)
#
# Random simulation of 100 point inside df.laea extent
sim_study_area <- st_sample(st_as_sfc(st_bbox(df.laea)), 100) %>% # random points, as a list ...
st_sf()
border_area <- st_as_sfc(st_bbox(df.laea)) %>% # bounding box of the study area, as a polygon ...
st_sf()
# I'd like to impose a maximal distance of 1000 meters around points, and for this:
i <- 1 # iterator start
buffer_size <- 1000 # minimal distance to be enforced (in meters)
repeat( {
  # create buffer around i-th point
  buffer <- st_buffer(sim_study_area[i,], buffer_size)
  offending <- sim_study_area %>% # start with the intersection of master points...
    st_intersects(buffer, sparse = F) # ... and the buffer, as a vector
  # i-th point is not really offending
  offending[i] <- TRUE
  # if there are any offending points left - re-assign the master points
  sim_study_area <- sim_study_area[offending,]
  if ( i >= nrow(sim_study_area)) {
    # the end was reached; no more points to process
    break
  } else {
    # rinse & repeat
    i <- i + 1
  }
} )
# Visualization of points created with the offending condition:
simulation_area <- ggplot() +
  geom_sf(data = border_area, col = 'gray40', fill = NA, lwd = 1) +
  geom_sf(data = sim_study_area, pch = 3, col = 'red', alpha = 0.67) +
  theme_bw()
plot(simulation_area)
The result is not OK because I don't have 100 points, and I don't know how I can fix it.
Please any ideas?
Thanks in advance,
Alexandre
I think that the easiest solution is to adopt one of the sampling functions defined in the R package spatstat. For example:
# packages
library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
# create data
set.seed(1)
df <- data.frame(
gr = c(rep("a",5),rep("b",5)),
x = rnorm(10),
y = rnorm(10)
)
df <- st_as_sf(df,coords = c("x","y"),remove = F, crs = 4326)
df.laea = st_transform(
df,
crs = "+proj=laea +x_0=4600000 +y_0=4600000 +lon_0=0.13 +lat_0=0.24 +datum=WGS84 +units=m"
)
Now we sample with a Simple Sequential Inhibition Process. Check ?spatstat.core::rSSI for more details.
sampled_points <- st_sample(
x = st_as_sfc(st_bbox(df.laea)),
type = "SSI",
r = 1000, # threshold distance (in metres)
n = 100 # number of points
)
# Check result
par(mar = rep(0, 4))
plot(st_as_sfc(st_bbox(df.laea)), reset = FALSE)
plot(sampled_points, add = TRUE, pch = 16)
# Estimate all distances
all_distances <- st_distance(sampled_points)
all_distances[1:5, 1:5]
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.00 57735.67 183205.74 189381.50 81079.79
#> [2,] 57735.67 0.00 153892.93 143755.73 61475.85
#> [3,] 183205.74 153892.93 0.00 62696.68 213379.39
#> [4,] 189381.50 143755.73 62696.68 0.00 194237.12
#> [5,] 81079.79 61475.85 213379.39 194237.12 0.00
# Check they are all greater than 1000
sum(all_distances < 1000)
#> [1] 100 # since the diagonal is full of 100 zeros
Created on 2021-08-12 by the reprex package (v2.0.0)
Check here (in particular the answer from Prof. Baddeley), the references therein, and the help page of st_sample for more details.
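As an additional sanity check (a small sketch reusing all_distances from above), the smallest off-diagonal distance should be at least the 1000 m threshold:
min(all_distances[upper.tri(all_distances)])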

Rasterizing polygons with complicated weighting

Imagine a regular 0.5° grid across the Earth's surface. A 3x3 subset of this grid is shown below. As a stylized example of what I'm working with, let's say I have three polygons—yellow, orange, and blue—that for the sake of simplicity all are 1 unit in area. These polygons have attributes Population and Value, which you can see in the legend:
I want to turn these polygons into a 0.5° raster (with global extent) whose values are based on the weighted-mean Value of the polygons. The tricky part is that I want to weight the polygons' values based on not their Population, but rather on their included population.
I know—theoretically—what I want to do, and below have done it for the center gridcell.
1. Multiply Population by Frac. included (the fraction of the polygon's area that is included in the gridcell) to get Pop. included. (This assumes population is distributed evenly throughout the polygon, which is acceptable.)
2. Divide each polygon's Pop. included by the sum of all polygons' Pop. included (32) to get Weight.
3. Multiply each polygon's Value by Weight to get Result.
4. Sum all polygons' Result to get the value for the center gridcell (0.31).
         Population  Value  Frac. included  Pop. included  Weight  Result
Yellow           24    0.8            0.25              6  0.1875    0.15
Orange           16    0.4            0.5               8  0.25      0.10
Blue             18    0.1            1                18  0.5625    0.06
Total                                                  32            0.31
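For concreteness, here is the same arithmetic as a small R sketch, using only the numbers from the table above:
pop      <- c(yellow = 24, orange = 16, blue = 18)
value    <- c(yellow = 0.8, orange = 0.4, blue = 0.1)
frac_inc <- c(yellow = 0.25, orange = 0.5, blue = 1)  # fraction of each polygon inside the cell
pop_inc  <- pop * frac_inc          # 6, 8, 18 (sums to 32)
weight   <- pop_inc / sum(pop_inc)  # 0.1875, 0.25, 0.5625
sum(value * weight)                 # 0.30625, i.e. ~0.31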
I have an idea of how to accomplish this in R, as described below. Where possible, I've filled in code that I think will do what I want. My questions: How do I do steps 2 and 3? Or is there a simpler way to do this? If you want to play around with this, I have uploaded old_polygons as a .rds file here.
library("sf")
library("raster")
1. Calculate the area of each polygon: old_polygons$area <- as.numeric(st_area(old_polygons))
2. Generate the global 0.5° grid as some kind of Spatial object.
3. Split the polygons by the grid, generating new_polygons.
4. Calculate the area of the new polygons: new_polygons$new_area <- as.numeric(st_area(new_polygons))
5. Calculate the fraction included for each new polygon: new_polygons$frac_included <- new_polygons$new_area / new_polygons$old_area
6. Calculate the "included population" in the new polygons: new_polygons$pop_included <- new_polygons$pop * new_polygons$frac_included
7. Calculate a new attribute for each polygon that is just its Value times its included population: new_polygons$tmp <- new_polygons$Value * new_polygons$pop_included
8. Set up an empty raster for the next steps: empty_raster <- raster(nrows=360, ncols=720, xmn=-180, xmx=180, ymn=-90, ymx=90)
9. Rasterize the polygons by summing this new attribute within each gridcell: tmp_raster <- rasterize(new_polygons, empty_raster, "tmp", fun = "sum")
10. Create another raster that is just the total population in each gridcell: pop_raster <- rasterize(new_polygons, empty_raster, "pop_included", fun = "sum")
11. Divide the first raster by the second to get what I want:
    output_raster <- empty_raster
    values(output_raster) <- getValues(tmp_raster) / getValues(pop_raster)
Any help would be much appreciated!
Example data:
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
values(v) <- data.frame(population=1:12, value=round(c(2:13)/14, 2))
r <- rast(ext(v)+.05, ncols=4, nrows=6, names="cell")
Illustrate the data
p <- as.polygons(r)
plot(p, lwd=2, col="gray", border="light gray")
lines(v, col=rainbow(12), lwd=2)
txt <- paste0(v$value, " (", v$population, ")")
text(v, txt, cex=.8, halo=TRUE)
Solution:
# area of the polygons
v$area1 <- expanse(v)
# intersect with raster cell boundaries
values(r) <- 1:ncell(r)
p <- as.polygons(r)
pv <- intersect(p, v)
# area of the polygon parts
pv$area2 <- expanse(pv)
pv$frac <- pv$area2 / pv$area1
Now we just use the data.frame with the attributes of the polygons to compute the polygon-cover-weighted-population-weighted values.
z <- values(pv)
a <- aggregate(z[, "frac", drop=FALSE], z[,"cell",drop=FALSE], sum)
names(a)[2] <- 'fsum'
z <- merge(z, a)
z$weight <- z$population * z$frac / z$fsum
z$wvalue <- z$value * z$weight
b <- aggregate(z[, c("wvalue", "weight")], z[, "cell", drop=FALSE], sum)
b$bingo <- b$wvalue / b$weight
Assign values back to raster cells
x <- rast(r)
x[b$cell] <- b$bingo
Inspect results
plot(x)
lines(v)
text(x, digits=2, halo=TRUE, cex=.9)
text(v, "value", cex=.8, col="red", halo=TRUE)
This may not scale very well to large data sets, but you could perhaps do it in chunks.
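One way to chunk it (a rough sketch that reuses p, v, and the column names from the solution above; the chunk size of 4 polygons is arbitrary):
idx <- split(seq_len(nrow(v)), ceiling(seq_len(nrow(v)) / 4))
parts <- lapply(idx, function(i) {
  pv <- intersect(p, v[i, ])
  pv$area2 <- expanse(pv)
  pv$frac  <- pv$area2 / pv$area1
  values(pv)
})
z <- do.call(rbind, parts)
# ...then continue with the aggregate()/merge() steps exactly as above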
This is fast and scalable:
library(data.table)
library(terra)
# make the 3 polygons with radius = 5km
center_points <- data.frame(lon = c(0.5, 0.65, 1),
lat = c(0.75, 0.65, 1),
Population = c(16, 18, 24),
Value = c(0.4, 0.1, 0.8))
polygon <- vect(center_points, crs = "EPSG:4326")
polygon <- buffer(polygon, 5000)
# make the raster
my_raster <- rast(nrow = 3, ncol = 3, xmin = 0, xmax = 1.5, ymin = 0, ymax = 1.5, crs = "EPSG:4326")
my_raster[] <- 0 # set the value to 0 for now
# find the fractions of cells in each polygon
# "cells" gives you the cell ID and "weights" (or "exact") gives you the cell fraction in the polygon
# using "exact" instead of "weights" is more accurate
my_Table <- extract(my_raster, polygon, cells = TRUE, weights = TRUE)
setDT(my_Table) # convert to datatable
# merge the polygon attributes to "my_Table"
poly_Table <- setDT(as.data.frame(polygon))
poly_Table[, ID := 1:nrow(poly_Table)] # add the IDs which are the row numbers
merged_Table <- merge(my_Table, poly_Table, by = "ID")
# find Frac_included
merged_Table[, Frac_included := weight / sum(weight), by = ID]
# find Pop_included
merged_Table[, Pop_included := Frac_included * Population]
# find Weight, to avoid confusion with "weight" produced above, I call this "my_Weight"
merged_Table[, my_Weight := Pop_included / sum(Pop_included), by = cell]
# final results
Result <- merged_Table[, .(Result = sum(Value * my_Weight)), by = cell]
# add the values to the raster
my_raster[Result$cell] <- Result$Result
plot(my_raster)

R raster calculate areal weighted mean when scaling to a larger resolution with offset

I have two raster grids in R with different resolutions which don't line up exactly. In actual fact I have hundreds of each so any answer must be easily run many times.
I want to scale the finer resolution grid up to the coarser resolution by taking an areal weighted mean of the grid cells.
I was hoping I could use projectRaster or resample, but neither gives the desired output, and I cannot use aggregate as I need my new grids to align with the coarser-resolution grid.
For my real data the finer grid is at 0.005 deg intervals and the coarser is at 0.02479172 deg intervals, and the extents/origins don't exactly match up.
I've made an extreme version as an example of why neither resample nor projectRaster works.
library(raster)
#> Warning: package 'raster' was built under R version 3.5.3
#> Loading required package: sp
testproj <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84"
testmat <- matrix(1, nrow = 8, ncol = 8)
# testmat <- matrix(sample(1:10, 64, replace = T), nrow = 8, ncol = 8)
testmat[1,5] <- 400
testmat[8,4] <- -400
testsmallraster <- raster(testmat, xmn=0, xmx=8, ymn=0, ymx=8)
crs(testsmallraster) <- testproj
plot(testsmallraster)
testlarger <- raster(matrix(rep(NA,4), nrow = 2, ncol = 2), xmn=0.3, xmx=8.3, ymn=0, ymx=8)
crs(testlarger) <- testproj
tout_reproj <- projectRaster(testsmallraster, testlarger)
tout_resamp <- resample(testsmallraster, testlarger)
tout_resampngb <- resample(testsmallraster, testlarger, method = "ngb")
tout_agg <- aggregate(testsmallraster, fact = 4)
#reprojected values ignore all but 4 cells closest to new centre
values(tout_reproj)
#> [1] 1 1 1 1
#resample uses bilinear interpolation which weights the grids cells furthest from the new centre less than those closest
# I need all grid cells entirely contained in the new grid to have equal weighting
#bilinear interpolation also weights cells which do not fall within the new cell at all which I do not want
values(tout_resamp)
#> [1] 10.851852 15.777778 -7.911111 -12.366667
#aggregate gives close to the values I want but they are not in the new raster origin/resolution and therefore not splitting values that fall across grid boundaries
values(tout_agg)
#> [1] 1.0000 25.9375 -24.0625 1.0000
#using ngb was never really going to make any sense but I thought I'd add it for completeness
values(tout_resampngb)
#> [1] 1 1 1 1
#desired output first cell only 0.3 of a grid cell covers the grid cell = 400 the rest equal 1
#desired output second cell 0.7 of a grid cell covers the grid cell = 400 the rest equal 1
#desired output third cell has exactly 1 grid cell of -400 and 15 of 1
#desired output fourth cell only overlap grid cells = 1
desiredoutput <- raster(matrix(c((15.7*1+0.3*400)/16,(15.3*1+0.7*400)/16,mean(c(-400, rep(1,15))),1),byrow = T, nrow = 2, ncol = 2), xmn=0.3, xmx=8.3, ymn=0, ymx=8)
values(desiredoutput)
#> [1] 8.48125 18.45625 -24.06250 1.00000
Created on 2020-07-02 by the reprex package (v0.3.0)
You can get closer to the desired result by using a similar spatial resolution for resample, and then aggregating the results:
library(raster)
testproj <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +datum=WGS84"
testmat <- matrix(1, nrow = 8, ncol = 8)
testmat[1,5] <- 400
testmat[8,4] <- -400
testsmallraster <- raster(testmat, xmn=0, xmx=8, ymn=0, ymx=8, crs=testproj)
testlarger <- raster(matrix(rep(NA,4), nrow = 2, ncol = 2), xmn=0.3, xmx=8.3, ymn=0, ymx=8, crs = testproj)
y <- disaggregate(testlarger, 4)
z <- resample(testsmallraster, y)
za <- aggregate(z, 4)
values(za)
#[1] 8.48125 18.45625 -24.06250 1.00000
for much better speed, try terra
library(terra)
a <- rast(testsmallraster)
b <- rast(testlarger)
b <- disaggregate(b, 4)
d <- resample(a, b)
da <- aggregate(d, 4)
values(da)
# layer
#[1,] 8.48125
#[2,] 18.45625
#[3,] -24.06250
#[4,] 1.00000
This probably ought to be done automatically by resample and project(Raster). raster attempts to do some of this for resample, but in this case not very satisfactorily.
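As a possible shortcut (a hedged sketch: it assumes your terra version supports GDAL's "average" resampling method, which takes an area-weighted average of the contributing cells):
library(terra)
a <- rast(testsmallraster)
b <- rast(testlarger)
out <- resample(a, b, method = "average")
values(out)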
When I needed to do similar resampling, this worked for me. This example is a 4-cell destination grid at 1° x 1° spacing with centroids at half degrees (to match some satellite data), and an offset half-degree grid for source data (ECMWF weather). 'Resample' does the heavy lifting of interpolating on mismatched grids. The code below is basically a manual version of a 'weights=' option that doesn't exist for resample. We need relative, not absolute, areas to be correct for weighting, so the caveat on the precision of raster::area described in the help seems of low concern.
library(raster)
wgs84 <- "+init=epsg:4326"
polar.brick.source <- array(dim = c(5, 5, 2), rep(c(1, 2), each = 25))
dimnames(polar.brick.source)[[1]] <- seq(-1, 1, by = .5)
dimnames(polar.brick.source)[[2]] <- seq(80, 82, by = .5)
dimnames(polar.brick.source)[[3]] <- c("time.a", "time.b")
# Add some outliers to see their effects.
polar.brick.source[1, 2, ] <- c(25, 50)
polar.brick.source[3, 2, 2] <- -30
polar.brick <- brick(polar.brick.source, crs = CRS(wgs84),
xmn = min(as.numeric(dimnames(polar.brick.source)[[1]])) - .25,
xmx = max(as.numeric(dimnames(polar.brick.source)[[1]])) + .25,
ymn = min(as.numeric(dimnames(polar.brick.source)[[2]])) - .25,
ymx = max(as.numeric(dimnames(polar.brick.source)[[2]])) + .25)
fine.polar.area <- raster::area(polar.brick)
polar.one.degree.source <- data.frame(
lon = c(-.5, .5, -.5, .5),
lat = c(80.5, 80.5, 81.5, 81.5),
placeholder = rep(1, 4))
polar.one.degree.raster <- rasterFromXYZ(polar.one.degree.source, crs = CRS(wgs84))
polar.one.degree.area <- raster::area(polar.one.degree.raster)
as.data.frame(polar.one.degree.area, xy = T)
fine.clip.layer <- disaggregate(polar.one.degree.raster, 2)
clipped.fine.polar <-resample(polar.brick * fine.polar.area,
fine.clip.layer)
new.weighted.wx <- aggregate(clipped.fine.polar * 4, 2)
as.data.frame(new.weighted.wx, xy = T) # look at partial results.
new.weather <- new.weighted.wx / polar.one.degree.area
as.data.frame(new.weather, xy = T)

3D with value interpolation in R (X, Y, Z, V)

Is there an R package that does X, Y, Z, V interpolation? I see that Akima does X, Y, V but I need one more dimension.
Basically I have X, Y, Z coordinates plus the value (V) that I want to interpolate. This is all GIS data, but my GIS does not do voxel interpolation.
So if I have a point cloud of XYZ coordinates with a value of V, how can I interpolate what V would be at XYZ coordinate (15, 15, -12)? Some test data would look like this:
X <-rbind(10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,30,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,40,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50)
Y <- rbind(10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50,10,10,10,10,10,20,20,20,20,20,30,30,30,30,30,40,40,40,40,40,50,50,50,50,50)
Z <- rbind(-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-5,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-17,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29,-29)
V <- rbind(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,25,35,75,25,50,0,0,0,0,0,10,12,17,22,27,32,37,25,13,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50,125,130,105,110,115,165,180,120,100,80,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
I had the same question and was hoping for an answer in R.
My question was: How do I perform 3D (trilinear) interpolation using regular gridded coordinate/value data (x,y,z,v)? For example, CT images, where each image has pixel centers (x, y) and greyscale value (v) and there are multiple image "slices" (z) along the thing being imaged (e.g., head, torso, leg, ...).
There is a slight problem with the given example data.
# original example data (reformatted)
X <- rep( rep( seq(10, 50, by=10), each=25), 3)
Y <- rep( rep( seq(10, 50, by=10), each=5), 15)
Z <- rep(c(-5, -17, -29), each=125)
V <- rbind(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,25,35,75,25,50,0,0,0,0,0,10,12,17,22,27,32,37,25,13,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,50,125,130,105,110,115,165,180,120,100,80,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
# the dimensions of the 3D grid described do not match the number of values
(length(unique(X))*length(unique(Y))*length(unique(Z))) == length(V)
## [1] FALSE
## which makes sense since 75 != 375
# visualize this:
library(rgl)
plot3d(x=X, y=Y, z=Z, col=terrain.colors(181)[V])
# examine the example data real quick...
df <- data.frame(x=X,y=Y,z=Z,v=V);
head(df);
table(df$x, df$y, df$z);
# there are 5 V values at each X,Y,Z coordinate... duplicates!
# redefine Z so there are 15 unique values
# making 375 unique coordinate points
# and matching the length of the given value vector, V
df$z <- seq(-5, -29, length.out=15)
head(df)
table(df$x, df$y, df$z);
# there is now 1 V value at each X,Y,Z coordinate
# that was for testing, now actually redefine the Z vector.
Z <- rep(seq(-5,-29, length.out = 15), 25)
# plot it.
library(rgl)
plot3d(x=X, y=Y, z=Z, col=terrain.colors(181)[V])
I couldn't find any 4D interpolation functions in the usual R packages, so I wrote a quick and dirty one. The following implements (without ANY error checking... caveat emptor!) the technique described at: https://en.wikipedia.org/wiki/Trilinear_interpolation
# convenience function #1:
# define a function that takes a vector of lookup values and a value to lookup
# and returns the two lookup values that the value falls between
between = function(vec, value) {
  # extract list of unique lookup values
  u = unique(vec)
  # difference vector
  dvec = u - value
  vals = c(u[dvec==max(dvec[dvec<0])], u[dvec==min(dvec[dvec>0])])
  return(vals)
}
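For example, with the reshaped df built above, looking up x = 15 returns the two bracketing grid values:
between(df$x, 15)
# [1] 10 20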
# convenience function #2:
# return the value (v) from a grid data.frame for given point (x, y, z)
get_value = function(df, xi, yi, zi) {
  # assumes df is data.frame with column names: x, y, z, v
  subset(df, x==xi & y==yi & z==zi)$v
}
# inputs df (x,y,z,v), points to look up (x, y, z)
interp3 = function(dfin, xin, yin, zin) {
  # TODO: check if all(xin, yin, zin) equals a grid point, if so just return the point value
  # TODO: check if any(xin, yin, zin) equals a grid point, if so then do bilinear or linear interp
  cube_x <- between(dfin$x, xin)
  cube_y <- between(dfin$y, yin)
  cube_z <- between(dfin$z, zin)
  # find the two values in each dimension that the lookup value falls within
  # and extract the cube of 8 points
  tmp <- subset(dfin, x %in% cube_x &
                      y %in% cube_y &
                      z %in% cube_z)
  stopifnot(nrow(tmp)==8)
  # define points in a periodic and cubic lattice
  x0 = min(cube_x); x1 = max(cube_x);
  y0 = min(cube_y); y1 = max(cube_y);
  z0 = min(cube_z); z1 = max(cube_z);
  # define differences in each dimension
  xd = (xin-x0)/(x1-x0); # 0.5
  yd = (yin-y0)/(y1-y0); # 0.5
  zd = (zin-z0)/(z1-z0); # 0.9166666
  # interpolate along x:
  v00 = get_value(tmp, x0, y0, z0)*(1-xd) + get_value(tmp,x1,y0,z0)*xd # 2.5
  v01 = get_value(tmp, x0, y0, z1)*(1-xd) + get_value(tmp,x1,y0,z1)*xd # 0
  v10 = get_value(tmp, x0, y1, z0)*(1-xd) + get_value(tmp,x1,y1,z0)*xd # 0
  v11 = get_value(tmp, x0, y1, z1)*(1-xd) + get_value(tmp,x1,y1,z1)*xd # 65
  # interpolate along y:
  v0 = v00*(1-yd) + v10*yd # 1.25
  v1 = v01*(1-yd) + v11*yd # 32.5
  # interpolate along z:
  return(v0*(1-zd) + v1*zd) # 29.89583 (~91.7% between v0 and v1)
}
> interp3(df, 15, 15, -12)
[1] 29.89583
Testing that same source's assertion that trilinear is simply linear(bilinear(), bilinear()), we can use the base R linear interpolation function, approx(), and the akima package's bilinear interpolation function, interp(), as follows:
library(akima)
approx(x=c(-11.857143,-13.571429),
y=c(interp(x=df[round(df$z,1)==-11.9,"x"], y=df[round(df$z,1)==-11.9,"y"], z=df[round(df$z,1)==-11.9,"v"], xo=15, yo=15)$z,
interp(x=df[round(df$z,1)==-13.6,"x"], y=df[round(df$z,1)==-13.6,"y"], z=df[round(df$z,1)==-13.6,"v"], xo=15, yo=15)$z),
xout=-12)$y
# [1] 0.2083331
Checked another package to triangulate:
library(oce)
Vmat <- array(data = V, dim = c(length(unique(X)), length(unique(Y)), length(unique(Z))))
approx3d(x=unique(X), y=unique(Y), z=unique(Z), f=Vmat, xout=15, yout=15, zout=-12)
[1] 1.666667
So 'oce', 'akima' and my function all give pretty different answers. This is either a mistake in my code somewhere, or due to differences in the underlying Fortran code in the akima interp(), and whatever is in the oce 'approx3d' function that we'll leave for another day.
Not sure what the correct answer is because the MWE is not exactly "minimum" or simple. But I tested the functions with some really simple grids and it seems to give 'correct' answers. Here's one simple 2x2x2 example:
# really, really simple example:
# answer is always the z-coordinate value
sdf <- expand.grid(x=seq(0,1),y=seq(0,1),z=seq(0,1))
sdf$v <- rep(seq(0,1), each=4)
> interp3(sdf,0.25,0.25,.99)
[1] 0.99
> interp3(sdf,0.25,0.25,.4)
[1] 0.4
Trying akima on the simple example, we get the same answer (phew!):
library(akima)
approx(x=unique(sdf$z),
y=c(interp(x=sdf[sdf$z==0,"x"], y=sdf[sdf$z==0,"y"], z=sdf[sdf$z==0,"v"], xo=.25, yo=.25)$z,
interp(x=sdf[sdf$z==1,"x"], y=sdf[sdf$z==1,"y"], z=sdf[sdf$z==1,"v"], xo=.25, yo=.25)$z),
xout=.4)$y
# [1] 0.4
The new example data in the OP's own, accepted answer was not possible to interpolate with my simple interp3() function above because:
(a) the grid coordinates are not regularly spaced, and
(b) the coordinates to lookup (x1, y1, z1) lie outside of the grid.
# for completeness, here's the attempt:
options(scipen = 999)
XCoor=c(78121.6235,78121.6235,78121.6235,78121.6235,78136.723,78136.723,78136.723,78136.8969,78136.8969,78136.8969,78137.4595,78137.4595,78137.4595,78125.061,78125.061,78125.061,78092.4696,78092.4696,78092.4696,78092.7683,78092.7683,78092.7683,78092.7683,78075.1171,78075.1171,78064.7462,78064.7462,78064.7462,78052.771,78052.771,78052.771,78032.1179,78032.1179,78032.1179)
YCoor=c(5213642.173,523642.173,523642.173,523642.173,523594.495,523594.495,523594.495,523547.475,523547.475,523547.475,523503.462,523503.462,523503.462,523426.33,523426.33,523426.33,523656.953,523656.953,523656.953,523607.157,523607.157,523607.157,523607.157,523514.671,523514.671,523656.81,523656.81,523656.81,523585.232,523585.232,523585.232,523657.091,523657.091,523657.091)
ZCoor=c(-3.0,-5.0,-10.0,-13.0,-3.5,-6.5,-10.5,-3.5,-6.5,-9.5,-3.5,-5.5,-10.5,-3.5,-5.5,-7.5,-3.5,-6.5,-11.5,-3.0,-5.0,-9.0,-12.0,-6.5,-10.5,-2.5,-3.5,-8.0,-3.5,-6.5,-9.5,-2.5,-6.5,-8.5)
V=c(2.4000,30.0,620.0,590.0,61.0,480.0,0.3700,0.0,0.3800,0.1600,0.1600,0.9000,0.4100,0.0,0.0,0.0061,6.0,52.0,0.3400,33.0,235.0,350.0,9300.0,31.0,2100.0,0.0,0.0,10.5000,3.8000,0.9000,310.0,0.2800,8.3000,18.0)
adf = data.frame(x=XCoor, y=YCoor, z=ZCoor, v=V)
# the first y value looks like a typo?
> head(adf)
x y z v
1 78121.62 5213642.2 -3.0 2.4
2 78121.62 523642.2 -5.0 30.0
3 78121.62 523642.2 -10.0 620.0
4 78121.62 523642.2 -13.0 590.0
5 78136.72 523594.5 -3.5 61.0
6 78136.72 523594.5 -6.5 480.0
x1=198130.000
y1=1913590.000
z1=-8
> interp3(adf, x1,y1,z1)
numeric(0)
Warning message:
In min(dvec[dvec > 0]) : no non-missing arguments to min; returning Inf
Whether or not the test data made sense, I still needed an algorithm. Test data is just that, something to fiddle with, and as test data it was fine.
I wound up programming it in Python, and the following code takes X, Y, Z, V and does a 3D Inverse Distance Weighted (IDW) interpolation where you can set the number of points used in the interpolation. This Python recipe only interpolates to one point (x1, y1, z1), but it is easy enough to extend.
import numpy as np
import math
#34 points
XCoor=np.array([78121.6235,78121.6235,78121.6235,78121.6235,78136.723,78136.723,78136.723,78136.8969,78136.8969,78136.8969,78137.4595,78137.4595,78137.4595,78125.061,78125.061,78125.061,78092.4696,78092.4696,78092.4696,78092.7683,78092.7683,78092.7683,78092.7683,78075.1171,78075.1171,78064.7462,78064.7462,78064.7462,78052.771,78052.771,78052.771,78032.1179,78032.1179,78032.1179])
YCoor=np.array([5213642.173,523642.173,523642.173,523642.173,523594.495,523594.495,523594.495,523547.475,523547.475,523547.475,523503.462,523503.462,523503.462,523426.33,523426.33,523426.33,523656.953,523656.953,523656.953,523607.157,523607.157,523607.157,523607.157,523514.671,523514.671,523656.81,523656.81,523656.81,523585.232,523585.232,523585.232,523657.091,523657.091,523657.091])
ZCoor=np.array([-3.0,-5.0,-10.0,-13.0,-3.5,-6.5,-10.5,-3.5,-6.5,-9.5,-3.5,-5.5,-10.5,-3.5,-5.5,-7.5,-3.5,-6.5,-11.5,-3.0,-5.0,-9.0,-12.0,-6.5,-10.5,-2.5,-3.5,-8.0,-3.5,-6.5,-9.5,-2.5,-6.5,-8.5])
V=np.array([2.4000,30.0,620.0,590.0,61.0,480.0,0.3700,0.0,0.3800,0.1600,0.1600,0.9000,0.4100,0.0,0.0,0.0061,6.0,52.0,0.3400,33.0,235.0,350.0,9300.0,31.0,2100.0,0.0,0.0,10.5000,3.8000,0.9000,310.0,0.2800,8.3000,18.0])
def Distance(x1, y1, z1, Npoints):
    i = 0
    d = []
    # distances from (x1, y1, z1) to every sample point
    while i < len(XCoor):
        d.append(math.sqrt((x1-XCoor[i])*(x1-XCoor[i]) + (y1-YCoor[i])*(y1-YCoor[i]) + (z1-ZCoor[i])*(z1-ZCoor[i])))
        i = i + 1
    distance = np.array(d)
    # indices of the Npoints nearest samples
    myIndex = distance.argsort()[:Npoints]
    # inverse-distance-squared weighted average
    weightedNum = 0
    weightedDen = 0
    for i in myIndex:
        weightedNum = weightedNum + (V[i]/(distance[i]*distance[i]))
        weightedDen = weightedDen + (1/(distance[i]*distance[i]))
    InterpValue = weightedNum/weightedDen
    return InterpValue
x1=198130.000
y1=1913590.000
z1=-8
print(Distance(x1,y1,z1, 12))
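For completeness, the same k-nearest-neighbour IDW idea ports directly to R (a small sketch; it assumes the XCoor, YCoor, ZCoor and V vectors defined earlier in this answer, re-entered as R vectors):
idw3d <- function(x1, y1, z1, X, Y, Z, V, npoints = 12) {
  d <- sqrt((x1 - X)^2 + (y1 - Y)^2 + (z1 - Z)^2)  # 3D Euclidean distances
  i <- order(d)[seq_len(npoints)]                  # the npoints nearest samples
  w <- 1 / d[i]^2                                  # inverse-distance-squared weights
  sum(V[i] * w) / sum(w)
}
# idw3d(198130, 1913590, -8, XCoor, YCoor, ZCoor, V)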

If raster value NA search and extract the nearest non-NA pixel

On extracting values of a raster to points I find that I have several NAs. Rather than use the buffer and fun arguments of the extract function, I'd like to extract the nearest non-NA pixel for any point that overlaps an NA cell.
I am using the basic extract function:
data.extr<-extract(loc.thr, data[,11:10])
Here's a solution without using the buffer. However, it calculates a distance map separately for each point in your dataset, so it might be inefficient if your dataset is large.
library(raster)
set.seed(2)
# create a 10x10 raster
r <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10)
r[] <- 1:10
r[sample(1:ncell(r), size = 25)] <- NA
# plot the raster
plot(r, axes=F, box=F)
segments(x0 = 0, y0 = 0:10, x1 = 10, y1 = 0:10, lty=2)
segments(y0 = 0, x0 = 0:10, y1 = 10, x1 = 0:10, lty=2)
# create sample points and add them to the plot
xy = data.frame(x=runif(10,1,10), y=runif(10,1,10))
points(xy, pch=3)
text(x = xy$x, y = xy$y, labels = as.character(1:nrow(xy)), pos=4, cex=0.7, xpd=NA)
# use normal extract function to show that NAs are extracted for some points
extracted = extract(x = r, y = xy)
# then take the raster value with lowest distance to point AND non-NA value in the raster
sampled = apply(X = xy, MARGIN = 1, FUN = function(xy) r@data@values[which.min(replace(distanceFromPoints(r, xy), is.na(r), NA))])
# show output of both procedures
print(data.frame(xy, extracted, sampled))
# x y extracted sampled
#1 5.398959 6.644767 6 6
#2 2.343222 8.599861 NA 3
#3 4.213563 3.563835 5 5
#4 9.663796 7.005031 10 10
#5 2.191348 2.354228 NA 2
#6 1.093731 9.835551 2 2
#7 2.481780 3.673097 3 3
#8 8.291729 2.035757 9 9
#9 8.819749 2.468808 9 9
#10 5.628536 9.496376 6 6
This is a raster-based solution, by first filling the NA pixels with the nearest non-NA pixel value.
Note however, that this does not take into account the position of a point within a pixel. Instead, it calculates the distances between pixel centers to determine the nearest non-NA pixel.
First, it calculates for each NA raster pixel the distance and direction to the nearest non-NA pixel. The next step is to calculate the coordinates of this non-NA cell (assumes projected CRS), extract its value and to store this value at the NA location.
Starting data: a projected raster, with identical values as in the answer from koekenbakker:
set.seed(2)
# set projected CRS
r <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10, crs='+proj=utm +zone=1')
r[] <- 1:10
r[sample(1:ncell(r), size = 25)] <- NA
# create sample points
xy = data.frame(x=runif(10,1,10), y=runif(10,1,10))
# use normal extract function to show that NAs are extracted for some points
extracted <- raster::extract(x = r, y = xy)
Calculate the distance and direction from all NA pixels to the nearest non-NA pixel:
dist <- distance(r)
# you can also set a maximum distance: dist[dist > maxdist] <- NA
direct <- direction(r, from=FALSE)
Retrieve coordinates of NA pixels
# NA raster
rna <- is.na(r) # returns NA raster
# store coordinates in new raster: https://stackoverflow.com/a/35592230/3752258
na.x <- init(rna, 'x')
na.y <- init(rna, 'y')
# calculate coordinates of the nearest Non-NA pixel
# assume that we have an orthogonal, projected CRS, so we can use (Pythagorean) calculations
co.x <- na.x + dist * sin(direct)
co.y <- na.y + dist * cos(direct)
# matrix with point coordinates of nearest non-NA pixel
co <- cbind(co.x[], co.y[])
Extract values of nearest non-NA cell with coordinates 'co'
# extract values of nearest non-NA cell with coordinates co
NAVals <- raster::extract(r, co, method='simple')
r.NAVals <- rna # initiate new raster
r.NAVals[] <- NAVals # store values in raster
Fill the original raster with the new values
# cover nearest non-NA value at NA locations of original raster
r.filled <- cover(x=r, y= r.NAVals)
sampled <- raster::extract(x = r.filled, y = xy)
# compare old and new values
print(data.frame(xy, extracted, sampled))
# x y extracted sampled
# 1 5.398959 6.644767 6 6
# 2 2.343222 8.599861 NA 3
# 3 4.213563 3.563835 5 5
# 4 9.663796 7.005031 10 10
# 5 2.191348 2.354228 NA 3
# 6 1.093731 9.835551 2 2
# 7 2.481780 3.673097 3 3
# 8 8.291729 2.035757 9 9
# 9 8.819749 2.468808 9 9
# 10 5.628536 9.496376 6 6
Note that point 5 gets a different value than in Koekenbakker's answer, since this method does not take into account the position of the point within a pixel (as mentioned above). If this is important, this solution might not be appropriate. In other cases, e.g. if the raster cells are small compared to the point accuracy, this raster-based method should give good results.
For a raster stack, use @koekenbakker's solution above, and turn it into a function. A raster stack's @layers slot is a list of rasters, so lapply it across and go from there.
#new layer
r2 <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10)
r2[] <- 1:10
r2[sample(1:ncell(r2), size = 25)] <- NA
#make the stack
r_stack <- stack(r, r2)
#a function for sampling
sample_raster_NA <- function(r, xy){
  apply(X = xy, MARGIN = 1,
        FUN = function(xy) r@data@values[which.min(replace(distanceFromPoints(r, xy), is.na(r), NA))])
}
#lapply to get answers
lapply(r_stack@layers, function(a_layer) sample_raster_NA(a_layer, xy))
Or to be fancy (speed improvements?)
purrr::map(r_stack@layers, sample_raster_NA, xy=xy)
Which makes me wonder if the whole thing can be sped up even more using dplyr...
