Rasterizing polygons with complicated weighting

Rasterizing polygons with complicated weighting - r

Imagine a regular 0.5° grid across the Earth's surface. A 3x3 subset of this grid is shown below. As a stylized example of what I'm working with, let's say I have three polygons—yellow, orange, and blue—that for the sake of simplicity all are 1 unit in area. These polygons have attributes Population and Value, which you can see in the legend:
I want to turn these polygons into a 0.5° raster (with global extent) whose values are based on the weighted-mean Value of the polygons. The tricky part is that I want to weight the polygons' values based on not their Population, but rather on their included population.
I know—theoretically—what I want to do, and below have done it for the center gridcell.
Multiply Population by Included (the area of the polygon that is included in the gridcell) to get Pop. included. (Assumes population is distributed evenly throughout polygon, which is acceptable.)
Divide each polygon's Included_pop by the sum of all polygons' Included_pop (32) to get Weight.
Multiply each polygon's Value by Weight to get Result.
Sum all polygons' Result to get the value for the center gridcell (0.31).
Population
Value
Frac. included
Pop. included
Weight
Result
Yellow
24
0.8
0.25
6
0.1875
0.15
Orange
16
0.4
0.5
8
0.25
0.10
Blue
18
0.1
1
18
0.5625
0.06
32
0.31
I have an idea of how to accomplish this in R, as described below. Where possible, I've filled in code that I think will do what I want. My questions: How do I do steps 2 and 3? Or is there a simpler way to do this? If you want to play around with this, I have uploaded old_polygons as a .rds file here.
library("sf")
library("raster")
Calculate the area of each polygon: old_polygons$area <- as.numeric(st_area(old_polygons))
Generate the global 0.5° grid as some kind of Spatial object.
Split the polygons by the grid, generating new_polygons.
Calculate area of the new polygons: new_polygons$new_area <- as.numeric(st_area(new_polygons))
Calculate fraction included for each new polygon: new_polygons$frac_included <- new_polygons$new_area / new_polygons$old_area
Calculate "included population" in the new polygons: new_polygons$pop_included <- new_polygons$pop * new_polygons$frac_included
Calculate a new attribute for each polygon that is just their Value times their included population. new_polygons$tmp <- new_polygons$Value * new_polygons$frac_included
Set up an empty raster for the next steps: empty_raster <- raster(nrows=360, ncols=720, xmn=-180, xmx=180, ymn=-90, ymx=90)
Rasterize the polygons by summing this new attribute together within each gridcell. tmp_raster <- rasterize(new_polygons, empty_raster, "tmp", fun = "sum")
Create another raster that is just the total population in each gridcell: pop_raster <- rasterize(new_polygons, empty_raster, "pop_included", fun = "sum")
Divide the first raster by the second to get what I want:
output_raster <- empty_raster
values(output_raster) <- getValues(tmp_raster) / getValues(pop_raster)
Any help would be much appreciated!

Example data:
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f)
values(v) <- data.frame(population=1:12, value=round(c(2:13)/14, 2))
r <- rast(ext(v)+.05, ncols=4, nrows=6, names="cell")
Illustrate the data
p <- as.polygons(r)
plot(p, lwd=2, col="gray", border="light gray")
lines(v, col=rainbow(12), lwd=2)
txt <- paste0(v$value, " (", v$population, ")")
text(v, txt, cex=.8, halo=TRUE)
Solution:
# area of the polygons
v$area1 <- expanse(v)
# intersect with raster cell boundaries
values(r) <- 1:ncell(r)
p <- as.polygons(r)
pv <- intersect(p, v)
# area of the polygon parts
pv$area2 <- expanse(pv)
pv$frac <- pv$area2 / pv$area1
Now we just use the data.frame with the attributes of the polygons to compute the polygon-cover-weighted-population-weighted values.
z <- values(pv)
a <- aggregate(z[, "frac", drop=FALSE], z[,"cell",drop=FALSE], sum)
names(a)[2] <- 'fsum'
z <- merge(z, a)
z$weight <- z$population * z$frac / z$fsum
z$wvalue <- z$value * z$weight
b <- aggregate(z[, c("wvalue", "weight")], z[, "cell", drop=FALSE], sum)
b$bingo <- b$wvalue / b$weight
Assign values back to raster cells
x <- rast(r)
x[b$cell] <- b$bingo
Inspect results
plot(x)
lines(v)
text(x, digits=2, halo=TRUE, cex=.9)
text(v, "value", cex=.8, col="red", halo=TRUE)
This may not scale very well to large data sets, but you could perhaps do it in chunks.

This is fast and scalable:
library(data.table)
library(terra)
# make the 3 polygons with radius = 5km
center_points <- data.frame(lon = c(0.5, 0.65, 1),
lat = c(0.75, 0.65, 1),
Population = c(16, 18, 24),
Value = c(0.4, 0.1, 0.8))
polygon <- vect(center_points, crs = "EPSG:4326")
polygon <- buffer(polygon, 5000)
# make the raster
my_raster <- rast(nrow = 3, ncol = 3, xmin = 0, xmax = 1.5, ymin = 0, ymax = 1.5, crs = "EPSG:4326")
my_raster[] <- 0 # set the value to 0 for now
# find the fractions of cells in each polygon
# "cells" gives you the cell ID and "weights" (or "exact") gives you the cell fraction in the polygon
# using "exact" instead of "weights" is more accurate
my_Table <- extract(my_raster, polygon, cells = TRUE, weights = TRUE)
setDT(my_Table) # convert to datatable
# merge the polygon attributes to "my_Table"
poly_Table <- setDT(as.data.frame(polygon))
poly_Table[, ID := 1:nrow(poly_Table)] # add the IDs which are the row numbers
merged_Table <- merge(my_Table, poly_Table, by = "ID")
# find Frac_included
merged_Table[, Frac_included := weight / sum(weight), by = ID]
# find Pop_included
merged_Table[, Pop_included := Frac_included * Population]
# find Weight, to avoid confusion with "weight" produced above, I call this "my_Weight"
merged_Table[, my_Weight := Pop_included / sum(Pop_included), by = cell]
# final results
Result <- merged_Table[, .(Result = sum(Value * my_Weight)), by = cell]
# add the values to the raster
my_raster[Result$cell] <- Result$Result
plot(my_raster)

Related

Finding the peak of a mountain

so I've combined those 2 rasters and made them into one dem raster which contains elevation values:
dem1 = read_stars("srtm_43_06.tif")
dem2 = read_stars("srtm_44_06.tif")
pol = st_read("israel_borders.shp")
dem = st_mosaic(dem1, dem2)
dem = dem[, 5687:6287, 2348:2948]
names(dem) = "elevation"
dem = st_warp(src = dem, crs = 32636, method = "near", cellsize = 90)
Now I need to calculate a point geometry of the peak of the mountain by finding the centroid of the pixel that has the highest elevation in the image, does anyone know what functions I can use?

Building on Grzegorz Sapijaszko's example, here is an alternative path to the top of the mountain.
library(terra)
f <- system.file("ex/elev.tif", package="terra")
x <- rast(f)
If there is a single maximum, you can do
g <- global(x, which.max)
xyFromCell(x, g[,1])
# x y
#[1,] 6.020833 50.17917
Now, consider a situation with multiple maxima. I add three more cells with the maximum value.
x[c(1000, 2500, 5000)] <- 547
We can find the four highest peaks with:
g <- global(x, which.max)[[1]]
v <- x[g] |> unlist()
y <- ifel(x == v, v, NA)
p <- as.points(y)
crds(p)
#[1,] 6.020833 50.17917
#[2,] 6.154167 50.10417
#[3,] 5.987500 49.97083
#[4,] 6.237500 49.75417
You should not warp (project with terra) the raster data first because that changes the cell values and potentially the location of the highest peak. You should find the peaks with the original data, but then you can transform the results like this.
pp <- project(p, "EPSG:32636")
crds(pp)
# x y
#[1,] -1411008 5916157
#[2,] -1404896 5904422
#[3,] -1422145 5894509
#[4,] -1413735 5864236
With your files, you could start with something like
ff <- c("srtm_43_06.tif", "srtm_44_06.tif")
v <- vrt(ff)
g <- global(x, which.max)
And then continue as in the examples above.

Let's use terra, however similar approach can be applied by raster package as well. For testing purposes we will use raster supplied with terra package
library(terra)
#> terra 1.5.12
f <- system.file("ex/elev.tif", package="terra")
v <- rast(f)
plot(v)
You can check the details of your raster just typing the raster object name and pressing enter, you can check the min and max values with minmax() function form terra:
minmax(v)
#> elevation
#> [1,] 141
#> [2,] 547
Let's create another raster by copying original one, however checking if the value is the max value of elevation:
w <- v == minmax(v)[2]
plot(w)
Let's create a substitution matrix, and substitute all FALSE with NA and TRUE with 1:
mx <- matrix(c(FALSE, NA, TRUE, 1), ncol = 2, byrow = TRUE)
w <- classify(w, mx)
plot(v)
plot(as.polygons(w), add=TRUE)
Let's find centroids of those polygon(s):
pts <- centroids(as.polygons(w))
plot(pts, add=TRUE)
Let's see our coordinates:
as.data.frame(pts, geom = "WKT")
#> elevation geometry
#> 1 1 POINT (6.020833 50.179167)
Created on 2022-01-29 by the reprex package (v2.0.1)

R raster calculate areal weighted mean when scaling to a larger resolution with offset

I have two raster grids in R with different resolutions which don't line up exactly. In actual fact I have hundreds of each so any answer must be easily run many times.
I want to scale the finer resolution grid up to the coarser resolution by taking an areal weighted mean of the grid cells.
I was hoping I could use projectRaster or resample but neither give the desired output and I cannot use aggregate as I need my new grids to align to the coarser resolution grid.
For my real data my finer grid is 0.005 deg intervals and coarser is at 0.02479172 deg intervals and extents/origins don't exactly match up.
I've made an extreme version as an example why neither resample or projectRaster work
library(raster)
#> Warning: package 'raster' was built under R version 3.5.3
#> Loading required package: sp
testproj <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84"
testmat <- matrix(1, nrow = 8, ncol = 8)
# testmat <- matrix(sample(1:10, 64, replace = T), nrow = 8, ncol = 8)
testmat[1,5] <- 400
testmat[8,4] <- -400
testsmallraster <- raster(testmat, xmn=0, xmx=8, ymn=0, ymx=8)
crs(testsmallraster) <- testproj
plot(testsmallraster)
testlarger <- raster(matrix(rep(NA,4), nrow = 2, ncol = 2), xmn=0.3, xmx=8.3, ymn=0, ymx=8)
crs(testlarger) <- testproj
tout_reproj <- projectRaster(testsmallraster, testlarger)
tout_resamp <- resample(testsmallraster, testlarger)
tout_resampngb <- resample(testsmallraster, testlarger, method = "ngb")
tout_agg <- aggregate(testsmallraster, fact = 4)
#reprojected values ignore all but 4 cells closest to new centre
values(tout_reproj)
#> [1] 1 1 1 1
#resample uses bilinear interpolation which weights the grids cells furthest from the new centre less than those closest
# I need all grid cells entirely contained in the new grid to have equal weighting
#bilinear interpolation also weights cells which do not fall within the new cell at all which I do not want
values(tout_resamp)
#> [1] 10.851852 15.777778 -7.911111 -12.366667
#aggregate gives close to the values I want but they are not in the new raster origin/resolution and therefore not splitting values that fall across grid boundaries
values(tout_agg)
#> [1] 1.0000 25.9375 -24.0625 1.0000
#using ngb was never really going to make any sense but thought I'd as it for completeness
values(tout_resampngb)
#> [1] 1 1 1 1
#desired output first cell only 0.3 of a grid cell covers the grid cell = 400 the rest equal 1
#desired output second cell 0.7 of a grid cell covers the grid cell = 400 the rest equal 1
#desired output third cell has exactly 1 grid cell of -400 and 15 of 1
#desired output fourth cell only overlap grid cells = 1
desiredoutput <- raster(matrix(c((15.7*1+0.3*400)/16,(15.3*1+0.7*400)/16,mean(c(-400, rep(1,15))),1),byrow = T, nrow = 2, ncol = 2), xmn=0.3, xmx=8.3, ymn=0, ymx=8)
values(desiredoutput)
#> [1] 8.48125 18.45625 -24.06250 1.00000
Created on 2020-07-02 by the reprex package (v0.3.0)

You can get closer to the desired result by using a similar spatial resolution for resample, and then aggregate the results
library(raster)
testproj <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +datum=WGS84"
testmat <- matrix(1, nrow = 8, ncol = 8)
testmat[1,5] <- 400
testmat[8,4] <- -400
testsmallraster <- raster(testmat, xmn=0, xmx=8, ymn=0, ymx=8, crs=testproj)
testlarger <- raster(matrix(rep(NA,4), nrow = 2, ncol = 2), xmn=0.3, xmx=8.3, ymn=0, ymx=8, crs = testproj)
y <- disaggregate(testlarger, 4)
z <- resample(testsmallraster, y)
za <- aggregate(z, 4)
values(za)
#[1] 8.48125 18.45625 -24.06250 1.00000
for much better speed, try terra
library(terra)
a <- rast(testsmallraster)
b <- rast(testlarger)
b <- disaggregate(b, 4)
d <- resample(a, b)
da <- aggregate(d, 4)
values(da)
# layer
#[1,] 8.48125
#[2,] 18.45625
#[3,] -24.06250
#[4,] 1.00000
This probably ought to be done automatically by resample and project(Raster). raster attempts to do some of this for resample, but in this case not very satisfactorily.

When I needed to do similar resampling, this worked for me. This example is a 4-cell destination grid at 1o x 1o spacing with centroids at half degrees (to match some satellite data), and an offset half-degree grid for source data (ECMWF weather). 'Resample' does the heavy lifting of interpolating on mismatched grids. The code below is basically a manual version of a 'weights=' option that doesn't exist for resample. We need relative, not absolute, areas to be correct for weighting, so the caveat on the precision of raster::area described in the help seems of low concern.
library(raster)
wgs84 <- "+init=epsg:4326"
polar.brick.source <- array(dim = c(5, 5, 2), rep(c(1, 2), each = 25))
dimnames(polar.brick.source)[[1]] <- seq(-1, 1, by = .5)
dimnames(polar.brick.source)[[2]] <- seq(80, 82, by = .5)
dimnames(polar.brick.source)[[3]] <- c("time.a", "time.b")
# Add some outliers to see their effects.
polar.brick.source[1, 2, ] <- c(25, 50)
polar.brick.source[3, 2, 2] <- -30
polar.brick <- brick(polar.brick.source, crs = CRS(wgs84),
xmn = min(as.numeric(dimnames(polar.brick.source)[[1]])) - .25,
xmx = max(as.numeric(dimnames(polar.brick.source)[[1]])) + .25,
ymn = min(as.numeric(dimnames(polar.brick.source)[[2]])) - .25,
ymx = max(as.numeric(dimnames(polar.brick.source)[[2]])) + .25)
fine.polar.area <- raster::area(polar.brick)
polar.one.degree.source <- data.frame(
lon = c(-.5, .5, -.5, .5),
lat = c(80.5, 80.5, 81.5, 81.5),
placeholder = rep(1, 4))
polar.one.degree.raster <- rasterFromXYZ(polar.one.degree.source, crs = CRS(wgs84))
polar.one.degree.area <- raster::area(polar.one.degree.raster)
as.data.frame(polar.one.degree.area, xy = T)
fine.clip.layer <- disaggregate(polar.one.degree.raster, 2)
clipped.fine.polar <-resample(polar.brick * fine.polar.area,
fine.clip.layer)
new.weighted.wx <- aggregate(clipped.fine.polar * 4, 2)
as.data.frame(new.weighted.wx, xy = T) # look at partial results.
new.weather <- new.weighted.wx / polar.one.degree.area
as.data.frame(new.weather, xy = T)

Mapping slope of an area and returning percent above and below a threshold in R

I am trying to figure our the proportion of an area that has a slope of 0, +/- 5 degrees. Another way of saying it is anything above 5 degrees and below 5 degrees are bad. I am trying to find the actual number, and a graphic.
To achieve this I turned to R and using the Raster package.
Let's use a generic country, in this case, the Philippines
{list.of.packages <- c("sp","raster","rasterVis","maptools","rgeos")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)}
library(sp) # classes for spatial data
library(raster) # grids, rasters
library(rasterVis) # raster visualisation
library(maptools)
library(rgeos)
Now let's get the altitude information and plot the slopes.
elevation <- getData("alt", country = "PHL")
x <- terrain(elevation, opt = c("slope", "aspect"), unit = "degrees")
plot(x$slope)
Not very helpful due to the scale, so let's simply look at the Island of Palawan
e <- drawExtent(show=TRUE) #to crop out Palawan (it's the long skinny island that is roughly midway on the left and is oriented between 2 and 8 O'clock)
gewataSub <- crop(x,e)
plot(gewataSub, 1)## Now visualize the new cropped object
A little bit better to visualize. I get a sense of the magnitude of the slopes and that with a 5 degree restriction, I am mostly confined to the coast. But I need a little bit more for analysis.
I would like Results to be something to be in two parts:
1. " 35 % (made up) of the selected area has a slope exceeding +/- 5 degrees" or " 65 % of the selected area is within +/- 5 degrees". (with the code to get it)
2. A picture where everything within +/- 5 degrees is one color, call it good or green, and everything else is in another color, call it bad or red.
Thanks

There are no negative slopes, so I assume you want those that are less than 5 degrees
library(raster)
elevation <- getData('alt', country='CHE')
x <- terrain(elevation, opt='slope', unit='degrees')
z <- x <= 5
Now you can count cells with freq
f <- freq(z)
If you have a planar coordinate reference system (that is, with units in meters or similar) you can do
f <- cbind(f, area=f[,2] * prod(res(z)))
to get areas. But for lon/lat data, you would need to correct for different sized cells and do
a <- area(z)
zonal(a, z, fun=sum)
And there are different ways to plot, but the most basic one
plot(z)

You can use reclassify from the raster package to achieve that. The function assigns each cell value that lies within a defined interval a certain value. For example, you can assign cell values within interval (0,5] to value 0 and cell values within the interval (5, maxSlope] to value 1.
library(raster)
library(rasterVis)
elevation <- getData("alt", country = "PHL")
x <- terrain(elevation, opt = c("slope", "aspect"), unit = "degrees")
plot(x$slope)
e <- drawExtent(show = TRUE)
gewataSub <- crop(x, e)
plot(gewataSub$slope, 1)
m <- c(0, 5, 0, 5, maxValue(gewataSub$slope), 1)
rclmat <- matrix(m, ncol = 3, byrow = TRUE)
rc <- reclassify(gewataSub$slope, rclmat)
levelplot(
rc,
margin = F,
col.regions = c("wheat", "gray"),
colorkey = list(at = c(0, 1, 2), labels = list(at = c(0.5, 1.5), labels = c("<= 5", "> 5")))
)
After the reclassification you can calculate the percentages:
length(rc[rc == 0]) / (length(rc[rc == 0]) + length(rc[rc == 1])) # <= 5 degrees
[1] 0.6628788
length(rc[rc == 1]) / (length(rc[rc == 0]) + length(rc[rc == 1])) # > 5 degrees
[1] 0.3371212

If raster value NA search and extract the nearest non-NA pixel

On extracting values of a raster to points I find that I have several NA's, and rather than use a buffer and fun arguments of extract function, instead I'd like to extract the nearest non-NA Pixel to a point that overlaps NA.
I am using the basic extract function:
data.extr<-extract(loc.thr, data[,11:10])

Here's a solution without using the buffer. However, it calculates a distance map separately for each point in your dataset, so it might be ineffective if your dataset is large.
set.seed(2)
# create a 10x10 raster
r <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10)
r[] <- 1:10
r[sample(1:ncell(r), size = 25)] <- NA
# plot the raster
plot(r, axes=F, box=F)
segments(x0 = 0, y0 = 0:10, x1 = 10, y1 = 0:10, lty=2)
segments(y0 = 0, x0 = 0:10, y1 = 10, x1 = 0:10, lty=2)
# create sample points and add them to the plot
xy = data.frame(x=runif(10,1,10), y=runif(10,1,10))
points(xy, pch=3)
text(x = xy$x, y = xy$y, labels = as.character(1:nrow(xy)), pos=4, cex=0.7, xpd=NA)
# use normal extract function to show that NAs are extracted for some points
extracted = extract(x = r, y = xy)
# then take the raster value with lowest distance to point AND non-NA value in the raster
sampled = apply(X = xy, MARGIN = 1, FUN = function(xy) r#data#values[which.min(replace(distanceFromPoints(r, xy), is.na(r), NA))])
# show output of both procedures
print(data.frame(xy, extracted, sampled))
# x y extracted sampled
#1 5.398959 6.644767 6 6
#2 2.343222 8.599861 NA 3
#3 4.213563 3.563835 5 5
#4 9.663796 7.005031 10 10
#5 2.191348 2.354228 NA 2
#6 1.093731 9.835551 2 2
#7 2.481780 3.673097 3 3
#8 8.291729 2.035757 9 9
#9 8.819749 2.468808 9 9
#10 5.628536 9.496376 6 6

This is a raster-based solution, by first filling the NA pixels with the nearest non-NA pixel value.
Note however, that this does not take into account the position of a point within a pixel. Instead, it calculates the distances between pixel centers to determine the nearest non-NA pixel.
First, it calculates for each NA raster pixel the distance and direction to the nearest non-NA pixel. The next step is to calculate the coordinates of this non-NA cell (assumes projected CRS), extract its value and to store this value at the NA location.
Starting data: a projected raster, with identical values as in the answer from koekenbakker:
set.seed(2)
# set projected CRS
r <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10, crs='+proj=utm +zone=1')
r[] <- 1:10
r[sample(1:ncell(r), size = 25)] <- NA
# create sample points
xy = data.frame(x=runif(10,1,10), y=runif(10,1,10))
# use normal extract function to show that NAs are extracted for some points
extracted <- raster::extract(x = r, y = xy)
Calculate the distance and direction from all NA pixels to the nearest non-NA pixel:
dist <- distance(r)
# you can also set a maximum distance: dist[dist > maxdist] <- NA
direct <- direction(r, from=FALSE)
Retrieve coordinates of NA pixels
# NA raster
rna <- is.na(r) # returns NA raster
# store coordinates in new raster: https://stackoverflow.com/a/35592230/3752258
na.x <- init(rna, 'x')
na.y <- init(rna, 'y')
# calculate coordinates of the nearest Non-NA pixel
# assume that we have a orthogonal, projected CRS, so we can use (Pythagorean) calculations
co.x <- na.x + dist * sin(direct)
co.y <- na.y + dist * cos(direct)
# matrix with point coordinates of nearest non-NA pixel
co <- cbind(co.x[], co.y[])
Extract values of nearest non-NA cell with coordinates 'co'
# extract values of nearest non-NA cell with coordinates co
NAVals <- raster::extract(r, co, method='simple')
r.NAVals <- rna # initiate new raster
r.NAVals[] <- NAVals # store values in raster
Fill the original raster with the new values
# cover nearest non-NA value at NA locations of original raster
r.filled <- cover(x=r, y= r.NAVals)
sampled <- raster::extract(x = r.filled, y = xy)
# compare old and new values
print(data.frame(xy, extracted, sampled))
# x y extracted sampled
# 1 5.398959 6.644767 6 6
# 2 2.343222 8.599861 NA 3
# 3 4.213563 3.563835 5 5
# 4 9.663796 7.005031 10 10
# 5 2.191348 2.354228 NA 3
# 6 1.093731 9.835551 2 2
# 7 2.481780 3.673097 3 3
# 8 8.291729 2.035757 9 9
# 9 8.819749 2.468808 9 9
# 10 5.628536 9.496376 6 6
Note that point 5 takes another value than the answer of Koekenbakker, since this method does not take into account the position of the point within a pixel (as mentioned above). If this is important, this solution might not be appropriate. In other cases, e.g. if the raster cells are small compared to the point accuracy, this raster-based method should give good results.

For a raster stack, use #koekenbakker's solution above, and turn it into a function. A raster stack's #layers slot is a list of rasters, so, lapply it across and go from there.
#new layer
r2 <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10)
r2[] <- 1:10
r2[sample(1:ncell(r2), size = 25)] <- NA
#make the stack
r_stack <- stack(r, r2)
#a function for sampling
sample_raster_NA <- function(r, xy){
apply(X = xy, MARGIN = 1,
FUN = function(xy) r#data#values[which.min(replace(distanceFromPoints(r, xy), is.na(r), NA))])
}
#lapply to get answers
lapply(r_stack#layers, function(a_layer) sample_raster_NA(a_layer, xy))
Or to be fancy (speed improvements?)
purrr::map(r_stack#layers, sample_raster_NA, xy=xy)
Which makes me wonder if the whole thing can be sped up even more using dplyr...

Analyze data for spatial join (points to grid) and generate new dataset in R

I've got a dataset with longitude/latitude points and an outcome value for each set of coordinates. I would like to create a spatial grid and then take the average of outcomes for coordinates that are in the same grid and generate a new dataframe for which each coordinate is assigned a grid number and has the averaged outcome. For example, starting with this code:
require(sp)
require(raster)
frame <- data.frame(x = c(7.5, 8.2, 8.3), y = c(1,4,4.5), z = c(10,15,30))
coordinates(frame) <- c("x", "y")
proj4string(frame) <- CRS("+proj=longlat")
grid <- GridTopology(cellcentre.offset= c(0,0), cellsize = c(2,2), cells.dim = c(5,5))
sg <- SpatialGrid(grid)
poly <- as.SpatialPolygons.GridTopology(grid)
proj4string(poly) <- CRS("+proj=longlat")
plot(poly)
text(coordinates(poly), labels = row.names(poly), col = "gray", cex. =.6)
points(frame$x, frame$y, col = "blue", cex = .8)
I would then like to average the outcomes (z) within grid cells and produce an dataframe that looks like this (.e.g. observation):
x y z grid grid_mean
1 7.5 1.0 10 g20 10
2 8.2 4.0 15 g15 22.5
3 8.3 4.5 30 g15 22.5
Thanks for any and all help.

You can use the over(...) function in package sp for this. You don't need package raster at all, as far as I can see.
require(sp)
frame <- data.frame(x = c(7.5, 8.2, 8.3), y = c(1,4,4.5), z = c(10,15,30))
points <- SpatialPoints(frame)
proj4string(points) <- CRS("+proj=longlat")
grid <- GridTopology(cellcentre.offset= c(0,0), cellsize = c(2,2), cells.dim = c(5,5))
sg <- SpatialGrid(grid)
poly <- as.SpatialPolygons.GridTopology(grid)
proj4string(poly) <- CRS("+proj=longlat")
# identify grids...
result <- data.frame(frame,grid=over(points,poly))
# calculate means...
result <- merge(result,aggregate(z~grid,result,mean),by="grid")
# rename and reorder columns to make it look like your result
colnames(result) <- c("grid","x","y","z","grid_mean")
result <- result[,c(2,3,4,1,5)]
result
# x y z grid grid_mean
# 1 8.2 4.0 15 15 22.5
# 2 8.3 4.5 30 15 22.5
# 3 7.5 1.0 10 25 10.0
The over(x,y,...) function compares two Spatial* objects as overlays and returns a vector with the index into y of each geometry in x. In this case x is a SpatialPoints object and y is a SpatialPolygons object. So over(...) identifies the polygon ID (grid cell) in y associated with each point in x. The rest just calculates the means, merges the means with the original data frame, and renames and reorders the columns so the result looks like your result.
I tweaked your code a bit because it didn't make sense: you create a data frame with z-values, then convert it to a SpatialPoints object, which discards the z-values...