I am trying to take a raster of soils data for one state, crop it by county, change the cell values in each county (to the county fips code), and then re-merge the county rasters back into the state raster.
Here I read in the state soils raster (which by default has the map unit key associated with each soil type as the cell value) and a polygon layer of US counties. I then select the polygons for one state, transform them to the same coordinate system as the soils raster, and then crop out the soils rasters and polygons for two example counties.
library(raster)
library(rgdal)
state_soils_raster <- raster("MapunitRaster_IL_10m.tif")
us_county_polygons <- readOGR("cb_2016_us_county_500k/cb_2016_us_county_500k.shp")
IL_county_polygons <- us_county_polygons[us_county_polygons$STATEFP == 17,]
IL_county_polygons <- spTransform(IL_county_polygons, crs(state_soils_raster))
county1 <- "Douglas"
county2 <- "Coles"
county1_polygon <- IL_county_polygons[IL_county_polygons$NAME %in% county1,]
county2_polygon <- IL_county_polygons[IL_county_polygons$NAME %in% county2,]
county1_raster <- crop(state_soils_raster, county1_polygon)
county2_raster <- crop(state_soils_raster, county2_polygon)
If I plot each county by itself, you can see that the extent of the cropped region is rectangular and extends beyond the area of the county itself. The coloring is crazy because the mukey values are all over the place (although typically grouped by county). County1 lies just to the north of County2.
plot(county1_raster)
plot(county1_polygon, add = T)
plot(county2_raster)
plot(county2_polygon, add = T)
If I leave the values as is and merge the two county rasters back together, everything is fine. Even though the extents of the two rasters do overlap, the cell values are identical regardless of which raster merge is pulling from. I'm not actually sure which raster merge pulls from in this case, but it doesn't really matter. Everything fits back together nicely and the cell values are correct.
both_counties_raster <- merge(county1_raster, county2_raster)
plot(both_counties_raster)
plot(county1_polygon, add = T)
plot(county2_polygon, add = T)
However, what I want to do is to change the cell values by county prior to recombining the county rasters.
values(county1_raster) <- 1
values(county2_raster) <- 2
both_counties_raster_new <- merge(county1_raster, county2_raster)
Everything merges just fine, but when I now plot the new combined raster it is clear that, for cells contained in both county rasters, merge just took the cell values from one of the rasters. Clearly merge prioritizes the first input raster by default.
plot(both_counties_raster_new)
plot(county1_polygon, add = T)
plot(county2_polygon, add = T)
What I'm looking for is to just change the cell values within the boundaries of each county and then merge all the counties back together again.
I am aware of the raster::mask function, which can set anything outside the county boundary to NA (described here), but at a 10m cell resolution this takes an insane amount of time!
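For reference, a minimal sketch of that mask step with the objects created above (this is exactly the step that is far too slow at 10m):
# set all cells outside the county boundary to NA (correct, but very slow at 10m)
county1_masked <- mask(county1_raster, county1_polygon)
county2_masked <- mask(county2_raster, county2_polygon)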
I have also tried an alternative approach using the raster::rasterize function to turn the county boundary polygons into a raster with the same cell size and extent as the state soils raster. Again, with a 10m cell resolution this takes forever: I was able to process one county on each of my 8 cores in 1.5 hours, and I've got a whole country to do!
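For concreteness, a sketch of what that whole-state rasterize step looks like (assuming the STATEFP and COUNTYFP attributes from the census shapefile):
# burn the county FIPS code into a raster aligned with the state soils raster
# (this is the step that is far too slow at 10m with raster::rasterize;
# the fasterize package might be a faster substitute, but I have not timed it)
IL_county_polygons$FIPS <- as.numeric(paste0(IL_county_polygons$STATEFP,
                                             IL_county_polygons$COUNTYFP))
county_fips_raster <- rasterize(IL_county_polygons, state_soils_raster, field = "FIPS")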
I am not aware of any 10m raster US county dataset, although that would be amazing if someone pointed me to that.
The soils data is gSSURGO data. I'm also not aware whether gSSURGO has a county attribute within its many tables; if it's there, I can't find it. That would also be an easy solution.
It may not be quicker, but have you tried raster::cellFromPolygon?
Here is a simple example:
library(raster)
library(sp)
# Create a raster with zero values
r <- raster(ncols=30, nrows=30, res = 1/3)
values(r) <- 0
# Create polygons
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
pols <- SpatialPolygons(list(Polygons(list(Polygon(cds1)), 1), Polygons(list(Polygon(cds2)), 2)))
plot(r)
plot(pols, add = TRUE)
r2 <- r
# Find which cells are in which polygons
cellpol <- cellFromPolygon(r, pols)
# Not a really clean way to assign values in the global environment...
lapply(1:length(cellpol), function(x) values(r2)[cellpol[[x]]] <<- x)
plot(r2)
plot(pols, add = TRUE)
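If the <<- assignment feels awkward, a small variant of the same idea (not in the original answer) uses an ordinary loop:
# same result without assigning into the global environment
for (i in seq_along(cellpol)) {
  r2[cellpol[[i]]] <- i
}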
What I'm looking for is a way to rasterize (or fasterize) the geometries within each tile of a certain extent, step by step, and to join the parts of the rasterized geometries into complete raster objects.
## create sample data
# create a frame
library(sf)
ob = st_sf(st_sfc(st_polygon(list(rbind(c(0,0), c(0,9), c(6,9), c(6,0), c(0,0))))))
# create tiles
library(GSIF)
tl <- getSpatialTiles(as(ob, 'Spatial'), block.x=3, overlap.percent=2)
plot(tl)
# create sample polygons
g <- st_sfc(st_point(c(1,2)), st_point(c(5,6)), st_point(c(2,4)), st_point(c(3,3)), st_point(c(3,4)), st_point(c(4,5)))
g.b <- st_buffer(g,0.6)
p <- st_sf(value = ceiling(10*runif(6)),
geometry = st_sfc(g.b))
plot(p, add=TRUE)
Cropping the polygons (or parts of polygons) that are within each tile works fine, I guess. My actual goal is to process the data tile by tile. It looks like the following loop does this and also joins the parts of the polygons back into entire polygons. Well, the output (cr) is the same as the input (p)... I thought this step might be necessary in order to rasterize the resulting (parts of) polygons in the next step; I didn't expect that the polygons would be joined. I was trying to build a loop based on the intermediate result, so the following lines were added:
# crop polygons (or parts) for each tile
result <- p
for(i in 1:length(tl)) {cr <- rbind(result, st_crop(p, tl[i]))}
cr <- cr[-7,]
library(scales)
plot(st_geometry(cr), col=alpha("white", 0.5), add=TRUE)
However, I struggle rasterizing the (intermediate) results.
Try this:
myF <- function(i){return(st_crop(p, tl[i]))}
cr <- do.call(rbind, lapply(1:length(tl), myF))
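From there, one way to finish the rasterizing step could be to rasterize each cropped tile separately and merge the pieces. A sketch only, assuming every tile contains at least one polygon and using a hypothetical 0.1 tile resolution:
library(raster)
rasterize_tile <- function(i) {
  tile_polys <- st_crop(p, tl[i])                # polygons (or parts) falling in tile i
  template <- raster(extent(tl[i]), res = 0.1)   # hypothetical tile template
  rasterize(as(tile_polys, "Spatial"), template, field = "value")
}
tile_rasters <- lapply(1:length(tl), rasterize_tile)
full_raster <- do.call(merge, tile_rasters)      # join the tile rasters back together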
I have a NetCDF file of global oceanographic (OmegaA) data at relatively coarse spatial resolution with 33 depth levels. I also have a global bathymetry raster at much finer resolution. My goal is to get the seabed OmegaA data from the NetCDF file, using the bathymetry data to determine the desired depth. My code so far:
library(raster)
library(rgdal)
library(ncdf4)
# Aragonite data. Defaults to CRS WGS84
ncin <- nc_open("C:/..../GLODAPv2.2016b.OmegaA.nc")
ncin.depth <- ncvar_get(ncin, "Depth")# 33 depth levels
omegaA.brk <- brick("C:/.../GLODAPv2.2016b.OmegaA.nc")
omegaA.brk <- rotate(omegaA.brk) # because the netCDF is in Lon 0-360.
# depth raster. CRS WGS84
r <- raster("C:/....GEBCO.tif")
# resample the raster brick to the resolution that matches the bathymetry raster
omegaA.brk <- resample(omegaA.brk, r, method="bilinear")
# create blank final raster
omegaA.rast <- raster(ncol = r@ncols, nrow = r@nrows)
extent(omegaA.rast) <- extent(r)
omegaA.rast[] <- NA_real_
# create vector of indices of desired depth values
depth.values <- getValues(r)
depth.values.index <- which(!is.na(depth.values))
# loop to find appropriate raster brick layer, and extract the value at the desired index, and insert into blank raster
for (p in depth.values.index) {
dep.index <- which(abs(ncin.depth + depth.values[p]) == min(abs(ncin.depth + depth.values[p]))) ## this sometimes results in multiple levels being selected
brk.level <- omegaA.brk[[dep.index]] # can be more than one level if multiple layers are selected above
omegaA.rast[p] <- brk.level[[1]][p] ## here I choose the first level if multiple levels have been selected above
print(paste(p, "of", length(depth.values.index))) # counter to look at progress.
}
The problem: The result is a raster with massive gaps (NAs) in it where there should be data. The gaps often take a distinctive shape - eg, follow a contour, or along a long straight line. I've pasted a cropped example.
I think this could be because either 1) for some reason the which() statement in the loop is not finding a match, or 2) a misalignment of the projections is created, which I've read can happen when using rotate().
I've tried to make sure all the extents, resolutions, number of cells, and CRS's are all the same, which they seem to be.
To speed up the process I've cropped the global brick and bathy raster to my area of interest, again checking that all the spatial resolutions, etc etc match - I've not included those steps here for simplicity.
At a loss. Any help welcome!
Without a reproducible example, this kind of problem is hard to solve. I can't tell where your problem is, but I'll present the approach I would try. Maybe it's good, maybe it's bad, I don't know, but it may inspire you to find a way around your problem.
To my understanding, you have a brick of OmegaA (33 layers/depths) and a bathymetry raster. You want to get the OmegaA value at the bottom of the sea. Here is how I would do it:
Resample the OmegaA brick to the same resolution and extent as the bathymetry raster.
Transform the bathymetry raster into a raster brick of 33 layers of 0-1 values. E.g. if the sea bottom is at 200m for one particular pixel, then this pixel is 0 on every depth layer other than the 200m one, and 1 on the 200m layer. To program this, I would go the long way, something like:
r_1 <- r
values(r_1) <- values(r)==10 # where 10 is the depth (it could be a range with < or >)
r_2 <- r
values(r_2) <- values(r)==20
...
r_33 <- r
values(r_33) <- values(r)==250
r_brick <- brick(r_1, r_2, ..., r_33)
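A more compact way to build the same 0/1 depth brick (a sketch with the same exact-match caveat; in practice you would use depth ranges rather than strict equality):
# one 0/1 layer per depth level: 1 where the bathymetry equals that level
depth_layers <- lapply(ncin.depth, function(d) r == d)
r_brick <- brick(stack(depth_layers))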
Then you multiply both raster bricks. They have the same dimensions, so it should be easy. The output should be a raster brick of 33 layers that is 0 everywhere except at the bottom of the sea, where it holds the OmegaA value.
Combine all the layers of the brick obtained previously into a single raster with a sum.
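In code, those last two steps might look like this (a sketch, assuming both bricks are aligned and have 33 layers each):
# elementwise multiply, then collapse the 33 layers into one raster
bottom.brk <- omegaA.brk * r_brick
omegaA.bottom <- calc(bottom.brk, fun = sum, na.rm = TRUE)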
This should work. If you have problems dealing with raster bricks, you could turn the data into base R arrays; it could be simpler.
Good luck.
New to spatial analysis on R here. I have a shapefile for the USA that I downloaded from HERE. I also have a set of lat/long points (half a million) that lie within the contiguous USA.
I'd like to find the "most remote spot" -- the spot within the contiguous USA that's farthest from the set of points.
I'm using the rgdal, raster and sp packages. Here's a reproducible example with a random sample of 10 points:
library(rgdal)
library(raster)
library(sp)
# Set wd to the folder tl_2010_us_state_10
usa <- readOGR(dsn = ".", layer = "tl_2010_us_state10")
# Sample 10 points in USA
sample <- spsample(usa, 10, type = "random")
# Set extent for contiguous united states
ext <- extent(-124.848974, -66.885444, 24.396308, 49.384358)
# Rasterize USA
r <- raster(ext, nrow = 500, ncol = 500)
rr <- rasterize(usa, r)
# Find distance from sample points to cells of USA raster
D <- distanceFromPoints(object = rr, xy = sample)
# Plot distances and points
plot(D)
points(sample)
After the last two lines of code, I get this plot.
However, I'd like it to be over the rasterized map of the USA. And, I'd like it to only consider distances from cells that are in the contiguous USA, not all cells in the bounding box. How do I go about doing this?
I'd also appreciate any other tips regarding the shape file I'm using -- is it the best one? Should I be worried about using the right projection, since my actual dataset is lat/long? Will distanceFromPoints be able to efficiently process such a large dataset, or is there a better function?
To limit raster D to the contiguous USA you could find the elements of rr assigned values of NA (i.e. raster cells within the bounding box but outside of the usa polygons), and assign these same elements of D a value of NA.
D[which(is.na(rr[]))] <- NA
plot(D)
lines(usa)
You can use 'proj4string(usa)' to find the projection info for the usa shapefile. If your coordinates of interest are based on a different projection, you can transform them to match the usa shapefile projection as follows:
my_coords_xform <- spTransform(my_coords, CRS(proj4string(usa)))
Not sure about the relative efficiency of distanceFromPoints, but it only took ~ 1 sec to run on my computer using your example with 10 points.
I think you were looking for the mask function.
library(raster)
usa <- getData('GADM', country='USA', level=1)
# exclude Alaska and Hawaii
usa <- usa[!usa$NAME_1 %in% c( "Alaska" , "Hawaii"), ]
# get the extent and create raster with preferred resolution
r <- raster(floor(extent(usa)), res=1)
# rasterize polygons
rr <- rasterize(usa, r)
set.seed(89)
sample <- spsample(usa, 10, type = "random")
# Find distance from sample points to cells of USA raster
D <- distanceFromPoints(object = rr, xy = sample)
# remove areas outside of polygons
Dm <- mask(D, rr)
# an alternative would be mask(D, usa)
# cell with highest value
mxd <- which.max(Dm)
# coordinates of that cell
pt <- xyFromCell(r, mxd)
plot(Dm)
points(pt)
The distances should be fine, also when using long/lat data. But distanceFromPoints could indeed be a bit slow with a large data set as it uses a brute-force algorithm.
Managed to solve the problem now.
I have a set of around 50 thousand points that have coordinates and one value associated with them. I would like to be able to place points into a grid averaging the associated value of all points that fall into a grid square. So I want to end up with an object that identifies each grid square and gives the average inside the grid square.
I have the data in a spatial points data frame and a spatial grid object if that helps.
Improving the question: I have definitely done some searching; sorry about the initial state of the question, I had only managed to frame it inside my own head and hadn't had to communicate it to anyone else before...
Here is example data that hopefully illustrates the problem more clearly
library(sp)
##make some data
longi <- runif(100,0,10)
lati <- runif(100,0,10)
value <- runif(100,20,30)
##put in data frame then change to spatial data frame
df <- data.frame("lon"=longi,"lat"=lati,"val"=value)
coordinates(df) <- c("lon","lat")
proj4string(df) <- CRS("+proj=longlat")
##create a grid that bounds the data
grd <- GridTopology(cellcentre.offset=bbox(df)[,1],
cellsize=c(1,1),cells.dim=c(11,11))
sg <- SpatialGrid(grd)
Then I hope to get an object, be it a vector, data frame, or list, that gives me the average of value in each grid cell/square and some way of identifying which cell it is.
Solution
##convert the grid into a polygon##
polys <- as.SpatialPolygons.GridTopology(grd)
proj4string(polys) <- CRS("+proj=longlat")
##can now use the function over to select the correct points and average them
results <- rep(0, length(polys))
for(i in 1:length(polys)) {
results[i] = mean(df$val[which(!is.na(over(x=df,y=polys[i])))])
}
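For reference, a more compact variant of the same over-based idea (a sketch that reuses df and polys from above) avoids the explicit loop:
##one call: over() with fn = mean averages the point values falling in each polygon
results2 <- over(polys, df["val"], fn = mean)
##results2$val holds the per-cell means (NA where a grid cell contains no points)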
My question now is whether this is the best way to do it, or is there a more efficient way?
Your description is vague at best. Please try to ask more specific questions, preferably with code illustrating what you have already tried. Averaging a single value in your point data or a single raster cell makes absolutely no sense.
The best guess at an answer I can provide is to use raster::extract() to assign the raster values to an sp point object and then use tapply() to aggregate the values by your grouping variable in the points. You can use the coordinates of the points to identify cell locations or, alternatively, the cell numbers returned by extract (per the example below).
require(raster)
require(sp)
# Create example data
r <- raster(ncol=500, nrow=500)
r[] <- runif(ncell(r))
pts <- sampleRandom(r, 100, sp=TRUE)
# Add a grouping value to points
pts@data <- data.frame(ID=rownames(pts@data), group=c(rep(1,25), rep(2,25),
                       rep(3,25), rep(4,25)))
# Extract raster values and add to @data slot dataframe. Note, the "cells"
# attribute indicates the cell index in the raster.
pts@data <- data.frame(pts@data, extract(r, pts, cellnumbers=TRUE))
head(pts@data)
# Use tapply to calculate group means
tapply(pts@data$layer, pts@data$group, FUN=mean)
I've been running into all sorts of issues using ArcGIS ZonalStats and thought R could be a great way. Saying that I'm fairly new to R, but got a coding background.
The situation is that I have several rasters and a polygon shape file with many features of different sizes (though all features are bigger than a raster cell and the polygon features are aligned to the raster).
I've figured out how to get the mean value for each polygon feature using the raster library with extract:
#load packages required
require(rgdal)
require(sp)
require(raster)
require(maptools)
# ---Set the working directory-------
datdir <- "/test_data/"
#Read in a ESRI grid of water depth
ras <- readGDAL("test_data/raster/pl_sm_rp1000/w001001.adf")
#convert it to a format recognizable by the raster package
ras <- raster(ras)
#read in polygon shape file
proxNA <- readShapePoly("test_data/proxy/PL_proxy_WD_NA_test")
#plot raster and shp
plot(ras)
plot(proxNA)
#calc mean depth per polygon feature
#unweighted - only assigns grid to district if centroid is in that district
proxNA@data$RP1000 <- extract(ras, proxNA, fun = mean, na.rm = TRUE, weights = FALSE)
#check results
head(proxNA)
#plot depth values
spplot(proxNA[,'RP1000'])
The issue I have is that I also need an area-based ratio between the area of the polygon and all non-NA cells in the same polygon. I know what the cell size of the raster is and I can get the area for each polygon, but the missing link is the count of all non-NA cells in each feature. I managed to get the cell numbers of all the cells in each polygon with proxNA@data$Cnumb1000 <- cellFromPolygon(ras, proxNA), and I'm sure there is a way to get the actual values of those raster cells, which then requires a loop to count all the non-NA cells, etc.
BUT, I'm sure there is a much better and quicker way to do that! If any of you has an idea or can point me in the right direction, I would be very grateful!
I do not have access to your files, but based on what you described, this should work:
library(raster)
mask_layer=shapefile(paste0(shapedir,"AOI.shp"))
original_raster=raster(paste0(template_raster_dir,"temp_raster_DecDeg250.tif"))
nonNA_raster=!is.na(original_raster)
masked_img=mask(nonNA_raster,mask_layer) #based on centroid location of cells
nonNA_count=cellStats(masked_img, sum)
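If you need the count per polygon feature rather than for a single AOI, one possible extension (a sketch reusing the ras and proxNA objects from your question; I have not timed it) would be:
# count the non-NA depth cells inside each polygon feature:
# !is.na(ras) gives a 0/1 raster, so summing it per polygon gives the count
proxNA@data$nonNA_cells <- as.vector(extract(!is.na(ras), proxNA, fun = sum, na.rm = TRUE))
# area-based ratio: non-NA cell area over polygon area (cell area from res(ras))
proxNA@data$ratio <- (proxNA@data$nonNA_cells * prod(res(ras))) / area(proxNA)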