Population density within polygons - r

So, I have some questions regarding the raster package in R. I have a raster with the estimated population at each grid point. I also have a shapefile with polygons of regions. I want to find the coordinates of the neighborhood with the highest population density within each region. Suppose that each neighborhood is a homogeneous square of 5 by 5 grid points.
The following toy example mimics my problem.
library(raster)
library(maptools)
library(sf)     # for st_as_sf() and st_geometry()
library(dplyr)  # for the pipe, filter() and select()
set.seed(123)
data(wrld_simpl)
wrld_simpl <- st_as_sf(wrld_simpl)
contr_c_am <- wrld_simpl %>%
  filter(SUBREGION == 13) %>%
  filter(FIPS != "MX") %>%
  select(NAME)
# Create a raster of population (sorry for the bad example spatial distribution)
r <- raster(xmn=-180, xmx=180, ymn=-90, ymx=90, res=0.1)
values(r) <- runif(ncell(r), 0, 100)
# keep only raster around the region of interest
r_small <- crop(r, extent(contr_c_am))
plot(r_small)
plot(st_geometry(contr_c_am), add = T)
raster_contr_c_am <- rasterize(contr_c_am, r)
raster_contr_c_am is the regions rasterized onto the population grid, and the name of each region is saved as an attribute.
Somehow I need to filter only the grid points from one region, and probably use some function like focal() to find the total nearby population.
focal(raster_contr_c_am, matrix(1,5,5),sum, pad = T, padValue = 0)
Then, I need to find which grid point has the highest value within each region, and save its coordinates.
I hope my explanation is not too confusing.
Thanks for any help!

Here's an example that iterates over the shape defining the region, then uses the raster values within the region and the focal() function to find the maximum.
library(raster)
library(maptools)
library(sf)
library(dplyr)
set.seed(123)
data(wrld_simpl)
wrld_simpl <- st_as_sf(wrld_simpl)
contr_c_am <- wrld_simpl %>%
  filter(SUBREGION == 13) %>%
  filter(FIPS != "MX") %>%
  select(NAME)
# Create a raster of population (sorry for the bad example spatial distribution)
r <- raster(xmn=-180, xmx=180, ymn=-90, ymx=90, res=0.1)
values(r) <- runif(ncell(r), 0, 100)
# keep only raster around the region of interest
r_small <- crop(r, extent(contr_c_am))
raster_contr_c_am <- rasterize(contr_c_am, r_small)
# function to find the max raster value using focal
# in a region
findMax <- function(region, raster) {
  tt <- trim(mask(raster, region))                   # focus on the region
  ff <- focal(tt, w = matrix(1/25, nc = 5, nr = 5))  # 5x5 moving average
  maximumCell <- which.max(ff)                       # find the maximum cell id
  maximumvalue <- maxValue(ff)                       # find the maximum value
  maximumx <- xFromCell(ff, maximumCell)             # get the coordinates
  maximumy <- yFromCell(ff, maximumCell)
  # return a data frame
  data.frame(maximumx, maximumy, maximumvalue)
}
numberOfShapes <- nrow(contr_c_am)
ll <- lapply(1:numberOfShapes, function(s) findMax(region = contr_c_am[s,], raster = r_small))
merged <- do.call(rbind, ll)
maxpoints <- st_as_sf(merged, coords=c('maximumx', 'maximumy'), crs=crs(contr_c_am))
library(mapview) # optional but nice visualization - select layers to see if things look right
mapview(maxpoints) + mapview(r_small) + mapview(contr_c_am)
I've made an sf object so that it can be plotted with the other spatial objects. Using the mapview package, I get an interactive map of the result (screenshot omitted).
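For what it's worth, here is a hedged sketch of the same idea ported to the terra package, raster's successor; this is my own addition, not part of the original answer, and it assumes r_small and contr_c_am as built above.
library(terra)
rt <- rast(r_small)                                # RasterLayer -> SpatRaster
ft <- focal(rt, w = 5, fun = "sum", na.rm = TRUE)  # 5x5 neighborhood totals
region1 <- mask(ft, vect(contr_c_am[1, ]))         # keep only cells in the first region
best <- which.max(values(region1))                 # cell id of the maximum (NAs ignored)
xyFromCell(region1, best)                          # its coordinates
Wrapping the last three lines in a function and lapply()-ing over the regions would reproduce the full result.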

Related

Raster output from rasterized sf points does not align with points

I am trying to rasterize some points and am getting a mismatch between the points and the raster despite the CRS being the same. If I convert the raster to polygons it lines up perfectly with the sf points data, but I can't figure out why the raster doesn't.
library(spData)
library(sf)
library(raster)
library(mapview)
## import some data
cycle_hire_osm = spData::cycle_hire_osm
## project to metres
cycle_hire_osm_projected = st_transform(cycle_hire_osm, crs = 27700)
## create raster template to rasterize to
raster_template <- raster(extent(cycle_hire_osm_projected), nrows = 10, ncols = 10, crs = 27700)
## rasterize the points
ch_raster1 = rasterize(cycle_hire_osm_projected, raster_template,
                       field = 'capacity', fun = sum, crs = 27700)
## convert raster to polygons
ch_poly <- rasterToPolygons(ch_raster1)
If these are plotted, there are raster cells that have a value but contain no points.
## plot on a map
mapview(ch_poly)+cycle_hire_osm_projected+ch_raster1
Additional example based on the reply, showing the output in base, mapview and leaflet (note: I had to install the development versions of mapview and leaflet in order to plot SpatRaster objects).
library(spData)
library(sf)
library(terra)
library(dplyr)
# remove NAs so they are not considered
dat <- spData::cycle_hire_osm %>% filter(!is.na(capacity))
v <- vect(dat)
r <- rast(v, nrows=10, ncols=10)
chr <- rasterize(v, r, field="capacity", fun=sum, na.rm=TRUE)
## base plot
plot(chr)
points(v, cex=.5)
points(v[is.na(v$capacity)], cex=.5, col="red")
## mapview
library(mapview)
mapview(v)+chr
##leaflet
library(leaflet)
leaflet() |>
  addProviderTiles(providers$CartoDB.Positron) |>
  addCircles(data = v) |>
  addRasterImage(chr)
All three plots show the same raster, but the raster appears to have a different number of cells with values in each plot?
Adding an example with project = FALSE, as explained by @RobertHijmans:
leaflet() |>
  addProviderTiles(providers$CartoDB.Positron) |>
  addCircles(data = v) |>
  addRasterImage(chr, project = FALSE)
I do not see that issue with "raster" or with its replacement, "terra", when using base plot:
library(spData)
library(terra)
# using a SpatVector for easier plotting; results are the same
v <- vect(spData::cycle_hire_osm)
v <- project(v, "epsg:27700")
r <- rast(v, nrows=10, ncols=10)
chr <- rasterize(v, r, field="capacity", fun=sum, na.rm=TRUE)
plot(chr)
points(v, cex=.5)
points(v[is.na(v$capacity)], cex=.5, col="red")
But note that there are some cells with values of zero where all values of v$capacity are NA. That is because
sum(NA, na.rm=TRUE)
#[1] 0
To avoid that, you could do
vv <- v[!is.na(v$capacity)]
chr <- rasterize(vv, r, field="capacity", fun=sum, na.rm=TRUE)
The reason you see differences when using mapview/leaflet is that these packages transform your data to the CRS that they use. To avoid that, use the Pseudo-Mercator (EPSG:3857) CRS and, in leaflet, use project = FALSE when adding the raster data.
addRasterImage(chr, project=FALSE)
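As a hedged sketch of the first suggestion (my own addition, assuming chr and v from above): reproject the raster to Pseudo-Mercator yourself before plotting, with method = "near" so the cell values are not interpolated.
library(terra)
library(mapview)
# warp once to the CRS that web maps use; "near" keeps the original cell values
chr_3857 <- project(chr, "epsg:3857", method = "near")
mapview(v) + chr_3857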

Repeat for loop for all rows of a spatial points data frame

I want to calculate the shortestPath distance (using the gdistance package) between a set of geographic coordinates, using a transition layer of the ocean to prevent 'movement' across land.
Here is how I created the transition layer:
library(raster); library(gdistance); library(maptools); library(rgdal); library(sp)
mapcrs <- "+proj=longlat +datum=WGS84 +no_defs"
data(wrld_simpl)
world <- wrld_simpl
worldshp <- spTransform(world, mapcrs)
ras <- raster(nrow=300,ncol=300)
crs(ras) <- crs(worldshp)
extent(ras) <- extent(worldshp)
landmask <- rasterize(worldshp, ras)
landras <- is.na(landmask)
tr <- transition(landras, transitionFunction = mean, directions = 8, symm = FALSE)
tr = geoCorrection(tr, scl=FALSE)
I then want to calculate the shortestPath distance between every coordinate in my dataset, i.e. location 1 to location n, location 2 to location n, etc.
Let's produce some hypothetical geographic coordinates and convert to spatial points
x <- rnorm(10, mean = -40, sd=5)
y <- rnorm(10, mean = 20, sd=5)
xy <- cbind(x,y); colnames(xy) <- c("lon","lat")
xy <- SpatialPoints(xy); projection(xy) <- projection(mapcrs)
Using the shortestPath function in gdistance, I can calculate the distance from the first coordinate (i.e. xy[1]) to all other xy coordinates, like so.
dist <- shortestPath(tr, origin = xy, goal = xy, output="SpatialLines")
I then tried to apply a for loop to sequentially calculate distance from location 1 to all other locations, and then calculating distance from location 2 to all other locations etc., which I wrote as follows:
for (i in seq_along(xy)) {
  AtoB <- shortestPath(tr, origin = xy[i, ], goal = xy, output = "SpatialLines")
  i <- i + 1
}
This, however, still only calculates the distances relative to the first xy spatial point and does not 'loop' for all subsequent rows. I don't know what I'm doing wrong. It's probably super-easy, but I'm struggling. Any help would be appreciated.
Thanks in advance,
Tony
---- UPDATE ----
We have come up with a bit of a workaround (thanks Charley Clubley), but it still won't produce outputs for every spatial line. This will generate a matrix of distances.
The work around is as follows:
Using xy as a matrix, not spatial points:
distances <- matrix(ncol = nrow(xy), nrow = nrow(xy))
xy_b <- xy ## coords need to be a matrix (not spatial points)
## This generates an error indicating there are no more rows to delete once complete, but the computation works
for (i in 1:nrow(xy_b)) {
  AtoB <- shortestPath(tr, xy_b, xy, output = "SpatialLines")
  length <- SpatialLinesLengths(AtoB)
  distances[i, ] <- length
  xy_b <- xy_b[-1, ]
}
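For the record, here is a hedged sketch of a loop that keeps every result (my own suggestion, assuming tr and xy as defined in the question). The original loop overwrites AtoB on every iteration, so only the last result survives, and the manual i <- i + 1 is unnecessary because for handles the increment itself.
n <- nrow(coordinates(xy))  # number of points
paths <- vector("list", n)
for (i in 1:n) {
  # shortest paths from point i to every point, stored instead of overwritten
  paths[[i]] <- shortestPath(tr, origin = xy[i, ], goal = xy,
                             output = "SpatialLines")
}
# one row of pairwise path lengths per origin point
distances <- do.call(rbind, lapply(paths, SpatialLinesLengths))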

Extract Raster Pixels Values Using Vector Polygons in R

I have been struggling with this for hours.
I have a shapefile (called "shp") containing 177 polygons i.e. 177 counties. This shapefile is overlaid on a raster. My raster (called "ras") is made of pixels having different pollution values.
Now I would like to extract all pixel values and their number of occurrences for each polygon.
This is exactly what the QGIS function "zonal histogram" is doing. But I would like to do the exact same thing in R.
I tried the extract() function and I managed to get a mean value per county, which is already a first step, but I would like to make a distribution of pixel values (a histogram).
Could someone give me a hand?
Many thanks,
Marie-Laure
Thanks a lot for your help. Next time I promise I will be careful and explain my issue in more detail.
With your help I managed to find a solution.
I also used this website : http://zevross.com/blog/2015/03/30/map-and-analyze-raster-data-in-r/
For information, first I had to uninstall the "tidyr" package because its extract() conflicts with raster::extract() (calling raster::extract() explicitly would also avoid the clash).
In case it can help someone, here is the final code :
# Libraries loading
library(raster)
library(rgdal)
library(sp)
# raster layer import
ras=raster("C:/*.tif")
# shapefile layer import
shp<-shapefile("C:/*.shp")
# Extract the raster pixel values for each county
ext <- extract(ras, shp, method='simple')
# Function to tabulate pixel values by region & return a data frame
tabFunc <- function(indx, extracted, region, regname) {
  dat <- as.data.frame(table(extracted[[indx]]))
  dat$name <- region[[regname]][[indx]]
  return(dat)
}
# run through each county & compute a table of the number
# of raster cells by pixel value. ("CODE" is the county code)
tabs <- lapply(seq(ext), tabFunc, ext, shp, "CODE")
# assemble into one data frame
df <- do.call(rbind, tabs)
# to see the data frame in R
print(df)
# table export
write.csv(df,"C:/*.csv", row.names = FALSE)
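As a small follow-on sketch (my addition, not part of the original post): the long table can be reshaped into a QGIS-style zonal histogram, one row per county and one column per pixel value, using base R alone.
# rows = counties, columns = pixel values, cells = counts
zonal_hist <- xtabs(Freq ~ name + Var1, data = df)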
Here is a minimal, self-contained, reproducible example (almost literally from ?raster::extract, so not difficult to make)
library(raster)
r <- raster(ncol=36, nrow=18, vals=rep(1:9, 72))
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
polys <- spPolygons(cds1, cds2)
Now you can do
v <- extract(r, polys)
par(mfrow=c(1,2))
z <- lapply(v, hist)
Or, more fancy:
mains <- c("first", "second")
par(mfrow=c(1,2))
z <- lapply(1:length(v), function(i) hist(v[[i]], main=mains[i]))
Or do you want a barplot?
z <- lapply(1:length(v), function(i) barplot(table(v[[i]]), main=mains[i]))
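And a hedged terra-based sketch of the same tabulation (my addition; terra is raster's successor, and its extract() returns a data frame with an ID column identifying the polygon):
library(terra)
rt <- rast(ncols = 36, nrows = 18, vals = rep(1:9, 72))
pv <- vect(polys)             # sp polygons -> SpatVector
e <- extract(rt, pv)          # one row per cell; "lyr.1" is the default layer name
tapply(e$lyr.1, e$ID, table)  # per-polygon counts of each pixel value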

Difficulty with gBuffer in R: Resulting buffer is incorrect size

The overall goal of this code is to generate random points within a circular buffer based around a single lat/long point which I will enter in as needed. My apparent issue is that the buffer generated from gBuffer is not the correct size/location and therefore the points are farther than desired from the input location.
I am attempting to create a 130 meter buffer around a point. To construct my code I have been using 44.55555, -68.55555. I am using decimal degrees lat/long as that is what my data is in.
I have tried multiple Stack Overflow threads to find the answer, including:
Buffer (geo)spatial points in R with gbuffer
Create buffer and count points in R
#Enter in the lat and Long
NestLat <- readline(prompt="Enter Nest Latitude:") #Use 44.55555
NestLong <- readline(prompt="Enter Nest Longitude:") #Use -68.55555
#Coordinate from text to spatial points
NestLat <- as.numeric(NestLat)
NestLong <- as.numeric(NestLong)
nestcoords <- cbind(NestLat, NestLong)
nestcoords_sp <- SpatialPoints(nestcoords, proj4string=CRS("+proj=longlat +datum=WGS84"))
nestcoords_sp <- spTransform(nestcoords_sp, CRS("+init=epsg:2960"))
#Create buffer to generate 3 random points within 130m of nest
nestbuffer130 <- gBuffer(nestcoords_sp, width = 130)
nestbuffer130 <- spTransform(nestbuffer130, CRS("+proj=longlat +datum=WGS84"))
randoms130 <- spsample(nestbuffer130, 3, type = "random")
randoms130 <- spTransform(randoms130, CRS("+proj=longlat +datum=WGS84"))
nestbuffer130spdf <- as(nestbuffer130, "SpatialPolygonsDataFrame")
randoms130 <- as(randoms130, "SpatialPointsDataFrame")
The final buffer seems to be a circle with a radius of about 335 (not the desired 130 m), and it is not placed in the correct location spatially.
How are you measuring the radius? The code seems to work just fine if you just want the sample points in the right location. Here is your code, slightly modified, with the gDistance function showing that your points are within the buffer zone. sf is now the preferred spatial package for R, and its syntax is clearer and easier; I added what your code would look like with the sf package.
library(rgeos)
library(sp)
#Enter in the lat and Long
NestLat <- 44.55555
NestLong <- -68.55555
#Coordinate from text to spatial points
NestLat <- as.numeric(NestLat)
NestLong <- as.numeric(NestLong)
nestcoords <- cbind(NestLong, NestLat) # note: SpatialPoints expects (x, y), i.e. (lon, lat)
nestcoords_sp <- SpatialPoints(nestcoords, proj4string=CRS("+proj=longlat +datum=WGS84"))
nestcoords_sp <- spTransform(nestcoords_sp, CRS("+init=epsg:2960"))
#Create buffer to generate 3 random points within 130m of nest
nestbuffer130 <- gBuffer(nestcoords_sp, width = 130)
randoms130 <- spsample(nestbuffer130, 3, type = "random")
nestbuffer130spdf <- as(nestbuffer130, "SpatialPolygonsDataFrame")
randoms130 <- as(randoms130, "SpatialPointsDataFrame")
# measure distance
gDistance(randoms130, nestcoords_sp, byid = T)
sf
library(sf)
# turn coordinates into spatial points using sf
NestLat <- 44.55555
NestLong <- -68.55555
nestPoints <- st_point(c(NestLong, NestLat)) %>%
  st_sfc(crs = 4326) %>%
  st_transform(crs = 2960)
#Create buffer to generate 3 random points within 130m of nest
nestbuffer130 <- st_buffer(nestPoints, dist = 130)
randoms130 <- st_sample(nestbuffer130, 3)
# measure distance between points
st_distance(nestPoints,randoms130)
nestbuffer130SF <- st_sf(data = data.frame(ID = 1:length(randoms130)),
                         geometry = randoms130, crs = st_crs(randoms130)) %>%
  st_transform(crs = 4326)
# check data visually
library(mapview)
mapview(nestbuffer130SF) + nestPoints
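A note on the design: both versions buffer in a projected CRS (EPSG:2960, a UTM zone with units of metres), which is what makes width/dist = 130 mean 130 metres; buffering directly in unprojected lon/lat degrees would produce a distorted circle of the wrong size.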

Is there a speedy and memory-efficient way to calculate the proportion of overlay between a polygon and a high resolution raster in R?

I need to calculate the proportion of each raster cell in a high resolution grid (raster stack with 8 layers) covered by a polygon using R.
My standard approach would be to use raster::rasterize(..., getCover = TRUE); however, this approach takes a very long time, particularly as the size of the polygon increases.
As an alternative, I tried cropping the raster stack to the extent of the polygon, transforming the raster stack to polygons and calculating the proportion from the intersection of the resulting shapes with the original polygon. This works well for small polygons but breaks down as the polygon grows, because R runs out of memory (I am limited to 16 GB) or because the calculation of the intersection takes too long.
Here is a reproducible example using my current solution with a very small shapefile.
library(raster)
library(spex)
library(dplyr)
library(sf)
library(data.table)
# setup a dummy example
r <- raster(nrow = 21600, ncol = 43200)
r[] <- 1:933120000
r_stack <- stack(r,r,r,r,r,r,r,r)
# get a small dummy shapefile
shp_small <- raster::getData(name = "GADM", country = "CHE", level = 2, download = TRUE)
shp_small <- st_as_sf(shp_small)[1, ]
# for comparison, use a big dummy shapefile
# shp_big <- raster::getData(name = "GADM", country = "BRA", level = 0, download = TRUE)
## Approach for a small shape file
stack_small <- raster::crop(r_stack, shp_small, snap = "out")
## transform to polygon
stack_small_poly <- spex::polygonize(stack_small)
stack_small_poly$ID <- 1:nrow(stack_small_poly)
## I can then perform the necessary calculations on the polygons to obtain
## the proportional overlay
# transform to mollweide for area calculation
mollw <- "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
stack_small_crs <- st_crs(stack_small_poly)
stack_small_poly <- st_transform(stack_small_poly, mollw)
# calculate area for each cell
stack_small_poly$area_org <- st_area(stack_small_poly)
# transform to world equidistant cylindrical
stack_small_poly<- st_transform(stack_small_poly, 4087)
shp_small <- st_transform(shp_small, 4087)
# get all the cells that intersect with the shape (this might take a while)
i <- st_intersects(stack_small_poly, shp_small, sparse = FALSE)
stack_small_poly <- dplyr::filter(stack_small_poly, i)
# now calculate the exact intersection (this might take a while)
st_agr(stack_small_poly) <- "constant"
stack_small_poly <- st_intersection(stack_small_poly, st_geometry(shp_small))
# calculate the new areas and backtransform
stack_small_poly <- st_transform(stack_small_poly, mollw)
stack_small_poly$new_area <- st_area(stack_small_poly)
stack_small_poly <- st_transform(stack_small_poly, stack_small_crs)
# calculate proportion
stack_small_poly$proportion <- as.numeric(stack_small_poly$new_area/stack_small_poly$area_org)
# finally transform to data.table for subsequent analysis
st_geometry(stack_small_poly) <- NULL
setDT(stack_small_poly)
I am looking for a solution in R that is able to perform the task in 10-15 minutes (preferably faster) with a memory limit of 16 GB RAM for the shapefile representing Brazil (see shp_big in the code above).
I am well aware that this optimum might not be achievable, and every suggestion leading to a reduction in execution time and/or memory usage is more than welcome.
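As a hedged pointer, since this question is open here (my addition, assuming r and shp_small from the question): the exactextractr package computes per-cell coverage fractions in C++ without polygonizing the raster, which is exactly the quantity sought, and it is typically much faster and lighter on memory than the sf intersection route.
library(raster)
library(sf)
library(exactextractr)
# coverage_fraction() returns one RasterLayer per polygon, with cell values
# equal to the fraction of each cell covered; crop = TRUE restricts each
# layer to the polygon's extent
cov <- exactextractr::coverage_fraction(r, shp_small, crop = TRUE)[[1]]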
