Zonal statistics to get majority pixel value per polygon in R? - r

Actually I try to calculate the major pixel values from a raster with a SpatialPolygonsDataFrame. Here is some code I found which might lead in the right direction:
library(raster)
# Create interger class raster
r <- raster(ncol=36, nrow=18)
r[] <- round(runif(ncell(r),1,10),digits=0)
r[]<-as.integer(r[])
# Create two polygons
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
polys <- SpatialPolygonsDataFrame(SpatialPolygons(list(Polygons(list(Polygon(cds1)), 1),
Polygons(list(Polygon(cds2)),2))),data.frame(ID=c(1,2)))
# Extract raster values to polygons
( v <- extract(r, polys) )
# Get class counts for each polygon
v.counts <-lapply(v,table)
So far everything is fine but I´m really stuck to extract the column name of the column which has the highest counts.
I tried things like:
v.max<- lapply(v.counts,max)
But there the column information gets lost. After:
v.max<- lapply(v.counts, max.col)
I get just "1" as result.
I´d appreciate if somebody can give me a hint what I´m doing wrong. Is there also another way to extract the major pixel values in a polygon?

which.max() is your friend. Since you just want the names, use names().
sapply(v.counts, function(x) names(x)[which.max(x)])
# [1] "9" "5"
Note: set.seed(42)

exactextractr package can do this trick. It computes zonal statistics even faster than terra in some cases. See comparison here
library(exactextractr)
exact_extract(r, polys, 'majority')
#> Warning in .exact_extract(x, sf::st_as_sf(y), ...): No CRS specified for
#> polygons; assuming they have the same CRS as the raster.
#> |======================================================================| 100%
#> [1] 4 2

You can use the modal function
v <- extract(r, polys, modal)

Related

R raster::crop() The upper boundary of my cropped raster is always horizontal- why?

I'm trying to crop a large multipolygon shapefile by a single, smaller polygon. It works using st_intersection, however this takes a very long time, so I'm instead trying to convert the multipolygon to a raster, and crop that raster by the smaller polygon.
## packages - sorry if I've missed any!
library(raster)
library(rgdal)
library(fasterize)
library(sf)
## load files
shp1 <- st_read("pathtoshp", crs = 27700) # a large multipolygon shapefile to crop
### image below created using ggplot- ignore the black boundaries!
shp2 <- st_read("pathtoshp", crs = 27700) # a single, smaller polygon shapefile, to crop shp1 by
plot(shp2)
## convert to raster (faster than st_intersection)
projection1 <- CRS('+init=EPSG:27700')
rst_template <- raster(ncols = 1000, nrows = 1000,
crs = projection1,
ext = extent(shp1))
rst_shp1 <- fasterize(shp1, rst_template)
plot(rst_shp1)
rst_shp2 <- crop(rst_shp1, shp2)
plot(rst_shp2)
When I plot shp2, the upper boundary is flat, rather than fitting the true boundary of the shp2 polygon.
Any help would be greatly appreciated!
Maybe try raster::mask() instead of crop(). crop() uses the second argument as an extent with which to crop a raster; i.e. it's taking the bounding box (extent) of your second argument and cropping that entire rectangle from your raster.
Something important to understand about raster objects is that they are all rectangular. The white space you see surrounding your shape are just NA values.
raster::mask() will take your original raster, and a spatial object (raster, sf, etc.) and replace all values in your raster which don't overlap with your spatial object to NA (by default, you can supply other replacement values). Though I will say, mask() will likely also take awhile to run, so you may be better off just sticking with sf objects.
I would suggest moving to the "terra" package (faster and easier to use than "raster").
Here is an example.
library(terra)
r <- rast(system.file("ex/elev.tif", package="terra"))
v <- vect(system.file("ex/lux.shp", package="terra"))[4]
x <- crop(r, v)
plot(x); lines(v)
As edixon1 points out, a raster is always rectangular. If you want to set cells outside of the polygon to NA, you can do
x <- crop(r, v, mask=TRUE)
plot(x); lines(v)
In this example it makes no sense, but you could first rasterize
x <- crop(r, v)
y <- rasterize(v, x)
m <- mask(x, y)
plot(m); lines(v)
I am not sure if this answers your question. But if it does not, then please edit your question to make it reproducible, for example using the example data above.

Extract Raster Pixels Values Using Vector Polygons in R

I have been struggling with this for hours.
I have a shapefile (called "shp") containing 177 polygons i.e. 177 counties. This shapefile is overlaid on a raster. My raster (called "ras") is made of pixels having different pollution values.
Now I would like to extract all pixel values and their number of occurrences for each polygon.
This is exactly what the QGIS function "zonal histogram" is doing. But I would like to do the exact same thing in R.
I tried the extract() function and I managed to get a mean value per county, which is already a first step, but I would like to make a pixels distribution (histogram).
Could someone give me a hand ?
Many thanks,
Marie-Laure
Thanks a lot for your help. Next time I promise I will be careful and explain my issue more in details.
With your help I managed to find a solution.
I also used this website : http://zevross.com/blog/2015/03/30/map-and-analyze-raster-data-in-r/
For information, first I had to uninstall the "tidyr" package because there was a conflict with the extract function.
In case it can help someone, here is the final code :
# Libraries loading
library(raster)
library(rgdal)
library(sp)
# raster layer import
ras=raster("C:/*.tif")
# shapefile layer import
shp<-shapefile("C:/*.shp")
# Extract the values of the pixels raster per county
ext <- extract(ras, shp, method='simple')
# Function to tabulate pixel values by region & return a data frame
tabFunc <- function(indx, extracted, region, regname) {
dat <- as.data.frame(table(extracted[[indx]]))
dat$name <- region[[regname]][[indx]]
return(dat)
}
# run through each county & compute a table of the number
# of raster cells by pixel value. ("CODE" is the county code)
tabs <- lapply(seq(ext), tabFunc, ext, shp, "CODE")
# assemble into one data frame
df <- do.call(rbind, tabs)
# to see the data frame in R
print(df)
# table export
write.csv(df,"C:/*.csv", row.names = FALSE)
Here is a minimal, self-contained, reproducible example (almost literally from ?raster::extract, so not difficult to make)
library(raster)
r <- raster(ncol=36, nrow=18, vals=rep(1:9, 72))
cds1 <- rbind(c(-180,-20), c(-160,5), c(-60, 0), c(-160,-60), c(-180,-20))
cds2 <- rbind(c(80,0), c(100,60), c(120,0), c(120,-55), c(80,0))
polys <- spPolygons(cds1, cds2)
Now you can do
v <- extract(r, polys)
par(mfrow=c(1,2))
z <- lapply(v, hist)
Or more fancy
mains <- c("first", "second")
par(mfrow=c(1,2))
z <- lapply(1:length(v), function(i) hist(v[[i]], main=mains[i]))
Or do you want a barplot
z <- lapply(1:length(v), function(i) barplot(table(v[[i]]), main=mains[i]))

How to calculate area of multipart polygon in R

I'm having trouble calculating in R the area of an imported shapefile that has a multipart polygon (one feature containing two separate polygons). I noticed that ArcMap gave me a different value for the area of a shapefile than raster::area. To figure out which program was giving me the correct area, I broke the shapefile into single parts and recalculated the area of the two separate polygons:
library(raster)
> single_part <- shapefile("../Desktop/test/test_sp.shp")
> area(single_part)
[1] 575924.0 433409.8
> sum(area(single_part))
[1] 1009334
>
> multi_part <- shapefile("../Desktop/test/test_mp.shp")
> area(multi_part)
[1] 1018390
I realize now that I know about this problem, I should always break up polygon feature classes into single parts, but does anyone know how raster::area calculates the area of multipart polygons? I also tried using rgeos::gArea but got the same result. Is there a way to calculate the area of multipart polygons in R?
I'd love to know, because they're pretty common and I'm trying to switch from doing all my analyses in ArcMap to R.
In case it's helpful, here's an image of the shapefile:
multipart poly shapefile
EDIT ADDED 9/21/2018 -------------------------------------------------------
Here's a link to the shapefile test_mp.shp
From what I can tell, it seems like the problem stems from how R (vs. ArcMap) interprets the holes. See the difference between the ArcMap display and the R display. For some reason R is filling in those holes as part of the shapefile, which must be the reason that I'm getting different calculations for the area. Is there something wrong with the shapefile, or how I'm importing it?
Clearly your object named 'multi_part' has only one (multi?) polygon, as area returns a single value. I illustrate here how to investigate what you are after:
library(raster)
d <- getData('GADM', country='Isle of Man', level=0)
area(d)
[1] 579672897
Split into 4 polygons (islands)
dd <- disaggregate(d)
a <- area(dd)
a
[1] 19424.12 2705442.41 25629.79 576922400.90
sum(a)
[1] 579672897
The same area, and there is no reason why they would be different. Except perhaps if there is confusion with polygon holes. It is difficult to comment without your data.
You can write these objects to disk (see below) and see what ArcGIS gives you as area (but note that this example uses lon/lat coordinates, I am not sure if ArcGIS can compute areas on those).
shapefile(d, "man.shp")
Here is a case with and without a hole:
p1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60), c(-180,-20))
p2 <- rbind(c(-150,-20), c(-100,-10), c(-110,20), c(-150,-20))
# two (overlapping) polygons (no hole)
pol1 <- spPolygons(p1, p2, crs="+proj=utm +zone=1 +datum=WGS84")
# single polygon with hole
pol2 <- spPolygons(list(p1, p2), crs="+proj=utm +zone=1 +datum=WGS84")
a <- area(pol1) / 10e+9
b <- area(pol2) / 10e+9
a
#[1] 10925 800
sum(a)
#[1] 11725
a[1]-a[2]
#[1] 10125
b matches a[1] - a[2], as expected
b
#[1] 10125
I get exactly the same results with ArcGIS, using "calculate geometry" for a field in the attribute tables.

How to subset a raster based on grid cell values

My following question builds on the solution proposed by #jbaums on this post: Global Raster of geographic distances
For the purpose of reproducing the example, I have a raster dataset of distances to the nearest coastline:
library(rasterVis); library(raster); library(maptools)
data(wrld_simpl)
# Create a raster template for rasterizing the polys.
r <- raster(xmn=-180, xmx=180, ymn=-90, ymx=90, res=1)
# Rasterize and set land pixels to NA
r2 <- rasterize(wrld_simpl, r, 1)
r3 <- mask(is.na(r2), r2, maskvalue=1, updatevalue=NA)
# Calculate distance to nearest non-NA pixel
d <- distance(r3) # if claculating distances on land instead of ocean: d <- distance(r3)
# Optionally set non-land pixels to NA (otherwise values are "distance to non-land")
d <- d*r2
levelplot(d/1000, margin=FALSE, at=seq(0, maxValue(d)/1000, length=100),colorkey=list(height=0.6), main='Distance to coast (km)')
The data looks like this:
From here, I need to subset the distance raster (d), or create a new raster, that only contains cells for which the distance to coastline is less than 200 km. I have tried using getValues() to identify the cells for which the value <= 200 (as show below), but so far without success. Can anyone help? Am I on the right track?
#vector of desired cell numbers
my.pts <- which(getValues(d) <= 200)
# create raster the same size as d filled with NAs
bar <- raster(ncols=ncol(d), nrows=nrow(d), res=res(d))
bar[] <- NA
# replace the values with those in d
bar[my.pts] <- d[my.pts]
I think this is what you are looking for, you can treat a raster like a matrix here right after you d <- d*r2 line:
d[d>=200000]<-NA
levelplot(d/1000, margin=FALSE, at=seq(0, maxValue(d)/1000, length=100),colorkey=list(height=0.6), main='Distance to coast (km)')
(in case you forgot: the unit is in meters so the threshold should be 200000, not 200)

unwanted subgeometries when converting raster to polygons

I am converting many rasters to polygon. But in quite a few cases, I am seeing unexpected subgeometries, and I can't seem to get rid of them.
This is with R v3.3.3 and raster package v2.5-8.
Here is an example that should reproduce the problem I am having.
You can download the raster that I use here.
# first, read in raster and coarsen to something more manageable
library(raster)
library(rgeos)
env <- raster('adefi.tif')
env2 <-aggregate(env, 8)
# Reclassify such that cells are either 1 or NA
env2[!is.na(env2)] <- 1
# this is what the raster now looks like:
plot(env2)
# Now I convert to polygon, choosing to dissolve
p <- rasterToPolygons(env2, dissolve=T)
plot(p)
# I find that I can't get rid of these subgeometries
p <- gUnaryUnion(p) # identical result
gIsValid(p) # returns TRUE
I'm not sure where the problem is... Is it in how the raster package converts to cell polygons? Or is it how the rgeos package dissolves those cell polygons together?
Is there a work-around?
It looks like a projection issue. This works for me:
library(raster)
library(rgeos)
env <- raster(file.path(fp, "adefi.tif"))
env2 <- aggregate(env, 8)
env2[is.na(env2) == F] <- 1
# Project Raster
proj_env2 <- projectRaster(env2, crs = CRS("+init=epsg:3577"))
p <- rasterToPolygons(proj_env2, dissolve = T)
plot(p)
Not sure why the need for reprojection since epsg:3577 looks to be the same as the original projection, but I usually confirm projection using proj4string() or spTransform() to make sure everything will line up.

Resources