Summary Statistics of Raster By Latitudinal Intervals - r

Is there a quick way in R to do summary statistics on a raster based on latitudinal intervals or bins. Not a summary of the entire raster layer but spatial subsections. For example, get the mean and sd of raster cell values for every two degrees in latitude.
Below is some example data of a projected raster with Lat/Long coordinates.
set.seed(2013)
library(raster)
r <- raster(xmn=-110, xmx=-90, ymn=40, ymx=60, ncols=40, nrows=40)
r <- setValues(r, rnorm(1600)) #add values to raster
r[r > -0.2 & r < 0.2] <- NA #add some NA's to resemble real dataset
plot(r)
> r
class : RasterLayer
dimensions : 40, 40, 1600 (nrow, ncol, ncell)
resolution : 0.5, 0.5 (x, y)
extent : -110, -90, 40, 60 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
data source : in memory
names : layer
values : -3.23261, 2.861592 (min, max)

Since your raster's resolution is 0.5 and you have 40 rows, you want the mean / sd for every 4 rows:
set.seed(2013)
library(raster)
r <- raster(xmn=-110, xmx=-90, ymn=40, ymx=60, ncols=40, nrows=40)
r <- setValues(r, rnorm(1600)) #add values to raster
r[r > -0.2 & r < 0.2] <- NA #add some NA's to resemble real dataset
rmean <- sapply(seq(1,nrow(r),4),function(rix) mean(r[rix:rix+3,],na.rm=T))
rsd <- sapply(seq(1,nrow(r),4),function(rix) sd(r[rix:rix+3,],na.rm=T))
# > rmean
# [1] -0.033134373 -0.180689704 0.176575934 -0.003422832 -0.049113312 0.234891614 0.188559162 -0.026514169 0.106970362
# [10] 0.096033677
So you're basically indexing the raster as matrix, only using the slices needed for mean / sd. For iteration you could also use lapply, which puts everything in a neat list.

You can aggregate your rows (groups of 4 in this case) and columns (into one column)
a <- aggregate(r, c(ncol(r), 4), fun=mean)
b <- aggregate(r, c(ncol(r), 4), fun=sd)
lat <- yFromRow(a, 1:nrow(a))
plot(lat, values(a))

Related

How can I subset a raster by conditional statement in R using `terra`?

I am trying to plot only certain values from a categorical land cover raster I am working with. I have loaded it in to R using the terra package and it plots fine. However, since the original data did not come with a legend, I am trying to find out which raster value corresponds to what on the map.
Similar to the answer provided here: How to subset a raster based on grid cell values
I have tried using the following line:
> landcover
class : SpatRaster
dimensions : 20057, 63988, 1 (nrow, ncol, nlyr)
resolution : 0.0005253954, 0.0005253954 (x, y)
extent : -135.619, -102, 59.99989, 70.53775 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source : spat_n5WpgzBuVAV3Ijm.tif
name : CAN_LC_2015_CAL_wgs
min value : 1
max value : 18
> plot(landcover[landcover == 18])
Error: cannot allocate vector of size 9.6 Gb
However, this line takes a very long time to run and produces a vector memory error. The object is 1.3 kb in the global environment and the original tif is about 300 mb.
You can use cats to find out which values correspond to which categories.
library(terra)
set.seed(0)
r <- rast(nrows=10, ncols=10)
values(r) <- sample(3, ncell(r), replace=TRUE) - 1
cls <- c("forest", "water", "urban")
levels(r) <- cls
names(r) <- "land cover"
cats(r)[[1]]
# ID category
#1 0 forest
#2 1 water
#3 2 urban
To plot a logical (Boolean) layer for one category, you can do
plot(r == "water")
And from from the above you can see that in this case that is equivalent to
plot(r == 1)
I think I found the solution to write the conditional within the plot function as below:
plot(landcover == 18)
For those looking for a reproduceable example, just load the rlogo:
s <- rast(system.file("ex/logo.tif", package="terra"))
s <- s$red
plot(s == 255)

Calculate overlap of protected area polygons with predicted maxent habitat suitability

I want to calculate the percentage area of habitat suitability of a species that overlaps with protected area polygons. I do not know the R language very well, but here is what I have so far.
These are the attributes of the area of habitat suitability derived from a maxent prediction:
class : RasterLayer
dimensions : 6480, 8520, 55209600 (nrow, ncol, ncell)
resolution : 0.008333333, 0.008333333 (x, y)
extent : -103, -32, -36, 18 (xmin, xmax, ymin, ymax)
crs : +proj=longlat +ellps=WGS84
of the protected areas:
Simple feature collection with 5667 features and 2 fields (with 8 geometries empty)
geometry type: GEOMETRY
dimension: XY
bbox: xmin: -118.6344 ymin: -59.85538 xmax: -25.29094 ymax: 32.48333
CRS: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
Does someone know a way to calculate the percentage area of habitat suitability that overlaps with protected area polygons?
Sorry, I really do not know so much about how to work with these data. I hope I gave all the relevant information.
I would appreciate any input.
To answer your first question, you should be able to use zonal statistics to calculate the area of potential habitat found in protected areas using the spatialEco package:
zonal.stats(x, y, stats = c("min", "mean", "max"))
#x = Polygon object of class SpatialPolygonsDataFrame
#y = rasterLayer object of class raster
https://www.rdocumentation.org/packages/spatialEco/versions/1.3-0/topics/zonal.stats
Here is a reproducible example from the spatialEco package that first calculates the percentage of pixels in each polygon >= a threshold value and second calculates the sum of pixels in each polygon >= the threshold value used to reclassify the input raster. You might be interested in both avenues for your work.
library(spatialEco)
library(raster)
library(sp)
# here the fxn will calculate the percentage of cells >= 0.5
# percent x >= p function
pct <- function(x, p=0.50, na.rm = FALSE) {
if ( length(x[x >= p]) < 1 ) return(0)
if ( length(x[x >= p]) == length(x) ) return(1)
else return( length(x[x >= p]) / length(x) )
}
# create some example data
p <- raster(nrow=10, ncol=10)
p[] <- runif(ncell(p)) * 10
p <- rasterToPolygons(p, fun=function(x){x > 9})
r <- raster(nrow=100, ncol=100)
r[] <- runif(ncell(r))
plot(r)
plot(p, add=TRUE, lwd=4)
# run zonal statistics using pct functions
z.pct <- zonal.stats(x=p, y=r, stats = "pct")
z.pct
#Alternatively, reclassify the raster based on a threshold
r.c<-reclassify(r, c(-Inf, 0.5, 0, 0.5, Inf, 1)) #all values >0.5 reclassified to 1
plot(r.c)
plot(p, add=TRUE, lwd=4) #add poly to the plot
# run zonal stats and calculate sum of cells in each poly
z.sum <- zonal.stats(x=p, y=r.c, stats = "sum")
z.sum

Resample raster

I am trying to resample a forest cover raster with high resolution (25 meters) and categorical data (1 to 13) to a new RasterLayer with a lower resolution (~ 1 km). My idea is to combine the forest cover data with other lower-resolution raster data :
I tried raster::resample(), but since the data is categorical I lost a lot of information:
summary(as.factor(df$loss_year_mosaic_30m))
0 1 2 3 4 5 6 7 8 9 10 11 12 13
3777691 65 101 50 151 145 159 295 291 134 102 126 104 91
As you can see, the new raster has the desired resolution but have lots of zeros as well. I suppose that is normal since I used the ´ngb´ option in resample.
The second strategy was using raster::aggregate() but I find difficult to define a factor integer since the change of resolution is not straightforward (like the double of the resolution or alike).
My high-resolution raster has the following resolution, and I want it to aggregate it to a 0.008333333, 0.008333333 (x, y) resolution to the same extent.
loss_year
class : RasterLayer
dimensions : 70503, 59566, 4199581698 (nrow, ncol, ncell)
resolution : 0.00025, 0.00025 (x, y)
extent : -81.73875, -66.84725, -4.2285, 13.39725 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : /Volumes/LaCie/Deforestacion/Hansen/loss_year_mosaic_30m.tif
names : loss_year_mosaic_30m
values : 0, 13 (min, max)
I have tried a factor of ~33.33 following the description of the aggregate help: "The number of cells is the number of cells of x divided by fact*fact (when fact is a single number)." Nonetheless, the resulting raster data do not seem to have the same number of rows and columns as my other low-resolution rasters.
I have never used this high-resolution data, and I am also computationally limited (some of this commands can be parallelized using clusterR, but sometimes they took the same time than the non-parallelized commands, especially since they do not work for nearest neighboor calculations).
I am short of ideas; maybe I can try layerize to obtain a count raster, but I have to ´aggregate´ and the factor problem arises. Since this processes are taking me days to process, I do want to know the most efficient way to create a lower resolution raster without losing much information
A reproducible example could be the following:
r_hr <- raster(nrow=70, ncol=70) #High resolution raster with categorical data
set.seed(0)
r_hr[] <- round(runif(1:ncell(r_hr), 1, 5))
r_lr <- raster(nrow=6, ncol=6) #Low resolution raster
First strategy: loss of information
r <- resample(r_hr, r_lr, method = "ngb") #The raster data is categorical
Second strategy: difficult to define an aggregate factor
r <- aggregate(r_hr, factor) #How to define a factor to get exactly the same number of cells of h_lr?
Another option: layerize
r_brick <- layerize(r_hr)
aggregate(r_brick, factor) #How to define factor to coincide with the r_lr dimensions?
Thanks for your help!
r_hr <- raster(nrow=70, ncol=70) #High resolution raster with categorical data
set.seed(0)
r_hr[] <- round(runif(1:ncell(r_hr), 1, 5))
r_lr <- raster(nrow=6, ncol=6)
r_hr
#class : RasterLayer
#dimensions : 70, 70, 4900 (nrow, ncol, ncell)
#resolution : 5.142857, 2.571429 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
#data source : in memory
#names : layer
#values : 1, 5 (min, max)
r_lr
#class : RasterLayer
#dimensions : 6, 6, 36 (nrow, ncol, ncell)
#resolution : 60, 30 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
Direct aggregate is not possible, because 70/6 is not an integer.
dim(r_hr)[1:2] / dim(r_lr)[1:2]
#[1] 11.66667 11.66667
Nearest neighbor resampling is not a good idea either as the results would be arbitrary.
Here is a by layer approach that you suggested and dww also showed already.
b <- layerize(r_hr)
fact <- round(dim(r_hr)[1:2] / dim(r_lr)[1:2])
a <- aggregate(b, fact)
x <- resample(a, r_lr)
Now you have proportions. If you want a single class you could do
y <- which.max(x)
In that case, another approach would be to aggregate the classes
ag <- aggregate(r_hr, fact, modal)
agx <- resample(ag, r_lr, method='ngb')
Note that agx and y are the same. But they can both be problematic as you might have cells with 5 classes with each about 20%, making it rather unreasonable to pick one winner.
It is pretty standard practice to aggregate land cover maps into layers of %cover. I.e you should aim to produce 13 layers, each being something like %cover in that grid cell. Doing this allows you to reduce the resolution while retaining as much information as possible. N.B if you require a different summary statistic than %, should be easy to adapt the following method to whatever statistic you want, by changing the fun = function in aggregate.
The following method is pretty fast (it takes just a few seconds on my laptop to process raster with 100 million cells):
First, let's create some dummy rasters to use
Nhr <- 1e4 # resolution of high-res raster
Nlr <- 333 # resolution of low-res raster
r.hr <- raster(ncols=Nhr, nrows=Nhr)
r.lr <- raster(ncols=Nlr, nrows=Nlr)
r.hr[] <- sample(1:13, Nhr^2, replace=T)
Now, we begin by aggregating the high res raster to almost the same resolution as the low res one (to nearest integer number of cells). Each resulting layer contains the fraction of area within that cell in which value of original raster is N.
Nratio <- as.integer(Nhr/Nlr) # ratio of high to low resolutions, to nearest integer value for aggregation
layer1 <- aggregate(r.hr, Nratio, fun=function(x, na.rm=T) {mean(x==1, na.rm=na.rm)})
layer2 <- aggregate(r.hr, Nratio, fun=function(x, na.rm=T) {mean(x==2, na.rm=na.rm)})
And finally, resample low res raster to the desired resolution
layer1 <- resample(layer1, r.lr, method = "ngb")
layer2 <- resample(layer2, r.lr, method = "ngb")
repeat for each layer, and build your layers into a stack or a multi-band raster

Matching the resolution of two rasters

Im working with two rasters each with a different resolution. Im wondering if there is a more efficient way of matching the coarser raster resolution to the finer raster resolution. Right now I am using the mask function to save some time, clip to the correct extent and change the resolution:
library(raster)
#the raster template with the desired resolution
r <- raster(extent(-180, 180, -64, 84), res=0.04166667)
# set some pixels to values, others to NA
r <- setValues(r, sample(c(1:3, NA), ncell(r), replace=TRUE))
#load the raster
lc_r1 <- raster(r)
res(lc_r1) <- 0.5
values(lc_r1) <- 1:ncell(lc_r1)
lc_r1
##class : RasterLayer
##dimensions : 296, 720, 213120 (nrow, ncol, ncell)
##resolution : 0.5, 0.5 (x, y)
##extent : -180, 180, -64, 84 (xmin, xmax, ymin, ymax)
##coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
##data source : in memory
##names : layer
##values : 1, 213120 (min, max)
#create the new finer resolution raster.
lc_r2 <- mask (lc_r1, r2)
Error in compareRaster(x, mask) : different number or columns
Im also trying the disaggregate function in raster but I get this odd error!
lc_r2 <- disaggregate (lc_r1, nrows=3600 )
Error: !is.null(fact) is not TRUE
This seems to work for the time being but not sure if its correct:
lc_r2 <- disaggregate (lc_r1, fact=c(12,12 ), method='bilinear')
Why would this Error: !is.null(fact) is not TRUE be odd? If you look at ?disaggregate you will see that there is no argument nrows, but there is a required argument fact, which you did not supply.
You can do
lc_r2a <- disaggregate (lc_r1, fact=12)
Or
lc_r2b <- disaggregate(lc_r1, fact=12, method='bilinear')
which is equivalent to
lc_r2c <- resample(lc_r1, r)
Why are you not sure that this is correct?
However, given that you want to mask lc_r1, the logical approach would be to go the opposite direction and change the resolution of your mask, r,
ra <- aggregate(r, fact=12, na.rm=TRUE)
lcm <- mask(lc_r1, ra)

Converting shapefile to raster

I'm having an issue rasterizing a shapefile to produce points on a 0.5*0.5 grid. The shapefile represents classifications of risk level (Low-0, Medium-100, High-1000, Very High-1500) of global coral reefs to integrated threats.
I pulled the code from another example that works fine, but when I try it for my data I get nothing from the plot function. See below for the link to the shapefile and my code:
Reefs At Risk: Global Integreated Threats
# Read shapefile into R
library(rgdal)
library(raster)
int.threat.2030 <- readOGR(dsn = "Global_Threats/Integrated_Future",
layer = "rf_int_2030_poly")
## Set up a raster "template" for a 0.5 degree grid
ext <- extent(-110, -50, 0, 35)
gridsize <- 0.5
r <- raster(ext, res=gridsize)
## Rasterize the shapefile
rr <- rasterize(int.threat.2030, r)
## Plot raster
plot(rr)
Any ideas where I might be going wrong? Is it an issue with the shapefile itself?
Please and thanks!
You assumed that the polygons were in lon/lat coordinates, but they are not:
library(raster)
library(rgdal)
p <- shapefile('Global_Threats/Integrated_Future/rf_int_2030_poly.shp')
p
#class : SpatialPolygonsDataFrame
#features : 63628
#extent : -18663508, 14601492, -3365385, 3410115 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=cea +lon_0=-160 +lat_ts=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
#variables : 3
#names : ID, THREAT, THREAT_TXT
#min values : 1, 0, Critical
#max values : 63628, 2000, Very High
You can either change the projection
pgeo <- spTransform(p, CRS('+proj=longlat +datum=WGS84'))
and then do something like:
ext <- floor(extent(pgeo))
rr <- raster(ext, res=0.5)
rr <- rasterize(pgeo, rr, field=1)
Or keep the orginal CRS and do something like:
ext <- extent(p)
r <- raster(ext, res=50000)
r <- rasterize(p, r, field=1)
plot(r)
Note that you are rasterizing very small polygons to large raster cells. A polygon is considered 'inside' if it covers the center of a cell (i.e. assuming a case where polygons cover multiple cells). So for these data you would need to use a much higher resolution (and then perhaps aggregate the results). Alternatively you could rasterize polygon centroids.
But none of the above is relevant really, as you are doing this all backwards. The polygons are clearly derived from a raster (look how blocky they are) and the raster is available in the dataset you point to!
So instead of rasterizing, do:
x <- raster('Global_Threats/Integrated_Future/rf_int_2030')
x
#class : RasterLayer
#dimensions : 25456, 80150, 2040298400 (nrow, ncol, ncell)
#resolution : 500, 500 (x, y)
#extent : -20037508, 20037492, -6363885, 6364115 (xmin, xmax, ymin, ymax)
#coord. ref. : NA
#data source : C:\temp\Global_Threats\Integrated_Future\rf_int_2030
#names : rf_int_2030
#values : 0, 2000 (min, max)
#attributes :
# ID COUNT THREAT_TXT
# 0 80971 Low
# 100 343535 Medium
# 1000 322231 High
# 1500 168518 Very High
# 2000 83598 Critical
Here plotting a part of Palawan:
e <- extent(c(-8990636, -8929268, 1182946, 1256938))
plot(x, ext=e)
plot(p, add=TRUE)
If you need a lower resolution see raster::aggregate. For a different coordinate reference system, see raster::projectRaster.

Resources