Streamlining binary rasterization in R

I have a few very small country-level polygon and point shapefiles that I would like to rasterize in R. The final product should be one global binary raster (indicating whether grid cell center is covered by a polygon / point lies within cell or not). My approach is to loop over the shapefiles and do the following for each shapefile:
# load shapefile
shp = sf::read_sf(shapefile_path)
# create a global raster template with resolution 0.0083
ext = extent(-180.0042, 180.0042, -65.00417, 75.00417)
gridsize = 0.008333333
r = raster(ext, res = gridsize)
# rasterize polygon or point shapefile to raster
rr = rasterize(shp, r, background = 0) #all grid cells that are not covered get 0
# convert to binary raster
values(rr)[values(rr)>0] = 1
Here, rr is the raster file where the polygons / points in shp are coded as 1 and all other grid cells are coded as 0. Afterwards, I take the sum over all rr to arrive at one global binary raster file including all polygons / points.
The final two steps are incredibly slow. In addition, I run into RAM problems when I try to replace all the positive values in rr with 1, as the cell count is very large due to the fine resolution. I was wondering whether there is a smarter way to achieve this.
I have already found the fasterize package that has a speedy implementation of rasterize which works fine. I think it would be of great help if someone has a solution where rasterize directly returns a binary raster.
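For context on the memory problem, here is a rough base-R estimate of how large the template is. This is only a sketch: it assumes a 1/120 degree grid over longitudes -180 to 180 and latitudes -65 to 75, and 8 bytes per cell, which is what a numeric vector of cell values costs once it is pulled into memory.

```r
# Size of a global 1/120 degree grid between 65S and 75N
res_deg <- 1 / 120
n_cols  <- round(360 / res_deg)           # 43200 columns
n_rows  <- round((75 - (-65)) / res_deg)  # 16800 rows
n_cells <- n_cols * n_rows                # 725760000 cells

# One numeric copy of all cell values, in GiB (8 bytes per double)
gib <- n_cells * 8 / 2^30
round(gib, 1)                             # roughly 5.4
```

An expression such as values(rr)[values(rr) > 0] = 1 needs several vectors of this length at once, which is consistent with the RAM problems described above.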

This is how you can do this better with raster. Note the field=1 argument, and also that I changed your specification of the extent, as what you do is probably not correct.
library(raster)
v <- shapefile(shapefile_path)
ext <- extent(-180, 180, -65, 75)
r <- raster(ext, res = 1/120)
rr <- rasterize(v, r, field=1, background = 0)
There is no need for your last step, but you could have done
rr <- clamp(rr, 0, 1)
# or
rr <- rr > 0
# or
rr <- reclassify(rr, cbind(1, Inf, 1))
raster::calc is not very efficient for simple arithmetic like this
It should be much faster to rasterize all vector data in one step, rather than in a loop, especially with large rasters like this (for which the program may need to write a temp file for each iteration).
To illustrate this solution with example data
library(raster)
cds1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60))
cds2 <- rbind(c(-10,0), c(140,60), c(160,0), c(140,-55))
cds3 <- rbind(c(-125,0), c(0,60), c(40,5), c(15,-45))
v <- spLines(cds1, cds2, cds3)
r <- raster(ncols=90, nrows=45)
r <- rasterize(v, r, field=1)
To speed things up, you can use terra (the replacement for raster)
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- as.lines(vect(f))
r <- rast(v, ncol=75, nrow=100)
x <- rasterize(v, r, field=1)
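The advice to rasterize everything in one step, rather than in a loop, can be sketched with terra as follows. This is an illustration under assumptions: the three parts stand in for separate shapefiles (here they are just subsets of the lux.shp example data that ships with terra), and the grid size is arbitrary.

```r
library(terra)

# Example polygons shipped with terra; split them to mimic separate files
f <- system.file("ex/lux.shp", package = "terra")
v <- vect(f)
parts <- list(v[1:4, ], v[5:8, ], v[9:12, ])

# Combine all parts into one SpatVector, then rasterize once
all_v <- do.call(rbind, parts)
r <- rast(all_v, ncols = 75, nrows = 100)

# field = 1 burns the value 1 for covered cells, background = 0 for the rest,
# so the result is already binary and needs no reclassification
x <- rasterize(all_v, r, field = 1, background = 0)
```

Because covered cells get a constant 1 rather than a sum or an attribute value, no separate "convert to binary" step is needed afterwards.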

Something that seems to work computationally and significantly improves computation time is to
Create one large shapefile shp instead of working with individual rasterized shapefiles.
Use the fasterize package to rasterize the merged shapefile.
Use raster::calc to avoid memory problems.
ext = extent(-180.0042, 180.0042, -65.00417, 75.00417)
gridsize = 0.008333333
r = raster(ext, res=gridsize)
rr = fasterize(shp, r, background = 0) #all not covered cells get 0, others get sum
# convert to binary raster
fun = function(x) {x[x>0] <- 1; return(x) }
r2 = raster::calc(rr, fun)

Related

Aggregate high resolution (300m*300m) raster (raster::aggregate and velox not able to handle well this resolution)

I'm trying to aggregate a raster r of global extent from a ~300m*300m (10 arc-seconds, 7.4GB) resolution to a ~10km resolution (0.083333 decimal degrees), i.e. a factor of 30.
Neither the aggregate function from the raster package nor the one from the velox package seems to handle such a large dataset. I very much welcome recommendations!
# sample rasters
r <- raster(extent(-180,180, -90 , 90))
res(r)<-c(0.5/6/30, 0.5/6/30)
r <- setValues(r, runif(ncell(r))) # Error: cannot allocate vector of size 62.6 Gb
# velox example
devtools::install_github('hunzikp/velox')
library(velox)
vx <- velox(r) # the process aborts in linux
vx$aggregate(factor=30, aggtype='mean')
# raster example
r_agg <- aggregate(r, fact=30)
You say that raster cannot handle a large raster like that, but that is not true. The problem is that you are trying to create a very large data set in memory --- more memory than your computer has available. You can use the init function instead. I show that below but not using a global 300 m raster to make the example run a bit faster.
library(raster)
r <- raster(ymn=80, res=0.5/6/30)
r <- init(r, "col")
r_agg <- aggregate(r, fact=30)
You get better mileage with terra
library(terra)
rr <- rast(ymin=80, res= 0.5/6/30)
rr <- init(rr, "col")
rr_agg <- aggregate(rr, fact=30)
In addition to Robert's suggestion, I'd resample the rast with a template so the ext and crs would be precise.
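library(terra)
library(magrittr) # provides %>%
# template_rast: a SpatRaster with the target extent, resolution and crs
r <- terra::rast("your_rast.tif") %>%
  aggregate(., fact = 30) %>%
  resample(., template_rast, filename = "sth.tif",
           wopt = list(gdal = c("COMPRESS=LZW", "TFW=YES", "BIGTIFF=YES"),
                       tempdir = "somewhere_you_have_a_lot_of_space", todisk = TRUE))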
Those wopt options might help you a lot with large rasters.
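As a small check of the factor-of-30 arithmetic from the question, the sketch below aggregates a tiny 10 arc-second grid with terra. The 1 by 0.5 degree extent and the cell values are made up purely to keep the example fast. They are not the asker's data.

```r
library(terra)

# 10 arc-seconds = 1/360 degree, so a 1 x 0.5 degree tile has 360 x 180 cells
r <- rast(nrows = 180, ncols = 360, xmin = 0, xmax = 1, ymin = 0, ymax = 0.5)
values(r) <- seq_len(ncell(r))

# Aggregating by 30 gives 300 arc-second (1/12 degree) cells
a <- aggregate(r, fact = 30, fun = "mean")
res(a)   # both values close to 0.08333
dim(a)   # 6 rows, 12 columns, 1 layer
```

For a file-backed global raster the call is the same. Adding a filename argument lets terra write the result to disk instead of holding it in memory.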

Transferring Conditional command from Raster Calculator of ArcGis to R

I would like to transfer the following code from Raster calculator of ArcGis to R.
Con("diff_canopy" >= 1, "diff_canopy")
This creates a new raster that contains the data from diff_canopy only where diff_canopy is greater than or equal to 1.
To solve this, I followed and adapted the code proposed in this post:
test <- raster (extent(canopy_sjd), nrows=nrow(canopy_sjd), ncols=ncol(canopy_sjd))
test[canopy_sjd[]>=1] <- canopy_sjd[canopy_sjd[] >=1]
The code works fine, however, when I compare the raster obtained with R code with the raster obtained directly with ArcGis calculator, I obtained different values:
From ArcGis calculator: min 1.01598 max 10.0271
From adapted R code: min 1.01598 max 11.7207
My questions are the following:
1) Does the adapted code match the raster calculator statement?
2) If it matches, why do the max values of the output rasters differ?
3) If it does not match, are there any other suggestions to fix the error?
Always include some example data:
library(raster)
canopy <- raster(nrow=4, ncol=4, xmn=0, xmx=1, ymn=0, ymx=1, crs='+proj=utm +zone=1')
values(canopy) <- 1:ncell(canopy)
canopy <- canopy - 5
Here is a simple solution:
x <- reclassify(canopy, cbind(-Inf, 1, NA), right=FALSE)
An alternative:
y <- mask(canopy, canopy >=1, maskvalue=0)
One more:
z <- calc(canopy, function(i){ i[i<1] <- NA; i})
For small data sets, it is possible to use your solution (but not recommended). I would do it like this:
a <- raster(canopy)
i <- which(values(canopy) >= 1)
a[i] <- canopy[i]
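If terra is an option, the same Con-style selection can be written as a conditional. This is a sketch that is not taken from the answer above: it assumes terra's ifel function and rebuilds the 4 by 4 example raster as a SpatRaster.

```r
library(terra)

# Same toy data as above: 16 cells with values -4 to 11
canopy <- rast(nrows = 4, ncols = 4, xmin = 0, xmax = 1, ymin = 0, ymax = 1)
values(canopy) <- 1:ncell(canopy) - 5

# Keep the value where canopy >= 1, set everything else to NA
out <- ifel(canopy >= 1, canopy, NA)
```

This reads much like the raster calculator expression: a test, then the value to keep where the test is true, with NA filling in where ArcGIS would return NoData.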

Line density function in R equivalent to Line density tool in ArcMap (arcpy)

I need to calculate the magnitude-per-unit area of polylines that fall within a radius around each cell. Essentially I need to calculate a km/km2 road density within a 500m pixel search radius. ArcMap has a quick and easy tool that handles this, but I need a pure R solution.
Here is a link on how line density works: http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/how-line-density-works.htm
And this is how to use it in a python (arcpy) script: http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/line-density.htm
I currently execute a backwards approach using raster::focal function, calculating a density of burned in road features. I then convert the km2/km2 output to km/km2.
#Import libraries
library(raster)
library(rgdal)
library(gdalUtils)
#Read-in an already created raster mask (cells are all set to 0)
mask <- raster("x://path to raster mask...")
#Make a copy of the mask to burn features in, keeping the original untouched
roads_mask <- file.copy(mask, "x://output path ...//roads.tif")
#Read-in road features (shapefile format)
roads_sldf <- readOGR("x://path to shapefile" , "roads")
#Rasterize spatial lines data frame ie. burn road features into mask
#Where road features get a value of 1, mask extent gets a value of 0
roads_raster <- gdalUtils::gdal_rasterize(src_datasource = roads_sldf,
dst_filename = "x://output path ...//roads.tif", b = 1,
burn = 1, l = "roads", output_Raster = TRUE)
#Run a 1km circular radius density function (be mindful of edge effects)
weight <- raster::focalWeight(roads_raster,1000,type = "circle")
#(an R object name cannot start with a digit, so rdDensity_1km rather than 1km_rdDensity)
rdDensity_1km <- raster::focal(roads_raster, weight, fun=sum, filename = '',
na.rm=TRUE, pad=TRUE, NAonly=FALSE, overwrite=TRUE)
#Convert km2/km2 road density to km/km2
#Set up the moving window
weight <- raster::focalWeight(roads_raster,1000,type = "circle")
#Count how many records in each column of the moving window are > 0
columnCount <- apply(weight,2,function(x) sum(x > 0))
#Get the sum of the column count
number_of_cells <- sum(columnCount)
#multiply km2/km2 density by number of cells in the moving window
step1 <- roads_raster * number_of_cells
#Rescale step1 output with respect to cell size(30m) and radius of a circle
final_rdDensity <- (step1*0.03)/3.14159265
#Write out final km/km2 road density raster
writeRaster(final_rdDensity,"X://path to output...", datatype = 'FLT4S', overwrite = TRUE)
After some more research I think I may be able to use a kernel function, however I don't want to apply the smoothing algorithm... As well the output is an 'im' object which I would need to write to as a 'tif'
#Import libraries
library(spatstat)
library(rgdal)
#Read-in road features (shapefile format)
roads_sldf <- readOGR("x://path to shapefile" , "roads")
#Convert roads spatial lines data frame to psp object
psp_roads <- as.psp(roads_sldf)
#Apply kernel density, however this is where I am unsure of the arguments
road_density <- spatstat::density.psp(psp_roads, sigma = 0.01, eps = 500)
Cheers.
See this question https://gis.stackexchange.com/questions/138861/calculating-road-density-in-r-using-kernel-density
Tried to mark as a duplicate but doesn't work because the other Q is on gis stack exchange
Short answer is use spatstat.geom::pixellate()
I also needed spatstat.geom::as.psp(sf::st_geometry(x)) to convert an sf lines object to the correct format and maptools::as.im.RasterLayer(r) to convert a raster. I was able to convert the result to RasterLayer with raster::raster(pix_res)
Perhaps you can use terra::rasterizeGeom which is available in the development version that you can install with install.packages('terra', repos='https://rspatial.r-universe.dev')
Example data
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f) |> as.lines()
r <- rast(v, res=.1)
Solution
x <- rasterizeGeom(v, r, fun="length", "km")
And then use focal sum, but you would not have a perfect circle.
What you could do instead, if your dataset is not too large, is create a circle for each grid cell and use intersect. Something like this:
p <- xyFromCell(r, 1:ncell(r)) |> vect(crs="+proj=longlat")
p$id <- 1:ncell(r)
b <- buffer(p, 10000)
values(v) <- NULL
i <- intersect(v, b)
x <- aggregate(perim(i), list(id=i$id), sum)
r[x$id] <- x[,2]
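The buffer approach above yields a summed line length per cell. To express that as km/km2, as asked in the question, the length still has to be divided by the area of the search circle. A minimal base-R sketch, assuming lengths in metres and a 10000 m radius as in the buffer call (the len_m values are hypothetical, not real output):

```r
# Hypothetical summed line lengths (m) inside the buffer of three cells
len_m    <- c(12500, 0, 48200)
radius_m <- 10000

# Area of the circular search window in km2
area_km2 <- pi * (radius_m / 1000)^2   # about 314.16

# Line density in km per km2
density_km_km2 <- (len_m / 1000) / area_km2
```

Near the edge of the study area part of the circle falls outside the data, so dividing by the full circle area understates the density there.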

simplify shapefile for placing random points with spsample()

I need to place random points on each shapefile from a list of shapefiles with spsample(). For some irregular shapefiles this is proving to be a long process, so I need to simplify some shapefiles by dropping small and remote polygons, which (I think) are the trouble-makers for spsample().
For this I need to know, for each polygon, its size and its mean distance to all other polygons. I am looking for a way to speed up this calculation, which can probably be done in a more elegant (and faster) way. The attempt shown below works, but as a simplifying algorithm it takes too much time.
#program tries to place random points on shapefile shapes[[i]] if it fails after 300 seconds it goes though to simplifying part and swaps the old shapefile with a simplified version.
d <- shapes[[i]]
Fdist <- list()
for(m in 1:dim(d)[1]) {
pDist <- vector()
for(n in 1:dim(d)[1]) {
pDist <- append(pDist, gDistance(d[m,],d[n,]))
}
Fdist[[m]] <- pDist
d@data$mean[m]<-mean(Fdist[[m]])
d@data$gArea[m]<-gArea(d[m,])
}
#drop small and remote polygons
d.1<-d[d@data$gArea>=quantile(d@data$gArea, prob = seq(0, 1, length=11), type=5)[[1]] & (d@data$mean<=quantile(d@data$mean, prob = seq(0, 1, length=11), type=5)[[10]]),]
#replace with simplified polygon
shapes[[i]]<-d.1
I would be grateful for any suggestion.
I would try simplifying the polygons first. ms_simplify in the rmapshaper package can greatly simplify your polygons, without introducing sliver polygons or gaps:
library("rgdal")
library("rmapshaper")
big <- readOGR(dsn = ".", "unsimplified_shapefile")
big_sample <- spsample(big, 1000, type = "stratified")
small <- rmapshaper::ms_simplify(big, keep = 0.01)
small_sample <- spsample(small, 1000, type = "stratified")
With a shapefile I had to hand, I reduced a ~100MB shapefile to ~2MB and reduced the time taken to sample from ~2.3s to ~0.11s.
If simplifying is not an option you can vectorise your gArea() and gDistance() functions by using byid = TRUE:
library("rgeos")
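big@data$area <- gArea(big, byid = TRUE)
# gDistance(byid = TRUE) returns an n x n matrix; take row means to get one value per polygon
big@data$dist <- rowMeans(gDistance(big, byid = TRUE))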
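With byid = TRUE, gDistance gives a pairwise distance matrix, which still has to be reduced to one number per polygon before filtering. The base-R sketch below shows that step on a made-up 3 by 3 matrix and made-up areas, which stand in for the gDistance and gArea output. The quantile cut-offs are illustrative, not the asker's exact rule.

```r
# Hypothetical pairwise distances between three polygons (diagonal = 0)
d <- matrix(c(0, 2, 4,
              2, 0, 6,
              4, 6, 0), nrow = 3, byrow = TRUE)

# Mean distance to the *other* polygons: exclude the zero self-distance
mean_dist <- rowSums(d) / (nrow(d) - 1)   # 3, 4, 5

# Hypothetical polygon areas
area <- c(10, 0.2, 5)

# Keep polygons that are neither very small nor very remote
keep <- area >= quantile(area, 0.1) & mean_dist <= quantile(mean_dist, 0.9)
```

The resulting logical vector can then be used to subset the polygons, e.g. big[keep, ], in place of the double loop.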

error in mask a raster by a spatialpolygon

I have raster of the following features:
library(raster)
library(rgeos)
test <- raster(nrow=225, ncols=478, xmn=-15.8, xmx=32, ymn=-9.4, ymx=13.1)
I want to mask in this raster the cells that are within a given distance of a point.
I create the spatial points as followed:
p2=readWKT("POINT(31.55 -1.05)")
Then I create a spatial polygon object by adding a 0.5 buffer:
p2_Buffered <- gBuffer(p2, width = 0.5)
mask(test, mask=p2_Buffered,inverse=T)
When I mask my raster given this spatial object, I have the following error message:
Error in .polygonsToRaster(x, y, field = field, fun = fun, background
= background, : number of items to replace is not a multiple of replacement length
I do not understand because this is script I have been running many many times with different point and different buffer width without any problem.
What is strange is that when I change the width of the buffer, it works fine:
p2_Buffered <- gBuffer(p2, width = 0.4)
mask(test, mask=p2_Buffered,inverse=T)
This is also true for a different focal point:
p2=readWKT("POINT(32.55 -1)")
p2_Buffered <- gBuffer(p2, width = 0.5)
mask(test, mask=p2_Buffered,inverse=T)
I would like to identify the specific problem I have for that point because this is a script I should run in a routine (I have been doing it without any problem so far).
Thanks a lot
This is indeed a bug with polygons that go over the edge of a raster. It has been fixed in version 2.3-40 (now on CRAN), so it should go away if you update the raster package.
Here is a workaround (removing the part of the polygon that goes over the edge).
library(raster)
library(rgeos)
r <- raster(nrow=225, ncols=478, xmn=-15.8, xmx=32, ymn=-9.4, ymx=13.1)
e <- as(extent(r), 'SpatialPolygons')
p <- readWKT("POINT(31.55 -1.05)")
pb <- gBuffer(p, width = 0.5)
pbe <- intersect(pb, e)
values(r) <- 1
x <- mask(r, mask=pbe, inverse=TRUE)
You usually need to set some values in the raster layer. For a mask layer it is always best to set the values to 1.
library(raster)
library(rgeos)
# make sample raster
test <- raster(nrow=225, ncols=478, xmn=-15.8, xmx=32, ymn=-9.4, ymx=13.1)
# set values of raster for mask
test <- setValues(test, 1)
# make point buffer
p2=readWKT("POINT(15 5)")
p2_Buffered <- gBuffer(p2, width = 1.5)
# name projection of buffer (assume its the same as raster)
projection(p2_Buffered) <- projection(test)
# visual check
plot(test); plot(p2_Buffered, add=T)
If you want to trim down your raster layer to just the single polygon, then try this workflow.
step1 <- crop(test, p2_Buffered) # crop to same extent
step2 <- rasterize(p2_Buffered, step1) # rasterize polygon
final <- step1*step2 # make your final product
plot(final)
If you just want to poke a hole in your raster layer then use the mask function
# rasterize your polygon
p2_Buffered <- rasterize(p2_Buffered, test, fun='sum')
# now mask it
my_mask <- mask(test, mask=p2_Buffered,inverse=T) # try changing the inverse argument
plot(my_mask)
