Simplify shapefile for placing random points with spsample() in R

I need to place random points on each shapefile from a list of shapefiles with spsample(). For some irregular shapefiles this is proving to be a long process, so I need to simplify some shapefiles by dropping small and remote polygons, which (I think) are the trouble-makers for spsample().
For this I need to know, for each polygon, its size and its mean distance to all other polygons. I am looking for a way to speed up this calculation; it can probably be done in a more elegant (and faster) way. The attempt shown below works, but as a simplifying algorithm it takes too much time.
# The program tries to place random points on shapefile shapes[[i]]; if it fails after 300 seconds, it goes through to the simplifying part and swaps the old shapefile for a simplified version.
d <- shapes[[i]]
Fdist <- list()
for (m in 1:dim(d)[1]) {
  pDist <- vector()
  for (n in 1:dim(d)[1]) {
    pDist <- append(pDist, gDistance(d[m, ], d[n, ]))
  }
  Fdist[[m]] <- pDist
  d@data$mean[m] <- mean(Fdist[[m]])
  d@data$gArea[m] <- gArea(d[m, ])
}
# drop small and remote polygons
d.1 <- d[d@data$gArea >= quantile(d@data$gArea, prob = seq(0, 1, length = 11), type = 5)[[1]] &
         (d@data$mean <= quantile(d@data$mean, prob = seq(0, 1, length = 11), type = 5)[[10]]), ]
# replace with simplified polygon
shapes[[i]] <- d.1
I would be grateful for any suggestion.

I would try simplifying the polygons first. ms_simplify in the rmapshaper package can greatly simplify your polygons, without introducing sliver polygons or gaps:
library("rgdal")
library("rmapshaper")
big <- readOGR(dsn = ".", "unsimplified_shapefile")
big_sample <- spsample(big, 1000, type = "stratified")
small <- rmapshaper::ms_simplify(big, keep = 0.01)
small_sample <- spsample(small, 1000, type = "stratified")
With a shapefile I had to hand, I reduced a ~100MB shapefile to ~2MB and reduced the time taken to sample from ~2.3s to ~0.11s.
If simplifying is not an option you can vectorise your gArea() and gDistance() functions by using byid = TRUE:
library("rgeos")
big@data$area <- gArea(big, byid = TRUE)
# gDistance(..., byid = TRUE) returns a pairwise distance matrix,
# so take row means to get each feature's mean distance to the others
big@data$dist <- rowMeans(gDistance(big, byid = TRUE))
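Those per-feature values can then feed the same kind of quantile filter used in the question; a rough sketch, with illustrative cut-offs (drop the smallest 10% of features and the 10% most remote):
# rough sketch: keep features that are neither very small nor very remote
keep <- big@data$area >= quantile(big@data$area, 0.1) &
  big@data$dist <= quantile(big@data$dist, 0.9)
big_filtered <- big[keep, ]  # smaller object to pass to spsample()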

Related

Streamlining binary rasterization in R

I have a few very small country-level polygon and point shapefiles that I would like to rasterize in R. The final product should be one global binary raster (indicating whether grid cell center is covered by a polygon / point lies within cell or not). My approach is to loop over the shapefiles and do the following for each shapefile:
# load shapefile
shp = sf::read_sf(shapefile_path)
# create a global raster template with resolution 0.0083
ext = extent(-180.0042, 180.0042, -65.00417, 75.00417)
gridsize = 0.008333333
r = raster(ext, res = gridsize)
# rasterize polygon or point shapefile to raster
rr = rasterize(shp, r, background = 0) #all grid cells that are not covered get 0
# convert to binary raster
values(rr)[values(rr)>0] = 1
Here, rr is the raster file where the polygons / points in shp are coded as 1 and all other grid cells are coded as 0. Afterwards, I take the sum over all rr to arrive at one global binary raster file including all polygons / points.
The final two steps are incredibly slow. In addition, I get RAM problems when I try to replace all the positive values in rr with 1, as the cell count is very large due to the fine resolution. I was wondering whether it is possible to come up with a smarter solution for what I'd like to achieve.
I have already found the fasterize package that has a speedy implementation of rasterize which works fine. I think it would be of great help if someone has a solution where rasterize directly returns a binary raster.
This is how you can do this better with raster. Note the field=1 argument, and also that I changed your specification of the extent, as what you did is probably not correct.
library(raster)
v <- shapefile(shapefile_path)
ext <- extent(-180, 180, -65, 75)
r <- raster(ext, res = 1/120)
rr <- rasterize(v, r, field=1, background = 0)
There is no need for your last step, but you could have done
rr <- clamp(rr, 0, 1)
# or
rr <- rr > 0
# or
rr <- reclassify(rr, cbind(1, Inf, 1))
raster::calc is not very efficient for simple arithmetic like this
It should be much faster to rasterize all vector data in one step, rather than in a loop, especially with large rasters like this (for which the program may need to write a temp file for each iteration).
To illustrate this solution with example data
library(raster)
cds1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60))
cds2 <- rbind(c(-10,0), c(140,60), c(160,0), c(140,-55))
cds3 <- rbind(c(-125,0), c(0,60), c(40,5), c(15,-45))
v <- spLines(cds1, cds2, cds3)
r <- raster(ncols=90, nrows=45)
r <- rasterize(v, r, field=1)
To speed things up, you can use terra (the replacement for raster)
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- as.lines(vect(f))
r <- rast(v, ncol=75, nrow=100)
x <- rasterize(v, r, field=1)
Something that seems to work computationally and significantly improves computation time is to:
1. Create one large shapefile shp instead of working with individual rasterized shapefiles.
2. Use the fasterize package to rasterize the merged shapefile.
3. Use raster::calc to avoid memory problems.
library(raster)
library(fasterize)
# shp is the single merged sf polygon object from step 1
ext = extent(-180.0042, 180.0042, -65.00417, 75.00417)
gridsize = 0.008333333
r = raster(ext, res = gridsize)
rr = fasterize(shp, r, background = 0) # all cells that are not covered get 0, the others get the sum
# convert to binary raster
fun = function(x) { x[x > 0] <- 1; return(x) }
r2 = raster::calc(rr, fun)

Is there a speedy and memory-efficient way to calculate the proportion of overlay between a polygon and a high resolution raster in R?

I need to calculate the proportion of each raster cell in a high resolution grid (raster stack with 8 layers) covered by a polygon using R.
My standard approach would be to use raster::rasterize(..., getCover = TRUE), however, this approach takes a very long time, particularly when the size of the polygon increases.
As an alternative, I tried cropping the raster stack to the extent of the polygon, transforming the raster stack to polygons, and calculating the proportion from the intersection of the resulting shapes with the original polygon. This works well for small polygons but breaks down as the polygon size increases, because R runs out of memory (I am limited to 16GB) or the calculation of the intersection takes too long.
Here is a reproducible example using my current solution with a very small shapefile.
library(raster)
library(spex)
library(dplyr)
library(sf)
library(data.table)
# setup a dummy example
r <- raster(nrow = 21600, ncol = 43200)
r[] <- 1:933120000
r_stack <- stack(r,r,r,r,r,r,r,r)
# get a small dummy shapefile
shp_small <- raster::getData(name = "GADM", country = "CHE", level = 2, download = TRUE)
shp_small <- st_as_sf(shp_small)[1, ]
# for comparison, use a big dummy shapefile
# shp_big <- raster::getData(name = "GADM", country = "BRA", level = 0, download = TRUE)
## Approach for a small shape file
stack_small <- raster::crop(r_stack, shp_small, snap = "out")
## transform to polygon
stack_small_poly <- spex::polygonize(stack_small)
stack_small_poly$ID <- 1:nrow(stack_small_poly)
## I can then perform the necessary calculations on the polygons to obtain
## the proportional overlay
# transform to mollweide for area calculation
mollw <- "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
stack_small_crs <- st_crs(stack_small_poly)
stack_small <- st_transform(stack_small_poly, mollw)
stack_small_poly <- st_transform(stack_small_poly, mollw)
# calculate area for each cell
stack_small_poly$area_org <- st_area(stack_small_poly)
# transform to world equidistant cylindrical
stack_small_poly<- st_transform(stack_small_poly, 4087)
shp_small <- st_transform(shp_small, 4087)
# get all the cells that intersect with the shape (this might take a while)
i <- st_intersects(stack_small_poly, shp_small, sparse = FALSE)
stack_small_poly <- dplyr::filter(stack_small_poly, as.vector(i))
# now calculate the exact intersection (this might take a while)
st_agr(stack_small_poly) <- "constant"
stack_small_poly <- st_intersection(stack_small_poly, st_geometry(shp_small))
# calculate the new areas and backtransform
stack_small_poly <- st_transform(stack_small_poly, mollw)
stack_small_poly$new_area <- st_area(stack_small_poly)
stack_small_poly <- st_transform(stack_small_poly, stack_small_crs)
# calculate proportion
stack_small_poly$proportion <- as.numeric(stack_small_poly$new_area/stack_small_poly$area_org)
# finally transform to data.table for subsequent analysis
st_geometry(stack_small_poly) <- NULL
setDT(stack_small_poly)
I am looking for a solution in R that is able to perform the task in 10-15 minutes (preferably faster) with a memory limit of 16 GB RAM for the shapefile representing Brazil (see shp_big in code above).
I am well aware that this optimum might not be achievable, and every suggestion leading to a reduction in execution time and/or memory usage is more than welcome.
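If terra is an option, its rasterize() has a cover argument that returns, for each cell, the fraction covered by the polygons, which is essentially what getCover = TRUE does in raster but is typically much faster and can work out of memory; a minimal sketch with the grid from the example above (object names are illustrative):
library(terra)
# template grid matching the dummy raster above (global, ~0.0083-degree cells)
r <- rast(nrows = 21600, ncols = 43200)
# the big example polygon (Brazil), converted to a SpatVector via sf
shp_big <- vect(sf::st_as_sf(raster::getData(name = "GADM", country = "BRA", level = 0, download = TRUE)))
# fraction of each cell covered by the polygons, between 0 and 1
cover <- rasterize(shp_big, r, cover = TRUE)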

Line density function in R equivalent to Line density tool in ArcMap (arcpy)

I need to calculate the magnitude-per-unit area of polylines that fall within a radius around each cell. Essentially I need to calculate a km/km2 road density within a 500m pixel search radius. ArcMap has a quick and easy tool that handles this, but I need a pure R solution.
Here is a link on how line density works: http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/how-line-density-works.htm
And this is how to use it in a python (arcpy) script: http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/line-density.htm
I currently use a backwards approach with the raster::focal function, calculating a density of burned-in road features. I then convert the km2/km2 output to km/km2.
#Import libraries
library(raster)
library(rgdal)
library(gdalUtils)
#Read-in an already created raster mask (cells are all set to 0)
mask <- raster("x://path to raster mask...")
#Make a copy of the mask to burn features in, keeping the original untouched
roads_mask <- file.copy(mask, "x://output path ...//roads.tif")
#Read-in road features (shapefile format)
roads_sldf <- readOGR("x://path to shapefile" , "roads")
#Rasterize spatial lines data frame ie. burn road features into mask
#Where road features get a value of 1, mask extent gets a value of 0
roads_raster <- gdalUtils::gdal_rasterize(src_datasource = roads_sldf,
                                          dst_filename = "x://output path ...//roads.tif", b = 1,
                                          burn = 1, l = "roads", output_Raster = TRUE)
#Run a 1km circular radius density function (be mindful of edge effects)
weight <- raster::focalWeight(roads_raster,1000,type = "circle")
rdDensity_1km <- raster::focal(roads_raster, weight, fun=sum, filename = '',
                               na.rm=TRUE, pad=TRUE, NAonly=FALSE, overwrite=TRUE)
#Convert km2/km2 road density to km/km2
#Set up the moving window
weight <- raster::focalWeight(roads_raster,1000,type = "circle")
#Count how many records in each column of the moving window are > 0
columnCount <- apply(weight,2,function(x) sum(x > 0))
#Get the sum of the column count
number_of_cells <- sum(columnCount)
#multiply km2/km2 density by number of cells in the moving window
step1 <- roads_raster * number_of_cells
#Rescale step1 output with respect to cell size(30m) and radius of a circle
final_rdDensity <- (step1*0.03)/3.14159265
#Write out final km/km2 road density raster
writeRaster(final_rdDensity,"X://path to output...", datatype = 'FLT4S', overwrite = TRUE)
After some more research I think I may be able to use a kernel function; however, I don't want to apply the smoothing algorithm. Also, the output is an 'im' object, which I would need to write out as a 'tif'.
#Import libraries
library(spatstat)
library(rgdal)
#Read-in road features (shapefile format)
roads_sldf <- readOGR("x://path to shapefile" , "roads")
#Convert roads spatial lines data frame to psp object
psp_roads <- as.psp(roads_sldf)
#Apply kernel density, however this is where I am unsure of the arguments
road_density <- spatstat::density.psp(psp_roads, sigma = 0.01, eps = 500)
Cheers.
See this question https://gis.stackexchange.com/questions/138861/calculating-road-density-in-r-using-kernel-density
Tried to mark as a duplicate but doesn't work because the other Q is on gis stack exchange
The short answer is to use spatstat.geom::pixellate().
I also needed spatstat.geom::as.psp(sf::st_geometry(x)) to convert an sf lines object to the correct format, and maptools::as.im.RasterLayer(r) to convert a raster. I was able to convert the result back to a RasterLayer with raster::raster(pix_res).
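A rough sketch of that workflow (file paths are placeholders, and the exact arguments may need adjusting):
library(sf)
library(raster)
library(spatstat.geom)
library(maptools)  # retired, but still provides as.im.RasterLayer()
roads <- sf::st_read("roads.shp")           # line features
r <- raster("template.tif")                 # raster defining the output grid
psp_roads <- spatstat.geom::as.psp(sf::st_geometry(roads))
win <- maptools::as.im.RasterLayer(r)       # raster -> im, defines the pixel grid
pix_res <- spatstat.geom::pixellate(psp_roads, W = as.owin(win))  # total line length per pixel
rd_length <- raster::raster(pix_res)        # back to a RasterLayer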
Perhaps you can use terra::rasterizeGeom which is available in the development version that you can install with install.packages('terra', repos='https://rspatial.r-universe.dev')
Example data
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f) |> as.lines()
r <- rast(v, res=.1)
Solution
x <- rasterizeGeom(v, r, fun="length", "km")
And then use focal sum, but you would not have a perfect circle.
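For the focal-sum step, something along these lines (the window size is illustrative, and dividing the result by the window area would turn the summed length into km/km2):
# rough sketch of the focal-sum step on the length raster x from above
w <- focalMat(x, 0.1, type = "circle")  # roughly circular window, 0.1 degrees here
w[w > 0] <- 1                           # plain sum of lengths rather than a weighted mean
total_length <- focal(x, w = w, fun = sum, na.rm = TRUE)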
What you could do instead, if your dataset is not too large, is create a circle for each grid cell and use intersect. Something like this:
p <- xyFromCell(r, 1:ncell(r)) |> vect(crs="+proj=longlat")
p$id <- 1:ncell(r)
b <- buffer(p, 10000)
values(v) <- NULL
i <- intersect(v, b)
x <- aggregate(perim(i), list(id=i$id), sum)
r[x$id] <- x[,2]

Error when masking a raster by a SpatialPolygon

I have a raster with the following properties:
library(raster)
library(rgeos)
test <- raster(nrow=225, ncols=478, xmn=-15.8, xmx=32, ymn=-9.4, ymx=13.1)
I want to mask in this raster the cells that are within a given distance of a point.
I create the spatial points as followed:
p2=readWKT("POINT(31.55 -1.05)")
Then I create a spatial polygon object by adding a 0.5 buffer:
p2_Buffered <- gBuffer(p2, width = 0.5)
mask(test, mask=p2_Buffered,inverse=T)
When I mask my raster given this spatial object, I have the following error message:
Error in .polygonsToRaster(x, y, field = field, fun = fun, background = background, : number of items to replace is not a multiple of replacement length
I do not understand, because this is a script I have been running many times with different points and different buffer widths without any problem.
What is strange is that when I change the width of the buffer, it works fine:
p2_Buffered <- gBuffer(p2, width = 0.4)
mask(test, mask=p2_Buffered,inverse=T)
This is also true for a different focal point:
p2=readWKT("POINT(32.55 -1)")
p2_Buffered <- gBuffer(p2, width = 0.5)
mask(test, mask=p2_Buffered,inverse=T)
I would like to identify the specific problem with that point, because this is a script I need to run as part of a routine (I have been doing so without any problem so far).
Thanks a lot
This is indeed a bug with polygons that go over the edge of a raster. It has been fixed in version 2.3-40 (now on CRAN), so it should go away if you update the raster package.
Here is a workaround (removing the part of the polygon that goes over the edge).
library(raster)
library(rgeos)
r <- raster(nrow=225, ncols=478, xmn=-15.8, xmx=32, ymn=-9.4, ymx=13.1)
e <- as(extent(r), 'SpatialPolygons')
p <- readWKT("POINT(31.55 -1.05)")
pb <- gBuffer(p, width = 0.5)
pbe <- intersect(pb, e)
values(r) <- 1:ncell(r)  # give the raster some values, otherwise there is nothing to mask
x <- mask(r, mask=pbe, inverse=TRUE)
You usually need to set some values on the raster layer. For a mask layer it's always best to set the values to 1.
library(raster)
library(rgeos)
# make sample raster
test <- raster(nrow=225, ncols=478, xmn=-15.8, xmx=32, ymn=-9.4, ymx=13.1)
# set values of raster for mask
test <- setValues(test, 1)
# make point buffer
p2=readWKT("POINT(15 5)")
p2_Buffered <- gBuffer(p2, width = 1.5)
# set the projection of the buffer (assume it's the same as the raster)
projection(p2_Buffered) <- projection(test)
# visual check
plot(test); plot(p2_Buffered, add=T)
If you want to trim down your raster layer to the just the single polygon then try this workflow.
step1 <- crop(test, p2_Buffered) # crop to same extent
step2 <- rasterize(p2_Buffered, step1) # rasterize polygon
final <- step1*step2 # make your final product
plot(final)
If you just want to poke a hole in your raster layer then use the mask function
# rasterize your polygon
p2_Buffered <- rasterize(p2_Buffered, test, fun='sum')
# now mask it
my_mask <- mask(test, mask=p2_Buffered,inverse=T) # try changing the inverse argument
plot(my_mask)

R simplify shapefile

I have a shapefile with big Amazonian rivers. The shapefile alone is 37.9 MB and, together with the attribute table, it goes up to 42.1 MB. I'm generating PNG images of the whole Brazilian Amazon, of 1260x940 pixels each, and all the data in the shapefile only slows down the drawing of each map, so I want to simplify it.
The gSimplify function in the rgeos package seems to only simplify each polygon, not get rid of the smaller ones. I tried it with tolerances of 0.1 and 1000, and length(shp@polygons) always returns the same value: 27633. And the final plot takes almost the same time to draw. I need a function to which I can say that the final raster will be 1260x940 pixels, so it can remove every unnecessary point. Is there a function to do that?
Thanks in advance.
Pretty comprehensive solution here: http://www.r-bloggers.com/simplifying-polygon-shapefiles-in-r/
In summary, you need to get areas of your polygons:
area <- lapply(rivers@polygons, function(x) sapply(x@Polygons, function(y) y@area))
where rivers is your shapefile object in R.
Then you figure out the large polygons and keep them:
sizeth <- 0.001 # size threshold of polygons to be deleted
mainPolys <- lapply(area, function(x) which(x > sizeth))
rivers@data <- rivers@data[-c(1:2), ]
rivers@polygons <- rivers@polygons[-c(1:2)]
rivers@plotOrder <- 1:length(rivers@polygons)
mainPolys <- mainPolys[-c(1:2)]
for (i in 1:length(mainPolys)) {
  if (length(mainPolys[[i]]) >= 1 && mainPolys[[i]][1] >= 1) {
    rivers@polygons[[i]]@Polygons <- rivers@polygons[[i]]@Polygons[mainPolys[[i]]]
    rivers@polygons[[i]]@plotOrder <- 1:length(rivers@polygons[[i]]@Polygons)
  }
}
This might not be good enough, and you might not want to delete any polygons, in which case the dp() function from the shapefiles package will do the trick:
res <- 0.01 # the argument passed to dp(), which determines the extent of simplification; increase or decrease as required to simplify more/less
for (i in 1:length(rivers@polygons)) {
  for (j in 1:length(rivers@polygons[[i]]@Polygons)) {
    temp <- as.data.frame(rivers@polygons[[i]]@Polygons[[j]]@coords)
    names(temp) <- c("x", "y")
    temp2 <- dp(temp, res)
    rivers@polygons[[i]]@Polygons[[j]]@coords <- as.matrix(cbind(temp2$x, temp2$y))
  }
}
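If newer packages are an option, rmapshaper (used earlier in this thread) can do both jobs in one go; a minimal sketch, with illustrative keep and min_area values:
library(rmapshaper)
# simplify the geometry while keeping all features
rivers_simple <- ms_simplify(rivers, keep = 0.05, keep_shapes = TRUE)
# then drop small detached polygons below a minimum area
rivers_simple <- ms_filter_islands(rivers_simple, min_area = 1e6)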
