R simplify shapefile - r

I have a shapefile of the large Amazonian rivers. The shapefile alone is 37.9 MB; together with the attribute table it comes to 42.1 MB. I'm generating PNG images of the whole Brazilian Amazon, 1260x940 pixels each, and all the data in the shapefile only slows down the drawing of each map, so I want to simplify it.
The gSimplify function in the rgeos package seems to only simplify each polygon, not get rid of the smaller ones. I tried it with tolerances of 0.1 and 1000, and length(shp@polygons) always returns the same value: 27633. The final plot also takes almost the same time to draw. I need a function that I can tell the final raster will be 1260x940 pixels, so that it can remove every unnecessary point. Is there such a function?
Thanks in advance.

Pretty comprehensive solution here: http://www.r-bloggers.com/simplifying-polygon-shapefiles-in-r/
In summary, you need to get the areas of your polygons:
area <- lapply(rivers@polygons, function(x) sapply(x@Polygons, function(y) y@area))
where rivers is your shapefile object in R.
Then you figure out which polygons are large and keep them:
sizeth <- 0.001 # size threshold; polygons with a smaller area are deleted
mainPolys <- lapply(area, function(x) which(x > sizeth))
# drop features 1 and 2 (apparently specific to the example data; adjust or omit for your own)
rivers@data <- rivers@data[-c(1:2), ]
rivers@polygons <- rivers@polygons[-c(1:2)]
rivers@plotOrder <- 1:length(rivers@polygons)
mainPolys <- mainPolys[-c(1:2)]
# within each remaining feature, keep only the rings above the threshold
for (i in seq_along(mainPolys)) {
  if (length(mainPolys[[i]]) >= 1 && mainPolys[[i]][1] >= 1) {
    rivers@polygons[[i]]@Polygons <- rivers@polygons[[i]]@Polygons[mainPolys[[i]]]
    rivers@polygons[[i]]@plotOrder <- 1:length(rivers@polygons[[i]]@Polygons)
  }
}
This might not be good enough, and you might not want to delete any polygons, in which case the dp() function from the shapefiles package will do the trick:
library(shapefiles)
res <- 0.01 # tolerance passed to dp(); increase to simplify more, decrease to simplify less
for (i in seq_along(rivers@polygons)) {
  for (j in seq_along(rivers@polygons[[i]]@Polygons)) {
    # dp() expects the coordinates as columns named x and y
    temp <- as.data.frame(rivers@polygons[[i]]@Polygons[[j]]@coords)
    names(temp) <- c("x", "y")
    temp2 <- dp(temp, res)
    rivers@polygons[[i]]@Polygons[[j]]@coords <- as.matrix(cbind(temp2$x, temp2$y))
  }
}
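On the original question of telling the simplification about the 1260x940 output: one option is to derive the tolerance from the ground size of a single pixel, since detail smaller than a pixel cannot show up in the image. The sketch below does only that arithmetic in base R. The pixel_tolerance() helper and the bounding box values are hypothetical stand-ins (in practice the box would come from bbox(rivers)), and the results are meant as starting values for res and sizeth, not as exact answers.

```r
# Hypothetical helper: size of one output pixel in map units.
pixel_tolerance <- function(xmin, xmax, ymin, ymax, width_px, height_px) {
  px_w <- (xmax - xmin) / width_px   # map units per pixel, horizontally
  px_h <- (ymax - ymin) / height_px  # map units per pixel, vertically
  list(
    tolerance = min(px_w, px_h),     # candidate value for res (or gSimplify's tol)
    min_area  = px_w * px_h          # candidate value for sizeth: area of one pixel
  )
}

# Made-up bounding box in degrees, standing in for bbox(rivers)
tol <- pixel_tolerance(xmin = -74, xmax = -44, ymin = -18, ymax = 6,
                       width_px = 1260, height_px = 940)
res    <- tol$tolerance
sizeth <- tol$min_area
```

Taking the smaller of the two pixel dimensions is the conservative choice; it assumes the map fills the whole image.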

Related

How to crop multiple rasters to the same extent using a for loop in R?

I have 4 rasters that I would like to crop to the same extent. In future iterations of this script I will have many more than 4, so I am trying to write a loop that will crop all rasters in a directory to the same extent. The rasters are downloaded Sentinel-2 products containing at least 4 bands that have been converted into GeoTIFFs using the sen2r package. I've tried working with answers to similar questions posted here, but I somehow lose the bands in the process, and I will need those bands to do some raster math later on.
Code so far:
raster_files <- list.files(here::here("data", "s2_rasters")) #dir with 4 rasters
raster_paths <- paste0(here::here("data", "s2_rasters", raster_files))
wp_shp <- readOGR(here::here("data", "wp_boundary.shp"))
e <- extent(wp_shp)
n <- length(raster_paths)
for (i in 1:n) {
m <- raster_paths[i]
crop(x = m, y = e)
}
EDIT:
I recognize my loop doesn't make sense. I'm new to this and don't really know what I'm doing. Up until this point in the script I have been using the paths to the files to do things (build virtual rasters, apply atmospheric corrections, etc.).
Here's an example I did for a single crop that worked fine.
extent <- extent(802331.9, 802503.7, 9884986, 9885133)
ras_crop <- stack(here::here("data", "s2_rasters", "sample_raster.tif")) %>%
crop(extent) %>%
writeRaster(filename=file.path(here::here("data", "s2_rasters"), "raster1_crop.tif"))
From what I gather, you should be able to do something like this:
library(raster)
# input filenames
inf <- list.files("data/s2_rasters", pattern="tif$", full.names=TRUE)
# create output filenames and folder
outf <- gsub("data/s2_rasters", "output", inf)
dir.create("output", FALSE, FALSE)
wp_shp <- shapefile("data/wp_boundary.shp")
e <- extent(wp_shp)
# brick() keeps all bands of each file; crop() writes the result straight to disk
for (i in seq_along(inf)) {
  b <- brick(inf[i])
  crop(b, e, filename=outf[i])
}
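A variant for building the output names, in case the input folder name ever changes: take the base name of each input and attach it to the output folder. This is only a sketch with made-up file names; the _crop suffix is an assumption borrowed from the single-file example in the question.

```r
# Made-up input paths standing in for the result of list.files(..., full.names=TRUE)
inf <- c("data/s2_rasters/a.tif", "data/s2_rasters/b.tif")

# Strip the directory, add a suffix before the extension, and point at the output folder
outf <- file.path("output", sub("\\.tif$", "_crop.tif", basename(inf)))
```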

Streamlining binary rasterization in R

I have a few very small country-level polygon and point shapefiles that I would like to rasterize in R. The final product should be one global binary raster (indicating whether grid cell center is covered by a polygon / point lies within cell or not). My approach is to loop over the shapefiles and do the following for each shapefile:
# load shapefile
shp = sf::read_sf(shapefile_path)
# create a global raster template with resolution 0.0083
ext = extent(-180.0042, 180.0042, -65.00417, 75.00417)
gridsize = 0.008333333
r = raster(ext, res = gridsize)
# rasterize polygon or point shapefile to raster
rr = rasterize(shp, r, background = 0) #all grid cells that are not covered get 0
# convert to binary raster
values(rr)[values(rr)>0] = 1
Here, rr is the raster file where the polygons / points in shp are coded as 1 and all other grid cells are coded as 0. Afterwards, I take the sum over all rr to arrive at one global binary raster file including all polygons / points.
The final two steps are incredibly slow. In addition, I run into RAM problems when I try to replace all positive values in rr with 1, as the cell count is very large due to the fine resolution. I was wondering whether there is a smarter way to achieve this.
I have already found the fasterize package that has a speedy implementation of rasterize which works fine. I think it would be of great help if someone has a solution where rasterize directly returns a binary raster.
This is how you can do this better with raster. Note the field=1 argument, and also that I changed your specification of the extent, as what you had is probably not correct.
library(raster)
v <- shapefile(shapefile_path)
ext <- extent(-180, 180, -65, 75)
r <- raster(ext, res = 1/120)
rr <- rasterize(v, r, field=1, background = 0)
There is no need for your last step, but you could have done
rr <- clamp(rr, 0, 1)
# or
rr <- rr > 0
# or
rr <- reclassify(rr, cbind(1, Inf, 1))
raster::calc is not very efficient for simple arithmetic like this.
It should be much faster to rasterize all vector data in one step, rather than in a loop, especially with large rasters like this (for which the program may need to write a temp file for each iteration).
To illustrate this solution with example data
library(raster)
cds1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60))
cds2 <- rbind(c(-10,0), c(140,60), c(160,0), c(140,-55))
cds3 <- rbind(c(-125,0), c(0,60), c(40,5), c(15,-45))
v <- spLines(cds1, cds2, cds3)
r <- raster(ncols=90, nrows=45)
r <- rasterize(v, r, field=1)
To speed things up, you can use terra (the replacement for raster):
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- as.lines(vect(f))
r <- rast(v, ncol=75, nrow=100)
x <- rasterize(v, r, field=1)
Something that works and significantly reduces computation time is to:
1. Create one large shapefile shp instead of working with individual rasterized shapefiles.
2. Use the fasterize package to rasterize the merged shapefile.
3. Use raster::calc to avoid memory problems.
library(raster)
library(fasterize)
# shp is the merged shapefile, read as an sf object (fasterize works on sf polygons)
ext = extent(-180.0042, 180.0042, -65.00417, 75.00417)
gridsize = 0.008333333
r = raster(ext, res=gridsize)
rr = fasterize(shp, r, background = 0) #all not covered cells get 0, others get sum
# convert to binary raster
fun = function(x) {x[x>0] <- 1; return(x) }
r2 = raster::calc(rr, fun)

Line density function in R equivalent to Line density tool in ArcMap (arcpy)

I need to calculate the magnitude-per-unit area of polylines that fall within a radius around each cell. Essentially I need to calculate a km/km2 road density within a 500m pixel search radius. ArcMap has a quick and easy tool that handles this, but I need a pure R solution.
Here is a link on how line density works: http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/how-line-density-works.htm
And this is how to use it in a python (arcpy) script: http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/line-density.htm
I currently execute a backwards approach using raster::focal function, calculating a density of burned in road features. I then convert the km2/km2 output to km/km2.
#Import libraries
library(raster)
library(rgdal)
library(gdalUtils)
#Read-in an already created raster mask (cells are all set to 0)
mask <- raster("x://path to raster mask...")
#Make a copy of the mask to burn features in, keeping the original untouched
roads_mask <- file.copy(mask, "x://output path ...//roads.tif")
#Read-in road features (shapefile format)
roads_sldf <- readOGR("x://path to shapefile" , "roads")
#Rasterize spatial lines data frame ie. burn road features into mask
#Where road features get a value of 1, mask extent gets a value of 0
roads_raster <- gdalUtils::gdal_rasterize(src_datasource = roads_sldf,
dst_filename = "x://output path ...//roads.tif", b = 1,
burn = 1, l = "roads", output_Raster = TRUE)
#Run a 1km circular radius density function (be mindful of edge effects)
weight <- raster::focalWeight(roads_raster,1000,type = "circle")
rdDensity_1km <- raster::focal(roads_raster, weight, fun=sum, filename = '',
na.rm=TRUE, pad=TRUE, NAonly=FALSE, overwrite=TRUE)
#Convert km2/km2 road density to km/km2
#Set up the moving window
weight <- raster::focalWeight(roads_raster,1000,type = "circle")
#Count how many records in each column of the moving window are > 0
columnCount <- apply(weight,2,function(x) sum(x > 0))
#Get the sum of the column count
number_of_cells <- sum(columnCount)
#multiply km2/km2 density by number of cells in the moving window
step1 <- roads_raster * number_of_cells
#Rescale step1 output with respect to cell size(30m) and radius of a circle
final_rdDensity <- (step1*0.03)/3.14159265
#Write out final km/km2 road density raster
writeRaster(final_rdDensity,"X://path to output...", datatype = 'FLT4S', overwrite = TRUE)
After some more research I think I may be able to use a kernel function; however, I don't want to apply the smoothing algorithm. Also, the output is an 'im' object, which I would need to write out as a 'tif'.
#Import libraries
library(spatstat)
library(rgdal)
#Read-in road features (shapefile format)
roads_sldf <- readOGR("x://path to shapefile" , "roads")
#Convert roads spatial lines data frame to psp object
psp_roads <- as.psp(roads_sldf)
#Apply kernel density, however this is where I am unsure of the arguments
road_density <- spatstat::density.psp(psp_roads, sigma = 0.01, eps = 500)
Cheers.
See this question: https://gis.stackexchange.com/questions/138861/calculating-road-density-in-r-using-kernel-density
I tried to mark this as a duplicate, but that doesn't work because the other question is on GIS Stack Exchange.
The short answer is to use spatstat.geom::pixellate().
I also needed spatstat.geom::as.psp(sf::st_geometry(x)) to convert an sf lines object to the correct format, and maptools::as.im.RasterLayer(r) to convert a raster. I was able to convert the result to a RasterLayer with raster::raster(pix_res).
Perhaps you can use terra::rasterizeGeom, which is available in the development version that you can install with install.packages('terra', repos='https://rspatial.r-universe.dev').
Example data
library(terra)
f <- system.file("ex/lux.shp", package="terra")
v <- vect(f) |> as.lines()
r <- rast(v, res=.1)
Solution
x <- rasterizeGeom(v, r, fun="length", "km")
And then use focal sum, but you would not have a perfect circle.
What you could do instead, if your dataset is not too large, is create a circle for each grid cell and use intersect. Something like this:
p <- xyFromCell(r, 1:ncell(r)) |> vect(crs="+proj=longlat")
p$id <- 1:ncell(r)
b <- buffer(p, 10000)
values(v) <- NULL
i <- intersect(v, b)
x <- aggregate(perim(i), list(id=i$id), sum)
r[x$id] <- x[,2]

simplify shapefile for placing random points with spsample()

I need to place random points on each shapefile from a list of shapefiles with spsample(). For some irregular shapefiles this is proving to be a long process, so I need to simplify some shapefiles by dropping small and remote polygons, which (I think) are the trouble-makers for spsample().
For this I need to know each polygon's size and its mean distance to all other polygons. I am looking for a way to speed up this calculation; it can probably be done in a more elegant (and faster) way. The attempt shown below works, but as a simplifying algorithm it takes too much time.
# The program tries to place random points on shapes[[i]]; if that fails after 300 seconds,
# it goes through to the simplifying part below and swaps the old shapefile for a simplified version.
d <- shapes[[i]]
Fdist <- list()
for (m in 1:dim(d)[1]) {
  pDist <- vector()
  for (n in 1:dim(d)[1]) {
    pDist <- append(pDist, gDistance(d[m,], d[n,]))
  }
  Fdist[[m]] <- pDist
  d@data$mean[m] <- mean(Fdist[[m]])
  d@data$gArea[m] <- gArea(d[m,])
}
# drop small and remote polygons
d.1 <- d[d@data$gArea >= quantile(d@data$gArea, prob = seq(0, 1, length=11), type=5)[[1]] & (d@data$mean <= quantile(d@data$mean, prob = seq(0, 1, length=11), type=5)[[10]]), ]
# replace with simplified polygon
shapes[[i]] <- d.1
I would be grateful for any suggestion.
I would try simplifying the polygons first. ms_simplify in the rmapshaper package can greatly simplify your polygons without introducing sliver polygons or gaps:
library("rgdal")
library("rmapshaper")
big <- readOGR(dsn = ".", "unsimplified_shapefile")
big_sample <- spsample(big, 1000, type = "stratified")
small <- rmapshaper::ms_simplify(big, keep = 0.01)
small_sample <- spsample(small, 1000, type = "stratified")
With a shapefile I had to hand, I reduced a ~100MB shapefile to ~2MB and reduced the time taken to sample from ~2.3s to ~0.11s.
If simplifying is not an option, you can vectorise your gArea() and gDistance() calls by using byid = TRUE:
library("rgeos")
big@data$area <- gArea(big, byid = TRUE)
# gDistance(byid = TRUE) returns the full matrix of pairwise distances; take one mean per polygon
big@data$dist <- rowMeans(gDistance(big, byid = TRUE))
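Since the question asks for each polygon's mean distance to all other polygons, the zero distance of a polygon to itself should arguably be left out of the mean. A minimal base R sketch of that step, using a toy distance matrix built with dist() from three made-up points as a stand-in for the matrix that gDistance(big, byid = TRUE) would return:

```r
# Three made-up points standing in for polygons
pts <- rbind(c(0, 0), c(3, 4), c(0, 8))
dm <- as.matrix(dist(pts))   # symmetric matrix with zeros on the diagonal

# Mean distance to the other polygons: the diagonal adds 0 to each row sum,
# so divide by n - 1 instead of n
n <- nrow(dm)
mean_dist <- rowSums(dm) / (n - 1)
```

With real data, dm would simply be replaced by the gDistance() result; the rest of the calculation stays the same.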

Identifying which points in a regular lattice are within a polygon's boundaries

I would like to work out which points that define a regular lattice are within a polygon. The code below does this but VERY VERY slowly:
#the polygon that I want to check each point against
glasgow_single <- readShapePoly(
fn="data/clipped/glasgow_single"
)
#interpolated contains the coordinates of the regular grid
points_to_check <- expand.grid(
x=interpolated$x,
y=interpolated$y
)
#function to be called by plyr
fn <- function(X){
this_coord <- data.frame(lon=X["x"], lat=X["y"])
this_point <- SpatialPoints(this_coord)
out <- gContains(glasgow_single, this_point)
out <- data.frame(x=X["x"], y=X["y"], val=out)
return(out)
}
#plyr call
vals <- adply(points_to_check, 1, fn, .progress="text")
vals$val <- as.numeric(vals$val)
Taking into account both thinking time and computing time, is there a much faster way of doing this?
Yes, there's a much better approach. For this and many other topological operations, the rgeos package has you well covered. Here, you're wanting rgeos::gWithin():
## Required packages
library(rgdal)
library(raster) ## For example polygon & functions used to make example points
library(rgeos)
## Reproducible example
poly <- readOGR(system.file("external", package="raster"), "lux")[1,]
points <- as(raster(extent(poly)), "SpatialPoints")
proj4string(points) <- proj4string(poly)
## Test which points fall within polygon
win <- gWithin(points, poly, byid=TRUE)
## Check that it works
plot(poly)
points(points, col=1+win)

Resources