How to create a binomial field in a raster from a polygon overlay?

I have a raster where each attribute field represents a different input in a fire risk model. I'm trying to test my model's skill at predicting fire locations. I have polygons representing two years of fire perimeters (years that weren't used to train the original model).
I'd like to create a binomial presence/absence field in my raster based on the fire polygons, where presence represents cells overlaid by a fire polygon and absence represents areas that did not burn in the study period. I'd like to include cells that are both completely and partially overlaid by a fire polygon as presences. Is there a good way to do this in either ArcMap or R?
I've considered rasterizing my polygons but that seems clunky. I'd like to avoid losing resolution if at all possible.

You say two contradictory things. First:
"I'd like to create a presence/absence raster based on the fire polygons"
And then:
"I've considered rasterizing my polygons but that seems clunky"
Based on your goal (the first statement) you could start with
r <- rasterize(x, y, field=1, getCover=TRUE)
(getCover=TRUE is the raster package's argument; in terra the equivalent argument is cover=TRUE)
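For example, a minimal sketch with terra (assuming `fires` holds the fire perimeter polygons as a SpatVector and `model_rast` is the existing model raster; both names are placeholders):

library(terra)

# fraction of each cell covered by any fire polygon (0 to 1)
cover <- rasterize(fires, model_rast, cover = TRUE)

# any overlap, partial or complete, counts as a presence
presence <- ifel(is.na(cover) | cover == 0, 0, 1)

Because partially covered cells get a non-zero cover fraction, they end up as presences, which matches the stated goal.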

Mapping how many points are within a radius of every location in R

In R, I am trying to create a choropleth map. I have built a database of businesses; some are part of chains (e.g. McDonalds) and others are independent. I want to calculate how many businesses are within 30km of each point on the map, but treat the different locations of a chain as a single business.
For example, if a point is:
5km from a McDonalds,
10km from a Taco Bell,
15km from a Chick-Fil-A,
20km from a KFC,
25km from another McDonalds, and
35km from a Five Guys,
the colour will show that there are 4 fast food outlets within 30km (the two McDonalds count once, and Five Guys is out of range).
I am happy to use any R package, but I am mostly familiar with tmap and ggplot2 maps.
At this stage the best approach I can think of is to create polygons for each chain and stack them as transparent layers of the same colour. I don't think this would be very efficient, and it wouldn't create a very nice looking choropleth.
The other answers I could find were either counting points (e.g. https://gis.stackexchange.com/questions/229066/counting-how-many-times-a-point-is-inside-a-set-of-intersecting-polygons-in-r) or were for GIS software.
EDIT:
I have managed to create a 30km radius around every location of every chain (using rgeos gIntersection). I now have a series of polygons.
To solve my question, the additional thing I need to do is create polygons for where:
Only one polygon covers the area,
Two polygons cover the area,
etc.
To try to visualise this, I used the answer from https://gis.stackexchange.com/questions/229066/counting-how-many-times-a-point-is-inside-a-set-of-intersecting-polygons-in-r
In the linked question they are trying to count how many polygons cover the numbered points (the image on the right). What I am trying to do is create the image on the left, where there are polygons with no overlap (1), two overlapping polygons (2), and so on.
I think what you are trying to accomplish would be best approached with a raster rather than a choropleth. To make a choropleth, you define a set of (generally irregular) polygons, summarize something within each polygon, then color the polygons based on that attribute. This would be a good approach if you wanted to show how many fast food restaurants are within each state or county, or how many fast food joints per capita by state.
From your description, however, you are looking for the number of fast food joints within a set radius of every point. This is more of a raster question, since you can represent your data on a regular grid.
The raster package is a good start for working with raster data and works well with the sf package.
You need to determine what density you need to accomplish your goal, then use this to determine the resolution of your raster. Once you've got that you can use raster::rasterize() to summarize your (I'm assuming) point data.
I'm assuming you have an object that holds the locations of each restaurant; I'll call this object "points".
library(raster)
library(sf)

# raster works most reliably with sp objects, so coerce the sf points first
points_sp <- as(points, "Spatial")

# create a raster template with 30km resolution
# (this assumes your projection is in meters)
raster_template <- raster(extent(points_sp),
                          resolution = 30000,
                          crs = projection(points_sp))

# rasterize the point data, counting the points that fall in each cell
r <- rasterize(points_sp, raster_template, fun = "count")
This creates a grid where each cell holds the number of points that fall within that 30km cell. You should then be able to plot the raster, though you may want to clip or mask it so it only shows the parts that fall within New Zealand.
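One caveat: fun = "count" counts every point, so the separate locations of a chain are all counted. A possible workaround (my assumption, with a hypothetical `chain` column naming each restaurant's chain) is to rasterize an integer chain code and count distinct values per cell:

# integer code per chain so the summary function receives numeric values
chain_id <- as.integer(factor(points$chain))

# count distinct chains, not raw points, in each cell
r_chains <- rasterize(points_sp, raster_template,
                      field = chain_id,
                      fun = function(x, ...) length(unique(x)))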

How to get covariate data from a geographic raster for `ppm`?

I want to fit a Poisson point-process model with spatstat::ppm and I'm unsure what is the best way to feed covariate data to the function. I understand that spatstat expects planar coordinates, so I have transformed my point location data to a planar crs before creating a ppp point pattern object. The covariate data are in a raster stack with unprojected geographic coordinates and I understand that projecting rasters is generally ill-advised. I extracted covariate values for the point locations from the raster using the points' original geographic coordinates and raster::extract. So far so good. The issue is ...
"it is not sufficient to have observed the covariate only at the points of the data point pattern; the covariate must also have been observed at other locations in the window." (ppm help file)
I appear to have two options for providing the covariate data to the data argument.
A pixel image; seems ill-advised because of raster projection issues.
A list of functions (one per covariate) that can be evaluated at any location (x,y) to obtain corresponding covariate values. This seems like the way to go, but my attempt at writing such a function turns out to be ridiculously slow. It calls raster::extract for each coordinate pair after transforming the coordinates to the raster's crs. While raster::extract is reasonably fast when given a large number of points, there appears to be a substantial overhead for each call. According to microbenchmark, the coordinate transformation takes about 4ms and the extraction takes about 582ms for a single covariate, or about 4 seconds for each point to get all 7 covariates. I don't know how many times ppm will want to call this, but if it's even once per point in the pattern, it'll take too long.
Is there some way I can find out what is the complete set of points that ppm will query for covariate data so that I can extract those beforehand with a single call?
It seems like my use case (covariates in a geographic raster) should be pretty common, so I'm guessing there's an established way to do this right. What is it?
Thanks for a well-written question that clearly identifies your need. It would have been even better with a simple reproducible example using e.g. built-in data from raster and spatstat, or artificially generated data. In the absence of a reproducible example, my answer only outlines what you could do.
The first step in ppm is to make a quadrature scheme, of class quad or logiquad depending on which maximum likelihood approximation ppm uses. These can be generated directly by the user via quadscheme or quadscheme.logi. The quadrature scheme contains all the points where ppm will evaluate the covariates. You can extract the coordinates of the quadrature scheme using the function coords. If you construct a data.frame with all covariates evaluated at these points, you can supply that as the data argument to ppm, with the quadrature scheme as the first argument. To understand things better, try reading the Details section of help(ppm.quad).
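A minimal sketch of this approach (assuming `pts` is the planar ppp object, `planar_crs` its CRS, and `covars` the geographic raster stack; all names are placeholders, and the default quadscheme() settings stand in for whatever ppm would build internally):

library(spatstat)
library(raster)
library(sf)

Q  <- quadscheme(pts)   # the quadrature scheme ppm would otherwise build itself
xy <- coords(Q)         # coordinates of all data + dummy points

# back-transform the quadrature points to the raster's geographic CRS
xy_sf  <- st_as_sf(xy, coords = c("x", "y"), crs = planar_crs)
xy_geo <- st_coordinates(st_transform(xy_sf, projection(covars)))

# a single extract call covering every location ppm will query
covdf <- as.data.frame(extract(covars, xy_geo))

# trend terms must match the covariate column names (placeholders here)
fit <- ppm(Q, ~ covar1 + covar2, data = covdf)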
Another approach, which may give you the optimal use of your data, is to extract the grid points of your current raster stack together with all the covariate values, and project this point data. Then convert it to a simple data.frame with columns x, y, covar1, covar2, etc. You can then use x and y together with your point observations of interest to create a quadrature scheme manually, and the remaining columns can be supplied as data to ppm.
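A rough sketch of this second approach, under the same placeholder names:

# grid centres plus covariate values, projected to the planar CRS
grid <- as.data.frame(rasterToPoints(covars))   # columns: x, y, covar1, covar2, ...
gsf  <- st_as_sf(grid, coords = c("x", "y"), crs = projection(covars))
gxy  <- st_coordinates(st_transform(gsf, planar_crs))
grid_planar <- data.frame(x = gxy[, 1], y = gxy[, 2],
                          grid[, -(1:2), drop = FALSE])

# the projected grid points can serve as dummy points; covariate values at
# the observed points would still need to be appended before calling ppm
dummy <- ppp(grid_planar$x, grid_planar$y, window = Window(pts))
Q2 <- quadscheme(pts, dummy)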
It would be interesting to compare the results from both these approaches as well as the results from just projecting the raster stack and converting it to a list of im objects.

Merge neighbouring areas together, if tm_fill for one area is too small

I have made a map with the zip codes of a town. The fill is the ratio of cases to non-cases. But some zip codes have very few observations in total, so outliers distort the map.
Is there a way to merge the polygons and data of two neighboring areas based on their n automatically?
And if that is not possible, how can I merge rows of my sf/df without losing ID?
And I guess the simplest approach would be just to set those zip codes to NA.
Depends on what you mean by "automatically". Here's a simple algorithm.
repeat:
Find the region with the smallest population.
If that population is above your threshold, stop.
Find that region's neighbours and pick one (at random, or the one with the smallest population).
Merge that neighbour with the region.
Finding neighbours and merging can all be done with either the sf package or the sp package and friends (like spdep and rgeos).
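A rough sketch of that loop with sf + spdep (assuming `zones` is an sf data frame whose geometry column is named "geometry", with a numeric population column `n`, and `threshold` is the minimum acceptable n; all placeholder names):

library(sf)
library(spdep)

repeat {
  i <- which.min(zones$n)
  if (zones$n[i] >= threshold) break

  nb  <- poly2nb(zones)        # adjacency list for the current polygons
  nbs <- nb[[i]]
  if (length(nbs) == 0 || nbs[1] == 0) break   # isolated polygon, give up

  j <- nbs[which.min(zones$n[nbs])]   # neighbour with the smallest population

  # merge the two geometries, sum their populations, drop the absorbed rows
  merged <- st_sf(n = zones$n[i] + zones$n[j],
                  geometry = st_union(st_geometry(zones)[i], st_geometry(zones)[j]))
  zones  <- rbind(zones[-c(i, j), c("n", "geometry")], merged)
}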
Equally, this can be considered a clustering algorithm using a distance metric based on adjacency. You could do a full hierarchical clustering and then cut the tree at a point such that all clusters have N > threshold.
Whether this is a good idea statistically is another question, and depends on what your goal is. If you are worried about whether an underlying risk exceeds, say, 0.5, and you are getting positives "by chance" because you have 2 positives from a small sample of 3, then you need to model your data and work out the probability of exceeding 0.5 given the data. Then map that, which will take the small sample size into account.

Create stage height raster using least cost path and r

I have a point shapefile of Station IDs and stageheights. I would like to create a raster where each cell has the stage height value (in meters) of the closest in situ station to that cell.
I want this raster to match up with another raster. So I would like it if I could input both a raster I have created (dataset 3 described below) and my point shapefile (1).
Datasets:
1) Point Shapefile with stage heights of a river delta
2) Shapefile of the river delta extent
3) Raster of the delta where NAs represent land (could also be zeros if need be) and 1's are water. Two versions: 10 meter resolution and 30 meter resolution.
One conceptual issue I am having is with the number of small streams I have.
For example (pictured in the image below), station 1 (circled in blue) is technically closer to the black x region than station 2 (circled in red), but the stage height value in red is more representative of point x. There are NAs in between the two streams; does that mean the value will not jump across streams?
How can I reassign the values in my raster (all the 1's) to the stage height of the nearest station, and make sure these values do not jump from stream to stream? Do I need to use least cost path? What is the best way to do this?
I would like to use R, but can use ArcMap if I must.
So I'm not sure what tools you have available to you, but I think this answer may be useful:
Calculating attribute for network distance between multiple points in ArcGIS Desktop?
Here the questioner was looking to calculate distances along roads to some points, but your problem seems similar. The main point I would make is that you should do your network distance classification before worrying about the raster layer. You may have to convert from polygons to lines, or use some workaround to get your data into a format that works, but this is the kind of job the tool is designed to do.
After you have reclassified your river shapefile based on network distance to a given station, convert the polygons to raster and use this to classify your original raster. You could do this in R or ArcMap; ArcMap will probably be faster.
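If you'd rather stay in R, one way to approximate this (my suggestion, not part of the linked answer; `water` is the 1/NA water raster and `stations` a SpatialPointsDataFrame with a `stage` column, both placeholder names) is a least-cost allocation with the gdistance package:

library(raster)
library(gdistance)

# zero conductance on land so cost paths cannot jump between streams
water0 <- reclassify(water, cbind(NA, 0))
tr <- geoCorrection(transition(water0, transitionFunction = mean, directions = 8))

# accumulated in-water cost from each station to every cell
costs <- stack(lapply(seq_len(length(stations)), function(i) {
  accCost(tr, stations[i, ])
}))

# index of the cheapest (nearest along the water) station for each cell
nearest <- which.min(costs)

# swap station indices for their stage heights
stage_rast <- subs(nearest, data.frame(id = seq_len(length(stations)),
                                       stage = stations$stage))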

R: Determining whether a point lies inside a region made up of separate polygons generated from contourLines()

I am using the function contourLines() in R to record the vertices of a contour based on a probability density estimation. Then I test whether a point lies inside the contour region. I can do this test easily when there is only one region (polygon) created by contourLines, but sometimes there are multiple polygons. I am trying to come up with a way to determine whether a point lies inside a contour region made up of multiple polygons.
My idea so far is to calculate the number of polygons generated and treat each one separately. I was thinking I could use graph theory to determine the number of polygons, because there will not be a path between points on two separate polygons.
Probably there is an easier way. Any suggestions?
Thanks in advance,
HS
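A minimal sketch of the idea described above (assuming `cl` is the list returned by contourLines(), e.g. from a MASS::kde2d density, and that the area outside all rings is outside the region): contourLines() returns one list element per closed ring, so the number of polygons is simply length(cl), and the even-odd rule handles nesting without any graph machinery.

library(sp)

# a point is inside the region if it falls within an odd number of rings
inside_region <- function(px, py, cl) {
  hits <- vapply(cl, function(ring) {
    point.in.polygon(px, py, ring$x, ring$y) > 0
  }, logical(1))
  sum(hits) %% 2 == 1
}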
