Spatial join in R - r

I have two shapefiles—let’s call them shp1.shp and shp2.shp—and I want to create a new shapefile of overlaps between the two.
Essentially, I'm trying to determine which geographies from shp1 fall within shp2. Most shp1 polygons will contain shp2 polygons, and some shp2 polygons will fall within multiple shp1 polygons.
If I start with
library(sp)
large_list <- over(shp1,shp2, returnList = TRUE)
that gets me a Large List of shared geographies. But how do I take that list and use it to do a spatial join, and create a new shapefile?
I'm relatively new to R (especially for GIS) and any help would be appreciated.

You might find the answer in the function intersect, from package raster
library(raster)
newshape <- intersect(shape1, shape2)

Related

Points not added to the map plot

I need to add some points to the map using simple points function. The issue is that points don't add to the map. It's simple command, I follow some tutorial where adding points to the map works this way but not in my case. Plot function plots Texas choropleth properly but next line (points) doesn't add points to the map at all:
library(rgdal)
library(rgeos)
library(sp)
companies <- read.csv('geoloc_data_comp.csv', header = T, dec = ',', sep = ';')
states <- readOGR('.', 'states')
plot(states[states#data$stat_name == 'texas',])
points(companies$coords.x1, companies$coords.x2, pch = 21)
First you shoud start to avoid rgeos/rgdal because they will stop being maintains. See : https://github.com/r-spatial/evolution
sf is replacing them:
library(sp)
library(sf)
library(spData) #used because I wanted US states
# list of data in spData you have one with US states
data(package = "spData")
if you want to read shapefile or other GIS format check sf::st_read() (instead of readOGR())
# one way with sf
plot(us_states$geometry[us_states$NAME == "Texas"])
# if you want do use the sp way
us_sp <- as(us_states, "Spatial") # convert to sp
plot(us_sp[us_sp#data$NAME == "Texas",])
with sf you have the geometry in one column (see "geometry") instead of having an R S4 with nested lists (see #data and #polygones).
Before getting some points we need to check in which CRS our data are. If you do not know CRS I like this website : https://ihatecoordinatesystems.com/
You also have information in the us_states documentation: https://www.rdocumentation.org/packages/spData/versions/2.0.1/topics/us_states
Then you can use:
sp::proj4string(us_sp)
sf::st_crs(us_states)
# This is EPSG 4269 or NAD83
If you want to use points() they need to be in this coordinates system (I suspect this explain your trouble ie different CRS).
You didn't provide data points so I produced some:
library(osmdata)
#this will just download node matching the key/value place=city
some_city_in_texas <- osmdata::opq(osmdata::getbb("Texas US"),
nodes_only = TRUE) %>%
osmdata::add_osm_feature(key = "place", value = "city") %>%
osmdata::osmdata_sf() #keep them in sf format
# osmdata_sp() also exist
The class osmdata is a bit complicated but here you just need to know that some_city_in_texas$osm_points provide us with points (to test points()). Now we can check their CRS:
sf::st_crs(some_city_in_texas$osm_points)
As you can see we are in an other CRS so we need to transform it. (you will probably need to do it).
city_in_texas <- sf::st_transform(some_city_in_texas$osm_points,
4269)
sf use simple feature standard to store localization and points() want two vectors x&y. You should also check that (common cause of error): R use x/y (long/lat) and not lat/long.
Here we convert city_in_texas to just coords. (if you need to do the reverse, ie converting data frame with X/Y, into an sf object look at sf::st_as_sf())
coords_city <- sf::st_coordinates(city_in_texas)
Finally this works fine now:
plot(us_states$geometry[us_states$NAME == "Texas"])
points(coords_city, pch = 21)
Good ressources are https://r-spatial.org/ and https://geocompr.robinlovelace.net/

Create a vector footprint of a multi-part raster [R]

I've read a raster into my R session, using this code:
raster <- stack("raster.tif")
and now I'd like to make a simple feature (sf) object representing the outline of that raster. I can't use a bounding box because the raster is multi-part so the bounding box would be much larger than the raster. So the footprint also needs to be a multi-part feature (sf multipolygon).
I'd appreciate any help with this. Thanks!
Mark
If you want each raster in the stack you need to loop over each one with lapply. This will return a list of polygon layers. You then need to convert each component of the list to an sf multipolygon. Lastly, you need to concatenate the features (note that c is the c() function). shp should be your multipolygon. You may not want to dissolve the polygons, you didn't really make it clear what you wanted.
a <- lapply(as.list(raster), rasterToPolygons, dissolve=TRUE)
b <- lapply(a, st_as_sf) # convert to sf multipolygon
shp <- Reduce(c, b) # combine all polygons to one
As a side note, it's probably not great to use raster as a variable name because the raster package has a function called raster.

How to subset a large SpatialPolygonsDataFrame

I want to calculate the area of a wildfire. I tried this by substracting the NDVI calculated on a Landsat image before and another image after the fire and see where the NDVI was reduced. However, not only in the burning areas the NDVI has changed, but there are also many random differences. I used rasterToPolygons to create a large SpatialPolygonsDataFrame containing all areas where NDVI after - NDVI before < 0.
Now I want to remove all the polygons with an area below a certain threshold value. However, I cannot find a way to subset the large SpatialPolygonsDataFrame.
I found an example on how to get a list of the polygons with an area above the threshold (where burned_poly is the large SpatialPolygonsDataFrame):
pols <- lapply(burned_poly#polygons , slot , "Polygons")
pols_areas <- lapply(pols[[2]], function(x) slot(x, "area"))
However, accessing the large SpatialPolygonsDataFrame like this
bp <- burned_poly#polygons[[1]]#Polygons[pols_areas >= 9000]
gives me a list which I am currently unable to coerce into a SpatialPolygonsDataFrame.
Can someone tell me how to do this last step (I have trouble with the Sf argument of which I don't know what it is in the SpatialPolygonsDataFrame function), or maybe there is a different and better approach to extract the fire extent as a polygon?
Alright, I think I have found a way thanks to Orlandos suggestion to use sf.
I transformed my large SpatialPolygonsDataFrame object to a sf object via st_as_sf() which gave me a multipolygon. This stf_MULTIPOLYGON object can be subdivided into single polygons using st_cast() and the resulting object is subsettable like a data.frame.
bp_sf <- st_as_sf(burned_poly)
bps_sf <- st_cast(bp_sf, "POLYGON")
BpSf <- bps_sf[as.numeric(st_area(bps_sf))>=10000,]
If you are using the simple features sf library you can use functions from the tidyverse. Filtering data is a matter of using the filter() function. Notice that you can convert your objects to sf using st_as_sf(). See: https://r-spatial.github.io/sf/reference/st_as_sf.html and How to filter an R simple features collection using sf methods like st_intersects()?

Extracting values from inside polygons raster r

I'm trying to find the mean daily temperature for counties in South Dakota from raster grids ('bil' files) found at http://prism.oregonstate.edu/. I am getting county boundaries from the 'maps' package.
library(maps)
library(raster)
sd_counties <- map('county','south dakota')
sd_raster <- raster('file_path')
How do I extract the grid cells within each county? I think I need to turn each county into it's own polygon to do this, but how? Then, I should be able to do something like the following. Any help would be greatly appreciated.
values <- extract(raster, list of polygons)
polygon_means <- unlist(lapply(values, FUN=mean))
I'm not familiar with the maps package or the map function, but it looks like it's solely for visualization, rather than geospatial operations.
While there might be a way to convert the map object to actual polygons, here's an easy way sing raster's getData function that works:
library(raster)
usa_adm2 <- getData(country='USA',level=2)
sd_counties <- usa_adm2[grepl('South Dakota',usa_adm2$NAME_1),]
plot(sd_counties)
Now you can extract pixels for each county using extract(r,sd_counties), where r is your desired raster.
Note, that depending on the number of pixels (and layers) you need to extract, that can take some time.

Choropleth Maps in R - TIGER Shapefile issue

Have a Question on Mapping with R, specifically around the choropleth maps in R.
I have a dataset of ZIP codes assigned to an are and some associated data (dataset is here).
My final data format is: Area ID, ZIP, Probability Value, Customer Count, Area Probability and Area Customer Total. I am attempting to present this data by plotting area probability and Area Customer Total on a Map. I have tried to do this by using the census TIGER Shapefiles but I guess R cannot handle the complete country.
I am comfortable with the Statistical capabilities and now I am moving all my Mapping from third party GIS focused applications to doing all my Mapping in R. Does anyone have any pointers to how to achieve this from within R?
To be a little more detailed, here's the point where R stops working -
shapes <- readShapeSpatial("tl_2013_us_zcta510.shp")
(where the shp file is the census/TIGER) shape file.
Edit - Providing further details. I am trying to first read the TIGER shapefiles, hoping to combine this spatial dataset with my data and eventually plot. I am having an issue at the very beginning when attempting to read the shape file. Below is the code with the output
require(maptools)
shapes<-readShapeSpatial("tl_2013_us_zcta510.shp")
Error: cannot allocate vector of size 317 Kb
There are several examples and tutorials on making maps using R, but most are very general and, unfortunately, most map projects have nuances that create inscrutable problems. Yours is a case in point.
The biggest issue I came across was that the US Census Bureau zip code tabulation area shapefile for the whole US is huge: ~800MB. When loaded using readOGR(...) the R SpatialPolygonDataFrame object is about 913MB. Trying to process a file this size, (e.g., converting to a data frame using fortify(...)), at least on my system, resulted in errors like the one you identified above. So the solution is to subset the file based in the zip codes that are actually in your data.
This map:
was made from your data using the following code.
library(rgdal)
library(ggplot2)
library(stringr)
library(RColorBrewer)
setwd("<directory containing shapfiles and sample data>")
data <- read.csv("Sample.csv",header=T) # your sample data, downloaded as csv
data$ZIP <- str_pad(data$ZIP,5,"left","0") # convert ZIP to char(5) w/leading zeros
zips <- readOGR(dsn=".","tl_2013_us_zcta510") # import zip code polygon shapefile
map <- zips[zips$ZCTA5CE10 %in% data$ZIP,] # extract only zips in your Sample.csv
map.df <- fortify(map) # convert to data frame suitable for plotting
# merge data from Samples.csv into map data frame
map.data <- data.frame(id=rownames(map#data),ZIP=map#data$ZCTA5CE10)
map.data <- merge(map.data,data,by="ZIP")
map.df <- merge(map.df,map.data,by="id")
# load state boundaries
states <- readOGR(dsn=".","gz_2010_us_040_00_5m")
states <- states[states$NAME %in% c("New York","New Jersey"),] # extract NY and NJ
states.df <- fortify(states) # convert to data frame suitable for plotting
ggMap <- ggplot(data = map.df, aes(long, lat, group = group))
ggMap <- ggMap + geom_polygon(aes(fill = Probability_1))
ggMap <- ggMap + geom_path(data=states.df, aes(x=long,y=lat,group=group))
ggMap <- ggMap + scale_fill_gradientn(name="Probability",colours=brewer.pal(9,"Reds"))
ggMap <- ggMap + coord_equal()
ggMap
Explanation:
The rgdal package facilitates the creation of R Spatial objects from ESRI shapefiles. In your case we are importing a polygon shapefile into a SpatialPolygonDataFrame object in R. The latter has two main parts: a polygon section, which contains the latitude and longitude points that will be joined to create the polygons on the map, and a data section which contains information about the polygons (so, one row for each polygon). If, e.g., we call the Spatial object map, then the two sections can be referenced as map#polygons and map#data. The basic challenge in making choropleth maps is to associate data from your Sample.csv file, with the relevant polygons (zip codes).
So the basic workflow is as follows:
1. Load polygon shapefiles into Spatial object ( => zips)
2. Subset if appropriate ( => map).
3. Convert to data frame suitable for plotting ( => map.df).
4. Merge data from Sample.csv into map.df.
5. Draw the map.
Step 4 is the one that causes all the problems. First we have to associate zip codes with each polygon. Then we have to associate Probability_1 with each zip code. This is a three step process.
Each polygon in the Spatial data file has a unique ID, but these ID's are not the zip codes. The polygon ID's are stored as row names in map#data. The zip codes are stored in map#data, in column ZCTA5CE10. So first we must create a data frame that associates the map#data row names (id) with map#data$ZCTA5CE10 (ZIP). Then we merge your Sample.csv with the result using the ZIP field in both data frames. Then we merge the result of that into map.df. This can be done in 3 lines of code.
Drawing the map involves telling ggplot what dataset to use (map.df), which columns to use for x and y (long and lat) and how to group the data by polygon (group=group). The columns long, lat, and group in map.df are all created by the call to fortify(...). The call to geom_polygon(...) tells ggplot to draw polygons and fill using the information in map.df$Probability_1. The call to geom_path(...) tells ggplot to create a layer with state boundaries. The call to scale_fill_gradientn(...) tells ggplot to use a color scheme based on the color brewer "Reds" palette. Finally, the call to coord_equal(...) tells ggplot to use the same scale for x and y so the map is not distorted.
NB: The state boundary layer, uses the US States TIGER file.
I would advise the following.
Use readOGR from the rgdal package rather than readShapeSpatial.
Consider using ggplot2 for good-looking maps - many of the examples use this.
Refer to one of the existing examples of creating a choropleth such as this one to get an overview.
Start with a simple choropleth and gradually add your own data; don't try and get it all right at once.
If you need more help, create a reproducible example with a SMALL fake dataset and with links to the shapefiles in question. The idea is that you make it easy to help us help you rather than discourage us by not supplying code and data in your question.

Resources