How to subset a large SpatialPolygonsDataFrame - r

I want to calculate the area of a wildfire. I tried this by subtracting the NDVI calculated on a Landsat image taken before the fire from the NDVI of an image taken after it, to see where the NDVI was reduced. However, the NDVI changed not only in the burned areas; there are also many random differences. I used rasterToPolygons to create a large SpatialPolygonsDataFrame containing all areas where NDVI after - NDVI before < 0.
Now I want to remove all the polygons with an area below a certain threshold value. However, I cannot find a way to subset the large SpatialPolygonsDataFrame.
I found an example on how to get a list of the polygons with an area above the threshold (where burned_poly is the large SpatialPolygonsDataFrame):
pols <- lapply(burned_poly@polygons, slot, "Polygons")
pols_areas <- lapply(pols[[2]], function(x) slot(x, "area"))
However, accessing the large SpatialPolygonsDataFrame like this
bp <- burned_poly@polygons[[1]]@Polygons[pols_areas >= 9000]
gives me a list which I am currently unable to coerce into a SpatialPolygonsDataFrame.
Can someone tell me how to do this last step (I have trouble with the Sr argument of the SpatialPolygonsDataFrame function, because I don't know what it is supposed to be), or is there maybe a different and better approach to extract the fire extent as a polygon?

Alright, I think I have found a way thanks to Orlando's suggestion to use sf.
I converted my large SpatialPolygonsDataFrame object to an sf object via st_as_sf(), which gave me a multipolygon. This sfc_MULTIPOLYGON geometry can be split into single polygons using st_cast(), and the resulting object can be subset like a data.frame.
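library(sf)
bp_sf <- st_as_sf(burned_poly) # convert the sp object to sf
bps_sf <- st_cast(bp_sf, "POLYGON") # split the multipolygon into single polygons
BpSf <- bps_sf[as.numeric(st_area(bps_sf)) >= 10000, ] # keep polygons with an area of at least 10000 (in squared CRS units)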
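If the result has to stay an sp object, the list of Polygon objects from the question can probably also be wrapped back into a SpatialPolygonsDataFrame by hand. The following is only a minimal sketch: the two squares and the value column are made-up stand-ins for burned_poly, and it assumes that all rings sit in the first Polygons entry, as in the question.

```r
library(sp)

# Hypothetical stand-in for burned_poly: one feature holding two rings
# (a 10 x 10 square and a 200 x 200 square), both drawn clockwise
small <- Polygon(cbind(c(0, 0, 10, 10, 0), c(0, 10, 10, 0, 0)), hole = FALSE)
large <- Polygon(cbind(c(100, 100, 300, 300, 100),
                       c(100, 300, 300, 100, 100)), hole = FALSE)
burned_poly <- SpatialPolygonsDataFrame(
  SpatialPolygons(list(Polygons(list(small, large), ID = "1"))),
  data = data.frame(value = 1, row.names = "1")
)

# Areas of the individual rings, read from the "area" slot as in the question
rings <- burned_poly@polygons[[1]]@Polygons
areas <- vapply(rings, function(p) p@area, numeric(1))
kept  <- rings[areas >= 9000]

# Each remaining ring becomes its own Polygons object with a unique ID
ids <- as.character(seq_along(kept))
sr <- SpatialPolygons(
  lapply(seq_along(kept), function(i) Polygons(list(kept[[i]]), ID = ids[i])),
  proj4string = burned_poly@proj4string
)

# Sr is the SpatialPolygons object; data rows are matched to it by row name
bp <- SpatialPolygonsDataFrame(
  sr,
  data = data.frame(area = areas[areas >= 9000], row.names = ids)
)
```

The Sr argument is simply the SpatialPolygons object, and the row names of the data frame have to match the polygon IDs.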

If you are using the simple features library sf, you can use functions from the tidyverse. Filtering data is then a matter of using the filter() function. Note that you can convert your objects to sf using st_as_sf(). See https://r-spatial.github.io/sf/reference/st_as_sf.html and the question "How to filter an R simple features collection using sf methods like st_intersects()?"
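As a rough illustration of the filter() route, here is a minimal sketch. The two squares, the id column, the area_m2 column and the EPSG code 32633 are assumptions made up for the example; they are not taken from the question's data.

```r
library(sf)
library(dplyr)

# Hypothetical data: one MULTIPOLYGON feature made of a 10 x 10 m square
# and a 200 x 200 m square in a projected CRS (UTM zone 33N, assumed)
small <- rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0))
large <- rbind(c(100, 100), c(300, 100), c(300, 300), c(100, 300), c(100, 100))
burned <- st_sf(
  id = 1,
  geometry = st_sfc(st_multipolygon(list(list(small), list(large))), crs = 32633)
)

# Split into single polygons (sf warns that attributes are repeated per part)
parts <- st_cast(burned, "POLYGON")

# Store the area as a plain number, then filter like any data frame
parts$area_m2 <- as.numeric(st_area(parts))
big <- filter(parts, area_m2 >= 10000)
```

Dropping the units with as.numeric() keeps the comparison against a bare threshold simple; the value is still in the squared units of the CRS.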

Related

Normalize RasterLayer as Matrix to use as Clip Frame

I was assigned the task of clipping a raster from a .nc file to the extent of a .tif file.
edit (from comment):
I want to extract temperature information from the .nc file because I need to check the yearly mean temperature of a specific region. To be comparable, the comparison has to be made on exactly the same area. The .nc file covers a larger area than the one checked previously, so I need to "clip" it to the extent of a .tif I have. The .tif data has the form 0|1; where it is 0 (or where the .tif is smaller than the .nc), the .nc data should be clipped. In the end I want to keep the .nc data, but at the extent of the .tif, while still retaining its resolution and projection. (The .tif and the .nc have different projections and pixel sizes.)
Ordinarily that wouldn't be a problem, as I could use raster::crop. However, crop doesn't deal with different projections and different pixel sizes/resolutions. (I still used it to generate an approximation, but it is not precise enough for the final information, as can be seen in the code snippet below.) The obvious way to generate a more reliable raster set would be to first homogenize the datasets with a method like raster::projectRaster or sp::spTransform (spTransform was added in an edit to the original question), but this approach takes too much time, as I have to do this for quite a few .nc files.
I was told the best method would be to generate a normalized matrix from the smaller raster "clip_frame" and then just multiply it with the "nc_to_clip" raster. Doing so should prevent any errors caused by map projections or other factors. This makes a lot of sense to me in theory, but I have no idea how to do it in practice. I would be very grateful for any kind of hint, code snippet or other help.
I have looked at similar problems on StackOverflow (and other sites) like:
convert matrix to raster in R
Convert raster into matrix with R
https://www.researchgate.net/post/Hi_Is_there_a_way_to_multiply_Raster_value_by_Raster_Latitude
As I am not even sure how to frame the question correctly, I might have overlooked an answer to this problem, if so please point me there!
My (working) code so far, just to give you an idea of how I want to approach the topic (here using the crop-function).
#library(ncdf4)
library(raster)
library(rgdal)
library(tidyverse)
nc_list <- list.files(pattern = ".*0.nc$") # list of .nc files containing raster and temperature information
#nc_to_clip <- lapply(nc_list, raster, varname="GST") # read in as raster
nc_to_clip <- raster("ABC.nc", varname = "GST")
clip_frame <- raster("XYZ.tif") # read in .tif for further use as frame
mean_temp_from_raster <- function(input_clip_raster, input_clip_frame){ # input_clip_raster = raster to clip, input_clip_frame = raster used as frame
  r2_coord <- rasterToPoints(input_clip_raster, spatial = TRUE) # step 1 to extract coordinates
  map_clip <- crop(input_clip_raster, extent(input_clip_frame)) # use crop to cut the input_clip_raster (this being the function I have to extend on)
  temp <- raster::extract(map_clip, r2_coord@coords) # step 2 to extract coordinates
  temp_C <- temp*0.01-273.15 # convert kelvin*100 to celsius
  temp_C <- na.omit(temp_C)
  mean(temp_C)
  return_list <- list(map_clip, mean(temp_C))
  return(return_list)
}
mean_tempC <- lapply(nc_to_clip, mean_temp_from_raster, clip_frame)
Thanks!
PS:
I don't have much experience working with .nc files and/or RasterLayers in R as I used to work with ArcGIS/Python (arcpy) for problems like this, which is not an option right now.
Perhaps something like this?
library(raster)
nc <- raster("ABC.nc", varname = "GST")
clip <- raster("XYZ.tif")
x <- as(extent(clip), "SpatialPolygons")
crs(x) <- crs(clip)
y <- sp::spTransform(x, crs(nc))
clipped <- crop(nc, y)
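The "multiply by a mask" idea from the question could look roughly like the sketch below. This is one possible reading, not a confirmed method: it reprojects only the small 0|1 frame onto the grid of the .nc raster (nearest neighbour), turns the zeros into NA, and multiplies. Both rasters, their extents and CRS strings, and the constant temperature value are synthetic assumptions so that the example runs on its own.

```r
library(raster)

# Synthetic stand-in for the .nc layer: lon/lat grid, values in kelvin * 100
nc <- raster(xmn = 0, xmx = 20, ymn = 40, ymx = 50, resolution = 0.5,
             crs = "+proj=longlat +datum=WGS84")
values(nc) <- rep(28315, ncell(nc))

# Synthetic stand-in for the .tif frame: UTM grid with 0 (west) and 1 (east)
clip <- raster(xmn = 400000, xmx = 600000, ymn = 4800000, ymx = 5000000,
               resolution = 1000, crs = "+proj=utm +zone=32 +datum=WGS84")
values(clip) <- rep(rep(c(0, 1), each = 100), times = 200)

# Reproject the small frame onto the nc grid once; "ngb" keeps it 0/1
mask_on_nc <- projectRaster(clip, nc, method = "ngb")
mask_on_nc[mask_on_nc == 0] <- NA

# Multiplying keeps nc values where the mask is 1 and gives NA elsewhere
clipped <- trim(nc * mask_on_nc)

# Mean temperature in degrees Celsius over the remaining cells
mean_c <- cellStats(clipped * 0.01 - 273.15, "mean")
```

If all .nc files share one grid, the reprojected mask would only have to be computed once and could be reused for every file, which avoids reprojecting the large rasters.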

Create a vector footprint of a multi-part raster [R]

I've read a raster into my R session, using this code:
raster <- stack("raster.tif")
and now I'd like to make a simple feature (sf) object representing the outline of that raster. I can't use a bounding box because the raster is multi-part so the bounding box would be much larger than the raster. So the footprint also needs to be a multi-part feature (sf multipolygon).
I'd appreciate any help with this. Thanks!
Mark
If you want each raster in the stack you need to loop over each one with lapply. This will return a list of polygon layers. You then need to convert each component of the list to an sf multipolygon. Lastly, you need to concatenate the features (note that c is the c() function). shp should be your multipolygon. You may not want to dissolve the polygons, you didn't really make it clear what you wanted.
a <- lapply(as.list(raster), rasterToPolygons, dissolve=TRUE)
b <- lapply(a, st_as_sf) # convert to sf multipolygon
shp <- Reduce(c, b) # combine all polygons to one
As a side note, it's probably not great to use raster as a variable name because the raster package has a function called raster.

Convert SpatialCollections to SpatialPolygonsDataFrame in R

I am struggling to convert an object of class SpatialCollections to a SpatialPolygonsDataFrame object.
My input files are both shapefiles and SpatialPolygonsDataFrame objects. They can be accessed here.
I do an intersection of both objects:
SPDF_A <- shapefile("SPDF_A")
SPDF_B <- shapefile("SPDF_B")
intersection <- gIntersection(gBuffer(SPDF_A, width=0), gBuffer(SPDF_B, width=0))
The result is:
> intersection
class : SpatialCollections
Setting gBuffer(... , byid=T) or gBuffer(... , byid=F) seems to make no difference.
I use gIntersection and gBuffer(... , width=0) instead of intersect in order to avoid geometry problems (self-intersection).
This is part of a larger loop. I need to get the intersection as SpatialPolygonsDataFrame because it will be saved as shp file in a following step.
writeOGR(intersection, ".", layer=paste0("Int_SPDF_A-SPDF_B"), driver="ESRI Shapefile")
This is not possible from a SpatialCollections object. In order to convert this to a SpatialPolygonsDataFrame I tried:
intersection <- as(intersection ,"SpatialPolygonsDataFrame")
intersection <- SpatialPolygonsDataFrame(intersection)
intersection <- readOGR(intersection, layer = "intersection")
Nothing works. Does anybody have a solution? Thanks a lot!
First of all, according to the documentation SpatialCollections is kind of a container format that can "hold SpatialPoints, SpatialLines, SpatialRings, and SpatialPolygons (without attributes)". If you need the data frame part of your SpatialPolygonsDataFrame ("attribute table" in GIS language), you'll have to work around that somehow. If, on the other hand, you're only interested in the spatial information (the polygons without the data attached to them) try the following:
str(intersection, max.level = 3)
suggests that the @polyobj slot is nothing but a SpatialPolygons object. Hence
mySpoly <- intersection@polyobj
should do the trick and
class(mySpoly)
suggests that we indeed now have a SpatialPolygons.
You need to convert that to a SpatialPolygonsDataFrame before exporting:
mySpolyData <- as(mySpoly, "SpatialPolygonsDataFrame")
writeOGR(mySpolyData, ".", layer=paste0("Int_SPDF_A-SPDF_B"), driver="ESRI Shapefile")
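If you would rather control the attribute table yourself, a SpatialPolygons object can also be combined with an explicit data frame. The sketch below uses a made-up unit square and a hypothetical id column; the only requirement it relies on is that the row names of the data frame match the polygon IDs.

```r
library(sp)

# Hypothetical SpatialPolygons object: a single unit square with ID "a"
square <- Polygons(
  list(Polygon(cbind(c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0)), hole = FALSE)),
  ID = "a"
)
mySpoly <- SpatialPolygons(list(square))

# Collect the polygon IDs and use them as row names of the attribute table
ids <- vapply(mySpoly@polygons, function(p) p@ID, character(1))
attributes_df <- data.frame(id = ids, row.names = ids)

mySpolyData <- SpatialPolygonsDataFrame(mySpoly, data = attributes_df)
```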

Extracting values from inside polygons raster r

I'm trying to find the mean daily temperature for counties in South Dakota from raster grids ('bil' files) found at http://prism.oregonstate.edu/. I am getting county boundaries from the 'maps' package.
library(maps)
library(raster)
sd_counties <- map('county','south dakota')
sd_raster <- raster('file_path')
How do I extract the grid cells within each county? I think I need to turn each county into its own polygon to do this, but how? Then, I should be able to do something like the following. Any help would be greatly appreciated.
values <- extract(sd_raster, county_polygons) # county_polygons: the county polygons I still need to create
polygon_means <- unlist(lapply(values, FUN=mean))
I'm not familiar with the maps package or the map function, but it looks like it's solely for visualization, rather than geospatial operations.
While there might be a way to convert the map object to actual polygons, here's an easy way that works, using raster's getData function:
library(raster)
usa_adm2 <- getData(country='USA',level=2)
sd_counties <- usa_adm2[grepl('South Dakota',usa_adm2$NAME_1),]
plot(sd_counties)
Now you can extract the pixels for each county using extract(r, sd_counties), where r is your desired raster.
Note that, depending on the number of pixels (and layers) you need to extract, this can take some time.
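To get the mean per polygon directly, extract() also accepts a summary function. The sketch below uses a small synthetic raster and two made-up rectangular "counties" instead of the PRISM grid and the South Dakota boundaries, so the names and values are assumptions for illustration only.

```r
library(raster)
library(sp)

# Synthetic 10 x 10 raster: top five rows hold 1, bottom five rows hold 3
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
values(r) <- rep(c(1, 3), each = 50)

# Two hypothetical "counties": the northern and the southern half
north <- Polygons(list(Polygon(cbind(c(0, 0, 10, 10, 0), c(5, 10, 10, 5, 5)),
                               hole = FALSE)), ID = "north")
south <- Polygons(list(Polygon(cbind(c(0, 0, 10, 10, 0), c(0, 5, 5, 0, 0)),
                               hole = FALSE)), ID = "south")
counties <- SpatialPolygonsDataFrame(
  SpatialPolygons(list(north, south), proj4string = crs(r)),
  data = data.frame(name = c("north", "south"), row.names = c("north", "south"))
)

# One mean per polygon; raster:: avoids clashes with other extract() functions
county_means <- raster::extract(r, counties, fun = mean, na.rm = TRUE)
counties$mean_value <- as.numeric(county_means)
```

Passing fun = mean avoids the separate lapply() over the list of extracted values.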

Choropleth Maps in R - TIGER Shapefile issue

Have a Question on Mapping with R, specifically around the choropleth maps in R.
I have a dataset of ZIP codes assigned to an area and some associated data (dataset is here).
My final data format is: Area ID, ZIP, Probability Value, Customer Count, Area Probability and Area Customer Total. I am attempting to present this data by plotting area probability and Area Customer Total on a Map. I have tried to do this by using the census TIGER Shapefiles but I guess R cannot handle the complete country.
I am comfortable with the Statistical capabilities and now I am moving all my Mapping from third party GIS focused applications to doing all my Mapping in R. Does anyone have any pointers to how to achieve this from within R?
To be a little more detailed, here's the point where R stops working -
shapes <- readShapeSpatial("tl_2013_us_zcta510.shp")
(where the .shp file is the census/TIGER shapefile).
Edit - Providing further details. I am trying to first read the TIGER shapefiles, hoping to combine this spatial dataset with my data and eventually plot. I am having an issue at the very beginning when attempting to read the shape file. Below is the code with the output
require(maptools)
shapes<-readShapeSpatial("tl_2013_us_zcta510.shp")
Error: cannot allocate vector of size 317 Kb
There are several examples and tutorials on making maps using R, but most are very general and, unfortunately, most map projects have nuances that create inscrutable problems. Yours is a case in point.
The biggest issue I came across was that the US Census Bureau ZIP code tabulation area shapefile for the whole US is huge: ~800MB. When loaded using readOGR(...), the R SpatialPolygonsDataFrame object is about 913MB. Trying to process a file this size (e.g., converting it to a data frame using fortify(...)) resulted, at least on my system, in errors like the one you identified above. So the solution is to subset the file based on the ZIP codes that are actually in your data.
The map was made from your data using the following code.
library(rgdal)
library(ggplot2)
library(stringr)
library(RColorBrewer)
setwd("<directory containing shapefiles and sample data>")
data <- read.csv("Sample.csv",header=T) # your sample data, downloaded as csv
data$ZIP <- str_pad(data$ZIP,5,"left","0") # convert ZIP to char(5) w/leading zeros
zips <- readOGR(dsn=".","tl_2013_us_zcta510") # import zip code polygon shapefile
map <- zips[zips$ZCTA5CE10 %in% data$ZIP,] # extract only zips in your Sample.csv
map.df <- fortify(map) # convert to data frame suitable for plotting
# merge data from Sample.csv into map data frame
map.data <- data.frame(id=rownames(map@data),ZIP=map@data$ZCTA5CE10)
map.data <- merge(map.data,data,by="ZIP")
map.df <- merge(map.df,map.data,by="id")
# load state boundaries
states <- readOGR(dsn=".","gz_2010_us_040_00_5m")
states <- states[states$NAME %in% c("New York","New Jersey"),] # extract NY and NJ
states.df <- fortify(states) # convert to data frame suitable for plotting
ggMap <- ggplot(data = map.df, aes(long, lat, group = group))
ggMap <- ggMap + geom_polygon(aes(fill = Probability_1))
ggMap <- ggMap + geom_path(data=states.df, aes(x=long,y=lat,group=group))
ggMap <- ggMap + scale_fill_gradientn(name="Probability",colours=brewer.pal(9,"Reds"))
ggMap <- ggMap + coord_equal()
ggMap
Explanation:
The rgdal package facilitates the creation of R Spatial objects from ESRI shapefiles. In your case we are importing a polygon shapefile into a SpatialPolygonsDataFrame object in R. The latter has two main parts: a polygon section, which contains the latitude and longitude points that will be joined to create the polygons on the map, and a data section, which contains information about the polygons (so, one row for each polygon). If, e.g., we call the Spatial object map, then the two sections can be referenced as map@polygons and map@data. The basic challenge in making choropleth maps is to associate data from your Sample.csv file with the relevant polygons (ZIP codes).
So the basic workflow is as follows:
1. Load polygon shapefiles into Spatial object ( => zips)
2. Subset if appropriate ( => map).
3. Convert to data frame suitable for plotting ( => map.df).
4. Merge data from Sample.csv into map.df.
5. Draw the map.
Step 4 is the one that causes all the problems. First we have to associate zip codes with each polygon. Then we have to associate Probability_1 with each zip code. This is a three step process.
Each polygon in the Spatial data file has a unique ID, but these IDs are not the ZIP codes. The polygon IDs are stored as row names in map@data. The ZIP codes are stored in map@data, in column ZCTA5CE10. So first we must create a data frame that associates the map@data row names (id) with map@data$ZCTA5CE10 (ZIP). Then we merge your Sample.csv with the result using the ZIP field in both data frames. Then we merge the result of that into map.df. This can be done in 3 lines of code.
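The three merge steps can be tried out on plain data frames without any shapefile. In the sketch below every value is made up: polygon_ids stands in for the id/ZIP table built from map@data, sample_data for Sample.csv, and outline for the output of fortify(...).

```r
# Step 1: polygon IDs (row names of map@data) paired with their ZIP codes
polygon_ids <- data.frame(
  id  = c("0", "1", "2"),
  ZIP = c("07030", "10001", "10002"),
  stringsAsFactors = FALSE
)

# Hypothetical sample data: only two of the three ZIP codes have a value
sample_data <- data.frame(
  ZIP = c("10001", "07030"),
  Probability_1 = c(0.8, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical fortify() output: several outline points per polygon id
outline <- data.frame(
  id   = c("0", "0", "1", "1", "2"),
  long = c(-74.03, -74.02, -73.99, -73.98, -73.97),
  lat  = c(40.74, 40.75, 40.75, 40.76, 40.71),
  stringsAsFactors = FALSE
)

# Step 2: attach the sample data to the polygon IDs via the ZIP code
map_data <- merge(polygon_ids, sample_data, by = "ZIP")

# Step 3: attach the result to the outline points via the polygon id
map_df <- merge(outline, map_data, by = "id")
```

Because merge() performs an inner join by default, outline points of polygons without sample data (id "2" here) drop out of map_df.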
Drawing the map involves telling ggplot what dataset to use (map.df), which columns to use for x and y (long and lat) and how to group the data by polygon (group=group). The columns long, lat, and group in map.df are all created by the call to fortify(...). The call to geom_polygon(...) tells ggplot to draw polygons and fill using the information in map.df$Probability_1. The call to geom_path(...) tells ggplot to create a layer with state boundaries. The call to scale_fill_gradientn(...) tells ggplot to use a color scheme based on the color brewer "Reds" palette. Finally, the call to coord_equal(...) tells ggplot to use the same scale for x and y so the map is not distorted.
NB: The state boundary layer uses the US States TIGER file.
I would advise the following.
Use readOGR from the rgdal package rather than readShapeSpatial.
Consider using ggplot2 for good-looking maps - many of the examples use this.
Refer to one of the existing examples of creating a choropleth such as this one to get an overview.
Start with a simple choropleth and gradually add your own data; don't try and get it all right at once.
If you need more help, create a reproducible example with a SMALL fake dataset and with links to the shapefiles in question. The idea is that you make it easy to help us help you rather than discourage us by not supplying code and data in your question.
