How to calculate the area of polygons from a large shapefile in R

Summary:
I'm trying to calculate the area of a large number of polygons in R. I've read a few posts about how I might do this (Example #1 & Example #2), but my shapefile is too large (1.7 GB) to import. Because I can't import the file, I can't calculate the area of the polygons.
Extended Explanation:
I'm actually trying to calculate the area of properties in Victoria, Australia. The polygons represent these properties. I downloaded the simplified models 1 and 2 of VicMaps from Spatial Datamart for all of Victoria.
However, given the size of the shapefiles, I had to narrow my download to a single local government area (LGA), where I calculated the polygon areas as a test. That shapefile was 15.5 MB.
library(raster)
x <- shapefile("D:/Downloads/SDM616230/ll_gda94/shape/lga_polygon/ballarat/VMPROP/PROPERTY_PRIMARY_APPROVED.shp")
crs(x)  # check the coordinate reference system
x$area_sqkm <- area(x) / 1000000  # area() returns m^2 for lon/lat data; convert to km^2
This worked, but it's not a practical solution to my problem, given that there are many LGAs in Victoria and I plan to eventually follow the same process for Queensland and NSW.
However, loading a larger shapefile fails with the error "Error: memory exhausted (limit reached?)".
I've tried using readShapePoly, readOGR, st_read and read_sf to get the large shapefile into R, but none of them work; I think the file is just too large. I tried using a SELECT query within read_sf to reduce the amount of data read, but that didn't work either. I've read online that I should split the shapefile into just the data I need to reduce its size, but I have no idea how to do that.
Hope you can help.

The file is clearly too big for a single machine's memory. I think the options are then either:
1) split the file into smaller ones and process them one by one. See
https://gis.stackexchange.com/questions/195508/split-a-shapefile-into-smaller-files-on-linux-command-line
2) use a DBMS or data warehouse, which does this kind of batching automatically.
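A minimal sketch of option 1 that avoids splitting the file on disk: read the shapefile in blocks of features through an OGR SQL `query` passed to `sf::st_read()`, compute the areas for each block, and keep only an ID column plus the area. This assumes the sf package is installed, that the shapefile's feature IDs (FIDs) run from 0 to n - 1, and a hypothetical ID column name (`"PFI"` in the usage line); the chunk size is a guess you would tune to the available memory.

```r
# Build one OGR SQL query per block of feature IDs (FIDs).
# Assumes shapefile FIDs run from 0 to n_features - 1.
chunk_queries <- function(layer, n_features, chunk_size) {
  starts <- seq(0, n_features - 1, by = chunk_size)
  sprintf('SELECT * FROM "%s" WHERE FID >= %d AND FID < %d',
          layer, as.integer(starts), as.integer(starts + chunk_size))
}

# Read the layer one block at a time and keep only an ID and the area,
# so each block's geometries can be garbage-collected before the next.
area_by_chunks <- function(path, id_col, chunk_size = 100000) {
  info <- sf::st_layers(path)
  queries <- chunk_queries(info$name[1], info$features[1], chunk_size)
  pieces <- lapply(queries, function(q) {
    chunk <- sf::st_read(path, query = q, quiet = TRUE)
    data.frame(id = chunk[[id_col]],
               # st_area() returns m^2 (geodesic for lon/lat data)
               area_sqkm = as.numeric(sf::st_area(chunk)) / 1e6)
  })
  do.call(rbind, pieces)
}
```

Usage would look like `areas <- area_by_chunks("PROPERTY_PRIMARY_APPROVED.shp", id_col = "PFI")`, where `"PFI"` stands in for whatever unique ID field the layer really has. The result is a plain data frame, so it can be written out with `write.csv()` or joined back to the attributes by ID without ever holding all geometries in memory at once.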

Related

How do I mask a multi-layer netCDF or raster with a single-layer shapefile?

I am currently working with daily precipitation data in netCDF format. The data are at 4 km resolution and cover the United States. However, I want to mask/clip the data with a much higher-resolution shapefile for a particular geographical region (about the size of a county). Ultimately, I want the output to be daily precipitation data, either at that high resolution or the original 4 km resolution, for the much smaller area.
I've tried a couple of different methods, with the most success using the following code:
library(raster)
prcp_2000 <- raster::brick('pr_2000.nc')
shapefile <- shapefile("polygon_combined.shp")
shapefile <- spTransform(shapefile, crs(prcp_2000))
prcp_2000 <- mask(prcp_2000, shapefile)
prcp_2000 <- crop(prcp_2000, shapefile)
outfile <- paste("prcp_","2000_","CS",".nc",sep="")
writeRaster(prcp_2000, outfile, overwrite=TRUE, format="CDF", varname="prcp", varunit="mm/day", longname="mm of precipitation per day", xname="lon", yname="lat", zname="day", zunit="days since 1900-01-01")
However, the prcp output contains nothing but infinities and negative infinities, even though the dimension lengths look right (day=366, lat=24, lon=23). Am I missing something?
The problem is the starting shapefile, shapefile <- shapefile("polygon_combined.shp"). I loaded "polygon_combined.shp" into QGIS so I could easily plot a base layer under it. The image I've attached shows that the polygon starts off in the Gulf of Mexico. If it is in the wrong location to begin with, transforming it later isn't going to fix it.
I don't know the origin of the data, so I have no idea how it ended up this way. One possible fix is to get the data from a different source. The EPA makes an ecoregions product that has several levels of regions; I think you want the level IV data to get your region. You may still need to transform it, but it will be located in the right place to begin with. https://www.epa.gov/eco-research/ecoregion-download-files-state-region-5#pane-47
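One way to check this diagnosis in R: if the polygon covers no cells with data (for example, ocean cells that are NA in the precipitation grid), `mask()` leaves every cell NA, and the minimum and maximum of an all-NA layer come out as `Inf` and `-Inf`, which would be consistent with the infinities in the output. A minimal sketch with a hypothetical helper name:

```r
# Count valid cells and report the min/max of a layer's cell values.
# For an all-NA layer, min() and max() with na.rm = TRUE return Inf and -Inf.
masked_summary <- function(values) {
  c(n_valid = sum(!is.na(values)),
    min = suppressWarnings(min(values, na.rm = TRUE)),
    max = suppressWarnings(max(values, na.rm = TRUE)))
}

masked_summary(c(NA_real_, NA_real_))   # n_valid 0, min Inf, max -Inf
```

Applied to the first day after masking, e.g. `masked_summary(raster::getValues(prcp_2000[[1]]))`, an `n_valid` of 0 would mean the polygon does not overlap any cells that hold data.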

Raster increase in size when reprojected in R and QGIS

I'm using a Land cover raster of North America which is publicly available here: https://open.canada.ca/data/en/dataset/4e615eae-b90c-420b-adee-2ca35896caf6
I clipped it in R to cover Québec/Labrador:
library(raster)
veg <- raster("CanadaLandcover2015/CAN_LC_2015_CAL.tif")
e <- extent(c(1000000, 2700000, 500000, 2700000))#all qc
veg_qc <- crop(veg, e)
The raster is originally in projection EPSG:3978 NAD83 / Canada Atlas Lambert. I wanted it in latitude/longitude so I could extract the values at my data points.
veg_qc2 <- projectRaster(veg_qc,crs="+proj=longlat +ellps=WGS84 +no_defs")
That single line took ~12 hours to run and used over 200 GB of temporary files. Worse, there was a warning (sorry, I did not copy it) and only half of the raster showed.
So I tried the Warp function in QGIS instead. It worked perfectly, but the output raster was 16 GB! The original clipped raster was only 693 MB.
To make things worse, I need the cell values in R, so I used:
veg_qc <- getValues(veg_qc)
And I get the following error:
Error: cannot allocate vector of size 31.0 Gb
Why does the raster get bigger when reprojected?
Would there be a way to compress the raster or reproject without that giant increase of data?
How can I add the values to a big raster layer?
Ultimately, I could clip the raster further to the minimum convex polygon (MCP) of my data. I could also reproject my data and my other rasters to EPSG:3978 (although I wonder whether my other rasters might end up > 10 GB too).
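The second idea avoids reprojecting the raster at all: transform the (small) point data to EPSG:3978 and extract values from the original raster. A minimal sketch, assuming the sf and raster packages, that the points are WGS84 longitude/latitude (EPSG:4326), and hypothetical column names `lon` and `lat`:

```r
# Convert a data frame of lon/lat points to the raster's CRS
# (EPSG:3978, NAD83 / Canada Atlas Lambert) instead of reprojecting the raster.
points_to_raster_crs <- function(df, lon_col = "lon", lat_col = "lat",
                                 target_epsg = 3978) {
  pts <- sf::st_as_sf(df, coords = c(lon_col, lat_col), crs = 4326)
  sf::st_transform(pts, target_epsg)
}

# Extract raster values at the transformed points; the raster stays as is.
extract_at_points <- function(r, df, ...) {
  pts <- points_to_raster_crs(df, ...)
  raster::extract(r, sf::st_coordinates(pts))
}
```

With this, `extract_at_points(veg_qc, my_points)` (where `my_points` is a hypothetical data frame of observations) reads only the cells under the points, so neither a reprojected copy nor `getValues()` on the whole raster is needed.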
I had the same issue this week: after a reprojection, a raster grew from 40 MB to 350 MB. I found the answer in this question: Huge file size after averaging two rasters.
What I did in QGIS (not R this time) was open Raster > Conversion > Translate from the menu bar and, under Advanced parameters > Additional command-line parameters, add -co COMPRESS=LZW. This compresses the output using the LZW method. Also, according to this link and considering your dataset, you could use PACKBITS instead. I hope this helps.
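A back-of-the-envelope check of the "31.0 Gb" error: `getValues()` returns R doubles (8 bytes per cell), so memory is roughly columns x rows x 8. This sketch assumes a 30 m cell size for the 2015 land cover product and the crop extent from the question; the real cell counts may differ by a row or column depending on how `crop()` snaps to the grid.

```r
# Approximate size in bytes of a raster's cell values.
# getValues() returns an R double vector: 8 bytes per cell.
raster_bytes <- function(xrange_m, yrange_m, res_m, bytes_per_cell = 8) {
  ncol <- ceiling(xrange_m / res_m)
  nrow <- ceiling(yrange_m / res_m)
  ncol * nrow * bytes_per_cell
}

# Crop extent from the question (x 1,000,000-2,700,000; y 500,000-2,700,000 m)
as_double <- raster_bytes(2700000 - 1000000, 2700000 - 500000, 30)
as_int1u  <- raster_bytes(2700000 - 1000000, 2700000 - 500000, 30,
                          bytes_per_cell = 1)
as_double / 1024^3   # about 30.96 GiB, i.e. the 31.0 Gb in the error
as_int1u / 1024^3    # about 3.9 GiB if stored as unsigned bytes
```

The same arithmetic suggests why files grow on reprojection: `projectRaster()` defaults to bilinear interpolation, which turns integer class codes into floating-point values, and a lon/lat grid around a Lambert-projected area needs extra NA padding. For categorical land cover, `projectRaster(..., method = "ngb")` keeps the class codes, and `writeRaster(..., datatype = "INT1U", options = "COMPRESS=LZW")` stores them as compressed bytes, the R counterpart of the QGIS setting above.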

Mapping species occurrences in R

I'm quite a beginner at R, so I'm struggling with what I've found on Google about how to plot species occurrence data points in R (I know how in QGIS, but my supervisors want R) and then fill in the 10 km or 1 km grid squares where the species has occurred. The attached image (rosemarybeetlemap) shows what I mean; it was produced in DMap.
The main issue I have is that my CSV file of records only has alphanumeric Ordnance Survey grid references. Can R plot these directly, or do they need to be converted to eastings/northings or even decimal latitude/longitude? If so, how?
Any help is greatly appreciated!
Unlike plain eastings and northings (in metres), Ordnance Survey National Grid references (NGR) split Great Britain into lettered 100 km grid squares and then define locations within each lettered square. You probably know this, but you'll find some explanation in a beginner's guide to finding grid references.
For your main issue, the following article may be a solution for the first step: Converting (British) National Grid references.
I've quoted some of it below (credit to mikerspencer and Claudia Vitolo).
There’s no need to write a script from scratch to convert grid references, someone has done it already! There is some legwork to do in getting your NGR coordinates in a format ready for the conversion. The script that follows does just that, taking a csv file as your start and end point.
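The article's own script is not reproduced here. As an independent, minimal base-R sketch of the same first step, the standard OS National Grid letter arithmetic turns a reference such as "TQ 30 80" into the easting/northing (metres) of the south-west corner of its square plus the square's size, and a second helper snaps points to 10 km (or 1 km) squares for mapping occupied squares. The function names are hypothetical, and only two-letter Great Britain references are handled (not Irish grid references).

```r
# Convert one OS National Grid reference (e.g. "TQ 30 80") to the
# easting/northing (metres) of the SW corner of its square, plus the
# square's size in metres. Handles two-letter GB references only.
os_grid_to_en <- function(ref) {
  ref <- toupper(gsub("[[:space:]]", "", ref))
  # Letter indices with "I" skipped (A = 0, ..., H = 7, J = 8, ..., Z = 24)
  l1 <- utf8ToInt(substr(ref, 1, 1)) - utf8ToInt("A")
  l2 <- utf8ToInt(substr(ref, 2, 2)) - utf8ToInt("A")
  if (l1 > 7) l1 <- l1 - 1
  if (l2 > 7) l2 <- l2 - 1
  # 100 km square offsets from the false origin (SV = 0, 0)
  e100 <- ((l1 - 2) %% 5) * 5 + (l2 %% 5)
  n100 <- (19 - (l1 %/% 5) * 5) - (l2 %/% 5)
  digits <- substring(ref, 3)
  k <- nchar(digits) / 2            # digits per coordinate
  stopifnot(k == floor(k), k <= 5)
  size <- 10^(5 - k)                # e.g. 2 digits each -> 1 km squares
  e <- if (k > 0) as.numeric(substr(digits, 1, k)) else 0
  n <- if (k > 0) as.numeric(substr(digits, k + 1, 2 * k)) else 0
  c(easting = e100 * 1e5 + e * size,
    northing = n100 * 1e5 + n * size,
    size = size)
}

# Vectorised wrapper returning a data frame.
os_grid_to_df <- function(refs) {
  m <- t(vapply(refs, os_grid_to_en, numeric(3)))
  data.frame(ref = refs, m, row.names = NULL)
}

# Snap eastings/northings to the SW corners of the occupied grid squares.
occupied_squares <- function(easting, northing, size = 10000) {
  unique(data.frame(x0 = floor(easting / size) * size,
                    y0 = floor(northing / size) * size))
}
```

From there, `sf::st_as_sf(df, coords = c("easting", "northing"), crs = 27700)` gives points on the British National Grid (EPSG:27700), and on an existing base-graphics plot `rect(sq$x0, sq$y0, sq$x0 + 10000, sq$y0 + 10000, col = "grey")` fills the occupied 10 km squares returned by `occupied_squares()`.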

Adding Boundaries to Spatial Polygons Object

I have the following SpatialPolygonsDataFrame.
require(raster)
usa <- getData('GADM', country='USA', level=2)
metro <- subset(usa, NAME_1=="Nebraska" & NAME_2 %in% c("Dodge","Douglas","Sarpy","Washington"))
plot(metro)
I would like to be able to replicate the following map boundaries (defined by the colors):
Does anyone know a good plan of attack? I realize this is a somewhat manual process. I have already downloaded all US Census files that are of a more detailed geography. I was hoping that a more detailed level of geography could be aggregated to answer the above question, but unfortunately the districts do not line up.
Is there an R function already out there that would be helpful in assisting this manual process? At the very minimum, I would like to be able to leverage the perimeter of the 4-county area.
Use writeOGR from the rgdal package to create a shapefile of your metro object. Then install QGIS (http://www.qgis.org/), a free and open-source GIS, and load the shapefile as a new layer.
Then you can edit the layer, add new polygons, edit lines etc, then save as a shapefile to read back into R.
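A minimal sketch of the export step that also produces the 4-county perimeter, using sf in place of rgdal (a swapped-in package, not the one the answer names): `st_union()` dissolves the counties into one outline, and both layers go into a single GeoPackage that QGIS can open. The function name and output path are hypothetical.

```r
# Dissolve the counties into one outline and write both layers to a
# GeoPackage for editing in QGIS.
export_for_qgis <- function(counties, dsn = "metro.gpkg") {
  counties_sf <- sf::st_as_sf(counties)   # sp SpatialPolygonsDataFrame -> sf
  outline <- sf::st_sf(name = "metro outline",
                       geometry = sf::st_union(counties_sf))
  if (file.exists(dsn)) file.remove(dsn)  # start from a fresh file
  sf::st_write(counties_sf, dsn, layer = "counties", quiet = TRUE)
  sf::st_write(outline, dsn, layer = "outline", quiet = TRUE)
  invisible(outline)
}
```

Calling `export_for_qgis(metro)` writes `metro.gpkg`; after editing in QGIS, `sf::st_read("metro.gpkg", layer = "counties")` (or whichever layer you saved your new polygons to) brings the result back into R.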
Additionally, you may be able to "georeference" your image (by identifying known lat-long points on the image) and load that into QGIS as a raster layer. That makes it easier to digitise your new areas. All you need for that is a few lat-long coordinates of specific points, such as the corners of polygons or line intersections, and then QGIS has a georeferencing plugin that can do it.
I don't think you'll find any R code that is as good as QGIS for digitising new geometries over an image.
After half an hour (and twenty years' experience, not all of which you'll need) I've got this:
I didn't precisely digitise your new boundaries though, just roughly for speed. That QGIS screen cap shows the five coloured areas under the four metro areas.
Step one was georeferencing. This screengrab shows how the PNG has been georeferenced - the red line is the metro area shapefile drawn with transparency over the PNG after the PNG has been converted to a GeoTIFF by matching control points.
Step two was then using QGIS editing tools to split, join, and create new polygons. Then I just coloured them and added labelling to pretty it up.
I could probably bundle these files all up for you to neaten, but it really doesn't take that long and you'll learn a lot from doing it. Also, this is probably a gis.stackexchange.com question...

How to use R to read Excel, create tables, get formatted tables back into GIS with coordinate locations

I'm new to R - which will be obvious in a sec here...and I was hoping someone could point me to some packages to attempt to solve a specific problem:
I get Excel tables from scientists with analytical data for specific GIS point sample locations, we usually copy/paste these tables into the layout of GIS map documents; however quite often the line weights, fonts, etc get messed up...and the data gets updated/revised etc. - tedious copy/paste again...
I'd like to try to read these Excel files, have R create multiple formatted tables for each sample location, and plot these tables with real-world coordinates for use in GIS (ESRI or QGIS, etc.), where the tables would ideally show up offset some distance from the point sample locations in some sort of GIS file format.
I was thinking the export from R might be a .dwg, or even a raster geotiff with a transparent background...a format that would preserve formatting and position - not sure what the possibilities here could be...has anyone ever tried anything like this - I see several excel and geospatial packages, and understand that they can be used for regular geospatial data analysis, but in this case I'm trying to merge graphics (formatted tables from R) and GIS - which is something I'm having a hard time finding any info about.
Hopefully this question is not too vague...
edit - I have the sp package and am reading up on it. I guess I'm really stuck on the whole "make several tables with R > get those tables all at once into a format that GIS can read" step. Try this: imagine a georeferenced aerial photo, then imagine a layer of floating boxes on top of the aerial image, with these boxes placed by coordinates (i.e. lat/long, state plane feet, etc.). Can I make a layer like this with R and the geospatial packages?
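One possible approach, not taken from this thread: rather than exporting table graphics, build a point layer offset from each sample location that carries the formatted results as a multi-line text attribute, and let the GIS draw the boxes as labels. A minimal base-R sketch; the function name, the column names (`sample_id`, `x`, `y`, `analyte`, `value`, `unit`) and the 50-unit offset are all assumptions.

```r
# Build one offset label point per sample location, carrying that
# sample's results as a multi-line text attribute.
# samples: data.frame(sample_id, x, y)
# results: long table data.frame(sample_id, analyte, value, unit)
make_label_layer <- function(samples, results, dx = 50, dy = 50) {
  text <- vapply(samples$sample_id, function(id) {
    r <- results[results$sample_id == id, , drop = FALSE]
    paste(c(id, sprintf("%s: %s %s", r$analyte, r$value, r$unit)),
          collapse = "\n")
  }, character(1))
  data.frame(sample_id = samples$sample_id,
             x = samples$x + dx,   # offset in map units (e.g. feet)
             y = samples$y + dy,
             label = unname(text),
             stringsAsFactors = FALSE)
}
```

The Excel sheets could be read with `readxl::read_excel()`, and the result written with `sf::st_write(sf::st_as_sf(labels, coords = c("x", "y"), crs = epsg_code), "labels.gpkg")`, where `epsg_code` is a placeholder for your state plane or other EPSG code. Labelling that layer by its `label` field in QGIS, with a background shape switched on, gives the floating boxes, and re-running the script regenerates them when the data are revised.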
