Raster increases in size when reprojected in R and QGIS

I'm using a Land cover raster of North America which is publicly available here: https://open.canada.ca/data/en/dataset/4e615eae-b90c-420b-adee-2ca35896caf6
I clipped it in R to cover Québec/Labrador:
veg <- raster("CanadaLandcover2015/CAN_LC_2015_CAL.tif")
e <- extent(c(1000000, 2700000, 500000, 2700000))#all qc
veg_qc <- crop(veg, e)
The raster is originally in projection EPSG:3978 NAD83/Canada Atlas Lambert. I wanted it in latitude and longitude so that I could extract the values at my data points.
veg_qc2 <- projectRaster(veg_qc,crs="+proj=longlat +ellps=WGS84 +no_defs")
That single line took ~12 hours to run and used over 200 GB of temporary data. Worse, there was a warning (sorry, I did not copy it) and only half the raster showed.
So I decided to try the Warp function in QGIS. Although it worked perfectly, the output raster was 16 GB! The original clipped raster was only 693 MB.
To make things worse, I need the value layer to be included in R, so I used:
veg_qc <- getValues(veg_qc)
And I get the following error:
Error: cannot allocate vector of size 31.0 Gb
Why does the raster get bigger when reprojected?
Would there be a way to compress the raster or reproject without that giant increase of data?
How can I add the values to a big raster layer?
Ultimately, I could clip the raster further with the minimum convex polygon (MCP) of my data. I could also reproject my data and my other rasters to EPSG:3978 (although I am wondering if my other rasters may end up as > 10 GB too).
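As a rough check on the allocation error above, the memory needed can be estimated from the crop extent alone. This is a sketch under two assumptions that are not stated in the question: a 30 m cell size, and 8 bytes per value once the cells are held as an R numeric vector.

```r
# Assumption: 30 m cells. The extent is the one passed to crop() above.
cell_size <- 30
width_m   <- 2700000 - 1000000   # x range of the crop extent
height_m  <- 2700000 - 500000    # y range of the crop extent

n_cells <- (width_m / cell_size) * (height_m / cell_size)

# Assumption: values are stored as doubles, 8 bytes each.
bytes_needed <- n_cells * 8
gib_needed   <- bytes_needed / 1024^3

round(gib_needed, 1)   # about 31, in line with the error message
```

Under these assumptions, the file on disk is far smaller than the in-memory vector because the cells are stored in a compact, compressed form, while every cell costs 8 bytes once loaded.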

I had the same issue this week: after a reprojection, my raster went from 40 MB to 350 MB. I found the answer in the linked question "Huge file size after averaging two rasters".
What I did in QGIS (not R this time) was open the following function from the menu bar: Raster > Conversion > Translate, and in Advanced parameters > Additional command-line parameters, I added -co COMPRESS=LZW. This compresses the output using the LZW method. Also, according to this link and considering your dataset, you can use PACKBITS. I hope this helps you.
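The same creation option can also be passed from R. The following is a minimal sketch, assuming the raster package with GDAL support is installed. The small synthetic raster and the temporary file names are stand-ins, not the land cover data.

```r
library(raster)

# Stand-in categorical raster: 1000 x 1000 cells, five classes in broad bands.
r <- raster(nrows = 1000, ncols = 1000)
values(r) <- rep(1:5, each = 200000)

f_none <- tempfile(fileext = ".tif")
f_lzw  <- tempfile(fileext = ".tif")

# Same data and same 1-byte integer type, written without and with LZW.
writeRaster(r, f_none, format = "GTiff", datatype = "INT1U",
            options = "COMPRESS=NONE", overwrite = TRUE)
writeRaster(r, f_lzw, format = "GTiff", datatype = "INT1U",
            options = "COMPRESS=LZW", overwrite = TRUE)

# Compare the resulting file sizes in bytes.
file.size(f_none)
file.size(f_lzw)
```

For categorical data such as land cover, it may also help to call projectRaster() with method = "ngb", so that class codes are not interpolated into floating-point values, which take more space per cell.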

Related

How do I mask a multi-layer netCDF or raster with a single-layer shapefile?

I am currently working with daily precipitation data in netCDF format. The data's at a 4km resolution that covers the United States. However, I want to mask/clip the data with a much higher-resolution shapefile for a particular geographical region (about the size of a county). Ultimately, I want the output to be daily precipitation data, either at that high resolution or the original 4km resolution, for the much smaller area.
I've tried a couple different methods, with the most success using the following code:
prcp_2000 <- raster::brick('pr_2000.nc')
shapefile <- shapefile("polygon_combined.shp")
shapefile <- spTransform(shapefile, crs(prcp_2000))
prcp_2000 <- mask(prcp_2000, shapefile)
prcp_2000 <- crop(prcp_2000, shapefile)
outfile <- paste("prcp_","2000_","CS",".nc",sep="")
writeRaster(prcp_2000, outfile, overwrite=TRUE, format="CDF", varname="prcp", varunit="mm/day", longname="mm of precipitation per day", xname="lon", yname="lat", zname="day", zunit="days since 1900-01-01")
However, I keep getting nothing but infinities/negative infinities for prcp output, even though I'm still getting appropriate variable lengths otherwise (day=366, lat=24, lon=23). Am I missing something?
The problem is the starting shapefile, shapefile <- shapefile("polygon_combined.shp"). I imported "polygon_combined.shp" into QGIS to plot a base layer under it easily. The image I've attached shows that it starts off in the Gulf of Mexico. With a bad location to begin with, transforming it isn't going to save it later.
I don't know the origin of the data, so I have no idea how it was produced this way to start. One way to possibly fix this is get the data from a different source. The EPA makes an ecoregions product that has several levels of regions. I think you want the level IV data to get your region. You may still need to transform it, but it'll be located in the right place to begin with. https://www.epa.gov/eco-research/ecoregion-download-files-state-region-5#pane-47
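The infinities in the question are consistent with a mask that left no valid cells, although that is one plausible reading rather than a confirmed cause. The sketch below uses a stand-in vector in place of the real brick to show how base R behaves when every value is NA.

```r
# Stand-in for the cell values of one masked layer (24 x 23 cells):
# every cell is NA because the polygon lies over an area with no data.
masked_values <- rep(NA_real_, 24 * 23)

# With NAs removed there is nothing left to summarise,
# so base R falls back to Inf and -Inf (and warns).
suppressWarnings(range(masked_values, na.rm = TRUE))

# A cheap sanity check to run before writing the file.
all(is.na(masked_values))
```

Running a check like the last line on the masked layers before calling writeRaster() would show whether the polygon actually overlaps any cells that hold data.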

How to get raster file from a nested raster list produced by landscapemetrics package in R?

The landscapemetrics package can calculate the area of each patch in a given raster file, the shape of that patch, and so on. I want not only a tibble with the calculated patch metrics, but also a new raster in which each pixel within a specific patch has the value of that patch's area, shape indicator, and so on. We can do this with the function spatialize_lsm() (it produces a large nested list, probably with raster objects inside):
library(landscapemetrics)
plot(podlasie_ccilc) # this raster data is provided with package
podlasie.metrics.area <- spatialize_lsm(podlasie_ccilc, what = 'lsm_p_area') # creates a list
plot(podlasie.metrics.area) # produces an error...
How to get a desirable raster file with patch metrics from that list? I guess it is a question of raster package or something else, since landscapemetrics documentation tells nothing about this step.
I note that neither this data nor the new raster has a pixel resolution in meters (30, 30 for a Landsat satellite image, for example). So we cannot plot the new raster produced:
podlasie.metrics.area[[1]]
plot(podlasie.metrics.area[[1]])
So I guess landscapemetrics cannot deal with such rasters, we can even use its function to check a suitability of the prior raster for patch discovering:
check_landscape(podlasie_ccilc)
Update: I did it for the Landsat dataset with resolution 30, 30 and it produced a patch area raster, but again I cannot open, show, or save it as a raster because of the same error.
The package maintainer helped solve the problem (yes, it is just related to the structure of the list):
plot(podlasie.metrics.area[[1]]$lsm_p_area)
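The access pattern is easier to see on a stand-in object. The sketch below builds a plain nested list with the same two levels (one element per layer, one element per metric), using a matrix in place of the raster. The name layer_1 is an assumption made for illustration. Only lsm_p_area comes from the code above.

```r
# Stand-in for the result of spatialize_lsm(): outer list = layers,
# inner list = metrics. A matrix takes the place of the raster object.
result <- list(
  layer_1 = list(lsm_p_area = matrix(c(2.5, 2.5, 0.7, 0.7), nrow = 2))
)

names(result)               # names of the outer (layer) level
names(result[[1]])          # metric names inside the first layer
str(result, max.level = 2)  # overview of the nesting

# Two steps down reaches the single object that can be plotted or saved.
patch_area <- result[[1]]$lsm_p_area
class(patch_area)
```

On the real result, inspecting names() and str() in the same way shows which element holds the raster to pass on to plot() or to a function that writes it to file.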

How to calculate area of polygons from a large shapefile

Summary:
I'm trying to calculate the area of a large number of polygons in R. I've read a few posts about how I might do this (Example #1 & Example #2), but the problem I'm having is that my shapefile is too large (1.7 GB) to import. Given that I can't import the file, I can't calculate the area of the polygons.
Extended Explanation:
I'm actually trying to calculate the area of properties in Victoria, Australia. The polygons represent these properties. I downloaded the simplified models 1 and 2 of VicMaps from Spatial Datamart for all of Victoria.
However, given the size of the shapefiles, I had to narrow my search to just one local government area (LGA) and calculated the polygon areas (just for testing). The shapefile was 15.5MB.
library(raster)
x <- shapefile("D:/Downloads/SDM616230/ll_gda94/shape/lga_polygon/ballarat/VMPROP/PROPERTY_PRIMARY_APPROVED.shp")
crs(x)
x$area_sqkm <- area(x) / 1000000
This worked, but it's not a practical solution to my problem, given that there are many LGAs in Victoria and I plan to eventually follow the same process for Queensland and NSW.
However, trying to load a larger shapefile doesn't work and results in the error "Error: memory exhausted (limit reached?)".
I've tried using readShapePoly, readOGR, st_read and read_sf to get the large shapefile into R, but they don't work. I think the file is just too large. I tried using a select query within read_sf in an effort to reduce the size of the file I was reading, but that didn't work either. I've read online that I should split the shapefile into just the data I need to reduce the size, but I have no idea how to do that.
Hope you can help.
Obviously the file is too big for a single box. I think the options then are either
1) split the files into smaller ones, process one by one. See
https://gis.stackexchange.com/questions/195508/split-a-shapefile-into-smaller-files-on-linux-command-line
2) use a DBMS or data warehouse to do it; these handle such batching automatically.
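The first option can also be approximated without creating new files, by reading the shapefile in blocks of feature IDs. This is a sketch only. The helper name, the feature count and the chunk size are made up, and it assumes that the shapefile's FIDs run from 0 upward and that GDAL's OGR SQL dialect accepts FID in a WHERE clause.

```r
# Hypothetical helper: build one OGR SQL query per block of feature IDs.
# Assumes FIDs run from 0 to n_features - 1.
chunk_queries <- function(layer, n_features, chunk_size) {
  starts <- seq(0, n_features - 1, by = chunk_size)
  ends   <- pmin(starts + chunk_size, n_features)
  sprintf("SELECT * FROM \"%s\" WHERE FID >= %d AND FID < %d",
          layer, as.integer(starts), as.integer(ends))
}

# Placeholder feature count and chunk size, for illustration only.
queries <- chunk_queries("PROPERTY_PRIMARY_APPROVED",
                         n_features = 250000, chunk_size = 100000)
queries

# Each query could then be read and measured on its own, e.g. with sf
# (the path is a placeholder):
# part <- sf::st_read("path/to/PROPERTY_PRIMARY_APPROVED.shp", query = queries[1])
# part$area_sqkm <- as.numeric(sf::st_area(part)) / 1e6
```

Keeping only the ID and area columns from each block, and discarding the geometries before reading the next one, should keep memory use close to the size of a single block.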

writeRaster function in R is automatically setting (unwanted) maximum value, can I set the max value to null?

I am running into a problem with the "writeRaster" function in the raster package in R. I am importing a raster (TIF) that I made in ArcGIS (a distance to feature raster).
My goal was to resample the distance raster to the correct resolution and extent, then "mask" it with the appropriate raster to crop it to the shape I require. When I check the results of the mask with the basic plot function, everything looks great and I can see that each pixel in the new masked raster has a distance value.
However, when I write this raster to a file using the writeRaster function, the resulting raster looks like "swiss cheese" and has missing values for any distance over 35km. After much reading, I cannot find any documentation to suggest that there is a way to modify the maximum value set by writeRaster---or that it should even be setting a max value. I have included my code and the basic plots below. A big thank you to anyone who attempts to help me with this!
#Read in distance to fresh water raster
distFW <- raster("D:/Academia/Arc Data/Grackle/NicaCR_90mlayers/dist_FW.tif")
[plot(distFW)][1]
#Resample this layer to the desired resolution and template
NiCR_DistFW<-as.integer(resample(distFW,NiCRrast.tmpl,method="ngb"))
#essentially the same as the first plot
[plot(NiCR_DistFW)][2]
#Mask the resampled raster to the desired shape
NiCR.DistFW.mask.utm <- mask(NiCR_DistFW,NiCR_Mask) #with CA countries cut out.
[plot(NiCR.DistFW.mask.utm)][3]
#write raster to file (this is where things get weird)
writeRaster(x=NiCR.DistFW.mask.utm, filename='DistFWmask2.tif', format='GTiff', datatype='INT2S') #a way to ensure INT2S
#read the newly written raster file in to R so we can review it
dFW <-raster("DistFWMask2.tif")
[plot(dFW)_writeRaster_result][4]
[1]: https://i.stack.imgur.com/v9RkK.jpg
[2]: https://i.stack.imgur.com/v2DG3.jpg
[3]: https://i.stack.imgur.com/cCwJe.jpg
[4]: https://i.stack.imgur.com/MjWj7.jpg
As you can see from plot 4, an undesirable max value has been set. I want the raster I write to file to look like the one in plot 3, not plot 4.
Thanks in advance for any advice.
Well friends, after taking an hour to detail my question I managed to figure out the answer myself. It had to do with setting the datatype.
INT2S has a maximum value of 32,767.
By switching to INT4S, I capture the full range of values in my raster.
Problem solved!
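The limit follows directly from the number of bytes per value. A small sketch of the arithmetic is below. The helper names are hypothetical, and the 35000 example assumes the distances are stored in metres.

```r
# Largest value a signed integer of the given byte width can hold.
max_signed <- function(bytes) 2^(8 * bytes - 1) - 1

max_signed(2)  # INT2S: 32767
max_signed(4)  # INT4S: 2147483647

# Hypothetical helper: choose between the two types discussed above.
pick_int_type <- function(max_value) {
  if (max_value <= max_signed(2)) "INT2S" else "INT4S"
}

pick_int_type(32000)  # "INT2S"
pick_int_type(35000)  # "INT4S": 35 km in metres does not fit in INT2S
```

Checking the maximum of the raster against these limits before calling writeRaster() shows in advance which datatype is needed.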

cropped shapefile leads to different results in same extent

I cropped a shapefile to a smaller extent (my AOI - area of interest) - to work with a reduced working directory. During my workflow I rasterize the shapefile.
Here is my problem: I saved both of my shapefiles (smaller and bigger) to compare the rasterized results, which should be the same because the underlying raster has the same extent (my AOI) as the smaller shapefile.
But unfortunately they are not. The CRS and number of cells are identical, but the number of NAs, for example, is not.
I followed the same procedure and workflow with synthetic data and it worked perfectly, so the problem may be my data. Here is the Dropbox link where you can download the shapefile and the raster. https://www.dropbox.com/sh/btgt2rc7uzawtx5/AADJ2YrKOnPh8gM-PPF7rmIQa?dl=0
I leave you my code here:
#load shp files
setwd("C:/Users/.../R")
TESTshp<-readOGR(dsn="test_crop_dropbox", layer="boden_ebod_reproj")
extent(TESTshp)
setwd("C:/Users/.../R/test_crop_dropbox")
TESTraster<-raster("testraster.tif")
extent(TESTraster)
TESTshp_small <- crop(TESTshp, extent(TESTraster))
TESTrasterize<- rasterize(TESTshp, TESTraster, field="BodTyp_gen")
TESTrasterize_small<- rasterize(TESTshp_small, TESTraster, field="BodTyp_gen")
identical(TESTrasterize, TESTrasterize_small)
Do you have any suggestion what could be wrong?
Thanks a lot!
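One way to narrow this down is to locate the cells that differ, rather than relying on identical() alone. The sketch below uses two short made-up vectors as stand-ins for getValues() of the two rasterized layers.

```r
# Stand-ins for getValues() of the two rasterized layers (made-up values).
vals_full  <- c(1, 2, NA, 3, 3, NA, 1, 2)
vals_small <- c(1, 2, NA, 3, NA, NA, 1, 2)

# How many NAs each version has.
c(full = sum(is.na(vals_full)), small = sum(is.na(vals_small)))

# Cells that are NA in one version but not in the other.
na_mismatch <- which(is.na(vals_full) != is.na(vals_small))
na_mismatch

# Cells where both versions have a value but the values differ.
value_mismatch <- which(!is.na(vals_full) & !is.na(vals_small) &
                          vals_full != vals_small)
value_mismatch
```

With the raster package, the mismatching cell numbers can be passed to xyFromCell() to see where they fall, for example whether they sit along the edge where the shapefile was cropped.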
