Create polygon from large raster in R

I have a large raster file (5GB) containing only 1's and NA's. I would like to convert this into a multipolygon of the areas with 1's, with adjacent cells dissolved into one polygon.
I have imported the file to R using
r = raster::raster(my_filename)
r
class : RasterLayer
dimensions : 17452, 45000, 785340000 (nrow, ncol, ncell)
resolution : 0.008, 0.008 (x, y)
extent : -180, 180, -55.9875, 83.6285 (xmin, xmax, ymin, ymax)
crs : NA
source : C://...binary_X01_januarysnow.asc
names : binary_X01_januarysnow
I have tried several methods to create the polygon:
rasterToPolygons from the raster package with the dissolve=TRUE option (R crashes),
isoband from the isoband package (R crashes).
Both approaches worked as expected when I tried them on a subset of my raster covering approximately the area of Spain, so I assume the problem is only the size of the data and not my code.
Then I tried reading my raster with read_stars and using stars::st_as_sf(st, as_points = FALSE, merge = TRUE, connect8 = TRUE). This returned an empty polygon, possibly because the file was read as a stars proxy object, but I'm not sure; I couldn't find any information about that online.
Then I forced the raster to be read as a stars object rather than a stars proxy object by using read_stars(my_filename, proxy=FALSE) and tried the st_as_sf command as above, but got the message "Error: cannot allocate vector of size 2.9 Gb"
I know that in the worst case I can probably just decrease the raster resolution, and therefore its size, and will then be able to create the polygons I want (but with less precise edges), but I was wondering if anyone has another suggestion I could try? Both the 1's and the NA's are located in large continuous areas, so it would be enough to have high resolution at the edges, if that helps.
PS This is my very first question on StackOverflow so I apologize if my problem is not clearly described. I don't know how to provide a reproducible example of a large dataset.

What you are looking for is as.polygons() from the terra package, the raster package's successor. terra handles large data sets better than raster does.
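A minimal sketch of that suggestion on a toy raster (not the 5 GB file), assuming the terra package is installed. The 4 x 4 grid, its extent and its values are made up for illustration; as.polygons() is called with its defaults, which merge cells of equal value and drop NA cells.

```r
library(terra)

# Toy 4 x 4 raster with two separate patches of 1's; everything else is NA.
# The dimensions, extent and values are illustrative assumptions.
r <- rast(nrows = 4, ncols = 4, xmin = 0, xmax = 4, ymin = 0, ymax = 4)
values(r) <- c(1,  1,  NA, NA,
               1,  1,  NA, NA,
               NA, NA, NA, NA,
               NA, NA, NA, 1)

# With the defaults, cells sharing a value are merged and NA cells are dropped,
# so all the 1's end up in a single (multi-part) polygon feature.
p <- as.polygons(r)

# Split the multi-part feature into one feature per contiguous patch.
parts <- disagg(p)

nrow(p)
nrow(parts)
```

For the file in the question, the same call would be applied to rast(my_filename). Whether that fits in memory at this size is not something the toy example can show.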

Related

Using R to obtain slope raster from DEM GRID raster

After some extensive googling, I wasn't able to find my answer (this is the first time I couldn't surmount an issue by looking at others' questions/answers). I am new to asking questions, so forgive any missteps.
I am attempting to perform what ArcGIS or QGIS performs with the slope tool, just within R. To do so I have been importing a raster that I exported from ArcGIS in GRID format with the following characteristics:
class : RasterLayer
dimensions : 821, 581, 477001 (nrow, ncol, ncell)
resolution : 4.996121, 4.996121 (x, y)
extent : 2832147, 2835049, 14234048, 14238150 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=34.75 +lon_0=-118.583333333333 +k=0.9999 +x_0=800000.000000001 +y_0=3999999.99999999 +datum=NAD83 +units=us-ft +no_defs
source : rr_2020_shell
names : rr_2020_shell
values : 5623.253, 6401.356 (min, max)
It is already projected in the correct coordinate system (EPSG: 3423) but when I go to find the slope using the following code:
RR_2020_Slope = terrain(RR_2020_St1_Raster,'slope', units = 'degrees', neighbors = 8, filename = 'RR_2020_Slope.grd', overwrite = T)
The result is a slope raster that ranges from 0 to 1.28°, which is very different from what I have calculated in ArcGIS using the slope tool. Using the same DEM raster in the same projection in ArcGIS I used the slope tool with an input of 'Degree' for the output measurement, 'Planar' for the method, and 1 for Z factor and my resulting slope raster ranges from 0.001 to 73.396°.
Overall I am wondering where my mistake in R originates: is it an elevation resolution problem? Are there issues with my projection? Forgive me, I can't include the data as they are sensitive materials, but perhaps there is a clear and obvious mistake in my approach or in my assumptions about the functions I have used?
The only red flag I see is that you say "it is already projected in the correct coordinate system". Projecting raster data degrades the quality. As cell values get smoothed, the slopes will get smaller. This may be particularly pronounced if the relief is at the scale of the cell size (e.g. sand dunes vs mountain chains). Have you compared with what you get with the original data?
Another source of error could be that the units of the values are different from the units of the coordinate reference system. But it would appear that in your case both are in feet.
Can you also try this with terra::terrain()?
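One further thing that may be worth checking, offered as an assumption rather than a diagnosis: the raster package documents the argument of terrain() as unit (singular), with 'radians' as its default, whereas the call in the question passes units. If that argument was not matched, the output would be in radians. The base R arithmetic below shows that the two reported maxima are consistent with that reading.

```r
# Maximum slope values reported in the question.
max_r      <- 1.28     # maximum from terrain() in R
max_arcgis <- 73.396   # maximum from the ArcGIS slope tool, in degrees

# Interpret the R value as radians and convert it to degrees.
max_r_deg <- max_r * 180 / pi
round(max_r_deg, 2)        # 73.34

# Convert the ArcGIS maximum to radians for the reverse comparison.
max_arcgis_rad <- max_arcgis * pi / 180
round(max_arcgis_rad, 2)   # 1.28
```

If this is the cause, calling terrain() with unit = 'degrees' should bring the two results into line.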

Improving computational speed of Zonal Statistics on 150gb+ of raster tiles in R (terra package and vrt)

I'm working in R - I have a directory of raster (.tif) tiles at 60cm resolution, downloaded from Google Earth Engine (NAIP 2018 NDVI). (I am running my analysis on pc rather than in Google Earth Engine due to human subjects requirements of my polygon data.) The 52 tiles are each 1.2-3.8GB in size. I also have 982 polygons, for which I'm trying to calculate the zonal means from these rasters. My code (below) uses the terra package, and instead of mosaic-ing the tiles into a very large singular raster, I've chosen to create a VRT (virtual raster) file.
I am running this code on a Xeon Gold 6134 @ 3.2 GHz and have 128GB of RAM. No matter what I set my terraOptions() to, R doesn't even come close to using a significant proportion of my processor or RAM potential.
With this code, all 982 polygons will take 11.8 days to run. I would GREATLY appreciate it if anyone could point me at specific tricks/tools that I may not have already tried (e.g., I've tried working with all the terraOptions, I've tried the raster package and the exactextractr package. The exact_extract() function won't work for me as I am using a SpatRaster/VRT and an sf polygon object as inputs - again to avoid mosaic-ing a very large singular raster.)
Thank you. (I apologize that I cannot share data, as it's either too large or human subjects related...) Here is the un-looped code:
Edit: 52 tiles of 1.2-3.8GB EACH. My original quote of 150GB total directory size was incorrect as this was the compressed size in ArcGIS.
library(terra)
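tile_dir <- "path/to directory of raster tiles" # renamed from c to avoid masking base::c()
v <- "path/new.vrt" # name of virtual raster
ras_lst <- list.files(tile_dir, full.names = TRUE, pattern = "\\.tif$") # escape the dot so only .tif files match
terra::vrt(ras_lst, v, overwrite = TRUE)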
ras <- rast(v)
w <- vect("path to polygon shapefile")
w2 <- terra::project(w, terra::crs(ras)) # transform proj to same as raster tiles
e2 <- terra::extract(ras, w2, fun="mean")
e2 # zonal mean value for 1 polygon (of 982)
show(ras) produces:
class : SpatRaster
dimensions : 212556, 247793, 1 (nrow, ncol, nlyr)
resolution : 5.389892e-06, 5.389892e-06 (x, y)
extent : -118.964, -117.6284, 33.68984, 34.8355 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs
source : naip2018mos.vrt
name : naip2018mos

Plotting white for zero value in R

I have a rasterstack
class : RasterStack
dimensions : 47, 89, 4183, 6 (nrow, ncol, ncell, nlayers)
resolution : 0.5, 0.5 (x, y)
extent : 60.75, 105.25, 15.75, 39.25 (xmin, xmax, ymin, ymax)
crs : NA
names : VegC.1, layer.1, VegC.2, layer.2, VegC.3, layer.3
min values : 0.00000, -11.52596, 0.00000, -11.51896, 0.00000, -11.49996
max values : 21.14100, 16.52118, 18.85200, 16.69225, 23.08900, 20.25300
I plot the stack with the same color scheme. However, I want the 0 value to be white. I use the color scheme
cols <- inlmisc::GetColors(scheme = "BuRd",n = 256)
But the plot shows blue on the zero value. Is there an easy way for fixing color scale for raster stacks?
I answer based on the following assumptions:
You are using the raster package
You are using the plot() function of the raster package to generate the plot
(next time, please indicate both the packages and the functions you are using, and even better, provide code with a reproducible example --it doesn't have to be based on your data, just a small dataset that you can use to show your problem)
Most of the time your (very natural!) wish of mapping the zero value to the middle of the palette --even if the data are not symmetric around zero-- is fulfilled by using a parameter in the plotting function. See for instance this post on the pheatmap() function where the breaks parameter is used:
Set 0-point for pheatmap in R
That said, from the documentation of the plot() function of the raster package we see that it accepts a ... parameter which can receive any parameter that is accepted by the image.plot() function of the fields package, as indicated in the Details section of the plot() function:
Details
Most of the code for the plot function for a single Raster* object was taken from image.plot (fields package).
In the documentation of the image.plot() function of the fields package we read:
breaks Break points in sorted order to indicate the intervals for assigning the colors.
Note that if there are nlevel colors there should be (nlevel+1) breakpoints. If
breaks is not specified (nlevel+1) equally spaced breaks are created where the
first and last bin have their midpoints at the minimum and maximum values in z
or at zlim
You can read the answer for Set 0-point for pheatmap in R on how to set the breaks so that the 0 value is mapped to the middle of the color palette you chose (i.e. white).
Note: I cannot give you a working example because I don't have the raster package installed.
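A minimal base R sketch of the breaks idea, under stated assumptions: the value range is copied from the RasterStack summary in the question, a blue-white-red ramp from colorRampPalette() stands in for the BuRd scheme, and the commented-out plot() call at the end (with the stack called s) is hypothetical and has not been run against the real data.

```r
# Value range across the layers, taken from the RasterStack summary in the question.
zmin <- -11.52596
zmax <-  23.08900

# An odd number of colours so that exactly one colour sits in the middle.
n <- 255
cols <- colorRampPalette(c("blue", "white", "red"))(n)  # stand-in for the BuRd scheme

# Breaks symmetric around zero: n colours need n + 1 break points.
m <- max(abs(c(zmin, zmax)))
brks <- seq(-m, m, length.out = n + 1)

# Which colour bin does zero fall into, and what colour is it?
zero_bin <- findInterval(0, brks)
zero_bin
cols[zero_bin]

# Hypothetical use with the stack from the question (here called s), not run:
# plot(s, col = cols, breaks = brks)
```

Because the breaks are symmetric, the negative side of the palette is only partly used when the data are not symmetric around zero. That is the price of pinning zero to the middle colour.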

random sampling from a large raster using clusterR

I need to take a 5% random sample from a very large raster and return a new raster. I am trying to use sampleRandom from the raster package, but the process is very slow (I only have 8GB RAM on my machine, running 64-bit R). The raster has been cropped/masked to match an irregularly shaped study area boundary, as well - so has NA values in the rectangular extent around the polygon boundary and some internal NA values - I'm trying to sample only from the non-NA values. I've tried both sampling 5% and reversing that to sampling 95% - both ran for >2 hours without producing a result, at which point I terminated the process.
I am trying to speed it up by running it in parallel using the clusterR command, but I'm new to both the sampleRandom command and to using clusterR. My code runs, but I get all of the non-NA pixels returned, so the sampling doesn't seem to be working. Is this a problem with my code, or is it that sampleRandom can't run with clusterR?
Here is a description of my raster layer:
conv.mod
class : RasterLayer
dimensions : 23828, 19095, 454995660 (nrow, ncol, ncell)
resolution : 56, 56 (x, y)
extent : -1220192, -150872, 87580, 1421948 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=aea +lat_1=44.75 +lat_2=55.75 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
data source : C:\GIS\carbon_cows\Intact\conv_mod.tif
names : conv_mod
values : 1, 1 (min, max)
And here is the code I have tried:
library(raster)
library(parallel)
tot<-cellStats(conv.mod,'sum', na.rm=TRUE) #get the total pixels in conv.mod
sampsize<-tot * 0.05 #calculate how many pixels would represent 5%
removeTmpFiles() #clear some memory
numcores<-detectCores() -1
start<-Sys.time()
beginCluster(numcores)
cl<-getCluster()
clusterExport(cl,"sampsize", envir = .GlobalEnv)
conv.perc <- clusterR(conv.mod,sampleRandom,args=list(size=sampsize,na.rm=TRUE,asRaster=TRUE))
endCluster()
end<-Sys.time()
difftime(end,start)
Here are the total non-NA cells in the original raster layer:
tot<-cellStats(conv.mod,'sum', na.rm=TRUE)
tot
105193858
and the number that should be a 5% sample:
sampsize<-tot * 0.05
sampsize
5259693
But, the resulting raster has the same number of non-NA pixels as the original raster:
tot_convperc<-cellStats(conv.perc,'sum',na.rm=T)
tot_convperc
105193858
I've also tried reversing the sample size calculation and running sampleRandom, so that I'm requesting a 95% sample. But, I get the same result.
I'd appreciate any help in understanding why this code is not running as expected. Thanks!
Never mind. I was able to take advantage of this post: https://gis.stackexchange.com/questions/17255/random-sampling-of-raster-using-r and the reply by whuber.
The following code solved my problem, without the use of a cluster:
col.conv <- ncol(conv.mod)
row.conv<-nrow(conv.mod)
r<-conv.mod
start<-Sys.time()
r[runif(col.conv*row.conv) >= 0.95] <- NA # Randomly *unselect* 5% of the data
end<-Sys.time()
difftime(end,start)
That code ran in ~3 minutes, as opposed to over an hour for putting the sampleRandom code in the clusterR command. I still wonder why sampleRandom was not actually taking a sample and also why this new code is so much more efficient, but I'm happy to have the problem solved.
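A small base R check of the thresholding idea, on a plain vector standing in for the raster cells (the vector length and the seed are arbitrary assumptions). As written above, runif(n) >= 0.95 is TRUE for about 5% of cells, so that line sets roughly 5% of cells to NA and keeps roughly 95%. If the goal is to keep a sample of about 5%, the threshold would be flipped:

```r
set.seed(1)            # arbitrary seed, for reproducibility only

n <- 100000            # stand-in for ncell(conv.mod)
cells <- rep(1, n)     # all cells valid in this toy example

u <- runif(n)

# Variant from the answer above: blanks out ~5% of cells, keeps ~95%.
drop5 <- cells
drop5[u >= 0.95] <- NA

# Flipped threshold: blanks out ~95% of cells, keeps ~5%.
keep5 <- cells
keep5[u >= 0.05] <- NA

mean(!is.na(drop5))    # close to 0.95
mean(!is.na(keep5))    # close to 0.05
```

Each cell is kept or dropped independently, so the sample size is approximately, not exactly, the requested fraction.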

Resolution values for Rasters in R

I was just wondering if anyone has a conversion table for the resolution of rasters in R. I would like to know what numbers like these refer to in meters:
resolution : 0.08333333, 0.08333333
resolution : 0.009398496, 0.009398496
resolution : 0.002349, 0.002349 = 250m (I think)
I would really like to know what resolution to set a raster object to make cell sizes 1km^2. I am using rasters that span the country of Australia.
Thanks in advance everyone.
Cheers,
Adam
It all depends on the units of your raster, and that depends on the projection. Rasters might not even be square grids in metres - they might be square in degrees which aren't square in metres!
1 degree of longitude at the equator is 1/360 of the earth's circumference. Near the north pole 1 degree of longitude is a much smaller distance, and at the pole it's pretty much zero. Degrees of latitude, however, are nearly constant in length.
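As a rough illustration of the point above, the following base R sketch converts a cell size in degrees to approximate metres on a spherical Earth. The radius of 6371 km and the helper name deg_to_m are assumptions for this sketch, and the latitude of -25 is only a stand-in for central Australia; a proper answer still depends on the projection.

```r
earth_radius_m <- 6371000                    # mean radius, spherical approximation
m_per_deg <- 2 * pi * earth_radius_m / 360   # metres per degree along a great circle

# Hypothetical helper: approximate cell width (x) and height (y) in metres
# for a resolution given in degrees at a given latitude.
deg_to_m <- function(res_deg, lat_deg) {
  c(x = res_deg * m_per_deg * cos(lat_deg * pi / 180),  # east-west shrinks with latitude
    y = res_deg * m_per_deg)                            # north-south is nearly constant
}

# The three resolutions from the question, evaluated at latitude -25.
res_deg <- c(0.08333333, 0.009398496, 0.002349)
round(sapply(res_deg, deg_to_m, lat_deg = -25))
```

By the same approximation, a cell about 1 km tall corresponds to 1000 / m_per_deg, or roughly 0.009 degrees of latitude.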
You could take the corner points of your raster, convert them to lat-long coordinates if not already, and then work out the great-circle distance between them (there's an rdist function somewhere that does this I recall). However this won't work if your raster spans the whole globe, since then your NW corner and your NE corner are at the same point... Ummm. Anyway, the answer is... 42.
If you want to make 1km rasters of Australia then.... you need a coordinate system of Australia in kilometres. In the UK we have a system called the OSGB National Grid, which is close enough to a metric grid. Australia might be trickier because it is slightly bigger than the UK... So Australia seems to have a few grid systems. See here:
http://www.spatialreference.org/ref/?search=AGD84
So you might want to use the system that is in the middle of the country to avoid the worst distortions, then work out the bounds of Australia in lat-long, convert to epsg:20353 and create a raster based on that:
In lat-long I reckon Australia is roughly:
> xtll
[,1] [,2]
[1,] 112.5162 -43.906900
[2,] 155.8425 -7.585619
make this into a SpatialPoints object:
> xtll=SpatialPoints(xtll,CRS("+init=epsg:4326"))
convert to that AGD84 in the middle of the country:
> spTransform(xtll,CRS("+init=epsg:20353"))
SpatialPoints:
coords.x1 coords.x2
[1,] -1306200 4886041
[2,] 2849956 9103124
Make a raster extent object rounded to km:
> ext = extent(-1306000,2850000,4886000,9103000)
How many rows and columns do we need?
> length(-1306:2850)
[1] 4157
> length(4886:9103)
[1] 4218
Create a raster:
> r = raster(ext,ncol=4156,nrow=4217,crs="+init=epsg:20353")
> r
class : RasterLayer
dimensions : 4217, 4156, 17525852 (nrow, ncol, ncell)
resolution : 1000, 1000 (x, y)
extent : -1306000, 2850000, 4886000, 9103000 (xmin, xmax, ymin, ymax)
coord. ref. : +init=epsg:20353 +proj=utm +zone=53 +south +ellps=aust_SA +units=m +no_defs
values : none
Note that the ncol and nrow values are one less than the values from the bounds - this would be a fencepost error to put those values in.
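The fencepost point can be checked with plain arithmetic in base R, using the extent and the 1000 m cell size from the transcript above. The number of cells is the extent width divided by the cell size, which is one less than the number of kilometre marks along the edge. The variable names are only for this sketch.

```r
# Extent (in metres) and cell size from the transcript above.
xmin <- -1306000; xmax <- 2850000
ymin <-  4886000; ymax <- 9103000
cell <- 1000

# Number of cells = width / cell size.
ncol_cells <- (xmax - xmin) / cell
nrow_cells <- (ymax - ymin) / cell

# Number of kilometre marks (cell edges) is one more than the number of cells.
n_x_edges <- length(seq(xmin, xmax, by = cell))
n_y_edges <- length(seq(ymin, ymax, by = cell))

c(ncol = ncol_cells, x_edges = n_x_edges)   # 4156 and 4157
c(nrow = nrow_cells, y_edges = n_y_edges)   # 4217 and 4218
```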
See how my resolution is 1000? This is a 1km grid. The problem is that this is possibly going to be a bit distorted on the coasts. You could work out how distorted by converting to lat-long (epsg:4326), then to the proper AGD zone for points on the coast, and seeing how different they are. They might be very close, except for an offset.
Anyway, nuff said.
Finding the meta-data that gives meaning to your raster can be a bit of a challenge. I have spent lots of time hunting for this. If the raster was published by a government agency, then I would hope that this information is posted somewhere prominently.
The good news is that once you know the projection used on the various rasters, you can convert them to a common projection using projectRaster() in the raster package. You need to find the proj.4 string describing the original and the desired projections in each case. You can get this from: http://www.spatialreference.org.
When you know your projection, then the resolution information you seek will have meaning.
