Aggregate large raster to raster with lower resolution via mean (R)

I have a fairly large raster file with a resolution of 0.9 arcsec x 0.9 arcsec carrying values between 0 and 100 (and 255 for NA) called forest1. I want to aggregate this raster to the resolution of another raster (called dfr_2010_crop) which has a 0.5° x 0.5° resolution using the mean of values. Unfortunately, the strategy I am currently using requires too much memory. Namely, I am using
TreeCover = rasterToPoints(forest1, fun=NULL, spatial=T)
TreeCoverPercent <- rasterize(TreeCover, dfr_2010_crop, fun=function(x,...) {sum(x, na.rm=T)/(4*10^6)}, field=g)
whereby g is the correct field I saved beforehand, and 4*10^6 (i.e. 2000 x 2000) is the number of 0.9 arcsec x 0.9 arcsec cells in a 0.5° x 0.5° cell. R tells me that it cannot allocate a vector of size 7.9 GB after running the first line. I have tried to solve this problem in the following ways:
rasterOptions(maxmemory=1e+08)
After this did not work, I tried working in blocks, following the approach given at https://strimas.com/post/processing-large-rasters-in-r/, which uses the calc() function when working in blocks. However, I failed to customise it to my setting, as I do not know how to address the blocks as rasters inside the loop. Here is my try:
canProcessInMemory(forest1, 1, TRUE)
# working in blocks
f_in <- f  # f is the path to the input file
f_out <- tempfile(fileext = ".tif")
# input and output rasters
r_in <- stack(f_in)
r_out <- raster(r_in)
# blocks
b <- blockSize(r_in)
print(b)
r_in <- readStart(r_in)
r_out <- writeStart(r_out, filename = f_out)
# loop over blocks
for (i in seq_along(b$row)) {
  # read values for block
  # format is a matrix with rows the cell values and columns the layers
  v <- getValues(r_in, row = b$row[i], nrows = b$nrows[i])
  # this is the step I cannot get right: v is a matrix, not a Raster*,
  # so rasterToPoints() fails here
  v <- rasterToPoints(v, fun = NULL, spatial = TRUE)
  # write to output file
  r_out <- writeValues(r_out, v, b$row[i])
}
# close files
r_out <- writeStop(r_out)
r_in <- readStop(r_in)
Looking forward to any suggestions and thanks for your help.

The ratio of your resolutions turns out to be an exact integer:
res1 = 0.9/(60*60) # resolution converted to degrees
res2 = 0.5
res.factor = res2 / res1
res.factor
# [1] 2000
You can double-check this with your actual rasters using res.factor = res(dfr_2010_crop) / res(forest1); I can't verify it here because you did not provide a reproducible example.
This means that you can simply use raster::aggregate to change the resolution.
TreeCoverPercent = aggregate(forest1, res.factor)
If your res.factor is not exactly an integer, you can still use this method: round to the nearest integer, then resample to the final desired resolution.
TreeCoverPercent = aggregate(forest1, round(res.factor))
TreeCoverPercent = resample(TreeCoverPercent, dfr_2010_crop)
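One more detail worth handling: the question notes that 255 encodes NA in forest1. Declaring that before aggregating ensures the mean is taken over valid cells only. A minimal sketch, assuming forest1 is file-backed (NAvalue() takes effect when values are read from file; for an in-memory raster you could instead assign NA to the 255 cells directly):
NAvalue(forest1) <- 255  # treat 255 as NA when reading from file
TreeCoverPercent <- aggregate(forest1, res.factor, fun=mean, na.rm=TRUE)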

Related

raster calculation with condition of each cell by layers in R

I have a stacked raster dataset with several layers. I want to calculate, for each cell, the sum over a cell-specific selection of layers, and finally generate a new layer. Does anyone have a good suggestion using calc, overlay, or some other raster calculation in R?
I can do it with loops, but that takes a long time when I have many layers, and it also uses a lot of storage. My script is as follows:
library(raster)
make_calc <- function(rr, start, end) {
  rr <- as.array(rr)
  start <- as.array(start)
  end <- as.array(end)
  dms <- dim(rr)
  tmp <- array(NA, dim = dms[1:2])
  for (i in 1:dms[1]) {
    for (j in 1:dms[2]) {
      tmp[i, j] <- sum(rr[i, j, start[i, j, 1]:end[i, j, 1]], na.rm = TRUE)
    }
  }
  return(tmp)
}
rr <- raster(res = 10)
rr[] <- 1
rr <- stack(rr, rr, rr, rr)
start <- raster(res = 10)
start[] <- sample(1:2, ncell(start), replace = TRUE)
end <- raster(res = 10)
end[] <- sample(3:4, ncell(end), replace = TRUE)
result <- make_calc(rr, start, end)
Why are you coercing into arrays? You can easily collapse a raster into a vector, but that does not even seem necessary here. In the future, please try to be clearer about your expected outcome.
Based on your code, I am not certain what you are after, so I am going to take a few guesses: summing specified rasters in the stack, drawing a random sample across the rasters to be summed, and finally drawing a random sample of cells to be summed.
For a sum on specified rasters in a stack, you can just index what you are after in the stack using a double bracket. In this case, rasters 1 and 3 in the stack would be the only ones summed.
library(raster)
rr <- raster(res = 10)
rr[] <- 1
rr <- stack(rr, rr, rr, rr)
( sum_1_3 <- calc(rr[[c(1,3)]], sum) )
If you want a random sample of the values across rasters, for every cell, you can write a function that is passed to calc. Here is an example that draws a random sample of size n across the raster layer values and sums them.
rs.sum <- function(x, n=2) {sum( x[sample(1:length(x),n)], na.rm=TRUE)}
rs.sum.raster <- calc(rr, rs.sum)
If you want to apply a function to a limited random selection of cells, you can create a random sample of cells and use it as an index. Here we draw a random sample of cells, create an empty raster, and assign the sum of rasters 1 and 2 (in the stack) at the sampled cells. A raster in the stack is indexed with double brackets and raster values with single brackets, so for raster 1 in the stack, limited to the cells in the random sample, you would use: rr[[1]][rs]
rs <- sample(1:ncell(rr[[1]]), 300)
r.sum <- rr[[1]]
r.sum[] <- NA
r.sum[rs] <- rr[[1]][rs] + rr[[2]][rs]
plot(r.sum)
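Coming back to the original question's per-cell start:end sum, one way to avoid the explicit double loop is to stack the data layers together with the start and end rasters and let calc() see all values of a cell as one vector. A sketch under the question's setup (4 data layers, start/end holding valid layer indices); this is my own variant, not part of the answer above:
library(raster)
nl <- nlayers(rr)
s <- stack(rr, start, end)
cond_sum <- calc(s, fun = function(x) {
  # x holds the nl layer values followed by the start and end indices
  from <- x[nl + 1]
  to <- x[nl + 2]
  if (is.na(from) || is.na(to)) return(NA)
  sum(x[from:to], na.rm = TRUE)
})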

Reduce memory usage for mosaic on large list of rasters

I am using the mosaic function in the raster package to combine a long list (11,000 files) of rasters using the approach suggested by @RobertH here.
rlist <- lapply(list_names, raster)  # one RasterLayer per file name
rlist$fun <- mean
rlist$na.rm <- TRUE
x <- do.call(mosaic, rlist)
As you might imagine, this eventually overruns my available memory (on several different machines and computing clusters). My question is: Is there a way to reduce the memory usage of either mosaic or do.call? I've tried altering maxmemory in rasterOptions(), but that does not seem to help. Processing the rasters in smaller batches seems problematic because the rasters may be spatially disjunct (i.e., sequential raster files may be located very far from each other). Thanks in advance for any help you can give.
Rather than loading all rasters into memory at once (in the mosaic() call), can you process them one at a time? That way, you have your mosaic that updates each time you bring one more raster into memory, but then you can get rid of the new raster and just keep the continuously updating mosaic raster.
Assuming that your rlist object is a list of rasters, I'm thinking of something like:
Pseudocode
Initialize an updating_raster object as the first raster in the list
Loop through each raster in the list in turn, starting from the 2nd raster
Read the ith raster into memory called next_raster
Update the updating_raster object by overwriting it with the mosaic of itself and the next raster using a weighted mean
R code
Testing with the code in the mosaic() help file example...
First generate some rasters and use the standard mosaic method.
library(raster)
r <- raster(ncol=100, nrow=100)
r1 <- crop(r, extent(-10, 11, -10, 11))
r2 <- crop(r, extent(0, 20, 0, 20))
r3 <- crop(r, extent(9, 30, 9, 30))
r1[] <- 1:ncell(r1)
r2[] <- 1:ncell(r2)
r3[] <- 1:ncell(r3)
m1 <- mosaic(r1, r2, r3, fun=mean)
Put the rasters in a list so they are in a similar format as I think you have.
rlist <- list(r1, r2, r3)
Because of the NA handling of the weighted.mean() function, I opted to create the same effect by breaking down the summation and the division into distinct steps...
First initialize the summation raster:
updating_sum_raster <- rlist[[1]]
Then initialize the "counter" raster. This will represent the number of rasters that went into mosaicking at each pixel. It starts as a 1 in all cells that aren't NA. It should properly handle NAs such that it only will increment for a given pixel if a non-NA value was added to the updating sum.
updating_counter_raster <- updating_sum_raster
updating_counter_raster[!is.na(updating_counter_raster)] <- 1
Here's the loop that doesn't require all rasters to be in memory at once. The counter raster for the raster being added to the mosaic has a value of 1 only in the cells that aren't NA. The counter is updated by summing the current counter raster and the updating counter raster. The total sum is updated by summing the current raster values and the updating raster values.
for (i in 2:length(rlist)) {
next_sum_raster <- rlist[[i]]
next_counter_raster <- next_sum_raster
next_counter_raster[!is.na(next_counter_raster)] <- 1
updating_sum_raster <- mosaic(x = updating_sum_raster, y = next_sum_raster, fun = sum)
updating_counter_raster <- mosaic(updating_counter_raster, next_counter_raster, fun = sum)
}
m2 <- updating_sum_raster / updating_counter_raster
The values here seem to match the use of the mosaic() function
identical(values(m1), values(m2))
> TRUE
But the rasters themselves aren't identical:
identical(m1, m2)
> FALSE
Not totally sure why, but maybe this gets you closer?
Perhaps compareRaster() is a better way to check:
compareRaster(m1, m2)
> TRUE
Hooray!
Here's a plot!
plot(m1)
text(m1, digits = 2)
plot(m2)
text(m2, digits = 2)
A bit more digging in the weeds...
From the mosaic.R file:
It looks like the mosaic() function initializes a matrix called v to populate with the values from all the cells in all the rasters in the list. The number of rows in matrix v is the number of cells in the output raster (based on the full mosaicked extent and resolution), and the number of columns is the number of rasters to be mosaicked (11,000) in your case. Maybe you're running into the limits of matrix creation in R?
With a 1000 x 1000 raster (1e6 pixels) and 11,000 columns, the v matrix of NAs takes up 41 GB. How big do you expect your final mosaicked raster to be?
r <- raster(ncol=1e3, nrow=1e3)
x <- 11000
v <- matrix(NA, nrow=ncell(r), ncol=x)
format(object.size(v), units = "GB")
[1] "41 Gb"

iteratively search and reclass pixels with smallest value in a raster

I need to create a function that will:
search for the pixel in a raster containing the smallest value;
in the first iteration, assign a value of 1 to all pixels within a radius equal to the raster cell size (2.5 km), including the pixel with the smallest value;
in the second iteration, select the pixel with the next smallest value (excluding pixels selected in step ii), search the same radius, and assign these pixels a value of 2. This continues until there are no pixels left (if there are no free pixels within the radius, the selection stops).
Sounds complex but hopefully possible? Here is an example of my raster:
xy <- matrix(rnorm(900, 40, 200), 30, 30)
image(xy)
rast <- raster(xy)
# Give it lat/lon coords for 36-37°E, 3-2°S
extent(rast) <- c(36,37,-3,-2)
Perhaps you can use the approach below. I would not try this on very large rasters (it would take forever), but for your example it works fine, provided you do not have to repeat it too many times.
library(raster)
set.seed(0)
xy <- matrix(rnorm(900, 40, 200),30 , 30)
r <- raster(xy)
extent(r) <- c(36,37,-3,-2)
rorig <- r
x <- r
i <- 1
while (TRUE) {
  # cell with min value
  m <- which.min(x)
  ## are there ties, and do you care about them? Do tied cells get the same level?
  ## If not, you can do
  ## m <- m[1]
  ## or sample
  ## m <- sample(m, 1)
  # focal cell and its four adjacent cells
  a <- adjacent(r, m, 4, FALSE, include=TRUE)
  # exclude those that have already been assigned
  w <- which(!is.na(x[a]))
  a <- a[w]
  # assign the value
  r[a] <- i
  # set assigned cells to NA
  x[a] <- NA
  # stop when done
  if (is.na(maxValue(x))) break
  i <- i + 1
}
plot(r)
plot(rorig, r)
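If the selection should also include the diagonal neighbours, the directions argument of adjacent() can be raised from 4 to 8; a small variant, not part of the original answer:
a <- adjacent(r, m, directions = 8, pairs = FALSE, include = TRUE)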

AdehabitatHR kerneloverlaphr

Hope someone can help. I have a large data set which includes 100 runs of random data for 10 animals. So far I have created an estUD by stacking each of the runs to get a summed utilisation distribution. I would now like to compare each animal's home range using kerneloverlaphr. Unfortunately, I get the error In vi * vj : longer object length is not a multiple of shorter object length. I think it is because the grids are not all the same. Is there a way to convert the grids so that they are all the same, so that I can estimate overlap? I have run the first part of the code ten times, once for each animal. I'm sure this could be done in a loop too, but I am not sure how.
#Part1: generate 10 estUD's 1 per animal
bat.master <- read.csv("C:/Users/a6915409/Dropbox/Wallington GIS/bat.master")
xybat <- subset(bat.master, bat.master$id == "H1608b",select=x:loopno )
#change to spatial points
xy <- xybat[1:2]  # first two columns saved as coords
df <- xybat[-1:-3]  # remove unneeded columns for the UD
SPDF <- SpatialPointsDataFrame(coords=xy, data=df)#combine df and xy
udHR <- kernelUD(SPDF, h = "href", same4all = TRUE, kern = "bivnorm")
## I would proceed using the raster packages
library(raster)
ud <- stack(lapply(udHR, raster))
## take the sum
plot(udm <- sum(ud))
H1608b <- udHR[[1]]
H1608b@grid <- as(udm, "GridTopology")
# Part 2:
#combine all Ud's into one dataset
liud <- list(Y2889a, Y2889b, Y2850a, Y2850b, H1670a, H1670b, H1659a, H1659b, H1608a, H1608b)
class(liud) <- "estUDm"
image(liud)#plot all est ud's
Over<-kerneloverlaphr(liud, method="UDOI", percent= 90)
error: In vi * vj : longer object length is not a multiple of shorter object length
You need to estimate the kernelUD for all animals together, using the argument same4all=TRUE in a single call, so that all UDs are computed on the same grid. That eliminates the problems in the overlap calculations.
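Concretely, a minimal sketch based on the question's data (the x, y, and id columns come from the question; the 100-run stacking step is omitted here):
library(sp)
library(adehabitatHR)
xy <- bat.master[, c("x", "y")]
id <- bat.master["id"]  # one-column data frame identifying the animal
spdf <- SpatialPointsDataFrame(coords = xy, data = id)
# one UD per animal, all computed on the same grid
udHR <- kernelUD(spdf, h = "href", same4all = TRUE, kern = "bivnorm")
Over <- kerneloverlaphr(udHR, method = "UDOI", percent = 90)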

R filter raster using focal() with threshold - defining correct function

I am having trouble defining the right function for my task. For filtering an image, I have set up a filter matrix eucdis as follows:
library(rgdal)
library(raster)
refm <- matrix(1, nrow=11, ncol=11)
M <- dim(refm)[1]
N <- dim(refm)[2]
eucdis <- matrix(NaN, nrow=11, ncol=11)
for (i in -5:5){
  for (j in -5:5){
    # euclidean distance of each cell of the moving window from the centre, times two
    eucdis[i+6, j+6] <- 2 * sqrt(abs(0-i)^2 + abs(0-j)^2)
  }
}
# centre cell counts as part of the window
eucdis[6,6] <- 1
# keep only cells within distance 10, coded as 1
eucdis[eucdis > 10] <- 0
eucdis[eucdis > 0] <- 1
Using the example raster
f <- system.file("external/test.grd", package="raster")
f
r <- raster(f)
I want to flag all cells of that raster for which at least 10% (= 8 cells) of the moving eucdis filter window have a certain value, say 200 or more.
s=focal(x=r,w=eucdis,fun=function(w) {if (length(w[w==1])>=8) {s=1} else {s=0}})
But this only gives me all locations where the moving window has at least 8 pixels with any value of r. If I add the constraint r[r>=200], it does not work as I thought it would; the second constraint is not taken into account.
s=focal(x=r,w=eucdis,fun=function(w,x) {
if (length(w[w==1])>=8 | x[x>=200]){s=1} else {s=0}})
# I also tried & and &&
Can anyone help me, please? I have spent days on this and can't figure it out myself.
Thank you,
Anne
The function passed to focal doesn't refer to the weights matrix. Rather, it refers to the cells of r that fall within the moving window (and these cells' relative contribution to the function's return value is controlled by the weights matrix). So, where you have used function(w) {if (length(w[w==1])>=8) 1 else 0}, you're actually saying that you want to return 1 if the focal subset of r has at least 8 cells with value equal to 1 (and return 0 otherwise).
One way to achieve your aim is to perform a focal sum on a binary raster that has been thresholded at 200. The function applied to the moving window would be sum, and the output of the focal sum indicates the number of cells of your thresholded raster that have value 1 (this corresponds to the number of cells of r with value >= 200 within the moving window).
library(raster)
r <- raster(system.file("external/test.grd", package="raster"))
m <- matrix(2 * pointDistance(expand.grid(-5:5, -5:5), c(0, 0), lonlat=FALSE),
ncol=11, nrow=11)
m <- m <= 10
r2 <- focal(r >= 200, m, sum, na.rm=TRUE, pad=TRUE)
plot(r2)
You can then check which cells of that raster have value >= 8.
r3 <- r2 >= 8
plot(r3)
In this case, almost all the cells meet your criteria.
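If you want the final binary map in a single step, the focal sum and the threshold can be chained; this is just the two statements above combined:
r3 <- focal(r >= 200, m, sum, na.rm = TRUE, pad = TRUE) >= 8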
