Raster calculation with a condition on each cell across layers in R

I have a raster stack with several layers. For each cell I want to compute a sum over a cell-specific selection of layers and finally generate a new layer from the results. Does anyone have a good suggestion using calc, overlay, or some other raster calculation in R?
I can do the calculation with loops, but that consumes a lot of time when I have many layers, and also a lot of storage. My script is as follows:
library(raster)
make_calc <- function(rr, start, end) {
  rr <- as.array(rr)
  start <- as.array(start)
  end <- as.array(end)
  dms <- dim(rr)
  tmp <- array(NA, dim = dms[1:2])
  for (i in 1:dms[1]) {
    for (j in 1:dms[2]) {
      # sum this cell's values over its own start:end range of layers
      tmp[i, j] <- sum(rr[i, j, start[i, j, 1]:end[i, j, 1]], na.rm = TRUE)
    }
  }
  return(tmp)
}
rr <- raster(res = 10)
rr[] <- 1
rr <- stack(rr, rr, rr, rr)
start <- raster(res = 10)
start[] <- sample(1:2, ncell(start), replace = TRUE)
end <- raster(res = 10)
end[] <- sample(3:4, ncell(end), replace = TRUE)
result <- make_calc(rr, start, end)

Why are you coercing into arrays? You can easily collapse a raster into a vector, but that does not even seem necessary here. In the future, please try to be clearer about your expected outcome.
Based on your code, I really don't know what you are after, so I am going to take a few guesses: summing specified rasters in the stack, drawing a random sample across the rasters to be summed, and finally drawing a random sample of the cells to be summed.
For a sum on specified rasters in a stack, you can just index what you are after in the stack using a double bracket. In this case, rasters 1 and 3 in the stack would be the only ones summed.
library(raster)
rr <- raster(res = 10)
rr[] <- 1
rr <- stack(rr, rr, rr, rr)
( sum_1_3 <- calc(rr[[c(1,3)]], sum) )
If you want a random sample of the values across rasters for every cell, you can write a function that is passed to calc. Here is an example that grabs a random sample of size n across the raster layers' values and sums it.
rs.sum <- function(x, n=2) {sum( x[sample(1:length(x),n)], na.rm=TRUE)}
rs.sum.raster <- calc(rr, rs.sum)
If you want to apply a function to a limited random selection of cells, you can create a random sample of cells to use as an index. Here we draw a random sample of cells, create an empty raster, and assign it the sum of rasters 1 and 2 (in the stack) at the sampled cells. A raster in the stack is indexed using double brackets and its values using single brackets, so raster 1 in the stack, limited to the cells in the random sample, is rr[[1]][rs].
rs <- sample(1:ncell(rr[[1]]), 300)
r.sum <- rr[[1]]
r.sum[] <- NA
r.sum[rs] <- rr[[1]][rs] + rr[[2]][rs]
plot(r.sum)
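For reference, the start/end layer selection described in the question can also be done without indexing raster objects cell by cell: pull all cell values into a matrix once, then sum each row over its own layer range (a loop remains, but it only touches the matrix). A minimal sketch, reusing the question's rr, start, and end objects and assuming the whole stack fits in memory:
library(raster)
# Sketch: per-cell start:end layer sums via one values() extraction
# (assumes the full stack fits in memory)
layer_range_sum <- function(rr, start, end) {
  v <- values(rr)     # ncell x nlayers matrix of cell values
  s <- values(start)  # first layer to include, per cell
  e <- values(end)    # last layer to include, per cell
  out <- raster(rr)   # empty single-layer template with rr's geometry
  out[] <- vapply(seq_len(nrow(v)),
                  function(i) sum(v[i, s[i]:e[i]], na.rm = TRUE),
                  numeric(1))
  out
}
result <- layer_range_sum(rr, start, end)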

Related

Aggregate large raster to raster with lower resolution via mean (R)

I have a fairly large raster file with a resolution of 0.9 arcsec x 0.9 arcsec carrying values between 0 and 100 (and 255 for NA) called forest1. I want to aggregate this raster to the resolution of another raster (called dfr_2010_crop), which has a 0.5° x 0.5° resolution, using the mean of values. Unfortunately, the strategy I am currently using requires too much memory. Namely, I am using
TreeCover = rasterToPoints(forest1, fun=NULL, spatial=T)
TreeCoverPercent <- rasterize(TreeCover, dfr_2010_crop, fun=function(x,...) {sum(x, na.rm=T)/(4*10^6)}, field=g)
whereby g is the correct field I have saved before and 4*10^6 (2000 per side, squared) is the number of 0.9 arcsec x 0.9 arcsec cells in a 0.5° x 0.5° cell. R tells me that it cannot allocate a vector of the size of 7.9 GB after running the first line. I have tried to solve this problem in the following ways:
rasterOptions(maxmemory=1e+08)
And, after this did not work, I tried to work in blocks. I tried following the approach given here (https://strimas.com/post/processing-large-rasters-in-r/), where they use the calc() function when working in blocks. However, I failed to customize it to my setting, as I do not know how to address the blocks as rasters inside the loop. Here is my try:
canProcessInMemory(forest1, 1, TRUE)
# working in blocks
f_in <- f
f_out <- tempfile(fileext = ".tif")
# input and output rasters
r_in <- stack(f_in)
r_out <- raster(r_in)
# blocks
b <- blockSize(r_in)
print(b)
r_in <- readStart(r_in)
r_out <- writeStart(r_out, filename = f_out)
# loop over blocks
for (i in seq_along(b$row)) {
  # read values for block
  # format is a matrix with rows the cell values and columns the layers
  v <- getValues(r_in, row = b$row[i], nrows = b$nrows[i])
  # mean cell value across layers
  v <- rasterToPoints(v, fun = NULL, spatial = TRUE)
  # write to output file
  r_out <- writeValues(r_out, v, b$row[i])
}
# close files
r_out <- writeStop(r_out)
r_in <- readStop(r_in)
Looking forward to any suggestions and thanks for your help.
The ratio of your resolutions turns out to be an exact integer:
res1 = 0.9/(60*60) # resolution converted to degrees
res2 = 0.5
res.factor = res2 / res1
res.factor
# [1] 2000
You can double-check this with your actual rasters using res.factor = res(dfr_2010_crop) / res(forest1) - I can't do that because you did not provide a reproducible example.
This means that you can simply use raster::aggregate to change the resolution.
TreeCoverPercent = aggregate(forest1, res.factor)
If your res.factor were not a precise integer, you could still use this method by rounding it to the nearest integer and then resampling to the final desired resolution.
TreeCoverPercent = aggregate(forest1, round(res.factor))
TreeCoverPercent = resample(TreeCoverPercent, dfr_2010_crop)
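One more detail from the question is worth handling before aggregating: forest1 codes NA as 255, and those cells would otherwise be averaged in as real values. A short sketch, assuming 255 is the only NA code:
# Recode 255 to NA first (reclassify processes large rasters block-wise)
forest1 <- reclassify(forest1, cbind(255, NA))
TreeCoverPercent <- aggregate(forest1, fact = round(res.factor), fun = mean, na.rm = TRUE)
TreeCoverPercent <- resample(TreeCoverPercent, dfr_2010_crop)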

R function for creating discs around each point in a pattern, then counting number of points in each disc

I am attempting to create a disc for each point in a pattern; each disc will have the same radius. Then for each disc, I want to count the number of points falling within the disc. Each pattern has 100-400 points. I have written code to do this, but it is quite slow. The code is below. I cannot provide the shapefile and points as that would be very difficult, but I could create some dummy data if need be.
W <- as.owin(shape)
#Converts created .shp file into a "window"
#in which everything is plotted and calculated
SPDF <- SpatialPointsDataFrame(P[,1:2], P)
#Converts data frame to spatial points data frame
SP <- as(SPDF, "SpatialPoints") #Converts SPDF to spatial points
SP1 <- as.ppp(coordinates(SP), W)
SP2 <- as.ppp(SP1)
attr(SP1, "rejects")
attr(SP2, "rejects")
aw <- area.owin(W) #Area, in pixels squared, of leaf window created earlier
#awm <- aw * (meas)^2 * 100 #Area window in millimeters squared
# Trichome_Density_Count-----------------------------------------------------------------------------------------------
TC <- nrow(P) #Counts number of rows in XY data points file,
#this is number of trichomes from ImageJ
TD <- TC/awm #Trichome density, trichomes per mm^2
#SPDF2 <- as.SpatialPoints.ppp(SP2)
#kg <- knn.graph(SPDF2, k = 1)
#Creates the lines connecting each NND pairwise connection
#dfkg <- data.frame(kg) #Converts lines into a data frame
#dfkgl <- dfkg$length
meanlength <- 78
discstest <- discs(SP2, radii = meanlength,
                   separate = TRUE, mask = FALSE, trim = FALSE,
                   delta = NULL, npoly = NULL)
#Function creates discs for each trichome
#Using nearest neighbor lengths as radii
#NEED TO ADD CLIPPING
ratiolist <- c()
for (i in 1:length(discstest)) {
  ow2sp <- owin2SP(discstest[[i]])
  leafsp <- owin2SP(W)
  tic("gIntersection")
  intersect <- rgeos::gIntersection(ow2sp, leafsp)
  Sys.sleep(1)
  toc()
  tic("over")
  res <- as.data.frame(sp::over(SP, intersect, returnList = FALSE))
  Sys.sleep(1)
  toc()
  res[is.na(res)] <- 0
  newowin <- as.owin(intersect)
  circarea <- area.owin(newowin)
  trichactual <- sum(res)
  trichexpect <- (TC / aw) * circarea
  ratio <- trichactual / trichexpect
  ratiolist[[i]] <- ratio
}
If I understand you correctly, you want to loop through each point and check how many points fall within a disc of radius R centered at that point. This is done very efficiently in spatstat with the function closepaircounts:
closepaircounts(SP2, r = meanlength)
This simply returns a vector with the number of points contained in the disc of radius r for each point in SP2.
I have just tried this for 100,000 points, where each point on average had almost 3000 other points in the disc around it, and it took 8 seconds on my laptop. If you have many more points, or in particular if the disc radius is so big that each disc contains many more points, this calculation may become very slow.
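A self-contained sketch with dummy data, since the original pattern isn't available (the radius here is arbitrary):
library(spatstat)
set.seed(42)
X <- runifpoint(300)                    # dummy pattern in the unit square
counts <- closepaircounts(X, r = 0.05)  # points within distance 0.05 of each point
summary(counts)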

Reduce memory usage for mosaic on large list of rasters

I am using the mosaic function in the raster package to combine a long (11,000 files) list of rasters, using the approach suggested by @RobertH here.
rlist <- lapply(list_names, raster)  # build a list of RasterLayer objects from the file names
rlist$fun <- mean
rlist$na.rm <- TRUE
x <- do.call(mosaic, rlist)
As you might imagine, this eventually overruns my available memory (on several different machines and computing clusters). My question is: Is there a way to reduce the memory usage of either mosaic or do.call? I've tried altering maxmemory in rasterOptions(), but that does not seem to help. Processing the rasters in smaller batches seems problematic because the rasters may be spatially disjunct (i.e., sequential raster files may be located very far from each other). Thanks in advance for any help you can give.
Rather than loading all rasters into memory at once (in the mosaic() call), can you process them one at a time? That way, you have your mosaic that updates each time you bring one more raster into memory, but then you can get rid of the new raster and just keep the continuously updating mosaic raster.
Assuming that your rlist object is a list of rasters, I'm thinking of something like:
Pseudocode:
1. Initialize an updating_raster object as the first raster in the list.
2. Loop through each raster in the list, starting from the 2nd raster:
   - Read the ith raster into memory as next_raster.
   - Update updating_raster by overwriting it with the mosaic of itself and next_raster, using a weighted mean.
R code
Testing with the code in the mosaic() help file example...
First generate some rasters and use the standard mosaic method.
library(raster)
r <- raster(ncol=100, nrow=100)
r1 <- crop(r, extent(-10, 11, -10, 11))
r2 <- crop(r, extent(0, 20, 0, 20))
r3 <- crop(r, extent(9, 30, 9, 30))
r1[] <- 1:ncell(r1)
r2[] <- 1:ncell(r2)
r3[] <- 1:ncell(r3)
m1 <- mosaic(r1, r2, r3, fun=mean)
Put the rasters in a list so they are in a similar format as I think you have.
rlist <- list(r1, r2, r3)
Because of the NA handling of the weighted.mean() function, I opted to create the same effect by breaking down the summation and the division into distinct steps...
First initialize the summation raster:
updating_sum_raster <- rlist[[1]]
Then initialize the "counter" raster. This will represent the number of rasters that went into the mosaic at each pixel. It starts as 1 in all cells that aren't NA, and it will only increment for a given pixel when a non-NA value is added to the updating sum.
updating_counter_raster <- updating_sum_raster
updating_counter_raster[!is.na(updating_counter_raster)] <- 1
Here's the loop that doesn't require all rasters to be in memory at once. The counter raster for the raster being added to the mosaic has a value of 1 only in the cells that aren't NA. The counter is updated by summing the current counter raster and the updating counter raster. The total sum is updated by summing the current raster values and the updating raster values.
for (i in 2:length(rlist)) {
  next_sum_raster <- rlist[[i]]
  next_counter_raster <- next_sum_raster
  next_counter_raster[!is.na(next_counter_raster)] <- 1
  updating_sum_raster <- mosaic(x = updating_sum_raster, y = next_sum_raster, fun = sum)
  updating_counter_raster <- mosaic(updating_counter_raster, next_counter_raster, fun = sum)
}
m2 <- updating_sum_raster / updating_counter_raster
The values here seem to match the output of the mosaic() function:
identical(values(m1), values(m2))
# [1] TRUE
But the rasters themselves aren't identical:
identical(m1, m2)
# [1] FALSE
Not totally sure why, but maybe this gets you closer?
Perhaps compareRaster() is a better way to check:
compareRaster(m1, m2)
# [1] TRUE
Hooray!
Here's a plot!
plot(m1)
text(m1, digits = 2)
plot(m2)
text(m2, digits = 2)
A bit more digging in the weeds...
From the mosaic.R file:
It looks like the mosaic() function initializes a matrix called v to populate with the values from all the cells in all the rasters in the list. The number of rows in matrix v is the number of cells in the output raster (based on the full mosaicked extent and resolution), and the number of columns is the number of rasters to be mosaicked (11,000) in your case. Maybe you're running into the limits of matrix creation in R?
With a 1000 x 1000 raster (1e6 pixels) and 11,000 columns, the v matrix of NAs alone takes up 41 GB (1e6 rows x 11,000 columns x 4 bytes per logical NA). How big do you expect your final mosaicked raster to be?
r <- raster(ncol=1e3, nrow=1e3)
x <- 11000
v <- matrix(NA, nrow=ncell(r), ncol=x)
format(object.size(v), units = "GB")
[1] "41 Gb"

AdehabitatHR kerneloverlaphr

Hope someone can help. I have a large data set which includes 100 runs of random data for 10 animals. So far I have created an estUD by stacking each of the runs to get a summed utilization distribution. I would now like to compare each animal's home range using kerneloverlaphr. Unfortunately I get the error In vi * vj : longer object length is not a multiple of shorter object length. I think it is because the grids are not all the same. Is there a way to convert the grids so they are all the same, so that I can estimate overlap? I have run the first part of the code ten times, once for each animal; I'm sure this could be done in a loop too, but I am not sure how.
#Part1: generate 10 estUD's 1 per animal
bat.master <- read.csv("C:/Users/a6915409/Dropbox/Wallington GIS/bat.master")
xybat <- subset(bat.master, bat.master$id == "H1608b", select = x:loopno)
# change to spatial points
xy <- xybat[1:2]    # first two columns saved as coords
df <- xybat[-1:-3]  # remove unneeded columns for the UD
SPDF <- SpatialPointsDataFrame(coords = xy, data = df)  # combine df and xy
udHR <- kernelUD(SPDF, h = "href", same4all = TRUE, kern = "bivnorm")
## I would proceed using the raster packages
library(raster)
ud <- stack(lapply(udHR, raster))
## take the sum
plot(udm <- sum(ud))
H1608b <- udHR[[1]]
H1608b@grid <- as(udm, "GridTopology")  # @ accesses the S4 grid slot
# Part 2:
#combine all Ud's into one dataset
liud <- list(Y2889a, Y2889a, Y2850a, Y2850b, H1670a, H1670b, H1659a, H1659b,H1608a, H1608b)
class(liud) <- "estUDm"
image(liud)  # plot all estUDs
Over <- kerneloverlaphr(liud, method = "UDOI", percent = 90)
error: In vi * vj : longer object length is not a multiple of shorter object length
You need to estimate the kernelUD for all animals in a single call, using the argument same4all = TRUE, so that every UD is evaluated on the same grid. That will eliminate the problems in the overlap calculations.
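A minimal sketch of that idea; the grids differ because each animal's UD came from a separate kernelUD call, and same4all = TRUE only forces a common grid within one call. The all_bats object below is hypothetical, standing for a SpatialPointsDataFrame of all ten animals' points whose single column id identifies the animal:
library(adehabitatHR)
# hypothetical: all animals' points in one SpatialPointsDataFrame,
# with a single column "id" identifying the animal
udAll <- kernelUD(all_bats[, "id"], h = "href", same4all = TRUE, kern = "bivnorm")
Over <- kerneloverlaphr(udAll, method = "UDOI", percent = 90)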

R filter raster using focal() with threshold - defining correct function

I am having trouble defining the right function for my task. For filtering an image, I have set up a filter matrix eucdis with
library(rgdal)
library(raster)
# 11 x 11 binary disc filter built from Euclidean distances
eucdis <- matrix(NaN, nrow = 11, ncol = 11)
for (i in -5:5) {
  for (j in -5:5) {
    # Euclidean distance of each cell from the centre of the moving matrix
    eucdis[i + 6, j + 6] <- 2 * sqrt(i^2 + j^2)
  }
}
# threshold after the loop (inside it, the comparisons would hit the
# still-NaN cells and the subscripted assignments would fail)
eucdis[6, 6] <- 1         # keep the centre cell
eucdis[eucdis > 10] <- 0  # zero out cells beyond the disc radius
eucdis[eucdis > 0] <- 1   # make the remaining weights binary
Using the example raster
f <- system.file("external/test.grd", package="raster")
f
r <- raster(f)
I want to flag all cells of that raster where a certain value, say 200, occurs in at least 10% (= 8 cells) of the moving eucdis filter window:
s <- focal(x = r, w = eucdis, fun = function(w) {if (length(w[w == 1]) >= 8) {s = 1} else {s = 0}})
But this only gives me all cells where the eucdis filter window has at least 8 pixels with any value of r. If I add the constraint about r[r >= 200] it does not work as I thought it would; the second constraint is not taken into account.
s <- focal(x = r, w = eucdis, fun = function(w, x) {
  if (length(w[w == 1]) >= 8 | x[x >= 200]) {s = 1} else {s = 0}
})
# I also tried & and &&
If anyone can help me, please do. I have spent days on this already and can't figure it out myself.
Thank you,
Anne
The function passed to focal doesn't refer to the weights matrix. Rather, it refers to the cells of r that fall within the moving window (and these cells' relative contribution to the function's return value is controlled by the weights matrix). So, where you have used function(w) {if (length(w[w==1])>=8) 1 else 0}, you're actually saying that you want to return 1 if the focal subset of r has at least 8 cells with value equal to 1 (and return 0 otherwise).
One way to achieve your aim is to perform a focal sum on a binary raster that has been thresholded at 200. The function that you would apply to the moving window would be sum, and the output of this focal sum would indicate the number of cells of your thresholded raster that have value 1 (this corresponds to the number of cells of r that have value >= 200 within the moving window).
library(raster)
r <- raster(system.file("external/test.grd", package="raster"))
m <- matrix(2 * pointDistance(expand.grid(-5:5, -5:5), c(0, 0), lonlat = FALSE),
            ncol = 11, nrow = 11)
m <- m <= 10
r2 <- focal(r >= 200, m, sum, na.rm=TRUE, pad=TRUE)
plot(r2)
You can then check which cells of that raster have value >= 8.
r3 <- r2 >= 8
plot(r3)
In this case, almost all the cells meet your criteria.
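For completeness, the two steps can be chained into a single expression, with the same m and r as above:
# one-step version: flag cells with >= 8 window cells over the 200 threshold
r3 <- focal(r >= 200, m, sum, na.rm = TRUE, pad = TRUE) >= 8
plot(r3)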
