Speeding up a loop over rasters - r

I have a big dataset with 30000 rasters. My goal is to extract the mean value within a polygon located inside each raster, and to build a file with the extracted values and the dates taken from the raster filenames.
I succeeded in doing this by performing the following loop:
library(raster)
ext <- numeric(length(rasters2014))
for (i in 1:length(rasters2014)) {
  a <- raster(rasters2014[i])
  ext[i] <- as.vector(extract(a, poligon2, fun = mean, na.rm = TRUE, df = FALSE))
}
output2 <- data.frame(ext, filename = filename2014)
The problem is that the loop above takes about 2.5 hours to complete. Does anyone have an idea how I could speed up this process?

If your rasters are all properly aligned (same ncol, nrow, extent, origin, resolution), you could try identifying the "cell numbers" to be extracted from the first file and then extracting based on those. This can speed up the processing because raster does not need to work out which cells to extract for every file. Something like this:
rast1 <- raster(rasters2014[1])
cells <- extract(rast1, poligon2, cellnumbers = TRUE, df = TRUE)[, "cells"]
ext <- list()
for (i in 1:length(rasters2014)) {
  a <- raster(rasters2014[i])
  # extracting by cell numbers skips the polygon overlay; take the mean afterwards
  ext[[i]] <- mean(extract(a, cells), na.rm = TRUE)
}
Note that I am also using a list to store the results to avoid "growing" a vector, which is usually wasteful.
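If you need the same output format as in your question afterwards, a quick sketch (assuming filename2014 as in the question) would be:
# collapse the list of per-raster means back into the original data frame
output2 <- data.frame(ext = unlist(ext), filename = filename2014)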
Alternatively, as suggested by @qdread, you could build a rasterStack using raster::stack(rasters2014, quick = TRUE) and call extract over the stack to avoid the for loop. Don't know which would be faster.
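For reference, a rough sketch of that stack-based route (untested, assuming all files are aligned and poligon2 as above) could be:
# build one stack and extract the polygon mean for every layer in a single call
s <- raster::stack(rasters2014, quick = TRUE)
ext_stack <- extract(s, poligon2, fun = mean, na.rm = TRUE, df = TRUE)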
HTH

If your polygons do not overlap (and in most cases they don't), an alternative route is
library(raster)
# rasterize the polygons onto the grid of the first raster to get zone IDs
x <- rasterize(poligon2, raster(rasters2014[1]))
# stack all files (quick = TRUE skips the alignment checks)
s <- raster::stack(rasters2014, quick = TRUE)
# mean cell value per zone (polygon) for every layer
z <- zonal(s, x, "mean")
PS: Faster is nicer, but I would suggest getting lunch while this runs.

Thanks for your help! I've tried all of the proposed solutions and the computation time is generally the same regardless of the method applied. I therefore guess that it is just not possible to speed up this process significantly.

Related

What is the fastest way to compute the correlation between a vector and a matrix in R?

I am trying to find a fast way to calculate the correlation between a vector of values and a matrix. I have a data frame with 200 rows and 400,000 columns after transposing the data. I need to find the correlation between each column and every other column.
My code is below, but it is too slow. Can anyone come up with a faster way?
for (i in 1:400000) {
  x <- cor(trainDataNew[, i], trainDataNew[, -i])
}
You don't need my data to do this. You can create random data like below.
norm1 <- rnorm(1000)
norm2 <- rnorm(1000)
norm3 <- rnorm(1000)
trainDataNew <- as.data.frame(cbind(norm1, norm2, norm3))
What's wrong with
cc <- cor(trainDataNew)
?
If you only want the lower triangle you can then use
cc2 <- cc[lower.tri(cc,diag=FALSE)]
This blog post claims to have done a similar-sized (slightly smaller) problem in about a minute. Their approach is implemented in HiClimR::fastCor.
library(HiClimR)
# dd is your data matrix (rows = observations, columns = variables)
system.time(cc <- fastCor(dd, nSplit = 10,
                          upperTri = TRUE, verbose = TRUE,
                          optBLAS = TRUE))
I haven't gotten this working yet (keep running out of memory), but you may have better luck. You should also look into linking R to an optimized BLAS, e.g. see here for MacOS.
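As a quick way to see which BLAS/LAPACK your R is currently linked against (base R, no extra packages; works in R >= 3.4):
# sessionInfo() records the BLAS and LAPACK libraries in use
si <- sessionInfo()
si$BLAS
si$LAPACK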
Someone here reports a parallelized version (code is here, along with some forked versions)
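Not the linked code, just a rough illustration of how a parallelized column-by-column version could look (mclapply assumes a Unix-alike; the core count is arbitrary):
library(parallel)
# correlation of each column against all the others, split over 4 cores
cor_list <- mclapply(seq_len(ncol(trainDataNew)),
                     function(i) cor(trainDataNew[, i], trainDataNew[, -i]),
                     mc.cores = 4)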

Merge rasters of different extents, sum overlapping cell values in R

I am trying to merge rasterized polylines which have differing extents, in order to create a single surface indicating the number of times cells overlap.
Due to computational constraints (given the size of my study area), I am unable to use extend and then stack for each raster (total count = 67).
I have come across the merge function in R, and this allows me to merge rasters together into one surface. It doesn't, however, seem to like me inserting a function to compute the sum of overlapping cells.
Maybe I'm missing something obvious, or this is a limitation of the merge function. Any advice on how to generate this output, avoiding extend & stack would be greatly appreciated!
Code:
# read in specific route rasters
raster_list <- list.files('Data/Raw/tracks/rasterized/', full.names = TRUE)
for (i in 1:length(raster_list)) {
  # get file name
  file_name <- raster_list[i]
  # read raster in
  road_rast_i <- raster(file_name)
  if (i == 1) {
    combined_raster <- road_rast_i
  } else {
    # merge rasters and calc overlap
    combined_raster <- merge(combined_raster, road_rast_i,
                             fun = function(x, y){sum(x@data@values, y@data@values)})
  }
}
Image of current output:
Image of a single route (example):
Image of fix:
Solved. There's a mosaic function, which allows the following:
combined_raster <- mosaic(combined_raster, road_rast_i, fun = sum)
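If you prefer to combine all routes in one call rather than accumulating inside the loop, a do.call over the list of rasters is a common pattern (a sketch, assuming raster_list as above):
# read all rasters, then mosaic them in one go with sum as the overlap function
rlist <- lapply(raster_list, raster)
rlist$fun <- sum
combined_raster <- do.call(mosaic, rlist)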

Make distinction between inner and outer NA's in a raster in R

In R, how could I make the distinction between inner and outer NA's in a raster of a shape that has NA's both around it and inside it?
In the example below, how could I, for example, select only the NA's outside the R logo (i.e., how can I make everything inside the circle of the logo appear as white)?
library(raster)
r <- raster(system.file("external/rlogo.grd", package="raster"))
r[r>240] = NA
par(mfrow=c(1,2))
plot(r, main='r')
plot(is.na(r), main="is.na(r)")
You don't really have many options; this type of analysis usually requires more elaborate methods. Here, however, is a simple workaround using the clump function:
#your inital code
library(raster)
r <- raster(system.file("external/rlogo.grd", package="raster"))
rna <- rc <- r
rna[r>240] = NA
par(mfrow=c(2,2))
#reclass values <=240 to NA (needed for the clump function;
#here, NAs are used to separate clumps)
rc[r<=240] <- NA
rc <- clump(rc)
#what you get after applying the clump function
#are homogenous areas that are separated by NAs.
#Let's reclassify all areas with an ID > 8.
#In this case, these are the areas inside the ring.
rc_reclass <- rc
rc_reclass[rc_reclass>8] <- 100
#let's do some plotting
plot(r, main='r')
plot(is.na(rna), main="is.na(r)")
plot(rc, main="clumps")
plot(rc_reclass, main="clumps reclass")
I agree with @maRtin, it's a bit tricky. Not only do you not have a dedicated NoData value, but the image is also a bit smudgy.
Nevertheless, I think I found a way which is a bit better than clump, which uses the spatial domain to separate the areas:
First, I'm getting the focal values of the pixel neighbourhoods:
#make copy
r2 <- r
# focal values
fv <- getValuesFocal(r2,ngb = c(3,3))
Then I exclude all pixels whose neighbourhood mean value is greater than 242.8. This was purely trial and error, but it gives a good result.
ix <- rowMeans(fv,na.rm = T) > 242.8
r2[ix] <- NA
You could actually already deem this acceptable. The only problem is that there is a small border of pixels around the data area that should also be NA.
So somehow I need to get rid of those remaining pixels. I try to do this with an iterative exclusion: in every iteration, I look for pixels that still have NA values in their neighbourhood and a maximum neighbourhood value above a certain threshold. Again, there's a lot of playing around involved and I guess you could achieve a better result than this, but it would be a way to go.
while (TRUE) {
  fv <- getValuesFocal(r2, ngb = c(3, 3))
  ix <- apply(fv, 1, function(x) max(x, na.rm = TRUE)) > 243 & rowSums(is.na(fv)) > 0
  if (any(ix)) {
    r2[ix] <- NA
  } else {
    break
  }
}
After a couple of iterations, I get this:
There are clearly already some pixels gone that shouldn't be; maybe it could be improved with a bit more fiddling around.
Another interesting thought would be looking at all three channels. If you load the image with brick, you can get the RGB channels. I've tried a few things like max, mean, mode, sd, etc. but to no avail.
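For the record, a minimal sketch of looking at the three channels (the rlogo file ships as a 3-band image; the per-pixel maximum is just one of the combinations mentioned above):
# load all three bands and combine them per pixel, e.g. with the maximum
b <- brick(system.file("external/rlogo.grd", package = "raster"))
rgb_max <- calc(b, fun = max)
plot(rgb_max > 240)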

Faster way to sum up raster values based on polygon extent in R

I am looking for a way to improve the speed and lower the memory-usage of the following lines:
library(raster)
library(rgdal)
export <- raster(paste0(catch_dir, '/export_streams.rst'))
catchm_polyg <- readOGR(dsn = catch_dir, layer = 'catchment')
Model_10 <- extract(export, catchm_polyg, fun = sum, na.rm = TRUE)
This gives me the sum of all values from export_streams.rst within the extent of catchm_polyg.
I want to do this many times for different input data, so the code is part of a function that is then used in a foreach loop. That all works fine up to a point. The code doesn't work with larger input data, though, as I apparently don't have enough memory (32 GB, 64-bit R version), and the calculation time is very high. Any suggestions on how to improve the code?
A couple of things that might speed this up and lower memory use:
Ask yourself: can I first aggregate my raster to a coarser resolution using the sum function? (A sketch of this, together with writing to disk, follows at the end of this answer.)
Memory: don't always write to memory when using functions from the raster package. Instead, try to write to disk where possible, or you will get memory errors.
If you have a multi-part polygon (a SpatialPolygonsDataFrame object), run the extract function once, then unlist and apply your summary function:
# quickly summarise across multiple polygons
allmyvals <- extract(myrast, myploys)
myploys$sum_in_poly <- unlist(lapply(allmyvals , function(x) if (!is.null(x)) sum(x, na.rm=TRUE) else NA ))
Take an alternative approach out of the raster package or try something with getValues. See these threads:
https://gis.stackexchange.com/questions/130522/increasing-speed-of-crop-mask-extract-raster-by-many-polygons-in-r
https://gis.stackexchange.com/questions/156663/really-slow-extraction-from-raster-even-after-using-crop
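A rough sketch of the first two points (the aggregation factor, option values and file name are only illustrative):
library(raster)
# keep intermediate results on disk rather than in RAM
rasterOptions(maxmemory = 1e8, chunksize = 1e7)
# aggregate to a coarser grid first, summing cell values, and write the result to file
export_agg <- aggregate(export, fact = 10, fun = sum, na.rm = TRUE,
                        filename = "export_agg.tif", overwrite = TRUE)
Model_10 <- extract(export_agg, catchm_polyg, fun = sum, na.rm = TRUE)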

using interp1 in R for matrix

I am trying to use the interp1 function in R for linearly interpolating a matrix without using a for loop. So far I have tried:
bthD <- c(0,2,3,4,5) # original depth vector
bthA <- c(4000,3500,3200,3000,2800) # original array of area
Temp <- c(4.5,4.2,4.2,4,5,5,4.5,4.2,4.2,4)
Temp <- matrix(Temp,2) # matrix for temperature measurements
# -- interpolating bathymetry data --
depthTemp <- c(0.5,1,2,3,4)
layerZ <- seq(depthTemp[1],depthTemp[5],0.1)
library(signal)
layerA <- interp1(bthD,bthA,layerZ);
# -- interpolate matrix --
layerT <- list()
for (i in 1:2){
t <- Temp[i,]
layerT[[i]] <- interp1(depthTemp,t,layerZ)
}
layerT <- do.call(rbind,layerT)
So, here I have used interp1 on each row of the matrix in a for loop. I would like to know how I could do this without using a for loop. I can do this in matlab by transposing the matrix as follows:
layerT = interp1(depthTemp,Temp',layerZ)'; % matlab code
but when I attempt to do this in R
layerT <- interp1(depthTemp,t(Temp),layerZ)
it does not return a matrix of interpolated results, but a numeric array. How can I ensure that R returns a matrix of the interpolated values?
There is nothing wrong with your approach; I would probably just avoid the intermediate assignment t <- Temp[i,].
If you want to feel R-ish, try
apply(Temp,1,function(t) interp1(depthTemp,t,layerZ))
You may have to add a t(ranspose) in front of it all if you really need the result oriented that way, as shown below.
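That would look like:
layerT <- t(apply(Temp, 1, function(x) interp1(depthTemp, x, layerZ)))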
Since this is a 3-D field, per-row interpolation might not be optimal. My favorite is interp.loess in package tgp, but for regular spacings other options might be available. The method does not work for your mini-example (which is fine for the question), but requires a larger grid.
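A hedged sketch of what that could look like on a larger, made-up grid (the data and parameters here are purely illustrative):
library(tgp)
# synthetic scattered data instead of the question's mini-example
x <- runif(200); y <- runif(200)
z <- sin(2 * pi * x) + cos(2 * pi * y) + rnorm(200, sd = 0.1)
# loess-based 2-D interpolation onto a regular grid; returns a list with x, y, z
fit <- interp.loess(x, y, z, gridlen = 40, span = 0.3)
image(fit)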
