I have raster files that have the same resolution and extent but differ in the number of NA. I want to unify the number of NA between all of them. Is it possible to do it by considering a cell as non-NA if it's not NA in all the raster files?
Here an example :
library(raster)
library(terra)
f <- system.file("external/test.grd", package="raster")
r1 <- raster(f)
r2 <- calc(r1, fun=function(x){ x[x < 500] <- NA; return(x)} )
r1 <- calc(r1, fun=function(x){ x[x > 1200] <- NA; return(x)} )
raste <- rast(r1)
rNA <- terra:: global(!(is.na(raste)), sum, na.rm=TRUE)
print(paste0("Non-NA of r1", rNA))
raste <- rast(r2)
rNA <- terra:: global(!(is.na(raste)), sum, na.rm=TRUE)
print(paste0("Non-NA of r2", rNA))
I want both r1 and r2 to have the same number of non-NA cells. I have more than two rasters, so I wonder if I can do it for a large number of files.
It can be a bit confusing to use raster and terra together, so I will just use terra (but you can do the same with raster, using stack in stead of c and cellStats in stead of global.
Your example data
library(terra)
f <- system.file("external/test.grd", package="raster")
r <- rast(f)
r1 <- clamp(r, upper=1200, values=FALSE)
r2 <- clamp(r, lower=500, values=FALSE)
global(!(is.na(r1)), sum)
# sum
#lyr.1 3145
global(!(is.na(r2)), sum)
# sum
#lyr.1 802
Solution:
r <- c(r1, r2)
names(r) <- c("r1", "r2")
m <- any(is.na(r))
x <- mask(r, m, maskvalue=1)
global(!(is.na(x)), sum, na.rm=TRUE)
# sum
#r1 769
#r2 769
I like the use of any(is.na()) because it makes clear what the intent is.
But you could combine the layers in one of many other ways. As long as you do not use na.rm=TRUE the cells with an NA in one of the layers will be NA in the output. For example with sum, diff, prod, mean or app.
m <- sum(r)
x <- mask(r, m)
global(!(is.na(x)), sum, na.rm=TRUE)
Related
Given:
library(terra)
r <- rast(nrows=512, ncols=640,nlyrs=1, res=1,crs="local",extent=c(0,640,0,512))
values(r) <- rnorm(ncell(r),100,10)
x <- c(r, r*2, r*3, r*0.5)
plotRGB(x, stretch="lin")
and
weirds <- data.frame(bands=c(1,3), mincol=c(5,100), maxcol=c(7,103))
weirds
> weirds
bands mincol maxcol
1 5 7
3 100 103
I have to set as NA those cells within mincol and maxcol in weirds for the given bands.
I achieve in this way:
y <- x
for (i in 1:nrow(weirds)) {
y[[weirds$bands[i]]][,weirds$mincol[i]:weirds$maxcol[i]] <- NA
}
plotRGB(y, stretch="lin")
But I am sure there are more efficient ways. Can anyone make suggestions?
With terra 1.6-24 (currently the development version) you can use indexing for layers.
Example data
library(terra)
#terra 1.6.24
r <- rast(nrows=30, ncols=30, nlyrs=1,extent=c(0,640,0,512))
values(r) <- rnorm(ncell(r), 100, 10)
x <- c(r, r*2, r*3)
w <- data.frame(bands=c(1,3), mincol=c(5,20), maxcol=c(7,23))
First approach. This is very similar to what you had, but perhaps a little cleaner.
y <- x
for (i in 1:nrow(w)) {
y[ , w$mincol[i]:w$maxcol[i], w$bands[i]] <- NA
}
plot(y)
Second approach. Using a list for one or more of the indices i (row), j (col), or k (lyr) you can do multiple replacements at once.
z <- x
cols <- lapply(1:nrow(w), function(i) seq(w[i, 2], w[i, 3]))
z[ , cols, w$bands] <- NA
plot(z, nc=3)
I would assume that this is the better approach and that you can probably directly create the cols list, instead of deriving it from w
You can install the development version of "terra" with install.packages('terra', repos='https://rspatial.r-universe.dev')
I want to multiply two rasters in R with 'terra'. Raster1 has 457 bands with EVI values meanwhile raster2 is a one-layer raster -of almost same extent- with binary values (0 or 1). The result that I want to achieve is to get raster1 (with the 457 original bands) with values only in pixels that has a value = 1 in raster2. That's why I want to multiply them.
I have tried:
result <- raster1 * raster2
result <- overlay(raster1, raster2, fun = function(x,y){return(x*y)}, unstack=FALSE)
But it doesn't work. I appreciate some help.
Example data (please always include some):
library(terra)
f <- system.file("ex/logo.tif", package="terra")
r1 <- rast(c(f, f))
r2 <- rast(r1, nlyr=1)
set.seed(0)
values(r2) <- sample(c(0,1), ncell(r2), replace=TRUE)
Solution
x <- mask(r1, r2, maskvalue=0)
If you multiply (r1 * r2), all values that are zero in r2 become zero in the output, but you want them to become NA, and that is what mask will do for you.
Alternatively, you could first change the cells that are 0 to NA, and then multiply, but that is unnecessarily convoluted:
m <- subst(r2, 0, NA)
y <- r1 * m
I have a list of raster files in GeoTIFF files, and I've like to remove each file and if one has more than 50% of NA values.
I wanna delete it from the list my new list (f2). In my code:
library(raster)
# example data
r <- raster(ncol=10, nrow=10)
set.seed(0)
# 10 layers
s <- stack(lapply(1:10, function(i) setValues(r, runif(ncell(r)))))
# set about half the values to NA
s[s < .5] <- NA
#Create GeoTIFF for each layer
sl<-1:10
for (i in 1:length(sl)){
writeRaster(s[[i]],filename=paste(sl[i],sep=""),
format="GTiff",datatype="FLT4S",overwrite=TRUE)
}
#Take images in batch
f <- list.files(getwd(), pattern = ".tif")
ras <- lapply(f,raster)
#Remove from my list when I have more than 50% of cells that are NA
class <- vector()
for (j in 1:length(ras)){
i <- cellStats(is.na(ras[[j]]), sum) # count the NA values in each layer
i <- i/ncell(ras[[j]]) # fraction that is NA
ss <- ras[[j]][[which(i>.5)]] # Select the layers that more than half the cells with values
class<-c(class,ss)
}
Here, I have my problem, the output class has all my images and not the layers that more than 50% of the cells with values
I've like to apply this conditon for:
#Remove target images
f2 <- list.files(getwd(), pattern = ".tif")
f2<- f[f!=class]
ras2 <- lapply(f2,raster)
Your example data
library(raster)
r <- raster(ncol=10, nrow=10)
set.seed(0)
s <- stack(lapply(1:10, function(i) setValues(r, runif(ncell(r)))))
s[s < .5] <- NA
# I skip the file writing bit. But from these files I would make a RasterStack again
#f <- list.files(getwd(), pattern = ".tif")
#ras <- stack(f)
Count the number of cells with NA and divide by the number of cells
f <- freq(s, value=NA) / ncell(s)
# equivalent to cellStats(is.na(s), "mean")
i <- which(f <= 0.5)
i
#layer.2 layer.6 layer.7
# 2 6 7
use these indices to subset the RasterStack
ss <- s[[i]]
If you cannot make a RasterStack from your real data (perhaps the rasters do not align), you can use a list and a loop like this
ras <- as.list(s)
result <- rep(NA, length(ras))
for (i in 1:length(ras)){
result[i] <- freq(ras[[i]], value=NA) / ncell(ras[[i]])
}
# equivalent to
# result <- sapply(1:length(ras), function(i) freq(ras[[i]], value=NA) / ncell(ras[[i]]))
j <- result < 0.5
sras <- ras[j]
## input raster
s <- stack(list.files("~/dailyraster", full.names=TRUE)) # daily raster stack
r_start <- raster("~/stackSumSTART.asc") # this raster contain starting Julian day
r_end <- raster("~/stackSumEND.asc") # this raster contain ending Julian day
noNAcells <- which(!is.na(r[])) # cell numbers which contain values
## dummy raster
x <- r
x[] <- NA
## loop
for (i in noNAcells) {
x[i] <- sum(s[[r_start[i]:r_end[i]]][i])
}
I would like to create a function like stackApply(), but I want it to work on a cell basis.
Above is a for() loop version and it works well, but it takes too much time.
The point is that each cell gets the range of sum() from two raster layers, r_start, r_end in above script.
Now I am struggling to transform this code using apply() family.
Is there any possibility to improve the speed with for() loop? or please give me some tips to write this code in apply()
Any comments will help me, thank you.
Your approach
x <- s$layer.1
system.time(
for (i in 1:ncell(x)) {
x[i] <- sum(s[[r_start[i]:r_end[i]]][i], na.rm = T)
}
)
user system elapsed
0.708 0.000 0.710
My proposal
You can add the rasters used as indices at the end of your stack and then use calc to highly speed up the process (~30-50x).
s2 <- stack(s, r_start, r_end)
sum_time <- function(x) {sum(x[x[6]:x[7]], na.rm = T)}
system.time(
output <- calc(s2, fun = sum_time)
)
user system elapsed
0.016 0.000 0.015
all.equal(x, output)
[1] TRUE
Sample Data
library(raster)
# Generate rasters of random values
r1 <- r2 <- r3 <- r4 <- r5 <- r_start <- r_end <- raster(ncol=10, nrow=10)
r1[] <- rnorm(ncell(r1), 1, 0.2)
r2[] <- rnorm(ncell(r2), 1, 0.2)
r3[] <- rnorm(ncell(r3), 1, 0.2)
r4[] <- rnorm(ncell(r4), 1, 0.2)
r5[] <- rnorm(ncell(r5), 1, 0.2)
s <- stack(r1,r2,r3,r4,r5)
r_start[] <- sample(1:2, ncell(r_start),replace = T)
r_end[] <- sample(3:5, ncell(r_end),replace = T)
How can i get a new raster r90 that has only those values which are greater than 90th percentile of its values in each grid cell.
For example i tried the follwoing but i am not sure if r90 is giving me the right thin.
library(raster)
r1 <- raster(nrow=10, ncol=7)
r <- stack(setValues(r1, runif(ncell(r1))),
setValues(r1, runif(70 ,0.6,0.9)),
setValues(r1, runif(70 ,0.2,0.4)),
setValues(r1, runif(70 ,1,2)))
r
#calcaulte 90th percentile of each grid cells
q90fun <- function(x){quantile(x, probs = .90, na.rm=TRUE)}
q90<-calc(r,fun=q90fun)
#sort raster r with values greater than or equal to its 90th percentile
fun90gt <- function(x,y){x[x >= y]}
r90<-overlay(r,q90,fun=fun90gt)
r90
I think it works with the example data, but that it may not be OK in general because it is not guaranteed that fun90gt will always return the same number of values (consider ties, NA)
Instead of r90 <- overlay(r, q90, fun=fun90gt) you could do
r90 <- overlay(r, q90, fun=function(x,y) { x[x <= y] <- NA; x })
or the equivalent
rr <- r > q90
r90 <- mask(r, rr, maskvalue=0)
and then perhaps
# r90n <- sum(rr)
r90n <- calc(r90, function(x) sum(!is.na(x)) )
r90mx <- max(r90, na.rm=TRUE)
r90mn <- min(r90, na.rm=TRUE)