This question was asked once in the past (from what I could find), but the only response did not provide a solution. Within a RasterStack, I want to generate summary statistics for each raster (min, max, mean, SD), AND the number of cells included in these calculations (i.e., non-NA cell count). You might think the number would be the same for every raster if they are all the same extent and resolution, but these rasters have been masked by their respective QA layers, resulting in a different number of non-NA cells in each raster layer. I've been using cellStats, but that does not provide cell count as output. I could also use zonal stats, but that does not appear to have this functionality (from what I read). Does anyone know how to add this to my output?
Thanks
I think you are looking for the freq function. This function will give you the frequency of pixels by value. Here is one way you could calculate the number of non-NA pixels with some dummy data.
library(raster)
# Create two 50 x 50 matrices of dummy values
m1 <- matrix(sample(1:10, 2500, replace = TRUE),
             nrow = 50,
             ncol = 50)
m2 <- matrix(sample(11:20, 2500, replace = TRUE),
             nrow = 50,
             ncol = 50)
# Combine them into a two-layer stack
r1 <- stack(raster(m1), raster(m2))
# Set pixel values == 3 and == 12 to NA
r1[r1 == 3 | r1 == 12] <- NA
# Get your cellStats
cellStats(r1, stat = "mean")
# Transform non-NA values to 1
r1[!is.na(r1)] <- 1
# Get the frequency of pixels by value;
# the count of 1's is the number of non-NA pixels in each layer
freq(r1)
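As an aside (a minimal sketch using the same dummy stack), cellStats itself can return the non-NA count if you sum a logical version of the stack; this only looks at which cells are NA, so it works whether or not you have already recoded the values:
# Layer-wise logical mask of non-NA cells; summing it counts them per layer
cellStats(!is.na(r1), stat = "sum")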
Related
I have a RasterLayer that contains values 1:5. I would like to randomly select 10% of cells that hold the value of 3 and convert these to contain 6's.
In the code below, raster is your RasterLayer object.
We find the cells which are equal to 3 using raster[raster[] == 3].
We then replace them with a vector of the same length, randomly sampling either 3 or 6 with probabilities of 90% and 10% respectively.
raster[raster[] == 3] <- sample(c(3, 6),
                                size = length(raster[raster[] == 3]),
                                replace = TRUE,
                                prob = c(0.9, 0.1))
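Note that sampling with prob = c(0.9, 0.1) converts roughly 10% of the 3-cells in expectation rather than exactly 10%. A minimal self-contained sketch (using a made-up 10 x 10 raster, not your data) that converts exactly 10% by sampling cell indices could look like this:
library(raster)
set.seed(42)
# Hypothetical RasterLayer holding values 1:5
r <- raster(matrix(sample(1:5, 100, replace = TRUE), nrow = 10, ncol = 10))
# Cells currently holding the value 3
threes <- which(r[] == 3)
# Pick exactly 10% of them (rounded) and overwrite with 6
picked <- sample(threes, size = round(0.1 * length(threes)))
r[picked] <- 6
freq(r)  # check: some 3's have become 6's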
I have two rasters of different resolution and what I want is to mask out areas of raster A based on proportional cover of raster B. For example, I want to mask out pixels of raster A if at least 1/3 of them is covered by raster B.
I have no idea how to do this, so any help will be appreciated.
library(raster)
A <- raster(matrix(1:12, nrow = 3, ncol = 4))
B <- raster(matrix(1:30, nrow = 5, ncol = 6))
Your rasters have different resolutions over the same extent (0 to 1 on both x and y): A is 4 x 3 cells, B is 6 x 5.
par(mfrow=c(1,2))
plot(A)
plot(B)
One first step could be to align the two to have the same resolution, so that we can apply the mask function. The raster::disaggregate function splits each cell into an integer number of columns and rows. In this case, with low resolution, I can split each cell into the number of columns and rows of the other raster, so the four columns of A can each be split into 6, and the six columns of B can be split into 4, such that both are 24 cells wide. (Thanks to this answer for help.)
A_resize <- disaggregate(A, fact = c(dim(B)[2], dim(B)[1]), method = "bilinear")
B_resize <- disaggregate(B, fact = c(dim(A)[2], dim(A)[1]), method = "bilinear")
par(mfrow=c(1,2))
plot(A_resize)
plot(B_resize)
I'm interpreting your question to mean you want to mask out A when the values of B in that same place are < 1/3 of max.
We could make a modified version of B_resize which reclassifies everything < 1/3 of max as an NA. (Thanks to this answer for help.)
B_mask <- reclassify(B_resize,
                     rbind(c(0, B@data@max * 1/3, NA),
                           c(B@data@max * 1/3, B@data@max, 1)))
plot(B_mask)
and we could then use that to mask A:
par(mfrow = c(1,1))
plot(mask(A_resize,B_mask))
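As a side note, if reaching into the @data slot feels fragile, the documented maxValue() accessor also returns the layer maximum; a small sketch of building the same kind of mask with it (the threshold is illustrative, as above):
thr <- maxValue(B_resize) / 3
B_mask <- reclassify(B_resize, rbind(c(-Inf, thr, NA),
                                     c(thr, Inf, 1)))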
I'd like to sample non-adjacent cells from a raster using the raster cell values as their inclusion probability.
Here is an example of sampling cells from a raster which can result in adjacent cells being selected.
set.seed(1)
library(raster)
r <- raster(matrix(runif(2500), nrow = 50, ncol = 50))
plot(r)
cells = sample(1:ncell(r), 100, prob=r[], replace=FALSE)
cell.centers = xyFromCell(r, cells)
plot(r)
points(cell.centers, pch = 19)
I realize with large sample sizes it would be nearly impossible to avoid selecting adjacent cells. In practice the number of samples I will be making is always going to be much smaller than the number of available cells. I just want to avoid including adjacent cells. I guess I could sample iteratively and remove cells once they are chosen, but that seems obtuse. Any help appreciated.
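For what it's worth, here is a minimal sketch of that iterative idea, assuming "adjacent" means the 8 surrounding cells: draw one cell at a time with probability proportional to its value, then zero out the probabilities of the chosen cell and its neighbours so they cannot be picked again.
library(raster)
set.seed(1)
r <- raster(matrix(runif(2500), nrow = 50, ncol = 50))
n_samples <- 100
probs <- r[]          # inclusion probabilities taken from the cell values
chosen <- integer(0)
while (length(chosen) < n_samples && any(probs > 0, na.rm = TRUE)) {
  cell <- sample(ncell(r), 1, prob = probs)
  chosen <- c(chosen, cell)
  # drop the chosen cell and its 8 neighbours from further consideration
  drop <- c(cell, adjacent(r, cell, directions = 8, pairs = FALSE))
  probs[drop] <- 0
}
plot(r)
points(xyFromCell(r, chosen), pch = 19)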
I am working on an ecological problem, involving species distribution models. I have a raster which is essentially a landscape of probabilities of presence per cell, so to speak. I want to calculate a new raster, based on the old one, where each cell is equal to the mean of itself and all 8 adjacent cells. This is not the same as aggregating the cells by mean, which results in the border between the newly aggregated cells being calculated incorrectly.
I can do this with the bit of code provided, but the raster I am working with is way, way too big to run this calculation, as it uses too much memory. If I subdivide the raster, it will still take days to do. Does anyone have a more efficient way of calculating this? I have created a small version of the raster as an example, albeit somewhat clumsily:
require(raster)
## create raster called "ras" rather clumsily
# (UTM coordinates and a probability value for each cell, not really
# important)
s.x = seq(249990, by = 30, length.out = 20)
s.y = seq(6189390, by = 30, length.out = 20)
x.l = lapply(1:20, function(x){
  rep(s.x[x], 20)
})
x.l2 = as.vector(c(x.l[[1]], x.l[[2]], x.l[[3]], x.l[[4]], x.l[[5]],
                   x.l[[6]], x.l[[7]], x.l[[8]], x.l[[9]], x.l[[10]],
                   x.l[[11]], x.l[[12]], x.l[[13]], x.l[[14]], x.l[[15]],
                   x.l[[16]], x.l[[17]], x.l[[18]], x.l[[19]], x.l[[20]]))
df = as.data.frame(cbind(x.l2, rep(s.y, 20), rnorm(20*20, 0.5, 0.2)))
colnames(df) = c("x", "y", "P")
coordinates(df) <- ~ x + y
gridded(df) <- TRUE
ras = raster(df)
# for each cell, make a vector of the values at
# the cell and all <=8 adjacent cells:
vl = lapply(1:length(ras), function(x){
  extract(ras,
          c(x, adjacent(ras, x, directions = 8, pairs = FALSE, sorted = FALSE)))
})
# find the mean for each cell
vm = sapply(1:length(ras), function(x){
  as.vector(mean(vl[[x]], na.rm = TRUE))
})
# create raster template
templ = ras/ras
# multiply into template for new raster
ras = vm*templ
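One memory-safe alternative (a sketch, not from the original post) is raster::focal, which computes the moving-window statistic in blocks and can write the result straight to disk, so the full raster never has to fit in RAM:
library(raster)
# 3 x 3 moving-window mean: each output cell is the mean of the cell itself
# and its (up to) 8 neighbours; na.rm = TRUE with pad = TRUE handles the edges
ras_smooth <- focal(ras,
                    w = matrix(1, nrow = 3, ncol = 3),
                    fun = mean,
                    na.rm = TRUE,
                    pad = TRUE,
                    filename = "ras_smooth.tif",   # hypothetical output file
                    overwrite = TRUE)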
I have two sets of points stored in R as sf objects. Point object x contains 204,467 points and point object y contains 5,297 points.
In theory, I would want to calculate the distance from all points in x to all points in y. I understand that this would create a beast of a matrix, but it is doable using st_distance(x, y, by_element=FALSE) in the sf package in about 40 minutes on my i7 desktop.
What I want to do is to calculate the distance from all of the points in x to all of the points in y, then convert this into a data.frame that contains all variables for each respective pair of x and y points. This is because I want flexibility in terms of aggregation using dplyr; for instance, I want to find the number of points in y that are within 10, 50, or 100 km of x, and where x$year < y$year.
I successfully created the distance matrix, which has around 1,083,061,699 cells. I know this is a very inefficient way of doing this, but it gives flexibility in terms of aggregation. Other suggestions are welcome.
Below is code to create two sf point objects, and measure the distance between them. Next, I would want to convert this into a data.frame with all variables from x and y, but this is where I fail to proceed.
If my suggested workflow is unfeasible, can someone provide an alternative solution to measure distance to all points within a predefined radius, and create a data.frame of the result with all variables from x and y?
# Create two sf point objects
set.seed(123)
library(sf)
pts1 <- st_as_sf(x = data.frame(id = seq(1, 204467, 1),
                                year = sample(seq(from = 1990, to = 2018, by = 1), size = 204467, replace = TRUE),
                                xcoord = sample(seq(from = -180, to = 180, by = 1), size = 204467, replace = TRUE),
                                ycoord = sample(seq(from = -90, to = 90, by = 1), size = 204467, replace = TRUE)),
                 coords = c("xcoord", "ycoord"), crs = 4326)
pts2 <- st_as_sf(x = data.frame(id = seq(1, 5297, 1),
                                year = sample(seq(from = 1990, to = 2018, by = 1), size = 5297, replace = TRUE),
                                xcoord = sample(seq(from = -180, to = 180, by = 1), size = 5297, replace = TRUE),
                                ycoord = sample(seq(from = -90, to = 90, by = 1), size = 5297, replace = TRUE)),
                 coords = c("xcoord", "ycoord"), crs = 4326)
distmat <- st_distance(pts1, pts2, by_element = FALSE)
I would consider approaching this differently. Once you have your distmat matrix, you can do the types of calculation you describe without needing a data.frame. You can use standard subsetting to find which points meet your specified criteria.
For example, to find the combinations of points where pts1$year is greater than pts2$year we can do:
subset_points = outer(pts1$year, pts2$year, `>`)
Then, to find how many of these are separated more than 100 km, we can do
library(units)
sum(distmat[subset_points] > (100 * as_units('km', 1)))
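If you do still want the qualifying pairs in a data.frame with the attributes from both point sets (as described above), one hedged sketch is to materialise only the pairs that pass the filter, using which() with arr.ind = TRUE. Note that the combined logical matrix below is itself another ~1e9-cell object, which is part of the motivation for the raster approach that follows.
# Pairs meeting both the year and the distance condition
# (rows index pts1, columns index pts2)
keep <- subset_points & (distmat < (100 * as_units('km', 1)))
# Row/column indices of the TRUE cells only
idx <- which(keep, arr.ind = TRUE)
# Bind the attributes of the matching points (geometries dropped);
# duplicate column names are suffixed automatically by data.frame()
pairs_df <- data.frame(st_drop_geometry(pts1)[idx[, 1], ],
                       st_drop_geometry(pts2)[idx[, 2], ],
                       distance_m = as.numeric(distmat[idx]))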
A note on memory usage
However you approach this with sf or data.frame objects, the chances are that you will start to bump up against RAM limits with ~1e9 floating-point values in each matrix or column of a data.table. You might think about instead converting your distance matrix to a raster. Then the raster can be stored on disk rather than in memory, and you can utilise the memory-safe functions in the raster package to crunch your way through.
How we might use rasters to work from disk and save RAM
We can use memory-safe raster operations on these very large matrices, for example:
library(raster)
# convert our matrices to rasters, so we can work on them from disk
r = raster(matrix(as.numeric(distmat), length(pts1$id), length(pts2$id)))
s = raster(subset_points)
remove('distmat', 'subset_points')
# now create a raster equal to r, but with zeroes in the cells we wish to exclude from calculation
rs = overlay(r, s, fun = function(x, y){x * y}, filename = 'out1.tif')
# find which cells have a value greater than x (1e6 in this example)
Big_cells = reclassify(rs, matrix(c(-Inf, 1e6, 0, 1e6, Inf, 1), ncol = 3, byrow = TRUE), 'out.tiff', overwrite = TRUE)
# and finally count the cells
N = cellStats(Big_cells, sum)
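If raster still tries to hold too much in memory on your machine, its block-wise behaviour can be tuned via rasterOptions(); the values below are purely illustrative (check ?rasterOptions for the units your raster version uses):
rasterOptions(maxmemory = 1e8,        # lower the in-memory threshold
              tmpdir = "raster_tmp")  # hypothetical scratch directory with space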