I'd like to sample non-adjacent cells from a raster using the raster cell values as their inclusion probability.
Here is an example of sampling cells from a raster which can result in adjacent cells being selected.
library(raster)
set.seed(1)
# a 50 x 50 raster of uniform random values to use as inclusion probabilities
r <- raster(matrix(runif(2500), nrow = 50, ncol = 50))
plot(r)
# draw 100 cells without replacement, weighted by cell value
cells <- sample(1:ncell(r), 100, prob = r[], replace = FALSE)
cell.centers <- xyFromCell(r, cells)
plot(r)
points(cell.centers, pch = 19)
I realize that with large sample sizes it would be nearly impossible to avoid selecting adjacent cells. In practice, the number of samples I will be taking is always going to be much smaller than the number of available cells; I just want to avoid including adjacent cells. I guess I could sample iteratively and remove cells once they are chosen, but that seems clumsy. Any help appreciated.
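One way to implement that iterative idea compactly is to zero out the inclusion probability of each drawn cell and its neighbours before the next draw. Here is a minimal sketch reusing r from the example above, assuming 8-neighbour (queen) adjacency is what you want to exclude; adjacent and xyFromCell are raster functions, but the loop itself is just one possible arrangement, not a tested recipe.
n <- 100
probs <- r[]  # working copy of the inclusion probabilities
chosen <- integer(0)
for (i in seq_len(n)) {
  cell <- sample(ncell(r), 1, prob = probs)
  chosen <- c(chosen, cell)
  # zero out the drawn cell and its 8 neighbours so they cannot be drawn again
  nb <- adjacent(r, cell, directions = 8, pairs = FALSE)
  probs[c(cell, nb)] <- 0
}
plot(r)
points(xyFromCell(r, chosen), pch = 19)
Note that this draws cells sequentially rather than as a joint sample, and it will fail if n is large enough to exhaust the non-zero probabilities; since your sample sizes are much smaller than the number of cells, that should rarely bite.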
This question was asked once in the past (from what I could find), but the only response did not provide a solution. Within a RasterStack, I want to generate summary statistics for each raster (min, max, mean, SD), AND the number of cells included in these calculations (i.e., non-NA cell count). You might think the number would be the same for every raster if they are all the same extent and resolution, but these rasters have been masked by their respective QA layers, resulting in a different number of non-NA cells in each raster layer. I've been using cellStats, but that does not provide cell count as output. I could also use zonal stats, but that does not appear to have this functionality (from what I read). Does anyone know how to add this to my output?
Thanks
I think you are looking for the freq function. This function will give you the frequency of pixels by value. Here is one way you could calculate the number of non-NA pixels with some dummy data.
library(raster)
# Create two 50 x 50 matrices of random values
m1 <- matrix(sample(1:10, 2500, replace = TRUE),
             nrow = 50,
             ncol = 50)
m2 <- matrix(sample(11:20, 2500, replace = TRUE),
             nrow = 50,
             ncol = 50)
# Combine them into a stack
r1 <- stack(raster(m1), raster(m2))
# Set pixel values == 3 and == 12 to NA
r1[r1 == 3 | r1 == 12] <- NA
# Get your cellStats
cellStats(r1, stat = "mean")
# Transform non-NA values to 1
r1[!is.na(r1)] <- 1
# Get the frequency of pixels by value;
# the count of 1's is the number of non-NA pixels in each layer
freq(r1)
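As a side note, a shorter variant (a sketch, not part of the freq approach above) skips the reclassification step entirely: !is.na() returns a logical raster whose TRUE values count as 1, so summing it per layer gives the non-NA cell counts directly.
# count non-NA cells per layer on the masked stack
cellStats(!is.na(r1), stat = "sum")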
I have two rasters of different resolution, and what I want is to mask out areas of raster A based on the proportional cover of raster B. For example, I want to mask out pixels of raster A if at least 1/3 of each pixel is covered by raster B.
I have no idea how to do this, so any help will be appreciated.
library(raster)
A <- raster(matrix(1:12, nrow = 3, ncol = 4))
B <- raster(matrix(1:30, nrow = 5, ncol = 6))
Your rasters have different resolutions over the same extent, between 0 and 1 on both x and y: A is 4x3 and B is 6x5 (columns by rows).
par(mfrow=c(1,2))
plot(A)
plot(B)
One first step could be to align the two rasters to the same resolution, so that we can apply the mask function. The raster::disaggregate function splits each cell into an integer number of columns and rows. In this case, with low resolutions, I can split each cell by the number of columns and rows of the other raster: the four columns of A can each be split into 6, and the six columns of B can each be split into 4, so that both are 24 cells wide. (Thanks to this answer for help.)
# split A's cells by B's dimensions and B's cells by A's dimensions,
# so both end up on the same 24 x 15 grid
A_resize <- disaggregate(A, fact = c(dim(B)[2], dim(B)[1]), method = "bilinear")
B_resize <- disaggregate(B, fact = c(dim(A)[2], dim(A)[1]), method = "bilinear")
par(mfrow=c(1,2))
plot(A_resize)
plot(B_resize)
I'm interpreting your question to mean you want to mask out A when the values of B in that same place are < 1/3 of max.
We could make a modified version of B_resize which reclassifies everything < 1/3 of max as an NA. (Thanks to this answer for help.)
B_mask <- reclassify(B_resize,
                     rbind(c(0, B@data@max * 1/3, NA),
                           c(B@data@max * 1/3, B@data@max, 1)))
plot(B_mask)
and we could then use that to mask A:
par(mfrow = c(1,1))
plot(mask(A_resize, B_mask))
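As an optional extra step, not part of the answer above: if you would rather have the masked result back on A's original 4x3 grid than on the disaggregated one, a hedged variation is to move the mask onto A's grid with nearest-neighbour resampling and mask the original A.
# bring the mask onto A's original grid (nearest neighbour keeps the NA/1 coding)
B_mask_on_A <- resample(B_mask, A, method = "ngb")
plot(mask(A, B_mask_on_A))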
When using the raster package, does the bilinear interpolation in resample or projectRaster always take the four nearest centroids, or does it account for differences in cell size?
In the explanation of bilinear interpolation in resampling found at http://desktop.arcgis.com/en/arcmap/latest/extensions/spatial-analyst/performing-analysis/cell-size-and-resampling-in-analysis.htm, it says the four nearest centroids are used; in that example, the cell sizes are similar. That explanation is not written for the raster package in R, but it appeared to be the most accessible account of how bilinear interpolation might work in these cases.
Is a different number of centroids used when the cell sizes differ by more than a factor of 2 in either (or both) dimensions?
For example, here is a raster with a higher resolution than what it is going to be resampled to:
library(raster)
R1 <- raster(resolution = 13, vals = sample(x = 1:5, size = 392, replace = TRUE))
plot(R1)
(figure: the high-resolution raster)
Here the target raster is created, and then R1 is transformed with projectRaster to match it:
R2 <- raster(resolution = 50)
R3 <- projectRaster(from = R1, to = R2, method = "bilinear")
plot(R3)
(figure: the new low-resolution raster)
Which cells from the high-resolution raster are taken into consideration when projectRaster creates the fewer, larger cells? Since the cells in the high-resolution raster are less than a quarter the size of the cells in the low-resolution one, bilinear interpolation using just the four closest centroids may not be as representative as it could be if more cells were taken into account.
Prior to the actual resampling with bilinear interpolation, the raster package's resample function first aggregates the values to create cells of approximately the same resolution as the target. The four cells are then taken from this aggregated raster.
This is not done by projectRaster. It will be added in a future version, but for now you could first aggregate the input data yourself.
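Until then, here is a rough sketch of that manual workaround using R1 and R2 from the question; the choice of aggregation factor is my assumption, not part of this answer.
# aggregate R1 to roughly R2's resolution before the bilinear projection,
# so the interpolation only has to bridge a small remaining factor
fact <- floor(res(R2) / res(R1))  # c(x, y) aggregation factors, here c(3, 3)
R1_agg <- aggregate(R1, fact = fact, fun = mean)
R3b <- projectRaster(from = R1_agg, to = R2, method = "bilinear")
plot(R3b)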
I would like to obtain the extent of a raster layer conditional on certain cell values. Consider the following example:
raster1 is a large raster object filled with values between 1 and 1000, but I only want to obtain the extent of pixels with value 100. Since this subset of cells should cluster in a small region, the extent should be rather narrow. Once I know the coordinates of that box, I can crop that small area.
My approach so far is to replace all values != 100 with NA, as suggested in related questions. Given the raster object's overall size, this step takes an enormous amount of time and spends a lot of computation on regions that I would like to crop away anyway.
Does anyone know how to obtain the extent conditional on a certain pixel value without reclassifying the entire object beforehand?
Here is an alternative way to do that
Example data:
library(raster)
r <- raster(ncol=18,nrow=18)
values(r) <- 1
r[39:45] <- 100
r[113:115] <- 100
r[200] <- 100
"Standard" way:
# logical raster: TRUE where r equals 100
x <- r == 100
# drop the outer rows and columns that are entirely FALSE
s <- trim(x, values = FALSE)
Alternate route by creating an extent:
# coordinates of the cells that have value 100
xy <- rasterToPoints(r, function(x) { x == 100 })
# build an extent from those coordinates, snapped outward to the cell borders
e <- extent(xy[, 1:2])
e <- alignExtent(e, r, snap = 'out')
v <- crop(r, e)
Either way, all cells need to be looked at, but at least you do not need to create another large raster.
I want to group a list of longitudes and latitudes (my_long_lats) based on pre-determined center points (my_center_Points).
When I run:
k <- kmeans(as.matrix(my_long_lats), centers = as.matrix(my_center_Points))
k$centers does not equal my_center_Points.
I assume k-means has adjusted my center points to the optimal centers. But what I need is for my_center_Points not to change, and for my_long_lats to be grouped around them.
In this link they talk about setting initial centers, but how do I set centers that won't change once I run k-means? Or is there a better clustering algorithm for this?
I could even settle for minimizing the movement of the centers.
I still have a lot to learn in R, any help is really appreciated.
Centers are automatically re-estimated while performing k-means clustering; in fact, determining the centers is a vital part of dividing the data into cluster groups. Here are a couple of options I can think of that may help you.
Limit iter.max. You can set it to just 1 in the kmeans call (see the sketch after this list). This will not guarantee that the centers stay fixed, but the changes will be small if you are dealing with a large data set.
Use dummy data. You can add many dummy points to your actual data set around the chosen centers. This puts extra weight around the pre-determined centers, so they will most likely remain nearly unchanged.
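A minimal sketch of the first option; my_long_lats and my_center_Points are not shown in the question, so stand-in data are defined here.
# stand-in data (the question's actual objects are not shown)
my_long_lats <- data.frame(lon = runif(50, 40, 55), lat = runif(50, 40, 55))
my_center_Points <- data.frame(lon = c(44, 44, 50, 50), lat = c(44, 50, 44, 50))
# pass the fixed points as the starting centers and allow a single iteration;
# kmeans() still recomputes the centers once, so this is only approximate
k <- kmeans(as.matrix(my_long_lats),
            centers = as.matrix(my_center_Points),
            iter.max = 1)
k$cluster  # cluster assignment for each point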
Here is a calculation using the geosphere library to properly compute distances from longitude and latitude.
The variable closestcenter is the result, identifying the closest center to each point.
# define random data: four centers and fifty points (columns are lon, lat)
centers <- data.frame(x = c(44, 44, 50, 50), y = c(44, 50, 44, 50))
pts <- data.frame(x = runif(50, 40, 55), y = runif(50, 40, 55))
library(geosphere)
# distance matrix from the defined centers to each point:
# columns represent centers and rows are the data points
dm <- sapply(seq_len(nrow(centers)), function(i) distGeo(centers[i, ], pts))
# find the column (center) with the smallest distance for each point
closestcenter <- apply(dm, 1, which.min)
# color-code the original data for verification
colors <- c("black", "red", "blue", "green")
plot(pts, col = colors[closestcenter], pch = 19)