I have this stars object (it could also be converted to a raster):
stars object with 2 dimensions and 2 attributes
attribute(s):
LST_mean elevation
Min. :14.98 Min. :296.0
1st Qu.:16.89 1st Qu.:346.9
Median :17.64 Median :389.3
Mean :17.52 Mean :389.2
3rd Qu.:18.18 3rd Qu.:428.3
Max. :20.11 Max. :521.6
dimension(s):
from to offset delta refsys point values
x 71 83 4387654 860.241 DHDN / 3-degree Gauss-Kru... FALSE NULL [x]
y 33 41 5598885 -860.241 DHDN / 3-degree Gauss-Kru... FALSE NULL [y]
It has 2 attributes (layers, in the case of a raster): temperature and elevation.
Using temperature, I would like to select the pixels that fall within a buffer and return their mean, but only for those pixels whose elevation differs from that of the focal pixel by less than 90 meters.
Any ideas on how to do this?
Calculating the mean of the pixels that fall within the buffer is very easy, but I couldn't find a way to put any condition on them.
I will be immensely grateful for your help and suggestions. Approaches using packages other than stars are also very welcome :)
Please see below a solution using terra. The code uses terra::extract to create two corresponding lists:
The pixel values
The surrounding buffer values
Subsequently the values are processed pairwise, using mapply, with a function similar to the one you suggested.
It's the first time I'm using terra, but terra::extract seems to be much faster than raster::extract, so this solution may be feasible even for a large raster.
Creating sample data:
library(sf)
library(terra)
r = rast(ncol = ncol(volcano), nrow = nrow(volcano), xmin = 0, xmax = ncol(volcano), ymin = 0, ymax = nrow(volcano))
values(r) = volcano
s = r
s[] = rnorm(ncell(s))
r = c(r, s)
crs(r) = ""
plot(r)
Calculating buffers:
pnt = as.points(r, values = FALSE)
pol = buffer(pnt, 10)
Extracting raster values from points:
x = extract(r, pnt)
head(x)
## ID lyr.1 lyr.1
## [1,] 1 100 -0.03525223
## [2,] 2 100 0.31525467
## [3,] 3 101 0.94054608
## [4,] 4 101 0.37209238
## [5,] 5 101 -0.38388234
## [6,] 6 101 -0.03120593
Extracting raster values from buffers:
y = extract(r, pol)
head(y)
## ID lyr.1 lyr.1
## [1,] 1 100 -0.03525223
## [2,] 1 100 0.31525467
## [3,] 1 101 0.94054608
## [4,] 1 101 0.37209238
## [5,] 1 101 -0.38388234
## [6,] 1 101 -0.03120593
Now, the extracted values can be processed sequentially using mapply.
First, we convert the objects to lists:
x = as.data.frame(x)
x = split(x, x$ID)
y = as.data.frame(y)
y = split(y, y$ID)
Next, we use mapply to make the necessary calculation, each time
considering the current focal point value x and surrounding buffer
values y:
f = function(x, y) {
  d = abs(x[, 2] - y[, 2])            ## difference in layer 1 values between the focal cell and each buffer cell
  values = y[, 3]                     ## layer 2 values of the buffer cells
  mean(values[d < 5], na.rm = TRUE)   ## mean of the subset within the threshold (5 here; 90 m for the real elevation data)
}
result = mapply(f, x, y)
Finally, putting the results back into the raster template:
u = r[[1]]
values(u) = result
plot(u)
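To carry this over to the data in the question, the same steps could be run with the 90 m threshold, swapping the roles of the two layers. A minimal sketch (my addition, not part of the answer above), assuming a hypothetical SpatRaster lst_elev with LST_mean as layer 1 and elevation as layer 2 (e.g. read from the source files with terra::rast()), and an arbitrary buffer radius of 2000 map units:
pnt = as.points(lst_elev, values = FALSE)
pol = buffer(pnt, 2000)                 ## hypothetical buffer radius; adjust to your use case
x = as.data.frame(extract(lst_elev, pnt))
y = as.data.frame(extract(lst_elev, pol))
x = split(x, x$ID)
y = split(y, y$ID)
f = function(x, y) {
  d = abs(x[, 3] - y[, 3])              ## elevation difference (column 3 = layer 2)
  mean(y[d < 90, 2], na.rm = TRUE)      ## mean LST_mean (column 2 = layer 1) where the difference is < 90 m
}
result = mapply(f, x, y)
u = lst_elev[[1]]
values(u) = result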
I have a raster and a shapefile. The raster contains NAs, and I am filling them using the focal function:
library(terra)
v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
r[45:60, 45:60] <- NA
r_fill <- terra::focal(r, 5, mean, na.policy="only", na.rm=TRUE)
However, some NAs are still left, so I do this:
na_count <- terra::freq(r_fill, value = NA)
while (na_count$count != 0) {
  r_fill <- terra::focal(r_fill, 5, mean, na.policy = "only", na.rm = TRUE)
  na_count <- terra::freq(r_fill, value = NA)
}
Once all NAs are filled, I clip the raster again using the shapefile:
r_fill <- terra::crop(r_fill, v, mask = T, touches = T)
This is what my before and after looks like:
I wonder whether the while loop is an efficient way to fill the NAs, or whether there is a way to determine in advance how many times I have to run focal to fill all the NAs in the raster.
Perhaps we can, or want to, dispense with the while() altogether by making a better estimate of focal()'s w= argument in a world where r, the ground truth, isn't available. Were it available, we could derive a value for w directly:
r <- rast(system.file("ex/elev.tif", package="terra"))
# and its variants
r2 <- r
r2[45:60, 45:60] <- NA
freq(r2, value=NA) - freq(r, value=NA)
layer value count
1 0 NA 256
sqrt((freq(r2, value=NA) - freq(r, value=NA))$count)
[1] 16
which might be a good value for w=. Introducing another variant:
r3 <- r
r3[40:47, 40:47] <- NA
r3[60:67, 60:67] <- NA
r3[30:37, 30:37] <- NA
r3[70:77, 40:47] <- NA
rm(r)
We no longer have our ground truth. How might we estimate an edge length for w=? Turning to boundaries() with its default values (inner):
r2_bi <- boundaries(r2)
r3_bi <- boundaries(r3)
# examining some properties of r2_bi, r3_bi
freq(r2_bi, value=1)$count
[1] 503
freq(r3_bi, value=1)$count
[1] 579
freq(r2_bi, value=1)$count/freq(r2_bi, value = 0)$count
[1] 0.1306833
freq(r3_bi, value=1)$count/freq(r3_bi, value = 0)$count
[1] 0.1534588
sum(freq(r2_bi, value=1)$count,freq(r2_bi, value = 0)$count)
[1] 4352
sum(freq(r3_bi, value=1)$count,freq(r3_bi, value = 0)$count)
[1] 4352
Taken in reverse order, the sum[s] and freq[s] suggest that while the total area of the NA regions (let's call them holes) is the same, the two rasters differ in the number of holes, and r2's hole is generally larger than r3's. This is also clear from the first pair of freq[s].
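A quick arithmetic check of that claim (my addition):
16 * 16     # r2: one 16 x 16 hole   -> 256 new NA cells
4 * 8 * 8   # r3: four 8 x 8 holes   -> 256 new NA cells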
Now we drift into some voodoo, hocus pocus in pursuit of a better edge estimate
sum(freq(r2)$count) - sum(freq(r2, value = NA)$count)
[1] 154
sum(freq(r3)$count) - sum(freq(r3, value = NA)$count)
[1] 154
sqrt(sum(freq(r3)$count) - sum(freq(r3, value = NA)$count))
[1] 12.40967
freq(r2_bi, value=1)$count/freq(r2_bi, value = 0)$count
[1] 0.1306833
freq(r2_bi, value=0)$count/freq(r2_bi, value = 1)$count
[1] 7.652087
freq(r3_bi, value=1)$count/freq(r3_bi, value = 0)$count
[1] 0.1534588
Taking the larger of the two, i.e. freq(r2_bi, value=0)$count/freq(r2_bi, value=1)$count = 7.652087:
7.652087/0.1306833
[1] 58.55444
154+58
[1] 212
sqrt(212)
[1] 14.56022
round(sqrt(212)+1)
[1] 16
Well, except for that +1 part, this may still be a decent estimate for w=, to be used on both r2 and r3 if we are called upon to find a better w, and it perhaps obviates the need for while().
Another approach to looking for squares and their edges:
wtf3 <- values(r3_bi$elevation)
wtf2 <- values(r2_bi$elevation)
wtf2_tbl_df2 <- as.data.frame(table(rle(as.vector(is.na(wtf2)))$lengths))
wtf3_tbl_df2 <- as.data.frame(table(rle(as.vector(is.na(wtf3)))$lengths))
names(wtf2_tbl_df2)
[1] "Var1" "Freq"
wtf2_tbl_df2[which(wtf2_tbl_df2$Var1 == wtf2_tbl_df2$Freq), ]
Var1 Freq
14 16 16
wtf3_tbl_df2[which(wtf3_tbl_df2$Freq == max(wtf3_tbl_df2$Freq)), ]
Var1 Freq
7 8 35
35/8
[1] 4.375 # roughly 4 squares of width 8, plus 3 extra length-8 runs
Finally, bringing in v and filling:
v <- vect(system.file("ex/lux.shp", package="terra"))
r2_fill_17 <- focal(r2, 16 + 1 , mean, na.policy='only', na.rm = TRUE)
r3_fill_9 <- focal(r3, 8 + 1 , mean, na.policy='only', na.rm = TRUE)
r2_fill_17_cropv <- crop(r2_fill_17, v, mask = TRUE, touches = TRUE)
r3_fill_9_cropv <- crop(r3_fill_9, v, mask = TRUE, touches = TRUE)
And I now appreciate your while() approach, as your r2 looks better, more naturally transitioned, though the r3 looks fine. In my few brief experiments with windows smaller than the 'hole', i.e. focal(r2, 9), I got the sense it would take 2 passes to fill, which suggests focal(r2, 5) would take 4.
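A small sketch to check that pass-count intuition (my addition, not from the original answer): count how many focal() passes a given window needs before the artificial hole in r2 (rows/columns 45:60) contains no more NA cells.
passes_to_fill <- function(x, w, rows = 45:60, cols = 45:60) {
  n <- 0
  while (any(is.na(as.matrix(x, wide = TRUE)[rows, cols]))) {
    x <- focal(x, w, mean, na.policy = "only", na.rm = TRUE)
    n <- n + 1
  }
  n
}
passes_to_fill(r2, 17)  # expected: 1 pass (radius 8 reaches the hole centre)
passes_to_fill(r2, 9)   # expected: 2 passes
passes_to_fill(r2, 5)   # expected: 4 passes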
I guess it would also be worthwhile to work out the fill:hole:raster proportions at which deploying a while() loop becomes the better option.
I am new-ish to using the stars package in R, and I am having trouble figuring out how to create a stars object C that has the values from object A but the extent of object B. Specifically, I have a map of average spring temperatures in Europe (Object A) and I want to crop it using a separate stars object containing the presence of deciduous broadleaf forests (Object B).
Object A: https://i.stack.imgur.com/DQsZn.jpg
> CRU.SpringT.2009.2018_EU
stars object with 2 dimensions and 1 attribute
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
CRU.SpringT.2009.2018_EU.tif -15.81895 1.201286 5.480992 4.979221 8.204463 17.55605 310479
dimension(s):
from to offset delta refsys point values x/y
x 1 1440 -10 0.0416667 WGS 84 FALSE NULL [x]
y 1 672 65 -0.0416667 WGS 84 FALSE NULL [y]
Object B: https://i.stack.imgur.com/6dons.jpg
> Dec.BL_EU3
stars_proxy object with 1 attribute in 1 file(s):
$Consensus_reduced_class_3.tif
[1] "[...]/Consensus_reduced_class_3.tif"
dimension(s):
from to offset delta refsys point values x/y
x 20401 27600 -180 0.00833333 +proj=longlat +datum=WGS8... FALSE NULL [x]
y 3001 6360 90 -0.00833333 +proj=longlat +datum=WGS8... FALSE NULL [y]
call_list:
[[1]]
x[i = i, drop = drop, crop = crop]
attr(,".Environment")
<environment: 0x000002a17a1ecac0>
[[2]]
x[i = i, drop = drop, crop = crop]
attr(,".Environment")
<environment: 0x000002a178d61dc0>
[[3]]
e1/e2
attr(,".Environment")
<environment: 0x000002a177987b10>
Both objects are cropped using the same bbox. The goal is for the resulting object (Object C) to have the extent of B but with the temperature values from A.
Dropbox links to tif files:
Object A: https://www.dropbox.com/s/lwvdxnis7k38e18/CRU.SpringT.2009.2018_EU.tif?dl=0
Object B: https://www.dropbox.com/s/uybxk40z853mu7a/EU%20Dec%20Broadleaf.tif?dl=0
That's it! It's much easier with the files :-) Please find below one possible solution to your problem. It works on my computer. I hope it will be the same for you :-)
I have deliberately detailed the code to make it easier for you to understand it. Of course, you can modify the form to make it more compact if needed.
PSEUDO Reprex !!
library(sf)
library(stars)
library(ggplot2)
A <- read_stars("CRU.SpringT.2009.2018_EU.tif")
B <- read_stars("consensus_full_class_3.tif")
# First step: transform B to have the same extent and resolution as A
B_cropped_resample <- st_warp(B, A, use_gdal = TRUE)
# Second step: convert 0 to NA values in your object B (i.e. deciduous forests)
B_cropped_resample[B_cropped_resample == 0] <- NA # B should contain only NA and 1 values
# Third step: convert B into `sf` object with 'points' geometry
B_cropped_resample_sf <- st_as_sf(B_cropped_resample, as_points = TRUE, na.rm = TRUE)
# Fourth step: extract the values of object A at the points of `B_cropped_resample_sf`
C <- st_extract(A, B_cropped_resample_sf) # C is a sf object which contains the
# values of A at the corresponding points
# Fifth step: convert the 'sf' object back into a 'stars' object and split the 'sf' point
# geometry into two 'stars' dimensions
C <- st_as_stars(C, name = attr(C, "CRU.SpringT.2009.2018_EU.tif", "geometry"))
C <- st_sfc2xy(C)
# Sixth step: plot the resulting 'stars' object, which shows the temperatures (i.e.
# values from object A) for each deciduous forest (i.e. locations from object B)
ggplot2::ggplot() + geom_stars(data = C) +
coord_equal()
Please find the image obtained after having run the code:
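As a possibly more compact alternative (a sketch of my own, not the answer above; it assumes the warped forest layer codes presence as 1, as in the steps above), one could skip the sf round trip and mask A directly with the warped B, which should give essentially the same result on A's grid:
keep <- !is.na(B_cropped_resample[[1]]) & B_cropped_resample[[1]] == 1  # forest cells only
C2 <- A
C2[[1]][!keep] <- NA    # drop temperatures outside deciduous forest
ggplot() + geom_stars(data = C2) + coord_equal()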
I am trying to use the distHaversine function in R, inside a loop, to calculate the distance between latitude and longitude coordinates for several hundred rows. In my loop I have this code:
if ((distHaversine(c(file[i,"long"], file[i,"lat"]),
c(file[j,"long"], file[j,"lat"]))) < 50 )
If the distance is less than 50 meters, I want it to record those rows. The latitude and longitude coordinates it references look like:
0.492399367 30.42530045
and
0.496899361 30.42497045
but I get this error:
Error in .pointsToMatrix(p1) : latitude > 90
Can anyone explain why, and how to solve it?
The error tells you that you have latitude values greater than 90, which is outside the valid range:
library(geosphere)
distHaversine(c(4,52), c(13,52))
# [1] 616422
distHaversine(c(4,52), c(1,91))
# Error in .pointsToMatrix(p2) : latitude > 90
You can solve this issue by only feeding distHaversine with coordinates inside the accepted ranges.
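A minimal pre-check along those lines (a sketch; `file`, `long`, and `lat` are the names used in the question):
bad <- which(file$lat < -90 | file$lat > 90 | file$long < -180 | file$long > 180)
file[bad, ]                                # inspect the offending rows
if (length(bad) > 0) file <- file[-bad, ]  # drop them before computing distances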
I am trying to use the distHaversine function in R, inside a loop to
calculate the distance between some latitude and longitude coordinates
for several hundred rows. (...) if the distance is less than 50
meters I want it to record those rows
Have a look at the distm function, which calculates a distance matrix for your few hundred rows easily (i.e. without loops). It uses distHaversine by default. For example, to get the data frame rows that are closer than 650000 meters:
df <- read.table(sep=",", col.names=c("lon", "lat"), text="
4,52
13,52
116,39")
(d <- distm(df))
# [,1] [,2] [,3]
# [1,] 0 616422 7963562
# [2,] 616422 0 7475370
# [3,] 7963562 7475370 0
d[upper.tri(d, T)] <- NA
( idx <- which(d < 650000, arr.ind = T) )
# row col
# [1,] 2 1
cbind(df[idx[, 1], ], df[idx[, 2], ])
# lon lat lon lat
# 2 13 52 4 52
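Tying this back to the question's 50 m threshold, the same pattern applies (a sketch; it assumes a data frame `file` with `long` and `lat` columns as in the question, and note that distm expects longitude first):
d <- distm(file[, c("long", "lat")])
d[upper.tri(d, diag = TRUE)] <- NA          # keep each pair only once
close <- which(d < 50, arr.ind = TRUE)      # row/column indices of pairs within 50 m
cbind(file[close[, 1], ], file[close[, 2], ])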
When extracting values of a raster at points, I find that I get several NAs. Rather than using the buffer and fun arguments of the extract function, I'd like to extract the value of the nearest non-NA pixel for any point that overlaps an NA cell.
I am using the basic extract function:
data.extr<-extract(loc.thr, data[,11:10])
Here's a solution without using the buffer. However, it calculates a distance map separately for each point in your dataset, so it might be inefficient if your dataset is large.
set.seed(2)
# create a 10x10 raster
r <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10)
r[] <- 1:10
r[sample(1:ncell(r), size = 25)] <- NA
# plot the raster
plot(r, axes=F, box=F)
segments(x0 = 0, y0 = 0:10, x1 = 10, y1 = 0:10, lty=2)
segments(y0 = 0, x0 = 0:10, y1 = 10, x1 = 0:10, lty=2)
# create sample points and add them to the plot
xy = data.frame(x=runif(10,1,10), y=runif(10,1,10))
points(xy, pch=3)
text(x = xy$x, y = xy$y, labels = as.character(1:nrow(xy)), pos=4, cex=0.7, xpd=NA)
# use normal extract function to show that NAs are extracted for some points
extracted = extract(x = r, y = xy)
# then take the raster value with lowest distance to point AND non-NA value in the raster
sampled = apply(X = xy, MARGIN = 1, FUN = function(xy) r@data@values[which.min(replace(distanceFromPoints(r, xy), is.na(r), NA))])
# show output of both procedures
print(data.frame(xy, extracted, sampled))
# x y extracted sampled
#1 5.398959 6.644767 6 6
#2 2.343222 8.599861 NA 3
#3 4.213563 3.563835 5 5
#4 9.663796 7.005031 10 10
#5 2.191348 2.354228 NA 2
#6 1.093731 9.835551 2 2
#7 2.481780 3.673097 3 3
#8 8.291729 2.035757 9 9
#9 8.819749 2.468808 9 9
#10 5.628536 9.496376 6 6
This is a raster-based solution that first fills the NA pixels with the nearest non-NA pixel value.
Note, however, that this does not take into account the position of a point within a pixel. Instead, it calculates the distances between pixel centers to determine the nearest non-NA pixel.
First, it calculates, for each NA raster pixel, the distance and direction to the nearest non-NA pixel. The next step is to calculate the coordinates of this non-NA cell (assuming a projected CRS), extract its value, and store this value at the NA location.
Starting data: a projected raster, with identical values as in the answer from koekenbakker:
set.seed(2)
# set projected CRS
r <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10, crs='+proj=utm +zone=1')
r[] <- 1:10
r[sample(1:ncell(r), size = 25)] <- NA
# create sample points
xy = data.frame(x=runif(10,1,10), y=runif(10,1,10))
# use normal extract function to show that NAs are extracted for some points
extracted <- raster::extract(x = r, y = xy)
Calculate the distance and direction from all NA pixels to the nearest non-NA pixel:
dist <- distance(r)
# you can also set a maximum distance: dist[dist > maxdist] <- NA
direct <- direction(r, from=FALSE)
Retrieve coordinates of NA pixels
# NA raster
rna <- is.na(r) # returns NA raster
# store coordinates in new raster: https://stackoverflow.com/a/35592230/3752258
na.x <- init(rna, 'x')
na.y <- init(rna, 'y')
# calculate coordinates of the nearest Non-NA pixel
# assume that we have an orthogonal, projected CRS, so we can use (Pythagorean) calculations
co.x <- na.x + dist * sin(direct)
co.y <- na.y + dist * cos(direct)
# matrix with point coordinates of nearest non-NA pixel
co <- cbind(co.x[], co.y[])
Extract values of nearest non-NA cell with coordinates 'co'
# extract values of nearest non-NA cell with coordinates co
NAVals <- raster::extract(r, co, method='simple')
r.NAVals <- rna # initiate new raster
r.NAVals[] <- NAVals # store values in raster
Fill the original raster with the new values
# cover nearest non-NA value at NA locations of original raster
r.filled <- cover(x=r, y= r.NAVals)
sampled <- raster::extract(x = r.filled, y = xy)
# compare old and new values
print(data.frame(xy, extracted, sampled))
# x y extracted sampled
# 1 5.398959 6.644767 6 6
# 2 2.343222 8.599861 NA 3
# 3 4.213563 3.563835 5 5
# 4 9.663796 7.005031 10 10
# 5 2.191348 2.354228 NA 3
# 6 1.093731 9.835551 2 2
# 7 2.481780 3.673097 3 3
# 8 8.291729 2.035757 9 9
# 9 8.819749 2.468808 9 9
# 10 5.628536 9.496376 6 6
Note that point 5 takes a different value than in koekenbakker's answer, since this method does not take the position of the point within a pixel into account (as mentioned above). If this matters, this solution might not be appropriate. In other cases, e.g. if the raster cells are small compared to the point accuracy, this raster-based method should give good results.
For a raster stack, use @koekenbakker's solution above and turn it into a function. A raster stack's @layers slot is a list of rasters, so lapply across it and go from there.
#new layer
r2 <- raster(ncol=10,nrow=10, xmn=0, xmx=10, ymn=0,ymx=10)
r2[] <- 1:10
r2[sample(1:ncell(r2), size = 25)] <- NA
#make the stack
r_stack <- stack(r, r2)
#a function for sampling
sample_raster_NA <- function(r, xy){
  apply(X = xy, MARGIN = 1,
        FUN = function(xy) r@data@values[which.min(replace(distanceFromPoints(r, xy), is.na(r), NA))])
}
#lapply to get answers
lapply(r_stack@layers, function(a_layer) sample_raster_NA(a_layer, xy))
Or to be fancy (speed improvements?)
purrr::map(r_stack@layers, sample_raster_NA, xy=xy)
Which makes me wonder if the whole thing can be sped up even more using dplyr...
I need to calculate the centroids of a set of spatial zones based on a separate population grid dataset. Grateful for a steer on how to achieve this for the example below.
Thanks in advance.
require(raster)
require(spdep)
require(maptools)
dat <- raster(volcano) # simulated population data
polys <- readShapePoly(system.file("etc/shapes/columbus.shp",package="spdep")[1])
# set consistent coordinate ref. systems and bounding boxes
proj4string(dat) <- proj4string(polys) <- CRS("+proj=longlat +datum=NAD27")
extent(dat) <- extent(polys)
# illustration plot
plot(dat, asp = TRUE)
plot(polys, add = TRUE)
Three steps:
First, find all the cells in each polygon, returning a list of two-column matrices with the cell number and the value:
require(plyr) # for llply, laply in a bit...
cell_value = extract(dat, polys,cellnumbers=TRUE)
head(cell_value[[1]])
cell value
[1,] 31 108
[2,] 32 108
[3,] 33 110
[4,] 92 110
[5,] 93 110
[6,] 94 111
Second, turn this into a list of similar matrices, but add the x and y coordinates:
cell_value_xy = llply(cell_value, function(x)cbind(x,xyFromCell(dat,x[,"cell"])))
head(cell_value_xy[[1]])
cell value x y
[1,] 31 108 8.581164 14.71973
[2,] 32 108 8.669893 14.71973
[3,] 33 110 8.758623 14.71973
[4,] 92 110 8.581164 14.67428
[5,] 93 110 8.669893 14.67428
[6,] 94 111 8.758623 14.67428
Third, compute the weighted mean coordinate. This neglects any edge effects and assumes all grid cells are the same size:
centr = laply(cell_value_xy, function(m){c(weighted.mean(m[,3],m[,2]), weighted.mean(m[,4],m[,2]))})
head(centr)
1 2
[1,] 8.816277 14.35309
[2,] 8.327463 14.02354
[3,] 8.993655 13.82518
[4,] 8.467312 13.71929
[5,] 9.011808 13.28719
[6,] 9.745000 13.47444
Now centr is a 2-column matrix. In your example it's very close to coordinates(polys), so I'd make a contrived example with some extreme weights to make sure it's working as expected.
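One such contrived check might look like this (a sketch, my addition): cube the population values so the weights are exaggerated, and confirm that the weighted centroids move away from the plain polygon centroids.
dat_w <- dat^3                                       # exaggerate the weights
cv <- extract(dat_w, polys, cellnumbers = TRUE)
cvxy <- llply(cv, function(x) cbind(x, xyFromCell(dat_w, x[, "cell"])))
centr_w <- laply(cvxy, function(m) c(weighted.mean(m[, 3], m[, 2]),
                                     weighted.mean(m[, 4], m[, 2])))
plot(coordinates(polys), pch = 1)                    # plain polygon centroids
points(centr_w, col = "red", pch = 3)                # exaggerated weighted centroids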
Another alternative.
I like it for its compactness, but it will likely only make sense if you're fairly familiar with the full family of raster functions:
## Convert polygons to a raster layer
z <- rasterize(polys, dat)
## Compute weighted x and y coordinates within each rasterized region
xx <- zonal(init(dat, v="x")*dat, z) / zonal(dat,z)
yy <- zonal(init(dat, v="y")*dat, z) / zonal(dat,z)
## Combine results in a matrix
res <- cbind(xx[,2],yy[,2])
head(res)
# [,1] [,2]
# [1,] 8.816277 14.35309
# [2,] 8.327463 14.02354
# [3,] 8.993655 13.82518
# [4,] 8.467312 13.71929
# [5,] 9.011808 13.28719
# [6,] 9.745000 13.47444
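A quick cross-check (a sketch, assuming both this snippet and the step-by-step answer above were run in the same session): the compact zonal() version should reproduce the earlier result.
max(abs(centr - res))   # should be (close to) zero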
The answers by Spacedman and Josh are really great, but I'd like to share two other alternatives which are relatively fast and simple.
library(data.table)
library(spatialEco)
library(raster)
library(rgdal)
using a data.table approach:
# get centroids of raster data
data_points <- rasterToPoints(dat, spatial=TRUE)
# intersect with polygons
grid_centroids <- point.in.poly(data_points, polys)
# calculate weighted centroids
grid_centroids_df <- as.data.frame(grid_centroids)   # keep the sp object grid_centroids for the spatialEco approach below
w.centroids <- setDT(grid_centroids_df)[, lapply(.SD, weighted.mean, w = layer), by = POLYID, .SDcols = c('x', 'y')]
using wt.centroid{spatialEco} :
# get a list of the ids from each polygon
poly_ids <- unique(grid_centroids@data$POLYID)
# use lapply to calculate the weighted centroid of each individual polygon
w.centroids.list <- lapply(poly_ids, function(i) {
  wt.centroid(subset(grid_centroids, grid_centroids@data$POLYID == i), 'layer', sp = TRUE)
})
My own, less elegant, solution is below. It gives exactly the same results as Spacedman's and Josh's.
# raster to pixels
library(dplyr)   # needed below for the pipe, mutate, filter, etc.
p = rasterToPoints(dat) %>% as.data.frame()
coordinates(p) = ~ x + y
crs(p) = crs(polys)
# overlay pixels on polygons
ol = over(p, polys) %>% mutate(pop = p$layer) %>% cbind(coordinates(p)) %>%
filter(COLUMBUS_ %in% polys$COLUMBUS_) %>% # i.e. a unique identifier
dplyr::select(x, y, pop, COLUMBUS_) %>% as_data_frame()
# weighted means of x/y values, by pop
pwcs = split(ol, ol$COLUMBUS_) %>% lapply(function(g){
data.frame(x = weighted.mean(g$x, g$pop), y = weighted.mean(g$y, g$pop))
}) %>% bind_rows() %>% as_data_frame()