Get summary vectors of raster cell centers in R - r

I want to extract summary vectors that contain the coordinates for the centers of the different cells in a raster. The following code works but I believe involves an n-squared comparison operation. Is there a more efficient method? Not seeing anything obvious in {raster}'s guidance.
require(raster)
r = raster(volcano)
pts = rasterToPoints(r)
x_centroids = unique(pts[,1])
y_centroids = unique(pts[,2])

To get the centers of the raster cells, you should use the functions xFromCol, yFromRow and friends (see also the help pages)
In this case, you get exactly the same result as follows:
require(raster)
r <- raster(volcano)
x_centers <- xFromCol(r)
y_centers <- yFromRow(r)
Note that these functions actually don't do much else but check the minimum value of the coordinates and the resolution of the raster. From these two values, they calculate the sequence of centers as follows:
xmin(r) + (seq_len(ncol(r)) - 0.5) * xres(r)
ymin(r) + (seq_len(nrow(r)) - 0.5) * xres(r)
But you better use the functions mentioned above, as these do a bit more safety checks.

Related

Normalizing an R stars object by grid area?

first post :)
I've been transitioning my R code from sp() to sf()/stars(), and one thing I'm still trying to grasp is accounting for the area in my grids.
Here's an example code to explain what I mean.
library(stars)
library(tidyverse)
# Reading in an example tif file, from stars() vignette
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)
x
# Get areas for each grid of the x object. Returns stars object with "area" in units of [m^2]
x_area <- st_area(x)
x_area
I tried loosely adopting code from this vignette (https://github.com/r-spatial/stars/blob/master/vignettes/stars5.Rmd) to divide each value in x by it's grid area, and it's not working as expected (perhaps because my objects are stars and not sf?)
x$test1 = x$L7_ETMs.tif / x_area # Some computationally intensive calculation seems to happen, but doesn't produce the results I expect?
x$test1 = x$L7_ETMs.tif / x_area$area # Throws error, "non-conformable arrays"
What does seem to work is the following.
x %>%
mutate(test1 = L7_ETMs.tif / units::set_units(as.numeric(x_area$area), m^2))
Here are the concerns I have with this code.
I worry that as I turn the x_area$area (a matrix, areas in lat/lon) into a numeric vector, I may mess up the lat/lon matching between the grid and it's area. I did some rough testing to see if the areas match up the way I expect them to, but can't escape the worry that this could lead to errors that are difficult to catch.
It just doesn't seem clean that I start with "x_area" in the correct units, only to remove then set the units again during the computation.
Can someone suggest a "cleaner" implementation for what I'm trying to do, i.e. multiplying or dividing grids by its area while maintaining units throughout? Or convince me that the code I have is fine?
Thanks!
I do not know how to improve the stars code, but you can compare the results you get with this
tif <- system.file("tif/L7_ETMs.tif", package = "stars")
library(terra)
r <- rast(tif)
a <- cellSize(r, sum=FALSE)
x <- r / a
With planar data you could do this when it is safe to assume there is no distortion (generally not the case, but it can be the case)
y <- r / prod(res(r))

How to extract specific values with point coordinates from Kriging interpolations made in R?

By using R version 3.4.2 and the library "geoR", I made kriging interpolations for different variables (bellow I give an example of my process). I also made a matrix with the coordinates for 305 trees with distinct marks (species, DBH, Height) that are within the same space for the interpolations, as seen in the image attached (https://imgur.com/SLQBnZH). I've been looking for ways to extract the nearest value from each variable for each tree and save the corresponding values in a data.frame or matrix, but haven't been successful, and I can't find specific answers to this.
One thing I've been looking at is trying to convert the Kriging result into a Raster (.tif) and proceed from there. But Kriging interpolations are made out of vector data, so is it even posible?
I'd be glad to receive any sort of help, thank you in advance!
P.S. I'm doing this so that I can latter use the data for spatial point patern analysis.
#Kriging####:
PG<-read.csv("PGF.csv", header=T, stringsAsFactors=FALSE)
library("geoR")
x<-(PG$x)
y<-(PG$y)
#Grid
loci<-expand.grid(x=seq(-5, 65, length=100), y=seq(-5, 85, length=100))
names(loci)<-c("x", "y")
mix<-cbind(rep(1,10000), loci$x, loci$y, loci$x*loci$y)
#Model
pH1.mod<-lm(pH1~y*x, data=PG, x=T)
pH1.kg<-cbind(pH1.mod$x[,3], pH1.mod$x[,2], pH1.mod$residuals)
#Transform to geographic data
pH1.geo<-as.geodata(pH1.kg)
#Variogram
pH1.vario<-variog(pH1.geo, max.dist=35)
pH1.vario.mod<-eyefit(pH1.vario)
#Cross validation
pH1.valcruz<-xvalid(pH1.geo, model=pH1.vario.mod)
#Kriging
pH1.krig<-krige.conv(pH1.geo, loc=loci, krige=krige.control(obj.model=pH1.vario.mod[[1]]))
#Predictive model
pH1a.yhat<-mix %*% pH1.mod$coefficients + pH1.krig$predict
#Exchange Kriging prediction values
pH1.krig$predict<-pH1.yhat
#Image
image(pH1.krig2)
contour(pH1.krig2, add=TRUE)
#Tree matrix####:
CoA<-read.csv("CoAr.csv", header=T)
#Data
xa<-(CoA$X)
ya<-(CoA$Y)
points(xa,ya, col=4)
TreeDF<-(cbind.data.frame(xa, ya, CoA$Species, CoA$DBH, CoA$Height, stringsAsFactors = TRUE))
m<-(cbind(xa, ya, 1:305))
as.matrix(m)
I tried to find the value of a point in space (trees [1:305]) through the minimum distance to a predicted value using the following code, (I suggest not running this since it takes too long):
for(i in 1:2){print(c(2:10000)[as.matrix(dist(rbind(m[i,], as.matrix(pH1.krig2$predict))))[i,2:10000]==min(as.matrix(dist(rbind(m[i,],as.matrix(pH1.krig2$predict))))[i,2:10000])])}
In the following link aldo_tapia's answer was the approach needed for this problem. Thank you to everyone! https://gis.stackexchange.com/questions/284698/how-to-extract-specific-values-with-point-coordinates-from-kriging-interpolation
The process is as follows:
Use extract() function from raster package:
library(raster)
r <- SpatialPointsDataFrame(loci, data.frame(predict = pH1.krig$predict))
gridded(r) <- T
r <- as(r,'RasterLayer')
pts <- SpatialPointsDataFrame(CoA[,c('X','Y')],CoA)
pH1.arb <-extract(r, pts)
to this I just added the values through cbind to the tree data frame since they are in order.
COA2<-cbind(CoA, pH1val=pH1.arb)
I will repeat the process for each variable.

R: How do I loop through spatial points with a specific buffer?

So my problem is quite difficult to describe so I hope I can make my question as clear as possible.
I use the rLiDAR package to load a .las file into R and afterwards convert it into a SpatialPointsDataFrame using the sp package.
So my SpatialPointsDataFrame is quite dense.
Now I want to define a buffer of 0.5 meters and loop (iterate) with him (the buffer) through the points, choosing always the point with the highest Z value within the buffer, as the next point to jump to.This should be repeated until there isn't any point within the buffer with an higher Z value as the current. All values (or perhaps the X and Y values) of this "found" point should then be written into a list/dataframe and the process should be repeated until all such highest points are found.
Thats the code I got so far:
>library(rLiDAR)
>library(sp)
>rLAS<-readLAS("Test.las",short=FALSE)
>PointCloud<- data.frame(rLAS)
>coordinates(PointCloud) <- c("X", "Y")
Well I googled extensively but I could not find any clues how to proceed further...
I dont even know which packages could be of help, I guess perhaps spatstat as my question would probably go into the spatial point pattern analysis.
Does anyone have some ideas how to archive something like that in R? Or is something like that not possible? (Do I perhaps have to skip to python to make something like this work?)
Help would gladly be appreciated.
If you want to get the set of points which are the local maxima within a 0.5m radius circle around each point, this should work. The gist of it is:
Convert the LAS points to a SpatialPointsDataFrame
Create a buffered polygon set with overlapping polygons
Loop through all buffered polygons and find the desired element within the buffer -- in your case, it's the one with the maximum height.
Code below:
library(rLiDAR)
library(sp)
library(rgeos)
rLAS <- readLAS("Test.las",short=FALSE)
PointCloud <- data.frame(rLAS)
coordinates(PointCloud) <- c("X", "Y")
Finish creating the SpatialPointsDataFrame from the LAS source. I'm assuming the field with the point height is PointCloud$value
pointCloudSpdf <- SpatialPointsDataFrame(data=PointCloud,xy)
Use rgeos library for intersection. It's important to have byid=TRUE or the polygons will get merged where they intersect
bufferedPoints <- gBuffer(pointCloudSpdf,width=0.5,byid=TRUE)
# Save our local maxima state (this will be updated)
localMaxes <- rep(FALSE,nrow(PointCloud))
i=0
for (buff in 1:nrow(bufferedPoint#data)){
i <- i+1
bufPolygons <- bufferedPoints#polygons[[i]]
bufSpPolygons <- SpatialPolygons(list(bufPolygons))
bufSpPolygonDf <-patialPolygonsDataFrame(bufSpPolygons,bufferedPoints#data[i,])
ptsInBuffer <- which(!is.na(over(pointCloudSpdf,spPolygonDf)))
# I'm assuming `value` is the field name containing the point height
localMax <- order(pointCloudSpdf#data$value[ptsInBuffer],decreasing=TRUE)[1]
localMaxes[localMax] <- TRUE
}
localMaxPointCloudDf <- pointCloudSpdf#data[localMaxes,]
Now localMaxPointCloudDf should contain the data from the original points if they are a local maximum. Just a warning -- this isn't going to be super fast if you have a lot of points. If that ends up being a concern you may be smarter about pre-filtering your points using a smaller grid and extract from the raster package.
That would look something like this:
Make the cell size small enough so that each 0.5m buffer will intersect at least 4 raster cells -- err on smaller since we are comparing circles to squares.
library(raster)
numRows <- extent(pointCloudSpdf)#ymax-extent(pointCloudSpdf)#ymin/0.2
numCols <- extent(pointCloudSpdf)#xmax-extent(pointCloudSpdf)#xmin/0.2
emptyRaster <- raster(nrow=numRows,ncol=numCols)
rasterize will create a grid with the maximum value of the given field within a cell. Because of the square/circle mismatch this is only a starting point to filter out obvious non-maxima. After this we will have a raster in which all the local maxima are represented by cells. However, we won't know which cells are maxima in the 0.5m radius and we don't know which point in the original feature layer they came from.
r <- rasterize(pointCloudSpdf,emptyRaster,"value",fun="max")
extract will give us raster values (i.e., the highest value for each cell) that each point intersects. Recall from above that all the local maxima will be in this set, although some values will not be 0.5m radius local maxima.
rasterMaxes <- extract(r,pointCloudSpdf)
To match up the original points with the raster maxes, just subtract the raster value at each point from that point's value. If the value is 0, then the values are the same and we have a point with a potential maximum. Note that at this point we are only merging the points back to the raster -- we will have to throw some of these out because they are "under" a 0.5m radius with a higher local max even though they are the max in their 0.2m x 0.2m cell.
potentialMaxima <- which(pointCloudSpdf#data$value-rasterMaxes==0)
Next, just subset the original SpatialPointsDataFrame and we'll do the more exhaustive and accurate iteration over this subset of points since we should have thrown out a bunch of points which could not have been maxima.
potentialMaximaCoords <- coordinates(pointCloudSpdf#coords[potentialMaxima,])
# using the data.frame() constructor because my example has only one column
potentialMaximaDf <- data.frame(pointCloudSpdf#data[potentialMaxima,])
potentialMaximaSpdf <-SpatialPointsDataFrame(potentialMaximaCoords,potentialMaximaDf)
The rest of the algorithm is the same but we are buffering the smaller dataset and iterating over it:
bufferedPoints <- gBuffer(potentialMaximaSpdf, width=0.5, byid=TRUE)
# Save our local maxima state (this will be updated)
localMaxes <- rep(FALSE, nrow(PointCloud))
i=0
for (buff in 1:nrow(bufferedPoint#data)){
i <- i+1
bufPolygons <- bufferedPoints#polygons[[i]]
bufSpPolygons <- SpatialPolygons(list(bufPolygons))
bufSpPolygonDf <-patialPolygonsDataFrame(bufSpPolygons,bufferedPoints#data[i,])
ptsInBuffer <- which(!is.na(over(pointCloudSpdf, spPolygonDf)))
localMax <- order(pointCloudSpdf#data$value[ptsInBuffer], decreasing=TRUE)[1]
localMaxes[localMax] <- TRUE
}
localMaxPointCloudDf <- pointCloudSpdf#data[localMaxes,]

spatial filtering by proximity in R

I have occurrence points for a species, and I'd like to remove potential sampling bias (where some regions might have much greater density of points than others). One way to do this would be to maximize a subset of points that are no less than a certain distance X of each other. Essentially, I would prevent points from being too close to each other.
Are there any existing R functions to do this? I've searched through various spatial packages, but haven't found anything, and can't figure out exactly how to implement this myself.
An example occurrence point dataset can be downloaded here.
Thanks!
I've written a new version of this function that no longer really follows rMaternII.
The input can either be a SpatialPoints, SpatialPointsDataFrame or matrix object.
Seems to work well, but suggestions welcome!
filterByProximity <- function(xy, dist, mapUnits = F) {
#xy can be either a SpatialPoints or SPDF object, or a matrix
#dist is in km if mapUnits=F, in mapUnits otherwise
if (!mapUnits) {
d <- spDists(xy,longlat=T)
}
if (mapUnits) {
d <- spDists(xy,longlat=F)
}
diag(d) <- NA
close <- (d <= dist)
diag(close) <- NA
closePts <- which(close,arr.ind=T)
discard <- matrix(nrow=2,ncol=2)
if (nrow(closePts) > 0) {
while (nrow(closePts) > 0) {
if ((!paste(closePts[1,1],closePts[1,2],sep='_') %in% paste(discard[,1],discard[,2],sep='_')) & (!paste(closePts[1,2],closePts[1,1],sep='_') %in% paste(discard[,1],discard[,2],sep='_'))) {
discard <- rbind(discard, closePts[1,])
closePts <- closePts[-union(which(closePts[,1] == closePts[1,1]), which(closePts[,2] == closePts[1,1])),]
}
}
discard <- discard[complete.cases(discard),]
return(xy[-discard[,1],])
}
if (nrow(closePts) == 0) {
return(xy)
}
}
Let's test it:
require(rgeos)
require(sp)
pts <- readWKT("MULTIPOINT ((3.5 2), (1 1), (2 2), (4.5 3), (4.5 4.5), (5 5), (1 5))")
pts2 <- filterByProximity(pts,dist=2, mapUnits=T)
plot(pts)
axis(1)
axis(2)
apply(as.data.frame(pts),1,function(x) plot(gBuffer(SpatialPoints(coords=matrix(c(x[1],x[2]),nrow=1)),width=2),add=T))
plot(pts2,add=T,col='blue',pch=20,cex=2)
There is also an R package called spThin that performs spatial thinning on point data. It was developed for reducing the effects of sampling bias for species distribution models, and does multiple iterations for optimization. The function is quite easy to implement---the vignette can be found here. There is also a paper in Ecography with details about the technique.
Following Josh O'Brien's advice, I looked at spatstat's rMaternI function, and came up with the following. It seems to work pretty well.
The distance is in map units. It would be nice to incorporate one of R's distance functions that always returns distances in meters, rather than input units, but I couldn't figure that out...
require(spatstat)
require(maptools)
occ <- readShapeSpatial('occurrence_example.shp')
filterByProximity <- function(occ, dist) {
pts <- as.ppp.SpatialPoints(occ)
d <- nndist(pts)
z <- which(d > dist)
return(occ[z,])
}
occ2 <- filterByProximity(occ,dist=0.2)
plot(occ)
plot(occ2,add=T,col='blue',pch=20)
Rather than removing data points, you might consider spatial declustering. This involves giving points in clusters a lower weight than outlying points. The two simplest ways to do this involve a polygonal segmentation, like a Voronoi diagram, or some arbitrary grid. Both methods will weight points in each region according to the area of the region.
For example, if we take the points in your test (1,1),(2,2),(4.5,4.5),(5,5),(1,5) and apply a regular 2-by-2 mesh, where each cell is three units on a side, then the five points fall into three cells. The points ((1,1),(2,2)) falling into the cell [0,3]X[0,3] would each have weights 1/( no. of points in current cell TIMES tot. no. of occupied cells ) = 1 / ( 2 * 3 ). The same thing goes for the points ((4.5,4.5),(5,5)) in the cell (3,6]X(3,6]. The "outlier", (1,5) would have a weight 1 / ( 1 * 3 ). The nice thing about this technique is that it is a quick way to generate a density based weighting scheme.
A polygonal segmentation involves drawing a polygon around each point and using the area of that polygon to calculate the weight. Generally, the polygons completely cover the entire region, and the weights are calculated as the inverse of the area of each polygon. A Voronoi diagram is usually used for this, but polygonal segmentations may be calculated using other techniques, or may be specified by hand.

How to generate bivariate data of different shapes (e.g., square, circle, rectangle) with outliers?

I am currently looking for some tool that would generate datasets of different shapes like square, circle, rectangle, etc. with outliers for cluster analysis.
Can any one of you recommend a good dataset generator for cluster analysis?
Is there anyway to generates such datasets in languages like R?
You should probably look into the mlbench package, especially synthetic dataset generating from mlbench.* functions, see some examples below.
Other datasets or utility functions are probably best found on the Cluster Task View on CRAN. As #Roman said, adding outliers is not really difficult, especially when you work in only two dimensions.
I would create a shape and extract bounding coordinates. You can populate the shape with random points using splancs package.
Here's a small snippet from one of my programs:
# First we create a circle, into which uniform random points will be generated (kudos to Barry Rowlingson, r-sig-geo).
circle <- function(x = x, y = y, r = radius, n = n.faces){
t <- seq(from = 0, to = 2 * pi, length = n + 1)[-1]
t <- cbind(x = x + r * sin(t), y = y+ r * cos(t))
t <- rbind(t, t[1,])
return(t)
}
csr(circle(0, 0, 100, 30), 1000)
Feel free to add outliers. One way of going about this is sampling different shapes and joining them in different ways.
There is a flexible data generator in ELKI that can generate various distributions in arbitrary dimensionality. It also can generate Gamma distributed variables, for example.
There is documentation on the Wiki: http://elki.dbs.ifi.lmu.de/wiki/DataSetGenerator

Resources