I have data with associated longitudes and latitudes. How do I get a variogram for this data based on the great-circle distances between the points?
This simple example has all the data on the equator:
require(geoR)
long <- seq(-179, 180)
x <- sin(pi * long / 180) + rnorm(length(long))
V <- variog(data=x, coords=cbind(long, 0))
# variog: computing omnidirectional variogram
plot(V)
The first and last points are actually only 1 degree apart, but my naive attempt results in variog thinking they're separated by 359 degrees.
You should use the semi-variogram from nlme. It allows you to specify a distance matrix, which you can trivially work out for yourself.
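For illustration, here is a minimal sketch of the distance-matrix route done by hand, using sp::spDists for the great-circle distances instead of nlme (the bin width is arbitrary):
# spDists(..., longlat = TRUE) returns great-circle distances in km
library(sp)
D <- spDists(cbind(long, 0), longlat = TRUE)
g <- 0.5 * outer(x, x, function(a, b) (a - b)^2)   # semivariance of each pair
lt <- lower.tri(D)                                  # use each pair once
breaks <- seq(0, max(D), length.out = 21)           # 20 distance bins (arbitrary)
bin <- cut(D[lt], breaks)
gamma.hat <- tapply(g[lt], bin, mean)
h <- 0.5 * (head(breaks, -1) + tail(breaks, -1))    # bin midpoints
plot(h, gamma.hat, xlab = "great-circle distance (km)", ylab = "semivariance")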
From a post on R-sig-geo (a mailing list dedicated to spatial data in R), I seem to remember that there are no ready-to-go functions in R that support great-circle distances:
http://r-sig-geo.2731867.n2.nabble.com/Great-Circle-distances-in-Automap-Gstat-td6863940.html
My suggestion would be to project your data and then perform the interpolation on the projected data.
I want to know the fastest algorithms for obtaining the cartesian distances between each point in a SpatialPointsDataFrame (X) and either (a) the closest point in a second SpatialPointsDataFrame (Y), or (b) the closest line segment in a SpatialLinesDataFrame (Y). So this is basically 2 questions, with perhaps the same answer.
For the lines, I know I can use dist2Line(X, Y, distfun=distGeo), but this is insanely slow. I also tried using nncross after converting X to a ppp object and Y to a psp object, as below. This did NOT work; heat mapping the new distance measure showed that it does not radiate from Y.
X_ppp <- as(X, "ppp")
Y_psp <- as(Y, "psp")
distR <- nncross(X_ppp, Y_psp, what="dist", k=1)
X$dist2road <- distR
For lines, I also tried using gDistance(X,Y) but was met with the error (for i = 1, 2): Spatial object i is not projected; GEOS expects planar coordinates. I think this is because I'm using lat-lon and it needs a true projection. But all the files I'm working with are lat-lon, and I'm not sure how to choose and specify a projection (for Tanzania) without copying it from another file.
For points, again, the nncross approach resulted in definitely wrong distances. (In both the point and the line case, is this because the output vector is not ordered in the same way as the points within X? If so, I see no way of outputting an ID for the point within X.)
Also for points, this knn code below did work. But it's clearly not in cartesian distance, and so I'd like to convert it or find some other algorithm that provides cartesian distance.
knn.results = knn(data=coordinates(market.shp),
query=coordinates(tzprice.shp), k=1)
knn.results <- data.frame(knn.results)
tzprice.shp$dist2market <- knn.results[,2]
Basically, my hope is to find the fastest algorithm for each purpose (distance to nearest point, distance to nearest line), with output either in cartesian distance or convertible to cartesian distance. Thanks!
Somebody pointed me towards one possible answer for finding the cartesian distance between each point in a SpatialPointsDataFrame (X) and the closest point in a second SpatialPointsDataFrame (let's call it Y). So that's the first half of my question... perhaps there's a faster method out there, but this way is quite fast, and it DOES return answers in km, at least if proj=longlat.
library(SearchTrees)  # provides createTree() and knnLookup()
library(sp)           # provides spDists()
tree <- createTree(coordinates(Y))
inds <- knnLookup(tree, newdat=coordinates(X), k=1)
distkm <- sapply(seq_len(nrow(inds)), function(i) spDists(X[i, ], Y[inds[i, ], ]))
Still looking for an algorithm that (quickly) finds meters/km from each point in X to the nearest line in a SpatialLinesDataFrame.
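One possible route for the line case is to project both layers to a planar CRS and then use rgeos::gDistance. A sketch, assuming UTM zone 36S (EPSG:32736) is appropriate (Tanzania spans several UTM zones, so pick the one matching your study area):
library(sp)
library(rgdal)    # provides spTransform methods for Spatial* objects
library(rgeos)
utm36s <- CRS("+proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs")
X_utm <- spTransform(X, utm36s)   # points
Y_utm <- spTransform(Y, utm36s)   # lines
# Pairwise point-to-line distances in metres (byid = TRUE keeps features separate)
d <- gDistance(X_utm, Y_utm, byid = TRUE)
nearest <- apply(d, 2, min)              # assumes columns index the points in X
stopifnot(length(nearest) == length(X))  # sanity check on the matrix orientation
X$dist2road_m <- nearest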
Using R version 3.4.2 and the library "geoR", I made kriging interpolations for different variables (below I give an example of my process). I also made a matrix with the coordinates of 305 trees with distinct marks (species, DBH, Height) that lie within the same area as the interpolations, as seen in the attached image (https://imgur.com/SLQBnZH). I've been looking for ways to extract the nearest interpolated value of each variable for each tree and save the corresponding values in a data.frame or matrix, but haven't been successful and can't find specific answers to this.
One thing I've been looking at is trying to convert the kriging result into a raster (.tif) and proceed from there. But kriging interpolations are made from vector data, so is it even possible?
I'd be glad to receive any sort of help, thank you in advance!
P.S. I'm doing this so that I can later use the data for spatial point pattern analysis.
#Kriging####:
PG<-read.csv("PGF.csv", header=T, stringsAsFactors=FALSE)
library("geoR")
x<-(PG$x)
y<-(PG$y)
#Grid
loci<-expand.grid(x=seq(-5, 65, length=100), y=seq(-5, 85, length=100))
names(loci)<-c("x", "y")
mix<-cbind(rep(1,10000), loci$x, loci$y, loci$x*loci$y)
#Model
pH1.mod<-lm(pH1~y*x, data=PG, x=T)
pH1.kg<-cbind(pH1.mod$x[,3], pH1.mod$x[,2], pH1.mod$residuals)
#Transform to geographic data
pH1.geo<-as.geodata(pH1.kg)
#Variogram
pH1.vario<-variog(pH1.geo, max.dist=35)
pH1.vario.mod<-eyefit(pH1.vario)
#Cross validation
pH1.valcruz<-xvalid(pH1.geo, model=pH1.vario.mod)
#Kriging
pH1.krig<-krige.conv(pH1.geo, loc=loci, krige=krige.control(obj.model=pH1.vario.mod[[1]]))
#Predictive model
pH1.yhat<-mix %*% pH1.mod$coefficients + pH1.krig$predict
#Exchange Kriging prediction values
pH1.krig$predict<-pH1.yhat
#Image
image(pH1.krig)
contour(pH1.krig, add=TRUE)
#Tree matrix####:
CoA<-read.csv("CoAr.csv", header=T)
#Data
xa<-(CoA$X)
ya<-(CoA$Y)
points(xa,ya, col=4)
TreeDF<-(cbind.data.frame(xa, ya, CoA$Species, CoA$DBH, CoA$Height, stringsAsFactors = TRUE))
m<-cbind(xa, ya, 1:305)
m<-as.matrix(m)
I tried to find the value for each point in space (trees 1:305) through the minimum distance to a predicted value using the following code (I suggest not running this since it takes too long):
for(i in 1:2){print(c(2:10000)[as.matrix(dist(rbind(m[i,], as.matrix(pH1.krig$predict))))[i,2:10000]==min(as.matrix(dist(rbind(m[i,],as.matrix(pH1.krig$predict))))[i,2:10000])])}
In the following link aldo_tapia's answer was the approach needed for this problem. Thank you to everyone! https://gis.stackexchange.com/questions/284698/how-to-extract-specific-values-with-point-coordinates-from-kriging-interpolation
The process is as follows:
Use extract() function from raster package:
library(raster)
r <- SpatialPointsDataFrame(loci, data.frame(predict = pH1.krig$predict))
gridded(r) <- T
r <- as(r,'RasterLayer')
pts <- SpatialPointsDataFrame(CoA[,c('X','Y')],CoA)
pH1.arb <-extract(r, pts)
To this I just added the values through cbind to the tree data frame, since they are in order.
COA2<-cbind(CoA, pH1val=pH1.arb)
I will repeat the process for each variable.
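For reference, a hedged sketch of doing that in a loop over several kriging results (the list and its names are placeholders for the other variables):
library(raster)
krig.list <- list(pH1 = pH1.krig)   # add the other krige.conv objects here
pts <- SpatialPointsDataFrame(CoA[, c('X', 'Y')], CoA)
for (nm in names(krig.list)) {
  r <- SpatialPointsDataFrame(loci, data.frame(predict = krig.list[[nm]]$predict))
  gridded(r) <- TRUE
  r <- as(r, 'RasterLayer')
  CoA[[paste0(nm, 'val')]] <- extract(r, pts)
}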
I've been working with a spatial model which contains 21,000 grid cells of unequal size (i by j, where i is [1:175] and j is [1:120]). I have the latitude and longitude values in two separate arrays (lat_array, lon_array) of dimensions i by j.
Plotting the coordinates:
> plot(lon_array, lat_array, main='Grid Coordinates')
Result: (scatter plot of the grid coordinates; image not shown)
My question: Is it possible to plot these spatial coordinates as a grid rather than as points? Does anyone know of a package or function that might be able to do this? I haven't been able to find anything online to this nature.
Thanks.
First of all, it is always a bit dangerous to plot inherently spherical coordinates (lat, long) directly in the plane. Usually you should project them in some way; I will leave it to you to explore the sp package and its spTransform function.
I guess in principle you could simply use the deldir package to calculate the Dirichlet tessellation of your points, which would give you a nice grid. However, you need a bounding region for this to avoid large cells radiating out from the border of your region. I personally use spatstat to call deldir, so I can't give you the direct commands in deldir, but in spatstat I would do something like:
library(spatstat)
plot(lon_array, lat_array, main='Grid Coordinates')
W <- clickpoly(add = TRUE) # Now click the region that contains your grid
i_na <- is.na(lon_array) | is.na(lat_array) # Index of NAs
X <- ppp(lon_array[!i_na], lat_array[!i_na], window = W)
grid <- dirichlet(X)
plot(grid)
I have not tested this yet, and I will update this answer once I get the chance to test it with some artificial data. A major concern is the size of your dataset: computing the Dirichlet tessellation of that many points may take a long time. I have only tried calling dirichlet on datasets of up to 3000 points...
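As an illustration of the workflow on artificial data (the point count and window are arbitrary):
library(spatstat)
set.seed(1)
n <- 200
lon <- runif(n, 0, 10)
lat <- runif(n, 0, 10)
X <- ppp(lon, lat, window = owin(c(0, 10), c(0, 10)))
grid <- dirichlet(X)   # Dirichlet (Voronoi) tessellation of the points
plot(grid, main = "Dirichlet tessellation of 200 random points")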
I have occurrence points for a species, and I'd like to remove potential sampling bias (where some regions might have a much greater density of points than others). One way to do this would be to keep the largest subset of points such that no two are within a certain distance X of each other. Essentially, I would prevent points from being too close to each other.
Are there any existing R functions to do this? I've searched through various spatial packages, but haven't found anything, and can't figure out exactly how to implement this myself.
An example occurrence point dataset can be downloaded here.
Thanks!
I've written a new version of this function that no longer really follows rMaternII.
The input can either be a SpatialPoints, SpatialPointsDataFrame or matrix object.
Seems to work well, but suggestions welcome!
filterByProximity <- function(xy, dist, mapUnits = FALSE) {
  # xy can be either a SpatialPoints or SPDF object, or a matrix
  # dist is in km if mapUnits = FALSE, in map units otherwise
  # (requires the sp package for spDists)
  if (!mapUnits) {
    d <- spDists(xy, longlat = TRUE)
  } else {
    d <- spDists(xy, longlat = FALSE)
  }
  diag(d) <- NA
  close <- (d <= dist)
  diag(close) <- NA
  closePts <- which(close, arr.ind = TRUE)  # index pairs that are too close
  discard <- matrix(nrow = 2, ncol = 2)
  if (nrow(closePts) > 0) {
    while (nrow(closePts) > 0) {
      pair      <- paste(closePts[1, 1], closePts[1, 2], sep = '_')
      pairRev   <- paste(closePts[1, 2], closePts[1, 1], sep = '_')
      discarded <- paste(discard[, 1], discard[, 2], sep = '_')
      if (!pair %in% discarded & !pairRev %in% discarded) {
        discard <- rbind(discard, closePts[1, ])
        # drop all remaining pairs involving the point just discarded
        closePts <- closePts[-union(which(closePts[, 1] == closePts[1, 1]),
                                    which(closePts[, 2] == closePts[1, 1])), ]
      }
    }
    discard <- discard[complete.cases(discard), , drop = FALSE]
    return(xy[-discard[, 1], ])
  }
  if (nrow(closePts) == 0) {
    return(xy)
  }
}
Let's test it:
require(rgeos)
require(sp)
pts <- readWKT("MULTIPOINT ((3.5 2), (1 1), (2 2), (4.5 3), (4.5 4.5), (5 5), (1 5))")
pts2 <- filterByProximity(pts,dist=2, mapUnits=T)
plot(pts)
axis(1)
axis(2)
apply(as.data.frame(pts),1,function(x) plot(gBuffer(SpatialPoints(coords=matrix(c(x[1],x[2]),nrow=1)),width=2),add=T))
plot(pts2,add=T,col='blue',pch=20,cex=2)
There is also an R package called spThin that performs spatial thinning on point data. It was developed for reducing the effects of sampling bias for species distribution models, and does multiple iterations for optimization. The function is quite easy to implement---the vignette can be found here. There is also a paper in Ecography with details about the technique.
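As a rough sketch of how spThin is typically called (the column names, thinning distance, and file name below are assumptions; see the vignette for details):
library(spThin)
occ <- read.csv("occurrence_example.csv")   # assumed columns: SPEC, LAT, LONG
thinned <- thin(loc.data = occ,
                lat.col = "LAT", long.col = "LONG", spec.col = "SPEC",
                thin.par = 10,     # minimum distance between retained points, in km
                reps = 100,        # number of random thinning replicates
                locs.thinned.list.return = TRUE,
                write.files = FALSE,
                write.log.file = FALSE)
# Keep the replicate that retains the most records
best <- thinned[[which.max(sapply(thinned, nrow))]]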
Following Josh O'Brien's advice, I looked at spatstat's rMaternI function, and came up with the following. It seems to work pretty well.
The distance is in map units. It would be nice to incorporate one of R's distance functions that always returns distances in meters, rather than input units, but I couldn't figure that out...
require(spatstat)
require(maptools)
occ <- readShapeSpatial('occurrence_example.shp')
filterByProximity <- function(occ, dist) {
  pts <- as.ppp.SpatialPoints(occ)
  d <- nndist(pts)
  z <- which(d > dist)
  return(occ[z,])
}
occ2 <- filterByProximity(occ,dist=0.2)
plot(occ)
plot(occ2,add=T,col='blue',pch=20)
Rather than removing data points, you might consider spatial declustering. This involves giving points in clusters a lower weight than outlying points. The two simplest ways to do this involve a polygonal segmentation, like a Voronoi diagram, or some arbitrary grid. Both methods will weight points in each region according to the area of the region.
For example, if we take the points in your test (1,1), (2,2), (4.5,4.5), (5,5), (1,5) and apply a regular 2-by-2 mesh, where each cell is three units on a side, then the five points fall into three cells. The points (1,1) and (2,2), falling into the cell [0,3]x[0,3], would each have weight 1 / (no. of points in the cell * total no. of occupied cells) = 1/(2 * 3) = 1/6. The same goes for the points (4.5,4.5) and (5,5) in the cell (3,6]x(3,6]. The "outlier" (1,5) would have weight 1/(1 * 3) = 1/3. The nice thing about this technique is that it is a quick way to generate a density-based weighting scheme.
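A small sketch reproducing that arithmetic in R (the cell size and points are taken from the example above):
pts <- data.frame(x = c(1, 2, 4.5, 5, 1), y = c(1, 2, 4.5, 5, 5))
cell.size <- 3
# Assign each point to a grid cell (cells are (0,3] and (3,6] on each axis)
cell <- paste(ceiling(pts$x / cell.size), ceiling(pts$y / cell.size), sep = "_")
n.in.cell <- table(cell)        # points per occupied cell
n.cells <- length(n.in.cell)    # number of occupied cells (3 here)
# weight = 1 / (points in the cell * number of occupied cells)
pts$w <- 1 / (as.numeric(n.in.cell[cell]) * n.cells)
pts   # weights: 1/6, 1/6, 1/6, 1/6, 1/3 (they sum to 1)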
A polygonal segmentation involves drawing a polygon around each point and using the area of that polygon to calculate the weight. Generally, the polygons completely cover the entire region, and the weights are taken proportional to the area of each polygon, so clustered points (with small polygons) get small weights. A Voronoi diagram is usually used for this, but polygonal segmentations may be calculated with other techniques, or may be specified by hand.
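And a sketch of the Voronoi variant using spatstat (the observation window is an assumption; in practice use your study region):
library(spatstat)
pp <- ppp(x = c(1, 2, 4.5, 5, 1), y = c(1, 2, 4.5, 5, 5),
          window = owin(c(0, 6), c(0, 6)))
tess <- dirichlet(pp)    # Voronoi / Dirichlet tessellation
a <- tile.areas(tess)    # polygon area associated with each point
w <- a / sum(a)          # area-proportional declustering weights
w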
I have a centroid, e.g. A, and I have 100 other points. All of these points are high-dimensional, e.g. 1000 dimensions. Is there a way to visualize these points in a two-dimensional space in terms of their distance from A?
A common (though simple) way to visualize high-dimensional points in low dimensional space is to use some form of multi-dimensional scaling:
dat <- matrix(runif(1000*99),99,1000)
#Combine with "special" point
dat <- rbind(rep(0.1,1000),dat)
out <- cmdscale(dist(dat),k = 2)
#Plot everything, highlighting our "special" point
plot(out)
points(out[1,1],out[1,2],col = "red")
You can also check out isoMDS or sammon in the MASS package for other implementations in R.
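For example, a quick non-metric MDS version of the same plot (reusing dat from above):
library(MASS)
out2 <- isoMDS(dist(dat), k = 2)   # non-metric MDS into 2 dimensions
plot(out2$points)
points(out2$points[1, 1], out2$points[1, 2], col = "red")   # the "special" point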
The distance (by which I assume you mean the norm of the difference vector) is only a single value per point, so you can calculate these norms and show them on a 1D plot, but for a 2D plot you'll need a second parameter.
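A minimal sketch of that 1D view, reusing dat from above (row 1 is the centroid A):
d.to.A <- apply(dat[-1, ], 1, function(p) sqrt(sum((p - dat[1, ])^2)))
stripchart(d.to.A, method = "jitter", pch = 1, xlab = "Euclidean distance to A")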