I have been unable to find any information specific to local block kriging with a local variogram using the gstat package in R. There is freeware called VESPER from the Australian Center for Precision Agriculture that is able to do this, and from what I have read it should be possible in R; I could just use some help putting together a for-loop to make the gstat functions work locally.
Using the meuse data set as an example, I have been able to calculate and fit a global variogram to a data set:
library(gstat)
library(sp)
data(meuse)
coordinates(meuse) = ~x+y
data(meuse.grid)
gridded(meuse.grid) = ~x+y
logzinc_vgm<- variogram(log(zinc)~1, meuse)
logzinc_vgm_fit <- fit.variogram(logzinc_vgm, model=vgm("Sph", "Exp"))
logzinc_vgm_fit
plot(logzinc_vgm, logzinc_vgm_fit)
This gives a nice plot of the variogram for the whole data set with the fitted model. Then I can use this to perform block kriging over the entire data set:
logzinc_blkkrig <- krige(log(zinc)~1, meuse, meuse.grid, model = logzinc_vgm_fit, block=c(100,100))
spplot(logzinc_blkkrig["var1.pred"], main = "ordinary kriging predictions")
spplot(logzinc_blkkrig["var1.var"], main = "ordinary kriging variance")
This produces a plot of the interpolated data as well as a plot of the variance for each predicted point. So this would be perfect if I wanted these functions to work once for my entire data set...
But I have been unable to generate a for-loop to handle these functions on a local level.
My goals are:
1. For each point in my grid file (which I have tried as both a data frame and a SpatialPointsDataFrame), I would like to subset from my data file the points that lie within a (diagonal) distance equal to the range given by the global variogram (which is easy to extract, i.e. logzinc_vgm_fit[2,3])
2. On this subset of data, I would like to calculate the variogram (as above) and fit a model to it (as above)
3. Based on this model, I would like to perform block kriging to get a predicted value and variance at that grid point
4. Build the above three steps into a for-loop to predict values at each grid point based on the local variogram around each grid point
note: as with the meuse data set built into the gstat package, the dimensions of my grid and data data frames are different
Thank you very much for chiming in if anyone is able to tackle this question. Happy to post the code I am working with so far if it would be useful.
I made a for loop that I think accomplishes what you request. I do not think that block kriging is required for this because the loop predicts at each grid cell.
The rad parameter is the search radius, which can be set to other quantities, but currently references the global variogram range (with nugget effect). I think it would be best to search a little further for points because if you only search up to the global variogram range, a local variogram fit may not converge (i.e. no observed range).
The k parameter is for the minimum number of nearest neighbors within rad. This is important because some locations may have no points within rad, which would result in an error.
You should note that the way you specified model=vgm("Sph", "Exp") seems to take only the first listed model. So I used the spherical model in the for loop, but you can change it to whichever model you want to use. Matern may be a good choice if you think the shape will change with location.
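As a side note, if you would rather compare several candidate models than hard-code one, a minimal sketch is given below. It fits each candidate to the global sample variogram logzinc_vgm from your question and compares the weighted sum of squared errors that fit.variogram stores in the "SSErr" attribute:
candidates <- c("Sph", "Exp", "Mat")  #for "Mat" you may also want to tune kappa
fits <- lapply(candidates, function(m) fit.variogram(logzinc_vgm, vgm(m)))
errs <- sapply(fits, function(f) attr(f, "SSErr"))
best_fit <- fits[[which.min(errs)]]  #candidate with the smallest fitting error
best_fit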
#Specify the search radius for the local variogram
rad = logzinc_vgm_fit[2,3]
#Specify minimum number of points for prediction
k = 25
#Index to indicate if any result has been stored yet
stored = 0
for (i in 1:nrow(meuse.grid)){
  #Calculate the Euclidean distance from the current grid cell to all data points
  dists = spDistsN1(pts = meuse, pt = meuse.grid[i,], longlat = FALSE)
  #Find indices of the points within rad of this grid point
  IndsInRad = which(dists < rad)
  if (length(IndsInRad) < k){
    print('Not enough nearest neighbors')
  }else{
    #Calculate the local variogram with these points
    locVario = variogram(log(zinc)~1, meuse[IndsInRad,])
    #Fit the local variogram
    locVarioFit = fit.variogram(locVario, model=vgm("Sph"))
    #Use kriging to predict at grid cell i. Suppress printed output.
    loc_krig <- krige(log(zinc)~1, meuse[IndsInRad,], meuse.grid[i,], model = locVarioFit, debug.level = 0)
    #Add the result to the combined output
    if (stored == 0){
      FinalResult = loc_krig
      stored = 1
    }else{
      FinalResult = rbind(FinalResult, loc_krig)
    }
  }
}
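Once the loop has finished, FinalResult holds the combined local-kriging output (only for grid cells that had at least k neighbors), and it can be plotted in the same way as the global result, e.g.:
spplot(FinalResult["var1.pred"], main = "local kriging predictions")
spplot(FinalResult["var1.var"], main = "local kriging variance")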
I am using the R package machisplin (it's not on CRAN) to downscale a satellite image. According to the description of the package:
The machisplin.mltps function simultaneously evaluates different combinations of the six algorithms to predict the input data. During model tuning, each algorithm is systematically weighted from 0-1 and the fit of the ensembled model is evaluated. The best performing model is determined through k-fold cross validation (k=10) and the model that has the lowest residual sum of squares of test data is chosen. After determining the best model algorithms and weights, a final model is created using the full training dataset.
My question is how can I check which model out of the 6 has been selected for the downscaling? To put it differently, when I export the downscaled image, I would like to know which algorithm (out of the 6) has been used to perform the downscaling.
Here is the code:
library(MACHISPLIN)
library(raster)
library(gbm)
evi = raster("path/evi.tif") # covariate
ntl = raster("path/ntl_1600.tif") # raster to be downscaled
##convert one of the rasters to a point dataframe to sample. Use any raster input.
ntl.points<-rasterToPoints(ntl,
                           fun = NULL,
                           spatial = FALSE)
##subset only the x and y data
ntl.points<- ntl.points[,1:2]
##Extract values to points from rasters
RAST_VAL<-data.frame(extract(ntl, ntl.points))
##merge sampled data to input
InInterp<-cbind(ntl.points, RAST_VAL)
#run an ensemble machine learning thin plate spline
interp.rast<-machisplin.mltps(int.values = InInterp,
                              covar.ras = evi,
                              smooth.outputs.only = T,
                              tps = T,
                              n.cores = 4)
#set negative values to 0
interp.rast[[1]]$final[interp.rast[[1]]$final <= 0] <- 0
writeRaster(interp.rast[[1]]$final,
            filename = "path/ntl_splines.tif")
I viewed all the output parameters (please refer to Example 2 in the package description) but I couldn't find anything relevant to my question.
I have posted a question on GitHub as well. From here you can download my images.
I think this is a misunderstanding: machisplin isn't testing the 6 algorithms and giving you one of them; it is trying many ensembles of the 6 and giving you one ensemble. In other words, what you get is the best 'combination of the 6 algorithms', not one of the 6 algorithms chosen on its own.
You will get something like "a model which is 20% algo1, 10% algo2, etc.", and not "algo1 is the best and was chosen".
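One way to see what the fitted object actually contains is simply to inspect it; I am not certain of the exact element names machisplin uses, so check the str() output against the package documentation rather than assuming a particular slot holds the weights:
names(interp.rast[[1]])
str(interp.rast[[1]], max.level = 2)  #look for an element storing the per-algorithm weights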
I am trying to account for spatial autocorrelation in a model in R. Each observation is a country for which I have the average latitude and longitude. Here's some sample data:
country <- c("IQ", "MX", "IN", "PY")
long <- c(43.94511, -94.87018, 78.10349, -59.15377)
lat <- c(33.9415073, 18.2283975, 23.8462264, -23.3900255)
Pathogen <- c(10.937891, 13.326284, 12.472374, 12.541716)
Answer.values <- c(0, 0, 1, 0)
data <- data.frame(country, long, lat, Pathogen, Answer.values)
I know spatial autocorrelation is an issue (Moran's I is significant in the whole dataset). This is the model I am testing: Answer.values (a 0/1 variable) ~ Pathogen (pathogen prevalence, a continuous variable).
model <- glm(Answer.values ~ Pathogen,
             na.action = na.omit,
             data = data,
             family = "binomial")
How would I account for spatial autocorrelation with a data structure like that?
There are a lot of potential answers to this. One easy(ish) way is to use mgcv::gam() to add a spatial smoother. Most of your model would stay the same:
library(mgcv)
gam(Answer.values ~ Pathogen + s([something]),
    family = "binomial",
    data = data)
where s([something]) is some form of smooth spatial term. Three possible/reasonable choices would be:
a spherical spline (?mgcv::smooth.construct.sos.smooth.spec), which takes lat/long as input; this would be useful if (1) you have data over a significant fraction of the earth's surface (so that a smoother that constructs a 2D planar spatial smooth is less reasonable); (2) you want to account for distance between locations in a continuous way (see the sketch after this list)
a Markov random field (?mgcv::smooth.construct.mrf.smooth.spec). This is essentially the spatial analogue of a discrete order-1 autoregressive structure (i.e. countries are directly correlated only with their direct neighbours, however you choose to define that). In order to do this you have to come up somehow with a neighbourhood list (i.e. a list of countries, where the elements are lists of countries that are neighbours of the original countries). You could do this however you like, e.g. by finding nearest neighbours geographically. (Check out some introductions to spatial statistics/spatial data analysis in R.) (On the other hand, if you're testing Moran's I then you've presumably already come up with some way to identify first-order neighbours ...)
if you're comfortable treating lat/long as coordinates in a 2D plane, then you have lot of choices of smoothing basis, e.g. ?mgcv::smooth.construct.gp.smooth.spec (Gaussian process smoothers, which include most of the standard spatial autocorrelation models as special cases)
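As a minimal sketch of the first option (fit on your full dataset; the four-row sample above is far too small for a smooth), and noting that mgcv expects latitude first in s(..., bs = "sos"):
library(mgcv)
#Binomial GAM with a spline-on-the-sphere smoother over (lat, long)
m_sos <- gam(Answer.values ~ Pathogen + s(lat, long, bs = "sos"),
             family = "binomial",
             data = data,
             method = "REML")
summary(m_sos)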
A helpful link for getting up to speed with GAMs in R ...
I am comparing cross-validation ("leave one out") results for different variogram models for a 3D data set using the gstat library in R. I would like to compare similar cross-validation results for inverse distance estimates as well, but I can't see how to do anisotropic estimations (or cross-validation) with IDW in gstat. Is IDW estimation in gstat limited to the isotropic case, and if not, what syntax can be applied?
Assuming it is not possible to do anisotropic IDW estimations in gstat, I translated the data locations to make them isotropic, then conducted cross-validation using krige.cv.
Starting with a csv file with X,Y,Z,HM values (X, Y and Z translated to make the points isotropic - in this case XY rotated by 15 degrees, then X coordinates multiplied by 3.333 and Z values multiplied by 30: "hm_assays_iso.csv"). The R session I used was as follows:
> library(gstat)
> library(sp)
> hm_iso<-read.csv("hm_assays_iso.csv")
> coordinates(hm_iso)<- c("X", "Y", "Z")
> hm_idw.cv<-krige.cv(HM~1, locations=hm_iso, set=list(idp = 3))
> write.csv(hm_idw.cv, file="hm_cv_idw3.csv")
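For reference, the translation itself can be scripted. A rough sketch of the rotation and scaling described above is shown here; the input file name hm_assays.csv and the rotation direction are assumptions, so adjust them to match your own workflow:
> theta <- 15 * pi / 180                       # rotation angle in radians
> hm <- read.csv("hm_assays.csv")              # raw (untransformed) X, Y, Z, HM
> Xr <- hm$X * cos(theta) - hm$Y * sin(theta)  # rotate XY by 15 degrees
> Yr <- hm$X * sin(theta) + hm$Y * cos(theta)
> hm_iso <- data.frame(X = Xr * 3.333, Y = Yr, Z = hm$Z * 30, HM = hm$HM)
> write.csv(hm_iso, "hm_assays_iso.csv", row.names = FALSE)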
For a few days I have been struggling with a new, challenging spatial analysis involving spatial autocorrelation in R. Specifically, I am interested in verifying the autocorrelation between points set in a grid of roughly 50 m. My aim is to test the autocorrelation between these points (the locations where I collected the data) and to verify whether the autocorrelation decreases as the distance among them increases (this is expected). My idea is to generate radii of specific sizes around each point (50 m, 100 m, 150 m and so on...) and to test the Moran's I autocorrelation index at each of them. Finally, I would like to use ggplot to display the Moran's I results at each specific distance (but this is easy to get once I have the outputs...).
My starting data frame contains 4 columns: the ID of the point where the data were collected, the value measured at that specific point (z), a column with longitude (x) and a column with latitude (y). The data are as follows:
#load libraries
library(sp)
library(spdep)
library(splm)
library(ape)
ID<- c(1,2,3,4,5,6)
x<-c(20.99984,20.99889, 20.99806,20.99800,20.99700,20.99732)
y<-c(52.21511,52.21489,52.21464,52.21410,52.21327,52.21278)
z<-c(1.16,0.54,0.89,0.60,1.27,1.45)
data <- data.frame(ID,x,y,z)
I read many things online and found this tutorial
https://mgimond.github.io/Spatial/spatial-autocorrelation-in-r.html#morans-i-as-a-function-of-a-distance-band
which actually shows what I'm interested in. However, it doesn't really work right from the beginning: starting from my coordinates, I think there is a problem, and I don't know how to transform them into a proper format for R. This is the code and the error message I get:
data <- data.frame(dataPOL$Long , dataPOL$Lat, dataPOL$Human_presence)
coordinates(data) <- c('x','y')
proj4string(data) <- "+init=epsg:4326"
S.dist <- dnearneigh(coordinates, 0, 50) #radius of 50 meters
Error in dnearneigh(coordinates, 0, 50) : Data non-numeric
I did not receive any answer, but I ended up finding a solution:
I have found that the most commonly used packages for spatial autocorrelation in R (in my case, Moran's I) are spdep and ape.
I tried both: I couldn't get spdep to work, but ape did. Here is the tutorial I followed for my specific case:
https://stats.idre.ucla.edu/r/faq/how-can-i-calculate-morans-i-in-r/
Before calculating the Moran index, you should generate a distance matrix; I did it with 'rdist.earth' from the package 'fields'.
This function measures the distance between each pair of data points based on their coordinates. It recognizes that the world is not flat and therefore calculates great-circle distances. I specified the distances in km for my specific case.
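For example, with the sample data frame from above (rdist.earth expects longitude and latitude columns, and miles = FALSE returns kilometres), the distance matrix could be built like this:
library(fields)
#Great-circle distances (in km) between all pairs of sampling points
popdists <- rdist.earth(cbind(data$x, data$y), miles = FALSE)
diag(popdists) <- 0  #force exact zeros on the diagonal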
To calculate Moran's I, I ran this:
library(ape)
pop.dists.1 <- (popdists > 0 & popdists <= .06) # radius of 60 m (remember that the fields package works in km or miles)
Moran.I(mydataframe$myzvariable, pop.dists.1)
This is the output I got at this specific radius:
pop.dists.1 <- (popdists > 0 & popdists <= .06) #60m
Moran.I(dataPOL$Human_presence, pop.dists.1)
$observed
[1] 0.3841241 # Moran's I: between -1 and 1; here the points within 60 m are autocorrelated
$expected
[1] -0.009615385
$sd
[1] 0.08767598
$p.value
[1] 7.094019e-06
I repeated the calculation for the distances I am interested in: it works really well and, as the distance increases, the Moran's I index approaches 0 (which is what I expected).
I am going to plot the individual outputs using ggplot, as always, in order to follow the trend of spatial autocorrelation for my z variable (see the sketch below).
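A minimal sketch of that repetition, assuming the popdists matrix (in km) and the z variable from the sample data frame above:
library(ape)
library(ggplot2)
radii <- c(0.05, 0.10, 0.15, 0.20)  #50 m, 100 m, 150 m, 200 m expressed in km
mi <- sapply(radii, function(r) {
  w <- (popdists > 0 & popdists <= r)  #binary weights: neighbours within r
  Moran.I(data$z, w)$observed
})
ggplot(data.frame(radius_m = radii * 1000, moran = mi), aes(radius_m, moran)) +
  geom_point() +
  geom_line()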
Hope this will help if needed!
I'm using a model to fit complex leaf shape patterns to data. The data are just contour coordinates from (half) leaf scans, which can look like anything from a normal function to a complex shape with multiple Y coordinates for the same X (for example, think of one side of a maple leaf).
The model, however, does not provide a function, but rather a generated sequence of coordinates according to a specific set of rules. These coordinates are not a pairwise approximation of the data (i.e. the points calculated from the model do not have the same intervals as the data; see the image below).
The point of the exercise is to find a function to minimize (like you would use the residual sum of squares for a simple shape) so that the parameters of my model can be optimized to describe the shape as accurately as possible.
Example 1: a simple shape, single Y for single X. In the image shown below:
The black dots are the data points, while the red ones are the model-generated points for an optimal fit. As you can see, the two datasets do not have equal intervals, so I used an interpolation method to calculate the RSS of the model on a set of control points (for example, interpolating both datasets at a number of points along the axis). This provides me with an optimal fit, so no problems here.
Example 2: a complex shape, multiple y, single x. A possibility of a shape generated by the model is the following:
Assume I have a dataset describing a shape that this model generation approximates. How can I go about determining how well the model fits my contour?
In case anyone ever comes across a similar problem: I eventually solved it by working with an area-overlap optimization, using the R packages sp and rgeos.
library(sp)
library(rgeos)
#Build a SpatialPolygons object from the (closed) data contour
PolysData <- Polygons(list(Polygon(contourData)), "Poly")
SpPData <- SpatialPolygons(list(PolysData), 1:1)
ParamOpt <- function(par){
  RSS <- tryCatch({
    #Generate the model contour for this parameter set and turn it into a polygon
    contourGen <- GenerateShape(par)
    PolyFit <- Polygon(contourGen)
    PolysFit <- Polygons(list(PolyFit), "Poly")
    SpPFit <- SpatialPolygons(list(PolysFit), 1:1)
    RSS <- tryCatch({
      #Overlap area between the data polygon and the model polygon
      Inters <- gIntersection(SpPData, SpPFit)
      TruePos <- Inters@polygons[[1]]@area
      FalsePos <- SpPFit@polygons[[1]]@area - TruePos
      FalseNeg <- SpPData@polygons[[1]]@area - TruePos
      TrueNeg <- 1 - TruePos - FalsePos - FalseNeg  #assumes a total reference area of 1
      Acc <- (TruePos + TrueNeg)/(TruePos + TrueNeg + FalsePos + FalseNeg)
      return(1 - Acc)  #minimize 1 - accuracy
    }, error = function(err) {
      RSS <- 10  #penalty when the intersection fails
      return(RSS)
    })
    return(RSS)
  }, error = function(err) {
    RSS <- 10  #penalty when shape generation fails
    return(RSS)
  })
  return(RSS)
}
The GenerateShape function is a self-written function that creates shapes as presented in the question, based on the parameters. The criterion is based on the evaluation criteria from "Image processing techniques and segmentation evaluation" (Smochina, 2011).
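For completeness, the resulting objective can then be handed to a general-purpose optimizer. A minimal sketch follows; the starting values are placeholders for whatever parameters GenerateShape expects:
init_par <- c(1, 1, 1)  #placeholder starting values
opt <- optim(par = init_par, fn = ParamOpt, method = "Nelder-Mead")
opt$par  #parameter set giving the best area overlap with the data contour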