R: Bad variogram fitting, bad kriging results

I am trying to krige measurements in Jakarta Bay. I have a set of measurement points with their coordinates and attributes (pH, salinity, ...).
In order to krige, I first need to find a model for my variogram. When I use the "variogram" function the output is not perfect, but it should be OK. When I then try to fit the variogram, I get a warning message: In fit.variogram(ph.vgm, model = vgm(0.12, "Sph", 0.1, 0.01)) :
Warning: singular model in variogram fit, and the fitted model is singular.
Here I read about singular models associated with variogram calculations. Can I do something to improve this?
How can I obtain a better fit for my variogram? Why do I obtain only little circles around the measurement points? I would like a full map of predicted values.
I also tried the "automap" library, which is even less flexible, and I don't obtain good results with it either.
library(sp)
library(gstat)
library(automap)
x = c(11878417.51,11882987.17,11887690.42,11892582.91,11897119.18,11902527.08,11879348.14,11884237.29,11888933.86,11893819.67,11898835.73,11903940.84,11908386.94,11885529.71,11889836.66,11900118.13,11905765.37,11896037.16,11901234.67,11906244.04,11892136.86,11900822.56,11904493.1,11907692.42,11910346.05,11888709,11887268.41,11885237.28,11883450.38,11880668.5)
y = c(-668537.7429,-667290.838,-666043.9586,-663943.1247,-663992.3709,-662612.3726,-672878.6036,-672014.4364,-671960.7062,-669604.4601,-668541.1009,-667203.5333,-666181.6289,-676933.1896,-676566.0044,-673095.7667,-671736.8309,-679340.0992,-677788.4711,-676606.3051,-682542.446,-680607.5158,-680131.1539,-679733.0503,-662307.2774,-680754.1755,-681408.3272,-680494.7783,-680491.4197,-679426.19)
ph = c(7.1,7.76,7.14,7.19,7.56,7.56,7.11,8.14,7.22,7.17,7.33,7.37,7.36,7.23,7.12,7.54,7.96,7.98,7.96,7.2,7.44,7.36,7.71,7.71,8.01,7.73,8.11,7.03,7.26,7.77)
TSS = c(13.7,21,17.7,18.8,4.7,12.4,17.3,18.8,20.2,18.3,5.6,NA,NA,NA,21.9,11.1,NA,NA,21.2,29.1,31.3,29.3,21.3,25.4,31.8,14.5,2.9,11.7,8.4,NA)
df = data.frame(x,y,ph,TSS)
coordinates(df) = ~x+y
proj4string(df) <- CRS("+init=epsg:3857")
spplot(df)
grid <- data.frame(x=c(11877328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11912328.43,11877328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11877328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11912328.43,11877328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11912328.43,11877328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11912328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11904828.43,11894828.43),y=c(-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518
,-664719.4518,-664719.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-679719.4518,-679719.4518,-679719.4518,-679719.4518,-679719.4518,-679719.4518,-679719.4518,-679719.4518,-682219.4518))
coordinates(grid)=~x+y
proj4string(grid) <- CRS("+init=epsg:3857")
gridded(grid)=T
spplot(grid)
ph.vgm <- variogram(ph~1, df[!is.na(df@data$ph),]); plot(ph.vgm)
ph.fit = fit.variogram(ph.vgm, model = vgm(0.12, "Sph", 4000, 0.01), warn.if.neg = T); ph.fit
plot(ph.vgm, ph.fit)
ph.kriged = krige(ph~1, df[!is.na(df@data$ph),], grid, model = ph.fit)
spplot(ph.kriged["var1.pred"])

And small sample.
If you want to see $0.5(Z(x)-Z(x+h))^2$ for every point pair, plotted against $h$, then use
plot(variogram(ph~1, df[!is.na(df@data$ph),], cloud=TRUE))
but this may not make you happy either. The bottom line is that you have very few observations (30), and with so few observations you essentially never get variograms that look nice.
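To make the quantity in the cloud concrete, here is a minimal base R sketch that computes $0.5(Z(x)-Z(x+h))^2$ for every point pair by hand and then averages it into distance bins. The coordinates and values are simulated stand-ins, not the Jakarta Bay data, and the ten equal-width bins are an arbitrary choice rather than gstat's defaults.

```r
# Toy data standing in for 30 observations (all values are made up).
set.seed(1)
n  <- 30
xy <- cbind(x = runif(n, 0, 30000), y = runif(n, 0, 20000))
z  <- rnorm(n, mean = 7.5, sd = 0.3)

# All unordered point pairs (i < j).
pairs <- t(combn(n, 2))
i <- pairs[, 1]
j <- pairs[, 2]

# Separation distance h and semivariance 0.5 * (z_i - z_j)^2 for each pair.
h     <- sqrt((xy[i, "x"] - xy[j, "x"])^2 + (xy[i, "y"] - xy[j, "y"])^2)
gamma <- 0.5 * (z[i] - z[j])^2

# Average the cloud into distance bins to get a rough sample variogram.
bins    <- cut(h, breaks = seq(0, max(h), length.out = 11), include.lowest = TRUE)
binned  <- tapply(gamma, bins, mean)
n_pairs <- table(bins)
```

With 30 points there are only 435 pairs to share among all bins, which is why the binned estimates are so noisy.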

Related

how to make a proper grid in case of a GAM + OK kriging method?

I am struggling with a Kriging + GAM problem since a while and hope someone will be able to help me.
I am trying to make interpolation of a pollutant into an area. To do so, I use ordinary kriging from gstat, in addition to GAM, based on this method (link where I found it is below):
GAM <- gam(formula = pollutant ~ 1+ s(Long, Lat) + s(Individual, bs="re"), data = dataset)
GAMPredictions <- predict(mydataset, GAM, type="response")
residKrigMap <- krige(formula = residuals(GAM) ,
locations = mydataset,
model = myVariogram,
newdata = MyGrid)
residKrigRstLayer <- as(residKrigMap, "RasterLayer")
gamKrigMap <- GAMPredictions + residKrigRstLayer
The kriging part works fine, but when I try to do this:
gamKrigMap <- GAMPredictions + residKrigRstLayer
it does not work. I am fairly sure this is because of the grid I am interpolating onto. I am using an rds file that contains the areas I want to interpolate, but I cannot make the GAM prediction on that file because it does not contain the variables from the dataset. Also, my dataset has 760 observations but my grid has over 6000 cells, so there is a mismatch.
From what I have been reading so far, I should create a grid based on my dataset.
So my questions are:
How can I create a grid from my dataset with R? All the methods I have seen rely on an existing raster or shapefile, which I don't have (or know how to create).
From there, how can I add the country boundaries? (The interpolation is in the North Atlantic and I want to add the countries surrounding the area.)
Of course let me know if you need more info.
Best,
Céline
link: https://www.r-exercises.com/2018/03/31/advanced-techniques-with-raster-data-part-3-regression-kriging/
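One possible way to build a grid directly from the observations, with no raster or shapefile, is to span their bounding box with expand.grid. This is only a sketch: the `dataset` below is a simulated stand-in with the `Long` and `Lat` column names from the question, and the 0.5 degree spacing is an arbitrary assumption.

```r
# Hypothetical stand-in for the observation table; only Long/Lat are used.
set.seed(42)
dataset <- data.frame(Long = runif(760, -30, -5), Lat = runif(760, 40, 60))

# Regular grid covering the bounding box of the observations.
cell <- 0.5  # grid spacing in degrees (an arbitrary choice)
MyGrid <- expand.grid(
  Long = seq(floor(min(dataset$Long)), ceiling(max(dataset$Long)), by = cell),
  Lat  = seq(floor(min(dataset$Lat)),  ceiling(max(dataset$Lat)),  by = cell)
)

# One row per grid cell, with the same coordinate names as the observations.
head(MyGrid)
```

Because the grid carries the same column names as the data, the GAM prediction and the kriged residuals can in principle both be evaluated on the same cells, provided the grid also supplies any other covariates the model uses.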

Kriging with gstat : "Covariance matrix singular at location" with predict

I am trying to do an estimation by kriging with gstat, but can never achieve it because of an issue with the covariance matrix. I never have estimates on the locations I want, because they are all skipped. I have the following warning message, for each location :
1: In predict.gstat(g, newdata = newdata, block = block, nsim = nsim, :
Covariance matrix singular at location [-8.07794,48.0158,0]: skipping...
And all estimates are NA.
So far I've browsed many related StackOverflow threads, but none resolved my problems (https://gis.stackexchange.com/questions/222192/r-gstat-krige-covariance-matrix-singular-at-location-5-88-47-4-0-skipping ; https://gis.stackexchange.com/questions/200722/gstat-krige-error-covariance-matrix-singular-at-location-917300-3-6109e06-0 ; https://gis.stackexchange.com/questions/262993/r-gstat-predict-error?rq=1)
I checked that :
there is actually a spatial structure in my dataset (see bubble plot with code below)
there are no duplicate locations
the variogram model is not singular and has a good fit to the experimental variogram (see plot with code below)
I also tried several values of range, sill, nugget and all the models in the gstat library
The covariance matrix is positive definite and has positive eigen values. It is singular according to gstat, but not to is.singular.matrix function
There were enough pair of points to do the experimental variogram
How to overcome this problem? What tips to avoid a singular covariance matrix? I also welcome any "best practice" for kriging.
Code (requires forSO.Rdata : https://www.dropbox.com/s/5vfj2gw9rkt365r/forSO.Rdata?dl=0 ) :
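library(ggplot2)
library(gstat)
library(sp) # spDists, zerodist
library(magrittr) # %>%
library(matrixcalc) # is.positive.definite, is.singular.matrix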
#Attached Rdata
load("forSO.Rdata")
#The observations
str(abun)
#Spatial structure
abun %>% as.data.frame %>%
ggplot(aes(lon, lat)) +
geom_point(aes(colour=prop_species_cells), alpha=3/4) +
coord_equal() + theme_bw()
#Number of pair of points
cvgm <- variogram(prop_species_cells ~1, data=abun, width=3, cutoff=300)
plot(cvgm$dist,cvgm$np)
#Fit a model covariogram
efitted = fit.variogram(cvgm, vgm(model="Mat", range=100, nugget=1), fit.method=7, fit.sills=TRUE, fit.ranges=TRUE)
plot(cvgm,efitted)
#No warning, and the model is non singular
attr(efitted, "singular")
#Covariance matrix (only on a small set of points, I have more than 25000 points) : positive-definite, positive eigenvalues and not singular
hex_pointsDegTiny=hex_pointsDeg
hex_pointsDegTiny@coords=hex_pointsDegTiny@coords[1:10,]
dists <- spDists(hex_pointsDegTiny)
covarianceMatrix=variogramLine(efitted, maxdist = max(cvgm$dist), n = 10*max(cvgm$dist), dir = c(1,0,0), dist_vector = dists, covariance = TRUE)
eigen(covarianceMatrix)$values
is.positive.definite(covarianceMatrix)
is.singular.matrix(covarianceMatrix)
# No duplicate locations
zerodist(hex_pointsDegTiny)
# Impossible to krig
OK_fit <- gstat(id = "OK_fit", formula = prop_species_cells ~ 1, data = abun, model = efitted)
dist <- predict(OK_fit, newdata = hex_pointsDegTiny)
dist@data
Actually, there were duplicate locations in the abun dataset (zerodist(abun)). I had been looking for them in the grid on which I wanted kriged estimates, but they were in the observations. After getting rid of the duplicates, kriging worked fine.
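A minimal base R sketch of that check on the observation table, before it is turned into a spatial object. The `obs` data frame and its values are hypothetical; on an sp object, zerodist and remove.duplicates do the equivalent job.

```r
# Hypothetical observation table with one repeated location (rows 1 and 4).
obs <- data.frame(
  lon   = c(-8.08, -7.50, -7.90, -8.08, -6.75),
  lat   = c(48.02, 47.80, 48.40, 48.02, 47.10),
  value = c(0.10, 0.35, 0.22, 0.12, 0.40)
)

# Flag every row whose coordinates already appeared in an earlier row.
is_dup <- duplicated(obs[, c("lon", "lat")])
which(is_dup)

# Keep the first record per location before building the gstat object.
obs_unique <- obs[!is_dup, ]
```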

Cross-validation for kriging in R: how to include the trend while reestimating the variogram using xvalid?

I have a question very specific for the function xvalid (package geoR) in R which is used in spatial statistics only, so I hope it's not too specific for someone to be able to answer. In any case, suggestions for alternative functions/packages are welcome too.
I would like to compute a variogram, fit it, and then perform cross-validation. Function xvalid seems to work pretty nice to do the cross-validation. It works when I set reestimate=TRUE (so it reestimates the variogram for every point removed from the dataset in cross-validation) and it also works when using a trend. However, it does not seem to work when combining these two...
Here is an example using the classical Meuse dataset:
library(geoR)
library(sp)
data(meuse) # import data
coordinates(meuse) = ~x+y # make spatialpointsdataframe
meuse@proj4string <- CRS("+init=epsg:28992") # add projection
meuse_geo <- as.geodata(meuse) # create object of class geodata for geoR compatibility
meuse_geo$data <- meuse@data # attach all data (incl. covariates) to meuse_geo
meuse_vario <- variog(geodata=meuse_geo, data=meuse_geo$data$lead, trend= ~meuse_geo$data$elev) # variogram
meuse_vfit <- variofit(meuse_vario, nugget=0.1, fix.nugget=T) # fit
# cross-validation works fine:
xvalid(geodata=meuse_geo, data=meuse_geo$data$lead, model=meuse_vfit, variog.obj = meuse_vario, reestimate=F)
# cross-validation does not work when reestimate = T:
xvalid(geodata=meuse_geo, data=meuse_geo$data$lead, model=meuse_vfit, variog.obj = meuse_vario, reestimate=T)
The error I get is:
Error in variog(coords = cv.coords, data = cv.data, uvec = variog.obj$uvec, : coords and trend have incompatible sizes
It seems to remove the point from the dataset during cross-validation, but it doesn't seem to remove the point from the covariates/trend data. Any ideas on solving this or using a different package?
Thanks a lot in advance!

Duplicate data when using gstat or automap package in R

I am trying to use ordinary kriging to spatially predict where an animal will occur based on predictor variables, using the gstat or automap package in R. I have many (over 100) duplicate coordinate points, which I cannot throw out since those stations were sampled multiple times over many years. Every time I run the code below for ordinary kriging, I get an LDL error, which is due to the duplicate points. Does anyone know how to fix this problem without throwing out data? I have tried the code from the automap package that is supposed to correct for duplicates, but I can't get it to work. Thank you for the help!
coordinates(fish) <- ~ LONGITUDE+LATITUDE
x.range <- range(fish@coords[,1])
y.range <- range(fish@coords[,2])
grd <- expand.grid(x=seq(from=x.range[1], to=x.range[2], by=3), y=seq(from=y.range[1], to=y.range[2], by=3))
coordinates(grd) <- ~ x+y
plot(grd, pch=16, cex=.5)
gridded(grd) <- TRUE
library(gstat)
zerodist(fish) ###146 duplicate points
v <- variogram(log(WATER_TEMP) ~1, fish, na.rm=TRUE)
plot(v)
vgm()
f <- vgm(1, "Sph", 300, 0.5)
print(f)
v.fit <- fit.variogram(v,f)
plot(v, model=v.fit) ####In fit.variogram(v, d) : Warning: singular model in variogram fit
krg <- krige(log(WATER_TEMP) ~ 1, fish, grd, v.fit)
## [using ordinary kriging]
##"chfactor.c", line 131: singular matrix in function LDLfactor()Error in predict.gstat(g, newdata = newdata, block = block, nsim = nsim,: LDLfactor
##automap code for correcting for duplicates
fish.dup = rbind(fish, fish[1,]) # Create duplicate
coordinates(fish.dup) = ~LONGITUDE + LATITUDE
kr = autoKrige(WATER_TEMP, fish.dup, grd)
###Error in inherits(formula, "SpatialPointsDataFrame"):object 'WATER_TEMP' not found
###somehow my predictor variables are no longer available when in a Spatial Points Data Frame??
automap::autoKrige expects a formula as first argument, try
kr = autoKrige(WATER_TEMP~1, fish.dup, grd)
automap has a very simple fix for duplicate observations, and that is to discard them. So automap does not really solve the issue you have. I see some options:
Discard the duplicates.
Slightly perturb the coordinates of the duplicates so that they are not on exactly the same location anymore.
Perform space-time kriging using gstat.
In regard to your specific issue, please make your example reproducible. What I can guess is that rbind of your fish object is not doing what you expect...
Alternatively you can use the function jitterDupCoords of geoR package.
https://cran.r-project.org/web/packages/geoR/geoR.pdf
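As an illustration of the perturbation option, here is a minimal base R sketch that nudges only the repeated locations by a tiny random offset. The `fish_df` table and the offset size `eps` are assumptions chosen for the example; the column names follow the question.

```r
set.seed(7)
# Hypothetical stations; the first location was sampled three times.
fish_df <- data.frame(
  LONGITUDE  = c(-70.1, -70.1, -70.1, -69.4, -68.9),
  LATITUDE   = c( 41.2,  41.2,  41.2,  42.0,  41.7),
  WATER_TEMP = c(12.1, 13.4, 11.8, 10.2, 9.9)
)
original <- fish_df

# Rows whose coordinates repeat an earlier row.
dup <- duplicated(fish_df[, c("LONGITUDE", "LATITUDE")])

# Shift only those rows by at most eps in each direction.
eps <- 1e-4  # small relative to the station spacing (an assumed value)
fish_df$LONGITUDE[dup] <- fish_df$LONGITUDE[dup] + runif(sum(dup), -eps, eps)
fish_df$LATITUDE[dup]  <- fish_df$LATITUDE[dup]  + runif(sum(dup), -eps, eps)
```

This keeps every observation, but nearly coincident points may still give an ill-conditioned kriging system if the fitted variogram has no nugget, so it is worth checking the fit afterwards.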

PLS in R: Predicting new observations returns Fitted values instead

In the past few days I have developed multiple PLS models in R for spectral data (wavebands as explanatory variables) and various vegetation parameters (as individual response variables). In total, the dataset comprises 56 observations. The first 28 (training set) have been used for model calibration; now all I want to do is predict the response values for the remaining 28 observations in the test set. For some reason, however, R keeps returning the fitted values of the calibration set for a given number of components rather than predictions for the independent test set. Here is what the model looks like in short.
# first simulate some data
set.seed(123)
bands=101
data <- data.frame(matrix(runif(56*bands),ncol=bands))
colnames(data) <- paste0(1:bands)
data$height <- rpois(56,10)
data$fbm <- rpois(56,10)
data$nitrogen <- rpois(56,10)
data$carbon <- rpois(56,10)
data$chl <- rpois(56,10)
data$ID <- 1:56
data <- as.data.frame(data)
caldata <- data[1:28,] # define model training set
valdata <- data[29:56,] # define model testing set
# define explanatory variables (x)
spectra <- caldata[,1:101]
# build PLS model using training data only
library(pls)
refl.pls <- plsr(height ~ spectra, data = caldata, ncomp = 10, validation =
"LOO", jackknife = TRUE)
A model with 3 components was then found to give the best performance without over-fitting. Hence, the following command was used to predict the values of the 28 observations in the testing set using the above calibrated PLS model with 3 components:
predict(refl.pls, ncomp = 3, newdata = valdata)
Sensible as the output may seem, I soon discovered that all this piece of code generates are the fitted values of the PLS model for the calibration/training data, rather than predictions. I discovered this because the below code, in which newdata = is omitted, yields identical results.
predict(refl.pls, ncomp = 3)
Surely something must be going wrong, although I cannot seem to find out what specifically is. Is there someone out there who can, and is willing to help me move in the right direction?
I think the problem is with the nature of the input data. Looking at ?plsr and str(yarn) that goes with the example, plsr requires a very specific data frame that I find tricky to work with. The input data frame should have a matrix as one of its elements (in your case, the spectral data). I think the following works correctly (note I changed the size of the training set so that it wasn't half the original data, for troubleshooting):
library("pls")
set.seed(123)
bands=101
spectra = matrix(runif(56*bands),ncol=bands)
DF <- data.frame(spectra = I(spectra),
height = rpois(56,10),
fbm = rpois(56,10),
nitrogen = rpois(56,10),
carbon = rpois(56,10),
chl = rpois(56,10),
ID = 1:56)
class(DF$spectra) <- "matrix" # just to be certain, it was "AsIs"
str(DF)
DF$train <- rep(FALSE, 56)
DF$train[1:20] <- TRUE
refl.pls <- plsr(height ~ spectra, data = DF, ncomp = 10, validation =
"LOO", jackknife = TRUE, subset = train)
res <- predict(refl.pls, ncomp = 3, newdata = DF[!DF$train,])
Note that I got the spectral data into the data frame as a matrix by protecting it with I which equates to AsIs. There might be a more standard way to do this, but it works. As I said, to me a matrix inside of a data frame is not completely intuitive or easy to grok.
As to why your version didn't work quite right, I think the best explanation is that everything needs to be in the one data frame you pass to plsr for the data sources to be completely unambiguous.
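The difference between a matrix spread over many columns and a matrix stored as a single data frame element can be seen in a small base R sketch. The sizes and the names `wide` and `nested` are made up for illustration. As far as I can tell, in the question's version `spectra` lived in the workspace rather than in `caldata`, so predict() found no `spectra` in `newdata` and fell back on the training one.

```r
set.seed(123)
spectra <- matrix(runif(6 * 4), ncol = 4)  # 6 samples, 4 wavebands

# Without I(): the matrix is split into one column per waveband.
wide <- data.frame(spectra, height = rpois(6, 10))

# With I(): the whole matrix is kept as a single element named "spectra".
nested <- data.frame(spectra = I(spectra), height = rpois(6, 10))

ncol(wide)           # 5
ncol(nested)         # 2
dim(nested$spectra)  # 6 4

# Row-subsetting keeps the matrix element aligned with the other columns,
# so a test subset carries its own "spectra" for predict(newdata = ...).
test_rows <- nested[5:6, ]
dim(test_rows$spectra)  # 2 4
```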

Resources