Kriging with gstat : "Covariance matrix singular at location" with predict - r

I am trying to krige with gstat, but prediction always fails because of a problem with the covariance matrix. I never get estimates at the locations I want, because they are all skipped. I get the following warning message for each location:
1: In predict.gstat(g, newdata = newdata, block = block, nsim = nsim, :
Covariance matrix singular at location [-8.07794,48.0158,0]: skipping...
And all estimates are NA.
So far I've browsed many related StackOverflow threads, but none resolved my problems (https://gis.stackexchange.com/questions/222192/r-gstat-krige-covariance-matrix-singular-at-location-5-88-47-4-0-skipping ; https://gis.stackexchange.com/questions/200722/gstat-krige-error-covariance-matrix-singular-at-location-917300-3-6109e06-0 ; https://gis.stackexchange.com/questions/262993/r-gstat-predict-error?rq=1)
I checked that :
there is actually a spatial structure in my dataset (see bubble plot with code below)
there are no duplicate locations
the variogram model is not singular and has a good fit to the experimental variogram (see plot with code below)
I also tried several values of range, sill, nugget and all the models in the gstat library
The covariance matrix is positive definite and has positive eigenvalues. It is singular according to gstat, but not according to the is.singular.matrix function
There were enough pairs of points to compute the experimental variogram
How can I overcome this problem? Are there any tips for avoiding a singular covariance matrix? I also welcome any "best practice" for kriging.
Code (requires forSO.Rdata : https://www.dropbox.com/s/5vfj2gw9rkt365r/forSO.Rdata?dl=0 ) :
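library(dplyr)      # provides %>%
library(ggplot2)
library(sp)         # spDists, zerodist
library(gstat)
library(matrixcalc) # is.positive.definite, is.singular.matrix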
#Attached Rdata
load("forSO.Rdata")
#The observations
str(abun)
#Spatial structure
abun %>% as.data.frame %>%
ggplot(aes(lon, lat)) +
geom_point(aes(colour=prop_species_cells), alpha=3/4) +
coord_equal() + theme_bw()
#Number of pair of points
cvgm <- variogram(prop_species_cells ~1, data=abun, width=3, cutoff=300)
plot(cvgm$dist,cvgm$np)
#Fit a model covariogram
efitted = fit.variogram(cvgm, vgm(model="Mat", range=100, nugget=1), fit.method=7, fit.sills=TRUE, fit.ranges=TRUE)
plot(cvgm,efitted)
#No warning, and the model is non singular
attr(efitted, "singular")
#Covariance matrix (only on a small set of points, I have more than 25000 points) : positive-definite, positive eigenvalues and not singular
hex_pointsDegTiny=hex_pointsDeg
hex_pointsDegTiny@coords=hex_pointsDegTiny@coords[1:10,]
dists <- spDists(hex_pointsDegTiny)
covarianceMatrix=variogramLine(efitted, maxdist = max(cvgm$dist), n = 10*max(cvgm$dist), dir = c(1,0,0), dist_vector = dists, covariance = TRUE)
eigen(covarianceMatrix)$values
is.positive.definite(covarianceMatrix)
is.singular.matrix(covarianceMatrix)
# No duplicate locations
zerodist(hex_pointsDegTiny)
# Impossible to krig
OK_fit <- gstat(id = "OK_fit", formula = prop_species_cells ~ 1, data = abun, model = efitted)
dist <- predict(OK_fit, newdata = hex_pointsDegTiny)
dist@data

Actually, there were duplicate locations in the abun dataset (found with zerodist(abun)). I had been looking for them in the grid on which I wanted kriged estimates, when I should have been checking the observations. After getting rid of the duplicates, kriging worked fine.
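As a minimal sketch of that check, the snippet below builds a small, hypothetical observation set (the `obs` data frame and its values are invented for illustration, they are not the `abun` data), lists duplicate locations with `sp::zerodist()`, and drops them with `sp::remove.duplicates()`. Whether dropping the second observation is appropriate depends on the data, so treat this as one possible approach rather than the fix used above.

```r
library(sp)

# Hypothetical observations: rows 2 and 4 share the same coordinates
obs <- data.frame(
  lon   = c(-8.0, -7.5, -7.0, -7.5),
  lat   = c(48.0, 48.2, 48.4, 48.2),
  value = c(0.10, 0.30, 0.20, 0.35)
)
coordinates(obs) <- ~ lon + lat

# zerodist() returns one row per pair of points at identical locations
dup_pairs <- zerodist(obs)
print(dup_pairs)

# remove.duplicates() keeps the first point of each duplicated location
obs_clean <- remove.duplicates(obs)

# After cleaning, no zero-distance pairs should remain
n_remaining_pairs <- nrow(zerodist(obs_clean))
print(n_remaining_pairs)
```

The key point is to run the check on the object passed as `data` to `gstat()`, not on the prediction grid.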

Related

How to solve "impacts()" neighbors length error after running spdep::lagsarlm (Spatial Autoregressive Regression model)?

I have 9,150 polygons in my dataset. I was trying to run a spatial autoregressive model (SAR) in spdep to test spatial dependence of my outcome variable. After running the model, I wanted to examine the direct/indirect impacts, but encountered an error that seems to have something to do with the length of neighbors in the weights matrix not being equal to n.
I tried running the very same equation as SLX model (Spatial Lag X), and impacts() worked fine, even though there were some polygons in my set that had no neighbors. I Googled and looked at spdep documentation, but couldn't find a clue on how to solve this error.
# Defining queen contiguity neighbors for polyset and storing the matrix as list
q.nbrs <- poly2nb(polyset)
listweights <- nb2listw(q.nbrs, zero.policy = TRUE)
# Defining the model
model.equation <- TIME ~ A + B + C
# Run SAR model
reg <- lagsarlm(model.equation, data = polyset, listw = listweights, zero.policy = TRUE)
# Run impacts() to show direct/indirect impacts
impacts(reg, listw = listweights, zero.policy = TRUE)
Error in intImpacts(rho = rho, beta = beta, P = P, n = n, mu = mu, Sigma = Sigma, :
length(listweights$neighbours) == n is not TRUE
I know that this is a question from 2019, but maybe this can help people dealing with the same problem. In my case the problem was the type of the dataset: the object passed as data = polyset should be of class "SpatialPolygonsDataFrame". You can get that by converting your data:
polyset_spatial_sf <- sf::as_Spatial(polyset, IDs = polyset$ID)
Then rerun your code.

R non-linear model fitting using fitModel function

I want to fit a non-linear model to real data.
The real data consists of 2 known numerical vectors: thickness as 'x' and fh as 'y'
thickness=seq(0.15,2.00,by=0.05)
fh = c(5.17641, 4.20461, 3.31091, 2.60899, 2.23541, 1.97771, 1.88141, 1.62821, 1.50138, 1.51075, 1.40850, 1.26222, 1.09432, 1.13202, 1.12918, 1.10355, 1.11867, 1.09740,1.08324, 1.05687, 1.19422, 1.22984, 1.34516, 1.19713,1.25398 ,1.29885, 1.33658, 1.31166, 1.40332, 1.39550,1.37855, 1.41491, 1.59549, 1.56027, 1.63925, 1.72440, 1.74192, 1.82049)
plot(thickness,fh)
This is clearly non-linear, so I am trying to fit the data with the non-linear function
y= x*2/3+(2+2*a)/(3*x)
Variable a is an unknown constant, and I am trying to find the value of a that minimizes the sum of squared errors between the fitted curve and the real data.
I first used a function fitModel that I found on a YouTube video, Fitting Functions to Data in R.
library(TIMP)
f=fitModel(fh~thickness^2/3+(2+2*A)/(3*thickness)) #it finds the coefficient 'A'
coef(f) # to represent just the coefficient
However, there's an error
Error in modelspec[[datasetind[i]]] : subscript out of bounds
So, as an alternative, I want to plot the sum of squared errors against 'a'. I am having a hard time finding 'a' and producing this plot. Working by hand, I figured out that 'a' is somewhere near 0.2, but this is not a precise value.
It would be helpful if someone could explain either:
Why the fitModel function didn't work or
How to find the value a and plot the graph.
You could try this instead:
yf = function(a,xv) xv*(2/3)+(2+2*a)/(3*xv)
yf(2,thickness)
f <- function (a,y, xv) sum((y - yf(a,xv))^2)
f(2,fh,thickness)
xmin <- optimize(f, c(0, 10), tol = 0.0001, y=fh,xv=thickness)
xmin
plot(thickness,fh)
lines(thickness,yf(xmin$minimum,thickness),col=3)
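To also get the plot of the sum of squared errors against a that the question asks for, one option is to evaluate the same objective on a grid of candidate values. This is a sketch under assumptions: the grid range (-1 to 2) and step (0.001) are arbitrary choices, and the names `sse`, `a_grid`, `a_best` and `a_exact` are introduced here for illustration. Because the model is linear in a, the least-squares value also has a closed form, which is computed as a cross-check.

```r
thickness <- seq(0.15, 2.00, by = 0.05)
fh <- c(5.17641, 4.20461, 3.31091, 2.60899, 2.23541, 1.97771, 1.88141,
        1.62821, 1.50138, 1.51075, 1.40850, 1.26222, 1.09432, 1.13202,
        1.12918, 1.10355, 1.11867, 1.09740, 1.08324, 1.05687, 1.19422,
        1.22984, 1.34516, 1.19713, 1.25398, 1.29885, 1.33658, 1.31166,
        1.40332, 1.39550, 1.37855, 1.41491, 1.59549, 1.56027, 1.63925,
        1.72440, 1.74192, 1.82049)

# Model and sum of squared errors, as in the answer above
yf  <- function(a, xv) xv * (2 / 3) + (2 + 2 * a) / (3 * xv)
sse <- function(a, y, xv) sum((y - yf(a, xv))^2)

# Evaluate the SSE on a grid of candidate values (range and step are assumptions)
a_grid   <- seq(-1, 2, by = 0.001)
sse_grid <- vapply(a_grid, sse, numeric(1), y = fh, xv = thickness)
a_best   <- a_grid[which.min(sse_grid)]

# Closed-form least squares: y - yf(0, x) = a * z with z = 2 / (3 * x)
z       <- 2 / (3 * thickness)
a_exact <- sum(z * (fh - yf(0, thickness))) / sum(z^2)

plot(a_grid, sse_grid, type = "l", xlab = "a", ylab = "Sum of squared errors")
abline(v = a_best, lty = 2)  # mark the grid minimum
print(c(grid = a_best, exact = a_exact))
```

The grid search is only there to draw the curve; for the estimate itself, optimize() or the closed form is more precise.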

Calculating prediction accuracy of a tree using rpart's predict method

I have constructed a decision tree using rpart for a dataset.
I have then divided the data into 2 parts - a training dataset and a test dataset. A tree has been constructed for the dataset using the training data. I want to calculate the accuracy of the predictions based on the model that was created.
My code is shown below:
library(rpart)
#reading the data
data = read.table("source")
names(data) <- c("a", "b", "c", "d", "class")
#generating test and train data - Data selected randomly with a 80/20 split
trainIndex <- sample(1:nrow(data), 0.8 * nrow(data))
train <- data[trainIndex,]
test <- data[-trainIndex,]
#tree construction based on information gain
tree = rpart(class ~ a + b + c + d, data = train, method = 'class', parms = list(split = "information"))
I now want to calculate the accuracy of the model's predictions by comparing them with the actual values in the train and test data. However, I get an error when doing so.
My code is shown below:
t_pred = predict(tree,test,type="class")
t = test['class']
accuracy = sum(t_pred == t)/length(t)
print(accuracy)
I get an error message that states -
Error in t_pred == t : comparison of these types is not implemented
In addition: Warning message:
Incompatible methods ("Ops.factor", "Ops.data.frame") for "=="
On checking the type of t_pred, I found that it is of type integer; however, the documentation
(https://stat.ethz.ch/R-manual/R-devel/library/rpart/html/predict.rpart.html)
states that the predict() method must return a vector.
I do not understand why the type of the variable is integer and not a list. Where have I made the mistake and how can I fix it?
Try calculating the confusion matrix first:
confMat <- table(test$class,t_pred)
Now you can calculate the accuracy by dividing the sum diagonal of the matrix - which are the correct predictions - by the total sum of the matrix:
accuracy <- sum(diag(confMat))/sum(confMat)
My response is very similar to @mtoto's, but a bit simpler. I hope it also helps.
mean(test$class == t_pred)
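The reason both answers work is that they use test$class (a factor) rather than test['class'] (a one-column data frame), which is what triggered the "Incompatible methods" warning. A small self-contained illustration, using a made-up data frame and hand-written predictions in place of the rpart output:

```r
# Hypothetical test set and predictions (stand-ins for the rpart objects)
test <- data.frame(a = 1:4, class = factor(c("x", "y", "x", "y")))
t_pred <- factor(c("x", "y", "y", "y"), levels = levels(test$class))

# Single-bracket indexing keeps the data frame; $ extracts the factor column
class(test["class"])  # "data.frame"
class(test$class)     # "factor"

# Comparing factor with factor works element-wise
accuracy <- mean(t_pred == test$class)
print(accuracy)       # 0.75: three of the four predictions match
```

Note also that length() of a one-column data frame is 1 (the number of columns), so the original denominator length(t) would have been wrong even if the comparison had worked.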

Duplicate data when using gstat or automap package in R

I am trying to use ordinary kriging to spatially predict where an animal will occur based on predictor variables, using the gstat or automap package in R. I have many (over 100) duplicate coordinate points, which I cannot throw out since those stations were sampled multiple times over many years. Every time I run the ordinary kriging code below, I get an LDL error, which is due to the duplicate points. Does anyone know how to fix this problem without throwing out data? I have tried the code from the automap package that is supposed to correct for duplicates, but I can't get it to work. Thank you for the help!
coordinates(fish) <- ~ LONGITUDE+LATITUDE
x.range <- range(fish@coords[,1])
y.range <- range(fish@coords[,2])
grd <- expand.grid(x=seq(from=x.range[1], to=x.range[2], by=3), y=seq(from=y.range[1], to=y.range[2], by=3))
coordinates(grd) <- ~ x+y
plot(grd, pch=16, cex=.5)
gridded(grd) <- TRUE
library(gstat)
zerodist(fish) ###146 duplicate points
v <- variogram(log(WATER_TEMP) ~1, fish, na.rm=TRUE)
plot(v)
vgm()
f <- vgm(1, "Sph", 300, 0.5)
print(f)
v.fit <- fit.variogram(v,f)
plot(v, model=v.fit) ####In fit.variogram(v, d) : Warning: singular model in variogram fit
krg <- krige(log(WATER_TEMP) ~ 1, fish, grd, v.fit)
## [using ordinary kriging]
##"chfactor.c", line 131: singular matrix in function LDLfactor()
##Error in predict.gstat(g, newdata = newdata, block = block, nsim = nsim,: LDLfactor
##automap code for correcting for duplicates
fish.dup = rbind(fish, fish[1,]) # Create duplicate
coordinates(fish.dup) = ~LONGITUDE + LATITUDE
kr = autoKrige(WATER_TEMP, fish.dup, grd)
###Error in inherits(formula, "SpatialPointsDataFrame"):object 'WATER_TEMP' not found
###somehow my predictor variables are no longer available when in a Spatial Points Data Frame??
automap::autoKrige expects a formula as first argument, try
kr = autoKrige(WATER_TEMP~1, fish.dup, grd)
automap has a very simple fix for duplicate observations, and that is to discard them. So automap does not really solve the issue you have. I see some options:
Discard the duplicates.
Slightly perturb the coordinates of the duplicates so that they are not on exactly the same location anymore.
Perform space-time kriging using gstat.
In regard to your specific issue, please make your example reproducible. What I can guess is that rbind of your fish object is not doing what you expect...
Alternatively you can use the function jitterDupCoords of geoR package.
https://cran.r-project.org/web/packages/geoR/geoR.pdf
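The perturbation option can be sketched in base R without extra packages. This is an illustration only: the `stations` data frame is invented, and the jitter size `eps` is an assumption that should be chosen small relative to the real station spacing. It is not the algorithm used by geoR's jitterDupCoords.

```r
set.seed(1)

# Hypothetical stations: two locations were sampled more than once
stations <- data.frame(
  LONGITUDE  = c(10, 10, 13, 16, 16, 16),
  LATITUDE   = c(50, 50, 53, 56, 56, 56),
  WATER_TEMP = c(12.1, 12.8, 11.4, 9.9, 10.3, 10.1)
)

# Flag every repeat of a coordinate pair (the first occurrence is left alone)
is_dup <- duplicated(stations[, c("LONGITUDE", "LATITUDE")])

# Assumed jitter size, in coordinate units
eps <- 1e-4

# Shift only the repeated points by a small random offset
jittered <- stations
jittered$LONGITUDE[is_dup] <- jittered$LONGITUDE[is_dup] + runif(sum(is_dup), -eps, eps)
jittered$LATITUDE[is_dup]  <- jittered$LATITUDE[is_dup]  + runif(sum(is_dup), -eps, eps)

# No exact duplicates should remain, and no observation was discarded
n_dups_after <- sum(duplicated(jittered[, c("LONGITUDE", "LATITUDE")]))
print(n_dups_after)
```

All measurements are kept, at the cost of treating repeated visits to a station as very close but distinct locations. The kriging matrix can still be badly conditioned if the fitted variogram has no nugget, so this is a workaround rather than a model for the repeated sampling.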

R: Bad variogram fitting, bad kriging results

I am trying to krige data from Jakarta Bay. I have a set of measurement points with their coordinates and attributes (pH, salinity, ...)
To krige, I first need to find a model for my variogram. The output of the "variogram" function is not perfect but should be OK; however, when I try to fit the variogram I get a warning message: In fit.variogram(ph.vgm, model = vgm(0.12, "Sph", 0.1, 0.01)) :
Warning: singular model in variogram fit and I have a singular model.
Here I read about singular models associated to variogram calculations. Can I do something to make it better ?
How can I obtain a better fit for my variogram ? Why do I obtain only little circles around the measurement points ? I would like to have my full map with prediction values.
I also tried the "automap" library which is even less flexible and I don't obtain good results.
library(sp)
library(gstat)
library(automap)
x = c(11878417.51,11882987.17,11887690.42,11892582.91,11897119.18,11902527.08,11879348.14,11884237.29,11888933.86,11893819.67,11898835.73,11903940.84,11908386.94,11885529.71,11889836.66,11900118.13,11905765.37,11896037.16,11901234.67,11906244.04,11892136.86,11900822.56,11904493.1,11907692.42,11910346.05,11888709,11887268.41,11885237.28,11883450.38,11880668.5)
y = c(-668537.7429,-667290.838,-666043.9586,-663943.1247,-663992.3709,-662612.3726,-672878.6036,-672014.4364,-671960.7062,-669604.4601,-668541.1009,-667203.5333,-666181.6289,-676933.1896,-676566.0044,-673095.7667,-671736.8309,-679340.0992,-677788.4711,-676606.3051,-682542.446,-680607.5158,-680131.1539,-679733.0503,-662307.2774,-680754.1755,-681408.3272,-680494.7783,-680491.4197,-679426.19)
ph = c(7.1,7.76,7.14,7.19,7.56,7.56,7.11,8.14,7.22,7.17,7.33,7.37,7.36,7.23,7.12,7.54,7.96,7.98,7.96,7.2,7.44,7.36,7.71,7.71,8.01,7.73,8.11,7.03,7.26,7.77)
TSS = c(13.7,21,17.7,18.8,4.7,12.4,17.3,18.8,20.2,18.3,5.6,NA,NA,NA,21.9,11.1,NA,NA,21.2,29.1,31.3,29.3,21.3,25.4,31.8,14.5,2.9,11.7,8.4,NA)
df = data.frame(x,y,ph,TSS)
coordinates(df) = ~x+y
proj4string(df) <- CRS("+init=epsg:3857")
spplot(df)
grid <- data.frame(x=c(11877328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11912328.43,11877328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11877328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11912328.43,11877328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11912328.43,11877328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11912328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11879828.43,11882328.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11902328.43,11904828.43,11907328.43,11909828.43,11884828.43,11887328.43,11889828.43,11892328.43,11894828.43,11897328.43,11899828.43,11904828.43,11894828.43),y=c(-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-659719.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-662219.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518,-664719.4518
,-664719.4518,-664719.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-667219.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-669719.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-672219.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-674719.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-677219.4518,-679719.4518,-679719.4518,-679719.4518,-679719.4518,-679719.4518,-679719.4518,-679719.4518,-679719.4518,-682219.4518))
coordinates(grid)=~x+y
proj4string(grid) <- CRS("+init=epsg:3857")
gridded(grid)=T
spplot(grid)
ph.vgm <- variogram(ph~1, df[!is.na(df@data$ph),]); plot(ph.vgm)
ph.fit = fit.variogram(ph.vgm, model = vgm(0.12, "Sph", 4000, 0.01), warn.if.neg = T); ph.fit
plot(ph.vgm, ph.fit)
ph.kriged = krige(ph~1, df[!is.na(df@data$ph),], grid, model = ph.fit)
spplot(ph.kriged["var1.pred"])
And small sample.
If you want to see $0.5(Z(x)-Z(x+h))^2$ for every point pair, plotted against $h$, then use
plot(variogram(ph~1, df[!is.na(df@data$ph),], cloud=TRUE))
but this may also not make you happy. The bottom line is that you have very few observations (30), and with so few observations you essentially never get variograms that look nice.
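To make the quantity in the variogram cloud concrete, here is a base-R sketch that computes $0.5(Z(x)-Z(x+h))^2$ and the separation distance $h$ for every pair of points. The five points in `pts` are invented for illustration; with gstat, `variogram(..., cloud = TRUE)` produces the equivalent values for real data.

```r
# Hypothetical measurements: coordinates (x, y) and an attribute z
pts <- data.frame(
  x = c(0, 1, 3, 6, 10),
  y = c(0, 2, 1, 5, 4),
  z = c(7.1, 7.8, 7.2, 7.6, 8.0)
)

# dist() lists all pairs in the same order for both calls,
# so h[i] and gamma[i] refer to the same pair of points
h     <- as.vector(dist(pts[, c("x", "y")]))  # separation distance
gamma <- 0.5 * as.vector(dist(pts$z))^2       # half squared difference

cloud <- data.frame(h = h, gamma = gamma)
print(cloud)

plot(cloud$h, cloud$gamma, xlab = "h", ylab = "0.5 * (Z(x) - Z(x+h))^2")
```

With n points there are n(n-1)/2 pairs, so 30 observations give only 435 cloud points to average into distance bins, which is why the experimental variogram looks noisy.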

Resources