Duplicate data when using gstat or automap package in R - r

I am trying to using ordinary kriging to spatially predict data where an animal will occur based on predictor variables using the gstat or automap package in R. I have many (over 100) duplicate coordinate points, which I cannot throw out since those stations were sampled multiple times over many years. Every time that I run the code below for ordinary kriging, I get an LDL error, which is due to the duplicate points. Does anyone know how to fix this problem without throwing out data? I have tried the code from the automap package that is supposed to correct for duplicates but I can't get that to work. Thank you for the help!
coordinates(fish) <- ~ LONGITUDE+LATITUDE
x.range <- range(fish#coords[,1])
y.range <- range(fish#coords[,2])
grd <- expand.grid(x=seq(from=x.range[1], to=x.range[2], by=3), y=seq(from=y.range[1], to=y.range[2], by=3))
coordinates(grd) <- ~ x+y
plot(grd, pch=16, cex=.5)
gridded(grd) <- TRUE
library(gstat)
zerodist(fish) ###146 duplicate points
v <- variogram(log(WATER_TEMP) ~1, fish, na.rm=TRUE)
plot(v)
vgm()
f <- vgm(1, "Sph", 300, 0.5)
print(f)
v.fit <- fit.variogram(v,f)
plot(v, model=v.fit) ####In fit.variogram(v, d) : Warning: singular model in variogram fit
krg <- krige(log(WATER_TEMP) ~ 1, fish, grd, v.fit)
## [using ordinary kriging]
##"chfactor.c", line 131: singular matrix in function LDLfactor()Error in predict.gstat(g, newdata = newdata, block = block, nsim = nsim,: LDLfactor
##automap code for correcting for duplicates
fish.dup = rbind(fish, fish[1,]) # Create duplicate
coordinates(fish.dup) = ~LONGITUDE + LATITUDE
kr = autoKrige(WATER_TEMP, fish.dup, grd)
###Error in inherits(formula, "SpatialPointsDataFrame"):object 'WATER_TEMP' not found
###somehow my predictor variables are no longer available when in a Spatial Points Data Frame??

automap::autoKrige expects a formula as first argument, try
kr = autoKrige(WATER_TEMP~1, fish.dup, grd)

automaphas a very simple fix for duplicate observations, and that is to discard them. So, automapdoes not really solves the issue you have. I see some options:
Discard the duplicates.
Slightly perturb the coordinates of the duplicates so that they are not on exactly the same location anymore.
Perform space-time kriging using gstat.
In regard to your specific issue, please make your example reproducible. What I can guess is that rbind of your fish object is not doing what you expect...

Alternatively you can use the function jitterDupCoords of geoR package.
https://cran.r-project.org/web/packages/geoR/geoR.pdf

Related

Kriging with gstat : "Covariance matrix singular at location" with predict

I am trying to do an estimation by kriging with gstat, but can never achieve it because of an issue with the covariance matrix. I never have estimates on the locations I want, because they are all skipped. I have the following warning message, for each location :
1: In predict.gstat(g, newdata = newdata, block = block, nsim = nsim, :
Covariance matrix singular at location [-8.07794,48.0158,0]: skipping...
And all estimates are NA.
So far I've browsed many related StackOverflow threads, but none resolved my problems (https://gis.stackexchange.com/questions/222192/r-gstat-krige-covariance-matrix-singular-at-location-5-88-47-4-0-skipping ; https://gis.stackexchange.com/questions/200722/gstat-krige-error-covariance-matrix-singular-at-location-917300-3-6109e06-0 ; https://gis.stackexchange.com/questions/262993/r-gstat-predict-error?rq=1)
I checked that :
there is actually a spatial structure in my dataset (see bubble plot with code below)
there are no duplicate locations
the variogram model is not singular and has a good fit to the experimental variogram (see plot with code below)
I also tried several values of range, sill, nugget and all the models in the gstat library
The covariance matrix is positive definite and has positive eigen values. It is singular according to gstat, but not to is.singular.matrix function
There were enough pair of points to do the experimental variogram
How to overcome this problem? What tips to avoid a singular covariance matrix? I also welcome any "best practice" for kriging.
Code (requires forSO.Rdata : https://www.dropbox.com/s/5vfj2gw9rkt365r/forSO.Rdata?dl=0 ) :
library(ggplot2)
library(gstat)
#Attached Rdata
load("forSO.Rdata")
#The observations
str(abun)
#Spatial structure
abun %>% as.data.frame %>%
ggplot(aes(lon, lat)) +
geom_point(aes(colour=prop_species_cells), alpha=3/4) +
coord_equal() + theme_bw()
#Number of pair of points
cvgm <- variogram(prop_species_cells ~1, data=abun, width=3, cutoff=300)
plot(cvgm$dist,cvgm$np)
#Fit a model covariogram
efitted = fit.variogram(cvgm, vgm(model="Mat", range=100, nugget=1), fit.method=7, fit.sills=TRUE, fit.ranges=TRUE)
plot(cvgm,efitted)
#No warning, and the model is non singular
attr(efitted, "singular")
#Covariance matrix (only on a small set of points, I have more than 25000 points) : positive-definite, postiive eigen values and not singular
hex_pointsDegTiny=hex_pointsDeg
hex_pointsDegTiny#coords=hex_pointsDegTiny#coords[1:10,]
dists <- spDists(hex_pointsDegTiny)
covarianceMatrix=variogramLine(efitted, maxdist = max(cvgm$dist), n = 10*max(cvgm$dist), dir = c(1,0,0), dist_vector = dists, covariance = TRUE)
eigen(covarianceMatrix)$values
is.positive.definite(covarianceMatrix)
is.singular.matrix(covarianceMatrix)
# No duplicate locations
zerodist(hex_pointsDegTiny)
# Impossible to krig
OK_fit <- gstat(id = "OK_fit", formula = prop_species_cells ~ 1, data = abun, model = efitted)
dist <- predict(OK_fit, newdata = hex_pointsDegTiny)
dist#data
Actually, there were duplicate locations in abun dataset (zerodist(abun)), they were not to be seeked into the grid on which I wanted to krig estimates. After getting rid of the duplicates, kriging worked fine.

Cross-validation for kriging in R: how to include the trend while reestimating the variogram using xvalid?

I have a question very specific for the function xvalid (package geoR) in R which is used in spatial statistics only, so I hope it's not too specific for someone to be able to answer. In any case, suggestions for alternative functions/packages are welcome too.
I would like to compute a variogram, fit it, and then perform cross-validation. Function xvalid seems to work pretty nice to do the cross-validation. It works when I set reestimate=TRUE (so it reestimates the variogram for every point removed from the dataset in cross-validation) and it also works when using a trend. However, it does not seem to work when combining these two...
Here is an example using the classical Meuse dataset:
library(geoR)
library(sp)
data(meuse) # import data
coordinates(meuse) = ~x+y # make spatialpointsdataframe
meuse#proj4string <- CRS("+init=epsg:28992") # add projection
meuse_geo <- as.geodata(meuse) # create object of class geodata for geoR compatibility
meuse_geo$data <- meuse#data # attach all data (incl. covariates) to meuse_geo
meuse_vario <- variog(geodata=meuse_geo, data=meuse_geo$data$lead, trend= ~meuse_geo$data$elev) # variogram
meuse_vfit <- variofit(meuse_vario, nugget=0.1, fix.nugget=T) # fit
# cross-validation works fine:
xvalid(geodata=meuse_geo, data=meuse_geo$data$lead, model=meuse_vfit, variog.obj = meuse_vario, reestimate=F)
# cross-validation does not work when reestimate = T:
xvalid(geodata=meuse_geo, data=meuse_geo$data$lead, model=meuse_vfit, variog.obj = meuse_vario, reestimate=T)
The error I get is:
Error in variog(coords = cv.coords, data = cv.data, uvec = variog.obj$uvec, : coords and trend have incompatible sizes
It seems to remove the point from the dataset during cross-validation, but it doesn't seem to remove the point from the covariates/trend data. Any ideas on solving this or using a different package?
Thanks a lot in advance!

R: error with autofitVariogram (automap package)

Using autofitVariogram() function from automap package I have generate following error:
Error in vgm_list[[which.min(SSerr_list)]] : attempt to select less
than one element in get1index
Example code:
model <- as.formula(Value ~ Elevation)
data <- matrix(c(11.07,42.75,5,62.5,
8.73,45.62,234,75,
12.62,44.03,12,75,
10.87,45.38,67,75,
8.79,42.53,64,75),
nrow = 5, byrow = TRUE)
data <- as.data.frame(data)
names(data) <- c('Lon', 'Lat', 'Elevation', 'Value')
library('sp')
coordinates(data) = ~Lon+Lat
library('automap')
autofitVariogram(model, data)
What causes this error? Do interpolated values cause some kind of 'singularity'?
Thx!
This error is caused by the fact that gstat cannot generate an experimental variogram given this number of observations:
library(gstat)
library(sp)
data <- matrix(c(11.07,42.75,5,62.5,
8.73,45.62,234,75,
12.62,44.03,12,75,
10.87,45.38,67,75,
8.79,42.53,64,75),
nrow = 5, byrow = TRUE)
data <- as.data.frame(data)
names(data) <- c('Lon', 'Lat', 'Elevation', 'Value')
coordinates(data) = ~Lon+Lat
variogram(Value ~ Elevation, data)
## NULL
When given insufficient observations, gstat::variogram returns NULL. This in turn causes autofitVariogram to fail.
The solution is to simply have more data if you want to use kriging. A rule of thumb is that you need about 30 observations to generate a meaningful variogram to fit a variogram model to.
Recently, I also come across this problem. I find out the reason is that there are some Inf values in my data, and if I delete them, the package works well. Hope this could help you.

Using randomForest package in R, how to map Random forest prediction?

enter image description hereI am trying to use randomforest to generate a spatial prediction map.
I developed my model by using random forest regression, but I met a little difficulty in the last step to use the best predictors for building the predictive map. I want to create a map prediction map.
My code:
library(raster)
library(randomForest)
set.seed(12)
s <- stack("Density.tif", "Aqui.tif", "Rech.tif", "Rainfall.tif","Land Use.tif", "Cond.tif", "Nitrogen.tif", "Regions.tif","Soil.tif","Topo.tif", "Climatclass.tif", "Depth.tif")
points <- read.table("Coordonnées3.txt",header=TRUE, sep="\t", dec=",",strip.white=TRUE)
d <- extract(s, points)
rf <-randomForest(nitrate~ . , data=d, importance=TRUE, ntree=500, na.action = na.roughfix)
p <- predict(s, rf)
plot(p)
Sample Data:
> head(points)
LAT LONG
1 -13.057007 27.549580
2 -4.255000 15.233745
3 5.300000 -1.983610
4 7.245675 -4.233336
5 12.096330 15.036016
6 -4.255000 15.233745
The error when I run my short code is:
Error in eval(expr, envir, enclos) : object 'nitrate' not found.
I am guessing the error happens when you fit the model.
Why would there be a variable called nitrate. Given how you create your RasterStack, perhaps there is one called Nitrogen. Either way you can find out by looking at names(s) and colnames(d).
NOTE that your points are not good! They are in reverse order. The order should be (longitude, latitude).
Based on your comments (please edit your question instead), you should
add nitrate the points file (the third column) or something like that. Then do
xy <- points[, 2:1]
nitrate <- points[,3]
Extract points and combine with your observed data
d <- extract(s, xy)
d <- cbind(nitrate=nitrate, d)
Build model and predict
rf <-randomForest(nitrate~ . , data=d, importance=TRUE, ntree=500, na.action = na.roughfix)
p <- predict(s, rf)
It sounds like the error is coming when you are trying to build the forest. It may be most helpful to not use the formula interface. Also, if d is large, then using the formula interface is not advisable. From the help file on randomForest: "For large data sets, especially those with large number of variables, calling randomForest via the formula interface is not advised: There may be too much overhead in handling the formula."
Assuming d$nitrate exists then the solution is randomForest(y = d$nitrate, x = subset(d, select = -nitrate), importance=TRUE, ntree=500, na.action = na.roughfix)

Time series modelling: "train" function with method "nnet" is not giving satisfactory result

I was trying to implement the use of train function in R using nnet as method on monthly consumption data. But the output (the predicted values) are all showing to be equal to some mean value.
I have data for 24 time points (each representing a month's data) and I have used first 20 for training and the rest 4 for testing the model. Here is my code:
a<-read.csv("...",header=TRUE)
tem<-a[,5]
hum<-a[,4]
con<- a[,3]
require(quantmod)
require(nnet)
require(caret)
y<-con
plot(con,type="l")
dat <- data.frame( y, x1=tem, x2=hum)
names(dat) <- c('y','x1','x2')
#Fit model
model <- train(y ~ x1+x2,
dat[1:20,],
method='nnet',
linout=TRUE,
trace = FALSE)
ps <- predict(model2, dat[21:24,])
plot(1:24,y,type="l",col = 2)
lines(1:24,c(y[1:20],ps), col=3,type="o")
legend(5, 70, c("y", "pred"), cex=1.5, fill=2:3)
Any suggestion on how can I approach this problem alternatively? Is there any way to use Neural Network more efficiently? Or is there any other better method for this?
The problem is likely to be not enough data. 24 data points is quite low, for any machine learning problem. If the curve/shape/surface of the data is eg a simple sin wave, then 24 would be enough.
But for any more complex function, the more data the better. Can you accurately model eg sin^2 x * cos^0.3 x / sinh x with only 6 data points? No, because the available data does not capture enough detail.
If you can acquire daily data, use that instead.

Resources