Cross-validation for kriging in R: how to include the trend while reestimating the variogram using xvalid?

I have a question very specific to the function xvalid (package geoR) in R, which is used in spatial statistics only, so I hope it's not too specific for someone to be able to answer. In any case, suggestions for alternative functions/packages are welcome too.
I would like to compute a variogram, fit it, and then perform cross-validation. The function xvalid seems to work well for the cross-validation. It works when I set reestimate=TRUE (so it reestimates the variogram for every point removed from the dataset during cross-validation), and it also works when using a trend. However, it does not seem to work when combining the two...
Here is an example using the classical Meuse dataset:
library(geoR)
library(sp)
data(meuse) # import data
coordinates(meuse) = ~x+y # make SpatialPointsDataFrame
meuse@proj4string <- CRS("+init=epsg:28992") # add projection
meuse_geo <- as.geodata(meuse) # create object of class geodata for geoR compatibility
meuse_geo$data <- meuse@data # attach all data (incl. covariates) to meuse_geo
meuse_vario <- variog(geodata=meuse_geo, data=meuse_geo$data$lead, trend= ~meuse_geo$data$elev) # variogram
meuse_vfit <- variofit(meuse_vario, nugget=0.1, fix.nugget=T) # fit
# cross-validation works fine:
xvalid(geodata=meuse_geo, data=meuse_geo$data$lead, model=meuse_vfit, variog.obj = meuse_vario, reestimate=F)
# cross-validation does not work when reestimate = T:
xvalid(geodata=meuse_geo, data=meuse_geo$data$lead, model=meuse_vfit, variog.obj = meuse_vario, reestimate=T)
The error I get is:
Error in variog(coords = cv.coords, data = cv.data, uvec = variog.obj$uvec, : coords and trend have incompatible sizes
It seems to remove the point from the dataset during cross-validation, but it doesn't seem to remove the point from the covariates/trend data. Any ideas on solving this or using a different package?
Thanks a lot in advance!
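One possible workaround (a minimal sketch, not a built-in fix for xvalid): run the leave-one-out loop manually so that the response and the elevation covariate are subset together before the variogram is reestimated; all object names below come from the code above.
n <- nrow(meuse_geo$coords)
pred <- numeric(n)
for (i in seq_len(n)) {
  coords_i <- meuse_geo$coords[-i, ]         # drop point i from the coordinates
  lead_i   <- meuse_geo$data$lead[-i]        # ... from the response
  elev_i   <- meuse_geo$data$elev[-i]        # ... and from the trend covariate
  elev_out <- meuse_geo$data$elev[i]         # covariate value at the held-out location
  v_i  <- variog(coords = coords_i, data = lead_i, trend = ~elev_i)   # reestimate variogram
  vf_i <- variofit(v_i, nugget = 0.1, fix.nugget = TRUE)              # refit
  k_i  <- krige.conv(coords = coords_i, data = lead_i,
                     locations = meuse_geo$coords[i, , drop = FALSE],
                     krige = krige.control(obj.model = vf_i,
                                           trend.d = ~elev_i,
                                           trend.l = ~elev_out))
  pred[i] <- k_i$predict
}
cv_errors <- meuse_geo$data$lead - pred      # cross-validation residuals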

Related

SHAP plots for random forest models

I would like to get the SHAP contributions for the variables of a ranger/random forest model and have plots like this in R:
beeswarm plots
I have tried using the following libraries: DALEX, shapr, fastshap, shapper. I could only end up getting plots like this:
fastshap plot
Is it possible to get such plots? I have tried the reticulate package and it still doesn't work.
Random forests need to grow many deep trees. While possible, crunching TreeSHAP for deep trees requires an awful lot of memory and CPU power. An alternative is to use the Kernel SHAP algorithm, which works for all kinds of models.
library(ranger)
library(kernelshap)
library(shapviz)
set.seed(1)
fit <- ranger(Sepal.Length ~ ., data = iris)
# Step 1: Calculate Kernel SHAP values
# bg_X is usually a small (50-200 rows) subset of the data
s <- kernelshap(fit, iris[-1], bg_X = iris)
# Step 2: Turn them into a shapviz object
sv <- shapviz(s)
# Step 3: Gain insights...
sv_importance(sv, kind = "bee")
sv_dependence(sv, v = "Petal.Length", color_var = "auto")
Disclaimer: I wrote "kernelshap" and "shapviz"

how to make a proper grid in case of a GAM + OK kriging method?

I have been struggling with a Kriging + GAM problem for a while and hope someone will be able to help me.
I am trying to interpolate a pollutant over an area. To do so, I use ordinary kriging from gstat in addition to a GAM, based on this method (the link where I found it is below):
GAM <- gam(formula = pollutant ~ 1+ s(Long, Lat) + s(Individual, bs="re"), data = dataset)
GAMPredictions <- predict(mydataset, GAM, type="response")
residKrigMap <- krige(formula = residuals(GAM) ,
locations = mydataset,
model = myVariogram,
newdata = MyGrid)
residKrigRstLayer <- as(residKrigMap, "RasterLayer")
gamKrigMap <- GAMPredictions + residKrigRstLayer
All the kriging part works fine, but when I try to do this:
gamKrigMap <- GAMPredictions + residKrigRstLayer
it does not work, and I am quite sure it's because of the grid I am doing my interpolation on. Indeed, I am using an .rds file containing the areas I want to interpolate over, but you can't make the prediction on such a file, as it does not contain the information from the dataset. Also, I have 760 observations in my dataset but over 6000 points in my grid, so there is a mismatch.
From what I have been reading so far, I should create a grid based on my dataset.
So my questions are:
how can I create a grid from my dataset with R? All the methods I see involve an existing raster or shapefile that I don't have (or don't know how to create) — a rough sketch is given after the link below.
from there, how can I add country boundaries? (The interpolation is made in the North Atlantic and I want to add the countries surrounding the area.)
Of course let me know if you need more info.
Best,
Céline
link: https://www.r-exercises.com/2018/03/31/advanced-techniques-with-raster-data-part-3-regression-kriging/
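A minimal sketch for the first question, assuming Long/Lat coordinate columns as in the GAM formula above; the grid is built from the data's own bounding box, so no pre-existing raster or shapefile is needed, and the 0.5-degree cell size is an arbitrary placeholder.
library(sp)
coordinates(dataset) <- ~Long + Lat                      # make the observations spatial
bb <- bbox(dataset)                                      # bounding box of the data
MyGrid <- expand.grid(
  Long = seq(bb["Long", "min"], bb["Long", "max"], by = 0.5),
  Lat  = seq(bb["Lat",  "min"], bb["Lat",  "max"], by = 0.5)
)
coordinates(MyGrid) <- ~Long + Lat
gridded(MyGrid) <- TRUE                                  # SpatialPixels, usable as newdata in krige()
For the second question, country outlines surrounding the area could then be overlaid, for example with rnaturalearth::ne_countries(scale = "medium", returnclass = "sp") and plot(..., add = TRUE).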

How to apply universal kriging with custom prediction spatial grid using autoKrige in R

I want to apply universal kriging on a dataset using the autoKrige function in R. I would like to create my own custom spatial grid for the predicted points (for the new_data argument of autoKrige). I am using R version 3.2.2 (64-bit) and RStudio Version 0.99.486. The following is what I've done so far:
library(automap)
library(sp)
library(gstat)
library(raster)
library(rgdal)
data(meuse)
coordinates(meuse) <- ~x + y
proj4string(meuse) <- CRS("+init=epsg:28992")
The following code was received from stackexchange here (credit goes to Jeffrey Evans) and is used to create a custom spatial grid for the prediction values:
ext_meuse <- as(extent(meuse), "SpatialPolygons")
r_meuse <- rasterToPoints(raster(ext_meuse, resolution = 59), spatial = TRUE)
proj4string(r_meuse) <- proj4string(meuse)
I then try to apply universal kriging (regression on the 'dist' column) using autoKrige:
kriging_result = autoKrige(zinc~dist, meuse, r_meuse)
The following error is then received:
Error in model.frame.default(terms.f, newdata, na.action = na.action, :
  object is not a matrix
In addition: Warning message:
'newdata' had 3102 rows but variable found had 1 row
Did I make a mistake with the grid creation (r_meuse)? Is there a 'better' way to create a grid for the predicted data? All the examples I have found so far use the meuse.grid data, but I would like to apply universal kriging to other data that does not have its own grid data yet.
I believe the problem here is that you are performing UK without having the predictor, dist, present in r_meuse. This is a problem because that information is needed by the linear (trend) part of the model to make a prediction. So, r_meuse needs to be a SpatialPointsDataFrame with dist defined.
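One rough way to do that for this example (a sketch, not the only option): the packaged meuse.grid already carries dist, so a nearest-neighbour transfer of that column onto the custom grid gives r_meuse the predictor that the universal kriging model needs.
data(meuse.grid)
coordinates(meuse.grid) <- ~x + y
proj4string(meuse.grid) <- CRS("+init=epsg:28992")
dist_nn <- idw(dist ~ 1, meuse.grid, r_meuse, nmax = 1)    # nearest-neighbour transfer of 'dist'
r_meuse <- SpatialPointsDataFrame(r_meuse, data = data.frame(dist = dist_nn$var1.pred))
kriging_result <- autoKrige(zinc ~ dist, meuse, r_meuse)   # 'dist' is now found in the prediction grid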

Duplicate data when using gstat or automap package in R

I am trying to use ordinary kriging to spatially predict data where an animal will occur based on predictor variables, using the gstat or automap package in R. I have many (over 100) duplicate coordinate points, which I cannot throw out since those stations were sampled multiple times over many years. Every time I run the code below for ordinary kriging, I get an LDLfactor error, which is due to the duplicate points. Does anyone know how to fix this problem without throwing out data? I have tried the code from the automap package that is supposed to correct for duplicates, but I can't get that to work. Thank you for the help!
coordinates(fish) <- ~ LONGITUDE+LATITUDE
x.range <- range(fish@coords[,1])
y.range <- range(fish@coords[,2])
grd <- expand.grid(x=seq(from=x.range[1], to=x.range[2], by=3), y=seq(from=y.range[1], to=y.range[2], by=3))
coordinates(grd) <- ~ x+y
plot(grd, pch=16, cex=.5)
gridded(grd) <- TRUE
library(gstat)
zerodist(fish) ###146 duplicate points
v <- variogram(log(WATER_TEMP) ~1, fish, na.rm=TRUE)
plot(v)
vgm()
f <- vgm(1, "Sph", 300, 0.5)
print(f)
v.fit <- fit.variogram(v,f)
plot(v, model=v.fit) ####In fit.variogram(v, d) : Warning: singular model in variogram fit
krg <- krige(log(WATER_TEMP) ~ 1, fish, grd, v.fit)
## [using ordinary kriging]
##"chfactor.c", line 131: singular matrix in function LDLfactor()Error in predict.gstat(g, newdata = newdata, block = block, nsim = nsim,: LDLfactor
##automap code for correcting for duplicates
fish.dup = rbind(fish, fish[1,]) # Create duplicate
coordinates(fish.dup) = ~LONGITUDE + LATITUDE
kr = autoKrige(WATER_TEMP, fish.dup, grd)
###Error in inherits(formula, "SpatialPointsDataFrame"):object 'WATER_TEMP' not found
###somehow my predictor variables are no longer available when in a Spatial Points Data Frame??
automap::autoKrige expects a formula as first argument, try
kr = autoKrige(WATER_TEMP~1, fish.dup, grd)
automap has a very simple fix for duplicate observations, and that is to discard them. So automap does not really solve the issue you have. I see some options:
Discard the duplicates.
Slightly perturb the coordinates of the duplicates so that they are not on exactly the same location anymore.
Perform space-time kriging using gstat.
In regard to your specific issue, please make your example reproducible. What I can guess is that rbind of your fish object is not doing what you expect...
Alternatively, you can use the function jitterDupCoords from the geoR package (a small sketch follows below the link).
https://cran.r-project.org/web/packages/geoR/geoR.pdf
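A small sketch of the jitter option (option 2 above), assuming the fish SpatialPointsDataFrame from the question; the jitter distance (max = 0.01, in coordinate units) is an arbitrary placeholder.
library(geoR)
xy       <- coordinates(fish)
xy_jit   <- jitterDupCoords(xy, max = 0.01)    # only duplicated locations are moved
fish_jit <- SpatialPointsDataFrame(xy_jit, data = fish@data)
zerodist(fish_jit)                             # should now report no duplicates
v <- variogram(log(WATER_TEMP) ~ 1, fish_jit, na.rm = TRUE)   # kriging can proceed as before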

PCA using raster datasets in R

I have several large rasters that I want to process in a PCA (to produce summary rasters).
I have seen several examples whereby people seem to be simply calling prcomp or princomp. However, when I do this, I get the following error message:
Error in as.vector(data): no method for coercing this S4 class to a vector
Example code:
files<-list.files() # a set of rasters
layers<-stack(files) # using the raster package
pca<-prcomp(layers)
I have tried using a raster brick instead of a stack but that doesn't seem to be the issue. What method do I need to provide to the command so that it can convert the raster data to vector format? I understand that there are ways to sample the raster and run the PCA from that, but I would really like to understand why the above method is not working.
Thanks!
The above method is not working simply because prcomp does not know how to deal with a raster object. It only knows how to deal with numeric matrices or data frames, and coercing the raster S4 object to a vector does not work, hence the error.
What you need to do is read each of your files into a vector, and put each of the rasters in a column of a matrix. Each row will then be a time series of values at a single spatial location, and each column will be all the pixels at a certain time step. Note that the exact spatial coordinates are not needed in this approach. This matrix serves as the input of prcomp.
Reading the files can be done with readGDAL, followed by as.data.frame to cast the spatial data to a data.frame; a minimal sketch is given below.
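A minimal sketch of that matrix-building step (the .tif file pattern and the band1 column name are assumptions for single-band rasters):
library(rgdal)
files <- list.files(pattern = "\\.tif$")                   # the rasters to combine
cols  <- lapply(files, function(f) as.data.frame(readGDAL(f))$band1)
m     <- do.call(cbind, cols)                              # rows = pixels, columns = rasters
colnames(m) <- files
pca <- prcomp(m, scale. = TRUE)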
Answer to my own question: I ended up doing something slightly different: rather than using every raster cell as input (very large dataset), I took a sample of points, ran the PCA and then saved the output model so that I could make predictions for each grid cell…maybe not the best solution but it works:
rasters <- stack(myRasters)
sr <- sampleRandom(rasters, 5000) # sample 5000 random grid cells
# run PCA on random sample with correlation matrix
# retx=FALSE means don't save PCA scores
pca <- prcomp(sr, scale=TRUE, retx=FALSE)
# write PCA model to file
dput(pca, file=paste("./climate/", name, "/", name, "_pca.csv", sep=""))
x <- predict(rasters, pca, index=1:6) # create new rasters based on PCA predictions
There is a rasterPCA function in the RStoolbox package: http://bleutner.github.io/RStoolbox/rstbx-docu/rasterPCA.html
For example:
library('raster')
library('RStoolbox')
rasters <- stack(myRasters)
pca1 <- rasterPCA(rasters)
pca2 <- rasterPCA(rasters, nSamples = 5000) # sample 5000 random grid cells
pca3 <- rasterPCA(rasters, norm = FALSE) # without normalization
here is a working solution:
library(raster)
filename <- system.file("external/rlogo.grd", package="raster")
r1 <- stack(filename)
pca<-princomp(r1[], cor=T)
res<-predict(pca,r1[])
Display result:
r2 <- raster(filename)
r2[]<-res[,1]
plot(r2)
Yet another option would be to extract the values from the raster stack, i.e.:
rasters <- stack(my_rasters)
values <- getValues(rasters)
pca <- prcomp(values, scale = TRUE)
Here is another approach that expands on the getValues approach proposed by @Daniel. The result is a raster stack. The index (idx) references non-NA positions so that NA values are accounted for.
library(raster)
r <- stack(system.file("external/rlogo.grd", package="raster"))
r.val <- getValues(r)
idx <- which(!is.na(r.val))
pca <- princomp(r.val, cor=T)
ncomp <- 2 # first two principal components
r.pca <- r[[1:ncomp]]
for(i in 1:ncomp) { r.pca[[i]][idx] <- pca$scores[,i] }
plot(r.pca)
