Issue with cross-validation using the automap package - r

I want to do a cross-validation for the ca20-Dataset from the geoR
package. With for example the meuse-dataset, this works fine, but for
this dataset, I encounter a strange problem with the dimensions of the
SpatialPointsDataFrame. Maybe you can try this for yourself and explain
why the autoKrige.cv function does not work (I tried several
nfold-values but this only changes the locations-value of the error
message...):
library(geoR)
library(gstat)
library(automap)
data(ca20)
east=ca20$coords[,1]
north=ca20$coords[,2]
concentration=ca20$data
frame=data.frame(east,north)
data=data.frame(concentration)
points<-SpatialPoints(data.frame(east,north),proj4string=CRS(as.character(NA)))
pointsframe<-SpatialPointsDataFrame(points,data, coords.nrs = numeric(0),proj4string = CRS(as.character(NA)), match.ID = TRUE)
krig=autoKrige(pointsframe$concentration~1,pointsframe)
plot(krig)
cv=autoKrige.cv(pointsframe$concentration~1,pointsframe)
I hope someone can reproduce the problem, my R version is 2.15, all packages are up to date (at least not older than a month or so...).
Thanks for your help!!

First, the way you build your SpatialPointsDataFrame can be done more easily:
library(geoR)
library(gstat)
library(automap)
...and build the SPDF:
pointsframe = data.frame(ca20$coords)
pointsframe$concentration = ca20$data
coordinates(pointsframe) = c("east", "north")
The problem you have is in how you use the formula argument. You add the spatial object pointsframe to the formula, in essence putting a vector directly into the formula. You should just use the column name in the formula, like this:
cv=autoKrige.cv(concentration~1,pointsframe)
and it works:
> summary(cv)
[,1]
mean_error -0.01134
me_mean -0.0002237
MAE 6.02
MSE 60.87
MSNE 1.076
cor_obspred 0.7081
cor_predres 0.01343
RMSE 7.802
RMSE_sd 0.7041
URMSE 7.802
iqr 9.519

Related

xgboost in R providing unexpected prediction

Below is a code that produces a simple xgboost model to show the issue I've been seeing. Once the model has been built, we predict using this model and take the second row in our data. If we take the log of relative difference between prediction of the 10th and 9th model, it should give us the prediction for the 10th tree: 0.00873184 in this case.
Now if we use the input to the tree (matrix "a" which has value 0.1234561702 for row 2) and run through the model, we expect a prediction of 0.0121501638. However, it looks like after the second split (<0.123456173) it takes the wrong direction and ends up at the node with 0.00873187464 - very close to what we expect!
Does anyone have an idea what is going on?
10th Tree
Versions:
R: 4.1.0
xgboost: 1.4.1.1
dplyr: 1.0.7
data.table: 1.14.0
library(xgboost)
library(dplyr)
library(data.table)
set.seed(2)
a <- matrix(runif(1000,0.1234561,0.1234562),
ncol=1,nrow=1000)
colnames(a) <- c("b")
d <- abs(rnorm(1000,3*a[,1]))
d2 <- xgb.DMatrix(data = a,label = d)
e <- xgboost::xgboost(data=d2,nrounds=10,method="hist",objective="reg:gamma")
xgb.plot.tree(e$feature_names,e,trees=9)
x <- 2
log((predict(e,a,ntreelimit = 10)/predict(e,a,ntreelimit = 9)))[x]
format(a[x,],nsmall=10)
For anyone interested in the answer, the xgboost team provided it here:
https://github.com/dmlc/xgboost/issues/7294
In short, xgboost converts the input data into float32 before training whereas R uses double by default. Hence, what should be done is convert 0.1234561702 to float32 before running through the model. Doing that gives the value 0.123456173 which now takes the right path.

Creating a confusion matrix for multiclass classification in keras using R

I am running a Bidirectional LSTM for multiclass text classification in R using Keras. I have run my model and I need to create a confusion matrix. I tried using predict_classes() but my RStudio threw an error that predict_classes() was deprecated. I tried to use this bit of code that I found on the RStudio Keras website:
prediction1 <- model %>%
predict(x.test) %>%
k_argmax(axis = -1)
NOTE: x.test is my matrix that contains the text features.
I am not sure how to use it + I have not found any examples of how to use it online so I am quite confused. I would appreciate any help that anyone could provide!
Thanks
You can use 'caret' library to achieve that.
#Install required packages
install.packages('caret')
#Import required library
library(caret)
#Creates vectors having data points
expected_value <- factor(c(1,0,1,0,1,1,1,0,0,1))
predicted_value <- factor(c(1,0,0,1,1,1,0,0,0,1))
#Creating confusion matrix
example <- confusionMatrix(data=predicted_value, reference = expected_value)
#Display results
example
Or the table function:
pred <- model %>% predict(x_test, batch_size = batch_size)
y_pred = round(pred)
# Confusion matrix
confusion_matrix = table(y_pred, y_test)
For the 'caret' example:
https://www.journaldev.com/46732/confusion-matrix-in-r

Sensitivity and Specificity in R

I want to know how i can write a functions Sensitivity() and Specificity() that help me to compute Sensitivity and Specificity by using R ? What options can help me?
Here is a method using the caret package and it includes a reproducible example (i.e. a bit of code that someone can quickly run to help you out) from the help files of the caret package. #llottmanhill is correct that you will get more help when you tell us what you are trying to do. Right now your question is quite vague. However, give this a shot:
library(caret)
library(MASS)
fit <- lda(Species ~ ., data = iris)
model <- predict(fit)$class
irisTabs <- table(model, iris$Species)
## When passing factors, an error occurs with more
## than two levels
sensitivity(model, iris$Species)
## When passing a table, more than two levels can
## be used
sensitivity(irisTabs, "versicolor")
specificity(irisTabs, c("setosa", "virginica"))

Cross-validation for kriging in R: how to include the trend while reestimating the variogram using xvalid?

I have a question very specific for the function xvalid (package geoR) in R which is used in spatial statistics only, so I hope it's not too specific for someone to be able to answer. In any case, suggestions for alternative functions/packages are welcome too.
I would like to compute a variogram, fit it, and then perform cross-validation. Function xvalid seems to work pretty nice to do the cross-validation. It works when I set reestimate=TRUE (so it reestimates the variogram for every point removed from the dataset in cross-validation) and it also works when using a trend. However, it does not seem to work when combining these two...
Here is an example using the classical Meuse dataset:
library(geoR)
library(sp)
data(meuse) # import data
coordinates(meuse) = ~x+y # make spatialpointsdataframe
meuse#proj4string <- CRS("+init=epsg:28992") # add projection
meuse_geo <- as.geodata(meuse) # create object of class geodata for geoR compatibility
meuse_geo$data <- meuse#data # attach all data (incl. covariates) to meuse_geo
meuse_vario <- variog(geodata=meuse_geo, data=meuse_geo$data$lead, trend= ~meuse_geo$data$elev) # variogram
meuse_vfit <- variofit(meuse_vario, nugget=0.1, fix.nugget=T) # fit
# cross-validation works fine:
xvalid(geodata=meuse_geo, data=meuse_geo$data$lead, model=meuse_vfit, variog.obj = meuse_vario, reestimate=F)
# cross-validation does not work when reestimate = T:
xvalid(geodata=meuse_geo, data=meuse_geo$data$lead, model=meuse_vfit, variog.obj = meuse_vario, reestimate=T)
The error I get is:
Error in variog(coords = cv.coords, data = cv.data, uvec = variog.obj$uvec, : coords and trend have incompatible sizes
It seems to remove the point from the dataset during cross-validation, but it doesn't seem to remove the point from the covariates/trend data. Any ideas on solving this or using a different package?
Thanks a lot in advance!

Duplicate data when using gstat or automap package in R

I am trying to using ordinary kriging to spatially predict data where an animal will occur based on predictor variables using the gstat or automap package in R. I have many (over 100) duplicate coordinate points, which I cannot throw out since those stations were sampled multiple times over many years. Every time that I run the code below for ordinary kriging, I get an LDL error, which is due to the duplicate points. Does anyone know how to fix this problem without throwing out data? I have tried the code from the automap package that is supposed to correct for duplicates but I can't get that to work. Thank you for the help!
coordinates(fish) <- ~ LONGITUDE+LATITUDE
x.range <- range(fish#coords[,1])
y.range <- range(fish#coords[,2])
grd <- expand.grid(x=seq(from=x.range[1], to=x.range[2], by=3), y=seq(from=y.range[1], to=y.range[2], by=3))
coordinates(grd) <- ~ x+y
plot(grd, pch=16, cex=.5)
gridded(grd) <- TRUE
library(gstat)
zerodist(fish) ###146 duplicate points
v <- variogram(log(WATER_TEMP) ~1, fish, na.rm=TRUE)
plot(v)
vgm()
f <- vgm(1, "Sph", 300, 0.5)
print(f)
v.fit <- fit.variogram(v,f)
plot(v, model=v.fit) ####In fit.variogram(v, d) : Warning: singular model in variogram fit
krg <- krige(log(WATER_TEMP) ~ 1, fish, grd, v.fit)
## [using ordinary kriging]
##"chfactor.c", line 131: singular matrix in function LDLfactor()Error in predict.gstat(g, newdata = newdata, block = block, nsim = nsim,: LDLfactor
##automap code for correcting for duplicates
fish.dup = rbind(fish, fish[1,]) # Create duplicate
coordinates(fish.dup) = ~LONGITUDE + LATITUDE
kr = autoKrige(WATER_TEMP, fish.dup, grd)
###Error in inherits(formula, "SpatialPointsDataFrame"):object 'WATER_TEMP' not found
###somehow my predictor variables are no longer available when in a Spatial Points Data Frame??
automap::autoKrige expects a formula as first argument, try
kr = autoKrige(WATER_TEMP~1, fish.dup, grd)
automaphas a very simple fix for duplicate observations, and that is to discard them. So, automapdoes not really solves the issue you have. I see some options:
Discard the duplicates.
Slightly perturb the coordinates of the duplicates so that they are not on exactly the same location anymore.
Perform space-time kriging using gstat.
In regard to your specific issue, please make your example reproducible. What I can guess is that rbind of your fish object is not doing what you expect...
Alternatively you can use the function jitterDupCoords of geoR package.
https://cran.r-project.org/web/packages/geoR/geoR.pdf

Resources