R: error with autofitVariogram (automap package)

Using the autofitVariogram() function from the automap package, I get the following error:
Error in vgm_list[[which.min(SSerr_list)]] : attempt to select less
than one element in get1index
Example code:
model <- as.formula(Value ~ Elevation)
data <- matrix(c(11.07, 42.75, 5, 62.5,
                 8.73, 45.62, 234, 75,
                 12.62, 44.03, 12, 75,
                 10.87, 45.38, 67, 75,
                 8.79, 42.53, 64, 75),
               nrow = 5, byrow = TRUE)
data <- as.data.frame(data)
names(data) <- c('Lon', 'Lat', 'Elevation', 'Value')
library('sp')
coordinates(data) = ~Lon+Lat
library('automap')
autofitVariogram(model, data)
What causes this error? Do interpolated values cause some kind of 'singularity'?
Thx!

This error is caused by the fact that gstat cannot generate an experimental variogram given this number of observations:
library(gstat)
library(sp)
data <- matrix(c(11.07, 42.75, 5, 62.5,
                 8.73, 45.62, 234, 75,
                 12.62, 44.03, 12, 75,
                 10.87, 45.38, 67, 75,
                 8.79, 42.53, 64, 75),
               nrow = 5, byrow = TRUE)
data <- as.data.frame(data)
names(data) <- c('Lon', 'Lat', 'Elevation', 'Value')
coordinates(data) = ~Lon+Lat
variogram(Value ~ Elevation, data)
## NULL
When given too few observations, gstat::variogram returns NULL. This in turn causes autofitVariogram to fail.
The solution is simply to use more data if you want to use kriging. A rule of thumb is that you need about 30 observations to generate a meaningful experimental variogram to fit a variogram model to.
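As a rough sketch (simulated data, not the data from the question), with around 50 observations in the same lon/lat range the experimental variogram can be computed and autofitVariogram runs without error:
library(sp)
library(automap)
set.seed(1)
n <- 50
sim <- data.frame(Lon = runif(n, 8, 13),
                  Lat = runif(n, 42, 46),
                  Elevation = runif(n, 0, 250))
# Value loosely depends on Elevation plus noise, purely for illustration
sim$Value <- 60 + 0.05 * sim$Elevation + rnorm(n, sd = 2)
coordinates(sim) <- ~Lon+Lat
fit <- autofitVariogram(Value ~ Elevation, sim)
plot(fit)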

Recently, I also came across this problem. I found out that the reason was some Inf values in my data; after I deleted them, the package worked well. Hope this helps.
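For reference, a small sketch of that fix, assuming the non-finite values sit in the response column (here Value stands in for whichever column is affected):
# Keep only rows whose response is finite (drops Inf, -Inf and NaN)
ok <- is.finite(data$Value)
data_clean <- data[ok, ]
autofitVariogram(Value ~ Elevation, data_clean)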


R: Problem with raster prediction from a linear model

I am using the function raster::predict to extract the prediction part of a linear model as a raster but I am getting this error:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : object is not a matrix
In addition: Warning message:
'newdata' had 622 rows but variables found have 91 rows
My data set is a RasterStack of two satellite images (same CRS and data type). I have found this question but I couldn't solve my problem.
Here is the code and the data:
library(raster)
ntl = raster ("path/ntl.tif")
vals_ntl <- as.data.frame(values(ntl))
ntl_coords = as.data.frame(xyFromCell(ntl, 1:ncell(ntl)))
combine <- as.data.frame(cbind(ntl_coords,vals_ntl))
ebbi = raster ("path/ebbi.tif")
ebbi <- resample(ebbi, ntl, method = "bilinear")
vals_ebbi <- as.data.frame(values(ebbi))
s = stack(ntl, ebbi)
block.data <- as.data.frame(cbind(combine, vals_ebbi))
names(block.data)[3] <- "ntl"
names(block.data)[4] <- "ebbi"
block.data <- na.omit(block.data)
model <- lm(formula = ntl ~ ebbi, data = block.data)
#predict to a raster
r1 <- raster::predict(s, model, progress = 'text', na.rm = T)
plot(r1)
writeRaster(r1, filename = "path/lm_predict.tif")
The data can be downloaded from here. (I don't know whether the problem would still exist with a smaller dataset, so I decided to share the full dataset, which is quite big when using the dput command to copy-paste it.)
You are correct that dput is generally not very useful for spatial data, and that you should avoid using it. However, in most cases there is no need to share data at all, because you can create example data with code, or use data that ships with R, as in most examples in the help files and questions on this site. Saying "I don't know if by sharing a smaller dataset the problem would still exist" suggests that the first thing you should do is find out.
If you have a SpatRaster x that you want to reproduce, you can start with as.character(x), which is what I did to get the below.
library(terra)
ntl <- rast(ncols=48, nrows=91, nlyrs=1, xmin=582360, xmax=604440, ymin=1005560, ymax=1047420, names=c('avg_rad'), crs='EPSG:7767')
ebbi <- rast(ncols=48, nrows=91, nlyrs=1, xmin=582360, xmax=604440, ymin=1005560, ymax=1047420, names=c('B6_median'), crs='EPSG:7767')
values(ntl) <- sample(100, ncell(ntl), replace=TRUE)
values(ebbi) <- runif(ncell(ebbi))
Combine the layers, set the names, and get the values into a data.frame. For larger datasets you could take a regular sample of cells with spatSample(x, size, method="regular").
x <- c(ntl, ebbi)
names(x) <- c("ntl", "ebbi")
Fit the model. You can do that in two steps
v <- as.data.frame(x, na.rm=TRUE)
model <- lm(ntl ~ ebbi, data=v)
Or in one step
model <- lm(ntl ~ ebbi, data=x)
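Or, for a raster too large to convert all values, a rough sketch of the regular sampling mentioned earlier (the sample size of 5000 is an arbitrary illustrative choice):
v_sample <- spatSample(x, size = 5000, method = "regular")  # returns a data.frame of cell values
v_sample <- na.omit(v_sample)
model <- lm(ntl ~ ebbi, data = v_sample)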
And now predict (set a filename if you want to save the raster to disk).
p <- predict(x$ebbi, model, filename="")
It is important that the first (SpatRaster) argument to predict has names that match the names in the model. So in this case you can use x$ebbi or x[[2]], but if you use ebbi you get a mysterious error message
p <- predict(ebbi, model)
#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : object is not a matrix
#In addition: Warning message:
#'newdata' had 48 rows but variables found have 91 rows
unless you first do
names(ebbi) <- "ebbi"
p <- predict(ebbi, model)
Alternatively, using the raster package, the solution is:
library(raster)
ntl = raster ("path/ntl.tif")
ebbi = raster ("path/ebbi.tif")
ebbi <- resample(ebbi, ntl, method = "bilinear")
s = stack(ntl, ebbi)
names(s) = c('ntl', 'ebbi') # important step in order to run the predict function successfully
block.data = data.frame(na.omit(values(s)))
names(block.data) <- c('ntl', 'ebbi')
model <- lm(formula = ntl ~ ebbi, data = block.data)
#predict to a raster
r1 <- raster::predict(s, model, progress = 'text', na.rm = T)
plot(r1)
writeRaster(r1, filename = "path/lm_predict.tif")
I found the answer based on this post.

Does order of data matter?

I am using R to perform hierarchical clustering on categorical data.
I am trying out different variables from my sample, in order to identify the ones that provide meaningful clustering results. However, I noticed that if I change the order of the data, the results are different. Is this due to the way hclust works, or am I missing something?
For each trial I extract a certain number of columns (in the following example I used columns 3,28,50,14).
my.data.final <- data.frame(read.csv("C:\\Final dataset-for R.csv"))
library(dplyr)
my.data.final <- my.data.final %>% mutate_if(is.character,as.factor)
my.data.final <- my.data.final %>% mutate_if(is.integer,as.factor)
my.data.final$Age <- factor(my.data.final$Age, ordered = TRUE)
my.data3 <- my.data.final[,c(3,28,50,14)]
my.data3 <- na.exclude(my.data3, row.names=1)
complete.cases(my.data3)
library(cluster)
dist.gower <- daisy(my.data3, metric = "gower")
aggl.clust.c <- hclust(dist.gower, method = "complete")
plot(aggl.clust.c,
main = "Agglomerative, complete linkages")
When I change the order of the columns in the line:
my.data3 <- my.data.final[,c(3,28,50,14)]
I noticed that the dendrogram changes. Is this expected to happen with hclust?
I have found that the line:
my.data.final$Age <- factor(my.data.final$Age, ordered = TRUE)
somehow affects the result, but I am not quite sure why.
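A quick sanity check (just a sketch) would be to compare the Gower dissimilarities for the same columns in two different orders; since Gower distance averages per-variable contributions, the values should be identical, which would point to the ordered-factor conversion or tie-breaking in the dendrogram rather than column order itself:
library(cluster)
d1 <- daisy(my.data3, metric = "gower")
d2 <- daisy(my.data3[, 4:1], metric = "gower")  # same columns, reversed order
all.equal(as.numeric(d1), as.numeric(d2))       # TRUE means hclust gets identical input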

Column changes from "WinorLoss" to "Class"

I am working on constructing a logistic model in R (I am a beginner in R and am following a tutorial on building logistic models). I have done the following and everything works, but when I run the downSample function the column named "WinorLoss" changes to "Class", and I am sure this will cause issues with everything that follows.
Could anyone please let me know if what I am doing makes sense, or whether there are big errors I am making?
my_data <- read.csv('C:/Users/Magician/Desktop/R files/Fnaticfirstround.csv', header=TRUE)
my_data
str(my_data)
library(mlbench)
glm(Map ~ WinorLoss, family="binomial", data=my_data)
table(my_data$Map)
table(my_data$WinorLoss)
my_data$WinorLoss <- ifelse(my_data$WinorLoss == "W", 1,0)
my_data$WinorLoss <- factor(my_data$WinorLoss, levels = c(0,1))
my_data
table(my_data$WinorLoss)
library(caret)
'%ni%' <- Negate('%in%')
options(scipen=999)
set.seed(100)
trainDataIndex <- createDataPartition(my_data$WinorLoss, p=0.7, list=F)
trainData <- my_data[trainDataIndex, ]
testData <- my_data[-trainDataIndex, ]
trainData
testData
table(trainData$WinorLoss)
table(testData$WinorLoss)
set.seed(100)
down_train <- downSample(x = trainData[, colnames(trainData) %ni% "WinorLoss"],
y = trainData$WinorLoss)
down_train
When I print trainData the columns returned are Date, Event, opponent, Map, Score, WinorLoss, winner, but when I print down_train the columns become Date, Event, opponent, Map, Score, winner, Class.
Help Please!
Yep, downSample and some of the other caret functions do that by default, unless specified otherwise.
If you have a question about a particular function, try its manual page first.
?downSample
If you do this you will see all of the arguments
downSample(x, y, list = FALSE, yname = "Class")
So by default the function will change the yname to "Class" which is what you are seeing.
Thus to get your desired output:
down_train <- downSample(x = trainData[, colnames(trainData) %ni% "WinorLoss"],
y = trainData$WinorLoss,
yname = "WinorLoss")
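A quick check on the result (just a sketch): the outcome column should now keep its original name, and downSample should have balanced the two classes.
names(down_train)            # "WinorLoss" instead of "Class"
table(down_train$WinorLoss)  # equal counts for both levels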

MXnet odd error

This is my first ANN, so I imagine that there might be a lot of things done wrong here.
I'm trying to predict the species of flowers from the iris data set provided with R, but I get the following error:
Error in `dimnames<-.data.frame`(`*tmp*`, value = list(n)) :
invalid 'dimnames' given for data frame
My code:
require(mxnet)
train <- iris[1:130,]
test <- iris[131:150,]
train.data <- as.data.frame(train[-5])
train.label <- data.frame(model.matrix(data=train,object =~Species-1))
test.data <- as.data.frame(test[-5])
test.label <- data.frame(model.matrix(data=test,object =~Species-1))
var1 <- mx.symbol.Variable("data")
layer0 <- mx.symbol.FullyConnected(var1, num.hidden=3)
cat.out <- mx.symbol.SoftmaxOutput(layer0)
net.model <- mx.model.FeedForward.create(cat.out,
    array.layout = "auto",
    X = train.data,
    y = train.label,
    eval.data = list(data = test.data, label = test.label),
    num.round = 20,
    array.batch.size = 20,
    learning.rate = 0.1,
    momentum = 0.9,
    eval.metric = mx.metric.accuracy)
UPDATE:
I managed to get rid of this error by specifying which column to use for the labels (train.label[,1] and test.label[,1]).
However, now I'm training my net to predict just one of my binary variables, while I have 3 (one for each species).
I had the same problem; it turned out that:
train.data should be a matrix
train.label should be a numeric vector
Check these two and hopefully it should work.
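For the iris example above, that conversion might look roughly like this (a sketch; it assumes the R mxnet API takes a numeric matrix for X and a zero-based numeric class vector for y when using SoftmaxOutput):
train.data  <- data.matrix(train[, -5])       # features as a numeric matrix
train.label <- as.numeric(train$Species) - 1  # class indices 0, 1, 2
test.data   <- data.matrix(test[, -5])
test.label  <- as.numeric(test$Species) - 1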
I had a similar problem, but during the prediction step. It turned out that my features were in a data frame, which was causing the issue. Once I converted the data frame into a matrix, the issue went away.
pred.values = stats::predict(model,as.matrix(features))
instead of
pred.values = stats::predict(model,features)
So, the features need to be a matrix both during training and during the process of making predictions.

Duplicate data when using gstat or automap package in R

I am trying to use ordinary kriging to spatially predict where an animal will occur based on predictor variables, using the gstat or automap package in R. I have many (over 100) duplicate coordinate points, which I cannot throw out since those stations were sampled multiple times over many years. Every time I run the code below for ordinary kriging, I get an LDLfactor error, which is due to the duplicate points. Does anyone know how to fix this problem without throwing out data? I have tried the code from the automap package that is supposed to correct for duplicates, but I can't get it to work. Thank you for the help!
coordinates(fish) <- ~ LONGITUDE+LATITUDE
x.range <- range(fish@coords[,1])
y.range <- range(fish@coords[,2])
grd <- expand.grid(x=seq(from=x.range[1], to=x.range[2], by=3), y=seq(from=y.range[1], to=y.range[2], by=3))
coordinates(grd) <- ~ x+y
plot(grd, pch=16, cex=.5)
gridded(grd) <- TRUE
library(gstat)
zerodist(fish) ###146 duplicate points
v <- variogram(log(WATER_TEMP) ~1, fish, na.rm=TRUE)
plot(v)
vgm()
f <- vgm(1, "Sph", 300, 0.5)
print(f)
v.fit <- fit.variogram(v,f)
plot(v, model=v.fit) ####In fit.variogram(v, d) : Warning: singular model in variogram fit
krg <- krige(log(WATER_TEMP) ~ 1, fish, grd, v.fit)
## [using ordinary kriging]
##"chfactor.c", line 131: singular matrix in function LDLfactor()Error in predict.gstat(g, newdata = newdata, block = block, nsim = nsim,: LDLfactor
##automap code for correcting for duplicates
fish.dup = rbind(fish, fish[1,]) # Create duplicate
coordinates(fish.dup) = ~LONGITUDE + LATITUDE
kr = autoKrige(WATER_TEMP, fish.dup, grd)
###Error in inherits(formula, "SpatialPointsDataFrame"):object 'WATER_TEMP' not found
###somehow my predictor variables are no longer available when in a Spatial Points Data Frame??
automap::autoKrige expects a formula as first argument, try
kr = autoKrige(WATER_TEMP~1, fish.dup, grd)
automap has a very simple fix for duplicate observations, and that is to discard them. So automap does not really solve the issue you have. I see some options:
Discard the duplicates.
Slightly perturb the coordinates of the duplicates so that they are not on exactly the same location anymore.
Perform space-time kriging using gstat.
In regard to your specific issue, please make your example reproducible. What I can guess is that the rbind of your fish object is not doing what you expect...
Alternatively, you can use the function jitterDupCoords from the geoR package.
https://cran.r-project.org/web/packages/geoR/geoR.pdf
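If you want to stay within sp/gstat, here is a minimal sketch of the perturbation idea (option 2 above); the jitter of roughly 1e-5 is an arbitrary choice and should be adapted to your coordinate units:
library(sp)
dup <- zerodist(fish)                  # index pairs of points at identical locations
idx <- unique(dup[, 2])                # second point of each duplicate pair
jit <- matrix(runif(2 * length(idx), -1e-5, 1e-5), ncol = 2)
fish@coords[idx, ] <- fish@coords[idx, ] + jit
zerodist(fish)                         # should now find no zero-distance pairs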
