R: Calculate sill, range and nugget from a raster object - r

I need to calculate the sill, range and nugget from a raster layer. I have explored gstat, usdm packages where one can create variogram however I couln't find a function which given a raster layer will estimate these parameters.In most of the functions these parameters have to be defined eg. krigging.
I have raster data layers for different heights which looks similar to
I would like get the sill, nugget and range from the parameters of semivariogram fitted to these data layers to create a plot similar to this:
The original data layers are available here as a multiband tiff. Here is a figure from this paper which further illustrates the concept.

Using gstat, here is an example:
library(raster)
library(gstat)
demo(meuse, ask = FALSE, echo = FALSE)
set.seed(131) # make random numbers reproducible
# add some noise with .1 variance
meuse.grid$dist = meuse.grid$dist + rnorm(nrow(meuse.grid), sd=sqrt(.1))
r = raster(meuse.grid["dist"])
v = variogram(dist~1, as(r, "SpatialPixelsDataFrame"))
(f = fit.variogram(v, vgm("Sph")))
# model psill range
# 1 Nug 0.09035948 0.000
# 2 Sph 0.06709838 1216.737
f$psill[2] # sill
# [1] 0.06709838
f$range[2] # range
# [1] 1216.737
f$psill[1] # nugget
# [1] 0.09035948
Plug in your own raster for r, and it should work. Change the Sph to fit another variogram model, try plot(v,f) to verify the plot.

This is just a guess. This is how I estimate semi variance
where n is the number of layers which their mean is less than the total mean. m is the total mean across all the layers. r is the mean of each layer that fell below the total mean.
s <- stack("old_gap_.tif")
m <- cellStats(mean(s), stat="mean", na.rm=T) # 0.5620522
r <- m[m < 0.5620522]
sem <- 1/53 * (0.5620522 - r)^2
plot(sem, r)

Related

Maximum pseudo-likelihood estimator for soft-core point process

I am trying to fit a soft-core point process model on a set of point pattern using maximum pseudo-likelihood. I followed the instructions given in this paper by Baddeley and Turner
And here is the R-code I came up with
`library(deldir)
library(tidyverse)
library(fields)
#MPLE
# irregular parameter k
k <- 0.4
## Generate dummy points 50X50. "RA" and "DE" are x and y coordinates
dum.x <- seq(ramin, ramax, length = 50)
dum.y <- seq(demin, demax, length = 50)
dum <- expand.grid(dum.x, dum.y)
colnames(dum) <- c("RA", "DE")
## Combine with data and specify which is data point and which is dummy, X is the point pattern to be fitted
bind.x <- bind_rows(X, dum) %>%
mutate(Ind = c(rep(1, nrow(X)), rep(0, nrow(dum))))
## Calculate Quadrature weights using Voronoi cell area
w <- deldir(bind.x$RA, bind.x$DE)$summary$dir.area
## Response
y <- bind.x$Ind/w
# the sum of distances between all pairs of points (the sufficient statistics)
tmp <- cbind(bind.x$RA, bind.x$DE)
t1 <- rdist(tmp)^(-2/k)
t1[t1 == Inf] <- 0
t1 <- rowSums(t1)
t <- -t1
# fit the model using quasipoisson regression
fit <- glm(y ~ t, family = quasipoisson, weights = w)
`
However, the fitted parameter for t is negative which is obviously not a correct value for a softcore point process. Also, my point pattern is actually simulated from a softcore process so it does not make sense that the fitted parameter is negative. I tried my best to find any bugs in the code but I can't seem to find it. The only potential issue I see is that my sufficient statistics is extremely large (on the order of 10^14) which I fear may cause numerical issues. But the statistics are large because my observation window spans a very small unit and the average distance between a pair of points is around 0.006. So sufficient statistics based on this will certainly be very large and my intuition tells me that it should not cause a numerical problem and make the fitted parameter to be negative.
Can anybody help and check if my code is correct? Thanks very much!

siber.ellipses -> Error in rmultireg(Y, X, Bbar, A, nu, V) : not a matrix

I am very new to R and am currently trying to create siber ellipses.
I watched the potcast Using ellipses to compare community members:(http://www.tcd.ie/Zoology/research/research/theoretical/Rpodcasts.php#siber) and got along just fine in the beginning. Whenever i get to the function of the siber.ellipses i get an Error:
(Error in rmultireg(Y, X, Bbar, A, nu, V) : not a matrix)
I can not figure out why. I get it with my own data as well as with the example data in the zip file provided along with the script.
I have researched the Error message online but could not come up with an answer.
It almost can be an error in the script or the data since I used those exactly as provided. My R version is 3.3.2
Does it have something to do with some kind of settings?
What could be the reason?
Can someone please helpe me :)
thanks
# this demo generates some random data for M consumers based on N samples and
# constructs a standard ellipse for each based on SEAc and SEA_B
rm(list = ls())
library(siar)
-------------------------------------------------------------------------
# ANDREW - REMOVE THESE LINES WHICH SHOULD BE REDUNDANT
# change this line
setwd("C:/Users/elisabeth/Desktop/R/demo")
# -----------------------------------------------------------------------------
# now close all currently open windows
graphics.off()
# read in some data
# NB the column names have to be exactly, "group", "x", "y"
mydata <- read.table("example_ellipse_data.txt",sep="\t",header=T)
# make the column names availble for direct calling
attach(mydata)
# now loop through the data and calculate the ellipses
ngroups <- length(unique(group))
# split the isotope data based on group
spx <- split(x,group)
spy <- split(y,group)
# create some empty vectors for recording our metrics
SEA <- numeric(ngroups)
SEAc <- numeric(ngroups)
TA <- numeric(ngroups)
dev.new()
plot(x,y,col=group,type="p")
legend("topright",legend=as.character(paste("Group ",unique(group))),
pch=19,col=1:length(unique(group)))
for (j in unique(group)){
# Fit a standard ellipse to the data
SE <- standard.ellipse(spx[[j]],spy[[j]],steps=1)
# Extract the estimated SEA and SEAc from this object
SEA[j] <- SE$SEA
SEAc[j] <- SE$SEAc
# plot the standard ellipse with d.f. = 2 (i.e. SEAc)
# These are plotted here as thick solid lines
lines(SE$xSEAc,SE$ySEAc,col=j,lty=1,lwd=3)
# Also, for comparison we can fit and plot the convex hull
# the convex hull is plotted as dotted thin lines
#
# Calculate the convex hull for the jth group's isotope values
# held in the objects created using split() called spx and spy
CH <- convexhull(spx[[j]],spy[[j]])
# Extract the area of the convex hull from this object
TA[j] <- CH$TA
# Plot the convex hull
lines(CH$xcoords,CH$ycoords,lwd=1,lty=3)
}
# print the area metrics to screen for comparison
# NB if you are working with real data rather than simulated then you wont be
# able to calculate the population SEA (pop.SEA)
# If you do this enough times or for enough groups you will easily see the
# bias in SEA as an estimate of pop.SEA as compared to SEAc which is unbiased.
# Both measures are equally variable.
print(cbind(SEA,SEAc,TA))
# So far we have fitted the standard ellipses based on frequentist methods
# and calculated the relevant metrics (SEA and SEAc). Now we turn our attention
# to producing a Bayesian estimate of the standard ellipse and its area SEA_B
reps <- 10^4 # the number of posterior draws to make
# Generate the Bayesian estimates for the SEA for each group using the
# utility function siber.ellipses
SEA.B <- siber.ellipses(x,y,group,R=reps)
Error in rmultireg(Y, X, Bbar, A, nu, V) : not a matrix
Reinstall the SIAR package using Jackson's Github site:
library(devtools)
install_github("andrewljackson/siar#v4.2.2", build_vingettes == TRUE)
library(siar)

Why does gstat.predict() function often return NaN values (GSTAT Package)? (R version 3.3.2, Windows 10)

I am trying to simulate a combination of two different random fields (yy1 and yy2 with different mean and correlation length) with an irregular boundary using Gstat package in R. I have attached the picture of my expected outcome. The code is not giving such output consistently and I am frequently getting atleast one of the yy1 and yy2 as NaNs, which results in the Undesired output as shown in image.
The key steps I used are:
1) Created two gstat objects with different means and psill (rf1 and rf2)
2) Created two computational grids (one for each random field) in the form of data frame with two variables “x” and “y” coordinates.
3) Predicted two random fields using unconditional simulation.
Any help in this regard would be highly appreciated.
Attachments: 2 images (link provided) and 1 R code
1) Expected Outcome
2) Undesired Outcome
library(gstat)
xy <- expand.grid(1:150, 1:200) # 150 x 200 grid is created in the form of a dataframe with x and y vectors
names(xy)<-c('x','y') # giving names to the variables
# creating gsat objects
rf1<-gstat(formula=z~1,locations=~x+y,dummy = T,beta=c(1,0,0), model=vgm(psill=0.025, range=5, model='Exp'), nmax=20) # dummy=T treats this as a unditional simulation
rf2<-gstat(formula=z~1,locations=~x+y,dummy = T,beta=c(4,0,0), model=vgm(psill=0.025, range=10, model='Exp'), nmax=20) # dummy=T treats this as a unditional simulation
# creating two computational grid
rows<-nrow(xy)
xy_shift <- expand.grid(60:90, 75:100)
names(xy_shift)<-c('x','y')
library(dplyr) # for antijoin
xy1<-xy[1:(rows/2),]
xy1<-anti_join(xy1, xy_shift, by = c("x","y")) # creating the irregular boundary
xy2<-rbind(xy[(rows/2+1):rows,],xy_shift)
library(sp)
yy1<- predict(rf1, newdata=xy1, nsim=1) # random field 1
yy2<- predict(rf2, newdata=xy2, nsim=1) # random field 2
rf1_label<-gl(1,length(yy1[,1]),labels="field1")
rf2_label<-gl(1,length(yy2[,1]),labels="field2")
yy1<-cbind(yy1,field=rf1_label)
yy2<-cbind(yy2,field=rf2_label)
yy<-rbind(yy1,yy2)
yyplot<-yy[,c(1,2,3)]
# plotting the field
gridded(yyplot) = ~x+y
spplot(obj=yyplot[1],scales=list(draw = TRUE))

spBayes spLM function with duplicate coordinates

I am using the spRecover function in package spBayes to produce a spatial univariate model.
Here is a reproducible example where there I made a duplicate coordinate point. The modeling procedure itself executes just fine, but it won't let me recover the spatial effects for each site:
require(spBayes)
set.seed(444)
N = 200
y = rnorm(N,0,100)
x = rnorm(N,2,7)
df <- as.data.frame(cbind((rnorm(N,5,2.5)),rep('location1',N)))
coord <- cbind(runif(N,-30,30),runif(N,-180,180))
coord[2,] <- coord [1,]
n.samples <- 1000
bef.sp <- spLM(y ~ x, ## the equation
data = df, coords=coord, ## data and coordinates
starting=list("phi"=3/200,"sigma.sq"=0.08,"tau.sq"=0.02),## start values
tuning=list("phi"=0.1, "sigma.sq"=0.05, "tau.sq"=0.05), ## tuning values
priors=list("phi.Unif"=c(3/1500, 3/50), "sigma.sq.IG"=c(2, 0.08),"tau.sq.IG"=c(2, 0.02)), ## priors
cov.model="exponential",n.samples=n.samples)
burn.in <- floor(0.75*n.samples)
bef.sp <- spRecover(bef.sp, start=burn.in, thin=2)
The error received is:
Error in spRecover(bef.sp, start = burn.in, thin = 2) :
c++ error: dpotrf failed
I found a post by the package author indicating this error might come up if one has replicated coordinates. I definitely have duplicated coordinates, since many sites were sampled many times (on the same day; this is not a time-series issue). How do I get the model to accept that there is lots of replication within each coordinate pair, and to recover individual spatial effects values for each site?
Thanks!

R: How to read Nomograms to predict the desired variable

I am using Rstudio. I have created nomograms using function nomogram from package rms using following code (copied from the example code of the documentation):
library(rms)
n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c('female','male'), n,TRUE))
# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
(log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')
f <- lrm(y ~ lsp(age,50)+sex*rcs(cholesterol,4)+blood.pressure)
nom <- nomogram(f, fun=function(x)1/(1+exp(-x)), # or fun=plogis
fun.at=c(.001,.01,.05,seq(.1,.9,by=.1),.95,.99,.999),
funlabel="Risk of Death")
#Instead of fun.at, could have specified fun.lp.at=logit of
#sequence above - faster and slightly more accurate
plot(nom, xfrac=.45)
Result:
This code produces a nomogram but there is no line connecting each scale (called isopleth) to help predict the desired variable ("Risk of Death") from the plot. Usually, nomograms have the isopleth for prediction (example from wikipedia). But here, how do I predict the variable value?
EDIT:
From the documentation:
The nomogram does not have lines representing sums, but it has a
reference line for reading scoring points (default range 0--100). Once
the reader manually totals the points, the predicted values can be
read at the bottom.
I don't understand this. It seems that predicting is supposed to be done without the isopleth, from the scale of points. but how? Can someone please elaborate with this example on how I can read the nomograms to predict the desired variable? Thanks a lot!
EDIT 2 (FYI):
In the description of the bounty, I am talking about the isopleth. When starting the bounty, I did not know that nomogram function does not provide isopleth and has points scale instead.
From the documentation, the nomogram is used to manualy obtain prediction:
In the top of the plot (over Total points)
you draw a vertical line for each of the variables of your patient (for example age=40, cholesterol=220 ( and sex=male ), blood.pressure=172)
then you sum up the three values you read on the Points scale (40+60+3=103) to obtain Total Points.
Finally you draw a vertical line on the Total Points scale (103) to read the Risk of death (0.55).
These are regression nomograms, and work in a different way to classic nomograms. A classic nomogram will perform a full calculation. For these nomograms you drop a line from each predictor to the scale at the bottom and add your results.
The only way to have a classic 'isopleth' nomogram working on a regression model would be 1 have just two predictors or 2 have a complex multi- step nomogram.

Resources