Is it possible to do anisotropic IDW estimation (and cross validation) using gstat in R?

I am comparing cross-validation ("leave one out") results for different variogram models for a 3D data set using the gstat library in R. I would like to compare similar cross-validation results for inverse distance estimates as well, but I can't see how to do anisotropic estimation (or cross-validation) with IDW in gstat. Is IDW estimation in gstat limited to the isotropic case, and if not, what syntax should be applied?

Assuming it is not possible to do anisotropic IDW estimation in gstat, I transformed the data locations to make them isotropic and then conducted cross-validation using krige.cv.
I started with a CSV file of X, Y, Z, HM values ("hm_assays_iso.csv"), with the X, Y and Z coordinates transformed to make the points isotropic: in this case the XY plane was rotated by 15 degrees, then the X coordinates were multiplied by 3.333 and the Z values by 30. The R session I used was as follows:
> library(gstat)
> library(sp)
> hm_iso<-read.csv("hm_assays_iso.csv")
> coordinates(hm_iso)<- c("X", "Y", "Z")
> hm_idw.cv<-krige.cv(HM~1, locations=hm_iso, set=list(idp = 3))
> write.csv(hm_idw.cv, file="hm_cv_idw3.csv")
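For reference, a rough sketch of how the isotropic transformation described above could be produced in R before running the cross-validation; the input file name, the rotation direction, and the exact workflow are assumptions for illustration:
# hypothetical un-transformed input with X, Y, Z, HM columns
hm <- read.csv("hm_assays.csv")
theta <- 15 * pi / 180                        # 15-degree rotation in the XY plane
Xr <-  hm$X * cos(theta) + hm$Y * sin(theta)
Yr <- -hm$X * sin(theta) + hm$Y * cos(theta)
# scale X by 3.333 and Z by 30 to remove the anisotropy, then write out
hm_iso <- data.frame(X = Xr * 3.333, Y = Yr, Z = hm$Z * 30, HM = hm$HM)
write.csv(hm_iso, "hm_assays_iso.csv", row.names = FALSE)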

Related

How do I implement a non-default dissimilarity metric with vegan function metaMDS()?

I have constructed a distance matrix from phylogenetic data using the Claddis function MorphDistMatrix() with the distance metric "MORD" (Maximum Observable Rescaled Distance). I now want to use this dissimilarity matrix to run an NMDS using the vegan function metaMDS(). However, although metaMDS has many distance metrics to choose from, "MORD" is not one of them. How do I enable metaMDS() to have this metric as an option?
Edit: here is some example code:
nexus.data<-ReadMorphNexus("example.nex")
Reading in Nexus file
dist<- MorphDistMatrix(nexus.data, distance = "MORD")
Claddis command for creating distance matrix. Instead of using the Gower dissimilarity (distance = "GC"), I would like to use Maximum Observable Rescaled Distance (distance = "MORD"), which is a modified form of Gower for use with ordered characters (Lloyd 2016). So far so good.
nmds<-metaMDS(dist$DistanceMatrix, k=2, trymax=1000, distance = "GC")
Here is where I run into trouble: as I understand it, the distance used for the metaMDS command should be the same as was used to construct the distance matrix, but MORD is not an option for "distance" in metaMDS. If I were to construct the distance matrix under Gower dissimilarity it wouldn't be a problem, as that is also available in metaMDS
Lloyd, G. T., 2016. Estimating morphological diversity and tempo with discrete character-taxon matrices: implementation, challenges, progress, and future directions. Biological Journal of the Linnean Society, 118, 131-151.
metaMDS has the argument distfun to select dissimilarity functions other than vegdist. Such a function should accept an argument method to select the dissimilarity measure used. Further, it should return a regular dissimilarity object that inherits from the standard R dist class. I do not know the Claddis package: does it return regular dissimilarities or something peculiar? Your example hints that it returns a list holding the matrix rather than a plain dissimilarity object. Alternatively, you can use pre-calculated dissimilarities as input to metaMDS. Again, these should be regular dissimilarities like in any decent R implementation. So you need to check the following with your dissimilarities:
inherits(dist, "dist") # your dist result: should be TRUE
inherits(dist$DistanceMatrix, "dist") # alternatively this should be TRUE
## if the latter was TRUE, you can extract that with
d <- dist$DistanceMatrix
## if d is not a "dist" object, you can see if it can be turned into one
d <- as.dist(dist$DistanceMatrix)
inherits(d, "dist") # TRUE: OK, FALSE: no hope
## if it was OK, you just do
metaMDS(d)
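Putting it together, a minimal sketch assuming the coercion above succeeds (k and trymax are copied from the question; no distance argument is needed because the dissimilarities are supplied ready-made):
d <- as.dist(dist$DistanceMatrix)
nmds <- metaMDS(d, k = 2, trymax = 1000)  # uses the MORD dissimilarities as given
plot(nmds)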

Local Block Kriging with Local Variogram with gstat

I have been unable to find any information specific to local block kriging with a local variogram using the gstat package in R. There is freeware called VESPER from the Australian Center for Precision Agriculture that is able to do this, and from what I have read it should be possible in R; I could just use some help putting together a for-loop to make the gstat functions work locally.
Using the meuse data set as an example, I have been able to calculate and fit a global variogram to a data set:
library(gstat)
data(meuse)
coordinates(meuse) = ~x+y
data(meuse.grid)
gridded(meuse.grid) = ~x+y
logzinc_vgm<- variogram(log(zinc)~1, meuse)
logzinc_vgm_fit <- fit.variogram(logzinc_vgm, model=vgm("Sph", "Exp"))
logzinc_vgm_fit
plot(logzinc_vgm, logzinc_vgm_fit)
This gives a nice plot of the variogram for the whole data set with the fitted model. Then I can use this to perform block kriging over the entire data set:
logzinc_blkkrig <- krige(log(zinc)~1, meuse, meuse.grid, model = logzinc_vgm_fit, block=c(100,100))
spplot(logzinc_blkkrig["var1.pred"], main = "ordinary kriging predictions")
spplot(logzinc_blkkrig["var1.var"], main = "ordinary kriging variance")
This produces a plot of the interpolated data as well as a plot of the variance for each predicted point. So this would be perfect if I wanted these functions to work once for my entire data set...
But I have been unable to generate a for-loop to handle these functions on a local level.
My goals are:
1. For each point in my grid file (which I have tried as both a data frame and a SpatialPointsDataFrame), I would like to subset points from my data file that fall within the range distance given by the global variogram (easy to pull out as logzinc_vgm_fit[2,3])
2. On this subset of data, I would like to calculate the variogram (as above) and fit a model to it (as above)
3. Based on this model, I would like to perform block kriging to get a predicted value and variance at that grid point
4. Build the above three steps into a for-loop to predict values at each grid point based on the local variogram around each grid point
note: as with the meuse data set built into the gstat package, the dimensions of my grid and data data frames are different
Thank you very much for chiming in if anyone is able to tackle this question. Happy to post the code I am working with so far if it would be useful.
I made a for loop that I think accomplishes what you request. I do not think that block kriging is required for this because the loop predicts at each grid cell.
The rad parameter is the search radius, which can be set to other quantities, but currently references the global variogram range (with nugget effect). I think it would be best to search a little further for points because if you only search up to the global variogram range, a local variogram fit may not converge (i.e. no observed range).
The k parameter is for the minimum number of nearest neighbors within rad. This is important because some locations may have no points within rad, which would result in an error.
You should note that the way you specified model=vgm("Sph", "Exp") seems to take the first listed method. So, I used the Spherical model in the for loop, but you can change to what you want to use. Matern may be a good choice if you think the shape will change with location.
#Specify the search radius for the local variogram
rad = logzinc_vgm_fit[2,3]
#Specify minimum number of points for prediction
k = 25
#Index to indicate if any result has been stored yet
stored = 0
for (i in 1:nrow(meuse.grid)){
  #Calculate the Euclidean distance from the current grid cell to all points
  dists = spDistsN1(pts = meuse, pt = meuse.grid[i,], longlat = FALSE)
  #Find indices of the points within rad of this grid point
  IndsInRad = which(dists < rad)
  if (length(IndsInRad) < k){
    print('Not enough nearest neighbors')
  }else{
    #Calculate the local variogram with these points
    locVario = variogram(log(zinc)~1, meuse[IndsInRad,])
    #Fit the local variogram (note: fit to locVario, not the global variogram)
    locVarioFit = fit.variogram(locVario, model=vgm("Sph"))
    #Use kriging to predict at grid cell i. Suppress printed output.
    loc_krig <- krige(log(zinc)~1, meuse[IndsInRad,], meuse.grid[i,], model = locVarioFit, debug.level = 0)
    #Add result to the growing set of predictions
    if (stored == 0){
      FinalResult = loc_krig
      stored = 1
    }else{
      FinalResult = rbind(FinalResult, loc_krig)
    }
  }
}
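Once the loop finishes, the combined predictions can be inspected the same way as the global result; a small sketch, assuming FinalResult was created (i.e. at least one grid cell had k or more neighbours within rad):
spplot(FinalResult["var1.pred"], main = "local variogram kriging predictions")
spplot(FinalResult["var1.var"], main = "local variogram kriging variance")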

Time taken to krige in gstat package in R

The following R program creates an interpolated surface from 470 data points of the Walker Lake data using the gstat package.
source("D:/kriging/allfunctions.r") # Reads in all functions.
source("D:/kriging/panel.gamma0.r") # Reads in panel function for xyplot.
library(lattice) # Needed for "xyplot" function.
library(geoR) # Needed for "polygrid" function.
library(akima)
library(gstat);
library(sp);
walk470 <- read.table("D:/kriging/walk470.txt",header=T)
attach(walk470)
coordinates(walk470) = ~x+y
walk.var1 <- variogram(v ~ x+y,data=walk470,width=10) #the width has to be tuned resulting different point pairs
plot(walk.var1,xlab="Distance",ylab="Semivariance",main="Variogram for V, Lag Spacing = 5")
model1.out <- fit.variogram(walk.var1,vgm(70000,"Sph",40,20000))
plot(walk.var1, model=model1.out,xlab="Distance",ylab="Semivariance",main="Variogram for V, Lag Spacing = 10")
poly <- chull(coordinates(walk470))
plot(coordinates(walk470),type="n",xlab="X",ylab="Y",cex.lab=1.6,main="Plot of Sample and Prediction Sites",cex.axis=1.5,cex.main=1.6)
lines(coordinates(walk470)[poly,])
poly.in <- polygrid(seq(2.5,247.5,5),seq(2.5,297.5,5),coordinates(walk470)[poly,])
points(poly.in)
points(coordinates(walk470),pch=16)
coordinates(poly.in) <- ~ x+y
krige.out <- krige(v ~ 1, walk470,poly.in, model=model1.out)
print(krige.out)
For each of the 2688 prediction points, this program would seem to require:
a (470x470) matrix inversion
a (470x470) by (470x1) matrix multiplication
Is the gstat package using some smart way of doing this calculation? I learned from a previous Stack Overflow question that it uses Cholesky decomposition for matrix inversion. Is it normal for one machine to calculate this so quickly?
It uses LDL' decomposition, which is similar to Cholesky. As you are using global kriging, the covariance matrix needs to be decomposed only once; then, for each prediction point, a linear system is solved using the stored factorization, which is much cheaper than inversion. No 470x470 matrix ever gets inverted, nor are solutions obtained by multiplying by an inverse. Inverses are a notational device, but are avoided as a computational strategy whenever possible. In R, for instance, compare the runtime of solve(A, b) with solve(A) %*% b.
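A quick, hedged illustration of that comparison; the matrix here is an arbitrary symmetric positive definite example, not the kriging covariance matrix:
set.seed(1)
n <- 470
A <- crossprod(matrix(rnorm(n * n), n, n)) + diag(n)  # symmetric positive definite
b <- rnorm(n)
system.time(x1 <- solve(A, b))       # solve the system directly
system.time(x2 <- solve(A) %*% b)    # form the inverse, then multiply
all.equal(x1, drop(x2))              # same answer, very different cost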
Use the source, Luke!

Principal component analysis with EQUAMAX rotation

I need to do a principal component analysis (PCA) with EQUAMAX-rotation in R.
Unfortunately the function principal() I normally use for PCA does not offer this kind of rotation.
I could find out that it may be possible somehow with the package GPArotation but I could not yet figure out how to use this in the PCA.
Maybe someone can give an example on how to do an equamax-rotation PCA?
Or is there a function for PCA in another package that offers the use of equamax-rotation directly?
The psych package, from which I guess you are using principal(), has the rotations varimax, quartimax, promax, oblimin, simplimax, and cluster, but not equamax (psych p. 232), which is a compromise between varimax and quartimax.
excerpt from the STATA manual: mvrotate p.3
Rotation criteria
In the descriptions below, the matrix to be rotated is denoted as A, p denotes the number of rows of A, and f denotes the number of columns of A (factors or components). If A is a loading matrix from factor or pca, p is the number of variables, and f is the number of factors or components.
Criteria suitable only for orthogonal rotations
varimax and vgpf apply the orthogonal varimax rotation (Kaiser 1958). varimax maximizes the variance of the squared loadings within factors (columns of A). It is equivalent to cf(1/p) and to oblimin(1). varimax, the most popular rotation, is implemented with a dedicated fast algorithm and ignores all optimize options. Specify vgpf to switch to the general GPF algorithm used for the other criteria.
quartimax uses the quartimax criterion (Harman 1976). quartimax maximizes the variance of the squared loadings within the variables (rows of A). For orthogonal rotations, quartimax is equivalent to cf(0) and to oblimax.
equamax specifies the orthogonal equamax rotation. equamax maximizes a weighted sum of the varimax and quartimax criteria, reflecting a concern for simple structure within variables (rows of A) as well as within factors (columns of A). equamax is equivalent to oblimin(p/2) and cf(#), where # = f/(2p).
now the cf (Crawford-Ferguson) method is also available in GPArotation
cfT orthogonal Crawford-Ferguson family
cfT(L, Tmat=diag(ncol(L)), kappa=0, normalize=FALSE, eps=1e-5, maxit=1000)
The argument kappa parameterizes the family for the Crawford-Ferguson method. If m is the number of factors and p is the number of indicators then kappa values having special names are 0=Quartimax, 1/p=Varimax, m/(2*p)=Equamax, (m-1)/(p+m-2)=Parsimax, 1=Factor parsimony.
library(GPArotation)   # provides cfT()
X <- matrix(rnorm(500), ncol=10)
C <- cor(X)
eig <- eigen(C)
# PCA by hand: eigenvectors scaled by the square roots of their eigenvalues
eig$vectors * t(matrix(rep(sqrt(eig$values), 10), ncol=10))
require(psych)
PCA0 <- principal(C, rotate='none', nfactors=10) # PCA by psych
PCA0
# the original loadings in PCA0 are scaled by their square-root eigenvalues
apply(PCA0$loadings^2, 2, sum) # SS loadings
## PCA with Equamax rotation
# the Equamax rotation can be performed by cfT with kappa = m/(2*p)
# p = number of variables (10)
# m (or f in the STATA manual) = number of components (10)
# gives m == p --> kappa = 0.5
PCA.EQ <- cfT(PCA0$loadings, kappa=0.5)
PCA.EQ
I upgraded some of my own PCA knowledge thanks to your question; I hope this helps, good luck.
Walter's answer helped a great deal!
I'll add some sidenotes for what it's worth:
R's psych::principal says under the option "rotate" that more rotations are available. Under the linked "fa", there is in fact an "equamax". Sadly, the results are replicable with neither STATA nor SPSS, at least not with the standard syntax I tried:
# R:
PCA.5f=principal(data, nfactors=5, rotate="equamax", use="complete.obs")
Walter's solution replicates SPSS's equamax rotation (Kaiser-normalized by default) to the first 3 decimal places (i.e. the loadings and rotation matrix are fairly equivalent) using the following syntax, with m = number of factors and p = number of indicators:
# R:
PCA.5f=principal(data, nfactors=5, rotate="none", use="complete.obs")
PCA.5f.eq = cfT(PCA.5f$loadings, kappa=m/(2*p), normalize=TRUE) # replace kappa factor formula with your actual numbers!
# SPSS:
FACTOR
/VARIABLES listofvariables
/MISSING LISTWISE
/ANALYSIS listofvariables
/PRINT ROTATION
/CRITERIA FACTORS(5) ITERATE(1000)
/EXTRACTION PC
/CRITERIA ITERATE(1000)
/ROTATION EQUAMAX
/METHOD=CORRELATION.
STATA's equamax - Kaiser-normalized and unnormalized - is replicable to at least the first 4 decimal places with kappa = 0.5, irrespective of the actual number of factors and indicators, which seems to contradict their manual (cf. Walter's citation).
# R:
PCA.5f=principal(data, nfactors=5, rotate="none", use="complete.obs")
PCA.5f.eq = cfT(PCA.5f$loadings, kappa=.5, normalize=TRUE)
# STATA:
factor listofvars, pcf factors(5)
rotate, equamax normalize # kick the "normalize" to replicate R's "normalize=FALSE"
mat list e(r_L)

Is there an R library that estimates a multivariate natural cubic spline (or similar) function?

note: originally posted on Cross Validated (stats SE) on 07-26-2011, with no correct answers to date.
Background
I have a model, f, where Y=f(X)
X is an n x m matrix of samples from m parameters and Y is the n x 1 vector of model outputs.
f is computationally intensive, so I would like to approximate f using a multivariate cubic spline through (X,Y) points, so that I can evaluate Y at a larger number of points.
Question
Is there an R function that will calculate an arbitrary relationship between X and Y?
Specifically, I am looking for a multivariate version of the splinefun function, which generates a spline function for the univariate case.
e.g. this is how splinefun works for the univariate case
x <- 1:100
y <- runif(100)
foo <- splinefun(x,y, method = "monoH.FC")
foo(x) #returns y, as example
The test that the function interpolates exactly through the points is successful:
all(y == foo(1:100))
## TRUE
What I have tried
I have reviewed the mda package, and it seems that the following should work:
library(mda)
x <- data.frame(a = 1:100, b = 1:100/2, c = 1:100*2)
y <- runif(100)
foo <- mars(x,y)
predict(foo, x) #all the same value
however the function does not interpolate exactly through the design points:
all(y == predict(foo,x))
## FALSE
I also could not find a way to implement a cubic-spline in either the gam, marss, or earth packages.
Actually several packages can do it. The one I use is the rms package, which has rcs, but the survival package also has pspline and the splines package has the ns function. "Natural splines" (constructed with ns) are also cubic splines. You will need to form a multivariate fitting function with the '*' operator in the model formula, creating "crossed" spline terms; see the sketch below.
I would add that the example you offered was not sufficiently rich.
I guess I am confused that you want exact fits. R is a statistical package, and approximate estimation is the goal. Generally, exact fits are more of a problem because they lead to multicollinearity.
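A minimal sketch of the crossed-spline idea, here with splines::ns inside lm() (rcs from rms would appear in the formula the same way); the predictor names a and b and the toy data are purely illustrative:
library(splines)
set.seed(1)
d <- data.frame(a = runif(100), b = runif(100))
d$y <- sin(2 * pi * d$a) + d$b^2 + rnorm(100, sd = 0.1)
fit <- lm(y ~ ns(a, df = 4) * ns(b, df = 4), data = d)  # crossed natural cubic splines
head(predict(fit, newdata = d))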
Have a look at the DiceKriging package which was developed to undertake tasks like this.
http://cran.r-project.org/web/packages/DiceKriging/index.html
I've provided an example application at
https://stats.stackexchange.com/questions/13510/fitting-multivariate-natural-cubic-spline/65012#65012
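For what it's worth, a hedged sketch of a DiceKriging workflow for the setup in the question (the design, response, and new points here are toy stand-ins for X and Y):
library(DiceKriging)
set.seed(1)
X <- data.frame(x1 = runif(50), x2 = runif(50), x3 = runif(50))  # stand-in design
Y <- with(X, sin(2 * pi * x1) + x2^2 + x3)                       # stand-in model output
fit <- km(design = X, response = Y)      # Gaussian-process (kriging) emulator
Xnew <- data.frame(x1 = runif(5), x2 = runif(5), x3 = runif(5))
predict(fit, newdata = Xnew, type = "UK")$mean   # emulator predictions at new points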
I'm not sure if this is precisely what you are looking for, but you could try Tps() in the R package fields. It is meant for thin-plate spline interpolation (the 2D equivalent of cubic splines) of spatial data, but it will take up to four covariates, although it expects them to be Euclidean x, y, z + time, so you need to be clear that you are selecting the correct options for your particular case. If you want to interpolate, set the smoothing parameter lambda to zero. You might also try the function polymars() in the R package polspline.
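A short, hedged sketch of the Tps() suggestion (two covariates here, though up to four are accepted); setting lambda = 0 forces the surface through the data points:
library(fields)
set.seed(1)
X <- matrix(runif(200), ncol = 2)
Y <- sin(2 * pi * X[, 1]) + X[, 2]^2
fit <- Tps(X, Y, lambda = 0)            # thin-plate spline interpolation
max(abs(predict(fit, X) - Y))           # approximately zero: exact interpolation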
