Replicated point patterns on linear networks in spatstat

I have point pattern data from a replicated experiment where the points in each replicate are constrained to the same linear network (the data are from daily surveys of a bike path for snakes: each day gives a separate point pattern of locations where animals are found).
I know that in spatstat it is possible to fit point processes to multiple point patterns simultaneously (with mppm), and to fit point process models on linear networks (with lppm); is it possible to do both simultaneously? As far as I can tell, mppm will not accept lpp objects: is there another way of fitting this type of model?

This is not yet fully supported in spatstat.
However, you can do most of what you want by converting the lpp objects to quadrature schemes using linequad and then using these quadrature schemes instead of ppp objects in the hyperframe. Example:
library(spatstat)
X1 <- spiders                          # an lpp dataset supplied with spatstat
X2 <- runiflpp(25, domain(spiders))    # a second pattern, uniform on the same network
A <- linequad(X1)                      # convert each lpp to a quadrature scheme
B <- linequad(X2)
f <- function(x, y) x                  # a toy spatial covariate
H <- hyperframe(X = solist(A, B), Z = list(f, f))
fit <- mppm(X ~ Z, data = H)
Most methods for mppm will work correctly, except that you can't simulate or predict the fitted model, because it doesn't know that it is supposed to be on a network.
If you have a long list of lpp objects, then instead of converting the point patterns to quadrature schemes one by one, you could use solapply together with linequad, as in the sketch below.
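A minimal sketch, assuming Xlist is a list of lpp objects, one per daily survey (the name Xlist is hypothetical):
library(spatstat)
Qlist <- solapply(Xlist, linequad)   # convert every lpp pattern to a quadrature scheme
H <- hyperframe(X = Qlist)           # hyperframe of quadrature schemes
fit <- mppm(X ~ 1, data = H)         # stationary model fitted to all replicates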

Plot an envelope for an mppm object in spatstat

My question is closely related to this previous one: Simulation-based hypothesis testing on spatial point pattern hyperframes using "envelope" function in spatstat
I have obtained an mppm object by fitting a model on several independent datasets using the mppm function from the R package spatstat. How can I study its envelope to compare it to my observations?
I fitted my model as such:
data <- listof(NMJ1, NMJ2, NMJ3)
data <- hyperframe(X = 1:3, Points = data)
model <- mppm(Points ~ marks * sqrt(x^2 + y^2), data)
where NMJ1, NMJ2, and NMJ3 are marked ppp objects that are independent realizations of the same experiment.
However, the envelope function does not accept inputs of type mppm:
> envelope(model, Kcross.inhom, nsim=10)
Error in UseMethod("envelope") :
no applicable method for 'envelope' applied to an object of class "c('mppm', 'list')"
The answer provided to the previously mentioned question indicates how to plot global envelopes for each pattern, and how to use the product rule for multiple testing. However, my fitted model implies that my 3 ppp objects are statistically equivalent and are independent realizations of the same experiment (i.e. no different covariates between them). I would thus like to obtain one single plot comparing my fitted model to my 3 datasets. The following code:
gamma <- 1 - 0.95^(1/3)
nsims <- round(1/gamma - 1)
sims <- simulate(model, nsim = 2*nsims)
SIMS <- list()
for (i in 1:nrow(sims)) SIMS[[i]] <- as.solist(sims[i, , drop = TRUE])
Hplus <- cbind(data, hyperframe(Sims = SIMS))
EE1 <- with(Hplus, envelope(Points, Kcross.inhom, nsim = nsims, simulate = Sims))
pool(EE1[1], EE1[2], EE1[3])
leads to the following error:
Error in pool.envelope(`1` = list(r = c(0, 0.78125, 1.5625, 2.34375, 3.125, :
Arguments 2 and 3 do not belong to the class “envelope”
Wrong type of subset index. Use
pool(EE1[[1]], EE1[[2]], EE1[[3]])
or just
pool(EE1)
Either of these would then give an error message saying that the envelope commands should have been called with savefuns=TRUE, so you just need to change that step as well; see the corrected sketch below.
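A sketch of the corrected step, keeping everything else from the question unchanged:
EE1 <- with(Hplus, envelope(Points, Kcross.inhom, nsim = nsims,
                            simulate = Sims, savefuns = TRUE))
pool(EE1)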
However, statistically this procedure makes little sense. You have already fitted a model, which allows rigorous statistical inference using anova.mppm and other tools. Instead, you are generating simulated data from the fitted model and performing a Monte Carlo test, with all the attendant issues of multiple testing and low power. There are additional problems with this approach - for example, even if the model is "the same" for each row of the hyperframe, the patterns are not statistically equivalent unless the windows of the point patterns are identical, and so on.
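For instance, a hedged sketch of such an analysis-of-deviance comparison, assuming you want to test the distance term against a marks-only null model (model0 is a hypothetical name):
model0 <- mppm(Points ~ marks, data)   # null model without the distance effect
anova(model0, model, test = "Chi")     # dispatches to anova.mppm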

Thin plate spline on Linear Network with Spatstat

I have been using lppm from spatstat and I want to fit a log-linear model.
I can define covariates as linfun objects and use them in the model.
Let's say we are interested in modelling the car theft problem in Australia, and let's assume cov1 is the distance to the nearest school and cov2 is the distance to the nearest police department.
We want to use the X and Y coordinates in the model.
Would lppm(L ~ cov1 + cov2 + x + y) work? Are the x and y in the model the locations of the events?
And how can I use a thin-plate spline on the linear network? I can create grids on a ppp, but lpp is not as straightforward as I thought. Can I pass a matrix to lppm?
Code in spatstat for linear networks is still under development, but lppm is based upon ppm, so you can look at the help files and documentation for ppm for an explanation. The variable names appearing in the model formula can be:
the names of images (of class im or linim)
the names of spatial functions (of class funxy or linfun)
the symbols x, y (representing the Cartesian coordinates)
the symbol marks (representing categorical mark values)
A term in the model formula may be just the name of one of these variables, or an expression involving these variable names, including functions applied to these variables.
Your example would work.
You can get B-splines of the Cartesian coordinates by including a term such as bs(x), as in the sketch below.
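A minimal sketch, using the spiders dataset shipped with spatstat as a stand-in for your data (df = 4 is an arbitrary choice):
library(spatstat)
library(splines)   # provides bs()
X <- spiders       # an lpp object supplied with spatstat
fit <- lppm(X ~ bs(x, df = 4) + bs(y, df = 4))   # B-spline smooth of each coordinate
fit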
If you need more help, first read chapter 9 of the spatstat book.

Low-pass filtering of a matrix

I'm trying to write a low-pass filter in R, to clean a "dirty" data matrix.
I did a Google search and came up with a dazzling range of packages. Some apply to 1D signals (time series mostly, e.g. How do I run a high pass or low pass filter on data points in R?); some apply to images. However, I'm trying to filter a plain R data matrix. The image filters are the closest equivalent, but I'm a bit reluctant to go this way as they typically involve (i) installation of more or less complex/heavy solutions (ImageMagick...), and/or (ii) conversion from matrix to image.
Here is sample data:
r <- seq(0, 360) / 360 * (2 * pi)
x <- cos(r)
y <- sin(r)
z <- outer(x, y, "*")
noise <- 0.3 * matrix(runif(length(x) * length(y)), nrow = length(x))
zz <- z + noise
image(zz)
What I'm looking for is a filter that will return a "cleaned" matrix (i.e. something close to z, in this case).
I'm aware this is a rather open-ended question, and I'm also happy with pointers ("have you looked at package so-and-so"), although of course I'd value sample code from users with experience in signal processing!
Thanks.
One option may be to use a non-linear prediction method and take the fitted values from the model.
For example, fitting a polynomial regression to a single column recovers a smooth approximation of the original signal (the purple curve in the sketch below).
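A sketch of that single-column fit, reusing the zz matrix from the question (column 100 is an arbitrary choice):
i <- 100
plot(zz[, i], pch = 16, col = "grey", main = "One column: noisy data and fit")
fit <- lm(zz[, i] ~ poly(1:nrow(zz), 2, raw = TRUE))
lines(fitted(fit), col = "purple", lwd = 3)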
By following the same logic, you can do the same thing to all columns of the zz matrix:
predictions <- matrix(nrow = nrow(zz), ncol = 0)
for (i in 1:ncol(zz)) {
  pred <- as.matrix(fitted(lm(zz[, i] ~ poly(1:nrow(zz), 2, raw = TRUE))))
  predictions <- cbind(predictions, pred)
}
Then you can plot the predictions:
par(mfrow = c(1, 3))
image(z, main = "Original")
image(zz, main = "Noisy")
image(predictions, main = "Predicted")
Note that I used a polynomial regression of degree 2; you can change the degree for a better fit across the columns. Or you can use other, more powerful non-linear prediction methods (SVM, ANN, etc.) to get a more accurate model.

sommer, multivariate linear mixed model analysis, plant breeding applications

I have read the sommer documentation, but I was not able to find any example of regression directly on markers (the rrBLUP parametrization), only examples using the kinship parametrization (GBLUP). Could you say whether it is possible
in sommer to regress directly on the markers, instead of using the kinship matrix? Especially under multivariate scenarios (multiple traits, locations, etc.), modelling an unstructured variance-covariance matrix for the marker effects.
In sommer >= 3.7 it is straightforward to fit an rrBLUP model in the multivariate setting; the DT_cpdata dataset has a good example:
library(sommer)
data(DT_cpdata)
mix.rrblup <- mmer(fixed = cbind(color, Yield) ~ 1,
                   random = ~ vs(list(GT), Gtc = unsm(2)) + vs(Rowf, Gtc = diag(2)),
                   rcov = ~ vs(units, Gtc = unsm(2)),
                   data = DT)
summary(mix.rrblup)
A <- A.mat(GT)   # additive relationship matrix computed from the markers
mix.gblup <- mmer(fixed = cbind(color, Yield) ~ 1,
                  random = ~ vs(id, Gu = A, Gtc = unsm(2)) + vs(Rowf, Gtc = diag(2)),
                  rcov = ~ vs(units, Gtc = unsm(2)),
                  data = DT)
summary(mix.gblup)
The vs() function builds a variance structure for a given random effect. The covariance structure for the univariate/multivariate setting is provided in the Gtc argument as a matrix, where e.g. a diagonal, unstructured or customized structure can be specified. When the user wants to provide a customized matrix as a random effect, such as a marker matrix GT for rrBLUP, it has to be wrapped in list() so that sommer can internally put it in the right format, whereas in the GBLUP version the random effect id, which holds the labels for the individuals, can take a covariance matrix through the Gu argument.
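A small illustration (not part of the original answer): the constraint matrices passed to Gtc can be inspected directly to see which variance-covariance entries each one involves.
library(sommer)
unsm(2)   # unstructured 2x2 structure: both trait variances and their covariance
diag(2)   # diagonal 2x2 structure: trait variances only, no covariance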

Function and data format for doing vector-based clustering in R

I need to run clustering on the correlations of data row vectors; that is, instead of using individual variables as clustering predictor variables, I intend to use the correlations between the vectors of variables of different data rows.
Is there a function in R that does vector-based clustering? If not, and I need to do it manually, what is the right data format to feed into a function such as cmeans or kmeans?
Say I have m variables and n data rows; the m variables constitute one vector for each data row, so I have an n x n matrix of correlations or cosines. Can this matrix be plugged into the clustering function directly, or is some processing required?
Many thanks.
You can transform your correlation matrix into a dissimilarity matrix,
for instance 1-cor(x) (or 2-cor(x) or 1-abs(cor(x))).
# Sample data: n vectors in k dimensions (one column per vector)
n <- 200
k <- 10
x <- matrix(rnorm(n * k), nrow = k)
x <- x * row(x)   # 10 dimensions, with less information in some of them
# Clustering on the correlation-based dissimilarity
library(cluster)
r <- pam(1 - cor(x), diss = TRUE, k = 5)
# Check the results
plot(prcomp(t(x))$x[, 1:2], col = r$clustering, pch = 16, cex = 3)
R clustering is often a bit limited. This is a design limitation of R, since it relies heavily on low-level C code for performance. The fast k-means implementation included with R is an example of such low-level code, which in turn is tied to using Euclidean distance.
There are dozens of extensions and alternatives available in the community around R: PAM, CLARA and CLARANS, for example. They aren't exactly k-means, but closely related. There should be a "spherical k-means" implementation somewhere that is sensible for cosine distance. And there is the whole family of hierarchical clusterings (which scale rather badly - usually O(n^3), with O(n^2) for a few exceptions - but are conceptually very easy to understand), sketched below.
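A quick sketch of the hierarchical route with a correlation-based dissimilarity, reusing the matrix x from the previous answer's sample data:
d <- as.dist(1 - cor(x))              # correlation distance between the column vectors
hc <- hclust(d, method = "average")   # average-linkage hierarchical clustering
plot(hc)                              # dendrogram
clusters <- cutree(hc, k = 5)         # cut into 5 clusters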
If you want to explore some more clustering options, have a look at ELKI; it should allow clustering (with various methods, including k-means) by correlation-based distances, and it also includes such distance functions. It's not R, though, but Java, so if you are bound to using R it won't work for you.
