How to figure out the parameters from mppm in R

I am working with the spatstat library in R.
I have several point pattern objects built from my own dataset. The point patterns contain only the x and y coordinates of their points. I want to fit a Gibbs process with Strauss interaction to the point patterns, so that I can build a model and simulate similar point patterns. I was able to do this with the ppm function when working with one point pattern at a time: I applied the rmhmodel function to the object returned by ppm, which gave me the parameters beta, gamma and r that I then passed to rStrauss to simulate new point patterns. FYI, I am not using the simulate function directly as I want the new simulated point pattern to have flexible number of points that simulate does not give me.
Now I want to work with all of my point patterns at once. I can build a hyperframe of point patterns as described in the replicated point pattern chapter of the Baddeley textbook, but then the model has to be fitted with mppm instead of ppm, and rmhmodel does not work on the mppm object when I try to obtain the model parameters beta, gamma and r.
How can I extract the fitted beta, gamma and r from a mppm object?

There are several ways to do this.
If you print a fitted model (obtained from ppm or from mppm) simply by typing the name of the object, the printed output contains a description of the fitted model including the model parameters.
If you apply the function parameters to a fitted model obtained from ppm you will obtain a list of the parameter values with appropriate names.
fit <- ppm(cells ~ 1, Strauss(0.12))
fit
parameters(fit)
For a model obtained from mppm, there could be different parameter values applying to each row of the hyperframe of data, so you would have to do lapply(subfits(model), parameters) and the result is a list with one entry for each row of the hyperframe, containing the parameters relevant to each row.
A <- hyperframe(Bugs=waterstriders)
mfit <- mppm(Bugs ~ 1, data=A, Strauss(5))
lapply(subfits(mfit), parameters)
Alternatively you can extract the canonical parameters by coef and transform them to the natural parameters.
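As a rough sketch of that last route, here is the single-pattern case with a constant trend (an assumption; a trend with covariates would not reduce to a single beta). The canonical coefficients are on the log scale, so exponentiating them should recover beta and gamma. The coefficient names "(Intercept)" and "Interaction" are the ones ppm is expected to report for this model. The radius r is not estimated at all; it is simply the value that was passed to Strauss().

```r
library(spatstat)

fit <- ppm(cells ~ 1, Strauss(0.12))

co <- coef(fit)                      # canonical (log-scale) parameters
beta  <- exp(co[["(Intercept)"]])    # first-order term of the Strauss model
gamma <- exp(co[["Interaction"]])    # Strauss interaction parameter
r     <- 0.12                        # fixed by the user, not fitted

c(beta = beta, gamma = gamma, r = r)
```

For an mppm fit the coefficient names can differ, so it is probably safer to inspect names(coef(mfit)) first rather than rely on the names used here.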
You wrote:
I am not using the simulate function directly as I want the new simulated point pattern to have flexible number of points that simulate does not give me.
This cannot be right. The function simulate.mppm generates simulated realisations with a variable number of points. Try simulate(mfit).
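A quick way to check this is to count the points in a few simulated realisations. The sketch below uses the single-pattern analogue (simulate on a ppm fit to the cells data) purely for illustration; the point counts would be expected to vary between realisations.

```r
library(spatstat)

fit <- ppm(cells ~ 1, Strauss(0.12))

# Three simulated realisations of the fitted Strauss model
sims <- simulate(fit, nsim = 3)

# Number of points in each realisation
sapply(sims, npoints)
```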

Related

Fitting an inhomogeneous Cox LGCP to a replicated point process using mppm

My recent foray into spatial point patterns has brought me to examining LGCP Cox processes. In my case I actually have a series of point patterns that I want to fit a single model to. One of my previous inquiries brought me to using mppm to train such models (thanks Adrian Baddeley!). My next question relates to using this type of Cox model in the context of mppm.
Is it possible to fit an inhomogeneous LGCP Cox process (or another type of Cox process) to a replicated point pattern using mppm? I see some info on fitting Gibbs processes, but not really for Cox processes.
It seems like the answer may be "possibly" through some creative use of the "random" argument.
For the sake of example, let's say I'm fitting a model to a point pattern Y with a single covariate X (which is a single im). The call to kppm would be:
myModel = kppm(Y ~ X,"LGCP")
If I were fitting a simple inhomogeneous Poisson process to a replicated point pattern and associated covariate in hyperframe G, I believe the call would look like the following:
myModel = mppm(Y ~ X, data=G)
After going through Chapter 16 of the SpatStat book I think that fitting a replicated LGCP Cox model might be accomplished by using the simulated intensities from calls to rLGCP, maybe like this...
myLGCP = rLGCP(model="exp",mu=0,saveLambda=TRUE,nsim=2,win=myWindow)
myIntensity = lapply(myLGCP,function(x) attributes(x)$Lambda)
G$Z = myIntensity
myModel = mppm(Y ~ X, data=G, random=~Z|id)
The above approach "runs" without errors... but I have no idea if I'm even remotely close to actually accomplishing what I wanted to do. It's also a little unclear how to use the fitted object to then simulate a realization of the model, since simulate.kppm requires a kppm object.
Thoughts and suggestions appreciated.
mppm does not currently support Cox processes.
You could do the following:
1. Fit the trend part of the model to your replicated point pattern data using mppm, for example m <- mppm(Y ~ X, data=G)
2. Extract the fitted intensities for each point pattern using predict.mppm
3. For each point pattern, using the corresponding intensity obtained from the model, compute the inhomogeneous K function using Kinhom (with argument ratio=TRUE)
4. Combine the K functions using pool
5. Estimate the cluster parameters of the LGCP by applying lgcp.estK to the pooled K function.
Optionally after step 4 you could convert the pooled K function to a pair correlation function using pcf.fv and then fit the cluster parameters using lgcp.estpcf.
This approach assumes that the same cluster parameters will apply to each point pattern. If your data consist of several distinct groups of patterns, and you want the model to assign different cluster parameter values to the different groups of patterns, then just apply steps 4 and 5 separately to each group.
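The five steps might be sketched roughly as follows. This is only an illustration under stated assumptions: the waterstriders data stand in for the replicated patterns Y, the Cartesian coordinate x stands in for the covariate X, and the isotropic edge correction and the starting values for lgcp.estK are arbitrary choices, not recommendations.

```r
library(spatstat)

# Assumption: waterstriders stands in for the replicated patterns Y,
# and the x coordinate stands in for the covariate X.
G <- hyperframe(Y = waterstriders)

# Step 1: fit the trend part of the model
m <- mppm(Y ~ x, data = G)

# Step 2: fitted intensity (a pixel image) for each pattern
lam <- predict(m, type = "trend")

# Step 3: inhomogeneous K function for each pattern,
# keeping numerator and denominator so the estimates can be pooled
Ks <- mapply(function(pp, l) Kinhom(pp, lambda = l,
                                    correction = "iso", ratio = TRUE),
             G$Y, lam$trend, SIMPLIFY = FALSE)

# Step 4: pool the K functions
Kpool <- do.call(pool, unname(Ks))

# Step 5: minimum contrast estimate of the LGCP cluster parameters
fitK <- lgcp.estK(Kpool, startpar = c(var = 1, scale = 1))
fitK
```

If the patterns fall into distinct groups, the pooling and fitting lines would be run once per group on the corresponding subset of Ks.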

how to decompose a gamma distribution into two gamma distribution in R

Is there an algorithm available in R that can decompose a gamma distribution into two (or more) gamma distributions? If so, can you give me an example? I have a data set that looks like a gamma distribution when I plot it with respect to time (it is time series data). The data record the movement of an animal, and the animal can be in two different states: hungry or not hungry. My immediate reaction was to use a Hidden Markov Model and see if I can predict the two states. I was trying to use the depmix() function from the depmixS4 library in R to see if I can identify the two states. However, I don't really know how to use this function with a gamma distribution. The following is the code that I wrote, but it says that I need an argument for gamma, which I don't understand. Can someone tell me what parameter I should use and how to determine it? Thanks!
mod <- depmix(freq ~ 1, data = mod.data, nstates = 2, family = gamma())
fit.mod <- fit(mod)
Thank you!

Simple variogram in R, understanding gstat::variogram() and object gstat

I have a data.frame in R whose variables represent locations and whose observations are measures of a certain variable in those locations. I want to measure the decay of dependence for certain locations depending on distance, so the variogram is particularly useful for my studies.
I am trying to use the gstat library but I am a bit confused about certain parameters. As far as I understand, the (empirical) variogram should only need as basic data:
The locations of the variables
Observations for these variables
And then other parameters like maximum distance, directions, ...
Now, gstat::variogram() function requires as first input an object of class gstat. Checking the documentation of function gstat() I see that it outputs an object of this class, but this function requires a formula argument, which is described as:
formula that defines the dependent variable as a linear model of independent variables; suppose the dependent variable has name z, for ordinary and simple kriging use the formula z~1; for simple kriging also define beta (see below); for universal kriging, suppose z is linearly dependent on x and y, use the formula z~x+y
Could someone explain to me what this formula is for?
try
methods(variogram)
and you'll see that gstat has several methods for variogram, one requiring a gstat object as first argument.
Given a data.frame, the easiest is to use the formula method:
variogram(z~1, ~x+y, data)
which specifies that in data, z is the observed variable of interest, ~1 specifies a constant mean model, and ~x+y specifies that the coordinates are found in columns x and y of data.
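A small self-contained sketch of the formula method, using a made-up data.frame (the data, the column names x, y, z and the cutoff of 50 are all assumptions for illustration only):

```r
library(gstat)

# Hypothetical data: 200 random locations with a smooth signal plus noise
set.seed(42)
n <- 200
d <- data.frame(x = runif(n, 0, 100), y = runif(n, 0, 100))
d$z <- sin(d$x / 15) + cos(d$y / 15) + rnorm(n, sd = 0.3)

# Empirical variogram: constant mean (z ~ 1), coordinates in columns x and y,
# pairs considered up to a maximum distance of 50
v <- variogram(z ~ 1, locations = ~ x + y, data = d, cutoff = 50)
head(v)
```

The result is a data.frame-like object whose columns include the number of point pairs (np), the average distance (dist) and the semivariance (gamma) for each distance bin.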

Prediction at a new value using lowess function in R

I am using lowess function to fit a regression between two variables x and y. Now I want to know the fitted value at a new value of x. For example, how do I find the fitted value at x=2.5 in the following example. I know loess can do that, but I want to reproduce someone's plot and he used lowess.
set.seed(1)
x <- 1:10
y <- x + rnorm(x)
fit <- lowess(x, y)
plot(x, y)
lines(fit)
Local regression (lowess) is a non-parametric statistical method; it is not like linear regression, where you can use the fitted model directly to estimate new values.
You'll need to take the values returned by the function (that's why it only returns a list to you) and choose your own interpolation scheme. Use that scheme to predict at your new points.
A common technique is spline interpolation (but there are others):
https://www.r-bloggers.com/interpolation-and-smoothing-functions-in-base-r/
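For instance, a minimal sketch of interpolating the lowess output at x = 2.5, using the example from the question. Both linear interpolation (approx) and spline interpolation (spline) are shown; which one better matches the plot being reproduced is a judgment call.

```r
set.seed(1)
x <- 1:10
y <- x + rnorm(x)
fit <- lowess(x, y)   # list with components x (sorted) and y (smoothed)

# Linear interpolation between the smoothed values at x = 2 and x = 3
v_lin <- approx(fit$x, fit$y, xout = 2.5)$y

# Cubic spline interpolation through all the smoothed values
v_spl <- spline(fit$x, fit$y, xout = 2.5)$y

c(linear = v_lin, spline = v_spl)
```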
EDIT: I'm pretty sure the predict function (for loess fits) does the interpolation for you. I also can't find any information about what exactly predict uses, so I've tried to trace the source code.
https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/library/stats/R/loess.R
else { ## interpolate
## need to eliminate points outside original range - not in pred_
I'm sure the R code calls the underlying C implementation, but it's not well documented so I don't know what algorithm it uses.
My suggestion is: either trust the predict function or roll out your own interpolation algorithm.

What is the objective of model.matrix()?

I'm currently going through the 'Introduction to Statistical Learning' MOOC by Stanford OpenX. In one of the lab exercises, it suggests creating a model matrix from the test data by explicitly using model.matrix().
Extract from textbook
We now compute the validation set error for the best model of each model size. We first make a model matrix from the test data.
test.mat = model.matrix(Salary ~ ., data = Hitters[test, ])
The model.matrix() function is used in many regression packages for building an X matrix from data. Now we run a loop, and for each size i, we extract the coefficients from regfit.best for the best model of that size, multiply them into the appropriate columns of the test model matrix to form the predictions, and compute the test MSE.
val.errors = rep(NA, 19)
for (i in 1:19) {
  coefi = coef(regfit.best, id = i)
  pred = test.mat[, names(coefi)] %*% coefi
  val.errors[i] = mean((Hitters$Salary[test] - pred)^2)
}
I understand that model.matrix would convert string variables into values with different levels, and that models like lm() would do the conversions under the hood.
However, what are the instances that we would explicitly use model.matrix(), and why?
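To make the mechanics concrete, here is a small sketch with a made-up data.frame (the data and the names d, g, x, y are assumptions). It shows the design matrix that model.matrix() builds from a factor and a numeric column, and that multiplying its columns by a coefficient vector reproduces what predict() does for lm. This is the same pattern as the lab code above, which forms predictions by hand from the coefficients of regfit.best.

```r
# Hypothetical data with one factor and one numeric predictor
d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
                g = factor(c("a", "b", "c", "a", "b", "c")),
                x = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0))

# Design matrix: intercept, dummy columns for levels b and c of g, and x
X <- model.matrix(y ~ ., data = d)
colnames(X)

# Manual prediction: pick the columns named in the coefficient vector
fit <- lm(y ~ ., data = d)
pred_manual <- drop(X[, names(coef(fit))] %*% coef(fit))

# Should agree with the built-in predictions
all.equal(unname(pred_manual), unname(predict(fit)))
```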
