Confidence interval of estimates in a fitted hybrid model in spatstat - R

Hybrid Gibbs models are flexible for fitting spatial point pattern data, but I am confused about how to obtain a confidence interval for the fitted model's estimates. For instance, I fitted a hybrid Geyer model with hardcore and Geyer saturation components, and got the estimates:
Mo.hybrid <- Hybrid(H=Hardcore(), G=Geyer(81, 1))
my.hybrid <- ppm(my.X~1, Mo.hybrid, correction="bord")
# beta = 1.629279e-06
# Hard core distance: 31.85573
# Fitted G interaction parameter gamma: 10.241487
What I am interested in is gamma, which represents the aggregation of points. Obviously, the data X is a sample, e.g. of cells in an anatomical image. In order to report a statistical result, a confidence interval for gamma is needed; however, I do not have replicates of the image data.
Can I simulate the fitted hybrid model 10 times and then refit each simulation to get a confidence interval for the estimate? Something like:
mo.Y <- rmhmodel(cif=c("hardcore","geyer"),
                 par=list(list(beta=1.629279e-06, hc=31.85573),
                          list(beta=1, gamma=10.241487, r=81, sat=1)),
                 w=my.X)
Y1 <- rmh(model=mo.Y, control=list(nrep=1e6, p=1, fixall=TRUE),
          start=list(n.start=npoints(my.X)))
Y1.fit <- ppm(Y1~1, Mo.hybrid, rbord=0.1)
# simulate and fit Y2, Y3, ..., Y10 in the same way
or:
Y10 <- simulate(my.hybrid, nsim=10)
Y1.fit <- ppm(Y10[[1]]~1, Mo.hybrid, rbord=0.1)
# fit Y2, Y3, ..., Y10 in the same way
Certainly the algorithms differ: rmh() can control the simulated intensity while simulate() does not.
Now the questions are:
Is it right to use simulation to get a confidence interval for the estimate?
Or can the fitted model itself provide an interval estimate that could be extracted?
If simulation is OK, which algorithm is better in my case?

The function confint calculates confidence intervals for the canonical parameters of a statistical model. It is defined in the standard stats package. You can apply it to fitted point process models in spatstat: in your example just type confint(my.hybrid).
You wanted a confidence interval for the non-canonical parameter gamma. The canonical parameter is theta = log(gamma), so if you compute exp(confint(my.hybrid)) you can read off the confidence interval for gamma.
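Concretely, using the model object fitted in the question:

# 95% CI for the canonical parameters (on the log scale)
confint(my.hybrid)
# exponentiate to get the CI for gamma itself;
# read off the row corresponding to the Geyer interaction
exp(confint(my.hybrid))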
Confidence intervals and other forms of inference for fitted point process models are discussed in detail in the spatstat book chapters 9, 10 and 13.
The confidence intervals described above are the asymptotic ones (based on the asymptotic variance matrix using the central limit theorem).
If you really wanted to estimate the variance-covariance matrix by simulation, it would be safer and easier to fit the model using method='ho' (which performs the simulation) and then apply confint as before (which would then use the variance of the simulations rather than the asymptotic variance).
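A minimal sketch of that approach, reusing the objects from the question (the Huang-Ogata fit can be slow, since it simulates from the model):

# refit by the Huang-Ogata method, which estimates the
# variance-covariance matrix by simulation
my.hybrid.ho <- ppm(my.X~1, Mo.hybrid, correction="bord", method="ho")
exp(confint(my.hybrid.ho))  # CI for gamma on the original scale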
rmh.ppm and simulate.ppm are essentially the same algorithm, apart from some book-keeping. The differences observed in your example occur because you passed different arguments. You could have passed the same arguments to either of these functions.
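For instance, simulate.ppm accepts start and control arguments just like rmh, so the following sketch (reusing the question's settings) should behave essentially like the rmh() call above:

Y <- simulate(my.hybrid, nsim=10,
              start=list(n.start=npoints(my.X)),
              control=list(nrep=1e6, p=1, fixall=TRUE))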

Related

Using GAMLSS, the difference between fitDist() and gamlss()

When using the GAMLSS package in R, there are many different ways to fit a distribution to a set of data. My data is a single vector of values, and I am fitting a distribution over these values.
My question is this: what is the main difference between using fitDist() and gamlss() since they give similar but different answers for parameter values, and different worm plots?
Also, using the function confint() works for gamlss() fitted objects but not for objects fitted with fitDist(). Is there any way to produce confidence intervals for parameters fitted with the fitDist() function? Is there an accuracy difference between the two procedures? Thanks!
m1 <- fitDist()
fits many distributions and chooses the best according to a generalized Akaike information criterion, GAIC(k), with penalty k for each fitted parameter in the distribution, where k is specified by the user, e.g.
k = 2 for AIC,
k = log(n) for BIC,
k = 4 for a Chi-squared test (rounded from 3.84, the 5% critical value of a Chi-squared distribution with 1 degree of freedom), which is my preference.
m1$fits
gives the full results, from the best to the worst distribution, according to GAIC(k).
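A minimal sketch, assuming a positive real-valued response (the data vector y and the choice of type are illustrative):

library(gamlss)
set.seed(1)
y <- rGA(200, mu=5, sigma=0.5)          # hypothetical example data
m1 <- fitDist(y, k=4, type="realplus")  # GAIC penalty k=4, positive-real support
m1$fits                                 # GAIC(k) of every candidate, best first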

No standard error output in quantile regression using R

I am running a quantile regression (as my residuals in linear regression were not normally distributed) for a study on the association of Mediterranean diet and inflammatory markers. As I was building the model, I got output for beta coefficients and standard errors, plus p-values and confidence intervals. However, once I stratified for low and high levels of exercise, there was no longer any output for the standard error. Any ideas? [screenshot of the stratified model output]
By default rq does not return standard errors, but rather confidence intervals obtained by inverting a rank test. If you want standard errors you have to specify another method for se, such as se="boot".
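A minimal sketch (the data frame and variable names are placeholders for your data):

library(quantreg)
fit <- rq(crp ~ med_diet, tau=0.5, data=dat)
summary(fit)             # default: rank-inversion confidence intervals
summary(fit, se="boot")  # bootstrap standard errors, t-values and p-values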
Note: the whole point of quantile regression is to move away from means and SD, so this may not be the most adequate estimate for your problem.

auto.arima produces non-Gaussian residuals

I'm using R's auto.arima function, but it seems that it does not always produce Gaussian errors. I cannot find any documentation saying that it does some bootstrapping of the prediction error (if the error is not Gaussian), or what it does if the error is not Gaussian.
Estimation does not require Gaussian errors, even when a Gaussian likelihood is being used. A Gaussian likelihood is almost the same as least squares and will give consistent estimates for any error distribution with finite variance.
The only time that the distribution of residuals really matters is when producing prediction intervals. If the residuals are not Gaussian, the default prediction intervals will not necessarily have the correct coverage. But then you can set bootstrap=TRUE and get bootstrapped prediction intervals which are based on the empirical distribution of the residuals.
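A minimal sketch (y stands for your time series; the horizon is arbitrary):

library(forecast)
fit <- auto.arima(y)
# prediction intervals resampled from the empirical residual distribution
fc <- forecast(fit, h=12, bootstrap=TRUE)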

re-estimating confidence of C5.0 rules using another dataset

Context: I have fitted a glmnet to my data, but for operational reasons we would actually like to have a rule set. So I then fitted a C5.0Rules model to the predicted classes from my glmnet; i.e., the C5.0Rules model is essentially approximating my glmnet. However, as a result, the C5.0Rules model will report very high confidence (and other performance metrics), because its target is easy. A natural approach to correct this is to re-estimate the confidence (and other performance metrics) using the real response, or another dataset. But I need to do this in such a way that the model remembers the new confidence, so that in the future it will report the corrected confidence level along with its predictions. How do I do that?
Reproducible example:
library(glmnet)
library(C50)
library(caret)
data(churn)
## original glmnet
glmnet <- train(churn~.-state-area_code-international_plan-voice_mail_plan,
                data=churnTrain, method="glmnet")
## only retain useful predictors
temp <- varImp(glmnet)$importance
reducedVar <- rownames(temp)[temp>0]
churnTrain2 <- data.frame(churnTrain[, match(reducedVar, colnames(churnTrain))],
                          prediction=fitted(glmnet))
## fit my C5.0 which approximates the glmnet prediction
C5 <- train(prediction~., data=churnTrain2, method="C5.0Rules")
summary(C5) ## notice the high confidence and performance measures
(An alternative approach I can think of is to get C5.0 to predict the predicted probability instead of the class, but this turns it into a regression problem, so I won't be able to use C5.0.)

What's the difference between the KS test and bootstrap_p for power law fitting?

I want to know the goodness of fit while fitting a power law distribution in R using poweRlaw package.
After estimate_xmin(), I had a p-value of 0.04614726, but bootstrap_p() returns another p-value of 0.
So why do these two p-values differ? And how can I judge whether it is a power law distribution?
Here is the plot produced when fitting with poweRlaw: [plot: poweRlaw fitting result]
You're getting a bit confused. One of the statistics that estimate_xmin returns is the Kolmogorov-Smirnov statistic (as described in Clauset, Shalizi & Newman (2009)). This statistic is used to estimate the best cut-off value for your model, i.e. xmin. However, it doesn't tell you anything about the model fit.
Assessing model suitability is where the bootstrap_p function comes in.
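A minimal sketch for discrete data (x and the number of simulations are illustrative):

library(poweRlaw)
m <- displ$new(x)            # x: your discrete data (use conpl$new for continuous)
m$setXmin(estimate_xmin(m))  # cut-off chosen by minimising the KS distance
bs <- bootstrap_p(m, no_of_sims=1000)  # goodness-of-fit by bootstrapping
bs$p  # a large p (e.g. > 0.1) means the power law cannot be ruled out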
