I'm using R's auto.arima function, but it seems that it does not always produce Gaussian errors. I cannot find any documentation saying that it bootstraps the prediction error when the error is not Gaussian. What does it do in that case?
Estimation does not require Gaussian errors, even when a Gaussian likelihood is being used. A Gaussian likelihood is almost the same as least squares and will give consistent estimates for any error distribution with finite variance.
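The point about consistency can be checked with a small simulation. The sketch below is only an illustration under assumed settings (an AR(1) process with coefficient 0.6 and t-distributed innovations, both chosen here and not taken from the question). It fits the same model by Gaussian maximum likelihood and by conditional least squares, using base R's arima.

```r
set.seed(1)
n <- 5000

# Heavy-tailed innovations: t with 5 degrees of freedom (finite variance, not Gaussian).
e <- rt(n, df = 5)

# Simulate an AR(1) series y[t] = 0.6 * y[t-1] + e[t] (the coefficient is an assumed value).
y <- as.numeric(stats::filter(e, filter = 0.6, method = "recursive"))

# Same model, two estimators: Gaussian likelihood and conditional sum of squares.
fit_ml  <- arima(y, order = c(1, 0, 0), method = "ML")
fit_css <- arima(y, order = c(1, 0, 0), method = "CSS")

phi_ml  <- unname(coef(fit_ml)["ar1"])
phi_css <- unname(coef(fit_css)["ar1"])

# Both estimates should be close to 0.6 and close to each other.
print(c(ML = phi_ml, CSS = phi_css))
```

With a long series, both estimates should land near the true coefficient even though the innovations are not Gaussian.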
The only time that the distribution of residuals really matters is when producing prediction intervals. If the residuals are not Gaussian, the default prediction intervals will not necessarily have the correct coverage. But then you can set bootstrap=TRUE and get bootstrapped prediction intervals which are based on the empirical distribution of the residuals.
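As a rough sketch of how the two kinds of interval can be compared, the code below assumes the forecast package is installed and uses a made-up series with skewed innovations (the data, the horizon h = 10 and npaths = 2000 are illustrative choices, not from the question).

```r
library(forecast)

set.seed(42)

# Illustrative data: AR(1)-type series with skewed, mean-zero innovations.
e <- rexp(200) - 1
y <- ts(as.numeric(stats::filter(e, filter = 0.5, method = "recursive")))

fit <- auto.arima(y)

# Default intervals assume Gaussian forecast errors.
fc_gauss <- forecast(fit, h = 10, level = 95)

# Bootstrapped intervals resample the fitted residuals instead.
fc_boot <- forecast(fit, h = 10, level = 95, bootstrap = TRUE, npaths = 2000)

# Put the 95% limits side by side for comparison.
limits <- cbind(
  gauss_lo = as.numeric(fc_gauss$lower[, 1]),
  gauss_hi = as.numeric(fc_gauss$upper[, 1]),
  boot_lo  = as.numeric(fc_boot$lower[, 1]),
  boot_hi  = as.numeric(fc_boot$upper[, 1])
)
print(limits)
```

With skewed residuals, the bootstrapped limits would typically be asymmetric around the point forecast, whereas the Gaussian ones are symmetric.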
Related
I am running a quantile regression (as my residuals in linear regression were not normally distributed) for a study on the association between Mediterranean diet and inflammatory markers. As I was building the model, I got output for beta coefficients, standard errors, p-values, and confidence intervals. However, once I stratified by low and high levels of exercise, there was no longer any output for standard errors. Any ideas?
By default, rq does not return standard errors for small samples, but rather a confidence interval obtained by inverting a rank test. In summary(), the default method is se="rank" when there are fewer than 1001 observations and se="nid" otherwise, which would explain the change if your strata fall below that size. If you want standard errors, you have to specify another method for se, such as se="boot".
Note: the whole point of quantile regression is to move away from means and SD, so this may not be the most adequate estimate for your problem.
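A minimal sketch of the difference, assuming the quantreg package is installed and using its bundled engel data in place of the diet data (the model and the number of bootstrap replications R = 500 are illustrative choices):

```r
library(quantreg)

data(engel)  # 235 observations, so the default summary method is "rank"

# Median regression of food expenditure on income.
fit <- rq(foodexp ~ income, tau = 0.5, data = engel)

# Default for small samples: confidence bounds from rank-test inversion, no standard errors.
s_rank <- summary(fit)
print(s_rank)

# Bootstrap method: standard errors, t values and p-values are reported.
set.seed(1)
s_boot <- summary(fit, se = "boot", R = 500)
print(s_boot)
```

The same se argument can be passed to summary() for each stratified fit so that all models report the same kind of output.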
Hybrid Gibbs models are flexible for fitting spatial point pattern data. However, I am confused about how to get a confidence interval for the fitted model's estimates. For instance, I fitted a hybrid model with a hard core component and a Geyer saturation component, and got these estimates:
Mo.hybrid<-Hybrid(H=Hardcore(), G=Geyer(81,1))
my.hybrid<-ppm(my.X~1,Mo.hybrid, correction="bord")
#beta = 1.629279e-06
#Hard core distance: 31.85573
#Fitted G interaction parameter gamma: 10.241487
What I am interested in is gamma, which represents the aggregation of points. Obviously, the data X is a sample, i.e. cells in an anatomical image. To report a statistical result, a confidence interval for gamma is needed. However, I do not have replicates of the image data.
Can I simulate the fitted hybrid model 10 times, then refit the model to each simulation to get a confidence interval for the estimate? Something like:
mo.Y<-rmhmodel(cif=c("hardcore","geyer"),
par=list(list(beta=1.629279e-06,hc=31.85573),
list(beta=1, gamma=10.241487,r=81,sat=1)), w=my.X)
Y1<-rmh(model=mo.Y, control = list(nrep=1e6,p=1, fixall=TRUE),
start=list(n.start=c(npoint(my.X))))
Y1.fit<-ppm(Y1~1, Mo.hybrid,rbord=0.1)
# simulate and fit Y2,Y3,...Y10 in same way
or:
Y10<-simulate(my.hybrid,nsim=10)
Y1.fit<-ppm(Y10[1]~1, Mo.hybrid,rbord=0.1)
# fit Y2,Y3,...Y10 in same way
Certainly, the algorithms are different: rmh() can control the simulated intensity while simulate() does not.
Now the questions are:
Is it right to use simulation to get a confidence interval for the estimate?
Or can the fitted model provide an interval estimate that could be extracted?
If simulation is OK, which algorithm is better in my case?
The function confint calculates confidence intervals for the canonical parameters of a statistical model. It is defined in the standard stats package. You can apply it to fitted point process models in spatstat: in your example just type confint(my.hybrid).
You wanted a confidence interval for the non-canonical parameter gamma. The canonical parameter is theta = log(gamma), so if you do exp(confint(my.hybrid)) you can read off the confidence interval for gamma.
Confidence intervals and other forms of inference for fitted point process models are discussed in detail in the spatstat book chapters 9, 10 and 13.
The confidence intervals described above are the asymptotic ones (based on the asymptotic variance matrix using the central limit theorem).
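A minimal sketch of these steps, assuming the spatstat package is installed and using its bundled cells dataset with an arbitrarily chosen Geyer radius of 0.1 (both are stand-ins for my.X and the radius 81 in the question):

```r
library(spatstat)

# Stand-in data: the 'cells' point pattern shipped with spatstat.
X <- cells

# Hybrid of a hard core and a Geyer saturation process (r = 0.1 and sat = 1 are illustrative).
fit <- ppm(X ~ 1, Hybrid(H = Hardcore(), G = Geyer(r = 0.1, sat = 1)))

# Asymptotic 95% confidence intervals for the canonical parameters (log scale).
ci_theta <- confint(fit)
print(ci_theta)

# Exponentiate to move to the natural scale; the interaction row gives the interval for gamma.
ci_exp <- exp(ci_theta)
print(ci_exp)
```

Because exp is monotone, exponentiating the end points of the interval for theta gives a valid interval for gamma with the same nominal coverage.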
If you really wanted to estimate the variance-covariance matrix by simulation, it would be safer and easier to fit the model using method='ho' (which performs the simulation) and then apply confint as before (which would then use the variance of the simulations rather than the asymptotic variance).
rmh.ppm and simulate.ppm are essentially the same algorithm, apart from some book-keeping. The differences observed in your example occur because you passed different arguments. You could have passed the same arguments to either of these functions.
I'm trying to fit a Poisson model with glmnet where I know that many of the parameters have to be included in the model and therefore left unpenalized. I'm setting their penalty.factor to 0. Unfortunately, glmnet has convergence issues when the number of unpenalized parameters is high. Here is an example, adapted from the glmnet manual, where the algorithm doesn't converge if the number of unpenalized parameters is high.
library(glmnet)
N=500; p=300
nzc=5
x=matrix(rnorm(N*p),N,p)
beta=rnorm(nzc)
f = x[,seq(nzc)]%*%beta
mu=exp(f)
y=rpois(N,mu)
pena=rep(0,ncol(x))
pena[1:10]=1 # penalized only the first ten parameters
fit=glmnet(x,y,family="poisson",penalty.factor=pena,alpha=0.9)
# fit is not converging
pena=rep(0,ncol(x))
pena[1:250]=1 # penalized the first 250 parameters
fit=glmnet(x,y,family="poisson",penalty.factor=pena,alpha=0.9)
# fit is converging
I know that the model doesn't make sense for this example but at least I can reproduce the same error as in my data. It looks like lambda_max is set to infinity. Does it mean that the algorithm can't find a minimum lambda_max where all the penalized parameters are zero? How can I fit a model where a lot of the parameters are unpenalized?
Thanks a lot
I am trying to estimate confidence intervals for a mixed effects Poisson model using robust standard errors in R. I followed these instructions and was able to estimate confidence intervals for a model without random effects.
Now, I would like to estimate confidence intervals with robust standard errors for a Poisson model with a random term. It seems that the sandwich command does not work for glmer, only for glm. I have not been able to find a good solution yet. Any suggestions?
I want to fit a distribution to my data. I use the fitdistrplus package in R to find the distribution. I can compare the goodness-of-fit results for different distributions to see which one fits my data best, but I don't know how to check the p-value of the goodness-of-fit test for each distribution. The results might show that among gamma, lognormal and exponential, the exponential distribution has the lowest Anderson-Darling statistic, but I don't know how to check whether the p-values for these tests lead to rejecting the null hypothesis. Is there any built-in function in R which gives the p-values?
Here is a piece of code I used as an example:
d <- sample(100,50)
library(fitdistrplus)
descdist(d)
fitg <- fitdist(d,"gamma")
fitg2 <- fitdist(d,"exp")
gofstat(list(fitg,fitg2))
This code draws 50 random integers between 1 and 100 and tries to find the best-fitting model for these data. If descdist(d) shows that gamma and exponential are the two best candidates, fitg and fitg2 fit the corresponding models. The last line compares the KS and Anderson-Darling statistics to show which distribution fits best; the distribution with the lower values is the better one. However, I don't know how to find p-values for fitg and fitg2 before comparing them. To my knowledge, if the p-values show that none of these distributions fits the data, there is no point in comparing their goodness-of-fit statistics.
Any help is appreciated.
Thanks