I am running a quantile regression (since the residuals from my linear regression were not normally distributed) for a study on the association between Mediterranean diet and inflammatory markers. As I was building the model I got output for beta coefficients, standard errors, p-values, and confidence intervals. However, once I stratified by low and high levels of exercise, there was no longer any output for the standard error. Any ideas?
By default rq does not return standard errors, but rather confidence intervals obtained by inverting a rank test. If you want standard errors you have to specify another method for se, such as se="boot".
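For instance, a minimal sketch with quantreg (the variable and data names are placeholders for your own):

library(quantreg)
fit <- rq(crp ~ med_diet, tau = 0.5, data = mydata)   # median regression
summary(fit, se = "boot")   # bootstrap SEs instead of rank-inversion CIs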
Note: the whole point of quantile regression is to move away from means and SDs, so this may not be the most adequate estimate for your problem.
I am using the glmmTMB package in R to run a logistic regression GLM with fixed and random effects (random intercepts and slopes). For some background, I have 5 fixed covariates, one of which includes a quadratic (so really 6 fixed effects), and I am including random slopes for each of those 6 covariates. Prior to running my model, I scaled and centered each covariate (using the scale function) and checked for correlation between covariates (other than the quadratic, correlation < 0.6).

I would like to convert the estimates from the model (which are standardized) to unstandardized estimates, because I need to create a predictive map in ArcGIS, which uses unstandardized rasters. I have tried running my model on the raw data (i.e. skipping the scale-and-center code), but I believe I am running into convergence issues: even though it runs without warnings, the estimates have large standard errors (10-100x larger than the estimates) and the sign of the estimates (+ or -) flips between the standardized and unstandardized runs.

I have found similar posts such as this, this, and this, but I don't think they are exactly my issue, or I am not understanding the math in the solutions. Advice would be very much appreciated.
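For reference, a minimal sketch of the usual back-transformation for the linear terms, assuming each covariate was standardized as z = (x - m)/s by scale(); the quadratic term and the random-effect variances need extra algebra, so treat this only as a sketch (all object names below are hypothetical):

# Sketch: convert standardized logistic-regression fixed effects to the raw scale.
# b0, b are the fitted (standardized) intercept and slope estimates;
# m, s are the covariate means and sds saved from scale().
b_raw  <- b / s                  # slopes on the raw covariate scale
b0_raw <- b0 - sum(b * m / s)    # intercept absorbs the centering

Because the logit link is linear in the predictors, this re-expresses the same fitted linear predictor exactly; it does not address the convergence issue on the raw data.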
I have been tasked with calculating the SEs for logistic regression point estimates (where all my predictor variables are factors). I typically use ggpredict to estimate my predictions, which provides CIs. However, we are comparing our results to estimates from program MARK, and we find readers have a better grasp of our plots with SEs as opposed to 95% CIs.
Based on reading the package notes, it appears I can simply calculate (conf.high - predicted)/1.96. Am I correct? Or am I missing something, and that is not the correct way to calculate the SE for the predicted estimates? If I am wrong, any ideas on how I can do this, or do I need to just use CIs?
Thank you very much for your help.
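A minimal sketch of that back-calculation, assuming the interval is Wald-type (predicted ± 1.96 × SE on the scale it was computed on); note that for GLMs ggpredict typically builds its intervals on the link scale and back-transforms them, so on the response scale this only recovers an approximate SE. The model and term names here are hypothetical:

library(ggeffects)
pred <- ggpredict(model, terms = "treatment")   # "treatment" is a placeholder
# back out an approximate SE from the upper limit of the 95% CI
se_approx <- (pred$conf.high - pred$predicted) / qnorm(0.975)   # qnorm(0.975) ≈ 1.96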
Hybrid Gibbs models are flexible for fitting spatial point pattern data; however, I am confused about how to get a confidence interval for the fitted model's estimates. For instance, I fitted a hybrid Geyer model including a hardcore and a Geyer saturation component, and got the estimates:
Mo.hybrid <- Hybrid(H = Hardcore(), G = Geyer(81, 1))
my.hybrid <- ppm(my.X ~ 1, Mo.hybrid, correction = "bord")
# beta = 1.629279e-06
# Hard core distance: 31.85573
# Fitted G interaction parameter gamma: 10.241487
What I am interested in is gamma, which represents the aggregation of the points. Obviously, the data X are a sample, e.g. of cells in an anatomical image. In order to report statistical results, a confidence interval for gamma is needed; however, I do not have replicates of the image data.
Can I simulate the fitted hybrid model 10 times and then refit the simulations to get a confidence interval for the estimate? Something like:
mo.Y <- rmhmodel(cif = c("hardcore", "geyer"),
                 par = list(list(beta = 1.629279e-06, hc = 31.85573),
                            list(beta = 1, gamma = 10.241487, r = 81, sat = 1)),
                 w = my.X)
Y1 <- rmh(model = mo.Y, control = list(nrep = 1e6, p = 1, fixall = TRUE),
          start = list(n.start = npoint(my.X)))
Y1.fit <- ppm(Y1 ~ 1, Mo.hybrid, rbord = 0.1)
# simulate and fit Y2, Y3, ..., Y10 in the same way
or:
Y10 <- simulate(my.hybrid, nsim = 10)
Y1.fit <- ppm(Y10[[1]] ~ 1, Mo.hybrid, rbord = 0.1)   # [[1]] extracts a single pattern from the list
# fit Y2, Y3, ..., Y10 in the same way
Certainly the algorithms are different: rmh() can control the simulated intensity while simulate() does not.
Now the questions are:
Is it right to use simulation to get a confidence interval for the estimate?
Or can the fitted model provide an estimate interval that could be extracted?
If simulation is OK, which algorithm is better in my case?
The function confint calculates confidence intervals for the canonical parameters of a statistical model. It is defined in the standard stats package. You can apply it to fitted point process models in spatstat: in your example just type confint(my.hybrid).
You wanted a confidence interval for the non-canonical parameter gamma. The canonical parameter is theta = log(gamma), so if you do exp(confint(my.hybrid)) you can read off the confidence interval for gamma.
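For instance, with the model fitted in the question:

ci <- confint(my.hybrid)   # CIs for the canonical parameters, including log(gamma)
exp(ci)                    # the row for the Geyer interaction gives the CI for gamma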
Confidence intervals and other forms of inference for fitted point process models are discussed in detail in the spatstat book chapters 9, 10 and 13.
The confidence intervals described above are the asymptotic ones (based on the asymptotic variance matrix using the central limit theorem).
If you really wanted to estimate the variance-covariance matrix by simulation, it would be safer and easier to fit the model using method='ho' (which performs the simulation) and then apply confint as before (which would then use the variance of the simulations rather than the asymptotic variance).
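A sketch of that route, reusing the model formula from the question:

# Huang-Ogata fit: the variance is estimated from simulations
fit.ho <- ppm(my.X ~ 1, Mo.hybrid, method = "ho")
exp(confint(fit.ho))   # simulation-based confidence interval for gamma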
rmh.ppm and simulate.ppm are essentially the same algorithm, apart from some book-keeping. The differences observed in your example occur because you passed different arguments. You could have passed the same arguments to either of these functions.
I'm using R's auto.arima function, but it seems that it does not produce Gaussian errors all the time. I cannot find any documentation on whether it does some bootstrapping of the prediction error (if the errors are not Gaussian), or on what it does if the errors are not Gaussian.
Estimation does not require Gaussian errors, even when a Gaussian likelihood is being used. A Gaussian likelihood is almost the same as least squares and will give consistent estimates for any error distribution with finite variance.
The only time that the distribution of residuals really matters is when producing prediction intervals. If the residuals are not Gaussian, the default prediction intervals will not necessarily have the correct coverage. But then you can set bootstrap=TRUE and get bootstrapped prediction intervals which are based on the empirical distribution of the residuals.
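A minimal sketch (my_series stands in for your own ts object):

library(forecast)
fit <- auto.arima(my_series)
fc <- forecast(fit, h = 12, bootstrap = TRUE)   # intervals from resampled residuals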
I'm new to R and am trying to calculate 95% confidence intervals for the R-squared values and residual standard errors of linear models formed by bootstrapping: I resample the response variable 999 times, then create 999 linear models by regressing each of these bootstrapped response variables on the original explanatory variable.
First of all, I am not sure whether I should be calculating the 95% CI for the R-squared and residual standard error of the ORIGINAL linear model (without the bootstrap data); that doesn't seem to make sense, since the R-squared value is exact for that fitted model and a CI for it would be meaningless.
Is that correct?
More importantly, I'm not sure how to calculate the CIs for the R-squared values and residual standard error values from the 999 linear models I've created by bootstrapping.
You can definitely use the boot package to do this (a sketch with boot is at the end), but because I may be confused about what you want, I'll go step by step.
First I make up some fake data:
n <- 10
x <- rnorm(n)                  # explanatory variable
realerror <- rnorm(n, 0, .9)   # noise
beta <- 3
y <- beta * x + realerror      # response
Then I make an empty place to catch the statistics I am interested in:
rsquared <- rep(NA, 999)   # preallocate
sse <- rep(NA, 999)
Then I make a for loop that resamples the data, runs a regression, and collects the two statistics on each iteration:
for (i in 1:999) {
  # create an index vector to resample the data row-wise with replacement
  use <- sample(1:n, replace = TRUE)
  lm1 <- summary(lm(y[use] ~ x[use]))
  rsquared[i] <- lm1$r.squared
  sse[i] <- sum(lm1$residuals^2)
}
Now I want to figure out the confidence intervals, so I order each vector and report the 25th and 975th values, which approximate the 2.5th and 97.5th percentiles of the 999 replicates.
First order the statistics:
sse <- sort(sse)
rsquared <- sort(rsquared)
Then the 25th is the lower confidence limit and the 975th is the upper confidence limit
> sse[c(25,975)]
[1] 2.758037 18.027106
> rsquared[c(25,975)]
[1] 0.5613399 0.9795167
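For completeness, roughly the same thing with the boot package mentioned at the top, letting boot.ci produce percentile intervals instead of hand-ordering:

library(boot)
# statistic(data, indices): refit on the resampled rows, return R^2 and SSE
stat_fun <- function(dat, idx) {
  s <- summary(lm(y ~ x, data = dat[idx, ]))
  c(s$r.squared, sum(s$residuals^2))
}
b <- boot(data.frame(x = x, y = y), stat_fun, R = 999)
boot.ci(b, type = "perc", index = 1)   # percentile CI for R-squared
boot.ci(b, type = "perc", index = 2)   # percentile CI for SSE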