I am trying to find influential observations in my logistic regression. In particular, I want to plot Pregibon's delta-beta statistic against the predicted probabilities to identify these observations.
I could not find any package that computes this statistic. Does anyone have any suggestions?
Here is more on Pregibon's delta beta statistics (basically Cook's D for logit): http://people.umass.edu/biep640w/pdf/5.%20%20Logistic%20Regression%202014.pdf
I have found the glmtoolbox package, which approximates Cook's D for GLMs, but I am not sure whether this is the correct approach.
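For reference, one common form of Pregibon's delta-beta (the overall influence summary, computed the same way as Stata's dbeta) can be obtained directly from a fitted glm object using Pearson residuals and leverages; no extra package is needed. A minimal sketch on a built-in dataset (the model and data here are illustrative, not from the question):

```r
# Sketch: Pregibon's delta-beta as dbeta_i = r_i^2 * h_i / (1 - h_i)^2,
# where r_i is the Pearson residual and h_i the leverage (hat value).
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

h     <- hatvalues(fit)                 # leverages
r     <- residuals(fit, type = "pearson")
dbeta <- r^2 * h / (1 - h)^2            # Pregibon's delta-beta per observation

# Plot against predicted probabilities to spot influential points
plot(fitted(fit), dbeta,
     xlab = "Predicted probability", ylab = "Pregibon's delta-beta")
```

Observations with unusually large dbeta values are the candidates for influential points.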
I've created 12 imputed samples using the mice package and wish to run paired-sample t-tests and a Cohen's d calculation on the imputed data, but I'm not sure how to do this. My end goal is to compare parameter estimates, t-test results, and effect-size estimates from complete-case analysis against those from the imputed (mice) analysis. I have no issue with the parameter estimates, but I can't figure out the t-tests and Cohen's d.
I'm a bit confused about how to approach this, and searching online and in the mice package documentation has not led to much progress. I did find mi.t.test from the MKmisc package, but it appears to be for datasets imputed with Amelia, not mice, and I can't quite figure it out. Would anyone have any advice or resources here, please?
So far I have:
Identified auxiliary variables
Created Predictor Matrix
Imputed missing data m times
Fitted linear models to each imputed dataset using with(), pooled them, and extracted parameter estimates using summary()
Is there perhaps a way I can create an object of an imputed dataset that is usable with other analyses or am I looking at this in the wrong way?
I used multiple imputation for the first time in my research, but maybe I can help by passing on the tips I received.
Perform the t-test on every imputed dataset.
Use mice's pool.scalar() function (documentation is available online). For Q, supply the mean differences; for U, supply the squared standard errors of the difference (i.e., the within-imputation variances).
Then your pooled t-value is: qbar / sqrt(t)
You can find the values of qbar and t in the output of pool.scalar().
And your pooled p-value is: 2 * (1 - pt(abs(statistic), pmax(DF, 0.001)))
Hope this helps!
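The steps above can be sketched end to end. This is a self-contained toy example (the data, the y_pre/y_post column names, and the missingness pattern are all made up for illustration):

```r
library(mice)
set.seed(1)

# Toy pre/post data with some missingness in the post measurement
dat <- data.frame(y_pre = rnorm(60, mean = 10), y_post = rnorm(60, mean = 11))
dat$y_post[sample(60, 12)] <- NA
imp <- mice(dat, m = 12, printFlag = FALSE)

# 1. Run the paired t-test on every imputed dataset
est <- se <- numeric(imp$m)
for (i in seq_len(imp$m)) {
  d      <- complete(imp, i)
  tt     <- t.test(d$y_pre, d$y_post, paired = TRUE)
  est[i] <- unname(tt$estimate)   # mean difference  -> Q
  se[i]  <- tt$stderr             # its standard error
}

# 2. Pool with Rubin's rules (U = within-imputation variance = SE^2)
pooled <- pool.scalar(Q = est, U = se^2, n = nrow(dat))

# 3. Pooled t-value and p-value, as described above
stat <- pooled$qbar / sqrt(pooled$t)
pval <- 2 * (1 - pt(abs(stat), pmax(pooled$df, 0.001)))
```

The same pool.scalar() pattern works for Cohen's d: compute d and its variance in each imputed dataset, then pass those in as Q and U.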
A journal is asking whether I report sample or population beta weights for my regressions. I am using the lm() function in base R and the lme() function from the nlme package. Which kind of beta weight do these give? I was not able to find any information on this in the package documentation.
I am using the rcorr function from the Hmisc package in R to compute Pearson correlation coefficients and corresponding p-values when analyzing the correlation of several fishery-landings time series. The data aren't really important here; what I would like to know is how the p-values are calculated. The documentation states that asymptotic p-values are approximated using the t or F distributions, but I am wondering if someone could point me to more detail, or to an equation describing exactly how these values are computed.
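For a Pearson correlation r on n pairs, the usual asymptotic p-value comes from transforming r to t = r * sqrt((n - 2) / (1 - r^2)), which follows a t distribution with n - 2 degrees of freedom under the null of zero correlation. A quick sketch that verifies this against base R's cor.test (the data here are simulated for illustration):

```r
# The asymptotic p-value for a Pearson correlation, computed by hand:
#   t = r * sqrt((n - 2) / (1 - r^2))  ~  t with n - 2 df under H0: rho = 0
set.seed(1)
x <- rnorm(30)
y <- x + rnorm(30)

r     <- cor(x, y)
n     <- length(x)
tstat <- r * sqrt((n - 2) / (1 - r^2))
p     <- 2 * pt(abs(tstat), df = n - 2, lower.tail = FALSE)

# p matches cor.test(x, y)$p.value; you can compare it with the
# corresponding entry of Hmisc::rcorr(cbind(x, y))$P as well
```

Whether rcorr's t-based approximation matches this exactly for your data is worth checking against cor.test on a couple of your series.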
I am interested in calculating p-values within a Cox PH model based upon the maximal test statistic, to get very robust estimates. Does anyone have experience with this?
I have played around a bit with the R package 'coxphf', which implements Firth's penalized likelihood, but it seems to give different coefficients and p-values when I choose firth=FALSE than the standard coxph function in 'survival' does.
I do not discount being completely lost on this, so any advice would be useful.
Thanks!
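For what it's worth, a minimal side-by-side comparison along these lines might look as follows (the lung data and covariates are illustrative; the coxphf call is commented out since it assumes that package is installed):

```r
library(survival)

# Standard maximum partial-likelihood Cox fit on built-in data
fit_ml <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(fit_ml)$coefficients

# Unpenalized fit via coxphf (assumes coxphf is installed):
# fit_f0 <- coxphf::coxphf(Surv(time, status) ~ age + sex,
#                          data = lung, firth = FALSE)
#
# Coefficients should agree to within convergence tolerance; p-values can
# still differ because coxphf reports profile-likelihood-based p-values by
# default, while summary(coxph) shows Wald tests.
```

If the coefficient estimates themselves differ by more than convergence tolerance, that would be worth reporting to the package maintainers.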
I want to fit a distribution to my data using the fitdistrplus package in R. I can compare goodness-of-fit results across candidate distributions to see which one fits my data better, but I don't know how to check the p-value of the goodness-of-fit test for each distribution. The results might show that, among gamma, lognormal, and exponential, the exponential distribution has the lowest Anderson-Darling statistic, but I don't know how to check whether the p-values for these tests reject the null hypothesis. Is there any built-in function in R that gives the p-values?
Here is a piece of code I used as an example:
d <- sample(100,50)
library(fitdistrplus)
descdist(d)
fitg <- fitdist(d,"gamma")
fitg2 <- fitdist(d,"exp")
gofstat(list(fitg,fitg2))
This code draws 50 distinct integers between 1 and 100 and tries to find the best-fitting model for them. If descdist(d) shows that gamma and exponential are the two candidates, fitg and fitg2 fit those models. The last line compares the KS and Anderson-Darling statistics to show which distribution fits best; the distribution with the lower statistic is the better fit. However, I don't know how to find p-values for fitg and fitg2 before comparing them. If the p-values show that none of these distributions fit the data, there is no point in comparing their goodness-of-fit statistics, to my knowledge.
Any help is appreciated.
Thanks
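Building on the code above: gofstat() does report a chi-squared goodness-of-fit p-value directly (the chisqpvalue component), though it does not give p-values for the KS or Anderson-Darling statistics. A sketch using simulated gamma data (the data here are illustrative, and the ks.test shortcut is approximate):

```r
library(fitdistrplus)
set.seed(1)
d <- rgamma(50, shape = 2, rate = 0.1)   # toy positive-valued data

fitg <- fitdist(d, "gamma")
fite <- fitdist(d, "exp")

# Chi-squared GOF p-values are built into gofstat():
gof <- gofstat(list(fitg, fite))
gof$chisqpvalue

# For KS, an approximate p-value can be had from ks.test with the fitted
# parameters plugged in. This is anti-conservative, because the parameters
# were estimated from the same data; a parametric bootstrap is more rigorous.
ks.test(d, "pgamma",
        shape = fitg$estimate["shape"],
        rate  = fitg$estimate["rate"])$p.value
```

For Anderson-Darling p-values specifically, there is no built-in in fitdistrplus; a parametric bootstrap (refit and recompute the statistic on simulated samples) is the usual route.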