How to add p-values to odds ratios with a logistic svyglm() in R?

I am using the code below to get odds ratios and confidence intervals for my svyglm model.
model <- svyglm(y ~ x + covariate,
                design = survey_design,
                family = quasibinomial(link = "logit"))
exp(cbind(OR = coef(model), confint(model)))
I get the p-values when I use summary(); however, that returns coefficients on the log-odds scale, which then need to be exponentiated. How do I add these p-values to the odds ratio and confidence interval table?

There's a tidy() method for svyglm objects in the broom package.
library(broom)
tidy(model, exponentiate = TRUE, conf.int = TRUE)
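If you prefer to stay in base R, you can also bind the p-values from summary() onto your exponentiated table yourself. A minimal sketch, assuming the usual svyglm coefficient matrix layout with the p-values in the "Pr(>|t|)" column:
# combine odds ratios, confidence intervals and p-values in one table
or_tab <- exp(cbind(OR = coef(model), confint(model)))
cbind(or_tab, p.value = summary(model)$coefficients[, "Pr(>|t|)"])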

Related

How to get R-squared using mice and MatchThem?

I am having trouble getting the R-squared and standardized betas for the pooled results after using the mice and MatchThem functions:
# fit the model in each imputed dataset
matched.models_1 <- with(matched,
                         lm(Grade_re ~ interv_g))
summary(matched.models_1, conf.int = TRUE)
# pool the results
matched.results_1 <- pool(matched.models_1)
summary(matched.results_1, conf.int = TRUE)
pool.r.squared(matched.results_1)
Error in pool.r.squared(matched.results_1) :
r^2 can only be calculated for results of the 'lm' modeling function
I would like to get the standardized betas and R-squared for the pooled result.
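One rough workaround is to compute R-squared in each completed dataset and combine the values manually. A sketch, assuming the fitted object stores the per-imputation lm fits in an $analyses list (as mice's with() results do):
# per-imputation R-squared values, then a naive average
# (assumes matched.models_1$analyses holds the individual lm fits)
r2s <- sapply(matched.models_1$analyses, function(m) summary(m)$r.squared)
mean(r2s)
Note that mice's own pool.r.squared pools on the Fisher-z scale of sqrt(R^2), so this simple average is only an approximation.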

Standard Error of Ridge Logistic Regression Coefficient using caret

I am using the caret package in R to perform ridge logistic regression.
I am able to find the coefficients for each variable.
My question is: how do I get the standard error of the coefficient for each variable produced by ridge logistic regression?
Here is the sample code that I have:
Ridge1 <- train(Group ~ ., data = train, method = 'glmnet',
                trControl = trainControl("cv", number = 10),
                tuneGrid = expand.grid(alpha = 0,
                                       lambda = lambda),
                family = "binomial")
# coefficients of the ridge logistic regression
coef(Ridge1$finalModel, Ridge1$bestTune$lambda)
How can I get results like those from an ordinary logistic regression model (i.e. the standard error, Wald statistic, p-value, etc.)?
You don't get p-values and confidence intervals from ridge or glmnet regressions because it is very difficult to estimate the distribution of the estimator when a penalization term is present. The first part of the paper accompanying the R package hdi touches on this.
We can try something like the approach below, for example taking the optimal lambda from caret and using it in the hdi package to estimate confidence intervals and p-values, but I would interpret these with caution; they can be quite different from an unpenalized logistic glm.
library(caret)
library(mlbench)
data(PimaIndiansDiabetes)
X = as.matrix(PimaIndiansDiabetes[,-ncol(PimaIndiansDiabetes)])
y = as.numeric(PimaIndiansDiabetes$diabetes)-1
lambda = 10^seq(-5,4,length.out=25)
Ridge1 <- train(x = X, y = factor(y), method = 'glmnet', family = "binomial",
                trControl = trainControl("cv", number = 10),
                tuneGrid = expand.grid(alpha = 0,
                                       lambda = lambda))
bestLambda = Ridge1$bestTune$lambda
Use hdi, but note that the coefficients will not be exactly the same as what you get with caret or glmnet:
library(hdi)
fit = ridge.proj(X,y,family="binomial",lambda=bestLambda)
cbind(fit$bhat,fit$se,fit$pval)
[,1] [,2] [,3]
pregnant 0.1137868935 0.0314432291 2.959673e-04
glucose 0.0329008177 0.0035806920 3.987411e-20
pressure -0.0122503030 0.0051224313 1.677961e-02
triceps 0.0009404808 0.0067935741 8.898952e-01
insulin -0.0012293122 0.0008902878 1.673395e-01
mass 0.0787408742 0.0145166392 5.822097e-08
pedigree 0.9120151630 0.2927090989 1.834633e-03
age 0.0116844697 0.0092017927 2.041546e-01

Find R-square value of Weibull fit (Survival model) in R

I have a survival object (S) for which I am doing a Weibull fit using the survreg function and the Weibull distribution in R.
S = Surv(data$ValueX, data$ValueY)
W = survreg(S ~ 1, data = data, dist = "weibull")
How do I extract the R-squared value of the Weibull fit, which is essentially a straight line on the appropriate scale? Or is there a function to calculate the correlation coefficient rho?
Basically, I want to calculate the goodness of fit.
Look at pam.censor in the PAmeasures package, which produces an R^2-like statistic. Using the ovarian dataset from the survival package:
library(PAmeasures)
library(survival)
fit.s <- survreg(Surv(futime, fustat) ~ age, data = ovarian, dist="weibull" )
p <- predict(fit.s, type = "response")
with(ovarian, pam.censor(futime, p, fustat))
For the ovarian data with an age regressor we get a value of only 0.0915.
Another idea: for a Weibull model with no covariates we have S(t) = exp(-(lambda * t)^p), so log(-log(S(t))) is linear in log(t); hence we can use the R-squared of the corresponding regression to measure how well the data fit a Weibull.
library(survival)
fit1 <- survfit(Surv(futime, fustat) ~ 1, data = ovarian)
sum1 <- summary(fit1, times = ovarian$futime)
fo <- log(-log(surv)) ~ log(time)
d <- as.data.frame(sum1[c("time", "surv")])
fit.lm <- lm(fo, d)
summary(fit.lm)$r.sq
plot(fo, d)
abline(fit.lm)
For the ovarian data without covariates the R^2, at 93%, is high, but the plot does suggest systematic departures from linearity, so it may not really be Weibull.
Other
Not sure if this is of interest, but the eha package has the check.dist function, which can be used for a visual comparison of a parametric baseline hazard model to a Cox proportional hazards model. See the documentation as well as:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5233524/
Using the ovarian dataset from survival:
library(eha)
library(survival)
fit.c <- coxreg(Surv(futime, fustat) ~ age, data = ovarian)
fit.p <- phreg(Surv(futime, fustat) ~ age, data = ovarian, dist = "weibull")
check.dist(fit.c, fit.p)
The survAUC package has three functions that provide R-squared-type statistics for Cox proportional hazards models (OXS, Nagelk and XO).
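For example, on the same ovarian data (a sketch, assuming each function takes the survival response plus the linear predictors of the fitted and null models):
library(survAUC)
library(survival)
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
fit0 <- coxph(Surv(futime, fustat) ~ 1, data = ovarian)  # null model
surv.rsp <- Surv(ovarian$futime, ovarian$fustat)
OXS(surv.rsp, predict(fit), predict(fit0))     # O'Quigley, Xu and Stare
Nagelk(surv.rsp, predict(fit), predict(fit0))  # Nagelkerke
XO(surv.rsp, predict(fit), predict(fit0))      # Xu and O'Quigley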

How to calculate R Squared value for Lasso regression using glmnet in R

I am performing lasso regression in R using the glmnet package:
fit.lasso <- glmnet(x,y)
plot(fit.lasso,xvar="lambda",label=TRUE)
Then using cross-validation:
cv.lasso=cv.glmnet(x,y)
plot(cv.lasso)
One tutorial (last slide) suggests the following for R^2:
R_Squared = 1 - cv.lasso$cvm/var(y)
But it did not work.
I want to understand the model's efficiency/performance in fitting the data, as we usually get R^2 and adjusted R^2 with the lm() function in R.
If you are using the "gaussian" family, you can access the R-squared value by
fit.lasso$dev.ratio
for a plain glmnet fit (for a cv.glmnet object, it is stored under cv.lasso$glmnet.fit$dev.ratio).
I'll use the example data shipped with glmnet to demonstrate:
library(glmnet)
# load data
data(BinomialExample)
head(x)
head(y)
For cross-validation:
cvfit = cv.glmnet(x, y, family = "binomial", type.measure = "class")
rsq = 1 - cvfit$cvm/var(y)
plot(cvfit$lambda,rsq)
First fit the lasso model with the selected lambda:
...
lasso.model <- glmnet(x=X,y=Y, family = "binomial", alpha=1, lambda = cv.model$lambda.min )
Then you can get the pseudo-R2 from the fitted model:
lasso.model$dev.ratio
This value gives the fraction of the null deviance explained by the model.
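As a quick sanity check (a sketch using the nulldev component and the deviance() method for glmnet fits):
# dev.ratio should equal 1 - residual deviance / null deviance
1 - deviance(lasso.model) / lasso.model$nulldev
lasso.model$dev.ratio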

How to get confidence intervals for multinomial logit in R?

Let me use the UCLA example on multinomial logit as a running example:
library(nnet)
library(foreign)
ml <- read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")
ml$prog2 <- relevel(ml$prog, ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
dses <- data.frame(ses = c("low", "middle", "high"), write = mean(ml$write))
predict(test, newdata = dses, "probs")
I wonder how I can get 95% confidence intervals for these predicted probabilities?
This can be accomplished with the effects package, which I have also showcased for another question on Cross Validated.
Let's look at your example.
library(nnet)
library(foreign)
ml <- read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")
ml$prog2 <- relevel(ml$prog, ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
Instead of using predict() from base R, we use Effect() from effects:
require(effects)
fit.eff <- Effect("ses", test, given.values = c("write" = mean(ml$write)))
data.frame(fit.eff$prob, fit.eff$lower.prob, fit.eff$upper.prob)
prob.academic prob.general prob.vocation L.prob.academic L.prob.general L.prob.vocation U.prob.academic
1 0.4396845 0.3581917 0.2021238 0.2967292 0.23102295 0.10891758 0.5933996
2 0.4777488 0.2283353 0.2939159 0.3721163 0.15192359 0.20553211 0.5854098
3 0.7009007 0.1784939 0.1206054 0.5576661 0.09543391 0.05495437 0.8132831
U.prob.general U.prob.vocation
1 0.5090244 0.3442749
2 0.3283014 0.4011175
3 0.3091388 0.2444031
If we want to, we can also plot the predicted probabilities with their respective confidence intervals using the facilities in effects.
plot(fit.eff)
Simply use the confint function on your model object.
ci <- confint(test, level=0.95)
Note that confint is a generic function and a specific version is run for multinom, as you can see by running
> methods(confint)
[1] confint.default confint.glm* confint.lm* confint.multinom*
[5] confint.nls*
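Since multinom coefficients are on the log relative-risk scale, exponentiating the intervals gives confidence intervals for the relative risk ratios:
exp(ci)  # one slice of intervals per non-reference outcome level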
EDIT:
As for calculating confidence intervals for the predicted probabilities, I quote from https://stat.ethz.ch/pipermail/r-help/2004-April/048917.html:
Is there any possibility to estimate confidence intervalls for the
probabilties with the multinom function?
No, as confidence intervals (sic) apply to single parameters not
probabilities (sic). The prediction is a probability distribution, so
the uncertainty would have to be some region in Kd space, not an interval.
Why do you want uncertainty statements about predictions (often called
tolerance intervals/regions)? In this case you have an event which
happens or not and the meaningful uncertainty is the probability
distribution. If you really have need of a confidence region, you could
simulate from the uncertainty in the fitted parameters, predict and
summarize somehow the resulting empirical distribution.
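A minimal sketch of that simulation idea for this example, assuming the coefficient ordering used by coef() and vcov() for multinom fits (all coefficients for one non-reference outcome, then the next) and predicting for the ses = "low" profile at the mean of write:
library(MASS)  # for mvrnorm
set.seed(1)
# draw parameter vectors from the asymptotic normal distribution of the fit
draws <- mvrnorm(5000, mu = c(t(coef(test))), Sigma = vcov(test))
newx <- c(1, 0, 0, mean(ml$write))  # intercept, sesmiddle, seshigh, write
probs <- t(apply(draws, 1, function(b) {
  B <- matrix(b, nrow = 2, byrow = TRUE)  # one row per non-reference outcome
  eta <- as.vector(B %*% newx)
  c(1, exp(eta)) / (1 + sum(exp(eta)))    # academic, general, vocation
}))
apply(probs, 2, quantile, c(0.025, 0.975))  # simulated 95% intervals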
