How to get R-squared using mice and MatchThem? - r

I am having trouble getting the R-squared and standardized betas for the pooled results after using mice and the matchthem() function:
# fit the model in each imputed, matched dataset
matched.models_1 <- with(matched,
                         lm(Grade_re ~ interv_g))
summary(matched.models_1, conf.int = TRUE)
# pool the results
matched.results_1 <- pool(matched.models_1)
summary(matched.results_1, conf.int = TRUE)
pool.r.squared(matched.results_1)
Error in pool.r.squared(matched.results_1) :
r^2 can only be calculated for results of the 'lm' modeling function
I would like to get the standardized betas and R-squared for the pooled result.
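One workaround, if pool.r.squared() refuses the MatchThem object, is to pool R-squared by hand. This is a minimal sketch, not the MatchThem API, and it assumes the fitted object stores the per-imputation lm fits in $analyses, as mira/mimira objects do; it pools on the Fisher-z scale, which is the same idea pool.r.squared() uses:
# sketch: pool R^2 by hand across the imputed, matched analyses
# assumption: matched.models_1$analyses is the list of per-imputation lm fits
r2 <- sapply(matched.models_1$analyses, function(f) summary(f)$r.squared)
z  <- atanh(sqrt(r2))          # Fisher z transform of the multiple correlation
pooled_r2 <- tanh(mean(z))^2   # back-transform the averaged z to an R^2
pooled_r2
For standardized betas, one rough option (assuming continuous variables) is to refit the same model on scale()d variables in each imputed dataset and pool those coefficients.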

Related

How to add p-values to odds ratio with logistic svyglm()?

I am using the code below to get odds ratios and confidence intervals for my svyglm model.
model <- svyglm(y ~ x + covariate,
                design = survey_design,
                family = quasibinomial(link = logit))
exp(cbind(OR = coef(model), confint(model)))
I get the p-values when I use summary(); however, that returns coefficients that then need to be exponentiated. How do I add these p-values to the odds ratio and confint table?
There's a tidy() method for svyglm objects in the broom package.
library(broom)
tidy(model, exponentiate = TRUE, conf.int = TRUE)
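If you prefer to stay in base R, a rough equivalent is to bind the p-values onto the exponentiated estimates yourself. This is a sketch; it assumes the p-values sit in the fourth column of the summary.svyglm coefficient table:
# sketch: combine odds ratios, CIs and p-values without broom
p <- summary(model)$coefficients[, 4]   # Pr(>|t|) column
cbind(exp(cbind(OR = coef(model), confint(model))), p.value = p)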

Standard Error of Ridge Logistic Regression Coefficient using caret

I am using the caret package in R to perform ridge logistic regression.
Now I am able to find the coefficients for each variable.
The question is: how do I get the standard error of the coefficient for each variable produced by ridge logistic regression?
Here is the sample code that I have:
Ridge1 <- train(Group ~ ., data = train, method = 'glmnet',
                trControl = trainControl("cv", number = 10),
                tuneGrid = expand.grid(alpha = 0, lambda = lambda),
                family = "binomial")
# coefficients of the ridge logistic regression
coef(Ridge1$finalModel, Ridge1$bestTune$lambda)
How do I get results like those from a logistic regression model (i.e. the standard error, Wald statistic, p-value, etc.)?
You don't get p-values and confidence intervals from ridge or glmnet regressions because it is very difficult to estimate the distribution of the estimator when a penalization term is present. The first part of the publication for the R package hdi touches on this, and you can check out posts such as this and this.
We can try something like the approach below, for example getting the optimal lambda from caret and using it in another package, hdi, to estimate confidence intervals and p-values, but I would interpret these with caution; they can be very different from those of an ordinary (unpenalized) logistic glm.
library(caret)
library(mlbench)
data(PimaIndiansDiabetes)
X = as.matrix(PimaIndiansDiabetes[,-ncol(PimaIndiansDiabetes)])
y = as.numeric(PimaIndiansDiabetes$diabetes)-1
lambda = 10^seq(-5,4,length.out=25)
Ridge1 <- train(x = X, y = factor(y), method = 'glmnet', family = "binomial",
                trControl = trainControl("cv", number = 10),
                tuneGrid = expand.grid(alpha = 0, lambda = lambda))
bestLambda = Ridge1$bestTune$lambda
Use hdi, but note that the coefficients will not be exactly the same as what you get with caret or glmnet:
library(hdi)
fit = ridge.proj(X,y,family="binomial",lambda=bestLambda)
cbind(fit$bhat,fit$se,fit$pval)
[,1] [,2] [,3]
pregnant 0.1137868935 0.0314432291 2.959673e-04
glucose 0.0329008177 0.0035806920 3.987411e-20
pressure -0.0122503030 0.0051224313 1.677961e-02
triceps 0.0009404808 0.0067935741 8.898952e-01
insulin -0.0012293122 0.0008902878 1.673395e-01
mass 0.0787408742 0.0145166392 5.822097e-08
pedigree 0.9120151630 0.2927090989 1.834633e-03
age 0.0116844697 0.0092017927 2.041546e-01
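If you also want interval estimates from the ridge.proj output, a simple follow-up is to form approximate Wald intervals from the de-biased estimates and their standard errors (a sketch, assuming approximate normality):
# sketch: approximate 95% Wald intervals from the de-biased ridge estimates
ci <- cbind(lower = fit$bhat - 1.96 * fit$se,
            upper = fit$bhat + 1.96 * fit$se)
round(ci, 4)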

Getting estimated means after multiple imputation using the mitml, nlme & geepack R packages

I'm running multilevel multiple imputation through the mitml package (using the panImpute() function) and am fitting linear mixed models and marginal models through the nlme and geepack packages and mitml's with() function.
I can get the estimates, p-values, etc. for those through the testEstimates() function, but I'm also looking to get estimated means across my model predictors. I've tried the emmeans package, which I normally use for getting estimated means when running nlme and geepack models without multiple imputation, but here emmeans tells me "Can't handle an object of class “mitml.result”".
I'm wondering is there a way to get pooled estimated means from the multiple imputation analyses I've run?
The data frames I'm analyzing are longitudinal/repeated measures and in long format. In the linear mixed model I want to get the estimated means for a 2x2 interaction effect and in the marginal model I'm trying to get estimated means for the 6 levels of 'time' variable. The outcome in all models is continuous.
Here's my code
# mixed model
fml <- Dep + time ~ 1 + (1|id)
imp <- panImpute(data=Data, formula=fml, n.burn=50000, n.iter=5000, m=100, group = "treatment")
summary(imp)
plot(imp, trace="all")
implist <- mitmlComplete(imp, "all", force.list = TRUE)
fit <- with(implist, lme(Dep ~ time*treatment, random = ~ 1|id, method = "ML", na.action = na.exclude, control = list(opt = "optim")))
testEstimates(fit, var.comp = TRUE)
confint.mitml.testEstimates(testEstimates(fit, var.comp = TRUE))
# marginal model
fml <- Dep + time ~ 1 + (1|id)
imp <- panImpute(data=Data, formula=fml, n.burn=50000, n.iter=5000, m=100)
summary(imp)
plot(imp, trace="all")
implist <- mitmlComplete(imp, "all", force.list = TRUE)
fit <- with(implist, geeglm(Dep ~ time, id = id, corstr ="unstructured"))
testEstimates(fit, var.comp = TRUE)
confint.mitml.testEstimates(testEstimates(fit, var.comp = TRUE))
Is there a way to get pooled estimated means from the multiple imputation analyses I've run?
This is not a reprex without Data, so I can't verify this works for you. But emmeans provides support for mira-class (lists of) models created by the mice package. So if you fit your model in with() using a mids object rather than a mitml.list object, you can use emmeans to obtain marginal means of your outcome (and any contrasts or pairwise comparisons afterward).
Using example data found here, which uncomfortably loads an external workspace:
con <- url("https://www.gerkovink.com/mimp/popular.RData")
load(con)
## imputation
library(mice)
ini <- mice(popNCR, maxit = 0)
meth <- ini$meth
meth[c(3, 5, 6, 7)] <- "norm"
pred <- ini$pred
pred[, "pupil"] <- 0
imp <- mice(popNCR, meth = meth, pred = pred, print = FALSE)
## analysis
library(lme4) # fit multilevel model
mod <- with(imp, lmer(popular ~ sex + (1|class)))
library(emmeans) # obtain pooled estimates of means
(em <- emmeans(mod, specs = ~ sex))
pairs(em) # test comparison
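If you want to keep your panImpute() workflow rather than re-imputing with mice, one possible bridge (untested here; it assumes the miceadds package and its datlist2mids() converter accept the completed-data list) is to turn the imputation list into a mids object and refit with mice's with() so emmeans can pool:
# sketch: convert an imputation list to a mids object so emmeans can handle the fits
# assumption: miceadds::datlist2mids() accepts the list returned by mitmlComplete()
library(miceadds)
imp_mids <- datlist2mids(implist)
fit_mids <- with(imp_mids, lme4::lmer(Dep ~ time * treatment + (1 | id)))
emmeans(fit_mids, specs = ~ time | treatment)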

How to calculate R Squared value for Lasso regression using glmnet in R

I am performing lasso regression in R using the glmnet package:
fit.lasso <- glmnet(x,y)
plot(fit.lasso,xvar="lambda",label=TRUE)
Then using cross-validation:
cv.lasso=cv.glmnet(x,y)
plot(cv.lasso)
One tutorial (last slide) suggests the following for R^2:
R_Squared = 1 - cv.lasso$cvm/var(y)
But it did not work.
I want to understand the model's efficiency/performance in fitting the data, as we usually get R^2 and adjusted R^2 from the lm() function in R.
If you are using the "gaussian" family, you can access the R-squared value via
fit.lasso$dev.ratio  # for a cv.glmnet object, use cv.lasso$glmnet.fit$dev.ratio
I'll use the example data shipped with glmnet to demonstrate:
library(glmnet)
# load data
data(BinomialExample)
head(x)
head(y)
For cross-validation:
cvfit = cv.glmnet(x, y, family = "binomial", type.measure = "class")
rsq = 1 - cvfit$cvm/var(y)
plot(cvfit$lambda,rsq)
First fit the lasso model with the selected lambda:
...
lasso.model <- glmnet(x = X, y = Y, family = "binomial", alpha = 1, lambda = cv.model$lambda.min)
Then you can get the pseudo-R^2 from the fitted model:
lasso.model$dev.ratio
This value gives the deviance explained by the model divided by the null deviance.
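Tying the two answers together, you can also read the deviance ratio straight off the cross-validated object at the lambda chosen by cross-validation (a sketch using the cvfit object from the earlier example):
# sketch: fraction of null deviance explained at the CV-selected lambda
idx <- which(cvfit$glmnet.fit$lambda == cvfit$lambda.min)
cvfit$glmnet.fit$dev.ratio[idx]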

Probability predictions with model averaged Cumulative Link Mixed Models fitted with clmm in ordinal package

I found that the predict function is currently not implemented for cumulative link mixed models fitted using the clmm function in the ordinal R package. While predict is implemented for clmm2 in the same package, I chose to use clmm instead because the latter allows for more than one random effect. Further, I also fitted several clmm models and performed model averaging using the model.avg function in the MuMIn package. Ideally, I want to predict probabilities using the averaged model. However, while MuMIn supports clmm models, predict also does not work with the averaged model.
Is there a way to hack the predict function so that it can not only predict probabilities from a clmm model, but also predict using model-averaged coefficients from clmm (i.e. an object of class "averaging")? For example:
require(ordinal)
require(MuMIn)
mm1 <- clmm(SURENESS ~ PROD + (1|RESP) + (1|RESP:PROD), data = soup,
            link = "probit", threshold = "equidistant")
## test random effect:
mm2 <- clmm(SURENESS ~ PROD + (1|RESP) + (1|RESP:PROD), data = soup,
            link = "logistic", threshold = "equidistant")
# create a model selection object
mm.sel <- model.sel(mm1, mm2)
## perform a model average
mm.avg <- model.avg(mm.sel)
# create new data and predict
new.data <- soup
## predict with an individual model
predict(mm1, new.data)
I got the following error message:
In UseMethod("predict") :
no applicable method for predict applied to an object of class "clmm"
##predict with model average
predict(mm.avg, new.data)
Another error is returned:
Error in predict.averaging(mm.avg, new.data) :
predict for models 'mm1' and 'mm2' caused errors
I've been using clmm as well, and yes, I confirm predict.clmm is NOT (yet?) implemented. I haven't checked the source code of fake.predict.clmm yet; it might work. If it doesn't, you're stuck with doing things by hand or using predict.clmm2.
I found a potential solution (pasted below) but have not been able to make it work for my data.
Solution here: https://gist.github.com/mainambui/c803aaf857e54a5c9089ea05f91473bc
I think the problem is the number of coefficients I am using, but I am not experienced enough to figure it out. Hopefully this helps someone out, though.
This is the model and newdata that I am using, though it is actually a model-averaged version. Same predictors, though.
ma10 <- clmm(Location3 ~ Sex * Grass3 + Sex * Forb3 + (1|Tag_ID),
             data = IP_all_dunes)
ma_1 <- model.avg(ma10, ma8, ma5)  ## top 3 models
new_ma <- data.frame(Sex = c("m","f","m","f","m","f","m","f"),
                     Grass3 = c("1","1","1","1","0","0","0","0"),
                     Forb3 = c("0","0","1","1","0","0","1","1"))
# Arguments:
# - modelAvg = a clmm model average (object of class "averaging")
# - newdata = a data frame of new data to apply the model to
#   (the commented-out lines show the equivalent steps for a single clmm model)
# Returns a data frame of predicted probabilities for each row and response level
fake.predict.clmm <- function(modelAvg, newdata) {
  # Actual prediction function
  pred <- function(eta, theta, cat = 1:(length(theta) + 1), inv.link = plogis) {
    Theta <- c(-1000, theta, 1000)
    sapply(cat, function(j) inv.link(Theta[j + 1] - eta) - inv.link(Theta[j] - eta))
  }
  # Multiply each row by the coefficients
  # coefs <- c(model$beta, unlist(model$ST))  ## use this line for a single model instead
  beta <- modelAvg$coefficients[2, 3:12]
  coefs <- c(beta, unlist(modelAvg$ST))
  xbetas <- sweep(newdata, MARGIN = 2, coefs, `*`)
  # Make predictions
  Theta <- modelAvg$coefficients[2, 1:2]
  # pred.mat <- data.frame(pred(eta = rowSums(xbetas), theta = model$Theta))
  pred.mat <- data.frame(pred(eta = rowSums(xbetas), theta = Theta))
  # colnames(pred.mat) <- levels(model$model[, 1])
  a <- attr(modelAvg, "modelList")
  colnames(pred.mat) <- levels(a[[1]]$model[, 1])
  pred.mat
}
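For what it's worth, the sweep() step needs a numeric matrix whose columns line up with the averaged fixed-effect coefficients, so the factor columns in new_ma have to be expanded first. A hypothetical call (untested; the resulting columns must match the [2, 3:12] slice hard-coded above) might look like:
# hypothetical usage: expand the factors in new_ma into numeric dummy/interaction columns
new_X <- model.matrix(~ Sex * Grass3 + Sex * Forb3, data = new_ma)[, -1]  # drop intercept
fake.predict.clmm(ma_1, as.data.frame(new_X))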
