Cost function in cv.glm for a fitted logistic model when the cutoff value of the model is not 0.5 - r

I have a logistic model fitted with the following R function:
glmfit <- glm(formula, data, family = binomial)
A reasonable cutoff value for getting a good classification (or confusion matrix) from the fitted model is 0.2 rather than the commonly used 0.5.
And I want to use the cv.glm function with the fitted model:
cv.glm(data, glmfit, cost, K)
Since the response in the fitted model is a binary variable, an appropriate cost function (taken from the "Examples" section of ?cv.glm) is:
cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)
Since my cutoff value is 0.2, can I apply this standard cost function, or should I define a different one, and if so, how?
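For illustration only (this sketch is not from the original thread): cv.glm() passes the observed responses as the first argument of cost and the cross-validated fitted probabilities as the second, so a cost function keyed to a 0.2 cutoff could simply classify at 0.2 and count disagreements with the observed response:

# hypothetical sketch: misclassification rate with a 0.2 cutoff
# r = observed 0/1 responses, pi = cross-validated predicted probabilities
cost_0.2 <- function(r, pi = 0) mean((pi > 0.2) != r)
# cv.err <- boot::cv.glm(data, glmfit, cost = cost_0.2, K = 10)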

Related

95% CI for the ICC in linear mixed effects model (multilevel model, hierarchical model)

I fitted a linear mixed effects model to predict math score as the outcome, with x (a nominal or ordinal participant factor) as the fixed effect and schl as the random effect. Then I compared it with the simple linear regression model using compare_performance. The output gives the ICC, but I was not sure how to calculate the 95% CI for it (for the coefficients I used confint and it did the job):
lm1 <- lm(math ~ gender, data = df)
lme1 <- lmer(math ~ gender + (1 | schl), data = df)
compare_performance(lm1,lme1)
the ICC was 0.15
From this gist by Peter Dahlgren, taken in turn from a Cross Validated answer by @Ashe, here is the crux:
calc.icc <- function(y) {
  sumy <- summary(y)
  # grouping-factor variance over total variance; "id" is the grouping factor
  # in the original gist (replace with your own grouping factor, e.g. schl)
  (sumy$varcor$id[1]) / (sumy$varcor$id[1] + sumy$sigma^2)
}
boot.icc <- bootMer(mymod, calc.icc, nsim=1000)
#Draw from the bootstrap distribution the usual 95% upper and lower confidence limits
quantile(boot.icc$t, c(0.025, 0.975))
You can (and should) check that this calc.icc() function gives the same results as your compare_performance() function. Since this uses parametric bootstrapping, you can substitute any ICC function you like, as long as it takes a fitted model as input and returns the ICC as a single numeric value. (Also, because it uses PB, it will be slow; there are potentially faster approximate methods, but PB is reliable and easy to program.)
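Purely as an illustration of that substitution (not part of the original answer), the same quantity could be computed for the model above, where the grouping factor is schl rather than id, by reading the variance components from VarCorr():

# hypothetical sketch: ICC function written against the lme1 model above
calc.icc.schl <- function(fit) {
  vc <- as.data.frame(lme4::VarCorr(fit))   # variance components as a data frame
  v_schl  <- vc$vcov[vc$grp == "schl"]      # between-school variance
  v_resid <- vc$vcov[vc$grp == "Residual"]  # residual variance
  v_schl / (v_schl + v_resid)
}
boot.icc <- lme4::bootMer(lme1, calc.icc.schl, nsim = 1000)
quantile(boot.icc$t, c(0.025, 0.975))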

Confidence Interval of the predicted mean of a LMER object for large dataset

I would like to get the confidence interval (CI) for the predicted mean of a Linear Mixed Effect Model on a large dataset (~40k rows), which is itself a subset of an even larger dataset. This CI is then used for estimating the uncertainty of another calculation that uses the mean and its related CI as input data.
I managed to create a prediction estimate and interval for the full dataset, but a prediction interval is not the same as a CI and is much larger. Besides bootstrapping (which takes far too much time with this much data), I cannot find a method that would allow me to estimate a CI, either because it throws errors or because it only offers prediction intervals.
I only recently moved into LMEs, so I may have overlooked some obvious method.
Here is what I did so far in more detail:
The input data is confidential and I can therefore unfortunately not share any extract.
But in general, we have one dependent variable (y) representing the probability of an event, two categorical variables (c1 and c2), two continuous variables (x1 and x2), and a weighting factor (w1). Some values in the dataset are missing. An extract of the first rows of the data could look like the example below:
c1      c2     x1  x2  w1   y
London  small   1  10  NA   NA
London  small   1  20  NA   NA
London  large   2  10  0.2  0.1
Paris   small   1  10  0.2  0.23
Paris   large   2  10  0.3  0.3
Based on this input data, I am then fitting an LMER model of the following form:
lmer1 <- lme4::lmer(y ~ x1 * poly(x2, 5) + ((x1 * poly(x2, 5)) | c1),
                    data = df,
                    weights = w1,
                    control = lme4::lmerControl(check.conv.singular = lme4::.makeCC(action = "ignore", tol = 1e-3)))
This runs for some minutes and returns several warnings:
Warning messages: 1: In optwrap(optimizer, devfun, getStart(start,
rho$pp), lower = rho$lower, : convergence code 5 from nloptwrap:
NLOPT_MAXEVAL_REACHED: Optimization stopped because maxeval (above)
was reached.
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
control$checkConv, : unable to evaluate scaled gradient
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl =
control$checkConv, : Model failed to converge: degenerate Hessian with
11 negative eigenvalues
I increased the maxeval parameter, but this did not get rid of the warnings. I found that the model is still fitted despite the warnings, so I started trying different methods to get a prediction of the mean for the whole dataset and the related CI of that mean.
predictInterval
I started with creating a Prediction Interval for the full dataset:
predictions <- merTools::predictInterval(lmer1,
                                         newdata = df,
                                         which = "full",
                                         n.sims = 1000,
                                         include.resid.var = FALSE,
                                         level = 0.95,
                                         stat = "mean")
However, as stated above, the Prediction Interval is not the same as the CI (see also https://datascienceplus.com/prediction-interval-the-wider-sister-of-confidence-interval/).
I found that the general predict function has an option to set interval to either "prediction" or "confidence", but this option does not exist when predicting from an lmer object, and I could not find another way to switch from a prediction interval to a CI, even though I would think the simulations drawn should be sufficient for this.
confint
I then saw that there is a confint() function, but when running it I get the following error:
prediction_ci = lme4::confint.merMod(lmer1)
Computing profile confidence intervals ...
Error in zeta(shiftpar, start = opt[seqpar1][-w]) : profiling
detected new, lower deviance
In addition: Warning messages:
1: In commonArgs(par, fn, control, environment()) : maxfun < 10 *
length(par)^2 is not recommended.
2: In optwrap(optimizer, devfun, x@theta, lower = x@lower, calc.derivs
= TRUE, : convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
I found this thread (Error when estimating CI for GLMM using confint()), which said that I need to reduce the devtol parameter when profiling. But doing so results in the same error:
lmer1_devtol = profile(lmer1, devtol = 1e-7)
Error in zeta(shiftpar, start = opt[seqpar1][-w]) : profiling
detected new, lower deviance
In addition: Warning messages:
1: In commonArgs(par, fn, control, environment()) : maxfun < 10 *
length(par)^2 is not recommended.
2: In optwrap(optimizer, devfun, x@theta, lower = x@lower, calc.derivs
= TRUE, : convergence code 1 from bobyqa: bobyqa -- maximum number of function evaluations exceeded
add_ci
I found the add_ci() function, but this again resulted in another error:
predictions_ci = ciTools::add_ci(df, lmer1, alpha = 0.05)
Error in levelfun(r, n, allow.new.levels = allow.new.levels) : new
levels detected in newdata
I then set the allow.new.levels parameter to TRUE, as described for the predict function, but this parameter does not seem to be passed through:
predictions_ci = ciTools::add_ci(df, lmer1, alpha = 0.05, allow.new.levels = TRUE)
Error in levelfun(r, n, allow.new.levels = allow.new.levels) : new
levels detected in newdata
Diag
I found a method for calculating CIs for the sleepstudy data, which builds the design matrix and uses diag():
Designmat <- model.matrix(as.formula("y ~ x1 * poly(x2, 5)")[-2], df)
predvar <- diag(Designmat %*% vcov(lmer1) %*% t(Designmat))
#With new data
newdat = df
newdat$pred <- predict(lmer1, newdat, allow.new.levels = TRUE)
Designmat <- model.matrix(formula(lmer1)[-2], newdat)
But the diag method does not work for datasets of this size; the intermediate n-by-n matrix becomes far too large.
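One possible workaround, offered here as a sketch rather than a tested solution: the diagonal of X V X' can be computed as rowSums((X %*% V) * X), which never forms the n-by-n matrix. This only captures fixed-effect uncertainty and assumes newdat is the fitting data (so the poly() basis matches) with no missing predictor values:

# sketch: diag(X %*% V %*% t(X)) without the n x n intermediate
X <- model.matrix(~ x1 * poly(x2, 5), data = newdat)   # fixed-effects design, as in the question
V <- as.matrix(vcov(lmer1))                            # covariance of the fixed-effect estimates
predvar <- rowSums((X %*% V) * X)                      # per-row prediction variance
newdat$lwr <- newdat$pred - 1.96 * sqrt(predvar)       # approximate 95% CI for the predicted mean
newdat$upr <- newdat$pred + 1.96 * sqrt(predvar)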
bootMer
As said earlier, bootstrapping the confidence interval with bootMer takes too much time for this subset of data (I started it a day ago and it is still running). I tried parallel processing with the sleepstudy sample data, but this did not increase the speed dramatically, so I assume it will not help much on my large dataset either.
merBoot <- bootMer(lmer1, predict, nsim = 1000, re.form = NA)
Others
I have read through all of these posts (and more), but none of them helped me get the CI in reasonable time for my case. But maybe I have overlooked something.
https://stats.stackexchange.com/questions/344012/confidence-intervals-from-bootmer-in-r-and-pros-cons-of-different-interval-type
https://stats.stackexchange.com/questions/117641/how-trustworthy-are-the-confidence-intervals-for-lmer-objects-through-effects-pa
How to get coefficients and their confidence intervals in mixed effects models?
Error when estimating CI for GLMM using confint()
https://stats.stackexchange.com/questions/235018/r-extract-and-plot-confidence-intervals-from-a-lmer-object-using-ggplot
How to get confidence intervals for lmer object?
Confidence intervals for the predicted probabilities from glmer object, error with bootMer
https://rdrr.io/cran/ciTools/man/add_ci.lmerMod.html
Error when estimating Confidence interval in lme4
https://fromthebottomoftheheap.net/2018/12/10/confidence-intervals-for-glms/
https://cran.r-project.org/web/packages/merTools/vignettes/Using_predictInterval.html
https://drewtyre.rbind.io/classes/nres803/week_12/lab_12/
Unsurprising to me but unfortunate for you, the nonconvergence of the mixed-model estimation and the difficulty in generating confidence intervals result from the misuse of a linear model for data with a limited dependent variable. "Despite these warnings, the model is still fitted" is a dangerous practice: estimates from a fit that has not converged should not be used for prediction. As you described, the dependent variable (y) represents the probability of an event, which is a continuous variable between zero and one. Using a linear model to predict a probability constitutes a linear probability regression, which requires censoring the predicted outcomes (e.g. forcing all predicted values greater than .99 to be .99 and all predicted values smaller than .01 to be .01) and adjusting for heterogeneous variances using weighted least squares (see https://bookdown.org/ccolonescu/RPoE4/heteroskedasticity.html). Having continuous variables contribute both fixed and random effects also burdens convergence, and some or all of the random effects of the continuous variables may not be necessary. The use of weights can also be problematic.
Instead of a linear probability regression, beta regression works best for dependent variables that are proportions or probabilities. Beta regression without random effects is done with betareg::betareg(); glmmTMB::glmmTMB() handles beta regression with random effects. Start from a simple setting where only the intercept has a random effect, such as:
glmmTMB(y ~ 1 + x1 * poly(x2, 5) + c2 + (1 | c1), family = beta_family(link = "logit"), data = df)
You may compare the result with glmer() and lmer()
glmer(y ~ 1 + x1 * poly(x2, 5) + c2 + (1 | c1), family = gaussian(link = "logit"), data = df)
lmer(log(y/(1-y)) ~ 1 + x1 * poly(x2, 5) + c2 + (1 | c1), data = df)
glmer() and lmer() with the above specifications are equivalent, and both assume that log(y/(1-y)) has normal residuals, while glmmTMB() assumes that y follows a beta distribution. lmer() results are easier to explain and receive wider support from other packages, since they are linear models. On the other hand, glmmTMB() may fit better according to AIC, BIC, and log-likelihood. Note that all three require y to lie strictly within (0, 1), boundaries excluded. To accommodate occasional zeros and ones, adjust observations at the boundaries by introducing a small tolerance, usually equal to half of the smaller distance from a boundary to its closest observed value (see https://stats.stackexchange.com/questions/109702 and https://graphworkflow.com/eda/bounded01/). For probabilities with many zeros and/or ones, zero-, one-, and zero-one-inflated beta regression can be fitted via gamlss::gamlss(). See Korosteleva, O. (2019). Advanced regression models with SAS and R. CRC Press.
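As a sketch of that boundary adjustment (my reading of the recipe above, not the original author's code):

y_obs <- df$y[!is.na(df$y)]
eps0 <- min(y_obs[y_obs > 0]) / 2          # half the distance from 0 to the smallest positive value
eps1 <- (1 - max(y_obs[y_obs < 1])) / 2    # half the distance from 1 to the largest value below 1
df$y_adj <- df$y
df$y_adj[which(df$y == 0)] <- eps0         # pull exact zeros into (0, 1)
df$y_adj[which(df$y == 1)] <- 1 - eps1     # pull exact ones into (0, 1)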
Add random slopes only if necessary according to likelihood ratio tests. Make sure there are enough levels in c1 (e.g. more than 10 different cities) to justify a mixed-effects model. The {glmmTMB} package extends glm() and glmer(); the alternative {brms} package is built for a Bayesian approach. Note that the weights = argument in glmmTMB(), as in glm(), specifies values that are inversely proportional to the dispersion; they are not automatically rescaled to sum to one, and integer values are interpreted as numbers of observation units. Therefore, you need to investigate what w1 stands for and evaluate how to use it in the model.
merTools::predictInterval() generates many kinds of intervals for mixed models, some comparable to confidence intervals and prediction intervals in linear models without random effects. However, it supports lmer() model objects only. See https://cran.r-project.org/web/packages/merTools/vignettes/merToolsIntro.html and https://cran.r-project.org/web/packages/merTools/vignettes/Using_predictInterval.html.
predictInterval(lmer(), include.resid.var = F) includes uncertainty from both the fixed and random effects of all coefficients, including the intercept, but excludes variation from multiple measurements of the same group or individual. This can be considered similar to prediction intervals of linear models without random effects. predictInterval(lmer(), include.resid.var = F, fix.intercept.variance = T) generates shorter CIs than the above by accounting for covariance between the fixed and random effects of the intercept. predictInterval(lmer(), include.resid.var = F, ignore.fixed.terms = "(Intercept)") also shortens the CIs by removing uncertainty from the fixed effect of the intercept. If there are no random slopes other than the random intercept, the last two methods are comparable to confidence intervals of linear models without random effects. confint(lmer()) and confint(profile(lmer())) generate confidence intervals for model parameters such as a slope, so they do not produce confidence intervals of predicted outcomes.
You may also find the following functions and packages useful for generating CIs of mixed effect models.
ggeffect() from {ggeffects}, predictions() from {marginaleffects}, and margins() / prediction() from {margins} and {prediction}.
They can produce predictions averaged over the observed distribution of covariates, instead of making predictions by holding some predictors at specific values such as means or modes, which can be misleading and not very useful.
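A minimal sketch of how these might be called (fit_beta is a hypothetical name for the fitted glmmTMB or lmer model; syntax per the package documentation):

library(ggeffects)
ggpredict(fit_beta, terms = "x2")      # predicted means of y over x2, with confidence intervals

library(marginaleffects)
avg_predictions(fit_beta)              # average predictions with CIs, averaged over observed covariates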

Prediction with glmnet for ridge regression

I used the cross-validation function from the glmnet package to get my optimal lambda. Now I want to use the model with the optimal lambda to compute the MSE on the test data set. For that I need the predicted values for the test data set.
Should I first use the glmnet function to fit the training data with the optimal lambda obtained via cross-validation, store this model, and then use it to predict the test data values?
Like this:
fitOptRidge5 <- glmnet(x[train, ], y[train], alpha = 0, standardize = TRUE, lambda = cvRidge5$lambda.min)
and predict like this :
ridgePred5 <- predict(fitOptRidge5, s = cvRidge5$lambda.min, newx = x[test, ])
I am not sure about my approach because I set the optimal lambda twice.
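For what it is worth, a sketch of the one-step alternative, assuming cvRidge5 is the cv.glmnet() object fitted on the training rows: predict() on the cross-validation object accepts s = "lambda.min" directly, so the separate refit is not strictly needed.

# sketch: predict from the cross-validation object at the optimal lambda
ridgePred5 <- predict(cvRidge5, newx = x[test, ], s = "lambda.min")
mseTest <- mean((y[test] - ridgePred5)^2)   # test-set MSE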

using lambda.min to extract coefficients from a model trained with glmnet

I am using glmnet to train a logistic regression model and then try to obtain the coefficients at a specific lambda. I used the simple example here:
load("BinomialExample.RData")
fit = glmnet(x, y, family = "binomial")
coef(fit, s = c(0.05,0.01))
I have checked the values of fit$lambda; however, I could not find the specific values 0.05 or 0.01 in fit$lambda. So how can coef return coefficients for a lambda that is not in the fit$lambda vector?
This is explained in the help for coef.glmnet, specifically the exact argument:
exact
This argument is relevant only when predictions are made at values of s (lambda) different from those used in the fitting of the original model. If exact=FALSE (default), then the predict function uses linear interpolation to make predictions for values of s (lambda) that do not coincide with those used in the fitting algorithm. While this is often a good approximation, it can sometimes be a bit coarse. With exact=TRUE, these different values of s are merged (and sorted) with object$lambda, and the model is refit before predictions are made.
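A sketch of the exact route (note that recent glmnet versions require the original data to be supplied again so the model can be refit; other original-call arguments such as weights or offset would also need to be passed if they were used):

coef(fit, s = c(0.05, 0.01), exact = TRUE, x = x, y = y)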

How can I pass a weight decay argument to mlogit()?

How can I specify weight decay in a model fit by mlogit()?
The multinom() function of nnet allows you to specify weight decay for the model being fit, and mlogit uses this function behind the scenes to fit its models, so I imagine it should be possible to pass the decay argument through to multinom, but I have not so far found a way to do this.
So far I have attempted to simply pass a decay value in the model call, like this:
library(mlogit)
set.seed(1)
data("Fishing", package = "mlogit")
Fishing$wts <- runif(nrow(Fishing)) #for some weights
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
fit1 <- mlogit(mode ~ 0 | income, data = Fish, weights = wts, decay = .01)
fit2 <- mlogit(mode ~ 0 | income, data = Fish, weights = wts)
But the output is exactly the same:
identical(logLik(fit1), logLik(fit2))
[1] TRUE
mlogit() and nnet::multinom() both fit multinomial logistic models (predicting probability of class membership for multiple classes) but they use different algorithms to fit the model. nnet::multinom() uses a neural network to fit the model and mlogit() uses maximum likelihood.
Weight decay is a parameter for neural networks and is not applicable to maximum likelihood.
The effect of weight decay is to keep the weights in the neural network from getting too large, by penalizing larger weights during the weight-update step of the fitting algorithm. This helps to prevent over-fitting and hopefully produces a model that generalizes better.
Consider using the pmlr function in the pmlr package. This function implements penalized maximum likelihood estimation for multinomial logistic regression when called with the default parameter penalized = TRUE.
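A minimal, untested sketch of what that call might look like on the wide-format Fishing data (assuming pmlr() accepts a formula and data frame with the choice as a factor response):

library(pmlr)
fit3 <- pmlr(mode ~ income, data = Fishing, penalized = TRUE)   # penalized (Firth-type) multinomial logit
fit3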
