Alternative to car lm.beta for standardizing regression coefficients? - r

I'm trying to standardize regression coefficients for a linear regression model which has interaction terms. Currently, I'm using lm.beta from the car package, but the help file states:
Warning: This function does not produce 'correct' standardized
coefficients when interaction terms are present
Since my regression has interaction terms, this is worrying. Is there an alternative to car's lm.beta which standardizes regression coefficients and works with regression models with interactions?

You can use the scale function to scale your data before passing it to lm. This does not require any extra packages and gives the standardized regression. Here is a simple example:
iris2 <- iris
iris2[ ,c('Sepal.Length','Sepal.Width')] <- scale(iris2[ ,c('Sepal.Length','Sepal.Width')])
fit <- lm( Petal.Width ~ Sepal.Length*Sepal.Width, data=iris2)

Related

Does the function multinom() from R's nnet package fit a multinomial logistic regression, or a Poisson regression?

The documentation for the multinom() function from the nnet package in R says that it "[f]its multinomial log-linear models via neural networks" and that "[t]he response should be a factor or a matrix with K columns, which will be interpreted as counts for each of K classes." Even when I go to add a tag for nnet on this question, the description says that it is software for fitting "multinomial log-linear models."
Granting that statistics has wildly inconsistent jargon that is rarely operationally defined by whoever is using it, the documentation for the function even mentions having a count response and so seems to indicate that this function is designed to model count data. Yet virtually every resource I've seen treats it exclusively as if it were fitting a multinomial logistic regression. In short, everyone interprets the results in terms of logged odds relative to the reference (as in logistic regression), not in terms of logged expected count (as in what is typically referred to as a log-linear model).
Can someone clarify what this function is actually doing and what the fitted coefficients actually mean?
nnet::multinom is fitting a multinomial logistic regression as I understand...
If you check the source code of the package, https://github.com/cran/nnet/blob/master/R/multinom.R and https://github.com/cran/nnet/blob/master/R/nnet.R, you will see that the multinom function is indeed using counts (which is a common thing to use as input for a multinomial regression model, see also the MGLM or mclogit package e.g.), and that it is fitting the multinomial regression model using a softmax transform to go from predictions on the additive log-ratio scale to predicted probabilities. The softmax transform is indeed the inverse link scale of a multinomial regression model. The way the multinom model predictions are obtained, cf.predictions from nnet::multinom, is also exactly as you would expect for a multinomial regression model (using an additive log-ratio scale parameterization, i.e. using one outcome category as a baseline).
That is, the coefficients predict the logged odds relative to the reference baseline category (i.e. it is doing a logistic regression), not the logged expected counts (like a log-linear model).
This is shown by the fact that model predictions are calculated as
fit <- nnet::multinom(...)
X <- model.matrix(fit) # covariate matrix / design matrix
betahat <- t(rbind(0, coef(fit))) # model coefficients, with expicit zero row added for reference category & transposed
preds <- mclustAddons::softmax(X %*% betahat)
Furthermore, I verified that the vcov matrix returned by nnet::multinom matches that when I use the formula for the vcov matrix of a multinomial regression model, Faster way to calculate the Hessian / Fisher Information Matrix of a nnet::multinom multinomial regression in R using Rcpp & Kronecker products.
Is it not the case that a multinomial regression model can always be reformulated as a Poisson loglinear model (i.e. as a Poisson GLM) using the Poisson trick (glmnet e.g. uses the Poisson trick to fit multinomial regression models as a Poisson GLM)?

Longitudinal analysis using sampling weigths in R

I have longitudinal data from two surveys and I want to do a pre-post analysis. Normally, I would use survey::svyglm() or svyVGAM::svy_vglm (for multinomial family) to include sampling weights, but these functions don't account for the random effects. On the other hand, lme4::lmer accounts for the repeated measures, but not the sampling weights.
For continuous outcomes, I understand that I can do
w_data_wide <- svydesign(ids = ~1, data = data_wide, weights = data_wide$weight)
svyglm((post-pre) ~ group, w_data_wide)
and get the same estimates that I would get if I could use lmer(outcome ~ group*time + (1|id), data_long) with weights [please correct me if I'm wrong].
However, for categorical variables, I don't know how to do the analyses. WeMix::mix() has a parameter weights, but I'm not sure if it treats them as sampling weights. Still, this function can't support multinomial family.
So, to resume: can you enlighten me on how to do a pre-post test analysis of categorical outcomes with 2 or more levels? Any tips about packages/functions in R and how to use/write them would be appreciated.
I give below some data sets with binomial and multinomial outcomes:
library(data.table)
set.seed(1)
data_long <- data.table(
id=rep(1:5,2),
time=c(rep("Pre",5),rep("Post",5)),
outcome1=sample(c("Yes","No"),10,replace=T),
outcome2=sample(c("Low","Medium","High"),10,replace=T),
outcome3=rnorm(10),
group=rep(sample(c("Man","Woman"),5,replace=T),2),
weight=rep(c(1,0.5,1.5,0.75,1.25),2)
)
data_wide <- dcast(data_long, id~time, value.var = c('outcome1','outcome2','outcome3','group','weight'))[, `:=` (weight_Post = NULL, group_Post = NULL)]
EDIT
As I said below in the comments, I've been using lmer and glmer with variables used to calculate the weights as predictors. It happens that glmer returns a lot of problems (convergence, high eigenvalues...), so I give another look at #ThomasLumley answer in this post and others (https://stat.ethz.ch/pipermail/r-help/2012-June/315529.html | https://stats.stackexchange.com/questions/89204/fitting-multilevel-models-to-complex-survey-data-in-r).
So, my question is now if a can use participants id as clusters in svydesign
library(survey)
w_data_long_cluster <- svydesign(ids = ~id, data = data_long, weights = data_long$weight)
summary(svyglm(factor(outcome1) ~ group*time, w_data_long_cluster, family="quasibinomial"))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.875e+01 1.000e+00 18.746 0.0339 *
groupWoman -1.903e+01 1.536e+00 -12.394 0.0513 .
timePre 5.443e-09 5.443e-09 1.000 0.5000
groupWoman:timePre 2.877e-01 1.143e+00 0.252 0.8431
and still interpret groupWoman:timePre as differences in the average rate of change/improvement in the outcome over time between sex groups, as if I was using mixed models with participants as random effects.
Thank you once again!
A linear model with svyglm does not give the same parameter estimates as lme4::lmer. It does estimate the same parameters as lme4::lmer if the model is correctly specified, though.
Generalised linear models with svyglm or svy_vglm don't estimate the same parameters as lme4::glmer, as you note. However, they do estimate perfectly good regression parameters and if you aren't specifically interested in the variance components or in estimating the realised random effects (BLUPs) I would recommend just using svy_glm.
Another option if you have non-survey software for random effects versions of the models is to use that. If you scale the weights to sum to the sample size and if all the clustering in the design is modelled by random effects in the model, you will get at least a reasonable approximation to valid inference. That's what I've seen recommended for Bayesian survey modelling, for example.

How to obtain R^2 for robust mixed effect model (rlmer command; robustlmm)?

I estimated a robust mixed effect model with the rlmercommand from the robustlmmpackage. Is there a way to obtain the marginal and conditional R^2 values?
Just going to answer that myself. I could not find a package or rather a function in R that is equivalent to e.g. r.squaredGLMM in the case of lmerMod objects but I found a quick workaround that works with rlmerMod objects. Basically you just have to extract the variance components for the fixed effects, random effects and residuals and then manually calcualte the marginal and conditional R^2 based on the formula provided by Nakagawa & Schielzeth (2013).
library(robustlmm)
library(insight)
library(lme4)
data(Dyestuff, package = "lme4")
robust.model <- rlmer(Yield ~ 1|Batch, data=Dyestuff)
var.fix <- get_variance_fixed(robust.model)
var.ran <- get_variance_random(robust.model)
var.res <- get_variance_residual(robust.model)
R2m = var.fix/(var.fix+var.ran+var.res)
R2c = (var.fix+var.ran)/(var.fix+var.ran+var.res)
Literature:
Nakagawa, S. and Schielzeth, H. (2013), A general and simple method for obtaining R2 from generalized linear mixed‐effects models. Methods Ecol Evol, 4: 133-142. doi:10.1111/j.2041-210x.2012.00261.x

Is there a function to obtain pooled standardized coefficient of linear equation modelling related to analysis of a MI database?

I replaced missing data by using MICE package.
I realized the linear equation modelling by using : summary(pool(with(imputed_base_finale,lm(....)))
I tried to obtain standardized coefficients by using the function lm.beta, however it doesn't work.
lm.beta (with(imputed_base_finale,lm(...)))
Error in lm.beta(with(imputed_base_finale, lm(...)))
object has to be of class lm
How can I obtain these standardized coefficients ?
Thank you for you help!!!
lm.scale works on lm objects and adds standardized coefficients. This however was not build to work on mira objects.
Have you considered using scale on the data before you build a model, effectively getting standardized coefficients?
Instead of standardizing the data before imputation, you could also apply it with post processing during imputation.
I am not sure which of these would be the most robust option.
require(mice)
# non-standardized
imp <- mice(nhanes2)
pool(with(imp,lm(chl ~ bmi)))
# standardized
imp_scale <- mice(scale(nhanes2[,c('bmi','chl')]))
pool(with(imp_scale,lm(chl ~ bmi)))

Logistic Regression Using R

I am running logistic regressions using R right now, but I cannot seem to get many useful model fit statistics. I am looking for metrics similar to SAS:
http://www.ats.ucla.edu/stat/sas/output/sas_logit_output.htm
Does anyone know how (or what packages) I can use to extract these stats?
Thanks
Here's a Poisson regression example:
## from ?glm:
d.AD <- data.frame(counts=c(18,17,15,20,10,20,25,13,12),
outcome=gl(3,1,9),
treatment=gl(3,3))
glm.D93 <- glm(counts ~ outcome + treatment,data = d.AD, family=poisson())
Now define a function to fit an intercept-only model with the same response, family, etc., compute summary statistics, and combine them into a table (matrix). The formula .~1 in the update command below means "refit the model with the same response variable [denoted by the dot on the LHS of the tilde] but with only an intercept term [denoted by the 1 on the RHS of the tilde]"
glmsumfun <- function(model) {
glm0 <- update(model,.~1) ## refit with intercept only
## apply built-in logLik (log-likelihood), AIC,
## BIC (Bayesian/Schwarz Information Criterion) functions
## to models with and without intercept ('model' and 'glm0');
## combine the results in a two-column matrix with appropriate
## row and column names
matrix(c(logLik(glm.D93),BIC(glm.D93),AIC(glm.D93),
logLik(glm0),BIC(glm0),AIC(glm0)),ncol=2,
dimnames=list(c("logLik","SC","AIC"),c("full","intercept_only")))
}
Now apply the function:
glmsumfun(glm.D93)
The results:
full intercept_only
logLik -23.38066 -26.10681
SC 57.74744 54.41085
AIC 56.76132 54.21362
EDIT:
anova(glm.D93,test="Chisq") gives a sequential analysis of deviance table containing df, deviance (=-2 log likelihood), residual df, residual deviance, and the likelihood ratio test (chi-squared test) p-value.
drop1(glm.D93) gives a table with the AIC values (df, deviances, etc.) for each single-term deletion; drop1(glm.D93,test="Chisq") additionally gives the LRT test p value.
Certainly glm with a family="binomial" argument is the function most commonly used for logistic regression. The default handling of contrasts of factors is different. R uses treatment contrasts and SAS (I think) uses sum contrasts. You can look these technical issues up on R-help. They have been discussed many, many times over the last ten+ years.
I see Greg Snow mentioned lrm in 'rms'. It has the advantage of being supported by several other functions in the 'rms' suite of methods. I would use it , too, but learning the rms package may take some additional time. I didn't see an option that would create SAS-like output.
If you want to compare the packages on similar problems that UCLA StatComputing pages have another resource: http://www.ats.ucla.edu/stat/r/dae/default.htm , where a large number of methods are exemplified in SPSS, SAS, Stata and R.
Using the lrm function in the rms package may give you the output that you are looking for.

Resources