stepAIC handling of multinom models

I am seeing some weird behavior with the stepAIC function in the MASS package when dealing with multinomial logistic models. Here is some sample code:
library(nnet)
library(MASS)
example("birthwt")
race.model <- multinom(race ~ smoke, bwt)
race.model2 <- stepAIC(race.model, k = 2)
In this case race.model and race.model2 have identical terms; stepAIC did not prune anything. However, I need to query certain attributes of the models, and I get an error with race.model2:
formula(race.model)[2]
returns race() but
formula(race.model2)[2]
gives the error:
Error in terms.formula(newformula, specials = names(attr(termobj, "specials"))) :
invalid model formula in ExtractVars
This behavior only seems to occur when stepAIC does not remove terms from the model. In the following code, terms are removed by stepAIC, and both models can be properly queried:
race.big <- multinom(race ~ ., bwt)
race.big2 <- stepAIC(race.big, k = 2)
formula(race.big)[2]
formula(race.big2)[2]
Any ideas about what is going wrong here?
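A minimal diagnostic sketch (not from the thread). It rebuilds the two columns of `bwt` used here directly from MASS::birthwt, refits both models quietly, and compares what `formula()` returns. The explanation is my own reading of the MASS source and should be checked against your installed version. stepAIC appears to store `terms(object)` as the fit's `formula` (and `call$formula`) before searching. If no term is dropped, that fit is returned unchanged, so `formula()` yields a `terms` object, and `[2]` dispatches to the `[.terms` method, which raises the ExtractVars error. When a term is dropped, the refit comes from `update()` with an ordinary formula, which would explain why race.big2 behaves. Extracting pieces with `all.vars()` or `[[2]]` avoids `[.terms`.

```r
library(nnet)
library(MASS)

# The two columns of the data frame built by example("birthwt") that this model uses
bwt <- with(birthwt, data.frame(
  race  = factor(race, labels = c("white", "black", "other")),
  smoke = smoke > 0
))

race.model  <- multinom(race ~ smoke, data = bwt, trace = FALSE)
race.model2 <- stepAIC(race.model, k = 2, trace = FALSE)

# What formula() returns for each fit; if the reading above is right,
# the second one also carries the "terms" class
class(formula(race.model))
class(formula(race.model2))

# Pulling out the response without going through `[.terms`
all.vars(formula(race.model2))[1]   # name of the response variable
formula(race.model2)[[2]]           # the response as a symbol
```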

Related

Compare regression with robust standard errors to null using a Wald test in R

I am running a regression model that looks like this:
wwMLR <- lm(contAOMIdiff ~ PHQ9 + KVIQtot, data = wwMeanWide4)
After running check_heteroscedasticity(wwMLR) from the performance package, I can see that the regression model violates the assumption of homoscedasticity. I have therefore fitted a model with robust standard errors:
library(estimatr)
wwMLR_hc3 <- lm_robust(formula = contAOMIdiff ~ PHQ9 + KVIQtot, data = wwMeanWide4,
se_type = "HC3", alpha = 0.0482)
What I would like to do now is compare this regression model to a null (intercept-only) model using a Wald test. The null model is:
wwnull_hc3 <- lm_robust(formula = contAOMIdiff ~ 1, data = wwMeanWide4,
se_type = "HC3", alpha = 0.0482)
When I try to compare them with a Wald test:
library(lmtest)
waldtest(wwMLR_hc3, wwnull_hc3, vcov = vcovHC)
I get an error:
Error in eval(predvars, data, env) : object 'contAOMIdiff' not found
contAOMIdiff is the response variable in my regression. I am not sure why it can't be found, but I suspect a compatibility issue between lm_robust objects and the waldtest() function.
If anyone has ideas on how to get this to work, or an alternative way to run a Wald test on these two models, I would be very grateful.
I have found a similar question here, which has not been answered: R Wald test for cluster robust se's
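One possible route, offered as a sketch rather than a confirmed fix. lm_robust() with OLS reports the same coefficients as lm(), so the comparison against the intercept-only model can be run on a plain lm fit with an HC3 covariance supplied separately. The code below does that Wald test by hand in base R: W = b' V^-1 b over the two slopes, reported as F = W / 2 on (2, n - 3) degrees of freedom. The data frame `d` is simulated and hypothetical; it only borrows the question's column names.

```r
# Hypothetical, simulated stand-in for wwMeanWide4 (column names borrowed from the question)
set.seed(2024)
n <- 120
d <- data.frame(PHQ9 = rnorm(n), KVIQtot = rnorm(n))
# Error spread grows with |PHQ9|, so the errors are heteroscedastic by construction
d$contAOMIdiff <- 0.4 * d$PHQ9 + rnorm(n, sd = 0.5 + abs(d$PHQ9))

# Plain OLS fit: same point estimates that lm_robust() reports
fit <- lm(contAOMIdiff ~ PHQ9 + KVIQtot, data = d)

# HC3 sandwich covariance: (X'X)^-1 X' diag(e_i^2 / (1 - h_i)^2) X (X'X)^-1
X <- model.matrix(fit)
e <- residuals(fit)
h <- hatvalues(fit)
XtX_inv <- solve(crossprod(X))
meat <- crossprod(X * (e / (1 - h)))   # scales row i of X by e_i / (1 - h_i)
V_hc3 <- XtX_inv %*% meat %*% XtX_inv

# Wald test of H0: both slopes are zero (full model vs. intercept-only model)
slopes <- c("PHQ9", "KVIQtot")
b <- coef(fit)[slopes]
W <- drop(t(b) %*% solve(V_hc3[slopes, slopes]) %*% b)
q <- length(slopes)
F_stat <- W / q
p_value <- pf(F_stat, q, df.residual(fit), lower.tail = FALSE)
c(F = F_stat, df1 = q, df2 = df.residual(fit), p = p_value)
```

With lmtest and sandwich installed, something like waldtest(wwMLR, lm(contAOMIdiff ~ 1, data = wwMeanWide4), vcov = sandwich::vcovHC) on the original lm fit should give the same kind of F test, since vcovHC() defaults to HC3 for lm objects. That equivalence is untested here.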

Marginal Effect from svyglm object with a subsample in R

I need to compute marginal effects out of a Generalized Linear Model (family=Poisson) estimated via the svyglm function from the R package survey for a subsample.
First, I declared the survey design with:
myDesisgn = svydesign(id=data$id, strata=data$strata, weights=data$sw, data=data)
Second, I estimated my model as:
fit = svyglm(y~ x1 +x2, design=myDesisgn, data=data, subset= x3 == 1, family= poisson(link = "log"))
Finally, when I want to get the Average Marginal Effect for, let's say, x1 I run:
summary(margins(fit, variables = "x1", design=myDesisgn))
... but I get the following error message:
"Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'summary': 'x' and 'w' must have the same length"
Running the following does not work either:
summary(margins(fit, variables = "x1", design=myDesisgn, subset=x3==1))
Solution:
summary(margins(fit, variables = "x1", design=myDesisgn[myDesisgn$variables$x3 == 1]))
Subsetting complex surveys leads to problems in the error estimation. When interested in a parameter for a specific subsample, one should use the desired subsample to estimate the parameter of interest and the full sample for the estimation of its error.
For example, svyglm(y ~ x, design = myDesisgn, subset = z == 1) does exactly this (beta_hat is estimated from the observations with z == 1, and se(beta_hat) uses the full-sample design).
Subsetting a survey design object is possible, and the result keeps the original design information about the number of clusters and strata. The code shown above is the "manual" way of doing so. Alternatively, one can rely directly on the subset.survey.design {survey} method:
myDesign_subset <- subset(myDesisgn, x3 == 1)
The two methods are equivalent and produce correct z-stats.
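The claimed equivalence between passing subset = to svyglm() and fitting on a subsetted design can be checked on the api example data that ships with survey. This is a minimal sketch, not part of the original answer. The variables (api00, ell, sch.wide) come from that example data, not from the question. quasipoisson() is used here; it gives the same point estimates as poisson().

```r
library(survey)

# Example data shipped with survey: a cluster sample of California schools
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Route 1: subpopulation selected through svyglm()'s own subset argument
fit_arg <- svyglm(api00 ~ ell, design = dclus1, subset = sch.wide == "Yes",
                  family = quasipoisson())

# Route 2: subset the design object first, then fit on it
dclus1_yes <- subset(dclus1, sch.wide == "Yes")
fit_sub <- svyglm(api00 ~ ell, design = dclus1_yes, family = quasipoisson())

# Point estimates and design-based standard errors side by side
cbind(arg = coef(fit_arg), sub = coef(fit_sub))
cbind(arg = sqrt(diag(vcov(fit_arg))), sub = sqrt(diag(vcov(fit_sub))))

# The subsetted design is what the answer passes to margins(), e.g.
# margins::margins(fit_sub, variables = "ell", design = dclus1_yes)
```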

VIF function returning error message

I'm trying to compute VIFs for a multivariate regression model, but when I run the vif function in R I get an error.
Code and error below:
vif(analys3.lm)
Error in if (names(coefficients(mod)[1]) == "(Intercept)") { :
argument is of length zero
The intercept is still there in my model though.
analys3.lm <- lm(cbind(col1, col2) ~ col3 + col4, data = df)
Apparently, vif() can't handle an mlm object (a model with multiple dependent variables). Fit a separate model for each response and check those instead.
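Here is a sketch of the "separate models" suggestion on hypothetical, simulated data standing in for df. VIF depends only on the predictors: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. Each single-response fit should therefore give the same values. The helper manual_vif() is hypothetical and written in base R so the calculation is visible. car::vif() on any one of the fits should agree.

```r
# Hypothetical data standing in for the question's df
set.seed(42)
n <- 50
df <- data.frame(col3 = rnorm(n))
df$col4 <- 0.6 * df$col3 + rnorm(n)        # correlated with col3 on purpose
df$col1 <- df$col3 + df$col4 + rnorm(n)
df$col2 <- df$col3 - df$col4 + rnorm(n)

# One ordinary lm per response instead of a single mlm
fits <- lapply(c(col1 = "col1", col2 = "col2"), function(y)
  lm(reformulate(c("col3", "col4"), response = y), data = df))

# VIF_j = 1 / (1 - R^2_j), regressing each predictor on the other predictors
manual_vif <- function(fit) {
  X <- model.frame(fit)[, -1, drop = FALSE]   # drop the response column
  sapply(names(X), function(v) {
    r2 <- summary(lm(reformulate(setdiff(names(X), v), response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}
vifs <- lapply(fits, manual_vif)
vifs
# With car installed: car::vif(fits$col1)
```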

Increasing number of iterations for lme4 version 1.1-7

I am encountering a problem with iterations while trying to run a mixed-effects binomial regression using the glmer function of the package lme4 version 1.1-7.
When I run the model using the code:
model <- glmer(Clinical.signs ~ cloacal +(1|Chicken_ID), family = binomial,
data = viral_load_9)
I get the error:
Error: pwrssUpdate did not converge in (maxit) iterations
When I follow the advice given here
Using the code:
model <- glmer(Clinical.signs ~ cloacal +(1|Chicken_ID), family = binomial,
data = viral_load_9,
control=glmerControl(optimizer="bobyqa",
optCtrl = list(maxfun = 100000)))
I still have the exact same error message.
Any suggestions on what might be wrong with my code will be gratefully received.
-----------------------------------------------------------------
Following the advice from aosmith (thanks for the suggestion!), I am including the data and the code so that others can replicate the results I am getting. Note that the code worked fine for the variable "oral" and produced model_1, but when I ran it with the variable "cloacal", I got the error message above.
Chicken_ID <- c(44,44,45,45,46,46,47,47,48,48,49,49,50,50,51,51,52,52,53,55,55)
oral <- c(-0.4827578,-0.1845839,-1.3772797,-0.7809318,-0.4827578,1.6044598,0.1135901,0.411764,-0.1845839,1.6044598,-0.1845839,1.6044598,-1.6754536,0.709938,-1.0791057,0.709938,0.1135901,1.0081119,0.411764,-1.6754536,-0.1845839)
cloacal <- c(-0.9833258,0.450691,-1.1267275,0.7374944,-1.1267275,1.0242977,-1.5569325,1.0242977,0.3072893,1.0242977,-0.1229157,1.1676994,-1.5569325,0.5940927,0.450691,0.3072893,-1.1267275,0.7374944,0.1638876,-1.5569325,1.1676994)
clinical.signs <- c("YES","YES","NO","YES","NO","YES","NO","YES","YES","YES","YES","YES","NO","YES","YES","YES","NO","YES","YES","NO","YES")
clinical.signs <- factor(clinical.signs)
viral_load <- data.frame(Chicken_ID, oral, cloacal, clinical.signs)
library(lme4)
model_1 <- glmer(clinical.signs ~ oral +(1|Chicken_ID),
family = binomial, data = viral_load)
summary(model_1)
model_2 <- glmer(clinical.signs ~ cloacal +(1|Chicken_ID),
family = binomial, data = viral_load)
It may not be a problem with your code. See this question on Cross Validated.
Some things you can do to prevent convergence failures:
Rescale continuous variables
Try different optimizers using glmerControl()
Check your data for sparseness. If there aren't enough outcomes or observations at certain levels of the predictors, the model may fail to converge.
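One quick way to act on the last point, added here as a sketch rather than as part of the original answer. With a single continuous predictor, complete separation means every "NO" chicken lies below every "YES" chicken on that predictor. In that case the fixed-effect estimate diverges, and extra iterations cannot fix it. The hypothetical helper overlap() below compares the predictor's range within each outcome level, using the data from the question.

```r
# Data copied from the question
Chicken_ID <- c(44,44,45,45,46,46,47,47,48,48,49,49,50,50,51,51,52,52,53,55,55)
oral <- c(-0.4827578,-0.1845839,-1.3772797,-0.7809318,-0.4827578,1.6044598,0.1135901,0.411764,-0.1845839,1.6044598,-0.1845839,1.6044598,-1.6754536,0.709938,-1.0791057,0.709938,0.1135901,1.0081119,0.411764,-1.6754536,-0.1845839)
cloacal <- c(-0.9833258,0.450691,-1.1267275,0.7374944,-1.1267275,1.0242977,-1.5569325,1.0242977,0.3072893,1.0242977,-0.1229157,1.1676994,-1.5569325,0.5940927,0.450691,0.3072893,-1.1267275,0.7374944,0.1638876,-1.5569325,1.1676994)
clinical.signs <- factor(c("YES","YES","NO","YES","NO","YES","NO","YES","YES","YES","YES","YES","NO","YES","YES","YES","NO","YES","YES","NO","YES"))
viral_load <- data.frame(Chicken_ID, oral, cloacal, clinical.signs)

# TRUE if the predictor's ranges within the two outcome levels overlap;
# FALSE means the predictor perfectly separates NO from YES
overlap <- function(x, y) {
  r <- tapply(x, y, range)   # list-array: (min, max) for each level
  r[[1]][1] <= r[[2]][2] && r[[2]][1] <= r[[1]][2]
}
res <- sapply(viral_load[c("oral", "cloacal")], overlap, y = viral_load$clinical.signs)
res
```

For these 21 observations, the "oral" ranges overlap, but the "cloacal" ranges do not. Every NO value is at most -1.127, and every YES value is at least -0.983. That is consistent with model_1 fitting and model_2 failing.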

Binomial GLM using caret train

I would like to fit a binomial GLM on a certain dataset. Using glm(..., family = binomial) everything works fine; however, I would like to do it with the caret train() function. Unfortunately, I get an unexpected error that I cannot get rid of.
library("marginalmodelplots")
library("caret")
MissUSA <- MissAmerica08[,c(2,4,6,7,8,10)]
formula <- cbind(Top10, 9-Top10)~.
glmfit <- glm(formula=formula, data=MissUSA, family=binomial())
trainfit <-train(form=formula,data=MissUSA,trControl=trainControl(method = "none"), method="glm", family=binomial())
The error I get is:
"Error : nrow(x) == length(y) is not TRUE"
caret doesn't support grouped data for a binomial outcome. You can expand the data so that the outcome is a binary (Bernoulli) factor variable. If you do that, you also do not need to pass family=binomial() to train.
Max
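Here is a minimal sketch of the expansion described above. It uses hypothetical grouped data (a predictor x, with successes out of 9 trials per row) rather than MissAmerica08. Each grouped row becomes 9 Bernoulli rows with a two-level factor outcome. Fitting glm() to both forms gives the same coefficients, and the expanded frame has the one-row-per-observation shape that train() works with.

```r
# Hypothetical grouped data: `successes` out of n_trials per row,
# mirroring the question's cbind(Top10, 9 - Top10) response
grouped <- data.frame(x = c(-1, 0, 1, 2), successes = c(1, 3, 5, 8))
n_trials <- 9

# One row per trial: repeat each grouped row n_trials times ...
expanded <- grouped[rep(seq_len(nrow(grouped)), each = n_trials), "x", drop = FALSE]
# ... then give it `successes` "yes" outcomes followed by the remaining "no" outcomes
expanded$y <- factor(unlist(lapply(grouped$successes, function(s)
  rep(c("yes", "no"), times = c(s, n_trials - s)))), levels = c("no", "yes"))

grouped_fit   <- glm(cbind(successes, n_trials - successes) ~ x,
                     family = binomial(), data = grouped)
bernoulli_fit <- glm(y ~ x, family = binomial(), data = expanded)
cbind(grouped = coef(grouped_fit), bernoulli = coef(bernoulli_fit))

# With caret installed, the Bernoulli form can go to train(), e.g.
# caret::train(y ~ x, data = expanded, method = "glm",
#              trControl = caret::trainControl(method = "none"))
```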
