Margins package error using quadratic and interaction terms in R

I have Stata code that uses the margins command, and I am trying to replicate it in R using the "margins" package from CRAN.
I keep getting the error:
marg1<-margins(reg2)
Error in names(classes) <- clean_terms(names(classes)) :
  'names' attribute [18] must be the same length as the vector [16]
A minimal reproducible example is shown below:
install.packages("margins")
library(margins)
mod1 <- lm(log(mpg) ~ vs + cyl + hp + vs*hp + I(vs*hp*hp) + wt + I(hp*hp), data = mtcars)
(marg1 <- margins(mod1))
summary(marg1)
I need vs to be a dummy variable interacted with both the quadratic term in hp and hp itself.
Does anyone know what I am doing wrong or if there is a way around this?

Your model specification is a bit confusing. For example, vs*hp introduces three terms: i) vs, ii) hp, and iii) the interaction of vs and hp. As a result, hp appears twice in the formula you provided. You can simplify massively! Try this, for example (I think it is what you want):
mtcars$hp2 = mtcars$hp^2
mod1 <- lm(log(mpg) ~ cyl + wt + vs*hp + vs*hp2, data = mtcars)
summary(mod1) # With this you can check that the model you specified is what you want
(marg1 <- margins(mod1)) # The error disappeared.
summary(marg1)
In general, I would recommend avoiding I() in formula specifications, as it often gives rise to errors like this when not treated with enough care (though sometimes it cannot be avoided). Good luck!
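If you want to convince yourself that the simplified formula contains exactly the terms you intended, one quick check (a sketch using base R only) is to inspect the columns of the design matrix:
mtcars$hp2 <- mtcars$hp^2
mod1 <- lm(log(mpg) ~ cyl + wt + vs*hp + vs*hp2, data = mtcars)
colnames(model.matrix(mod1))
# expect: "(Intercept)" "cyl" "wt" "vs" "hp" "hp2" "vs:hp" "vs:hp2"
Each regressor now appears exactly once, which is why margins can map coefficient names to terms cleanly.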

Related

fixest::feols and ggeffects::ggeffect not working together in R

I'm having a hard time getting a fixest object to play nicely with ggeffects in R, when fixed effects are included.
When I run the following code:
m <- feols(mpg ~ disp + gear + hp | cyl, mtcars,
           cluster = c("am", "cyl"))
summary(m)
marg1 <- ggeffect(m, terms = c("disp"))
I get an error reading:
Can't compute marginal effects, 'effects::Effect()' returned an error.
Reason: non-conformable arguments
You may try 'ggpredict()' or 'ggemmeans()'.
However, there are no problems when I remove the fixed effects term / include it without using the pipe:
m <- feols(mpg ~ disp + gear + hp + cyl, mtcars,
           cluster = c("am", "cyl"))
summary(m)
marg1 <- ggeffect(m, terms = c("disp"))
ggpredict() also returns an error on my real data ("Could not compute variance-covariance matrix of predictions. No confidence intervals are returned.") but I am unable to replicate that error using the toy data.
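Not a full answer, but one workaround sketch (an assumption, untested on the real data): since the model runs fine when the fixed effect enters as a regular covariate, you may be able to keep the group intercepts by entering it as a factor rather than a numeric term:
library(fixest)
library(ggeffects)
m2 <- feols(mpg ~ disp + gear + hp + factor(cyl), mtcars,
            cluster = c("am", "cyl"))
marg2 <- ggeffect(m2, terms = "disp")
This should reproduce the same slope coefficients as the | cyl specification while giving effects::Effect() a coefficient for every column of the design matrix.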

Error when using regr() command: undefined columns selected

I get the following error when trying to run the regr() command from the yhat package:
Error in `[.data.frame`(new.data, , c(DV, IVx)) :
undefined columns selected
Here is the code I'm using:
DEregr_model <- lm(TotalBiomass ~ propnC + propnV + propnR + I(propnC^2) + I(propnV^2) + propnC:propnV + propnV:propnR + propnV:I(propnC^2), DE_model)
DEregrout <- regr(DEregr_model)
Why is this function returning an error?
I think I can demonstrate my suspicion expressed in the comments with this MCVE:
> lm.gas <- lm(mpg ~ hp + disp + hp:I(disp^2), data = mtcars)
> lm.gas

Call:
lm(formula = mpg ~ hp + disp + hp:I(disp^2), data = mtcars)

Coefficients:
 (Intercept)            hp          disp  hp:I(disp^2)
   3.562e+01    -4.168e-02    -5.879e-02     3.151e-07
> install.packages("yhat")
also installing the dependency ‘yacca’
> library(yhat)
> regr(lm.gas)
Error in `[.data.frame`(new.data, , c(DV, IVx)) :
undefined columns selected
In addition: Warning message:
In regr(lm.gas) : NAs introduced by coercion
I suspect that the I(.) terms are not being stored in the result of the lm call in a form that the regr function can handle.
The workaround would be to calculate the values of the squared variables under separate names in an augmented dataset.
Based on the comments, I figured out the issue. The I() terms (e.g., I(propnV^2)) weren't being read correctly by the function. So I added additional columns to my data frame with the squared values, so that the model reads these terms as individual variables rather than trying to decompose them. Corrected code is below:
## make new columns for interaction effect of seeding rate propn
DE$propnC2 <- DE$propnC^2
DE$propnV2 <- DE$propnV^2
DE$propnR2 <- DE$propnR^2
## run lm model with adjusted terms
DEregr_model <- lm(TotalBiomass ~ propnC + propnV + propnR + propnC2 + propnV2 + propnC:propnV + propnV:propnR + propnV:propnC2, DE_model)
DEregrout <- regr(DEregr_model)
The regr() function now runs without error, thanks everyone for your input!

Limit on the Number of Variables for bestglm Function

I am trying to run the bestglm function in R for subset selection, and the run fails immediately if I use more than 15 variables. Sample code is attached below (I know these models have far too many variables for this dataset; I am including them here only as an example):
cars.df = data.frame(mtcars)
cars.df
resp.var = cars.df$mpg
ind.matrix.15 = model.matrix(mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb + disp:wt + drat:wt + qsec:am + gear:hp + cyl:disp + drat:gear, data = cars.df)[, -1]
matrix.xy.15 = data.frame(ind.matrix.15, y = as.matrix(resp.var))
bestglm(Xy = matrix.xy.15, family = gaussian(link = 'log'), nvmax = 15)
ind.matrix.16 = model.matrix(mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb + disp:wt + drat:wt + qsec:am + gear:hp + cyl:disp + drat:gear + disp:hp, data = cars.df)[, -1]
matrix.xy.16 = data.frame(ind.matrix.16, y = as.matrix(resp.var))
bestglm(Xy = matrix.xy.16, family = gaussian(link = 'log'), nvmax = 16)
The first bestglm function runs fine, but when I add an additional variable for a total of 16 features, the second bestglm function instantly produces this error message: p = 16. must be <= 15 for GLM.
Changing the method argument to a simpler algorithm such as backward rather than the default exhaustive does not make the error go away.
Is this just a limitation of the bestglm function, or is there an argument I can change to allow more than 15 features?
As @RomanLuštrik says, this is a hard-coded constraint in bestglm, presumably because 15 predictors already means 2^15 = 32768 candidate models, and one has to stop somewhere... As far as I can see, there is no way around this constraint when running a GLM. (Roman's suggestion of RequireFullEnumerationQ = FALSE doesn't work, because the leaps-and-bounds algorithm is only available for linear models, not GLMs.)
One possible strategy (not fully explored here) would be to fit the linear model exhaustively with leaps-and-bounds, save a large number of the top models (say TopModels = 1000), and then re-evaluate those top models with your preferred variance structure. This doesn't work directly in leaps, but can be hacked as follows:
leaps.obj <- leaps:::leaps.setup(matrix.xy.16, y = cars.df$mpg, nvmax = 16,
                                 nbest = 10000)
bb <- leaps:::leaps.exhaustive(leaps.obj, really.big = TRUE)
but I don't know how (and it seems like a lot of work) to re-evaluate these models with a log-link Gaussian.
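That said, if you do manage to extract the candidate subsets, the re-evaluation step itself is straightforward. A sketch, assuming you can recover a logical matrix which.mat (one row per candidate model, one column per predictor in ind.matrix.16) from the leaps results:
refit.aic <- apply(which.mat, 1, function(keep) {
  dat <- data.frame(y = cars.df$mpg, ind.matrix.16[, keep, drop = FALSE])
  AIC(glm(y ~ ., data = dat, family = gaussian(link = "log")))
})
which.mat[which.min(refit.aic), ]  # predictors in the best log-link model
The hard part, as noted above, is recovering which.mat from the leaps internals in the first place.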
You might be able to get the glmulti package to work (it offers both method="h" for full enumeration and method="g" for a genetic algorithm), but so far I haven't managed to overcome some Java errors ...
Unfortunately, the Journal of Statistical Software article describing glmulti shows that this method has some of the same constraints:
For performance, the Java classes encode formulas as compact bit strings. Currently two integers (32 bits each) are used for main effects, and two long integers (128 bits) are used for each category of interaction terms (factor:factor, covariate:covariate, and factor:covariate) to encode models. This means that there can be at most 32 factors and 32 covariates, and, if including interactions, at most 128 interactions of each category. The latter constraint necessitates that, if x is the number of factors and y the number of covariates: x < 16, y < 16, and xy < 128.

Removing covariates from a linear mixed model using update()

I'm newish to R. I have a linear mixed model with several predictors and I want to test the significance of each of them. I know that I could use lmerTest but my co-authors want me to do a likelihood ratio test for each predictor instead. I would like to use the update function to get a series of submodels that omit each predictor in turn. I tried the following
data(mtcars)
library(lme4)
h <- lmer(mpg ~ 1 + cyl + disp + hp + drat + (1 | carb), data = mtcars)
predvars <- c("cyl", "disp", "hp", "drat")
for (i in predvars) {
  modelform <- update(as.formula(paste0("h, . ~ . -", i)))
  print(summary(modelform))
}
I got the following error
Error in parse(text = x, keep.source = FALSE) :
:1:2: unexpected ','
1: h,
^
I also tried using lapply
Fits=lapply(predvars, function(x) {update(h, .~.-i, list(i=as.name(x)))})
names(Fits)=predvars
which doesn't actually update the model; it just refits the full model once per predictor. What am I doing wrong? Thanks.
Your first attempt generates an error because you put h inside as.formula. Do:
modelform <- update(h, as.formula(paste0(". ~ . -",i)))
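For completeness, a sketch of the full likelihood-ratio-test loop (assuming lme4 is loaded and h is the full model above):
for (i in predvars) {
  sub <- update(h, as.formula(paste0(". ~ . - ", i)))
  print(anova(h, sub))  # anova() refits both models with ML and reports the LRT
}
Note that drop1(h, test = "Chisq") may get you the same set of single-term likelihood ratio tests in one call.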

Predicting with log transformation in formula

Imagine a simple model
fit <- lm(log(mpg) ~ cyl + hp, data = mtcars)
to make predictions on the original scale we have to take the exponential:
exp(predict(fit, newdata = mtcars))
Is there any better way to do this than applying it manually? The documentation of ?predict does not give any helpful hints on this.
I guess the easiest way would be to extract the transforming function from the formula:
> formula(fit)
log(mpg) ~ cyl + hp
How can I check whether any transformation was applied to the left-hand side of the formula and, if so, extract the name of the function?
I'm not sure if it helps, but you could test for it in this manner:
Convert the left-hand side to character and check whether it starts with log, sqrt, and the like:
startsWith(as.character((formula(fit))[2]), "log")
The answer is TRUE or FALSE:
[1] TRUE
Maybe this will help you automate your solution?
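Building on that idea, a sketch that extracts the outer function from the left-hand side and applies a matching inverse (only for the few transformations listed here, which is an assumption you would extend as needed):
lhs <- formula(fit)[[2]]              # the call log(mpg)
if (is.call(lhs)) {
  fun.name <- as.character(lhs[[1]])  # "log"
  inverse <- switch(fun.name, log = exp, sqrt = function(x) x^2, identity)
  preds <- inverse(predict(fit, newdata = mtcars))
}
If the left-hand side is a bare variable, is.call(lhs) is FALSE and no back-transformation is needed.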
