Intercept of random effect - library(sommer) - R

When I run this mixed model, I get all of the statistics I need.
library(sommer)
data(example)
#Model without intercept - OK
ans1 <- mmer2(Yield~Env,
random= ~ Name + Env:Name,
rcov= ~ units,
data=example, silent = TRUE)
summary(ans1)
ans1$u.hat #Random effects
However, if I try to add an intercept to the random effects, as in the R library lme4, I get an error like:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
#Model with intercept
ans2 <- mmer2(Yield~Env,
random= ~ 1+Name + Env:Name,
rcov= ~ units,
data=example, silent = TRUE)
summary(ans2)
ans2$u.hat #Random effects
How can I overcome that?

Your model:
ans1 <- mmer2(Yield~Env,
random= ~ Name + Env:Name,
rcov= ~ units,
data=example, silent = TRUE)
is equivalent to:
ans1.lmer <- lmer(Yield~Env + (1|Name) + (1|Env:Name),
data=example)
using lme4. Notice that lme4 uses the notation (x|y) to specify, for example, a different intercept (the x term) for each level of the grouping term (the y term), which is a random regression model. If you specify:
ans2.lmer <- lmer(Yield~Env + (Env|Name),
data=example)
you get three variance components, one for each of the 3 levels in the Env term. The equivalent in sommer is not a random regression but a heterogeneous variance model using the diag() functionality:
ans2 <- mmer2(Yield~Env,
random= ~ diag(Env):Name,
rcov= ~ units,
data=example, silent = TRUE)
## or in sommer >=3.7
ans2 <- mmer(Yield~Env,
random= ~ vs(ds(Env),Name),
rcov= ~ units,
data=example, silent = TRUE)
The first two models above are equivalent because both assume there are no different intercepts, whereas the last two tackle the same problem with two approaches that are not exactly the same: random regression versus a heterogeneous variance model.
In short, sommer doesn't have random regression implemented yet, so you cannot use random intercepts in sommer the way you do in lme4; use a heterogeneous variance model instead.
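If you want to inspect the per-environment variances from the heterogeneous-variance fit, a minimal sketch is below (it assumes the summary()$varcomp accessor available in recent sommer versions, so check your installed version):
summary(ans2)$varcomp # one variance component per Env level, plus the residual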
Cheers,

I know it is not an elegant solution, but how about adding an intercept column to the data, so you can easily use it in the model?
What I mean is:
example <- cbind(example, inter=1)
ans2 <- mmer2(Yield~Env,
random= ~ Name + Env:Name + inter, # here inter is a column of 1's
rcov= ~ units,
data=example, silent = TRUE)
summary(ans2)
ans2$u.hat

Related

R: modeling on residuals

I have heard people talk about "modeling on the residuals" when they want to estimate some effect after an a priori model has been made. For example, if two variables, var_1 and var_2, are correlated, a model is first built with var_1, and the effect of var_2 is modeled afterwards. My problem is that I've never seen this done in practice.
I'm interested in the following:
1. If I'm using a glm, how do I account for the link function used?
2. What distribution do I choose when running a second glm with var_2 as the explanatory variable? (I assume this is related to 1.)
3. Is this at all related to using the first model's prediction as an offset in the second model?
My attempt:
library(data.table)
dt <- data.table(mtcars) # I have a hypothesis that `mpg` is a function of both `cyl` and `wt`
dt[, cyl := as.factor(cyl)]
model <- stats::glm(mpg ~ cyl, family=Gamma(link="log"), data=dt) # I want to model `cyl` first
dt[, pred := stats::predict(model, type="response", newdata=dt)]
dt[, res := mpg - pred]
# will this approach work?
model2_1 <- stats::glm(mpg ~ wt + offset(pred), family=Gamma(link="log"), data=dt)
dt[, pred21 := stats::predict(model2_1, type="response", newdata=dt) ]
# or will this approach work?
model2_2 <- stats::glm(res ~ wt, family=gaussian(), data=dt)
dt[, pred22 := stats::predict(model2_2, type="response", newdata=dt) ]
My first suggested approach has convergence issues, but this is how my silly brain would approach this problem. Thanks for any help!
In a sense, an ANCOVA is 'modeling on the residuals'. The model for ANCOVA is y_i = grand_mean + treatment_i + b * (covariate - covariate_mean_i) + error for each treatment i. The term (covariate - covariate_mean_i) can be seen as the residuals of a model with covariate as DV and treatment as IV.
The following regression is equivalent to this ANCOVA:
lm(y ~ treatment * scale(covariate, scale = FALSE))
Which applied to the data would look like this:
lm(mpg ~ factor(cyl) * scale(wt, scale = FALSE), data = mtcars)
And can be turned into a glm similar to the one you use in your example:
glm(mpg ~ factor(cyl) * scale(wt, scale = FALSE),
family=Gamma(link="log"),
data = mtcars)
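Regarding your first question about the link function: if you do want the two-step offset approach rather than the ANCOVA above, note that offset() in glm enters on the linear-predictor scale, so with a log link the first model's prediction should be supplied on the link scale. A rough sketch (pred_link and model2_offset are hypothetical names; dt and model are from your example):
# supply the first model's prediction on the link (log) scale,
# since offset() is added to the linear predictor, not the response
dt[, pred_link := stats::predict(model, type = "link", newdata = dt)]
model2_offset <- stats::glm(mpg ~ wt + offset(pred_link),
                            family = Gamma(link = "log"), data = dt)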

Residual modeling for mixed models: Any other package than nlme?

Aside from the R function nlme::lme(), how else can I model the Level-1 residual variance-covariance structure?
P.S. My search showed I could possibly use the glmmTMB package, but it seems to address the random effects themselves rather than the Level-1 residuals (see the code below).
glmmTMB::glmmTMB(y ~ times + ar1(times | subjects), data = data) ## DON'T RUN
nlme::lme (y ~ times, random = ~ times | subjects,
correlation = corAR1(), data = data) ## DON'T RUN
glmmTMB can effectively be used to model level-1 residuals, by adding an observation-level random effect to the model (and, if necessary, suppressing the level-1 variance via dispformula ~ 0). For example, comparing the same fit in lme and glmmTMB:
library(glmmTMB)
library(nlme)
data("sleepstudy" ,package="lme4")
ss <- sleepstudy
ss$times <- factor(ss$Days) ## needed for glmmTMB
I initially tried random = ~Days|Subject, but neither lme nor glmmTMB was happy (the models were overfitted):
lme1 <- lme(Reaction ~ Days, random = ~1|Subject,
correlation=corAR1(form=~Days|Subject), data=ss)
m1 <- glmmTMB(Reaction ~ Days + (1|Subject) +
ar1(times + 0 | Subject),
dispformula=~0,
data=ss,
REML=TRUE,
start=list(theta=c(4,4,1)))
Unfortunately, in order to get a good answer with glmmTMB I did have to tweak the starting values ...
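A quick way to convince yourself the two parameterizations agree is to compare the fixed effects and the estimated AR(1) parameter from both fits; this is only a sketch, and the exact layout of the output differs between the two packages:
fixef(lme1)                                              # fixed effects from lme
fixef(m1)$cond                                           # conditional-model fixed effects from glmmTMB
coef(lme1$modelStruct$corStruct, unconstrained = FALSE)  # AR(1) phi from the lme fit
VarCorr(m1)                                              # the AR(1) correlation appears in the glmmTMB VarCorr output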

pglm fixed effect Poisson model with offset

I would like to run a fixed-effects Poisson model with panel data in R, with a count variable as the outcome and the log of the population as an offset variable (i.e., modeling a rate). However, using the example dataset below, I get the same results when I run the two models m1 and m2. I'd be grateful if anyone could point out what I'm doing wrong in specifying m1, or offer a solution using a different package. Many thanks!
library(AER)
data(Fatalities)
library(pglm)
m1 <- pglm(fatal ~ beertax + as.factor(year) + offset(log(pop)),
           index = c("state"), model = "within", effect = "individual",
           data = Fatalities, family = poisson)
summary(m1)
m2 <- pglm(fatal ~ beertax + as.factor(year),
           index = c("state"), model = "within", effect = "individual",
           data = Fatalities, family = poisson)
summary(m2)
One direct solution is to use glm instead, with dummy variables for year and state:
fit_model <- glm(fatal ~ beertax + as.factor(year) + as.factor(state) + offset(log(pop)),
                 data = Fatalities, family = poisson)
which gives the same result as Stata (at least using this command: xtpoisson fatal beertax year1-year7, fe offset(log_pop)).
This approach is not feasible when the number of states is reasonably large, though. On CRAN, the relatively new fixest package (https://cran.r-project.org/web/packages/fixest/index.html) provides a fast solution with robust standard errors.
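A sketch of what that could look like with fixest, using fepois() and its offset= argument (m3 is a hypothetical name; the variables are the ones in Fatalities, so double-check against the package documentation):
library(fixest)
# Poisson model with state and year fixed effects and log(pop) as an offset (rate model)
m3 <- fepois(fatal ~ beertax | state + year,
             offset = ~ log(pop),
             data = Fatalities)
summary(m3) # standard errors are clustered by state (the first fixed effect) by default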

How do I code a piecewise mixed-model in lme in R?

I followed this example for running a piecewise mixed model using lmer, and it works very well. However, I am having trouble translating the model to lme because I need to deal with heteroscedasticity, and lmer doesn’t have that ability.
Code to reproduce the problem is here. I included details about the experimental design in the code if you think it’s necessary to answer the question.
Here is the model without the breakpoint:
linear <- lmer(mass ~ lat + (1 | pop/line), data = df)
And here is how I run it with the breakpoint:
bp = 30
b1 <- function(x, bp) ifelse(x < bp, x, 0)
b2 <- function(x, bp) ifelse(x < bp, 0, x)
breakpoint <- lmer(mass ~ b1(lat, bp) + b2(lat, bp) + (1 | pop/line), data = df)
The problem is that I have pretty severe heteroscedasticity. As far as I understand, that means I should be using lme from the nlme package. Here is the linear model in lme:
ctrl <- lmeControl(opt='optim')
linear2 <- lme(mass ~ lat , random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
And this is the breakpoint model that is, well, breaking:
breakpoint2 <- lme(mass ~ b1(lat, bp) + b2(lat, bp), random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
Here is the error message:
Error in model.frame.default(formula = ~pop + mass + lat + bp + line, : variable lengths differ (found for 'bp')
How can I translate this lovely breakpoint model from lmer to lme? Thank you!
Looks like lme doesn't like it when you use variables in your formula that aren't in the data.frame you are fitting your model on. One option is to build your formula first and then pass it to lme. For example:
myform <- eval(substitute(mass ~ b1(lat, bp) + b2(lat, bp), list(bp=bp)))
breakpoint2 <- lme(myform, random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
The eval()/substitute() call just swaps out the bp symbol in your formula for the value of the variable bp.
Or, if bp were always 30, you could just put that value directly in the formula:
breakpoint2 <- lme(mass ~ b1(lat, 30) + b2(lat, 30), random=~1|pop/line, na.action = na.exclude, data=df, control = ctrl, weights=varIdent(form=~1|pop))
and that would work as well.
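Another option, sketched below, is to precompute the two basis columns in the data frame so the formula only references variables that live in df (lat1, lat2, and breakpoint3 are hypothetical names; b1, b2, bp, and ctrl are from your code above):
df$lat1 <- b1(df$lat, bp) # piece of the regression below the breakpoint
df$lat2 <- b2(df$lat, bp) # piece of the regression at/above the breakpoint
breakpoint3 <- lme(mass ~ lat1 + lat2, random = ~1|pop/line,
                   na.action = na.exclude, data = df,
                   control = ctrl, weights = varIdent(form = ~1|pop))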

lme4 random effect structure with dredge

I have constructed an lme4 model for model selection in dredge, but I am having trouble aligning the random effects with the relevant fixed effects. The structure of my full model is as follows.
fullModel <- glmer(y ~ x1 + x2 + (0+x1|Year) + (0+x1|Country) + (0+x2|Year) + (0+x2|Country) + (1|Year) + (1|Country),
                   family = binomial('logit'), data = alldata)
In this model structure, model selection in dredge produces three combinations of fixed effects, i.e. x1, x2, and x1+x2; however, the random-effects structure remains the same as in the full model, so that even when the only fixed effect is x1, the random effects still include (0+x2|Year) + (0+x2|Country). For example, the model with only x1 as the fixed effect will still have x2 within the random-effects structure, as follows.
y ~ x1 + (0+x1|Year) + (0+x1|Country) + (0+x2|Year) + (0+x2|Country) + (1|Year) + (1|Country), family = binomial('logit')
Is there a way to configure dredge so that it does not keep random-effect terms for fixed effects that are not in the candidate model? I have about x1…x50.
You cannot do that out of the box, as dredge currently excludes all (x|g) expressions from the candidate terms, but you can write a "wrapper" around (g)lmer that replaces the "|" terms in the formula with something else (e.g. re(x,g)), so that dredge treats them as fixed effects. Example:
glmerwrap <-
function(formula) {
cl <- origCall <- match.call()
cl[[1L]] <- as.name("glmer") # replace 'glmerwrap' with 'glmer'
# replace "re" with "|" in the formula:
f <- as.formula(do.call("substitute", list(formula, list(re = as.name("|")))))
environment(f) <- environment(formula)
cl$formula <- f
x <- eval.parent(cl) # evaluate modified call
# store original call and formula in the result:
x@call <- origCall
attr(x@frame, "formula") <- formula
x
}
formals(glmerwrap) <- formals(lme4::glmer)
Following example(glmer):
# note the use of re(x,group) instead of (x|group)
(fm <- glmerwrap(cbind(incidence, size - incidence) ~ period +
re(1, herd) + re(1, obs), family = binomial, data = cbpp))
Now,
dredge(fm)
manipulates both fixed and random effects.
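From there the usual MuMIn workflow applies; for example (a sketch using standard MuMIn functions, with dd and best as hypothetical names):
dd <- dredge(fm)                        # all-subsets selection over the re() terms too
best <- get.models(dd, subset = 1)[[1]] # refit the top-ranked model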
