lme4 random effect structure with dredge - r

I have constructed an lme4 model for model selection in dredge, but I am having trouble aligning the random effects with the relevant fixed effects. The structure of my full model is as follows:
fullModel <- glmer(y ~ x1 + x2 + (0 + x1 | Year) + (0 + x1 | Country) +
                     (0 + x2 | Year) + (0 + x2 | Country) +
                     (1 | Year) + (1 | Country),
                   family = binomial("logit"), data = alldata)
With this model structure, model selection in dredge produces three combinations of fixed effects, i.e. x1, x2, and x1 + x2; however, the random-effect structure remains the same as in the full model. For example, the model with only x1 as a fixed effect still carries x2 in its random effects:
y ~ x1 + (0 + x1 | Year) + (0 + x1 | Country) + (0 + x2 | Year) + (0 + x2 | Country) + (1 | Year) + (1 | Country)
Is there a way to configure dredge not to retain random-effect terms whose corresponding fixed effect has been dropped? I have about x1…x50.

You cannot do that out of the box, as dredge currently omits all (x | g) expressions, but you can make a "wrapper" around (g)lmer that replaces the "|" terms in the formula with something else (e.g. re(x, g)), so that dredge treats them as fixed effects. Example:
glmerwrap <-
    function(formula) {
        cl <- origCall <- match.call()
        cl[[1L]] <- as.name("glmer")  # replace 'glmerwrap' with 'glmer'
        # replace "re" with "|" in the formula:
        f <- as.formula(do.call("substitute",
                                list(formula, list(re = as.name("|")))))
        environment(f) <- environment(formula)
        cl$formula <- f
        x <- eval.parent(cl)  # evaluate the modified call
        # store the original call and formula in the result:
        x@call <- origCall
        attr(x@frame, "formula") <- formula
        x
    }
formals(glmerwrap) <- formals(lme4::glmer)
Following example(glmer):
# note the use of re(x, group) instead of (x | group)
(fm <- glmerwrap(cbind(incidence, size - incidence) ~ period +
                     re(1, herd) + re(1, obs),
                 family = binomial, data = cbpp))
Now,
dredge(fm)
manipulates both fixed and random effects.
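As a minimal follow-up sketch (assuming the wrapper and fm above), you can rank the candidate models and refit the top one with MuMIn's get.models():
dd <- dredge(fm)
head(dd)  # the re(...) terms are now toggled on and off just like fixed effects
best <- get.models(dd, subset = 1)[[1]]  # refit the highest-ranked model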

Related

Intercept of random effect - library(sommer)

When I run this mixed model, I get all of the statistics I need.
library(sommer)
data(example)
# Model without intercept - OK
ans1 <- mmer2(Yield ~ Env,
              random = ~ Name + Env:Name,
              rcov = ~ units,
              data = example, silent = TRUE)
summary(ans1)
ans1$u.hat  # random effects
However, if I try to add an intercept to the random effects, as in the R library lme4, I get an error like:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
# Model with intercept
ans2 <- mmer2(Yield ~ Env,
              random = ~ 1 + Name + Env:Name,
              rcov = ~ units,
              data = example, silent = TRUE)
summary(ans2)
ans2$u.hat  # random effects
How can I overcome that?
Your model:
ans1 <- mmer2(Yield ~ Env,
              random = ~ Name + Env:Name,
              rcov = ~ units,
              data = example, silent = TRUE)
is equivalent to:
ans1.lmer <- lmer(Yield ~ Env + (1 | Name) + (1 | Env:Name),
                  data = example)
using lme4. Note that lme4 uses the notation (x | y) to specify, for example, different intercepts or slopes (the x term) for each level of the grouping term (the y term), which is a random regression model. If you specify:
ans2.lmer <- lmer(Yield ~ Env + (Env | Name),
                  data = example)
you get three variance components, one for each of the 3 levels of the Env term. The equivalent in sommer is not a random regression but a heterogeneous-variance model, using the diag() functionality:
ans2 <- mmer2(Yield ~ Env,
              random = ~ diag(Env):Name,
              rcov = ~ units,
              data = example, silent = TRUE)
## or in sommer >= 3.7
ans2 <- mmer(Yield ~ Env,
             random = ~ vs(ds(Env), Name),
             rcov = ~ units,
             data = example, silent = TRUE)
The first two models above are equivalent because both assume a single common intercept, whereas the last two models tackle the same problem with two approaches that are not exactly the same: random regression versus a heterogeneous-variance model.
In short, sommer does not have random regression implemented yet, so you cannot use random intercepts in sommer the way you do in lme4; use a heterogeneous-variance model instead.
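To compare the two parameterizations, a minimal sketch on the lme4 side (assuming the fits above) is to inspect the per-level variance components of the random-regression fit:
library(lme4)
VarCorr(ans2.lmer)  # variances (and correlations) of the Env terms within Name
These can then be set against the level-specific variance components that sommer reports in summary(ans2) for the diag(Env):Name model.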
Cheers,
I know it is not an elegant solution, but how about adding an intercept column to the data, so you can easily use it in the model? What I mean is:
example <- cbind(example, inter = 1)
ans2 <- mmer2(Yield ~ Env,
              random = ~ Name + Env:Name + inter,  # here 'inter' is a column of 1's
              rcov = ~ units,
              data = example, silent = TRUE)
summary(ans2)
ans2$u.hat

How to correctly pass formulas associated with a variable name with random effects into fitted regression models in `R`?

I currently have a problem in that I have to pre-specify my formulas before passing them to a regression function. For example, using the stan_gamm4 function in R:
dat <- mgcv::gamSim(1, n = 400, scale = 2)  ## simulate 4-term additive truth
## now add a 20-level random effect `fac'...
dat$fac <- fac <- as.factor(sample(1:20, 400, replace = TRUE))
dat$y <- dat$y + model.matrix(~ fac - 1) %*% rnorm(20) * .5
br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = ~ (1 | fac),
                 chains = 1, iter = 200)  # short run, for example speed
Now, because the formula and the random-effects formula were specified explicitly, if we call:
br$call$random
> ~(1 | fac)
We are able to retrieve the form of the random effects.
Now let us leave everything the same, but store the random part in a variable:
formula.rand <- as.formula( '~(1|fac)' )
Then, doing the same as before but with formula.rand in its place, we have:
br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = formula.rand,
                 chains = 1, iter = 200)  # short run, for example speed
But now we get:
br$call$random
> formula.rand
instead of the original formula. A lot of Bayesian packages rely on accessing br$call$random, so is there a way to pass the formula in via a variable AND retain the original formula when calling br$call$random? Thanks.
While I haven't used Stan, this is a problem inherent in the way R stores calls. You can see it happening with lm, for example:
model <- function(formula)
{
    lm(formula, data = mtcars)
}
m <- model(mpg ~ disp)
m$call$formula
# formula
The simplest solution is to construct the call using substitute to insert the actual values you want to keep, not the symbol name. In the case of lm, this would be something like
model2 <- function(formula)
{
    call <- substitute(lm(formula = .f, data = mtcars), list(.f = formula))
    eval(call)
}
m2 <- model2(mpg ~ disp)
m2$call$formula
# mpg ~ disp
For Stan, you can do
stan_call <- substitute(
    stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = .rf,
               chains = 1, iter = 200),
    list(.rf = formula.rand))
br <- eval(stan_call)
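An equivalent sketch uses base R's bquote(), which splices the value of formula.rand into the call before it is evaluated (same assumptions as above):
br <- eval(bquote(
    stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = .(formula.rand),
               chains = 1, iter = 200)
))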
If I understand correctly, your problem is not that stan_gamm4 could be computing incorrect results (it is not, from what I gather), but only that br$call$random refers to the variable name rather than the formula, which is problematic for further post-processing of the model.
Since stan_gamm4 uses match.call internally to record the call, I don't know of a way to specify the model differently so as to obtain a "correct" br$call$random up front. But you can simply modify it after the fact:
br <- stan_gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = formula.rand)
br$call$random <- formula.rand
br$call$random
#> ~(1 | fac)
and then continue with whatever you are doing.
IMHO, this is not a problem with stan_gamm4. In your second example, if you then do
class(br$call$random)
you will see that it is of class "name". So it is not as if $call were just some list with stuff in it. To access it programmatically in general, you need to evaluate it with
eval(br$call$random)
in order to obtain ~(1 | fac), which is of class "formula".
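The same mechanic can be demonstrated with a minimal, self-contained lm() sketch (mirroring the earlier example):
f <- mpg ~ disp
m <- lm(f, data = mtcars)
class(m$call$formula)  # "name" - the symbol 'f', not the formula itself
eval(m$call$formula)   # mpg ~ disp, of class "formula"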

r loess: coefficients of global "parametric" terms

Is there a way to extract the coefficients of globally fitted terms in local regression modeling?
Maybe I misunderstand the role of globally fitted terms in the function loess, but what I would like is the following:
# baseline:
x <- sin(seq(0.2, 0.6, length.out = 100) * pi)
# noise:
x_noise <- rnorm(length(x), 0, 0.1)
# known structure:
x_1 <- sin(seq(5, 20, length.out = 100))
# signal:
y <- x + x_1 * 0.25 + x_noise
# fit loess model:
x_seq <- seq_along(x)
mod <- loess(y ~ x_seq + x_1, parametric = "x_1")
The fit itself works perfectly; however, how can I extract the estimated value of the globally fitted term x_1 (i.e. some value near 0.25 for the example above)?
Finally, I found a solution to my problem using the function gam from the package gam:
require(gam)
mod2 <- gam(y ~ lo(x_seq, span = 0.75, degree = 2) + x_1)
However, the fits from the two models are not exactly the same (which might be due to different control settings?)...
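For completeness, a minimal sketch of extracting the estimated global coefficient from the gam fit above; the named coefficient should land near the true value of 0.25:
coef(mod2)["x_1"]  # estimated coefficient of the parametric term x_1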

Subsetting in dredge (MuMIn) - must include interaction if main effects are present

I'm doing some exploratory work using dredge{MuMIn}. In this procedure there are two variables that I want to allow together ONLY when the interaction between them is present, i.e. they cannot appear together as main effects alone.
Using sample data: I want to dredge the model fm1 (disregarding that it probably doesn't make sense). If the variables GNP and Population appear together, they must also include the interaction between them.
require(stats); require(graphics)
## give the data set in the form it is used in S-PLUS:
longley.x <- data.matrix(longley[, 1:6])
longley.y <- longley[, "Employed"]
pairs(longley, main = "longley data")
names(longley)
fm1 <- lm(Employed ~ GNP * Population * Armed.Forces, data = longley)
summary(fm1)
dredge(fm1, subset = !((GNP:Population) & !(GNP + Population)))
dredge(fm1, subset = !((GNP:Population) && !(GNP + Population)))
dredge(fm1, subset = dc(GNP + Population, GNP:Population))
dredge(fm1, subset = dc(GNP + Population, GNP * Population))
How can I specify in dredge() that it should disregard all models where GNP and Population are present, but not the interaction between them?
If I understand correctly, you want the two main effects (say, a and b) to appear together only with their interaction (a:b). So how about: subset = !a | (xor(a, b) | `a:b`) (enclose a:b in backticks, not straight quotes)? The condition simplifies to !a | !b | `a:b`, so it only fails when both main effects are present without the interaction. E.g.:
library(MuMIn)
data(Cement)
fm <- lm(y ~ X1 * X2, Cement, na.action = na.fail)
dredge(fm, subset = !X2 | (xor(X1, X2) | `X1:X2`))
or wrap this condition in a function to keep the code clearer:
test <- function(a, b, c) !a | (xor(a, b) | c)
dredge(fm, subset = test(X1, X2, `X1:X2`))
which produces: null, X1, X2, X1*X2 (and excludes X1 + X2).

R probit regression marginal effects

I am using R to replicate a study and obtain mostly the same results the author reported. At one point, however, I calculate marginal effects that seem unrealistically small. I would greatly appreciate if you could have a look at my reasoning and the code below and see whether I am mistaken somewhere.
My sample contains 24535 observations, the dependent variable "x028bin" is a binary variable taking the values 0 and 1, and there are furthermore 10 explanatory variables. Nine of the independent variables have numeric levels; the independent variable "f025grouped" is a factor consisting of different religious denominations.
I would like to run a probit regression including dummies for religious denomination and then compute marginal effects. To do so, I first eliminate missing values and use cross-tabs between the dependent and independent variables to verify that there are no small or empty cells. Then I run the probit model, which works fine, and I obtain reasonable results:
probit4AKIE <- glm(x028bin ~ x003 + x003squ + x025secv2 + x025terv2 + x007bin +
                     x04chief + x011rec + a009bin + x045mod + c001bin + f025grouped,
                   family = binomial(link = "probit"),
                   data = wvshm5red2delna, na.action = na.pass)
summary(probit4AKIE)
However, when I calculate marginal effects from the probit coefficients and a scale factor, with all variables at their means, the values I obtain are much too small (e.g. 2.6042e-78). The code looks like this:
ttt <- cbind(wvshm5red2delna$x003,
             wvshm5red2delna$x003squ,
             wvshm5red2delna$x025secv2,
             wvshm5red2delna$x025terv2,
             wvshm5red2delna$x007bin,
             wvshm5red2delna$x04chief,
             wvshm5red2delna$x011rec,
             wvshm5red2delna$a009bin,
             wvshm5red2delna$x045mod,
             wvshm5red2delna$c001bin,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped,
             wvshm5red2delna$f025grouped)  # "f025grouped" appears 9 times because it has 9 levels
ttt <- as.data.frame(ttt)
xbar <- as.matrix(mean(cbind(1, ttt[1:19])))  # 1:19 = positions of the variables in data frame ttt
betaprobit4AKIE <- probit4AKIE$coefficients
zxbar <- t(xbar) %*% betaprobit4AKIE
scalefactor <- dnorm(zxbar)
marginprobit4AKIE <- scalefactor * betaprobit4AKIE[2:20]  # 2:20 = positions of the variables in the
# output of the probit model (same ordering as in ttt); the constant occupies the first position
marginprobit4AKIE  # in this step I obtain values that are much too small
I apologize that I cannot provide a working example, as my dataset is much too large. Any comment would be greatly appreciated. Thanks a lot.
Best,
Tobias
@Gavin is right, and it's better to ask at the sister site.
In any case, here's my trick for interpreting probit coefficients.
Probit regression coefficients are the same as logit coefficients, up to a scale factor of about 1.6. So if the fit of a probit model is Pr(y = 1) = pnorm(.5 - .3*x), this is approximately equivalent to the logistic model Pr(y = 1) = invlogit(1.6*(.5 - .3*x)).
I use this to make a graphic, using the function invlogit from the package arm. Another possibility is to multiply all coefficients (including the intercept) by 1.6 and then apply the 'divide by 4 rule' (see the book by Gelman and Hill), i.e. divide the new coefficients by 4 to find an upper bound on the predictive difference corresponding to a unit difference in x.
Here's an example.
x1 <- rbinom(100, 1, .5)
x2 <- rbinom(100, 1, .3)
x3 <- rbinom(100, 1, .9)
ystar <- -.5 + x1 + x2 - x3 + rnorm(100)
y <- ifelse(ystar > 0, 1, 0)
probit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "probit"))
xbar <- colMeans(cbind(1, x1, x2, x3))  # means of an intercept column and the predictors
# now the graphic, i.e., the marginal effect of x1, x2 and x3
library(arm)
curve(invlogit(1.6*(probit$coef[1] + probit$coef[2]*x +
                    probit$coef[3]*xbar[3] + probit$coef[4]*xbar[4])))  # x1
curve(invlogit(1.6*(probit$coef[1] + probit$coef[2]*xbar[2] +
                    probit$coef[3]*x + probit$coef[4]*xbar[4])))  # x2
curve(invlogit(1.6*(probit$coef[1] + probit$coef[2]*xbar[2] +
                    probit$coef[3]*xbar[3] + probit$coef[4]*x)))  # x3
This will do the trick for probit or logit:
mfxboot <- function(modform, dist, data, boot = 1000, digits = 3) {
    x <- glm(modform, family = binomial(link = dist), data)
    # get marginal effects: mean of the link density times the coefficients
    pdf <- ifelse(dist == "probit",
                  mean(dnorm(predict(x, type = "link"))),
                  mean(dlogis(predict(x, type = "link"))))
    marginal.effects <- pdf * coef(x)
    # start bootstrap
    bootvals <- matrix(rep(NA, boot * length(coef(x))), nrow = boot)
    set.seed(1111)
    for (i in 1:boot) {
        samp1 <- data[sample(nrow(data), nrow(data), replace = TRUE), ]
        x1 <- glm(modform, family = binomial(link = dist), samp1)
        # evaluate the density on the bootstrap refit x1
        pdf1 <- ifelse(dist == "probit",
                       mean(dnorm(predict(x1, type = "link"))),
                       mean(dlogis(predict(x1, type = "link"))))
        bootvals[i, ] <- pdf1 * coef(x1)
    }
    res <- cbind(marginal.effects,
                 apply(bootvals, 2, sd),
                 marginal.effects / apply(bootvals, 2, sd))
    if (names(x$coefficients[1]) == "(Intercept)") {
        # drop the intercept row before rounding
        res1 <- res[2:nrow(res), ]
        res2 <- matrix(as.numeric(sprintf(paste0("%.", digits, "f"), res1)),
                       nrow = dim(res1)[1])
        rownames(res2) <- rownames(res1)
    } else {
        res2 <- matrix(as.numeric(sprintf(paste0("%.", digits, "f"), res)),
                       nrow = dim(res)[1])
        rownames(res2) <- rownames(res)
    }
    colnames(res2) <- c("marginal.effect", "standard.error", "z.ratio")
    return(res2)
}
Source: http://www.r-bloggers.com/probitlogit-marginal-effects-in-r/
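A minimal usage sketch (assuming the simulated data from the earlier answer, collected into a data frame):
dat <- data.frame(y, x1, x2, x3)  # from the simulation above
mfxboot(y ~ x1 + x2 + x3, dist = "probit", data = dat, boot = 200)
# returns average marginal effects with bootstrap standard errors and z-ratios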
