I fitted a linear mixed model using the packages lme4 and lsmeans via the lmer function, with one dependent variable rv and the interacting factors treatment, time, age, and race. I'm interested in how the response variable changes over time, which is why I use the lstrends function. So far so good. The problem is that I have to take the square root of the response variable in order to fit the model properly. But the pairs function then only reports results on the square-root scale of rv, which is hard to interpret!
So I tried to back-transform the response variable after pairs:
model.lmer <- lmer(sqrt(rv) ~ treat*time*age*race + (1|individual), data=mydata)
model.lst <- lstrends(model.lmer, ~treat | age*race, var = "time", type="response")
pairs(model.lst, type="response")
This obviously doesn't work, as the package documentation itself points out:
# Transformed response
# (warp.lm is the standard example model from the lsmeans docs,
#  presumably warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks))
sqwarp.rg <- ref.grid(update(warp.lm, sqrt(breaks) ~ .))
summary(sqwarp.rg)
# Back-transformed results - compare with summary of 'warp.rg'
summary(sqwarp.rg, type = "response")
# But differences of sqrts can't be back-transformed
summary(pairs(sqwarp.rg, by = "wool"), type = "response")
# We can do it via regrid
sqwarp.rg2 <- regrid(sqwarp.rg)
summary(sqwarp.rg2) # same as for sqwarp.rg with type = "response"
pairs(sqwarp.rg2, by = "wool")
It could look like the following code:
summary(pairs(lsmeans(rg.regrid, ~ treat | race*age, trend="time")), type="response")
The problem is, I can't alter the reference grid for lstrends, only for lsmeans: the first argument of lstrends (or of lsmeans with trend="time") has to be the linear mixed-effects model (model.lmer) rather than just a reference grid, as plain lsmeans without the trend argument accepts. That's probably why I can't back-transform the data that way.
The following sums up my problem pretty well:
model.sqrt <- lmer(sqrt(rv) ~ time*treat*race*age + (1|individual), data=mydata)
rg <- ref.grid(model.sqrt)
rg.regrid <- regrid(rg)
summary(pairs(lsmeans(rg.regrid, ~treat | race*age*time), type = "response"))
Works perfectly.
summary(pairs(lsmeans(rg.regrid, ~treat | race*age, trend="time"), type = "response"))
Gives the following error:
Error in summary(pairs(lsmeans(rg.regrid, ~vns | gen * age, trend = "time"), :
error in evaluating the argument 'object' in selecting a method for function 'summary': Error in data[[var]] : subscript out of bounds
How to avoid the error and still be able to back-transform my data?
It does NOT seem to be possible at all - the back-transformation would be a complicated procedure without any obvious pattern. That's what the creator of the package said.
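A toy numeric check (mine, not from the package) makes the reason concrete: equal differences on the square-root scale map to unequal differences on the response scale, so no single back-transform exists for a difference of square roots.
sqrt(16) - sqrt(9)   # 1 on the sqrt scale
sqrt(25) - sqrt(16)  # also 1 on the sqrt scale
16 - 9               # 7 on the response scale
25 - 16              # 9 on the response scale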
When trying to graph the conditional fixed effects of a glmmTMB model with two random intercepts using GGally, I get the error:
There was an error calling "tidy_fun()". Most likely, this is because the
function supplied in "tidy_fun=" was misspelled, does not exist, is not
compatible with your object, or was missing necessary arguments (e.g. "conf.level=" or "conf.int="). See error message below.
Error: Error in "stop_vctrs()":
! Can't recycle "..1" (size 3) to match "..2" (size 2).
I have tinkered with figuring out the issue, and it seems to be related to the two random intercepts included in the model. I have also tried extracting the coefficient and standard error information separately through broom.mixed::tidy() and then feeding the data frame into GGally::ggcoef(), to no avail. Any suggestions?
# Example with built-in randu data set
library(glmmTMB)
data(randu)
randu$A <- factor(rep(c(1, 2), 200))
randu$B <- factor(rep(c(1, 2, 3, 4), 100))
# Model
test <- glmmTMB(y ~ x + z + (0 + x | A) + (1 | B), family = "gaussian", data = randu)
# A few of my attempts at graphing--works fine when only one random effects term is in model
ggcoef_model(test)
ggcoef_model(test, tidy_fun = broom.mixed::tidy)
ggcoef_model(test, tidy_fun = broom.mixed::tidy, conf.int = T, intercept=F)
ggcoef_model(test, tidy_fun = broom.mixed::tidy(test, effects="fixed", component = "cond", conf.int = TRUE))
There are some (old!) bugs that have recently been fixed (here, here) that would make confidence interval reporting on RE parameters break for any model with multiple random terms (I think). I believe that if you are able to install updated versions of both glmmTMB and broom.mixed:
remotes::install_github("glmmTMB/glmmTMB/glmmTMB#ci_tweaks")
remotes::install_github("bbolker/broom.mixed")
then ggcoef_model(test) will work.
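In the meantime, a minimal workaround sketch (my own, assuming broom.mixed can already tidy the fixed effects, which your third attempt suggests) is to build the coefficient plot by hand with ggplot2 instead of going through ggcoef_model():
library(broom.mixed)
library(ggplot2)

# Tidy only the fixed (conditional) effects, with confidence intervals
fixed <- broom.mixed::tidy(test, effects = "fixed", component = "cond",
                           conf.int = TRUE)

# Basic hand-rolled coefficient plot, dropping the intercept
ggplot(subset(fixed, term != "(Intercept)"),
       aes(x = estimate, y = term, xmin = conf.low, xmax = conf.high)) +
  geom_pointrange() +
  geom_vline(xintercept = 0, linetype = "dashed")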
I am using the package lqmm to run a linear quantile mixed model on an imputed object of class mira from the package mice. I tried to make a reproducible example:
library(lqmm)
library(mice)

summary(airquality)
imputed <- mice(airquality, m = 5)
summary(imputed)

fit1 <- lqmm(Ozone ~ Solar.R + Wind + Temp + Day, random = ~1,
             tau = 0.5, group = Month, data = airquality, na.action = na.omit)
fit1
summary(fit1)

fit2 <- with(imputed, lqmm(Ozone ~ Solar.R + Wind + Temp + Day, random = ~1,
                           tau = 0.5, group = Month, na.action = na.omit))
"Error in lqmm(Ozone ~ Solar.R + Wind + Temp + Day, random = ~1, tau = 0.5, :
`data' must be a data frame"
Yes, it is possible to get lqmm() to work in mice. Looking at the code for lqmm(), it turns out to be a picky function: it requires that the data argument be supplied, and although it appears to check whether the data exist in another environment, that doesn't seem to work in this context. Fortunately, all we have to do to get this working is capture the data supplied by mice and hand it to lqmm().
fit2 <- with(imputed,
lqmm(Ozone ~ Solar.R + Wind + Temp + Day,
data = data.frame(mget(ls())),
random = ~1, tau = 0.5, group = Month, na.action = na.omit))
The explanation is that ls() gets the names of the variables available, mget() gets those variables as a list, and data.frame() converts them into a data frame.
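A toy illustration of that idiom outside of mice (a sketch with made-up numbers standing in for the columns mice supplies inside with()):
demo_env <- function() {
  Ozone <- c(41, 36, 12)  # hypothetical local variables, playing the
  Temp  <- c(67, 72, 74)  # role of the imputed data columns
  data.frame(mget(ls()))  # ls() finds them; mget() collects them
}
demo_env()
#   Ozone Temp
# 1    41   67
# 2    36   72
# 3    12   74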
The next problem you're going to find is that mice::pool() requires tidy() and glance() methods to properly pool the multiple imputations. It looks like neither broom nor broom.mixed has those defined for lqmm. I threw together a very quick and dirty implementation, which you could use if you can't find anything else.
To get pool(fit2) to run you'll need to create the function tidy.lqmm() as below. Then pool() will assume the sample size is infinite and perform the calculations accordingly. You can also create the glance.lqmm() function before running pool(fit2), which will tell pool() the residual degrees of freedom. Afterwards you can use summary(pooled) to find the p-values.
tidy.lqmm <- function(x, conf.int = FALSE, conf.level = 0.95, ...) {
broom:::as_tidy_tibble(data.frame(
estimate = coef(x),
std.error = sqrt(
diag(summary(x, covariance = TRUE,
R = 50)$Cov[names(coef(x)),
names(coef(x))]))))
}
glance.lqmm <- function(x, ...) {
broom:::as_glance_tibble(
logLik = as.numeric(stats::logLik(x)),
df.residual = summary(x, R = 2)$rdf,
nobs = stats::nobs(x),
na_types = "rii")
}
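With both functions defined, the pooling and summary steps described above should then run:
pooled <- pool(fit2)
summary(pooled)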
Note: lqmm uses bootstrapping to estimate the standard error. By default it uses R = 50 bootstrapping replicates, which I've copied in the tidy.lqmm() function. You can change that line to increase the number of replicates if you like.
WARNING: Use these functions and the results with caution. I know just enough to be dangerous. To me it looks like these functions work to give sensible results, but there are probably intricacies that I'm not aware of. If you can find a more authoritative source for similar functions that work, or someone who is familiar with lqmm or pooling mixed models, I'd trust them more than me.
I have been using glmulti to obtain model averaged estimates and relative importance values for my variables of interest. In running glmulti I specified a candidate model for which all variables and interactions were included based on a priori knowledge (see code below).
After running the glmulti model I studied the results using the functions summary() and weightable(). There seem to be a number of strange things going on with the results which I do not understand.
First of all, when I run my candidate model with the lme4 glmer() function I obtain an AIC value of 2086. In the glmulti output this candidate model (with exactly the same formula) has a higher AIC value (2107), as a result of which it appears at position 8 out of 26 in the list of all potential models (as obtained through the weightable() function).
What seems to be causing this problem is that the logArea:Habitat interaction is dropped from the candidate model, despite level=2 being specified. The function summary(output_new@objects[[8]]) provides a different formula (without the logArea:Habitat interaction) compared to the formula provided through weightable(). This explains why the candidate model's AIC value is not the same as obtained through lme4, but I do not understand why the interaction logArea:Habitat is missing from the formula. The same is happening for other possible models: it seems that for all models with 2 or more interactions, one interaction is dropped.
Does anyone have an explanation for what is going on? Any help would be much appreciated!
Best,
Robert
Note: I have created a subset of my data (https://drive.google.com/open?id=1rc0Gkp7TPdnhW6Bw87FskL5SSNp21qxl) and simplified the candidate model by removing variables in order to decrease model run time. (The problem remains the same)
newdat <- Data_ommited2[, c("Presabs","logBodymass", "logIsolation", "Matrix", "logArea", "Protection","Migration", "Habitat", "Guild", "Study","Species", "SpeciesStudy")]
glmer.glmulti <- function(formula, data, random, ...) {
  glmer(paste(deparse(formula), random), data = data,
        family = binomial(link = "logit"),
        contrasts = list(Matrix = contr.sum, Habitat = contr.treatment,
                         Protection = contr.treatment, Guild = contr.sum),
        control = glmerControl(optimizer = "bobyqa",
                               optCtrl = list(maxfun = 100000)))
}
output_new <- glmulti(y = Presabs ~ Matrix + logArea*Protection + logArea*Habitat,
data = sampledata,
random = '+(1|Study)+(1|Species)+(1|SpeciesStudy)',
family = binomial,
method = 'h',
level=2,
marginality=TRUE,
crit = 'aic',
fitfunc = glmer.glmulti,
confsetsize = 26)
print(output_new)
summary(output_new)
weightable(output_new)
I found a post (https://stats.stackexchange.com/questions/341356/glmulti-package-in-r-reporting-incorrect-aicc-values) from someone who encountered the same problem; it appears the problem was caused by this line of code:
glmer.glmulti <- function (formula, data, random, ...) {
glmer(paste(deparse(formula), random), data = data, family=binomial(link="logit"))
}
By changing this part of the code into the following, the problem was solved. (The likely cause: deparse() splits formulas longer than about 60 characters into multiple strings, so the pasted formula that reaches glmer() is truncated and terms get silently dropped; building the formula object directly avoids this.)
glmer.glmulti<-function(formula,data,random,...) {
newf <- formula
newf[[3]] <- substitute(f+r,
list(f=newf[[3]],
r=reformulate(random)[[2]]))
glmer(newf,data=data,
family=binomial(link="logit"))
}
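For intuition, here is a toy sketch (my own, reusing the variable names from above; no data is needed, since formulas don't evaluate their terms) of why deparse() was the culprit:
f <- Presabs ~ Matrix + logArea * Protection + logArea * Habitat +
  logBodymass + logIsolation + Migration + Guild
deparse(f)
# deparse() wraps output at ~60 characters, so a formula this long
# comes back as a character vector of length > 1; pasting the random
# part onto that vector and handing it to glmer() keeps only a
# truncated first piece, which is how interactions get dropped.
length(deparse(f))  # > 1 for a formula this long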
I am trying to run a random effects model with lme. It is part of a larger function, and I want it to be flexible so that I can pass the fixed (and ideally random) effects variable names to the lme function as variables. get() worked great for this when I started with lm, but with lme it only throws the ambiguous "Error in model.frame.default(formula = ~var1 + var2 + ID, data = list( : variable lengths differ (found for 'ID')". I'm stumped: the data are the same lengths, and there are no NAs in this data or the real data.
set.seed(12345) #because I got scolded for not doing this previously
var1="x"
var2="y"
exdat<-data.frame(ID=c(rep("a",10),rep("b",10),rep("c",10)),
x = rnorm(30,100,1),
y = rnorm(30,100,2))
#exdat<-as.data.table(exdat) #because the data are actually in a dt, but that doesn't seem to be the issue
Works great
lm(log(get(var1))~log(get(var2)),data=exdat)
lme(log(y)~log(x),random=(~1|ID), data=exdat)
Does not work
lme(log(get(var1,pos=exdat))~log(get(var2)),random=(~1|ID), data=exdat)
Does not work, but throws a new error: "Error in model.frame.default(formula = ~var1 + var2 + rfac + exdat, data = list( : invalid type (list) for variable 'exdat'"
rfac="ID"
lme(log(get(var1))~log(get(var2)),random=~1|get(rfac,pos=exdat), data=exdat)
Part of the problem seems to be with the nlme package. If you can consider using lme4 instead, the desired results can be obtained with:
lme4::lmer(log(get(var1)) ~ log(get(var2)) + (1 | ID),
data = exdat)
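If you need to stay with nlme, one hedged alternative (a sketch assuming the names in var1, var2, and rfac are always valid column names) is to build the formulas from strings and convert them, so no get() calls end up in the model frame:
library(nlme)

# Build both formulas from the variable-name strings
fixed_form <- as.formula(paste0("log(", var1, ") ~ log(", var2, ")"))
rand_form  <- as.formula(paste0("~ 1 | ", rfac))

lme(fixed_form, random = rand_form, data = exdat)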
So I have a data set called x. The contents are simple enough that I'll just outline them here:
the dependent variable, Report, in the first column is binary yes/no (0 = no, 1 = yes)
the subsequent 3 columns are all categorical variables (race.f, sex.f, gender.f) that have all been converted to factors, and they're designated by numbers (e.g. 1= white, 2 = black, etc.)
I have run a logistic regression on x as follows:
glm <- glm(Report ~ race.f + sex.f + gender.f, data=x,
family = binomial(link="logit"))
And I can check the fitted probabilities by looking at summary(glm$fitted).
My question: How do I create a fifth column on the right side of this data set x that will include the predictions (i.e. fitted probabilities) for Report? Of course, I could just insert glm$fitted as a column, but I'd like to write code that predicts it based on whatever is in the race, sex, and gender columns, for more generalized use.
Right now I have the following code, which I hope will create a predicted column as well as lower and upper bounds for the confidence interval.
xnew <- cbind(xnew, predict(glm5, newdata = xnew, type = "link", se = TRUE))
xnew <- within(xnew, {
PredictedProb <- plogis(fit)
LL <- plogis(fit - (1.96 * se.fit))
UL <- plogis(fit + (1.96 * se.fit))
})
Unfortunately I get the error:
Error in eval(expr, envir, enclos) : object 'race.f' not found
after the cbind code.
Anyone have any idea?
There appear to be a few typos in your code. First, the xnew line calls predict() on glm5, but your model, as far as I can see, is named glm (by the way, using glm as the name of your output is probably not a good idea, since it masks the glm() function). Second, make sure the variable race.f is actually in the dataset you wish to make predictions from. My guess is R can't find that variable, hence the error.
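Putting that together, a sketch of the corrected workflow (my own; fit is my renamed model object, and I assume xnew should carry the same predictor columns the model was fitted with):
# Refit under a non-masking name (the model itself is unchanged)
fit <- glm(Report ~ race.f + sex.f + gender.f, data = x,
           family = binomial(link = "logit"))

# xnew must contain the predictor columns used in the model
xnew <- x[, c("race.f", "sex.f", "gender.f")]

preds <- predict(fit, newdata = xnew, type = "link", se.fit = TRUE)
xnew <- within(cbind(xnew, preds[c("fit", "se.fit")]), {
  PredictedProb <- plogis(fit)
  LL <- plogis(fit - 1.96 * se.fit)
  UL <- plogis(fit + 1.96 * se.fit)
})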