Cannot run lmer from within a function - r

I am running into a problem trying to embed lmer within a function. Here is a reproducible example using data from lexdec. If I run lmer on the data frame directly, there is no problem. For example, say that I want to see whether reading times in a lexical decision task differed as a function of Trial. There were 2 types of word stimuli, "animal" (e.g. "dog") and "plant" (e.g. "cherry"). I can compute a mixed-effects model for animal words:
library(languageR) #load lexdec data
library(lme4) #load lmer()
s <- summary(lmer(RT ~ Trial + (1|Subject) + (1|Word), data = lexdec[lexdec$Class== "animal", ]))
s #this works well
However, if I embed the lmer model inside a function (so that I don't have to type the same command for each level of Class), I get an error message. Do you know why? Any suggestions will be much appreciated!
#lmer() is now embedded in a function
compute.lmer <- function(df,class) {
m <- lmer(RT ~ Trial + (1|Subject) + (1|Word),data = df[df$Class== class, ])
m <- summary(m)
return(m)
}
#Now I can use this function to iterate over the 2 levels of the Class factor
for (c in levels(lexdec$Class)){
s <- compute.lmer(lexdec,c)
print(c)
print(s)
}
#But this gives an error message
Error in `colnames<-`(`*tmp*`, value = c("Estimate", "Std. Error", "df", :
length of 'dimnames' [2] not equal to array extent

I don't know what the problem is; your code runs just fine for me. (Are your packages up to date? Which R version are you running? Have you cleaned your workspace and tried your code from scratch?)
That said, this is a great use case for plyr::dlply. I would do it like this:
library(languageR)
library(lme4)
library(plyr)
stats <- dlply(lexdec,
.variables = c("Class"),
.fun=function(x) return(summary(lmer(RT ~ Trial + (1 | Subject) +
(1 | Word), data = x))))
names(stats) <- levels(lexdec$Class)
Which then yields
> stats[["plant"]]
Linear mixed model fit by REML ['lmerMod']
Formula: RT ~ Trial + (1 | Subject) + (1 | Word)
Data: x
REML criterion at convergence: -389.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.2647 -0.6082 -0.1155 0.4502 6.0593
Random effects:
Groups Name Variance Std.Dev.
Word (Intercept) 0.003718 0.06097
Subject (Intercept) 0.023293 0.15262
Residual 0.028697 0.16940
Number of obs: 735, groups: Word, 35; Subject, 21
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.3999245 0.0382700 167.23
Trial -0.0001702 0.0001357 -1.25
Correlation of Fixed Effects:
(Intr)
Trial -0.379
When I run your code (copied and pasted without modification, except that the summaries are collected in a list), I get the same output as above except for the Data: line.
stats = list()
compute.lmer <- function(df,class) {
m <- lmer(RT ~ Trial + (1|Subject) + (1|Word),data = df[df$Class== class, ])
m <- summary(m)
return(m)
}
for (c in levels(lexdec$Class)){
s <- compute.lmer(lexdec,c)
stats[[c]] <- s
}
> stats[["plant"]]
Linear mixed model fit by REML ['lmerMod']
Formula: RT ~ Trial + (1 | Subject) + (1 | Word)
Data: df[df$Class == class, ]
REML criterion at convergence: -389.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.2647 -0.6082 -0.1155 0.4502 6.0593
Random effects:
Groups Name Variance Std.Dev.
Word (Intercept) 0.003718 0.06097
Subject (Intercept) 0.023293 0.15262
Residual 0.028697 0.16940
Number of obs: 735, groups: Word, 35; Subject, 21
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.3999245 0.0382700 167.23
Trial -0.0001702 0.0001357 -1.25
Correlation of Fixed Effects:
(Intr)
Trial -0.379
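Without plyr, base R's split() and lapply() can do the same job as the dlply() call above. A minimal sketch, assuming languageR and lme4 are installed; it keeps the fitted models and their summaries in separate named lists so either can be inspected later:

```r
library(languageR)  # provides the lexdec data
library(lme4)       # provides lmer()

# split() returns a list of data frames named by the levels of Class
by_class <- split(lexdec, lexdec$Class)

# fit the same model to each subset
fits <- lapply(by_class, function(d) {
  lmer(RT ~ Trial + (1 | Subject) + (1 | Word), data = d)
})

# summarize each fit; index by level name, e.g. summaries[["plant"]]
summaries <- lapply(fits, summary)
summaries[["plant"]]
```

Because the list names come from the factor levels, there is no need to set names() by hand afterwards.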

Related

Equivalence of a mixed model fitted by lme and lmer

I have fitted a mixed-effects model with two functions widely used in R: lme from the nlme package and lmer from the lme4 package.
To refit the lme model in lme4 with the same parameterization, I followed this thread, since this can only be done in lme4 in a hacky way: Heteroscedastic model of mixed effects via lmer function
I apologize for hosting the data behind a link; I couldn't find a built-in R dataset with variables that match my problem.
Data: https://drive.google.com/file/d/1jKFhs4MGaVxh-OPErvLDfMNmQBouywoM/view?usp=sharing
The fitted models were:
library(nlme)
library(lme4)
ModLME = lme(Var1~I(Var2)+I(Var2^2),
random = ~1|Var3,
weights = varIdent(form=~1|Var4),
Dataone, method="REML")
ModLMER = lmer(Var1~I(Var2)+I(Var2^2)+(1|Var3)+(0+dummy(Var4,"1")|Var5),
Dataone, REML = TRUE,
control=lmerControl(check.nobs.vs.nlev="ignore",
check.nobs.vs.nRE="ignore"))
The two models are equivalent:
all.equal(REMLcrit(ModLMER), c(-2*logLik(ModLME)))
[1] TRUE
all.equal(fixef(ModLME), fixef(ModLMER), tolerance=1e-7)
[1] TRUE
> summary(ModLME)
Linear mixed-effects model fit by REML
Data: Dataone
AIC BIC logLik
-209.1431 -193.6948 110.5715
Random effects:
Formula: ~1 | Var3
(Intercept) Residual
StdDev: 0.05789852 0.03636468
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Var4
Parameter estimates:
0 1
1.000000 5.641709
Fixed effects: Var1 ~ I(Var2) + I(Var2^2)
Value Std.Error DF t-value p-value
(Intercept) 0.9538547 0.01699642 97 56.12093 0
I(Var2) -0.5009804 0.09336479 97 -5.36584 0
I(Var2^2) -0.4280151 0.10038257 97 -4.26384 0
summary(ModLMER)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: Var1 ~ I(Var2) + I(Var2^2) + (1 | Var3) + (0 + dummy(Var4, "1") |
Var5)
Data: Dataone
Control: lmerControl(check.nobs.vs.nlev = "ignore", check.nobs.vs.nRE = "ignore")
REML criterion at convergence: -221.1
Scaled residuals:
Min 1Q Median 3Q Max
-4.1151 -0.5891 0.0374 0.5229 2.1880
Random effects:
Groups Name Variance Std.Dev.
Var3 (Intercept) 6.466e-12 2.543e-06
Var5 dummy(Var4, "1") 4.077e-02 2.019e-01
Residual 4.675e-03 6.837e-02
Number of obs: 100, groups: Var3, 100; Var5, 100
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.95385 0.01700 95.02863 56.121 < 2e-16 ***
I(Var2) -0.50098 0.09336 92.94048 -5.366 5.88e-07 ***
I(Var2^2) -0.42802 0.10038 91.64017 -4.264 4.88e-05 ***
However, the residuals of the two models are not similar: in the lmer fit, a group of residuals mysteriously falls close to a straight line. How can I fix this so that the residuals are identical? I believe the problem lies in the lme4 model.
aa=plot(ModLME, main="LME")
bb=plot(ModLMER, main="LMER")
gridExtra::grid.arrange(aa,bb,ncol=2)
I can tell you what's going on and what should in principle fix it, but at the moment the fix doesn't work ...
The residuals being plotted take all of the random effects into account. For the lmer fit, that includes the individual-level random effects (the (0+dummy(Var4,"1")|Var5) term), which produce the odd residuals for the Var4==1 group. To illustrate this:
plot(ModLMER, col = Dataone$Var4+1)
The odd residuals are exactly the red ones, i.e. those for which Var4==1.
In theory we should be able to get the same residuals via:
res <- Dataone$Var1 - predict(ModLMER, re.form = ~(1|Var3))
i.e., ignore the group-specific observation-level random effect term. However, it looks like there is a bug at the moment ("contrasts can be applied only to factors with 2 or more levels").
An extremely hacky solution is to construct the random-effect predictions without the observation-level term yourself:
## fixed-effect predictions
p0 <- predict(ModLMER, re.form = NA)
## construct RE prediction, Var3 term only:
Z <- getME(ModLMER, "Z")
b <- drop(getME(ModLMER, "b"))
## zero out observation-level components
b[101:200] <- 0
## add RE predictions to fixed predictions
p1 <- drop(p0 + Z %*% b)
## plot fitted vs residual
plot(p1, Dataone$Var1 - p1)
For what it's worth, this also works:
library(glmmTMB)
ModGLMMTMB <- glmmTMB(Var1~I(Var2)+I(Var2^2)+(1|Var3),
dispformula = ~factor(Var4),
REML = TRUE,
data = Dataone)

Calculate α and β in Probit Model in R

I am facing the following issue: I want to calculate α and β for the following probit model in R, defined as:
Probability = F(α + β·sprd)
where sprd denotes the explanatory variable, α and β are constants, and F is the standard normal cumulative distribution function.
I can calculate probabilities for the entire dataset, the coefficients (see code below), etc., but I do not know how to get the constants α and β.
The purpose is to determine, in Excel, the spread that corresponds to a given probability, e.g. which spread corresponds to 50%.
Thank you in advance!
Probit model coefficients
probit<- glm(Y ~ X, family=binomial (link="probit"))
summary(probit)
Call:
glm(formula = Y ~ X, family = binomial(link = "probit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4614 -0.6470 -0.3915 -0.2168 2.5730
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.3566755 0.0883634 -4.036 5.43e-05 ***
X -0.0058377 0.0007064 -8.264 < 2e-16 ***
The help("glm") page shows that the returned object contains a component named coefficients:
An object of class "glm" is a list containing at least the following
components:
coefficients a named vector of coefficients
So the object returned by glm() is a list, and you can access each component with $, e.g. probit$coefficients.
Reproducible example (not a Probit model, but it's the same):
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
d.AD <- data.frame(treatment, outcome, counts)
# fit model
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
Now glm.D93$coefficients will print the vector with all the coefficients:
glm.D93$coefficients
# (Intercept) outcome2 outcome3 treatment2 treatment3
#3.044522e+00 -4.542553e-01 -2.929871e-01 1.337909e-15 1.421085e-15
You can assign that and access each individually:
coef <- glm.D93$coefficients
coef[1] # your alpha
#(Intercept)
# 3.044522
coef[2] # your beta
# outcome2
#-0.4542553
I've seen in your deleted post that you are not convinced by @RLave's answer. Here are some simulations to convince you:
# (large) sample size
n <- 10000
# covariate
x <- (1:n)/n
# parameters
alpha <- -1
beta <- 1
# simulated data
set.seed(666)
y <- rbinom(n, 1, prob = pnorm(alpha + beta*x))
# fit the probit model
probit <- glm(y ~ x, family = binomial(link="probit"))
# get estimated parameters - very close to the true parameters -1 and 1
coef(probit)
# (Intercept) x
# -1.004236 1.029523
The estimated parameters are given by coef(probit), or probit$coefficients.
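To answer the original goal (which spread gives a given probability), the fitted model can be inverted: if p = Φ(α + β·sprd), then sprd = (Φ⁻¹(p) - α)/β, and for p = 0.5 this reduces to -α/β because Φ⁻¹(0.5) = 0. A sketch that reuses the simulated setup above; the helper name spread_for_prob is made up for illustration:

```r
# simulated data, as in the answer above
n <- 10000
x <- (1:n)/n
set.seed(666)
y <- rbinom(n, 1, prob = pnorm(-1 + 1 * x))
probit <- glm(y ~ x, family = binomial(link = "probit"))

alpha_hat <- unname(coef(probit)[1])  # intercept (alpha)
beta_hat  <- unname(coef(probit)[2])  # coefficient on x (beta)

# hypothetical helper: solve p = pnorm(alpha + beta * sprd) for sprd
spread_for_prob <- function(p, alpha, beta) {
  (qnorm(p) - alpha) / beta
}

s50 <- spread_for_prob(0.5, alpha_hat, beta_hat)  # same as -alpha_hat / beta_hat
s50
```

In Excel, the equivalent formula would be (NORM.S.INV(p) - alpha) / beta, with alpha and beta copied from coef(probit).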

Recurring error using lmer() function for a linear mixed-effects model

I attempted to build a linear mixed-effects model using the lmer function from the lme4 package, and I keep running into the same error. The model uses two fixed effects:
DBS_Electrode (factor w/3 levels) and
PostOp_ICA (continuous variable).
I use (1 | Subject) as the random-effect term, where Subject is a factor with 38 levels (38 subjects in total). Below is the line of code I attempted to run:
LMM.DBS <- lmer(Distal_Lead_Migration ~ DBS_Electrode + PostOp_ICA + (1 | Subject), data = DBS)
I received the following error:
Number of levels of each grouping factor must be < number of observations.
I would appreciate any help, I have tried to navigate this issue myself and have been unsuccessful.
A linear mixed-effects model assumes that there are fewer subjects (levels of the grouping factor) than observations, so lmer() throws an error if that is not the case.
You can think of this formula as telling your model that it should
expect that there’s going to be multiple responses per subject, and
these responses will depend on each subject’s baseline level.
Please consult A very basic tutorial for performing linear mixed effects analyses by B. Winter, p. 4.
In your case, you need more observations than subjects, i.e. at least some subjects must contribute more than one observation. See the simulation below:
library(lme4)
set.seed(123)
n <- 38
DBS_Electrode <- factor(sample(LETTERS[1:3], n, replace = TRUE))
Distal_Lead_Migration <- 10 * abs(rnorm(n)) # Distal_Lead_Migration in cm
PostOp_ICA <- 5 * abs(rnorm(n))
# number of observations equals number of subjects
Subject <- paste0("X", 1:n)
DBS <- data.frame(DBS_Electrode, PostOp_ICA, Subject, Distal_Lead_Migration)
model <- lmer(Distal_Lead_Migration ~ DBS_Electrode + PostOp_ICA + (1|Subject), data = DBS)
# Error: number of levels of each grouping factor must be < number of observations
# more observations than subjects (X1 appears twice)
Subject <- c(paste0("X", 1:36), "X1", "X37")
DBS <- data.frame(DBS_Electrode, PostOp_ICA, Subject, Distal_Lead_Migration)
model <- lmer(Distal_Lead_Migration ~ DBS_Electrode + PostOp_ICA + (1|Subject), data = DBS)
summary(model)
Output:
Linear mixed model fit by REML ['lmerMod']
Formula: Distal_Lead_Migration ~ DBS_Electrode + PostOp_ICA + (1 | Subject)
Data: DBS
REML criterion at convergence: 224.5
Scaled residuals:
Min 1Q Median 3Q Max
-1.24605 -0.73780 -0.07638 0.64381 2.53914
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 2.484e-14 1.576e-07
Residual 2.953e+01 5.434e+00
Number of obs: 38, groups: Subject, 37
Fixed effects:
Estimate Std. Error t value
(Intercept) 7.82514 2.38387 3.283
DBS_ElectrodeB 0.22884 2.50947 0.091
DBS_ElectrodeC -0.60940 2.21970 -0.275
PostOp_ICA -0.08473 0.36765 -0.230
Correlation of Fixed Effects:
(Intr) DBS_EB DBS_EC
DBS_ElctrdB -0.718
DBS_ElctrdC -0.710 0.601
PostOp_ICA -0.693 0.324 0.219
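Before calling lmer(), it can help to check how many rows each subject contributes. A minimal base-R sketch on a small hypothetical data frame (column names chosen to match the question) that reproduces the failing situation of one row per subject:

```r
# hypothetical toy data: every subject appears exactly once
toy <- data.frame(Subject = c("S1", "S2", "S3", "S4"),
                  Distal_Lead_Migration = c(1.2, 0.8, 1.5, 0.9))

obs_per_subject <- table(toy$Subject)  # rows per subject
n_levels <- length(obs_per_subject)    # number of Subject levels
n_obs <- nrow(toy)                     # number of observations

# lmer() requires n_levels < n_obs for a (1 | Subject) term
n_levels < n_obs  # FALSE: a (1 | Subject) model would fail on this data
```

Passing this check does not guarantee a useful fit: in the output above, where only one subject is repeated, the estimated Subject variance is essentially zero.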

bootstrapping for lmer with interaction term

I am running a mixed model using lme4 in R:
full_mod3=lmer(logcptplus1 ~ logdepth*logcobb + (1|fyear) + (1 |flocation),
data=cpt, REML=TRUE)
summary:
Formula: logcptplus1 ~ logdepth * logcobb + (1 | fyear) + (1 | flocation)
Data: cpt
REML criterion at convergence: 577.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.7797 -0.5431 0.0248 0.6562 2.1733
Random effects:
Groups Name Variance Std.Dev.
fyear (Intercept) 0.2254 0.4748
flocation (Intercept) 0.1557 0.3946
Residual 0.9663 0.9830
Number of obs: 193, groups: fyear, 16; flocation, 16
Fixed effects:
Estimate Std. Error t value
(Intercept) 4.3949 1.2319 3.568
logdepth 0.2681 0.4293 0.625
logcobb -0.7189 0.5955 -1.207
logdepth:logcobb 0.3791 0.2071 1.831
I have used the effect() function from the effects package to calculate and extract the 95% confidence intervals and standard errors for the model output. This lets me examine the relationship between the predictor of interest (logcobb) and the response variable while holding the secondary predictor (logdepth) constant at its median (2.5) in the data set:
gm=4.3949 + 0.2681*depth_median + -0.7189*logcobb_range + 0.3791*
(depth_median*logcobb_range)
ef2=effect("logdepth*logcobb",full_mod3,
xlevels=list(logcobb=seq(log(0.03268),log(0.37980),length.out=200)))
I have attempted to bootstrap the 95% CIs using code from here. However, I need to calculate the 95% CIs only at the median depth (2.5). Is there a way to specify this in the confint() call so that I can calculate the CIs needed to visualize the bootstrapped results as in the plot above?
confint(full_mod3,method="boot",nsim=200,boot.type="perc")
You can do this by specifying a custom function:
library(lme4)
?confint.merMod
FUN: bootstrap function; if ‘NULL’, an internal function that returns the fixed-effect parameters as well as the random-effect parameters on the standard deviation/correlation scale will be used. See ‘bootMer’ for details.
So FUN can be a prediction function (see ?predict.merMod) whose newdata argument varies the predictor of interest and holds the others fixed at appropriate values.
An example with built-in data (not quite as interesting as yours since there's a single continuous predictor variable, but I think it should illustrate the approach clearly enough):
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
pframe <- data.frame(Days=seq(0,20,by=0.5))
## predicted values at population level (re.form=NA)
pfun <- function(fit) {
predict(fit,newdata=pframe,re.form=NA)
}
set.seed(101)
cc <- confint(fm1,method="boot",FUN=pfun)
Picture:
par(las=1,bty="l")
matplot(pframe$Days,cc,lty=2,col=1,type="l",
xlab="Days",ylab="Reaction")
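For the question's model, the same idea needs a prediction frame in which logdepth is held at its median (2.5) and logcobb varies over the range used in the effect() call. A sketch of that frame, checked against the hand-written gm formula using the fixed-effect estimates from the summary above (the rounded estimates are typed in by hand because the cpt data are not available here):

```r
# prediction grid: logdepth fixed at its median, logcobb varying
pframe <- data.frame(logdepth = 2.5,
                     logcobb = seq(log(0.03268), log(0.37980), length.out = 200))

# fixed-effect estimates copied from summary(full_mod3)
beta_hat <- c("(Intercept)" = 4.3949, logdepth = 0.2681,
              logcobb = -0.7189, "logdepth:logcobb" = 0.3791)

# design matrix for the fixed-effect part of the model formula
X <- model.matrix(~ logdepth * logcobb, data = pframe)
pred <- as.vector(X %*% beta_hat[colnames(X)])

# population-level curve at median depth, written out as gm in the question
gm <- 4.3949 + 0.2681 * 2.5 - 0.7189 * pframe$logcobb +
  0.3791 * (2.5 * pframe$logcobb)
all.equal(pred, gm)
```

With the fitted model, this pframe would take the place of the sleepstudy grid in a pfun like the one above (predict(fit, newdata = pframe, re.form = NA)), which can then be passed as FUN to confint(full_mod3, method = "boot", nsim = 200, boot.type = "perc", FUN = pfun). That last step has not been run here.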

Colnames error after running Summary() in mixed model

R version 3.1.0 (2014-04-10)
lme4 package version 1.1-6
lmerTest package version 2.0-6
I am currently working with lmer and lmerTest for my analysis.
Every time I add an effect to the random structure, I get the following error when running summary():
#Fitting a mixed model:
TRT5ToVerb.lmer3 = lmer(TRT5ToVerb ~ Group + Condition + (1+Condition|Participant) + (1|Trial), data=AllData, REML=FALSE, na.action=na.omit)
summary(TRT5ToVerb.lmer3)
Error in `colnames<-`(`*tmp*`, value = c("Estimate", "Std. Error", "df", : length of 'dimnames' [2] not equal to array extent
If I leave the structure like this:
TRT5ToVerb.lmer2 = lmer(TRT5ToVerb ~ Group + Condition + (1|Participant) + (1|Trial), data=AllData, REML=FALSE, na.action=na.omit)
there is no error: summary(TRT5ToVerb.lmer2) returns AIC, BIC, logLik, deviance, estimates of the random effects, estimates of the fixed effects with their corresponding p-values, etc.
So apparently something goes wrong on the lmerTest side, even though the object TRT5ToVerb.lmer3 is created. The only difference between the two models is the random structure: (1+Condition|Participant) vs. (1|Participant)
Some characteristics of my data:
Both Condition and Group are categorical variables: Condition comprises 3 levels and Group 2.
The dependent variable (TRT5ToVerb) is continuous: it is reading time in ms.
This is a repeated-measures experiment with 48 observations per participant (28 participants).
I read this thread, but I cannot see a clear solution. Do I have to transform my data frame to long format?
And if so, then how do I work with that in lmer?
I hope it is not that.
Thanks!
Disclaimer: I am neither an expert in R, nor in statistics, so please, have some patience.
(Should be a comment, but too long/code formatting etc.)
This fake example seems to work fine with lmerTest 2.0-6 and a development version of lme4 (1.1-8; but I wouldn't expect there to be any relevant differences from 1.1-6 for this example ...)
AllData <- expand.grid(Condition=factor(1:3),Group=factor(1:2),
Participant=1:28,Trial=1:8)
form <- TRT5ToVerb ~ Group + Condition + (1+Condition|Participant) + (1|Trial)
library(lme4)
set.seed(101)
AllData$TRT5ToVerb <- simulate(form[-2],
newdata=AllData,
family=gaussian,
newparam=list(theta=rep(1,7),sigma=1,beta=rep(0,4)))[[1]]
library(lmerTest)
lmer3 <- lmer(form,data=AllData,REML=FALSE)
summary(lmer3)
Produces:
Linear mixed model fit by maximum likelihood ['merModLmerTest']
Formula: TRT5ToVerb ~ Group + Condition + (1 + Condition | Participant) +
(1 | Trial)
Data: AllData
AIC BIC logLik deviance df.resid
4073.6 4136.0 -2024.8 4049.6 1332
Scaled residuals:
Min 1Q Median 3Q Max
-2.97773 -0.65923 0.02319 0.66454 2.98854
Random effects:
Groups Name Variance Std.Dev. Corr
Participant (Intercept) 0.8546 0.9245
Condition2 1.3596 1.1660 0.58
Condition3 3.3558 1.8319 0.44 0.82
Trial (Intercept) 0.9978 0.9989
Residual 0.9662 0.9829
Number of obs: 1344, groups: Participant, 28; Trial, 8
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.49867 0.39764 12.40000 1.254 0.233
Group2 0.03002 0.05362 1252.90000 0.560 0.576
Condition2 -0.03777 0.22994 28.00000 -0.164 0.871
Condition3 -0.27796 0.35237 28.00000 -0.789 0.437
Correlation of Fixed Effects:
(Intr) Group2 Cndtn2
Group2 -0.067
Condition2 0.220 0.000
Condition3 0.172 0.000 0.794
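Since both answers that could not reproduce this colnames error ask about or state the installed versions, it is worth reporting the R, lme4 and lmerTest versions along with the error. A small sketch using base R's utils functions; it prints NA for a package that is not installed:

```r
# report the R version and the versions of the packages involved
pkgs <- c("lme4", "lmerTest")
versions <- vapply(pkgs, function(p) {
  if (requireNamespace(p, quietly = TRUE)) {
    as.character(utils::packageVersion(p))
  } else {
    NA_character_
  }
}, character(1))

R.version.string
versions
```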
