I have a pre-post control design. A control group (c) is pretested (pre.c) and post-tested (pos.c). Similarly, a treatment group (t) is pretested (pre.t) and post-tested (pos.t). So we have two groups (a group factor) tested at two time points (a time factor).
Data for this design appears below (in R code).
Question: I was wondering if my R code below is correctly capturing both factors (i.e., time and group) and interaction for this design?
Here is my data and what I have tried (with no success):
pre.c = rnorm(100)
pos.c = rnorm(100)
pre.t = rnorm(100)
pos.t = rnorm(100, 5)
# subjects, time, and group must line up with the ordering of y:
# rows 1-200 are the control subjects (all pre, then all post),
# rows 201-400 are the treatment subjects (all pre, then all post)
data = data.frame(subjects = c(rep(1:100, 2), rep(101:200, 2)),
                  y = c(pre.c, pos.c, pre.t, pos.t),
                  time = rep(rep(0:1, each = 100), 2),
                  group = rep(c(0, 1), each = 200))
library(lme4)
m1 <- lmer(y ~ time + (time | subjects), data = data) # Gives Error
m2 <- lmer(y ~ time * group + (time | subjects), data = data) # Gives Error
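The errors are not about the fixed-effects part of the formula: with only two observations per subject, a by-subject random slope for time means as many random effects as observations, so lmer refuses to fit. A minimal sketch of a model this design can support, using a random intercept per subject (m3 is just an illustrative name):
m3 <- lmer(y ~ time * group + (1 | subjects), data = data)
summary(m3)  # the time:group coefficient is the pre-post difference between groups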
Predicting values in new data from an lmer model throws an error when a period is used to represent predictors. Is there any way around this?
The answer to this similar question offers a way to automatically write out the full formula instead of using the period, but I'm curious if there's a way to get predictions from new data just using the period.
Here's a reproducible example:
mydata <- data.frame(
  groups = rep(1:3, each = 100),
  x = rnorm(300),
  dv = rnorm(300)
)
train_subset <- sample(1:300, 300 * .8)
train <- mydata[train_subset,]
test <- mydata[-train_subset,]
# Returns an error
mod <- lmer(dv ~ . - groups + (1 | groups), data = train)
predict(mod, newdata = test)
predict(mod) # getting predictions for the original data works
# Writing the full formula without the period does not return an error, even though it's the exact same model
mod <- lmer(dv ~ x + (1 | groups), data = train)
predict(mod, newdata = test)
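Until the fix lands, one version of the workaround mentioned above is to build the full formula programmatically instead of typing it out. A sketch, assuming the same column names as in the example (reformulate is just one way to expand the period by hand):
# every column except the response and the grouping variable goes in the fixed part
rhs <- setdiff(names(train), c("dv", "groups"))
form <- reformulate(c(rhs, "(1 | groups)"), response = "dv")
mod <- lmer(form, data = train)
predict(mod, newdata = test)  # no error, since the formula is fully spelled out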
This should be fixed in the development branch of lme4 now. You can install from GitHub (see first line below) or wait a few weeks (early April-ish) for a new version to hit CRAN.
remotes::install_github("lme4/lme4") ## you will need compilers etc.
mydata <- data.frame(
  groups = rep(1:3, each = 100),
  x = rnorm(300),
  dv = rnorm(300)
)
train_subset <- sample(1:300, 300 * .8)
train <- mydata[train_subset,]
test <- mydata[-train_subset,]
# Previously returned an error; works with the development version
mod <- lmer(dv ~ . - groups + (1 | groups), data = train)
p1 <- predict(mod, newdata = test)
mod2 <- lmer(dv ~ x + (1 | groups), data = train)
p2 <- predict(mod2, newdata = test)
identical(p1, p2) ## TRUE
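If it's unclear whether the installed copy actually includes the fix, checking the version is a quick sanity check:
packageVersion("lme4")  # should be newer than the current CRAN release after the GitHub install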
I am attempting to fit a GAM with interactions between days (the tt variable) and lagged predictors (lag k = 2), using 10 basis functions for each smooth.
library(mgcv)
library(quantmod)  # assumed source of Lag(): quantmod's Lag(x, k) matches this call (Hmisc's Lag() takes 'shift')
# Example data
data = data.frame(
  tt = 1:107,  # days
  pol = (sample.int(101, size = 107, replace = TRUE) - 1) / 100,
  at_rec = sample.int(101, size = 107, replace = TRUE),
  w_cas = sample.int(2000, size = 107, replace = TRUE)
)
# model
gam1 <- gam(pol ~ s(tt, k = 10) +
              s(tt, by = Lag(at_rec, k = 2), k = 10) +
              s(tt, by = Lag(w_cas, k = 2), k = 10),
            data = data, method = "GACV.Cp")
summary(gam1)
# making newdata
newdata = data.frame(tt = c(12, 22), at_rec = c(44, 34), w_cas = c(2011, 2455))
# and predicting
predict(gam1, newdata = newdata, se.fit = TRUE)
I get this error:
Error in PredictMat(object$smooth[[k]], data) : Can't find by variable
How can I predict from such a model on new data?
I'm 99.9% sure that the predict method can't find the by terms because they are functions of variables: it is looking for variables named exactly as you typed them, e.g. "Lag(at_rec, k = 2)". Try adding the lagged variables to your data frame as explicit columns and refitting the model; it should then work:
data <- transform(data,
                  lag_at_rec = Lag(at_rec, k = 2),
                  lag_w_cas = Lag(w_cas, k = 2))
gam1 <- gam(pol ~ s(tt, k = 10) +
              s(tt, by = lag_at_rec, k = 10) +
              s(tt, by = lag_w_cas, k = 10),
            data = data, method = "GACV.Cp")
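Prediction then works as long as newdata supplies the lagged columns too. A sketch reusing the values from the question, now passed directly as the already-lagged variables:
newdata <- data.frame(tt = c(12, 22),
                      lag_at_rec = c(44, 34),
                      lag_w_cas = c(2011, 2455))
predict(gam1, newdata = newdata, se.fit = TRUE)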
I have the following data:
set.seed(3)
library(data.table)
library(lme4)
a <- rep(1:5, times = 20)
b <- rep(c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), times = 10)  # length 100, matching the other variables
ppt <- rep(101:110, each = 10)
item <- rep(1:10, times = 10)
dv <- rnorm(n = 100)
data <- data.table(ppt, item, a, b, dv)
data$ppt <- as.factor(data$ppt)
data$item <- as.factor(data$item)
data$a <- as.factor(data$a)
data$b <- as.factor(data$b)
contrasts(data$a) <- contr.sum(5)  # a has 5 levels; this must come after the factor conversion
I would like to get a coefficient for each level of a. u/omsa_d00d and u/dead-serious pointed me to the idea of running a model without an intercept.
If I run this model:
m1 <- lmer(dv ~ a + b - 1 + (1 | ppt) + (1 | item), data = data)
I get coefficients for each level of a.
However if I run this model in which b comes first:
m2 <- lmer(dv ~ b + a - 1 + (1 | ppt) + (1 | item), data = data)
I get coefficients for each level of b, but not a.
What exactly is happening in each case?
Additionally, is running m1 sufficient to get an effect of each level of a compared to the grand mean, while also controlling for b?
Does it matter if I mean centre my predictors first?
What are the different implications of dummy vs. sum coding factor a?
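For reference, the asymmetry described above is already visible in the fixed-effects design matrix alone; a quick check with model.matrix on the same data:
# With the intercept removed, only the first factor in the formula is expanded
# into one indicator column per level; later factors are still contrast-coded
# (one column fewer than their number of levels) to keep the matrix full rank.
colnames(model.matrix(~ a + b - 1, data = data))  # columns for all 5 levels of a, one for b
colnames(model.matrix(~ b + a - 1, data = data))  # columns for both levels of b, 4 contrast columns for a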
I know variations of this question have been asked before but I haven't yet seen an answer on how to implement the multinomial Poisson transformation with multilevel models.
I decided to make a fake dataset and follow the method outlined here, also consulting the notes the poster mentions as well as the Baker paper on MP transformation.
In order to check that I'm doing the coding correctly, I decided to create a binary outcome variable as a first step; because glmer can handle binary response variables, this lets me verify that I'm correctly recasting the logit regression as multiple Poissons.
The context of this problem is running multilevel regressions with survey data, where the outcome variable is the response to a question and the possible predictors are demographic variables. As mentioned above, I wanted to see if I could properly code the binary outcome variable as a Poisson regression before moving on to outcomes with more than two levels.
library(dplyr)
library(lme4)
key <- expand.grid(sex = c('Male', 'Female'),
                   age = c('18-34', '35-64', '45-64'))
set.seed(256)
probs <- runif(nrow(key))
# Make a fake dataset with 1000 responses
n <- 1000
df <- data.frame(sex = sample(c('Male', 'Female'), n, replace = TRUE),
                 age = sample(c('18-34', '35-64', '45-64'), n, replace = TRUE),
                 obs = seq_len(n), stringsAsFactors = FALSE)
age <- model.matrix(~ age, data = df)[, -1]
sex <- model.matrix(~ sex, data = df)[, -1]
beta_age <- matrix(c(0, 1), nrow = 2, ncol = 1)
beta_sex <- matrix(1, nrow = 1, ncol = 1)
# Create class probabilities as a function of age and sex
probs <- plogis(
  -0.5 +
    age %*% beta_age +
    sex %*% beta_sex +
    rnorm(n)
)
id <- ifelse(probs > 0.5, 1, 0)
df$y1 <- id
df$y2 <- 1 - df$y1
# First run the regular hierarchical logit, just with a varying intercept for age
glm_out <- glmer(y1 ~ (1|age), family = 'binomial', data = df)
summary(glm_out)
# Next, the two Poisson regressions
glm_1 <- glmer(y1 ~ (1|obs) + (1|age), data = df, family = 'poisson')
glm_2 <- glmer(y2 ~ (1|obs) + (1|age), data = df, family = 'poisson')
coef(glm_1)$age - coef(glm_2)$age
coef(glm_out)$age
The outputs for the last two lines are:
> coef(glm_1)$age - coef(glm_2)$age
(Intercept)
18-34 0.14718933
35-64 0.03718271
45-64 1.67755129
> coef(glm_out)$age
(Intercept)
18-34 0.13517758
35-64 0.02190587
45-64 1.70852847
These estimates are close but not identical. I suspect I've specified something wrong involving the intercept.
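For reference, the identity being tested above: once the observation-level intercept is absorbed, the difference of the two Poisson log-means equals the log-odds of category 1 versus category 2, which is why the coefficients are differenced. In the purely fixed-effects, saturated case the equivalence is exact; a quick sketch of that check (not part of the original attempt):
# empirical log-odds per age group: log(count of y1 = 1) - log(count of y2 = 1)
df %>%
  group_by(age) %>%
  summarise(log_odds = log(sum(y1)) - log(sum(y2)))
# a saturated logit fit reproduces exactly these values
coef(glm(y1 ~ age - 1, family = binomial, data = df))
With random intercepts, each glmer fit shrinks its group effects separately, so small discrepancies of the size shown above are plausibly shrinkage rather than a coding error.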
I am fitting a multivariate mixed model in R using the nlme package. Suppose x and y are response variables for longitudinal data in which the within-group errors are assumed to be correlated, so the residual error matrix allows the errors on the two responses to be correlated within a subject.
So my question is: how do I pass this correlation structure to the lme function?
I tried corr = corComSymm(from = ~ 1 | x) and corr = corAR1(from = ~ 1 | x), but neither worked!
Here is an example:
# visit time in months
time = rep(c(0, 3, 6, 9), times = 50)
# subjects
subject = rep(1:50, each = 4)
# indicator ("identity") for the first response variable
x = c(rep(0, 100), rep(1, 100))
# indicator ("identity") for the second response variable
y = c(rep(1, 100), rep(0, 100))
# values of both response variables (x_1, x_2)
value = c(rnorm(100, 20, 1), rnorm(100, 48, 1))
# factor marking which response each row belongs to
variable = factor(c(rep(0, 100), rep(1, 100)), labels = c("X", "Y"))
df = data.frame(subject, time, x, y, value, variable)
library(nlme)
# fit a model in which each response variable has its own intercept and slope (time)
# in both the fixed and random effects, and each response has a different variance
f = lme(value ~ -1 + x + y + x:time + y:time,
        random = ~ -1 + (x + y) + time:(x + y) | subject,
        weights = varIdent(form = ~ 1 | x), corr = corAR1(from = ~ 1 | x),
        control = lmeControl(opt = "optim"), data = df)
Error in corAR1(from = ~1 | x) : unused argument (from = ~1 | x)
Any suggestions?
I found the website below helpful and useful; I'm posting it here in case someone runs into this problem in the future.
https://rpubs.com/bbolker/3336
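As a side note on the error itself: nlme's correlation constructors take an argument named form, not from (hence "unused argument"), and the compound-symmetry class is spelled corCompSymm, not corComSymm. A sketch of the call with the corrected syntax, using a simplified random-intercept structure since each subject in this toy dataset carries only one of the two responses (whether a richer random structure is identifiable depends on the data; the link above covers the full multivariate setup):
f <- lme(value ~ -1 + x + y + x:time + y:time,
         random = ~ 1 | subject,
         weights = varIdent(form = ~ 1 | variable),  # separate residual variance per response
         corr = corAR1(form = ~ 1 | subject),        # AR(1) errors within subject
         control = lmeControl(opt = "optim"), data = df)
summary(f)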