R lmer with variance structure - r

i want to fit a mixed model but with variance structure.
I create my data frame
DTF <- data.frame(Y = rnorm(7*4, 0, 1),
TIME = rep(1:7, 4),
GROUP = c(rep(1,7), rep(2,7), rep(3,7), rep(4,7)),
X = rep(rnorm(7, 20, 5), 4))
and than i run this command
model <- lmer(Y ~ TIME:GROUP + X + (1 | GROUP), DTF, REML = FALSE,
varWeights(varFixed(~ TIME)))
and i have
Error in lmer(...): 'control' is not a list; use lmerControl()
I don't know what i have to change.
Thanks for your help!

Related

How can you force predictions of a linear model to start at the same point as new data

I would like my logistic regression model to start at the same point as a predictor variable.
Data:
df <- tibble(
x = c(0:20, 0:20),
y = c(log(10:30 + 2), log(10:30 + 10)),
init = c(rep(log(10 + 2), 21), rep(log(10 + 10), 21)),
group = c(rep('A', 21), rep('B', 21))
)
Model:
lm_fit <- lm(y ~ log(x + 1) + init, data = df)
Example of model fitted to data:
newdata <- df %>%
filter(group == 'A') %>%
mutate(pred_y = predict(lm_fit, newdata = newdata, type = 'response')) %>%
pivot_longer(c(y, pred_y), names_to = 'pred_type', values_to = 'value')
ggplot(aes(x, value, colour = pred_type)) +
geom_point() +
geom_line()
How can I change my model so the red line (model) starts at the same value as the blue line (data)? i.e. when x=0, pred_y = y.
Using your init variable, you have to treat it as an offset (its coefficient will be 1) and disable the intercept (-1 in model formula).
lm_fit <- lm(y ~ log(x + 1) + offset(init) - 1, data = df)
After changing the model formula to log(y) ~ log(x + 1) a possible approach is to transform the y variable and use its new value in x = 0 for the offset (init) variable (I would actually recommend to always derive the offset from the y variable and not compute it independently). This way only the data is modified and the rest will remain the same.
df <- df %>%
group_by(group) %>%
mutate(y = log(y),
init = y[x==0])

Can anyone please help fix the following glmmLasso R package errors?

I have been trying to run the following code and I am getting various errors. Anyone know how to fix the current one? I am trying to run a generalized linear mixed model with a tuning parameter (specifically LASSO), but was trying to start at the basics and get the fixed effects to work first. Thank you!
y <- rbinom(n = 50, size = 1, prob = .5)
x <- rnorm(n = 50, mean = 1, sd = .5)
data <- data.frame(x, y)
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 10, data = data)
error: the condition has length > 1 and only the first element will be usedthe condition has length > 1 and only the first element will be used
another error: data length is not a multiple of split variable (this does not happen with this simulation data, but it does with my real data)
Another note - I have tried the exact code in the help documentation for generalized linear mixed models with the soccer data and I get the same error about the length > 1
I guess it is a problem with R 4.0.3.
I used glmmLasso without any errors and this error occurred when I updated my Base R from 3.6 to 4.0.3. I wrote an email to the author.
My guess is that with only 1 predictor and data that has no relationship, your lambda is too high for 1 variable and throws a weird solution of the matrix, you can check see the source code throws error at these two lines:
finish<-(sqrt(sum((Eta.ma[l,]-Eta.ma[l+1,])^2))/sqrt(sum((Eta.ma[l,])^2))<eps)
finish2<-(sqrt(sum((Eta.ma[l-1,]-Eta.ma[l+1,])^2))/sqrt(sum((Eta.ma[l-1,])^2))<eps)
if(finish || finish2)
To make this reproducible:
set.seed(2)
y <- rbinom(n = 50, size = 1, prob = .5)
x <- rnorm(n = 50, mean = 1, sd = .5)
data <- data.frame(x, y)
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 10, data = data)
Error in if (finish || finish2) break :
missing value where TRUE/FALSE needed
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 1, data = data)
mod1
Call:
glmmLasso(fix = y ~ x, rnd = NULL, data = data, lambda = 1, family = binomial(link = logit))
Fixed Effects:
Coefficients:
(Intercept) x
-0.8089034 0.8678967
If we try another seed, you can see there's no issue, although you can see the end solution is a the coefficient set to zero:
set.seed(1)
y <- rbinom(n = 50, size = 1, prob = .5)
x <- rnorm(n = 50, mean = 1, sd = .5)
data <- data.frame(x, y)
mod1 <- glmmLasso(fix = y ~ x , rnd = NULL, family = binomial(link = logit), lambda = 10, data = data)
mod1
Call:
glmmLasso(fix = y ~ x, rnd = NULL, data = data, lambda = 10,
family = binomial(link = logit))
Fixed Effects:
Coefficients:
(Intercept) x
0.1603426 0.0000000
To sum up.. most likely for your data, you need to move it through some lambdas to examine the fit

Plotting with GLMMadaptive for zero-inflated, semi-continuous data?

I'm trying to utilize the effectPlotData as described here: https://cran.r-project.org/web/packages/GLMMadaptive/vignettes/Methods_MixMod.html
But, I'm trying to apply it to a model (two-part mixed model for zero-inflated semi-continuous data) that includes random/fixed effects for both a linear and logistic portion (hurdle lognormal). I get the following error:
'Error in Qs[1, ] : incorrect number of dimensions'
Which, I think is from having more than one set of random/fixed effect outcomes, but if anyone else has come across this error or can advise, it would be appreciated! I've tried changing the terms in the new data frame and tried a couple of different options with length.out (attempted this as number of subjects and then number of total observations across all subjects), but get the same error each time.
Code below, specifies the model into m and new data frame into nDF:
m = mixed_model(Y~X, random = ~1|Subject,
data = data_combined_temp_Fix_Num3,
family = hurdle.lognormal,
n_phis = 1, zi_fixed = ~X , zi_random = ~1|Subject,
na.action = na.exclude)
nDF <- with(data_combined_temp_Fix_Num3,
expand.grid(X = seq(min(X), max(X), length.out = 908),
Y = levels(Y)))
effectPlotData(m, nDF)
It seems to work for with the following example:
library("GLMMadaptive")
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 5 # maximum follow-up time
# we constuct a data frame with the design:
# everyone has a baseline measurment, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects non-zero part
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
# design matrices for the fixed and random effects zero part
X_zi <- model.matrix(~ sex, data = DF)
Z_zi <- model.matrix(~ 1, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients non-zero part
sigma <- 0.5 # standard deviation error terms non-zero part
gammas <- c(-1.5, 0.5) # fixed effects coefficients zero part
D11 <- 0.5 # variance of random intercepts non-zero part
D22 <- 0.1 # variance of random slopes non-zero part
D33 <- 0.4 # variance of random intercepts zero part
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)), rnorm(n, sd = sqrt(D33)))
# linear predictor non-zero part
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, 1:2, drop = FALSE]))
# linear predictor zero part
eta_zi <- as.vector(X_zi %*% gammas + rowSums(Z_zi * b[DF$id, 3, drop = FALSE]))
# we simulate log-normal longitudinal data
DF$y <- exp(rnorm(n * K, mean = eta_y, sd = sigma))
# we set the zeros from the logistic regression
DF$y[as.logical(rbinom(n * K, size = 1, prob = plogis(eta_zi)))] <- 0
###############################################################################
km1 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF,
family = hurdle.lognormal(),
zi_fixed = ~ sex)
km1
nDF <- with(DF, expand.grid(time = seq(min(time), max(time), length.out = 15),
sex = levels(sex)))
plot_data <- effectPlotData(km1, nDF)
library("lattice")
xyplot(pred + low + upp ~ time | sex, data = plot_data,
type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
xlab = "Follow-up time", ylab = "")
local({
km1$Funs$mu_fun <- function (eta) {
pmax(exp(eta + 0.5 * exp(2 * km1$phis)), .Machine$double.eps)
}
km1$family$linkfun <- function (mu) log(mu)
plot_data <- effectPlotData(km1, nDF)
xyplot(exp(pred) + exp(low) + exp(upp) ~ time | sex, data = plot_data,
type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
xlab = "Follow-up time", ylab = "")
})
In case someone comes across the same error, I was filtering data from my data frame within the model -- which caused the dimensions of the model and the variable from the data frame to not match. I applied the same filtering to the new data frame (I've also moved forward with a completely new data frame that only includes trials that are actually used by the model so that no filtering has to be used at any step).
m = mixed_model(Y~X, random = ~1|Subject,
data = data_combined_temp_Fix_Num3[data_combined_temp_Fix_Num3$Z>=4 &
data_combined_temp_Fix_Num3$ZZ>= 4,],
family = hurdle.lognormal,
n_phis = 1, zi_fixed = ~X , zi_random = ~1|Subject,
na.action = na.exclude)`
nDF <- with(data_combined_temp_Fix_Num3,
expand.grid(X = seq(min(X[data_combined_temp_Fix_Num3$Z>= 4 &
data_combined_temp_Fix_Num3$ZZ>= 4])),
max(X[data_combined_temp_Fix_Num3$Z>= 4 &
data_combined_temp_Fix_Num3$ZZ>= 4])), length.out = 908),
Y = levels(Y)))`
effectPlotData(m, nDF)

Visualize Multilevel Growth Model with nlme/ggplot2 vs lme4/ggplot2

I am trying to visualize the results of an nlme object without success. When I do so with an lmer object, the correct plot is created. My goal is to use nlme and visualize a fitted growth curve for each individual with ggplot2. The predict() function seems to work differently with nlme and lmer objects.
model:
#AR1 with REML
autoregressive <- lme(NPI ~ time,
data = data,
random = ~time|patient,
method = "REML",
na.action = "na.omit",
control = list(maxlter=5000, opt="optim"),
correlation = corAR1())
nlme visualization attempt:
data <- na.omit(data)
data$patient <- factor(data$patient,
levels = 1:23)
ggplot(data, aes(x=time, y=NPI, colour=factor(patient))) +
geom_point(size=1) +
#facet_wrap(~patient) +
geom_line(aes(y = predict(autoregressive,
level = 1)), size = 1)
when I use:
data$fit<-fitted(autoregressive, level = 1)
geom_line(aes(y = fitted(autoregressive), group = patient))
it returns the same fitted values for each individual and so ggplot produces the same growth curve for each. Running test <-data.frame(ranef(autoregressive, level=1)) returns varying intercepts and slopes by patient id. Interestingly, when I fit the model with lmer and run the below code it returns the correct plot. Why does predict() work differently with nlme and lmer objects?
timeREML <- lmer(NPI ~ time + (time | patient),
data = data,
REML=T, na.action=na.omit)
ggplot(data, aes(x = time, y = NPI, colour = factor(patient))) +
geom_point(size=3) +
#facet_wrap(~patient) +
geom_line(aes(y = predict(timeREML)))
In creating a reproducible example, I found that the error was not occurring in predict() nor in ggplot() but instead in the lme model.
Data:
###libraries
library(nlme)
library(tidyr)
library(ggplot2)
###example data
df <- data.frame(replicate(78, sample(seq(from = 0,
to = 100, by = 2), size = 25,
replace = F)))
##add id
df$id <- 1:nrow(df)
##rearrange cols
df <- df[c(79, 1:78)]
##sort columns
df[,2:79] <- lapply(df[,2:79], sort)
##long format
df <- gather(df, time, value, 2:79)
##convert time to numeric
df$time <- factor(df$time)
df$time <- as.numeric(df$time)
##order by id, time, value
df <- df[order(df$id, df$time),]
##order value
df$value <- sort(df$value)
Model 1 with no NA values fits successfully.
###model1
model1 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxlter=5000, opt="optim"),
correlation = corAR1(0, form=~time|id,
fixed=F))
Introducing NA's causes invertible coefficient matrix error in model 1.
###model 1 with one NA value
df[3,3] <- NA
model1 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxlter=2000, opt="optim"),
correlation = corAR1(0, form=~time|id,
fixed=F))
But not in model 2, which has a more simplistic within-group AR(1) correlation structure.
###but not in model2
model2 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxlter=2000, opt="optim"),
correlation = corAR1(0, form = ~1 | id))
However, changing opt="optim" to opt="nlminb" fits model 1 successfully.
###however changing the opt to "nlminb", model 1 runs
model3 <- lme(value ~ time,
data = df,
random = ~time|id,
method = "ML",
na.action = "na.omit",
control = list(maxlter=2000, opt="nlminb"),
correlation = corAR1(0, form=~time|id,
fixed=F))
The code below visualizes model 3 (formerly model 1) successfully.
df <- na.omit(df)
ggplot(df, aes(x=time, y=value)) +
geom_point(aes(colour = factor(id))) +
#facet_wrap(~id) +
geom_line(aes(y = predict(model3, level = 0)), size = 1.3, colour = "black") +
geom_line(aes(y = predict(model3, level=1, group=id), colour = factor(id)), size = 1)
Note that I am not exactly sure what changing the optimizer from "optim" to "nlminb" does and why it works.

Marginal effects of logit in weighted survey data using R

I´m trying to estimate marginal effect of a logit model in which I have several dichotomous explanatory variables.
Let's say the model estimated by
logit<- svyglm ( if_member ~ if_female + dummy_agegroup_2 + dummy_agegroup_3 + dummy_education_2 + dummy_education_3 + dummy_education_4, family = quasibinomial(link = "logit"), design = survey_design)
I know about the marginpred function in survey package, but I am not very familiar with it. I have only dichotomous variebles in the model so I am wondering how to estimate marginal effects by this function, especially I am not sure about the predictat (A data frame giving values of the variables in model to predict at).
Are you looking for marginal effects or marginal predictions?
As the name implies, the marginpred() function returns predictions. The argument for predictat is a data frame with both the control variables and the variables that are in the model. Let me emphasize that: control variables should be left out of the model.
library("survey")
odds2prob <- function(x) x / (x + 1)
prob2odds <- function(x) x / (1 - x)
expit <- function(x) odds2prob(exp(x))
logit <- function(x) log(prob2odds(x))
set.seed(1)
survey_data <- data.frame(
if_female = rbinom(n = 100, size = 1, prob = 0.5),
agegroup = factor(sample(x = 1:3, size = 100, replace = TRUE)),
education = NA_integer_,
if_member = NA_integer_)
survey_data["agegroup"] <- relevel(survey_data$agegroup, ref = 3)
# Different probabilities between female and male persons
survey_data[survey_data$if_female == 0, "education"] <- sample(
x = 1:4,
size = sum(survey_data$if_female == 0),
replace = TRUE,
prob = c(0.1, 0.1, 0.5, 0.3))
survey_data[survey_data$if_female == 1, "education"] <-sample(
x = 1:4,
size = sum(survey_data$if_female == 1),
replace = TRUE,
prob = c(0.1, 0.1, 0.3, 0.5))
survey_data["if_member"] <- rbinom(n = 100, size = 1, prob =
expit((survey_data$education - 3)/2))
survey_data["education"] <- factor(survey_data$education)
survey_data["education"] <- relevel(survey_data$education, ref = 3)
survey_design <- svydesign(ids = ~ 1, data = survey_data)
logit <- svyglm(if_member ~ if_female + agegroup + education,
family = quasibinomial(link = "logit"),
design = survey_design)
exp(cbind(`odds ratio` = coef(logit), confint(logit)))
newdf <- data.frame(if_female = 0:1, education = c(3, 3), agegroup = = c(3, 3))
# Fails
mp <- marginpred(model = logit, adjustfor = ~ agegroup + education,
predictat = newdf, se = TRUE, type = "response")
logit2 <- svyglm(if_member ~ if_female,
family = quasibinomial(link = "logit"),
design = survey_design)
mp <- marginpred(model = logit2, adjustfor = ~ agegroup + education,
predictat = newdf, se = TRUE, type = "response")
# Probability for male and for female persons controlling for agegroup and education
cbind(prob = mp, confint(mp))
That's how I estimate marginal effects with the survey package:
# Probability difference between female and male persons
# when agegroup and education are set to 3
svycontrast(full_model, quote(
(exp(`(Intercept)` + if_female) / (exp(`(Intercept)` + if_female) + 1)) -
(exp(`(Intercept)`) / (exp(`(Intercept)`) + 1))))
# Can't use custom functions like expit :_(
There are probably smarter ways, but I hope it helps.
Please note that the difference between the probabilities predicted by marginpred() is different from the difference estimated by svycontrast(). The probabilities predicted by marginpred() don't seem to be affected by changing the value of the control variables (in example,
education = c(4, 4) instead of education = c(3, 3)), but the estimates from svycontrast() are affected as implied by the regression model.

Resources