Multilevel Imputation: glmer and HMI - r

I am trying to fit a two-level imputation model with HMI (hierarchical multiple imputation).
The model I'm using is this (I want random intercept ONLY):
glmer(pica_yn ~ 1 + visit_c+visit_c2 + geo_child + hhloc + diar_c + hemo_c + (1|pid))
I keep getting this error:
Error in buildZ(rmodel.terms[r], data = data, nginverse =
names(ginverse)): object id not found
It seems as though HMI expects the specified formula to include a random slope as well.
Has anyone fit a multilevel imputation model for a BINARY response?
Here is an example you can run that will get the same error:
data("sleepstudy", package="lme4")
sleepstudy[sample(1:nrow(sleepstudy), size = 20), "Reaction"] <- NA
sleep_formula<-Reaction ~ Days + (1|Subject)
hmi_imp <- hmi(data = sleepstudy, model_formula = sleep_formula, M = 5, maxit = 1)

Related

I need help in implementing a glmm model using Laplace approximation

The model I am trying to implement using the Laplace approximation is below:
model2<-lmer(ChickWeight$weight~Time + Diet + Time*Diet + (1+Time|Chick), data = ChickWeight)
summary(model2)
I was trying to use the code below to do this, but I am not sure it is correct:
#Laplace approximation
library(glmmsr)
mod_Laplace <- glmm(ChickWeight$weight~Time + Diet + Time*Diet + (1+Time|Chick), data = ChickWeight,
family = gaussian, method = "Laplace")
I get an error as the output.
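One point that may help here: with a Gaussian response and identity link, the random effects can be integrated out in closed form, so there is nothing left for a Laplace approximation to approximate. lmer already maximises the exact likelihood (or the restricted likelihood under REML), and the Laplace approximation only becomes relevant for non-Gaussian families fitted with glmer. Below is a minimal sketch, assuming only that lme4 is installed. The formula is written without `ChickWeight$`, which is my change and not the asker's, and `REML = FALSE` is my choice to get plain maximum likelihood.

```r
# ChickWeight ships with base R (package 'datasets').
data("ChickWeight", package = "datasets")

# Time * Diet already expands to Time + Diet + Time:Diet,
# so the separate main effects in the question are redundant.
chick_formula <- weight ~ Time * Diet + (1 + Time | Chick)

if (requireNamespace("lme4", quietly = TRUE)) {
  # Exact maximum likelihood for a linear mixed model; no approximation involved.
  fit_ml <- lme4::lmer(chick_formula, data = ChickWeight, REML = FALSE)
  print(lme4::fixef(fit_ml))
}
```

The fit may emit convergence warnings on this data set; those are separate from the question of which likelihood is being maximised.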

Random effects specification in gamlss in R

I would like to use the gamlss package to fit a model, because it offers more distributions. However, I am struggling to specify my random effects correctly, or at least I think there is a mistake: the output of an lmer model with a Gaussian distribution differs from the output of a gamlss model with a Gaussian distribution. When I compare an lm model without random effects against a gamlss model with a Gaussian distribution and no random effects, the output is similar.
Unfortunately, I cannot share my data to reproduce this.
Here is my code:
df <- subset.data.frame(GFW_food_agg, GFW_food_agg$fourC_area_perc < 200, select = c("ISO3", "Year", "Forest_loss_annual_perc_boxcox", "fourC_area_perc", "Pop_Dens_km2", "Pop_Growth_perc", "GDP_Capita_current_USD", "GDP_Capita_growth_perc",
"GDP_AgrForFis_percGDP", "Gini_2008_2018", "Arable_land_perc", "Forest_loss_annual_perc_previous_year", "Forest_extent_2000_perc"))
fourC <- lmer(Forest_loss_annual_perc_boxcox ~ fourC_area_perc + Pop_Dens_km2 + Pop_Growth_perc + GDP_Capita_current_USD +
GDP_Capita_growth_perc + GDP_AgrForFis_percGDP + Gini_2008_2018 + Arable_land_perc + Forest_extent_2000_perc + (1|ISO3) + (1|Year),
data = df)
summary(fourC)
resid_panel(fourC)
df <- subset.data.frame(GFW_food_agg, GFW_food_agg$fourC_area_perc < 200, select = c("ISO3", "Year", "Forest_loss_annual_perc_boxcox", "fourC_area_perc", "Pop_Dens_km2", "Pop_Growth_perc", "GDP_Capita_current_USD", "GDP_Capita_growth_perc",
"GDP_AgrForFis_percGDP", "Gini_2008_2018", "Arable_land_perc", "Forest_loss_annual_perc_previous_year", "Forest_extent_2000_perc"))
df <- na.omit(df)
df$ISO3 <- as.factor(df$ISO3)
df$Year <- as.factor(df$Year)
fourC <- gamlss(Forest_loss_annual_perc_boxcox ~ fourC_area_perc + Pop_Dens_km2 + Pop_Growth_perc + GDP_Capita_current_USD +
GDP_Capita_growth_perc + GDP_AgrForFis_percGDP + Gini_2008_2018 + Arable_land_perc + Forest_extent_2000_perc + random(ISO3) + random(Year),
data = df, family = NO, control = gamlss.control(n.cyc = 200))
summary(fourC)
plot(fourC)
How do the random effects need to be specified in gamlss to be similar to the random effects in lmer?
If I specify the random effects instead using
re(random = ~1|ISO3) + re(random = ~1|Year)
I get the following error:
Error in model.frame.default(formula = Forest_loss_annual_perc_boxcox ~ :
variable lengths differ (found for 're(random = ~1 | ISO3)')
I found the +re(random=~1|x) specification to work fairly well with my GAMLSS. Have you double-checked that the NAs are being removed from your dataset? Sometimes na.omit does not work properly.
Have a look at this thread, which has the same error as yours but in a GAM. You can try that code to remove your NAs:
Error in model.frame.default: variable lengths differ
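The NA check suggested above can be sketched in base R. The data frame, values and the name `model_vars` are made up for illustration; only the column names ISO3 and Year are borrowed from the question.

```r
# Toy data standing in for the real data set (hypothetical values).
toy <- data.frame(
  y    = c(1.2, NA, 0.7, 2.1, 1.5, 0.9),
  x    = c(10, 12, NA, 9, 11, 13),
  ISO3 = c("AAA", "AAA", "BBB", "BBB", "CCC", NA),
  Year = c(2001, 2002, 2001, 2002, 2001, 2002)
)

# Every variable that appears anywhere in the model, random effects included.
model_vars <- c("y", "x", "ISO3", "Year")

# Count missing values per column before fitting anything.
na_counts <- colSums(is.na(toy[model_vars]))
print(na_counts)

# Keep only rows that are complete on the model variables,
# then convert the grouping variables to factors.
clean <- toy[complete.cases(toy[model_vars]), ]
clean$ISO3 <- factor(clean$ISO3)
clean$Year <- factor(clean$Year)

# Fail early if any NA survived the filtering.
stopifnot(!anyNA(clean))
```

Fitting on a data frame that has been checked like this rules out missing values as the source of the "variable lengths differ" error; if the error persists, the cause lies elsewhere.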

Syntax for glmer function for use with glmulti?

Using glmer, I can run a logistic regression mixed model just fine. But when I try to do the same using glmulti, I get errors (described below). I think the problem is with the function I am specifying for use in glmulti. I want a function that specifies a logistic regression model for data containing continuous fixed covariates and categorical random effects, using a logit link. The response variable is a binary 0/1.
Sample data:
library(lme4)
library(rJava)
library(glmulti)
set.seed(666)
x1 = rnorm(1000) # some continuous variables
x2 = rnorm(1000)
x3 = rnorm(1000)
r1 = rep(c("red", "blue"), times = 500) #categorical random effects
r2 = rep(c("big", "small"), times = 500)
z = 1 + 2*x1 + 3*x2 +2*x3
pr = 1/(1+exp(-z))
y = rbinom(1000,1,pr) # bernoulli response variable
df = data.frame(y=y,x1=x1,x2=x2, x3=x3, r1=r1, r2=r2)
A single glmer logistic regression works just fine:
model1<-glmer(y~x1+x2+x3+(1|r1)+(1|r2),data=df,family="binomial")
But errors occur when I try to use the same model structure through glmulti:
# create a function - I think this is where my problem is
glmer.glmulti<-function(formula, data, family=binomial(link ="logit"), random="", ...){
glmer(paste(deparse(formula),random),data=data,...)
}
# run glmulti models
glmulti.logregmixed <-
glmulti(formula(glmer(y~x1+x2+x3+(1|r1)+(1|r2), data=df), fixed.only=TRUE), #error w/o fixed.only=TRUE
data=df,
level = 2,
method = "g",
crit = "aicc",
confsetsize = 128,
plotty = F, report = F,
fitfunc = glmer.glmulti,
family = binomial(link ="logit"),
random="+(1|r1)","+(1|r2)", # possibly this line is incorrect?
intercept=TRUE)
#Errors returned:
singular fit
Error in glmulti(formula(glmer(y ~ x1 + x2 + x3 + (1 | r1) + (1 | r2), :
Improper call of glmulti.
In addition: Warning message:
In glmer(y ~ x1 + x2 + x3 + (1 | r1) + (1 | r2), data = df) :
calling glmer() with family=gaussian (identity link) as a shortcut to lmer() is deprecated; please call lmer() directly
I've tried various changes to the function, and to the formula and fitfunc portions of the glmulti code. I've tried substituting lmer for glmer, and I guess I don't understand the error. I'm also afraid that calling lmer may change the model structure, as during one of my attempts the summary() of the model stated "Linear mixed model fit by REML ['lmerMod']." I need the glmulti models to be the same as what I'm obtaining with model1 using glmer (i.e., summary(model1) gives "Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']").
Many similar questions remain unanswered. Thanks in advance!
Credit:
sample data set created with help from here:
https://stats.stackexchange.com/questions/46523/how-to-simulate-artificial-data-for-logistic-regression
glmulti code adapted from here:
Model selection using glmulti
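Reading the code above, two things stand out. The wrapper accepts `family` but never forwards it to glmer, and `random="+(1|r1)","+(1|r2)"` passes the second string as a separate unnamed argument rather than as part of `random`. Whether either of these is what triggers "Improper call of glmulti" is a guess on my part. The sketch below only shows a wrapper that forwards `family` and takes all random terms as one string. The names `add_random` and `glmer_wrapper` and the toy data are hypothetical, the wrapper is called directly, and glmulti itself is not run here.

```r
# Hypothetical helper: append a random-effects string to a fixed-effects formula.
add_random <- function(formula, random = "") {
  # collapse = " " guards against deparse() splitting a long formula into pieces
  as.formula(paste(paste(deparse(formula), collapse = " "), random))
}

# Hypothetical wrapper; forwards `family` explicitly so the fit stays a GLMM.
glmer_wrapper <- function(formula, data,
                          family = binomial(link = "logit"),
                          random = "", ...) {
  lme4::glmer(add_random(formula, random), data = data, family = family, ...)
}

# Toy data with a 10-level grouping factor (made up for this sketch).
set.seed(1)
n <- 400
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  g  = factor(sample(letters[1:10], n, replace = TRUE)))
u <- rnorm(nlevels(toy$g), sd = 0.8)  # simulated random intercepts
toy$y <- rbinom(n, 1, plogis(0.5 + toy$x1 - 0.5 * toy$x2 + u[as.integer(toy$g)]))

# Several random terms belong in ONE string, e.g. "+ (1 | r1) + (1 | r2)".
full_formula <- add_random(y ~ x1 + x2, "+ (1 | g)")
print(full_formula)

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- glmer_wrapper(y ~ x1 + x2, data = toy, random = "+ (1 | g)")
  print(class(fit))
}
```

Separately, r1 and r2 in the sample data have only two levels each and alternate in lockstep, so singular-fit messages would not be surprising there regardless of how the wrapper is written.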

Multilevel moderated mediation with continuous variables

I am a beginner in R, so please forgive me if my question reflects insufficient background.
I am trying to run a moderated mediation model using the mediation and lme4 libraries.
All of my variables are continuous. My data have a nested structure with individuals nested in branches (Branch).
In the model I'm trying to test, my predictor/independent variable (abranch) is at the branch level. My mediator (bmed) and outcome (cout) are at the individual level. And the effect of the mediator is moderated by another individual level variable (dmod). So in my model I have abranch predicting bmed, and bmed*dmod are predicting cout.
This is the syntax I've used:
med.fit <- glmer(
bmed ~ abranch + (1|Branch),
family = binomial(link = "logit"),
data = Dataset
)
out.fit <- glmer(
cout ~ dmod*bmed + (1+bmed|Branch),
family = binomial(link = "logit"),
data = Dataset
)
I was then thinking of using:
med.out <- mediate(med.fit, out.fit, treat = "abranch", mediator = "bmed",
sims = 100)
summary(med.out)
But even before getting to the last two lines, I get the following error:
Error in eval(family$initialize, rho) : y values must be 0 <= y <= 1
I now realize that this is because I'm using the "binomial"/logit family whereas my DV is continuous and not between 0 and 1. What can I do, given the nature of my variables?
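Following on from that realisation, one natural swap for continuous variables is a Gaussian mixed model via lmer, which has no family argument and hence no 0/1 restriction on the response. This is a sketch on simulated data, not a tested recipe: all numbers are made up, only the variable names come from the question, and adding abranch to the outcome model is my assumption (as I understand it, mediate() expects the treatment variable in both models).

```r
set.seed(42)
n_branch <- 30
n_per <- 20
Dataset <- data.frame(Branch = factor(rep(seq_len(n_branch), each = n_per)))
branch_idx <- as.integer(Dataset$Branch)

# Branch-level predictor: one value per branch, repeated for its members.
Dataset$abranch <- rnorm(n_branch)[branch_idx]
u0 <- rnorm(n_branch, sd = 0.5)[branch_idx]  # simulated branch intercepts
u1 <- rnorm(n_branch, sd = 0.3)[branch_idx]  # simulated branch slopes for bmed

# Individual-level moderator, mediator and outcome (all continuous).
Dataset$dmod <- rnorm(nrow(Dataset))
Dataset$bmed <- 0.6 * Dataset$abranch + u0 + rnorm(nrow(Dataset))
Dataset$cout <- (0.4 + u1) * Dataset$bmed + 0.3 * Dataset$dmod +
  0.2 * Dataset$bmed * Dataset$dmod + u0 + rnorm(nrow(Dataset))

if (requireNamespace("lme4", quietly = TRUE)) {
  # Mediator model: branch-level predictor, random intercept per branch.
  med.fit <- lme4::lmer(bmed ~ abranch + (1 | Branch), data = Dataset)
  # Outcome model: moderated mediator effect, random slope for bmed.
  out.fit <- lme4::lmer(cout ~ dmod * bmed + abranch + (1 + bmed | Branch),
                        data = Dataset)
  print(lme4::fixef(out.fit))
}
```

The mediation package lists merMod objects among the model classes mediate() accepts, so the mediate() call from the question could then be tried on these two fits; I have not run that step here.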

Using CARET together with GAM ("gamSpline" method) in R Poisson Regression

I am trying to use the caret package to tune the 'df' parameter of a gam model for my cohort analysis.
With the following data:
cohort = 1:60
age = 1:26
grid = data.frame(expand.grid(age = age, cohort = cohort))
size = data.frame(cohort = cohort, N = sample(100:150,length(cohort), replace = TRUE))
df = merge(grid, size, by = "cohort")
log_k = -3 + log(df$N) - 0.5*log(df$age) + df$cohort*(df$cohort-30)*(df$cohort-50)/20000 + runif(nrow(df),min = 0, max = 0.5)
df$conversion = rpois(nrow(df),exp(log_k))
Explanation of the data: cohort number is the time of arrival of the potential customers. N is the number of potential customers that arrived at that time. Conversion is the number of those potential customers that 'converted' (bought something). Age is the age (time spent since arrival) of the cohort when the conversion took place. For a given cohort there are fewer conversions as age grows. This effect follows a power law.
But the total conversion rate of each cohort can also change slowly in time (cohort number). Thus I want a smoothing spline of the time variable in my model.
I can fit a gam model from package gam
library(gam)
fit = gam(conversion ~ log(N) + log(age) + s(cohort, df = 4), data = df, family = poisson)
fit
> Call:
> gam(formula = conversion ~ log(N) + log(age) + s(cohort, df = 4),
> family = poisson, data = df)
> Degrees of Freedom: 1559 total; 1553 Residual
> Residual Deviance: 1869.943
But if I try to train the model using the caret package
library(caret)
fitControl = trainControl(verboseIter = TRUE)
fit.crt = train(conversion ~ log(N) + log(age) + s(cohort,df),
data = df, method = "gamSpline",
trControl = fitControl, tune.length = 3, family = poisson)
I get this error :
+ Resample01: df=1
model fit failed for Resample01: df=1 Error in as.matrix(x) : object 'N' not found
- Resample01: df=1
+ Resample01: df=2
model fit failed for Resample01: df=2 Error in as.matrix(x) : object 'N' not found .....
Does anyone know what I'm doing wrong here?
Thanks
There are two things wrong with your code.
1. The train function can be a bit tedious depending on the method you use (as you have noticed). In the case of method = "gamSpline", the train function adds a smooth term to every independent term in the formula. So it converts your variables to s(log(N), df), s(log(age), df) and s(s(cohort, df), df).
Wait, s(s(cohort, df), df) does not really make sense. So you must change s(cohort, df) to cohort.
2. I am not sure why, but train with method = "gamSpline" does not like it when you put functions (e.g. log) in the formula. I think this is because this method already applies s() to your variables. You can solve this by applying the log to your variables beforehand, e.g. df$N <- log(df$N), or logN <- log(df$N) and then using logN as the variable. And of course, do the same for age.
My guess, based on the code you provided, is that you don't want this method to apply a smoothing term to all your independent variables. I am not sure whether that is possible, or how to do it if it is.
Hope this helps.
EDIT: If you want a more elegant solution than the one I provided in point 2, make sure to read the comment by @topepo. If I understand it correctly, that suggestion also lets you apply s() to just the variables you want.
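Here is a sketch of the two fixes described in this answer, reusing the data generation from the question. The seed, the column names logN and logAge, and the 5-fold cross-validation setting are my own choices; `tuneLength` is the argument name as I know it from train (the question wrote `tune.length`). The train call only runs if both caret and gam are installed.

```r
set.seed(123)
cohort <- 1:60
age <- 1:26
grid <- expand.grid(age = age, cohort = cohort)
size <- data.frame(cohort = cohort,
                   N = sample(100:150, length(cohort), replace = TRUE))
df <- merge(grid, size, by = "cohort")
log_k <- -3 + log(df$N) - 0.5 * log(df$age) +
  df$cohort * (df$cohort - 30) * (df$cohort - 50) / 20000 +
  runif(nrow(df), min = 0, max = 0.5)
df$conversion <- rpois(nrow(df), exp(log_k))

# Point 2: do the transformations up front, as plain columns.
df$logN <- log(df$N)
df$logAge <- log(df$age)

# Point 1: plain variable names only; no s() and no log() in the formula,
# because method = "gamSpline" wraps the predictors in s() itself.
caret_formula <- conversion ~ logN + logAge + cohort

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("gam", quietly = TRUE)) {
  library(caret)
  library(gam)
  fit.crt <- train(caret_formula, data = df, method = "gamSpline",
                   trControl = trainControl(method = "cv", number = 5),
                   tuneLength = 3, family = poisson)
  print(fit.crt$results)
}
```

Note that this smooths logN and logAge as well as cohort, which is the limitation mentioned above.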
