Linear mixed effects model with heteroscedastic structure for the errors - r

In a randomized clinical trial in the ophthalmology field, I would like to find the R implementation for a linear mixed effects model given by
log(y_ijk) = b0 + b1_j + b2_k + b3_jk + w_i + e_ijk
where y_ijk is the amount of residual antibiotic, b1_j is the effect of structure (j=1 for cornea, j=2 for aqueous humor), b2_k are the treatment effects (k=1,2,3), b3_jk are interaction effects, w_i are random effects and e_ijk are random errors. We have paired observations for each individual (the residual antibiotic is measured in both structures for each subject).
My main difficulty is that the model should incorporate a heteroscedastic structure for the errors, so that the variances differ across treatment groups. That is,
e_ijk ~ Normal(0, sigma_k^2), k = 1, 2, 3
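A sketch of one way to fit this in R with the nlme package, assuming a long-format data frame df with hypothetical columns logy, structure, treatment and id (none of these names come from the question); varIdent gives each treatment group its own residual variance:

```r
library(nlme)

# df is a hypothetical long-format data frame with one row per measurement:
#   logy      log of the residual antibiotic amount
#   structure factor: cornea vs. aqueous humor
#   treatment factor with levels 1, 2, 3
#   id        subject identifier
fit <- lme(logy ~ structure * treatment,              # fixed effects b1, b2, b3
           random = ~ 1 | id,                         # random intercept w_i per subject
           weights = varIdent(form = ~ 1 | treatment),# separate sigma_k per treatment
           data = df)
summary(fit)
```

The varIdent weights argument estimates a separate residual standard deviation for each treatment level, which corresponds to the e_ijk ~ Normal(0, sigma_k^2) structure above.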
Thanks


How to specify icc_pre_subject and var_ratio in study_parameters function (powerlmm package)?

I am trying to conduct a power analysis for studies where I use a linear mixed model for the analysis. I conducted a pilot study to estimate the fixed effects and the random-effect variances, which are required as inputs to the R function study_parameters().
First, I built an lmer model using the data from the pilot study. The reaction time for the stimuli is the dependent variable; the experimental condition (with 2 levels), the number of trials (0 to 159, coded as numeric), and their interaction are the fixed effects. The experimental condition is a between-subject factor, but the number of trials is a within-subject factor: all participants go through trials 0 to 159. For random effects, I set a random intercept and slope for participants, and a random intercept for the beauty rating of each item (as a control factor). Together, the model looks like:
lmer(RT ~ Condition * NTrial + (1 + NTrial | Subject) + (1 | BeautyRating))
For the power analysis I want to use the function study_parameters() in the powerlmm package. In this function, we have to specify icc_pre_subject and var_ratio as the parameters carrying the random-effect variance information. What I want to do here is set these parameters based on the results of the pilot study.
From the tutorial, the two variables are defined as follows:
icc_pre_subject: the amount of the total baseline variance that is between-subjects (the sentence in the tutorial contains a typo). icc_pre_subject would be the 2-level ICC if there were no random slopes.
icc_pre_subject = var(subject_intercepts)/(var(subject_intercepts) + var(within-subject_error))
var_ratio: the ratio of total random slope variance over the level-1 residual variance.
var_ratio = var(subject_slopes)/var(within-subject_error)
Here, I am not sure what var(within-subject_error) means, and how to specify it.
Here are the random-effects estimates from the model fitted to the pilot-study data.
My question
Which numbers should I use to specify icc_pre_subject and var_ratio in study_parameters()?
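As a sketch (the model and variable names RT, Condition, NTrial, Subject, BeautyRating and the data frame pilot are hypothetical stand-ins for the pilot model), both quantities can be computed from the lmer variance components via VarCorr():

```r
library(lme4)

# Hypothetical pilot-study model mirroring the one described above
m <- lmer(RT ~ Condition * NTrial + (1 + NTrial | Subject) + (1 | BeautyRating),
          data = pilot)

vc <- as.data.frame(VarCorr(m))  # columns: grp, var1, var2, vcov, sdcor
v_int   <- vc$vcov[vc$grp == "Subject" & vc$var1 == "(Intercept)" & is.na(vc$var2)]
v_slope <- vc$vcov[vc$grp == "Subject" & vc$var1 == "NTrial" & is.na(vc$var2)]
v_resid <- vc$vcov[vc$grp == "Residual"]

icc_pre_subject <- v_int / (v_int + v_resid)  # between-subject share of baseline variance
var_ratio       <- v_slope / v_resid          # slope variance over level-1 residual variance
```

On this reading, var(within-subject_error) is the 'Residual' row of VarCorr(), i.e. the level-1 residual variance.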

Interpreting Interaction Coefficients within Multiple Linear Regression Model

I am struggling with the interpretation of the coefficients within interaction models.
I am looking at the output of an interaction model with 2 binary (dummy) variables. I was just wondering how I interpret the:
- Intercept (is everything at 0)?
- The slope coefficients?
- The interaction coefficients?
In standard multiple linear regression, we talk about the change in y for a 1-unit change in x, holding everything else constant. How do we interpret this with interactions, especially since both my variables are dummies?
Hope this makes sense and thanks very much in advance.
How do we interpret this in interactions?
The meaning of the regression coefficients in models with interaction terms is not the same as in linear regression without interaction, precisely because of the added interaction term(s).
The regression coefficients no longer indicate the change in the mean response for a unit increase in one predictor with the other predictor held constant at any given level. That interpretation is only valid after accounting for the level of the other predictor.
Ex:
A linear regression model with an interaction term:
E(Y) = B0 + B1X1 + B2X2 + B3X1X2
Interpretation:
It can be shown that the change in the mean response with a unit increase in X1 when X2 is held constant is:
B1 + B3X2
And, the change in the mean response with a unit increase in X2 when X1 is held
constant is:
B2 + B3X1
I was just wondering how I interpret the: - Intercept (is everything at 0)?
The intercept is the prediction from the regression model when all the predictors are at level zero.
The slope coefficients?
Consider first the model with no interaction term:
E(Y) = B0 + B1X1 + B2X2
The coefficients B1 and B2 indicate, respectively, how much higher (or lower) the response function is for X1 = 1 or for X2 = 1 than when both dummies are zero.
Thus, B1 and B2 measure the differential effects of the dummy variables on the height of the response function, i.e. E(Y).
You can verify that only the height of the response changes:
When X1 = 1 and X2 = 0,
E(Y) = B0 + B1
and when X1 = 0 and X2 = 1,
E(Y) = B0 + B2
The interaction coefficients?
By interaction coefficients, I understand the regression coefficients of the model with interaction.
The model:
E(Y) = B0 + B1X1 + B2X2 + B3X1X2
When both X1 and X2 are 1, the model becomes:
E(Y) = B0 + B1 + B2 + B3,
so B3 is the additional increase or decrease in the height of the response function beyond the sum of the two separate dummy effects.
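As a toy numerical illustration (the coefficient values are invented for the example), here are the four cell means implied by the interaction model:

```r
# Invented coefficients for E(Y) = B0 + B1*X1 + B2*X2 + B3*X1*X2
B0 <- 10; B1 <- 2; B2 <- 3; B3 <- -1

grid <- expand.grid(X1 = 0:1, X2 = 0:1)  # the four dummy combinations
grid$EY <- B0 + B1 * grid$X1 + B2 * grid$X2 + B3 * grid$X1 * grid$X2
grid
#   X1 X2 EY
# 1  0  0 10
# 2  1  0 12
# 3  0  1 13
# 4  1  1 14

# The interaction B3 is the "difference in differences":
(14 - 13) - (12 - 10)  # = -1 = B3
```

The effect of X1 is 2 when X2 = 0 but only 1 when X2 = 1; that gap is exactly B3.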
You can create a more interesting example with a third, continuous predictor and explore its interaction with the dummies. In that case the slope of the regression line, not only the intercept, changes across the levels of the dummies. The interpretation that one response function is a fixed amount higher (or lower) than another at any given level of X1 and X2 would then no longer be valid, since the slope also changes, making the effect of the dummy predictor even more evident.
When interaction effects are present, the effect of the qualitative predictor (dummy variable) can be studied by comparing the regression functions within the scope of the model for the different classes of the dummy variable.
Reference: Kutner et al., Applied Linear Statistical Models.

Fitting GLM (family = inverse.gaussian) on simulated AR(1)-data.

I am encountering quite an annoying and, to me, incomprehensible problem, and I hope some of you can help me. I am trying to estimate the autoregression (the influence of previous measurements of a variable X on its current measurement) for 4 groups that have positively skewed distributions to various degrees. The theory is that more positively skewed distributions have less variance, and since the relationship between 2 variables depends on the amount of shared variance, positively skewed distributions have a smaller autoregression than more normally distributed variables.
I use simulations to investigate this, and generate data as follows: I simulate data for n people with tp time points each. I use a fixed autoregressive parameter, phi (.3, so we have a stationary process). To generate positively skewed distributions I use a chi-square distributed error. Individuals differ in the degrees of freedom used for the chi2-distributed errors; in other words, degrees of freedom is a level-2 variable (and is itself chi2(1)-distributed). Individuals with a very low df get a very skewed distribution, whereas individuals with a higher df get a more normal distribution.
set.seed(1)                               # for reproducibility
n <- 100; tp <- 50; burn <- 20            # example values: persons, time points, burn-in
phi <- rep(0.3, n)                        # fixed autoregressive parameter
df <- rchisq(n, df = 1)                   # level-2 degrees of freedom, chi2(1)-distributed
chi <- matrix(NA, n, tp + burn)           # outcome matrix
for (i in 1:n) {                          # Loop over persons.
  chi[i, 1] <- rchisq(1, df[i])           # Set initial value.
  for (t in 2:(tp + burn)) {              # Loop over time points.
    chi[i, t] <- phi[i] * chi[i, t - 1] + # Autoregressive effect.
      rchisq(1, df[i])                    # Chi-square distributed error.
  }                                       # End loop over time points.
}                                         # End loop over persons.
Now that I have the outcome variable generated, I put it in long format, I create a lagged predictor, and I person mean center the predictor (or group mean center, or cluster mean center, all the same). I call this lagged and centered predictor chi.pred. I make the subgroups based on the degrees of freedom of individuals. The 25% with a lowest df goes in subgroup 1, 26% - 50% in subgroup 2, etc.
The problem is this: fitting a multilevel (i.e. mixed or random effects) autoregressive(1) model with family = inverse.gaussian and link = 'identity', using glmer() from the lme4 package, gives me quite a lot of warnings, e.g. "degenerate Hessian", "very large eigenvalue ratio", "failed to converge with max|grad", etc. I just don't get why.
The models I fit are
# Random intercept, but fixed slope, with subgroups as a level-2 predictor of the slope.
glmer(chi ~ chi.pred + chi.pred:factor(sub.df.noise) + (1 | id), data = sim.data,
      family = inverse.gaussian(link = "identity"),
      control = glmerControl(optimizer = 'bobyqa'))
# Random intercept and slope.
glmer(chi ~ chi.pred + (1 + chi.pred | id), data = sim.data,
      family = inverse.gaussian(link = "identity"),
      control = glmerControl(optimizer = 'bobyqa'))
The reason I use the inverse Gaussian family is that it is said to work better on skewed data.
Does anybody have any clue why I can't fit the models? I have tried increasing the sample size and the number of time points, trying different optimizers, increasing the number of iterations, and adding some noise to the subgroups (since otherwise they map one-to-one onto the degrees of freedom), and I have double-checked that lagging and centering the data are correct.

Using Zero-inflation regression and Zero-inflation negative binomial regression for trend

I am using zero-inflated Poisson (zip) and zero-inflated negative binomial (zinb) regressions to detect temporal trends in count data (deaths per year over 30 years, reported at 6 hospitals) that has many zeros and overdispersion.
I have written some code using the pscl package, and my goal is to compare trends among hospitals.
Counts<- read.csv("data.csv", header = T)
Years= Counts$X
Ho1= Counts$Ho1
Ho2= Counts$Ho2
Ho3= Counts$Ho3
...
require(pscl)
zip1 <- zeroinfl(Ho1 ~ Years, dist = "poisson")
zinb4 <- zeroinfl(Ho4 ~ Years, dist = "negbin")
But when I plot some of the data it shows slightly increasing trends, whereas the zip and zinb fits show negative trends.
Here is an example:
zip result:
zip1
Call:
zeroinfl(formula = Ho1 ~ Years, dist = "poisson")
Count model coefficients (poisson with log link):
(Intercept) Years
-4.836815 0.002837
Zero-inflation model coefficients (binomial with logit link):
(Intercept) Years
467.2323 -0.2353
For this model the zero-inflation trend (slope) is -0.2353, whereas ordinary least squares (OLS) gives a trend of 0.043.
My understanding is that the zip and OLS estimates should differ only slightly, so I was thinking maybe my code is not correct or I am missing something.
I would appreciate any thoughts and suggestions.
With increasing Years, the count component predicts increasing counts (slope 0.002837 on the log scale) and the zero-inflation component predicts decreasing zero inflation (slope -0.2353 on the logit scale), i.e. higher responses and fewer zeros in both cases. Thus, the effects in both components of the model are in sync and conform with your OLS results: the negative coefficient belongs to the zero-inflation part, not to the trend in the counts.
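One way to check this is to plot the overall expected response, which combines the count and zero-inflation components; a sketch, assuming Years and Ho1 as defined in the question:

```r
require(pscl)

zip1 <- zeroinfl(Ho1 ~ Years, dist = "poisson")

# type = "response" gives E[Y] = (1 - pi) * mu, i.e. the overall expected count
plot(Years, Ho1, pch = 16, xlab = "Year", ylab = "Deaths")
lines(Years, predict(zip1, type = "response"))
```

If the fitted line increases with Years, the model agrees with the OLS trend despite the negative zero-inflation coefficient.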

comparing coefficients between lmList and lmer

Can anyone tell me why the slope coefficients deviate between those extracted from an lmer model with a random slope and those from an lmList model fitted to the same dataset?
Thanks...
After some digging I found the answer in Doug Bates' book on lme4. Paraphrasing: when the individual linear fit at the subject level is poor, the mixed-effects coefficients exhibit what is called "shrinkage" (see http://lme4.r-forge.r-project.org/lMMwR/lrgprt.pdf) towards the population-level value (i.e. the fixed effect). In this case the uncertainty in the subject-level coefficient is large (our confidence in its precise value is low). To balance fidelity to the data, measured by the residual sum of squares, against simplicity of the model, the mixed-effects model smooths out the between-subject differences by pulling the per-subject predictions towards a common set of predictions, but not at the expense of dramatically increasing the sum of squared residuals.
Note that the "shrinkage" might be a good thing assuming some degree of similarity among your subjects (or observational units), for example if you assume they are drawn from the same population, because it makes the model more robust to outliers at the individual level.
You can quantify the increase in the sum of squared residuals by computing an overall coefficient of determination for the mixed-effects model and the within-subject fits. I am doing it here for the sleepstudy dataset contained in the lme4 package.
> library(lme4)
> mm <- lmer(Reaction ~ Days + (Days|Subject), data = sleepstudy) # mixed-effects
> ws <- lmList(Reaction ~ Days |Subject, data = sleepstudy) # within-subject
>
> # coefficient of determination for mixed-effects model
> summary(lm(sleepstudy$Reaction ~ predict(mm)))$r.squared
[1] 0.8271702
>
> # coefficient of determination for within subjects fit
> require(nlme)
> summary(lm(sleepstudy$Reaction ~ predict(ws)))$r.squared
[1] 0.8339452
You can check that the decrease in the proportion of variability explained by the mixed-effects model with respect to the within-subject fits is quite small: 0.8339452 - 0.8271702 = 0.0067750.
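To visualize the shrinkage directly, you can compare the per-subject slope estimates from the two fits; a sketch using the same sleepstudy example:

```r
library(lme4)

mm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)  # mixed-effects
ws <- lmList(Reaction ~ Days | Subject, data = sleepstudy)         # within-subject

# Per-subject slope estimates: unpooled (lmList) vs. shrunken (lmer)
slopes <- data.frame(lmList = coef(ws)[["Days"]],
                     lmer   = coef(mm)$Subject[["Days"]])
plot(slopes$lmList, slopes$lmer, xlab = "lmList slope", ylab = "lmer slope")
abline(0, 1, lty = 2)  # points off this line have been pulled towards the mean slope
```

Subjects whose unpooled slopes are far from the average are pulled most strongly towards it, which is the shrinkage described above.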
