R: mixed model with time series data, repeated measurement of predictors

I am trying to run a mixed-effects model that uses time as a fixed effect plus some covariates.
The repeated measures of the mean depression score (=depr) are the dependent variable; we measured the outcome at 6 time points.
library(nlme)
lme(depr ~ time, random = ~ 1 | ID, data = longirisk, method = "ML")
Now I am interested in whether the values of a third variable (here: stress) contribute to the mean depression scores. Maybe I could add an interaction term (time * stress), or I could add stress as a fixed effect. However, I am not sure how to handle the repeatedly measured stress values.
Any ideas on how to solve this problem?
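For illustration, here is a minimal sketch of the two options mentioned above. It assumes longirisk is in long format, so the repeatedly measured stress is simply another column with one value per person per time point (variable names as in the question):
library(nlme)
# Option 1: stress as an additional time-varying fixed effect
m1 <- lme(depr ~ time + stress, random = ~ 1 | ID, data = longirisk, method = "ML")
# Option 2: interaction, letting the association with stress change over time
m2 <- lme(depr ~ time * stress, random = ~ 1 | ID, data = longirisk, method = "ML")
anova(m1, m2)  # likelihood-ratio comparison; both models fitted with ML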

Related

GLS / GLM nested design with autocorrelation over time

Still fairly new to GLMs and a bit confused about how to set up my model.
About my project:
I sampled the microbiome (and measured a diversity index value = Shannon) from the root system of a sample of 9 trees (=tree1_cat).
In each tree I sampled fine and thick roots (=rootpart), and each tree was sampled four times (=days) over the course of one season. Thus I have a nested design but have to keep time in mind because of autocorrelation. Also, not all values are present, so I have a few missing values. So far I have tried and tested the following:
library(nlme)
Model <- gls(Shannon ~ tree1_cat/rootpart + tree1_cat + days,
             na.action = na.omit, data = psL.meta,
             correlation = corAR1(form = ~ 1 | days),
             weights = varIdent(form = ~ 1 | days))
Furthermore, to get more insight, I used anova(Model) to get the p-values of those factors. Am I allowed to use those p-values? I have also used emmeans(Model, specs = pairwise ~ rootpart) for pairwise comparisons, but since rootpart was entered as a nested factor it only gives me the paired interactions.
It all works, but I am not sure whether this is the right model! Any help would be highly appreciated!
It would be helpful to know your scientific question, but let's suppose you're interested in differences in Shannon diversity between fine and thick roots and in time trends. A model you could use would be:
library(lmerTest)
lmer(Shannon ~ rootpart*days + (rootpart*days|tree1_cat), data = ...)
The fixed-effect component rootpart*days can be expanded into 1 + rootpart + days + rootpart:days (where 1 signifies the intercept):
intercept: Shannon diversity (SD) in fine roots on day 0 (hopefully that's the beginning of the season)
rootpart: difference between fine and thick roots on day 0
days: change per day in SD in fine roots (slope)
rootpart:days: difference in slope between thick roots and fine roots
The random-effect component (rootpart*days|tree1_cat) measures how all four of these effects vary across trees, and their correlations (e.g. do trees with a larger-than-average difference between fine and thick roots on day 0 also have a larger-than-average change over time in fine root SD?)
This 'maximal' random-effects model is almost certainly too complex for your data; a rough rule of thumb says you should have 10-20 data points per parameter estimated, and the fixed-effect model alone takes 4 parameters. A full model with 4 random effects requires estimating a 4×4 covariance matrix, which has (4*5)/2 = 10 parameters all by itself. I might just try (1 + days | tree1_cat) (random slopes) or (rootpart | tree1_cat) (among-tree variation in the fine vs. thick difference), with a bias towards allowing for variation in the effect that is your primary interest (e.g. if your primary question is about fine vs. thick, go with (rootpart | tree1_cat)); see the sketch below.
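As a rough sketch, the two reduced models would look like this (assuming the same data frame as in the question, psL.meta):
library(lmerTest)
# random intercept plus a random slope for days within each tree
m_slope <- lmer(Shannon ~ rootpart * days + (1 + days | tree1_cat), data = psL.meta)
# among-tree variation in the fine vs. thick difference
m_part <- lmer(Shannon ~ rootpart * days + (rootpart | tree1_cat), data = psL.meta)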
I probably wouldn't worry about autocorrelation at all, nor about heteroscedasticity by day (your varIdent(~ 1 | days) term), unless those patterns are very strongly evident in the data.
If you want to allow for autocorrelation, you'll need to fit the model with nlme::lme or glmmTMB (lmer still doesn't have machinery for autocorrelation models); something like:
library(nlme)
lme(Shannon ~ rootpart * days,
    random = ~ days | tree1_cat,
    data = ...,
    correlation = corCAR1(form = ~ days | tree1_cat))
You need to use corCAR1 (continuous-time autoregressive order-1) rather than the more common corAR1 for unevenly sampled data. Be aware that lme is more finicky/worse at dealing with singular models, so you may discover you need to simplify your model before you can actually get it to fit.
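For the glmmTMB route, the continuous-time analogue of corCAR1 is the ou() (Ornstein-Uhlenbeck) covariance structure. A sketch, assuming the data frame psL.meta again; note that ou() requires the time variable to be wrapped in numFactor():
library(glmmTMB)
psL.meta$daysNum <- numFactor(psL.meta$days)  # numeric-factor coding required by ou()
glmmTMB(Shannon ~ rootpart * days +
          ou(daysNum + 0 | tree1_cat),  # continuous-time AR(1) across sampling days
        data = psL.meta)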

How to specify icc_pre_subject and var_ratio in study_parameters function (powerlmm package)?

I am trying to conduct a power analysis for studies where I use a linear mixed model for the analysis. I conducted a pilot study in order to estimate the effect sizes of the fixed effects and the variances of the random effects, which are required as inputs to an R function, study_parameters().
First, I built an lmer model using the data from the pilot study. In the model, the reaction time for the stimuli is the dependent variable, and the experimental condition (with 2 levels), the number of trials (from 0 to 159, coded as numeric values), as well as the interaction between the condition and the number of trials, are included as fixed factors. The experimental condition is a between-subject factor, but the number of trials is a within-subject factor: all participants go through the trials from 0 to 159. For the random effects, I set a random intercept and slope for participants, and a random intercept for the beauty rating of each item (as a control factor). Together, the model looks like:
lmer(ReactionTime ~ Condition * TrialNumber + (1 + TrialNumber | Subject) + (1 | BeautyRating))
For the power analysis I want to use the function study_parameters() from the powerlmm package. In this function, we have to specify icc_pre_subject and var_ratio as the parameters carrying the random-effect variance information. What I want to do here is set those parameters based on the results of the pilot study.
From the tutorial, the two variables are defined as follows:
icc_pre_subject: the amount of the total baseline variance that is between-subjects (the tutorial has a typo in this sentence). icc_pre_subject would be the 2-level ICC if there were no random slopes.
icc_pre_subject = var(subject_intercepts)/(var(subject_intercepts) + var(within-subject_error))
var_ratio: the ratio of total random slope variance over the level-1 residual variance.
var_ratio = var(subject_slopes)/var(within-subject_error)
Here, I am not sure what var(within-subject_error) means, and how to specify it.
These are the results for the random effects in the model fitted to the pilot study data.
My question
Which numbers should I use to specify icc_pre_subject and var_ratio in the study_parameters() function?
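As an illustration of how the two formulas above map onto lmer output, here is a minimal sketch. It assumes the pilot model is stored in an object called fit and uses the variable names from the model formula above; VarCorr() and sigma() are standard lme4 accessors:
library(lme4)
vc <- as.data.frame(VarCorr(fit))  # variance components of the pilot model
var_intercept <- vc$vcov[vc$grp == "Subject" & vc$var1 == "(Intercept)" & is.na(vc$var2)]
var_slope <- vc$vcov[vc$grp == "Subject" & vc$var1 == "TrialNumber" & is.na(vc$var2)]
var_residual <- sigma(fit)^2  # the within-subject (level-1) error variance
icc_pre_subject <- var_intercept / (var_intercept + var_residual)
var_ratio <- var_slope / var_residual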

How to fit a Generalized Linear Model with negative binomial distribution to ratios

I have a data set with the number of birds counted using one method (PIROP_Count) and the number of birds counted with a different method that always detects fewer birds (ECSAS_Count), per 5-minute survey. My data set also records the Weather and SeaState during each survey. I am trying to fit the ratio of ECSAS/PIROP counts with a generalized linear model with a negative binomial distribution using glm.nb(), but I am confused about how to proceed given that the response variable must be a non-negative integer. Using an offset term has been suggested, but I am unclear on the underlying statistics and the code, and I am hoping someone can clarify how I should proceed.
library(MASS)
glm_birds <- glm.nb(ECSAS_Count ~ SeaState + Weather, offset(PIROP_Count), data = df_test)
summary(glm_birds)
Is this the correct use of this function? And if so, why does it appear as weights = offset(PIROP_Count) in my summary?
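For what it's worth, a sketch of the usual fix: because data is matched by name, the unnamed offset(PIROP_Count) falls through to the next formal argument of glm.nb(), which is weights; that is exactly why the summary shows weights = offset(PIROP_Count). The offset belongs inside the formula, on the log scale, since glm.nb() uses a log link by default:
library(MASS)
# model ECSAS_Count as a rate relative to PIROP_Count via a log-scale offset
# (assumes PIROP_Count > 0 on every survey)
glm_birds <- glm.nb(ECSAS_Count ~ SeaState + Weather + offset(log(PIROP_Count)),
                    data = df_test)
summary(glm_birds)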

How to graph predicted results of a generalized linear mixed model based on categorical treatment

I fit my data with a generalized linear mixed model, with Treatment as a fixed effect and Clutch as a random effect. Here is my code:
library(lme4)
model <- glmer(cbind(Successes, Failures) ~ Treatment + (1 | Clutch),
               data = cont, family = "binomial")
My work deals with sex ratios, and I define a female as a success, and a male as a failure for each observation. I have 4 different treatments. I want to plot (preferably using ggplot) the predicted sex ratio from the model for each treatment, taking clutch into account (with 95% confidence intervals). I realize this is probably a large question, but can anyone help me with the code I would need to do this? I have been searching online for the past few days. Thanks!
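One possible approach (a sketch, not a canonical answer): emmeans() can return the model's predicted proportion of females per treatment on the response scale, with 95% confidence intervals, and the result can be passed straight to ggplot. These are population-level predictions (random clutch effects set to zero); object and column names follow the code above, and asymp.LCL/asymp.UCL are the asymptotic CI columns emmeans produces for glmer fits:
library(emmeans)
library(ggplot2)
# predicted probability that an offspring is female, per treatment, with 95% CIs
emm <- as.data.frame(emmeans(model, ~ Treatment, type = "response"))
ggplot(emm, aes(x = Treatment, y = prob)) +
  geom_pointrange(aes(ymin = asymp.LCL, ymax = asymp.UCL)) +
  labs(y = "Predicted proportion female (95% CI)")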

How to get individual coefficients and residuals in panel data using fixed effects

I have panel data including income for individuals over years, and I am interested in the income trends of individuals, i.e. individual coefficients for income over years, and residuals for each individual for each year (the unexpected changes in income according to my model). However, many observations are missing income data for one or more years, so with a linear regression I lose the majority of my observations. The data structure is like this:
caseid <- c(1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3, 4,4,4,4,4,4)
years <- c(1998,2000,2002,2004,2006,2008, 1998,2000,2002,2004,2006,2008,
           1998,2000,2002,2004,2006,2008, 1998,2000,2002,2004,2006,2008)
income <- c(1100,NA,NA,NA,NA,1300, 1500,1900,2000,NA,2200,NA,
            NA,NA,NA,NA,NA,NA, 2300,2500,2000,1800,NA,1900)
df <- data.frame(caseid, years, income)
I decided to use a random effects model, which I think will still predict income for the missing years via a maximum likelihood approach. However, since the Hausman test gives a significant result, I decided to use a fixed effects model instead, and ran the code below using the plm package:
library(plm)
inc.fe <- plm(income ~ years, data = df, model = "within", effect = "individual")
However, I get coefficients only for years and not for individuals; and I cannot get residuals.
To give an idea, the equivalent code in Stata would be
xtset caseid
xtreg income year, fe
predict resid, resid
Then I tried to run the pvcm function from the same package, which fits variable-coefficients models:
inc.wi <- pvcm(income ~ years, data = df, model = "within", effect = "individual")
However, I get the following error message:
"Error in FUN(X[[i]], ...) : insufficient number of observations".
How can I get individual coefficients and residuals with pvcm by resolving this error or by using some other function?
My original long form data has 202976 observations and 15 years.
Does the fixef function from package plm give you what you are looking for?
Continuing your example:
fixef(inc.fe)  # the individual-specific intercepts
Residuals are extracted by:
residuals(inc.fe)  # one residual per non-missing observation
You have a random effects model with random slopes and intercepts. This is also known as a random coefficients regression model. The missingness is the tricky part, which (I'm guessing) you'll have to write custom code to handle once you choose how you wish to deal with it.
But you haven't clearly/properly specified your model (at least in your question) as far as I can tell. Let's define some terms:
Let Y_it = income for individual i (i = 1,...,N) in year t (t = 1,...,T). As I read your question, you have not specified which of the two models below you wish to have:
M1: random intercepts, global slope, random slopes
Y_it ~ N(\mu_i + \beta t + \gamma_i t, \sigma^2)
\mu_i ~ N(\phi_0, \tau_0^2)
\gamma_i ~ N(\phi_1, \tau_1^2)
M2: random intercepts, random slopes
Y_it ~ N(\mu_i + \gamma_i t, \sigma^2)
\mu_i ~ N(\phi_0, \tau_0^2)
\gamma_i ~ N(\phi_1, \tau_1^2)
Also, your example data is nonsensical (see below). As you can see, you don't have enough observations to estimate all parameters. I'm not familiar with library(plm) but the above models (without missingness) can be estimated in lme4 easily. Without a realistic example dataset, I won't bother providing code.
R> table(df$caseid, is.na(df$income))

    FALSE TRUE
  1     2    4
  2     4    2
  3     0    6
  4     5    1
Given that you do have missingness, you should be able to produce estimates for either hierarchical model via the typical methods, such as EM. But I do think you'll have to write the code to do the estimation yourself.
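Purely as an illustration of what M2 looks like in lme4 syntax (a sketch using the toy data above; lme4 simply drops rows where income is NA, so this is not the custom missing-data estimation described above, and with data this sparse the fit will be singular/unreliable):
library(lme4)
# M2: random intercepts and random slopes across individuals
m2 <- lmer(income ~ years + (1 + years | caseid), data = df)
coef(m2)$caseid  # per-individual intercepts and slopes
residuals(m2)    # one residual per observed income value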
