I am analysing some whale tourism data and am trying to construct linear mixed effect models in the nlme package to see if any of my explanatory variables affect encounter time between whales and tourists. (I am also open to running this model in lme4.)
My variables are:
mins: encounter time (response variable)
Id: individual whale ID (random effect)
Vessel: vessel Id (random effect)
Sex: sex of the animal
Length: length of the animal
Year
Month (nested within Year).
So my random variables are Id and Vessel and I also have Year and Month as nested random effects.
I have come up with the following:
library(nlme)
# In lme, grouping factors given in a list are nested in the order listed
form1 <- formula(mins ~ Length + Sex + Encounter)
model1 <- lme(form1,
              random = list(Id = ~1,
                            Vessel = ~1,
                            Year = ~1,
                            Month = ~1),
              data = wsdata, method = "ML")
But all my random effects become nested within Id.
Is there any way I can define Id and Vessel as separate random effects and Year and Month as nested random effects?
In general it's much easier to specify crossed (what you mean by "separate", I think) random effects in lme4, so unless you need models for temporal or spatial autocorrelation or heteroscedasticity (which are still easier to achieve with nlme), I would go ahead with
library(lme4)
fit <- lmer(mins ~ Length + Sex + (1|Id) + (1|Vessel) +
              (1|Year/Month), data = wsdata, REML = FALSE)
A few other comments:
What is Encounter? It was in your formula but not in your description of the data set.
It seems quite likely that encounter times (the durations of encounters?) would be skewed, in which case you might want to log-transform them.
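In lme4 the nesting term (1|Year/Month) expands to (1|Year) + (1|Year:Month). The sketch below uses base R only, on a small made-up data frame (the `toy` data and its values are hypothetical, not the whale data), to show what that Year:Month grouping factor looks like compared with Month on its own.

```r
# Hypothetical toy data: two years with the same three months each
toy <- data.frame(
  Year  = factor(c(2010, 2010, 2010, 2011, 2011, 2011)),
  Month = factor(c("Jun", "Jul", "Aug", "Jun", "Jul", "Aug"))
)

# Month alone has 3 levels, so June 2010 and June 2011 would share one effect
n_month <- nlevels(toy$Month)

# The nested grouping factor gives every Year-Month combination its own level
toy$YearMonth <- interaction(toy$Year, toy$Month, drop = TRUE)
n_year_month <- nlevels(toy$YearMonth)

print(c(month_levels = n_month, year_month_levels = n_year_month))
```

With Month treated as nested in Year, each month of each year gets its own random intercept, while Id and Vessel remain crossed with everything else.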
I am trying to conduct a power analysis for studies that I will analyse with a linear mixed model. I ran a pilot study to estimate the effect sizes of the fixed effects and the random-effect variances, which are required as inputs to an R function, study_parameters().
First, I fitted an lmer model to the pilot data. The reaction time to the stimuli is the dependent variable; the experimental condition (2 levels), the trial number (0 to 159, coded as numeric) and the interaction between condition and trial number are the fixed effects. Condition is a between-subject factor, whereas trial number is a within-subject factor: every participant goes through trials 0 to 159. For the random effects, I included a random intercept and a random slope for participants, and a random intercept for the beauty rating of each item (as a control factor). Altogether, the model looks like:
lmer(Reaction_time ~ Condition * Number_of_trial + (1 + Number_of_trial | Subject) + (1 | Beautyrating))
For the power analysis I want to use the function study_parameters() in the powerlmm package. This function requires icc_pre_subject and var_ratio as the parameters describing the random-effect variances. I want to set these parameters from the results of the pilot study.
From the tutorial, the two variables are defined as follows:
icc_pre_subject: the amount of the total baseline variance the is between-subjects. (there is a typo in the sentence in the tutorial). icc_pre_subject would be the 2-level ICC if there was no random slopes.
icc_pre_subject = var(subject_intercepts)/(var(subject_intercepts) + var(within-subject_error))
var_ratio: the ratio of total random slope variance over the level-1 residual variance.
var_ratio = var(subject_slopes)/var(within-subject_error)
Here, I am not sure what var(within-subject_error) means or how to specify it.
These are the random-effects results from the model fitted to the pilot study data.
My question:
Which numbers should I use to specify icc_pre_subject and var_ratio in study_parameters()?
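As a rough sketch of the arithmetic implied by the two definitions quoted above: in an lmer fit, the within-subject (level-1) error variance is normally the "Residual" row of the random-effects table. The three numbers below are hypothetical placeholders, not the pilot results, and the sketch assumes the item-level (Beautyrating) intercept variance is left out, since neither definition mentions it.

```r
# Hypothetical variance components standing in for the "Variance" column of
# the lmer random-effects table; replace them with the values from your own fit
var_subject_intercept <- 400    # Subject (Intercept)
var_subject_slope     <- 0.5    # Subject slope for trial number
var_residual          <- 1600   # Residual, i.e. the within-subject error

# Share of baseline variance that lies between subjects
icc_pre_subject <- var_subject_intercept / (var_subject_intercept + var_residual)

# Random slope variance relative to the level-1 residual variance
var_ratio <- var_subject_slope / var_residual

print(c(icc_pre_subject = icc_pre_subject, var_ratio = var_ratio))
```

Note that the slope variance, and therefore var_ratio, depends on the unit in which trial number is measured, so the time scale used in the power analysis should match the one used in the pilot model.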
I would like to estimate a random effect for Subject within each Day. In the data set, not every Subject is observed every Day, and therefore, I should not have an estimated effect for each Subject in each Day. I want to estimate a separate variance parameter (10 total, 1 for each Day) and distribution of Subjects within each Day (independent between time steps), to evaluate the change in the among Subject variability over time. Is this possible with lmer?
library(lme4)
library(dplyr)   # provides sample_n()
data(sleepstudy)
set.seed(1)
# Resample rows with replacement to create an unbalanced data set
sleep <- sample_n(sleepstudy, size = 500, replace = TRUE)
sleep$Days <- as.factor(sleep$Days)
table(sleep$Days, sleep$Subject)
fm1 <- lmer(Reaction ~ Days + (1|Days/Subject), sleep)
summary(fm1)
ranef(fm1)
This sounds like you would benefit more from using a Latent Growth Curve model as opposed to a Mixed Model. I believe the growth function in lavaan would best suit you.
I'm doing repeated measurements ANOVA in R with libraries:
library(ordinal)
library(car)
library(RVAideMemoire)
I have two factors, month and distance, and the dependent variable is CO2:
distance   month   CO2
0 metres   july    234
I've fitted a clmm model for CO2 explained by distance, month and the interaction between month and distance:
model_CO2 = clmm(CO2.f ~ month + distance + month:distance + (1|nest),
data = field_data,
threshold = "equidistant")
The results show that both month and distance are significant, but not their interaction.
Now I want to perform a Tukey test with this information, so my idea is to perform a Tukey test for each factor separately.
My question is:
Do I have to make another model, where I separate each factor? Or can I just perform the Tukey test using the model I created but only considering one factor?
Example:
Using the initial model:
library(emmeans)
library(lsmeans)
Tmonth = lsmeans(model_CO2,
~ month)
multcomp::cld(Tmonth,
alpha = 0.05,
Letters = letters,
adjust = "tukey")
Creating a new model only for month and then performing a Tukey test:
model_CO2m = clmm(CO2.f ~ month + (1|nest),
data = field_data,
threshold = "equidistant")
Tmonth = lsmeans(model_CO2m,
~ month)
multcomp::cld(Tmonth,
alpha = 0.05,
Letters = letters,
adjust = "tukey")
Thanks in advance!
I think some people would recommend that you do. But no, you don't have to, in that the estimated marginal means that you are comparing are well-defined; the interaction effects are just averaged over.
I would recommend that you plot the estimates for the factor combinations, though -- using emmip() for example -- so that you clearly understand what the estimates are that are being averaged.
Note
I just noticed in the question that you took a factor completely out of the model. I definitely recommend against doing that. Each factor contributes a significant main effect, so they both belong in the model. If you are to consider a reduced model here, only consider the one with both main effects but no interaction.
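To illustrate keeping both factors in the model while comparing the levels of only one, here is a minimal base R sketch. It swaps in an ordinary ANOVA (aov with TukeyHSD) on the built-in warpbreaks data as a stand-in; it is not the clmm/emmeans workflow from the question, and the object names are made up for the example.

```r
# Stand-in example on the built-in warpbreaks data (not the CO2 data)
# Both factors and their interaction stay in the model
fit_full <- aov(breaks ~ wool * tension, data = warpbreaks)

# Tukey comparisons are requested for tension only
tukey_tension <- TukeyHSD(fit_full, which = "tension")
print(tukey_tension)
```

The comparisons come from the full model, so the other factor is still accounted for rather than being dropped.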
Hello (first timer here),
I would like to estimate a "two-way" cluster-robust variance-covariance matrix in R. I am using a particular canned routine from the "multiwayvcov" library. My question relates solely to the set-up of the cluster.vcov function in R. I have panel data of various crime outcomes. My cross-sectional unit is the "precinct" (over 40 precincts) and I observe crime in those precincts over several "months" (i.e., 24 months). I am evaluating an intervention that 'turns on' (dummy coded) for only a few months throughout the year.
I include "precinct" and "month" fixed effects (i.e., a full set of precinct and month dummies enter the model). I have only one independent variable I am assessing. I want to cluster on "both" dimensions but I am unsure how to set it up.
Do I estimate all the fixed effects with lm first? Or, do I simply run a model regressing crime on the independent variable (excluding fixed effects), then use cluster.vcov i.e., ~ precinct + month_year.
This seems like it would provide the wrong standard error though. Right? I hope this was clear. Sorry for any confusion. See my set up below.
library(multiwayvcov)
library(lmtest)   # provides coeftest()
model <- lm(crime ~ as.factor(precinct) + as.factor(month_year) + policy, data = DATASET_full)
boot_both <- cluster.vcov(model, ~ precinct + month_year)
coeftest(model, boot_both)
### What the documentation offers as an example
### https://cran.r-project.org/web/packages/multiwayvcov/multiwayvcov.pdf
library(lmtest)
data(petersen)
m1 <- lm(y ~ x, data = petersen)
### Double cluster by firm and year using a formula
vcov_both_formula <- cluster.vcov(m1, ~ firmid + year)
coeftest(m1, vcov_both_formula)
Is it appropriate to first estimate a model that ignores the fixed effects?
First the answer: you should first estimate your lm model with the fixed effects included. This will give you your asymptotically correct parameter estimates. The standard errors it reports are incorrect, though, because they are calculated from a vcov matrix that assumes iid errors.
To replace the iid covariance matrix with a cluster-robust vcov matrix, you can call cluster.vcov on the fitted model, i.e. my_new_vcov_matrix <- cluster.vcov(model, ~ precinct + month_year).
Then a recommendation: I warmly recommend the function felm from lfe for both multi-way fixed effects and cluster-robust standard errors.
The syntax is as follows:
library(multiwayvcov)
library(lfe)
data(petersen)
my_fe_model <- felm(y~x | firmid + year | 0 | firmid + year, data=petersen )
summary(my_fe_model)
I have five years of claim-frequency panel data consisting of age, cc, make of car, age and gender of the insured person, and geographical area. I have run a Poisson regression in R to model the claim frequency. Now I want to test for Poisson fixed and random effects in R. I did some research and found the following code for the lme4 package: fm <- glmer(Y ~ A + B + (1|Subject), family = poisson, data = pData). How do I apply it in my context? Thanks in advance.