How to write linear model with fixed and random effect - r

I'm trying to figure out if I'm writing my linear mixed effect model in the correct manner to compare the models.
The experimental set up:
We are looking at the length~weight relationship of a species of snails. We chose 3 sites (rivers) for these experiments that we consider a fixed effect: we expect there to be a difference, we just want to see the magnitude of that difference between rivers. We also have the snails in different individual groups, i.e. cages (6 groups/river), so that we could keep them in a constant area to be measured. We don't expect/aren't interested in a group effect, but want to make sure it isn't causing any issues. So here we want to see the effect of the river on the length~weight relationship with group (cage) as a source of random effects.
I have written the models as follows to compare with AIC:
No group effect:
Model1<-aov(Weight~Length*River,data=SnailData)
Group effect:
Model2<-lmer(Weight~Length*River +(1|group),data=SnailData)
Would this be correct, or is there a different way I should be looking at the group random effect?
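For anyone wanting to try this comparison, here is a minimal sketch on simulated data (the real SnailData is not shown, so every number below is made up). It relies on two assumptions worth stating: aov() fits the same linear model as lm(), so lm() is used for the no-group model, and lmer() uses REML by default, so the mixed model is fitted with REML=FALSE to put both AIC values on a maximum-likelihood footing. Group labels are also assumed to be unique across rivers.

```r
library(lme4)

set.seed(42)

# Simulated stand-in for SnailData (hypothetical values):
# 3 rivers, 6 groups per river, 10 snails per group
SnailData <- expand.grid(snail = 1:10,
                         group_in_river = 1:6,
                         River = c("R1", "R2", "R3"))
SnailData$River <- factor(SnailData$River)

# Make group labels unique across rivers (18 distinct groups)
SnailData$group <- interaction(SnailData$River, SnailData$group_in_river,
                               drop = TRUE)

SnailData$Length <- runif(nrow(SnailData), 5, 20)
river_shift <- c(R1 = 0, R2 = 0.5, R3 = 1)
group_shift <- rnorm(nlevels(SnailData$group), sd = 0.5)
SnailData$Weight <- 1 + 0.3 * SnailData$Length +
  unname(river_shift[as.character(SnailData$River)]) +
  group_shift[as.integer(SnailData$group)] +
  rnorm(nrow(SnailData), sd = 0.5)

# No group effect: ordinary linear model (same fit that aov() would use)
Model1 <- lm(Weight ~ Length * River, data = SnailData)

# Group effect: random intercept per group, fitted by ML rather than REML
Model2 <- lmer(Weight ~ Length * River + (1 | group),
               data = SnailData, REML = FALSE)

aic_values <- c(no_group = AIC(Model1), group = AIC(Model2))
print(aic_values)
```

The two models share the same fixed effects and differ only in the random intercept, so the AIC difference reflects the group term alone.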

Related

use.u=TRUE in bootMer function

I have a question about bootstrapping confidence intervals for the random effects (BLUPs) of a multilevel model.
I'm currently using bootMer and there is an argument use.u=TRUE that allows one to treat the BLUPs as fixed instead of re-estimating them. Since the BLUPs are random variables it would seem appropriate to re-estimate them at each bootstrap, and indeed the default option is use.u=FALSE.
However the underlying assumption is that my clusters are a random sample of clusters drawn from a population of clusters. In my case I am running a survey experiment in 26 countries (this is the cluster of interest) which in reality were not randomly drawn. And while I am interested in drawing inferences about the larger population of countries from which my sample is drawn, I am also interested in the cluster specific effects, AKA the BLUPs, for each one of these clusters. Because of this I'm resorting to performing bootstrap to get valid confidence intervals for these "estimates".
In this case would it be OK to set use.u=TRUE?
A related question was asked here: https://stats.stackexchange.com/questions/417518/how-to-get-confidence-intervals-for-modeled-data-of-lmer-model-in-r-with-bootmer
However, I'm not sure whether the answer carries over to my case. Does anyone have ideas?
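To make the option concrete, this is a rough sketch of what a bootMer call with use.u=TRUE looks like. It uses the sleepstudy data shipped with lme4 as a stand-in for the survey data, and the helper name blup_fun and the choice of nsim = 50 are illustrative assumptions only. It shows the mechanics of the call, not whether conditioning on the BLUPs is appropriate for non-randomly sampled countries.

```r
library(lme4)

# Stand-in model: random intercept per Subject (18 subjects in sleepstudy)
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Statistic to bootstrap: the conditional modes (BLUPs) of the intercepts
blup_fun <- function(m) ranef(m)$Subject[["(Intercept)"]]

# use.u = TRUE keeps the estimated random effects fixed when simulating new
# responses, so inference is conditional on them; nsim is kept small here
boot_fixed_u <- bootMer(fit, FUN = blup_fun, nsim = 50, seed = 101,
                        use.u = TRUE)

# Percentile intervals, one column per Subject
boot_ci <- apply(boot_fixed_u$t, 2, quantile, probs = c(0.025, 0.975))
print(dim(boot_ci))
```

A real analysis would want a much larger nsim than the 50 used here.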

clustering standard errors within MLMs/lme4

Is it possible to use both clustered standard errors and multilevel models together, and how does one implement this in R?
In my setup I am running a conjoint experiment in 26 countries with 2000 participants per country. Like any conjoint experiment, each participant is shown two vignettes and asked to choose/rate each vignette. The same participant is then shown two fresh vignettes for comparison and asked to repeat the task. In this case each participant performs two comparisons. The hierarchy is thus comparisons nested within individuals nested within countries. I am currently running a multilevel model with each comparison at level 1 and country as the level 2 unit. Obviously comparisons within individuals are likely to be correlated, so I'd like to cluster standard errors at the individual level as well. It seems overkill to add another level in the MLM for this, since my clusters are extremely small (n=2) and it makes more sense to do my analysis at the individual level (not to mention unnecessarily complicating the model, since with 2000 individuals*26 countries the parameter space becomes crazy huge). Is this possible? If so, how does one do this in R together with a multilevel model setup?
The cluster size of 2 is not an issue, and I don't see any issue with the parameter space either. If you fit random intercepts for participants, and countries, these are estimated as latent normally distributed variables. A model such as:
lmer(outcome ~ fixed_effects + (1 | country/participant))
This will handle the dependencies within clusters (at the participant level and the country level) so there will be no need to use cluster standard errors.
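As a small illustration of the nested specification, here is a sketch on simulated data (6 countries and 40 participants per country rather than 26 and 2000, with a made-up continuous outcome and a hypothetical binary treatment). Participant labels deliberately repeat across countries to show that the country/participant syntax still creates one level per participant within country.

```r
library(lme4)

set.seed(1)
n_country <- 6
n_part <- 40

# Two comparisons per participant, participants nested in countries
d <- expand.grid(comparison = 1:2,
                 participant = 1:n_part,
                 country = 1:n_country)
d$country <- factor(d$country)
d$participant <- factor(d$participant)  # labels 1..40 reused in every country
d$treatment <- rbinom(nrow(d), 1, 0.5)

# Simulated country-level and participant-level intercept shifts
part_id <- interaction(d$country, d$participant, drop = TRUE)
country_eff <- rnorm(n_country, sd = 1)
part_eff <- rnorm(nlevels(part_id), sd = 1)
d$outcome <- 0.5 * d$treatment +
  country_eff[as.integer(d$country)] +
  part_eff[as.integer(part_id)] +
  rnorm(nrow(d))

# (1 | country/participant) expands to (1 | country) + (1 | country:participant)
fit <- lmer(outcome ~ treatment + (1 | country/participant), data = d)

# Number of levels per grouping factor: 6 countries, 240 participants
print(ngrps(fit))
```

Each participant contributes a single variance-component level rather than a separately estimated fixed parameter, which is why the large number of participants does not inflate the fixed-effect parameter space.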

Conjoint experiment: How to formally test if effect sizes for attribute levels are significantly different from each other

I am doing conjoint analysis in R, working with the cregg package by Thomas Leeper. I am estimating AMCEs (all good up to this point, getting nice plots, etc.).
Now, I want to formally test whether the estimated effect of one attribute level is significantly different from the estimated effect of another attribute level within the same dimension (just for the simple AMCE, no subgroups).
Does anybody have an idea of how to do that? Any solution for cjoint (or Stata) would also be most welcome. Thanks!

Did I just do an ANCOVA or MANOVA?

I’m trying to do an ANCOVA here ...
I want to analyze the effect of EROSION FORCE and ZONATION on all the species (listed with small letters) in each POOL.STEP (ranging from 1-12/1-4), while controlling for the effect of FISH.
I’m not sure if I’m doing it right. What is the command for ANCOVA?
So far I used lm(EROSIONFORCE~ZONATION+FISH,data=d), which yields:
So what I see here is that both erosion force percentage (intercept?) and sublittoral zonation are significant in some way, but I’m still not sure if I’ve done an ANCOVA correctly here or is this just an ANOVA?
In general, ANCOVA (analysis of covariance) is simply a special case of the general linear model with one categorical predictor (factor) and one continuous predictor (the "covariate"), so lm() is the right function to use.
However ... the bottom line is that you have a moderately challenging statistical problem here, and I would strongly recommend that you try to get local help (if you're working within a research group, can you consult with others in your group about appropriate methods?). I would suggest following up either on CrossValidated or r-sig-ecology@r-project.org.
by putting EROSIONFORCE on the left side of the formula, you're specifying that you want to use EROSIONFORCE as a response (dependent) variable, i.e. your model is estimating how erosion force varies across zones and for different fish numbers - nothing about species response
if you want to analyze the response of a single species to erosion and zone, controlling for fish numbers, you need something like
lm(`Acmaeidae s...` ~ EROSIONFORCE+ZONATION+FISH, data=your_data)
the lm() suggestion above would do each species independently, i.e. you'd have to do a separate analysis for each species. If you also want to do it separately for each POOL.STEP you're going to have to do a lot of separate analyses. There are various ways of automating this in R; the most idiomatic is probably to melt your data (see reshape2::melt or tidyr::gather) into long format and then use lmList from lme4.
since you have count data with low means, i.e. lots of zeros (and a few big values), you should probably consider a Poisson or negative binomial model, and possibly even a zero-inflated/hurdle model (i.e. analyze presence-absence and size of positive responses separately)
if you really want to analyze the joint distribution of all species (i.e., treat them as a multivariate response, which is the M in MANOVA), you're going to have to work quite a bit harder ... there are a variety of joint species distribution models by people like Pierre Legendre, David Warton and others ... I'd suggest you try starting with the mvabund package, but you might need to do some reading first
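The "one model per species" idea can be sketched roughly as follows. This is only an illustration on simulated data: the species columns sp_a and sp_b are hypothetical, the reshaping is done in base R rather than with reshape2 or tidyr, and plain split()/lapply() with a Poisson glm() stands in for lmList.

```r
set.seed(1)
n <- 60

# Simulated wide-format data: one row per sample, one column per species
wide <- data.frame(
  EROSIONFORCE = runif(n, 0, 100),
  ZONATION = factor(sample(c("littoral", "sublittoral"), n, replace = TRUE)),
  FISH = rpois(n, 3),
  sp_a = rpois(n, 2),   # hypothetical species counts
  sp_b = rpois(n, 1)
)
species_cols <- c("sp_a", "sp_b")

# Long format: repeat the predictors once per species and stack the counts
long <- data.frame(
  wide[rep(seq_len(n), times = length(species_cols)),
       c("EROSIONFORCE", "ZONATION", "FISH")],
  species = rep(species_cols, each = n),
  count = unlist(wide[species_cols], use.names = FALSE)
)

# One Poisson GLM per species, controlling for FISH
fits <- lapply(split(long, long$species), function(dat) {
  glm(count ~ EROSIONFORCE + ZONATION + FISH, family = poisson, data = dat)
})

# Coefficients side by side, one column per species
coef_table <- sapply(fits, coef)
print(round(coef_table, 3))
```

Splitting additionally by POOL.STEP would follow the same pattern, with the caveat that each subset needs enough rows to estimate the model.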

How can I include repeated measures to my lmer correctly

In my study I sampled the same sites in different regions for many years. Each site has different properties in each year, which is important for my research question. I want to know whether the properties of the site affect biodiversity on the sites, and I am interested in the interaction of the properties and the regions.
Overview:
Biodiversity = response
Site property = fixed factor, changes every year
Region = fixed factor , same regions every year
Site = random effect, is repeatedly sampled in the different sampling years
Year = random effect, is the factor in which "site" is repeated
At the moment my model looks like this:
mod1 <- lmer(biodiversity~region*siteProperty+(1|Year)+(1|site))
I'm not sure if this accounts for the repeated measures.
Alternatively I was thinking about this one, as it includes also the nestedness of the sites in the years, but maybe that is not necessary:
mod2 <- lmer(biodiversity~region*siteProperty+(1|Year)+(1|Year:site))
The problem with this approach is, that it only works if my site properties are not zero. But I have zeros in different properties and I need to analyse their effects as well.
If you need more information, just ask.
Thanks for your help!
Your first example,
mod1 <- lmer(biodiversity~region*siteProperty+(1|Year)+(1|site))
should be fine (although I'd encourage you to use the data argument explicitly ...). If you have samples for "many years" for each site, you might want to consider
including a trend model (i.e. include Year, as a numeric variable, in the fixed effects part of the model as well, either as a simple linear term or as part of an additive model, e.g. using splines::ns)
checking for/allowing for autocorrelation (although this is tricky in lme4; you can use lme, but then crossed random effects of Year and site become harder).
If you have one sample per site/year combination, you don't want (1|Year:site), because that will be the same as the residual variability term.
Your statement "only works if my site properties are not zero" doesn't make sense to me: in general, having zeros in predictor variables shouldn't cause any problems ... ?
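Here is a minimal sketch of the first model with an explicit data argument, plus the simple linear trend variant, on simulated data (12 sites in 3 regions over 10 years; all values are invented, and the centred numeric year year_c is a name introduced here for illustration). The simulated siteProperty is a count that will typically include zeros.

```r
library(lme4)

set.seed(7)

# One sample per site/year combination
d <- expand.grid(site = factor(1:12), year_num = 2001:2010)
d$region <- factor(c("A", "B", "C")[(as.integer(d$site) - 1) %/% 4 + 1])
d$Year <- factor(d$year_num)                 # Year as a grouping factor
d$year_c <- d$year_num - mean(d$year_num)    # centred numeric year for the trend
d$siteProperty <- rpois(nrow(d), 2)          # count predictor, zeros allowed

site_eff <- rnorm(12)
year_eff <- rnorm(10, sd = 0.5)
d$biodiversity <- 10 + 0.8 * d$siteProperty + 0.3 * d$year_c +
  site_eff[as.integer(d$site)] + year_eff[as.integer(d$Year)] +
  rnorm(nrow(d))

# Crossed random intercepts for Year and site
mod1 <- lmer(biodiversity ~ region * siteProperty + (1 | Year) + (1 | site),
             data = d)

# Same model with a linear trend in the fixed effects
mod_trend <- lmer(biodiversity ~ region * siteProperty + year_c +
                    (1 | Year) + (1 | site),
                  data = d)

print(fixef(mod_trend))
```

Centring the numeric year keeps the intercept interpretable and avoids very large predictor values.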
