I have this kind of data:
y: count data
x: a factor (categorical) predictor with 3 levels
Conceptually I need an ANOVA testing if the means of y for the three levels (group) are significantly different.
Because y is a count, I fitted a Poisson GLM like this (in R):
fit <- glm(y ~ x, family = poisson)
And then? How can I proceed?
I need to know whether level 1 is significantly different from levels 2 and 3, and whether level 2 is significantly different from level 3 (all possible pairwise comparisons).
Thanks
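A minimal sketch of one way to follow this up, assuming the emmeans package and a data frame called mydata (both the package choice and the object names are illustrative, not from the original post):

library(emmeans)

# Fit the Poisson GLM with the 3-level factor as the only predictor
fit <- glm(y ~ x, family = poisson, data = mydata)

# Overall test that the three group means differ (analysis-of-deviance analogue of ANOVA)
anova(fit, test = "Chisq")

# All pairwise comparisons between the three levels, with a multiplicity adjustment
emm <- emmeans(fit, ~ x)
pairs(emm)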
I have the following variables, and if they were in wide format I would fit something like
lm(happiness ~ personality_trait*condition)
But my data is in long format.
I suppose it will be a repeated-measures model, but I'm not sure. I considered linear mixed models, but I'm not sure I understood them or whether they are what I'm looking for.
Thanks a lot!
participant  personality_trait1  condition  happiness
1            10                  animal     5
1            10                  human      7
2            2                   animal     3
2            2                   human      4
3            5                   animal     6
3            5                   human      2
I think
library(lme4)
lmer(happiness ~ personality_trait*condition + (1|participant), data= ...)
should do it. This allows for a different intercept for each individual (drawn from a Gaussian distribution around the population-mean intercept). In some situations you could also fit a random-slopes model (a different slope for each individual), but in this case it wouldn't make sense, since you appear to have only two observations per individual (thus estimates of variation in slope would be confounded with the residual variation; see here for an example).
Are your samples always in the order "animal, then human"? If not, you might want to add a subject-level fixed effect of order ...
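A minimal sketch of that extension, assuming an order column has been added and the data frame is called dat (both names are illustrative):

library(lme4)

# Random intercept per participant, plus a fixed effect for presentation order
fit <- lmer(happiness ~ personality_trait * condition + order + (1 | participant),
            data = dat)
summary(fit)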
I'm new to linear mixed effects models and I'm trying to use them for hypothesis testing.
In my data (DF) I have two categorical/factor variables: color (red/blue/green) and direction (up/down). I want to see if there are significant differences in scores (numeric values) across these factors and if there is an interaction effect, while accounting for random intercepts and random slopes for each participant.
What is the appropriate lmer formula for doing this?
Here's what I have...
My data is structured like so:
> str(DF)
'data.frame': 4761 obs. of 4 variables:
$ participant : Factor w/ 100 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ direction : Factor w/ 2 levels "down","up": 2 2 2 2 2 2 2 2 2 2 ...
$ color : Factor w/ 3 levels "red","blue",..: 3 3 3 3 3 3 3 3 3 3 ...
$ scores : num 15 -4 5 25 0 3 16 0 5 0 ...
After some reading, I figured that I could write a model with random slopes and intercepts for participants and one fixed effect like so:
model_1 <- lmer(scores ~ direction + (direction|participant), data = DF)
This gives me a fixed effect estimate and p-value for direction, which I understand to be a meaningful assessment of the effect of direction on scores while individual differences across participants are accounted for as a random effect.
But how do I add in my second fixed factor, color, and an interaction term whilst still affording each participant a random intercept and slope?
I thought maybe I could do this:
model_2 <- lmer(scores ~ direction * color + (direction|participant) + (color|participant), data = DF)
But ultimately I really don't know what exactly this formula means. Any guidance would be appreciated.
You can include several random slopes in at least two ways:
What you proposed: Estimate random slopes for both predictors, but don't estimate the correlation between them (i.e. assume the random slopes of different predictors don't correlate):
scores ~ direction * color + (direction|participant) + (color|participant)
The same but also estimate the correlation between random slopes of different predictors:
scores ~ direction * color + (direction + color|participant)
Please note two things:
First, in both cases, the random intercepts for "participant" are included, as are correlations between each random slope and the random intercept. This probably makes sense unless you have theoretical reasons to the contrary. See this useful summary if you want to avoid the correlation between random intercepts and slopes.
Second, in both cases you don't include a random slope for the interaction term! If the interaction effect is actually what you are interested in, you should at least try to fit a model with random slopes for it, so as to avoid potential bias in the fixed interaction effect. Here, again, you can choose to allow or avoid correlations between the interaction term's random slopes and the other random slopes:
Without correlation:
scores ~ direction * color + (direction|participant) + (color|participant) + (direction:color|participant)
With correlation:
scores ~ direction * color + (direction * color|participant)
If you have no theoretical basis to decide between models with or without correlations between the random slopes, I suggest you do both, compare them with anova() and choose the one that fits your data better.
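A minimal sketch of that comparison, using the DF from the question (the model names are illustrative):

library(lme4)

# Random slopes for both predictors, correlations between their slopes not estimated
m_uncorrelated <- lmer(scores ~ direction * color + (direction | participant) + (color | participant),
                       data = DF)

# Random slopes for both predictors, with the correlation between them estimated
m_correlated <- lmer(scores ~ direction * color + (direction + color | participant),
                     data = DF)

# Likelihood-ratio comparison of the two random-effects structures
anova(m_uncorrelated, m_correlated)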
We did a field study in which we tried to understand which factors significantly explain the probability of a group of animals (5 species in total) crossing through 30 wildlife road-crossing structures. The response variable is binomial (yes=crossed; no = did not cross) and was recorded by animal species. We did about 30 visits to each crossing structure (our random factor) in which we recorded the binomial response by each animal species and the values of a few predictors.
So, I have this (simplified for better understanding) mixed effects model:
library(lme4)
Mymodel <- glmer(cross.01 ~ stream.01 + width.m + grass.per + (1|structure.id),
data = Mydata, family = binomial)
stream.01 is a factor with 2 levels; width.m is continuous; grass.per is a percentage
This is the model in which I assessed crossings by all species combined (i.e., cross.01 = 1 when an animal of any species crossed, cross.01 = 0 when no animal crossed). However, we also fitted one model per species, and those species-specific models highlight that different species exhibit different relationships between crossings and the explanatory variables.
My problem: this means that my model above has an additional source of variation at the species level without accounting for it. However, I cannot refit the model above with species added as a random factor because, in my binomial response, a zero means no species crossed (all zeros would have "NA" or, say, "none" for species), so that additional source of variation is only present when the response was 1. Just to confirm this, I did add species as a random factor:
(1 | structure.id) + (1 | species)
As expected, the message is "Error: Response is constant"
How can I account for the species variability in my model in lme4?
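One commonly suggested restructuring, sketched below under the assumption that the visit records can be expanded to one row per species per visit (the data-frame name Mydata_long is illustrative, not from the original post): make the response species-specific, so that species can enter as a crossed random intercept alongside structure.

library(lme4)

# Each row: one species at one visit to one structure; cross.01 = 1 if that
# species crossed during that visit, 0 otherwise (hypothetical layout)
Mymodel_sp <- glmer(cross.01 ~ stream.01 + width.m + grass.per +
                      (1 | structure.id) + (1 | species),
                    data = Mydata_long, family = binomial)
summary(Mymodel_sp)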
I am a total novice to R. I have an assignment using linear regression in which we have to produce two different models, to see which one is a better predictor of pain. The first model is just to contain age and gender. The second model is to include extra variables: the State-Trait Anxiety Inventory (STAI) scores, the Pain Catastrophizing Scale, the Mindful Attention Awareness Scale, and measures of cortisol levels in both saliva and serum (blood).
The research question states that we need to conduct a hierarchical regression, by building a model containing age and sex as predictors of pain (model 1), then building a new model with the predictors age, sex, STAI, pain catastrophizing, mindfulness, and cortisol measures (model 2). Hence, the predictors used in model 1 are a subset of the predictors used in model 2. After completing both models, they need to be compared to assess whether substantial new information was gained about pain in model 2 compared to model 1.
I am having a lot of problems with "sex" as a variable: someone had coded a "3" instead of male or female, and although I have excluded that score, "3" is still coming up as a level in the data set. Is there a way to remove it?
Furthermore, how can I convert "sex" into a "factor" vector instead of a "character" vector? Can categorical variables be predictors in a model? I have attempted to do this using the following commands, but they keep returning errors.
sex_vector <- c("female", "male") etc.
factor.sex.vector <- factor(sex.vector)
Below is an excerpt of the data set:
'data.frame': 156 obs. of 10 variables:
$ sex : Factor w/ 3 levels "3","female","male": 2 2 3 3 3 3 3 2 2 2 ...
Eliminate the unwanted value and then, as suggested by mt1022, apply factor() again:
factor.sex.vector <- subset(factor.sex.vector, factor.sex.vector != 3)
factor.sex.vector <- factor(factor.sex.vector)
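If the variable lives in a data frame (as the str() output above suggests), here is a sketch of the equivalent fix, assuming the data frame is called mydata (that name is not from the original post):

# Keep only rows with valid sex codes, then drop the now-unused "3" level
mydata <- subset(mydata, sex != "3")
mydata$sex <- droplevels(mydata$sex)

# Converting a character column to a factor is just
mydata$sex <- factor(mydata$sex)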
I am testing a mixed model with 4 predictors: 2 categorical predictors (with 6 and 7 levels respectively) and 2 quantitative predictors.
I would like to know if I am allowed, while testing my model, to create interaction terms in which I mix categorical and quantitative predictors.
Suppose Y = f(a, b) is the model I want to test, a is the quantitative predictor and b is the categorical predictor.
Am I allowed to fit (for example, in R):
linfit <- lm(Y ~ a + b + a:b, data = mydata)
Is the interpretation of the results similar to the one I get when interacting two quantitative predictors?
First, the code you wrote is right; R will give you a result. And if the class of b has already been set to factor, R will do the regression treating b as a categorical predictor.
Second, I assume you are asking about the statistical interpretation of the interaction term. The statistical meanings of the three situations below are not the same:
(1) a and b are quantitative predictors.
In the regression output from R, there will be four rows: the intercept, a, b, and a:b. The regression takes a⋅b as another quantitative variable and does a linear regression.
y = β0 + β1⋅a + β2⋅b + β3⋅(a⋅b)
(2) a and b are categorical predictors.
Suppose a has 3 levels and b has 2. Write out the design matrix, which consists of 0s and 1s (dummy coding):
y = β0 + β1⋅a2 + β2⋅a3 + β3⋅b2 + β4⋅(a2⋅b2) + β5⋅(a3⋅b2)
(3) a is categorical and b is a quantitative predictor.
Suppose a has 3 levels.
y = β0 + β1⋅a2 + β2⋅a3 + β3⋅b + β4⋅(a2⋅b) + β5⋅(a3⋅b)
For more details on interaction terms and the design matrix, texts on (generalized) linear models cover this. Also, it's easy to try it out in R by inspecting the regression results.
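A minimal sketch of how to see those three design matrices in R (the toy data below is illustrative, not from the original answer):

# Toy data: a as both numeric and a 3-level factor, b as both numeric and a 2-level factor
dat <- data.frame(
  a_num = c(1, 2, 3, 1, 2, 3),
  a_fac = factor(c("a1", "a2", "a3", "a1", "a2", "a3")),
  b_num = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0),
  b_fac = factor(c("b1", "b2", "b1", "b2", "b1", "b2"))
)

# (1) both quantitative: columns for a, b, and the product a*b
model.matrix(~ a_num * b_num, data = dat)

# (2) both categorical: dummy columns for each non-reference level and their products
model.matrix(~ a_fac * b_fac, data = dat)

# (3) categorical * quantitative: dummies for a, the numeric b, and dummy-by-b products
model.matrix(~ a_fac * b_num, data = dat)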