Repeated measures: continuous outcome predicted by continuous and categorical predictors - R

I have the following variables, and if they were in wide format I would fit something like
lm(happiness ~ personality_trait*condition)
But my data is in long format.
I suppose this calls for a repeated-measures model, but I'm not sure. I considered linear mixed models, but I'm not sure I understood them or whether they are what I'm looking for.
Thanks a lot!
participant  personality_trait1  condition  happiness
1            10                  animal     5
1            10                  human      7
2            2                   animal     3
2            2                   human      4
3            5                   animal     6
3            5                   human      2

I think
library(lme4)
lmer(happiness ~ personality_trait*condition + (1|participant), data= ...)
should do it. This allows a different intercept for each individual, drawn from a Gaussian distribution around the population-mean intercept. In some situations you could also fit a random-slopes model (a different slope for each individual), but in this case it wouldn't make sense, since you appear to have only two observations per individual (so estimates of variation in slope would be confounded with the residual variation: see here for an example).
Are your samples always in the order "animal, then human"? If not, you might want to add a subject-level fixed effect of order ...
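For completeness, a minimal sketch of the full call, assuming the long-format data frame shown above is called dat (the name is an assumption) and using the column names from the example data:
library(lme4)

# random intercept per participant, as described above
fit <- lmer(happiness ~ personality_trait1 * condition + (1 | participant), data = dat)
summary(fit)

# if presentation order varied between participants, a (hypothetical) `order` column
# could be added as a fixed effect, as mentioned above:
# fit2 <- lmer(happiness ~ personality_trait1 * condition + order + (1 | participant), data = dat)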

Related

Summary measurement lost when adding mods to rma.glmm

At the moment I'm trying to calculate the (adjusted) incidence rate (measure "IRLN") with the rma.glmm function of the metafor package.
My data is a dataframe that looks like the following:
head(data)
patient-years events age
1 180.0000 4 NA
2 116.2500 13 51.83
3 66.2500 6 48.00
4 423.6333 21 58.00
5 142.1783 7 53.20
6 1117.3167 72 59.90
The unadjusted model fits fine:
y = rma.glmm(xi = events, ti = `patient-years`, data = data, measure = "IRLN", method = "ML")
And gives me the following forest plot:
metafor::forest.rma(y)
[Forest plot of the unadjusted model]
However, when I want to adjust my model:
nh = rma.glmm(xi = events, ti = `patient-years`, data = datanh,
              measure = "IRLN", mods = ~ age, method = "ML")
(Where age is a numeric vector)
The summary measure is lost
[Forest plot of the adjusted model]
I've tried all I can think of, but really don't know how to fix this. Do you have any suggestions?
When you add a moderator to the model, there is no longer a single overall effect (or, to be precise, a single average effect in a random-effects model). The size of the average effect then depends on the value of the moderator. The gray-shaded polygons in the forest plot then reflect the estimated average effects corresponding to the values of 'age' for the included studies.
You could compute the predicted average effect for a particular value of age with the predict() function, e.g.:
predict(nh, newmods = <age value>, transf=exp)
(transf=exp to obtain the estimated average IR for the specified age value).
Some people plug the average of the age values observed in the studies into newmods and interpret the result as an 'adjusted' estimate. One can debate whether this terminology ('adjusted effect') is correct.
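For example, a sketch of that calculation, assuming the ages used to fit the model are stored in datanh$age (the object and column names are assumptions about your data):
predict(nh, newmods = mean(datanh$age, na.rm = TRUE), transf = exp)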

How to write lmer formula for mixed effects model with two fixed effects

I'm new to linear mixed effects models and I'm trying to use them for hypothesis testing.
In my data (DF) I have two categorical/factor variables: color (red/blue/green) and direction (up/down). I want to see if there are significant differences in scores (numeric values) across these factors and if there is an interaction effect, while accounting for random intercepts and random slopes for each participant.
What is the appropriate lmer formula for doing this?
Here's what I have...
My data is structured like so:
> str(DF)
'data.frame': 4761 obs. of 4 variables:
$ participant : Factor w/ 100 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ direction : Factor w/ 2 levels "down","up": 2 2 2 2 2 2 2 2 2 2 ...
$ color : Factor w/ 3 levels "red","blue",..: 3 3 3 3 3 3 3 3 3 3 ...
$ scores : num 15 -4 5 25 0 3 16 0 5 0 ...
After some reading, I figured that I could write a model with random slopes and intercepts for participants and one fixed effect like so:
model_1 <- lmer(scores ~ direction + (direction|participant), data = DF)
This gives me a fixed effect estimate and p-value for direction, which I understand to be a meaningful assessment of the effect of direction on scores while individual differences across participants are accounted for as a random effect.
But how do I add in my second fixed factor, color, and an interaction term whilst still affording each participant a random intercept and slope?
I thought maybe I could do this:
model_2 <- lmer(scores ~ direction * color + (direction|participant) + (color|participant), data = DF)
But ultimately I really don't know what exactly this formula means. Any guidance would be appreciated.
You can include several random slopes in at least two ways:
What you proposed: Estimate random slopes for both predictors, but don't estimate the correlation between them (i.e. assume the random slopes of different predictors don't correlate):
scores ~ direction * color + (direction|participant) + (color|participant)
The same but also estimate the correlation between random slopes of different predictors:
scores ~ direction * color + (direction + color|participant)
Please note two things:
First, in both cases, the random intercepts for "participant" are included, as are correlations between each random slope and the random intercept. This probably makes sense unless you have theoretical reasons to the contrary. See this useful summary if you want to avoid the correlation between random intercepts and slopes.
Second, in both cases you don't include a random slope for the interaction term! If the interaction effect is what you are actually interested in, you should at least try to fit a model with random slopes for it, to avoid potential bias in the fixed interaction effect. Here, again, you can choose to allow or avoid correlations between the interaction term's random slopes and the other random slopes:
Without correlation:
scores ~ direction * color + (direction|participant) + (color|participant) + (direction:color|participant)
With correlation:
scores ~ direction * color + (direction * color|participant)
If you have no theoretical basis to decide between models with or without correlations between the random slopes, I suggest you do both, compare them with anova() and choose the one that fits your data better.
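For instance, a minimal sketch of that comparison for the first two variants, assuming your data frame is DF as in the question (anova() compares the fits via a likelihood-ratio test, refitting with ML by default):
library(lme4)

# random slopes for both predictors, no correlations between the two sets of slopes
m_uncorr <- lmer(scores ~ direction * color +
                   (direction | participant) + (color | participant),
                 data = DF)

# random slopes for both predictors, with correlations between them
m_corr <- lmer(scores ~ direction * color +
                 (direction + color | participant),
               data = DF)

anova(m_uncorr, m_corr)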

How to improve a Zero-Inflated Negative Binomial regression model?

Hi everybody!
I have a response variable that counts successful days in a month and has a peculiarly shaped distribution (see above). About 50% of the values are zeros, and there is a heavy tail. Because of the overdispersion and the excess of zeros, I was advised to predict it with a zero-inflated negative binomial regression model.
However, no matter how significant the models I obtain are, their predictions reflect little of those distributional features (see below). For example, the peak is always around 4, and no predictions fall beyond 20.
Is this usual in fitting overdispersed, heavy-tailed count data? Are there other ways to improve the fitting? Any suggestions would be appreciated. Thank you!
P. S.
I also tried logistic regression to predict zero/non-zero only. But none of the fitted models perform better than simply guessing zeros for all cases.
I suppose you made a histogram of the fitted values, so this will only reflect the fitted means, possibly multiplied by the ratio from the zero component, depending on the model you use. It is not supposed to recreate the observed distribution, because how spread out your data can be is captured by the dispersion parameter.
We can use an example from the pscl package:
library(pscl)
data("bioChemists")
fit <- hurdle(art ~ ., data = bioChemists, dist = "negbin", zero.dist = "binomial")
par(mfrow = c(1, 2))
hist(fit$y, main = "Observed")
hist(fit$fitted.values, main = "Fitted")
As mentioned above, in this hurdle model the fitted values you see are the predicted count means multiplied by the ratio returned by the zero component (see more here):
head(fit$fitted.values)
1 2 3 4 5 6
1.9642025 1.2887343 1.3033753 1.3995826 2.4560884 0.8783207
head(predict(fit,type="zero")*predict(fit,type="count"))
1 2 3 4 5 6
1.9642025 1.2887343 1.3033753 1.3995826 2.4560884 0.8783207
To simulate data from the fitted model, we extract the parameters:
Theta <- fit$theta
Means <- predict(fit, type = "count")
Zero_p <- predict(fit, type = "prob")[, 1]
Then write a function to simulate the counts:
simulateCounts <- function(mu, theta, zero_p){
  N <- length(mu)
  x <- rnbinom(N, mu = mu, size = theta)  # negative binomial draws around the fitted means
  x[runif(N) < zero_p] <- 0               # set an observation to zero with its predicted zero probability
  x
}
So run this simulation a number of times to get the spectrum of values:
set.seed(100)
simulated = replicate(10,simulateCounts(Means,Theta,Zero_p))
simulated = unlist(simulated)
par(mfrow=c(1,2))
hist(bioChemists$art,main="Observed")
hist(simulated,main="simulated")
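As a quick numeric check of the same comparison, using the objects created above, one could also compare the proportions of zeros:
mean(bioChemists$art == 0)   # observed proportion of zeros
mean(simulated == 0)         # proportion of zeros across the simulated draws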

Linear Regression Model in R

I am a total novice to R. I have an assignment using linear regression in which we have to produce two different models, to see which one is a better predictor of pain. The first model is to contain only age and gender. The second model is to include extra variables: the State Trait Anxiety Inventory scores, the Pain Catastrophizing Scale, the Mindful Attention Awareness Scale, and measures of cortisol levels in both saliva and serum (blood).
The research question states that we need to conduct a hierarchical regression by building a model containing age and sex as predictors of pain (model 1), then building a new model with the predictors age, sex, STAI, pain catastrophizing, mindfulness, and the cortisol measures (model 2). Hence, the predictors used in model 1 are a subset of the predictors used in model 2. After both models are fitted, they need to be compared to assess whether substantial new information about pain was gained in model 2 compared to model 1.
I am having a lot of problems with "sex" as a variable: someone coded a "3" instead of male or female, and although I have excluded that observation, "3" still comes up as a level in the data set. Is there a way to remove it?
Furthermore, how can I convert "sex" from a "character" vector into a "factor"? Can categorical variables be predictors in a model? I have attempted this with the following commands, but they keep returning errors.
sex.vector <- c("female", "male") # etc.
factor.sex.vector <- factor(sex.vector)
Below is an excerpt of the data set:
'data.frame': 156 obs. of 10 variables:
$ sex : Factor w/ 3 levels "3","female","male": 2 2 3 3 3 3 3 2 2 2 ...
Eliminate the unwanted value and then, as suggested by mt1022, apply factor() again:
factor.sex.vector <- subset(factor.sex.vector, factor.sex.vector != 3)
factor.sex.vector <- factor(factor.sex.vector)
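Applied to the full data frame rather than a standalone vector (the name pain_data is hypothetical; substitute your own object), the same idea looks like:
pain_data <- subset(pain_data, sex != "3")   # drop the miscoded rows
pain_data$sex <- factor(pain_data$sex)       # re-apply factor() so "3" is no longer a level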

Anova with count data: glm?

I have this kind of data:
y: count data
x: a factor (categorical predictor) with 3 levels
Conceptually I need an ANOVA testing if the means of y for the three levels (group) are significantly different.
Because y is a count, I fitted a Poisson GLM like this (in R):
fit <- glm(y ~ x, family = poisson)
And then? How should I proceed?
I need to know if level 1 is significantly different from 2 and 3 and if 2 is significantly different from 3 (all the possible combinations).
Thanks
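A minimal sketch of one way to proceed, assuming the variables live in a data frame df and using the emmeans package for the pairwise contrasts (both the data frame name and the package choice are assumptions, not from the original post):
library(emmeans)

fit <- glm(y ~ x, data = df, family = poisson)
summary(fit)

# all pairwise comparisons between the three levels of x, back-transformed to the
# response (rate) scale; the default multiplicity adjustment is Tukey-style
emmeans(fit, pairwise ~ x, type = "response")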
