The hormone levels are inflated by the sample mass, even after correcting hormone levels for sample mass (it's a common problem for endocrinologists).
I'm trying to determine whether treatment affects hormone levels while "correcting" for sample mass. My model is:
lme(hormone_levels ~ treatment, random = list(~ 1 | INDIVIDUAL, ~ 1 | sample_mass), na.action = na.omit, method = "ML", data = dados)
However, the reviewer said I can't use a continuous variable as a random effect.
What is the alternative?
Welcome to Stack Overflow. This question is probably more appropriate for Cross Validated, as it is more about statistics than coding, but I'm going to answer anyway.
The reviewer is correct: you can't have a continuous predictor as a random effect. See some discussion about that here: https://stats.stackexchange.com/questions/105698/how-do-i-enter-a-continuous-variable-as-a-random-effect-in-a-linear-mixed-effect
To directly answer your question, the alternative is to add the predictor sample mass to the model as a fixed effect, where it will act as a covariate. This means the model will take into account both how hormone levels vary with sample mass and the effect of treatment. This is what user63230 has suggested, and I think it is the correct way to move forward. If you have a specific prediction about how the treatment effect may vary with sample mass, you could include an interaction. If you expect the treatment to affect different individuals differently, you can fit a random slope for treatment across individuals.
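A minimal sketch of that suggestion with nlme (variable and data names here are placeholders matching the question, not tested against your data):

```r
library(nlme)

# sample mass enters as a fixed-effect covariate;
# only the grouping factor INDIVIDUAL gets a random intercept
m1 <- lme(hormone_levels ~ treatment + sample_mass,
          random = ~ 1 | INDIVIDUAL,
          na.action = na.omit, method = "ML", data = dados)

# optional: an interaction, if you predict that the treatment
# effect depends on sample mass
m2 <- lme(hormone_levels ~ treatment * sample_mass,
          random = ~ 1 | INDIVIDUAL,
          na.action = na.omit, method = "ML", data = dados)
```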
I have a generalized mixed model that has 2 factors (fac1 (2 levels), fac2 (3 levels)) and 4 continuous variables (x1,x2,x3,x4) as fixed effects and a continuous response. I am interested in answering:
1) are the main effects of x1-x4 (slopes) significant, ignoring fac1 and fac2
2) are the fac1 and fac2 levels significantly different from the model mean and from each other
3) is there a difference in slopes between fac1 levels, fac2 levels, and fac1*fac2 levels
This means I would need to include interactions in my model (random effects ignored here),
say: Y~x1+x2+x3+x4+fac1+fac2+x1:fac1+x2:fac1+x3:fac1+x4:fac1+x1:fac2+x2:fac2+x3:fac2+x4:fac2
but now my coefficients for x1-x4 are based on the reference level, and interpretation of the overall main effects is not possible.
Also, do I have to include xi:fac1:fac2 + fac1:fac2 in my model as well to answer 3)?
Is there an R package that can do this? I thought about refitting the model (e.g. without the interactions) to answer 1), but the number of data points in each factor level is not the same, so if I ignore this in Y~x1+x2+x3+x4, the slope of the most common factor combination may dominate the result and the inference. I know you can use contrasts, e.g. coding a factor with 2 levels as -0.5 and 0.5 instead of the dummy coding 0 and 1, but I am not sure what that would look like in this case.
Would it be better to simplify the model by combining the factors first, e.g.
fac3<-interaction(fac1,fac2) #and then
Y~x1+x2+x3+x4+x1:fac3+x2:fac3+x3:fac3+x4:fac3
But how do I answer 1)-3) from that?
Thanks a lot for your advice.
I think you have to take a step back and ask yourself what hypotheses exactly you want to test here. Taken word for word, your 3-point list results in a lot (!) of hypothesis tests, some of which can be done in the same model, some requiring different parameterizations. Given that the question at hand is about hypotheses and not how to code them in R, this is more about statistics than programming and may be better moved to Cross Validated.
Nevertheless, for starters, I would propose the following procedure:
To test x1-x4 alone, just add all of them to your model, then use drop1() to check which of them actually add to the model fit.
In order to reduce the number of hypothesis tests (and different models to fit), I suggest you also test whether each factor and their interaction are relevant as a whole. Again, add all three terms (both factors and their interaction, i.e. just fac1*fac2 if they are formatted as factors) to the model and use drop1().
This point alone includes many potential hypotheses/contrasts to test. Depending on the parameterization (dummy or effect coding), for each of the 4 continuous predictors you have 3 or 5 first-order interactions with the factor dummies/effects and 2 or 6 second-order interactions, given that you test each group against all others. This is a total of 20 or 44 tests, which makes false positives very likely (if you test at the 95% confidence level). Additionally, ask yourself whether these interactions can even be interpreted in a meaningful way. I would therefore advise you to focus on the specific interactions you expect to be relevant. If you really want to explore all interactions, test entire interaction terms (e.g. fac1:x1, not specific contrasts) first. For this you have to fit 8 models, each including one factor-continuous interaction, and compare each of them to the no-interaction model using anova().
One last thing: I have assumed that you have already figured out the random-effect structure of your model (i.e. which cluster variable(s) to consider and whether there should be random slopes). If not, do that first.
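The first two steps above can be sketched as follows (model and data names are placeholders; random effects omitted, as in the question):

```r
# base model with all continuous predictors and the full factor term
fit <- lm(Y ~ x1 + x2 + x3 + x4 + fac1 * fac2, data = dat)

# single-term deletions with F tests: which terms actually add to the fit?
# (drop1 respects marginality, so only x1-x4 and fac1:fac2 are dropped here)
drop1(fit, test = "F")

# testing one entire factor-continuous interaction against the base model
fit_int <- update(fit, . ~ . + x1:fac1)
anova(fit, fit_int)
```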
I have a response Y that is a percentage ranging between 0 and 1. My data are nested by taxonomy, or evolutionary relationship, say phylum/genus/family/species, and I have one continuous covariate temp and one categorical covariate fac with levels fac1 & fac2.
I am interested in estimating:
is there a difference in Y between fac1 and fac2 (intercept), and how much variance is explained by that
does each level of fac respond differently in regard to temp (linearly, so slope)
is there a difference in Y for each level of my taxonomy, and how much variance is explained by those (see varcomp)
does each level of my taxonomy respond differently in regard to temp (linearly, so slope)
A brute-force idea would be to split my data at the lowest taxonomic level, here species, and do a linear beta regression for each species i as betareg(Y(i) ~ temp). Then extract the slope and intercept for each species, group them at a higher taxonomic level per fac, and compare the distribution of slopes (intercepts), say via Kullback-Leibler divergence, to a distribution I get when bootstrapping my Y values. Or compare the distribution of slopes (or intercepts) just between taxonomic levels or my factor fac, respectively. Or just compare mean slopes and intercepts between taxonomy levels or factor levels.
I am not sure if this is a good idea, and also not sure how to answer the question of how much variance is explained by my taxonomy levels, as in nested random mixed-effects models.
Another option may be just those mixed models, but how can I include all the aspects I want to test in one model?
Say I could use the gamlss package to do:
library(gamlss)
model<-gamlss(Y~temp*fac+re(random=~1|phylum/genus/family/species),family=BE)
But here I see no way to incorporate a random slope. Or can I do:
model<-gamlss(Y~re(random=~temp*fac|phylum/genus/family/species),family=BE)
but the internal call to lme has some trouble with that, and I guess this is not the right notation anyway.
Is there any way to achieve what I want to test, not necessarily with gamlss but with any other package that supports nested structures and beta regression?
Thanks!
In glmmTMB, if you have no exact 0 or 1 values in your response, something like this should work:
library(glmmTMB)
glmmTMB(Y ~ temp*fac + (1 + temp | phylum/genus/family/species),
        data = ...,
        family = beta_family())
If you have zero values, you will need to do something about them. For example, you can add a zero-inflation term in glmmTMB; brms can handle zero-one-inflated Beta responses; or you can "squeeze" the 0/1 values in a little bit (see the appendix of Smithson and Verkuilen's paper on Beta regression). If you have only a few 0/1 values it won't matter very much what you do. If you have a lot, you'll need to spend some serious time thinking about what they mean, which will influence how you handle them. Do they represent censoring (i.e. values that aren't exactly 0/1 but are too close to the borders to measure the difference)? Are they a qualitatively different response? etc.
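The Smithson and Verkuilen "squeeze" is a simple transformation, y' = (y*(n - 1) + 0.5)/n, where n is the sample size; a sketch in R:

```r
# compress a response from [0, 1] into the open interval (0, 1)
squeeze <- function(y) {
  n <- length(y)
  (y * (n - 1) + 0.5) / n
}

y <- c(0, 0.25, 0.5, 1)
squeeze(y)  # no exact 0s or 1s remain
```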
As I said in my comment, computing variance components for GLMMs is pretty tricky - there's not necessarily an easy decomposition, e.g. see here. However, you can compute the variances of intercept and slope at each taxonomic level and compare them (and you can use the standard deviations to compare with the magnitudes of the fixed effects ...)
The model given here might be pretty demanding, depending on the size of your phylogeny - for example, you might not have enough replication at the phylum level (in which case you could fit the model ~ temp*(fac + phylum) + (1 + temp | phylum:(genus/family/species)), i.e. pull out the phylum effects as fixed effects).
This is assuming that you're willing to assume that the effects of fac, and its interaction with temp, do not vary across the phylogeny ...
I'd like to obtain emmeans of a continuous response for each level of a several-level factor (numerous different populations) while "correcting" for differences in the frequency of another factor (gender) between the populations, assuming no interaction between these two.
The model I am working with is x <- lm(response ~ size*population + gender).
As I understand it, weights = "equal" and weights = "proportional" do not take into account differences in the frequency of the gender factor across populations, but use either an equal frequency or the frequency in the entire sample, respectively. The description of weights = "outer" is rather obscure to me, but it doesn't sound like exactly what I'm looking for either; the emmeans package documentation states: "All except "cells" uses the same set of weights for each mean."
But it seems like weights = "cells" is also not what I'm looking for, as then the emmeans will be closer to the ordinary marginal means, whereas I want them to be further away in cases where gender is unbalanced in certain populations. If I understand correctly, I would like the weighting to be the 'reverse' of this option. The emmean for each population should reflect what the mean of that population would be if gender had been sampled equally in each.
Perhaps I don't understand these weights fully, but is there an option to set weights to accomplish this?
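For reference, the kind of call in question would look something like this (the data name dat is a placeholder; the model and factor names follow the post):

```r
library(emmeans)

# model from the question, fitted to hypothetical data
x <- lm(response ~ size * population + gender, data = dat)

# estimated marginal means per population, averaging over gender
emmeans(x, ~ population, weights = "equal")
```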
I had originally asked this in an edit to a previous question, but I think that it deserves its own question.
I am running a glmer with a single dichotomous predictor (coded 1/0). The model also includes a random subject intercept, as well as a random item intercept and slope.
Changing which level of the predictor serves as the reference category doesn’t change the absolute value of the coefficient, EXCEPT when the random intercept and slope are uncorrelated.
This happens whether I keep the predictor as a numeric variable, or change the predictor into a factor and use the following code:
t1<-glmer(DV~IV+(1|PPT)+(0+dummy(IV, "1")|Item)+(1|Item), data = data, family = "binomial")
Is this a genuine result? If so, can anyone explain why the uncorrelated random intercept and slope allow it to emerge? If not, how can I run a model that has an uncorrelated random intercept and slope that would prevent the choice of reference category from affecting the result?
Thank you very much!
Edit:
Here is some sample data.
I'm sorry, I wasn't sure how to link to it with an R command, but a sample of CSV data is here: https://pastebin.com/embed_js/X2h9yT4c
testdata<-read.csv("test.csv")
testdata$PPT<-as.factor(testdata$PPT)
testdata$BalancedIV<-as.factor(testdata$BalancedIV)
testdata$BalancedIVReversed<-as.factor(testdata$BalancedIVReversed)
testdata$BIV<-as.numeric(as.character(testdata$BalancedIV))
testdata$BIVR<-as.numeric(as.character(testdata$BalancedIVReversed))
testdata$UIV<-as.numeric(as.character(testdata$UnBalancedIV))
testdata$UIVR<-as.numeric(as.character(testdata$UnbalancedIVReversed))
These two models have the same predictor but with reversed coding (i.e., 1/0 the first time and 0/1 the second time). You can see that their coefficients have different absolute values.
t19<-glmer(DV~BalancedIV+(1|PPT)+(0+dummy(BalancedIV, "1")|Item)+(1|Item), data = testdata, family = "binomial")
t20<-glmer(DV~BalancedIVReversed+(1|PPT)+(0+dummy(BalancedIVReversed, "1")|Item)+(1|Item), data = testdata, family = "binomial")
summary(t19)
summary(t20)
I'm working my way through a Linear Regression Textbook and am trying to replicate the results from a section on the Test of the General Linear Hypothesis, but I need a little bit of help on how to do so in R.
I've already taken a look at a number of other posts, but am hoping someone can give me some example code. I have data on twenty-six subjects which has the following form:
Group, Weight (lb), HDL Cholesterol (mg/dl)
1,163.5,75
1,180,72.5
1,178.5,62
2,106,57.5
2,134,49
2,216.5,74
3,163.5,76
3,154,55.5
3,139,68
Given these data, I am trying to test whether the regression lines fitted to the three groups of subjects have a common slope. The models postulated are:
y = β0 + β1·x + ε   (group 1)
y = γ0 + γ1·x + ε   (group 2)
y = δ0 + δ1·x + ε   (group 3)
So the hypothesis of interest is H0: β1 = γ1 = δ1
I have been trying to do this using the linearHypothesis function in the car package, but I have been having trouble knowing what the model object should be, and I am not confident that this is the correct approach (or package) to be using.
Any help would be much appreciated – Thanks!
Tim, your question doesn't seem to be so much about R code. Instead, it appears that you have questions about how to test the interaction of your Group and Weight (lb) variables on the outcome HDL Cholesterol. You don't state this specifically, but I'm guessing that these are your predictors and outcome, respectively.
So essentially, you're trying to see if the predictor Weight (lb) has differential effects depending on the level of the variable Group. This can be done in a number of ways using the linear model. A simple regression approach would be lm(hdl ~ group * weight), which expands to the main effects of group and weight plus their interaction. The coefficients for the interaction term group:weight would then tell you whether or not there is a significant interaction (i.e., moderation) effect.
However, I think we would have a major concern. In particular, we ought to worry that the hypothesized effect is that the group and weight variables do not interact in their effect on hdl. That is, you're essentially predicting the null. Furthermore, you're predicting the null despite having a small sample size. Therefore, it is rather unlikely that we have sufficient statistical power to detect an effect, even if there were one to be observed.
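One direct way to test the common-slope hypothesis H0: β1 = γ1 = δ1 is an F test comparing the common-slope model against the separate-slopes model. A sketch (the file name and lowercase column names are assumptions based on the data shown above):

```r
dat <- read.csv("hdl.csv")            # hypothetical file holding the data above
dat$group <- factor(dat$group)

fit0 <- lm(hdl ~ group + weight, data = dat)  # common slope, separate intercepts
fit1 <- lm(hdl ~ group * weight, data = dat)  # separate slopes per group

# F test of the group:weight interaction, i.e. of equal slopes
anova(fit0, fit1)
```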