Syntax error when fitting a Bayesian logistic regression - r

I am attempting to model binary species traits, where presence is represented by 1 and absence by 0, as a function of some sampling variables. To accomplish this, I have constructed a brms model and added a phylogenetic structure to it. Here is the model I used:
model <- brms::brm(male_head | trials(1 + 0) ~
PC1 + PC2 + PC3 +
(1|gr(phylo, cov = covariance_matrix)),
data = data,
family = binomial(),
prior = prior,
data2 = list(covariance_matrix = covariance_matrix))
Each line of my df represents one observation with a binary outcome.
Initially, I was unsure about which arguments to use in the trials() function. Since my species are non-repeated and some have the traits I'm modeling while others do not, I thought that trials(1 + 0) might be appropriate. I recall seeing a vignette that suggested this, but I can't find it now. Is this syntax correct?
Furthermore, for some reason I'm unaware of, the model is producing one estimate for each row of my predictors. As my df has 362 rows, the model summary displays a lengthy list of 362 estimates. I would prefer to have one estimate for each sampling variable instead. Although I have managed to achieve this by making the treatment effect a random effect (i.e., (1|PC1) + (1|PC2) + (1|PC3)), I don't think this is the appropriate approach. I also tried bernoulli(), but had no success either. Do you have any suggestions for how I can address this issue?
EDIT:
For some reason the values of my sampling variables/principal components were being read as factors. The second part of this question was solved.
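The factor problem described in the edit can be reproduced without brms. The sketch below uses made-up data (the column values and the name traits are hypothetical) and base R's model.matrix() to count how many coefficients a formula would request before and after converting the predictor back to numeric.

```r
# Hypothetical data: a numeric predictor that was read in as a factor
set.seed(42)
n <- 50
traits <- data.frame(
  present = rbinom(n, size = 1, prob = 0.5),
  PC1 = factor(as.character(round(rnorm(n), 3)))
)

# As a factor, PC1 requests an intercept plus one dummy per extra level
nlevels_before <- nlevels(traits$PC1)
n_cols_factor <- ncol(model.matrix(present ~ PC1, data = traits))

# Convert via character first; as.numeric() on a factor returns level codes
traits$PC1 <- as.numeric(as.character(traits$PC1))

# As a numeric variable, PC1 requests an intercept plus a single slope
n_cols_numeric <- ncol(model.matrix(present ~ PC1, data = traits))

print(c(factor = n_cols_factor, numeric = n_cols_numeric))
```

Running str() on the data frame before fitting is a quick way to catch this kind of mis-typed column.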

Related

backwards selection of glm does not change the complete model

I am very new to working with GLMs. I have a dataset with categorical (as factors) and numerical predictor variables, and the response variable is count data with a Poisson distribution. I put these in a glm:
glm2<- glm(formula = count ~ Salinity + Period + Intensity + Depth + Temp + Treatment, data = dfglm, family = "poisson")
Treatment(1.1 - 3.6) and Period (morning/midday) are factors.
The output looks like this:
I already see multiple surprising things in this output (a very big difference between the null deviance and the residual deviance, treatment 1.1 not showing, period morning and midday not shown as separate levels, very high standard errors), but I will continue for now.
For the backward selection I used this code:
backward<-step(glm2,direction="backward",trace=0)
summary(backward)
I got exactly the same output as given above. Also when checking backward$coefficients, all coefficients remained.
Lastly I tried this:
If anyone could give me advice/an interpretation of this output and how to make a better model with a working backward selection, it is greatly appreciated!
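One possible reading of the unchanged result, offered as an assumption rather than a diagnosis of this particular data set: step() only drops a term when doing so lowers the AIC, so it returns the full model whenever every single-term deletion makes the AIC worse. The sketch below uses simulated data (all variable names are hypothetical) and drop1() to show the comparison that step() makes at each stage.

```r
# Simulated Poisson data with two real predictors and one pure-noise predictor
set.seed(1)
n <- 200
sim <- data.frame(
  temp  = rnorm(n),
  depth = rnorm(n),
  noise = rnorm(n)
)
sim$count <- rpois(n, lambda = exp(0.5 + 0.4 * sim$temp - 0.3 * sim$depth))

full <- glm(count ~ temp + depth + noise, data = sim, family = poisson)

# AIC of the model after deleting each term in turn; the "<none>" row is
# the current model. step() removes a term only if its row beats "<none>".
deletions <- drop1(full, test = "Chisq")
print(deletions)

# Backward selection by AIC; the result can never have a higher AIC than
# the starting model.
reduced <- step(full, direction = "backward", trace = 0)
print(formula(reduced))
```

Calling step() with trace = 1 prints the same table at every stage, which shows why each term was kept.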

Clustering standard errors by a variable in a logistic regression - for graphing interaction plot (R)

I'm running a logistic regression/survival analysis where I cluster standard errors by a variable in the dataset. I'm using R.
Since this is not as straightforward as it is in Stata, I'm using a solution I found in the past: https://www.rdocumentation.org/packages/miceadds/versions/3.0-16/topics/lm.cluster
As an illustrative example of what I'm talking about:
model <- miceadds::glm.cluster(data = data, formula = outcome ~ a + b + c + years + years^2 + years^3, cluster = "cluster.id", family = "binomial")
This works well for getting the important values: it produces the coefficients, clustered standard errors, and z-values. It took me forever just to arrive at this solution, and even now it is not ideal (for example, it cannot output to stargazer). I've explored a lot of the other common suggestions on this issue, such as the Economic Theory solution (https://economictheoryblog.com/2016/12/13/clustered-standard-errors-in-r/); however, this is for lm() and I cannot get it to work for logistic regression.
I'm not beyond just running two models, one with glm() and one with glm.cluster() and replacing the standard errors in stargazer manually.
My concern is that I am at a loss as to how I would graph the above function, say if I were to do the following instead:
model <- miceadds::glm.cluster(data = data, formula = outcome ~ a*b + c + years + years^2 + years^3, cluster = "cluster.id", family = "binomial")
In this case, I want to graph a predicted probability plot to look at the interaction between a*b on my outcome; however, I cannot do so with the glm.cluster() object. I have to do it with a glm() model, but then my confidence intervals are awash.
I've been looking into a lot of the options on clustering standard errors for logistic regression around here, but am at a complete loss.
Has anyone found any recent developments on how to do so in R?
Are there any packages that let you cluster SE by a variable in the dataset and plot the objects? (Bonus points for interactions)
Any and all insight would be appreciated. Thanks!
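A side note on the formulas above, separate from the clustering question: inside an R formula, ^ denotes crossing of terms rather than arithmetic, so years^2 and years^3 reduce to years and the polynomial terms are silently dropped. The sketch below, using a hypothetical one-column data frame, compares the design matrix columns with and without I().

```r
# Hypothetical data frame with a single time variable
panel <- data.frame(years = 1:6)

# Without I(), the ^ operator is formula crossing, so only `years` survives
cols_plain <- colnames(model.matrix(~ years + years^2 + years^3, data = panel))
print(cols_plain)

# With I(), the powers are evaluated arithmetically and kept as separate terms
cols_powers <- colnames(
  model.matrix(~ years + I(years^2) + I(years^3), data = panel)
)
print(cols_powers)
```

poly(years, 3) is another way to request a cubic in years, using orthogonal polynomials by default.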

How to convert Afex or car ANOVA models to lmer? Observed variables

In the afex package we can find this example of ANOVA analysis:
data(obk.long, package = "afex")
# estimate mixed ANOVA on the full design:
# can be written in any of these ways:
aov_car(value ~ treatment * gender + Error(id/(phase*hour)), data = obk.long,
observed = "gender")
aov_4(value ~ treatment * gender + (phase*hour|id), data = obk.long,
observed = "gender")
aov_ez("id", "value", obk.long, between = c("treatment", "gender"),
within = c("phase", "hour"), observed = "gender")
My question is: how can I write the same model in lme4?
In particular, I don't know how to include the "observed" term.
If I just write
lmer(value ~ treatment * gender + (phase*hour|id), data = obk.long,
observed = "gender")
I get an error telling me that observed is not a valid option.
Furthermore, if I just remove the observed option lmer produces the error:
Error: number of observations (=240) <= number of random effects (=240) for term (phase * hour | id); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable.
Where in the lmer syntax do I specify the "between" or "within" variables? As far as I know, you just write the dependent variable on the left side and all other variables on the right side, with the error term as (1|id).
The package "car" uses the idata for the intra-subject variable.
I might not know enough about classical ANOVA theory to answer this question completely, but I'll take a crack. First, a couple of points:
the observed argument appears only to be relevant for the computation of effect size.
observed: ‘character’ vector indicating which of the variables are
observed (i.e, measured) as compared to experimentally
manipulated. The default effect size reported (generalized
eta-squared) requires correct specification of the obsered [sic]
(in contrast to manipulated) variables.
... so I think you'd be safe leaving it out.
if you want to override the error you can use
control=lmerControl(check.nobs.vs.nRE="ignore")
... but this probably isn't the right way forward.
I think but am not sure that this is the right way:
m1 <- lmer(value ~ treatment * gender + (1|id/phase:hour), data = obk.long,
control=lmerControl(check.nobs.vs.nRE="ignore",
check.nobs.vs.nlev="ignore"),
contrasts=list(treatment=contr.sum,gender=contr.sum))
This specifies that the interaction of phase and hour varies within id. The residual variance and (phase by hour within id) variance are confounded (which is why we need the overriding lmerControl() specification), so don't trust those particular variance estimates. However, the main effects of treatment and gender should be handled just the same. If you load lmerTest instead of lme4 and run summary(m1) or anova(m1), it gives you the same degrees of freedom (10) for the fixed (gender and treatment) effects that are computed by afex.
lme gives comparable answers, but needs to have the phase-by-hour interaction constructed beforehand:
library(nlme)
obk.long$ph <- with(obk.long,interaction(phase,hour))
m2 <- lme(value ~ treatment * gender,
random=~1|id/ph, data = obk.long,
contrasts=list(treatment=contr.sum,gender=contr.sum))
anova(m2,type="marginal")
I don't know how to reconstruct afex's tests of the random effects.
As Ben Bolker correctly says, simply leave observed out.
Furthermore, I would not recommend doing what you want to do. Using a mixed model for a data set without replications within each cell of the design per participant is somewhat questionable, as it is not really clear how to specify the random effects structure. Importantly, the Barr et al. maxim of "keep it maximal" does not work here, as you realized. The problem is that the model is overparametrized (hence the error from lmer).
I recommend using the ANOVA. More discussion on exactly this question can be found in a Cross Validated thread where Ben and I discussed this more thoroughly.

Testing a General Linear Hypothesis in R

I'm working my way through a Linear Regression Textbook and am trying to replicate the results from a section on the Test of the General Linear Hypothesis, but I need a little bit of help on how to do so in R.
I've already taken a look at a number of other posts, but am hoping someone can give me some example code. I have data on twenty-six subjects which has the following form:
Group, Weight (lb), HDL Cholesterol mg/decaliters
1,163.5,75
1,180,72.5
1,178.5,62
2,106,57.5
2,134,49
2,216.5,74
3,163.5,76
3,154,55.5
3,139,68
Given this data I am trying to test to see if the regression lines fit to the three groups of subjects have a common slope. The models postulated are:
Group 1: y = β0 + β1⋅x + ϵ
Group 2: y = γ0 + γ1⋅x + ϵ
Group 3: y = δ0 + δ1⋅x + ϵ
So the hypothesis of interest is H0: β1 = γ1 = δ1
I have been trying to do this using the linearHypothesis function in the car library, but have been having trouble knowing what the model object should be, and am not confident that this is the correct approach (or package) to be using.
Any help would be much appreciated – Thanks!
Tim, your question doesn't seem so much to be about R code. Instead, it appears that you have questions about how to test the interaction of your Group and Weight (lb) variables on the outcome HDL Cholesterol mg/decaliters. You don't state this specifically, but I'm taking a guess that these are your predictors and outcome, respectively.
So essentially, you're trying to see if the predictor Weight (lb) has differential effects depending on the level of the variable Group. This can be done in a number of ways using the linear model. A simple regression approach would be lm(hdl ~ 1 + group + weight + group*weight), with group coded as a factor. The coefficients for the interaction term group:weight (two of them, since group has three levels, so they should be tested jointly) would then tell you whether or not there is a significant interaction (i.e., moderation) effect.
However, I think we would have a major concern. In particular, we ought to worry that our hypothesized effect is that the group variable and the weight variable do not interact. That is, you're essentially predicting the null. Furthermore, you're predicting the null despite having a small sample size. Therefore, it would be rather unlikely that we have sufficient statistical power to detect an effect, even if there were one to be observed.
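The common-slope hypothesis can be tested jointly by comparing a model with one shared slope against a model with a separate slope per group. The sketch below is a minimal illustration using only the nine rows shown in the question (the full data set has 26 subjects); the column names group, weight, and hdl are assumed.

```r
# The nine rows shown in the question; the remaining subjects are not available
hdl_data <- data.frame(
  group  = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  weight = c(163.5, 180, 178.5, 106, 134, 216.5, 163.5, 154, 139),
  hdl    = c(75, 72.5, 62, 57.5, 49, 74, 76, 55.5, 68)
)

# Null model: separate intercepts, one common slope
common_slope <- lm(hdl ~ group + weight, data = hdl_data)

# Alternative: separate intercepts and separate slopes
separate_slopes <- lm(hdl ~ group * weight, data = hdl_data)

# F test of H0: all three slopes are equal (2 numerator degrees of freedom)
slope_test <- anova(common_slope, separate_slopes)
print(slope_test)
```

If car is installed, linearHypothesis(separate_slopes, c("group2:weight = 0", "group3:weight = 0")) should test the same restriction, since the two interaction coefficients are the slope differences from group 1.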

Mixed Modelling - Different Results between lme and lmer functions

I am currently working through Andy Field's book, Discovering Statistics Using R. Chapter 14 is on Mixed Modelling and he uses the lme function from the nlme package.
The model he creates, using speed dating data, is such:
speedDateModel <- lme(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality,
random = ~1|participant/looks/personality)
I tried to recreate a similar model using the lmer function from the lme4 package; however, my results are different. I thought I had the proper syntax, but maybe not?
speedDateModel.2 <- lmer(dateRating ~ looks + personality + gender +
looks:gender + personality:gender +
(1|participant) + (1|looks) + (1|personality),
data = speedData, REML = FALSE)
Also, when I run the coefficients of these models I notice that it only produces random intercepts for each participant. I was trying to then create a model that produces both random intercepts and slopes. I can't seem to get the syntax correct for either function to do this. Any help would be greatly appreciated.
The only difference between the lme and the corresponding lmer formula should be that the random and fixed components are aggregated into a single formula:
dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+ (1|participant/looks/personality)
Using (1|participant) + (1|looks) + (1|personality) is only equivalent if looks and personality have unique values at each nested level.
It's not clear what continuous variable you want to define your slopes: if you have a continuous variable x and groups g, then (x|g) or equivalently (1+x|g) will give you a random-slopes model (x should also be included in the fixed-effects part of the model, i.e. the full formula should be y~x+(x|g) ...)
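As a minimal sketch of the random-slopes syntax, here is the nlme version fitted to the Orthodont data set that ships with nlme, used as a stand-in because the speed dating data have no obvious continuous slope variable. The corresponding lme4 call is shown in a comment only.

```r
library(nlme)

# Orthodont: distance measured at several ages for each of 27 subjects
data(Orthodont, package = "nlme")

# Random intercepts only: one random effect per subject
m_int <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Random intercepts and slopes: age appears in both the fixed and random parts
m_slope <- lme(distance ~ age, random = ~ age | Subject, data = Orthodont)

# lme4 equivalent (not run here):
#   lmer(distance ~ age + (age | Subject), data = Orthodont)

# One row per subject; the second model adds a column for the age slope
re_int <- ranef(m_int)
re_slope <- ranef(m_slope)
print(head(re_slope))
```

coef(m_slope) combines the fixed and random parts into per-subject intercepts and slopes.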
update: I got the data, or rather a script file that allows one to reconstruct the data, from here. Field makes a common mistake in his book, which I have made several times in the past: since there is only a single observation in the data set for each participant/looks/personality combination, the three-way interaction has one level per observation. In a linear mixed model, this means the variance at the lowest level of nesting will be confounded with the residual variance.
You can see this in two ways:
lme appears to fit the model just fine, but if you try to calculate confidence intervals via intervals(), you get
intervals(speedDateModel)
## Error in intervals.lme(speedDateModel) :
## cannot get confidence intervals on var-cov components:
## Non-positive definite approximate variance-covariance
If you try this with lmer you get:
## Error: number of levels of each grouping factor
## must be < number of observations
In both cases, this is a clue that something's wrong. (You can overcome this in lmer if you really want to: see ?lmerControl.)
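The same problem can be spotted before fitting by counting the levels of each nested grouping factor against the number of observations. The sketch below uses a small made-up design (four participants, three hypothetical levels each of looks and personality) with one observation per cell, as in the speed dating data.

```r
# Hypothetical fully crossed design with exactly one observation per cell
design <- expand.grid(
  participant = factor(1:4),
  looks       = factor(c("low", "mid", "high")),
  personality = factor(c("dull", "some", "high"))
)
n_obs <- nrow(design)

# Levels of the lowest nesting term, participant:looks:personality
n_lowest <- nlevels(
  interaction(design$participant, design$looks, design$personality, drop = TRUE)
)

# Levels of the next term up, participant:looks
n_middle <- nlevels(
  interaction(design$participant, design$looks, drop = TRUE)
)

# The lowest term has as many levels as observations, so its variance cannot
# be separated from the residual variance; the middle term does not.
print(c(observations = n_obs, lowest = n_lowest, middle = n_middle))
```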
If we leave out the lowest grouping level, everything works fine:
sd2 <- lmer(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+
(1|participant/looks),
data=speedData)
Compare lmer and lme fixed effects:
all.equal(fixef(sd2),fixef(speedDateModel)) ## TRUE
The starling example here gives another example and further explanation of this issue.
