Converting random effect expression from SAS to R lmer syntax - r

The random-effects part of my mixed model, in SAS PROC MIXED syntax, looks like this:
random intercept color size
/type = vc subject = group solution;
I converted it to R lmer syntax as follows:
((1|group) + (0 + color|group)) + ((1|group) + (0 + size|group))
Is it correct?
Can I instead represent the SAS random-effects specification as follows:
(1|group) + (0 + color|group) + (0 + size|group) ?
Or is it a wrong implementation in R?

I'm unfamiliar with the SAS syntax, but a quick look at the SAS documentation for the PROC MIXED arguments SUBJECT = and TYPE gives the following info:
identifies the subjects in your mixed model. Complete independence is assumed across subjects; thus, for the RANDOM statement, the SUBJECT= option produces a block-diagonal structure in G with identical blocks. The Z matrix is modified to accommodate this block diagonality. In fact, specifying a subject effect is equivalent to nesting all other effects in the RANDOM statement within the subject effect.
and
specifies standard variance components and is the default structure for both the RANDOM and REPEATED statements. In the RANDOM statement, a distinct variance component is assigned to each effect.
Using table 2 (page 7) in the lme4 vignette on fitting mixed-effects models, we are looking for the following statement:
library(lme4)
# use glmer() instead of lmer() for a non-Gaussian outcome
fit <- lmer(outcome ~ fixed_effects + (1|subject/color) + (1|subject/size), data = data)
It has been a while, so I don't remember whether parentheses work inside the random-effects term so that this can be reduced to (1|subject/(color + size)). The formula states that "color and size are random intercepts, nested within subject".
Please note that I filled in the entire call. You will have to change outcome, fixed_effects and data to suit your data; subject corresponds to group in the question.
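The nested formula above treats color and size as grouping factors, and each subject/x term expands to subject + subject:x, so it contains the subject intercept twice. If color and size are instead numeric covariates (an assumption, since the question does not say), TYPE=VC with SUBJECT=group corresponds to one independent variance component each for the intercept, color and size within group. A minimal sketch on simulated data, where every variable name and number is made up for illustration:

```r
library(lme4)

# Simulated data; all names and values below are hypothetical.
set.seed(1)
n_group <- 30
n_per   <- 20
d <- data.frame(
  group = factor(rep(seq_len(n_group), each = n_per)),
  color = rnorm(n_group * n_per),
  size  = rnorm(n_group * n_per)
)

# Independent group-level deviations for intercept, color slope and size slope
b0 <- rnorm(n_group, sd = 1.0)
b1 <- rnorm(n_group, sd = 0.7)
b2 <- rnorm(n_group, sd = 0.5)
g  <- as.integer(d$group)
d$outcome <- 2 + 0.5 * d$color - 0.3 * d$size +
  b0[g] + b1[g] * d$color + b2[g] * d$size + rnorm(nrow(d))

# Explicit form: one variance component per effect, no covariances
fit_long <- lmer(outcome ~ color + size +
                   (1 | group) + (0 + color | group) + (0 + size | group),
                 data = d)

# Double-bar shorthand for the same structure (numeric covariates only)
fit_short <- lmer(outcome ~ color + size + (1 + color + size || group),
                  data = d)

vc_long  <- as.data.frame(VarCorr(fit_long))
vc_short <- as.data.frame(VarCorr(fit_short))
print(vc_long[, c("grp", "var1", "vcov")])
```

If color and size are factors, the double-bar shorthand does not separate the terms in the same way, and something like (1|group) + (1|group:color) + (1|group:size) is probably closer to the SAS specification.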

Related

Syntax error when fitting a Bayesian logistic regression

I am attempting to model binary species traits, where presence is represented by 1 and absence by 0, as a function of some sampling variables. To accomplish this, I have constructed a brms model and added a phylogenetic structure to it. Here is the model I used:
model <- brms::brm(male_head | trials(1 + 0) ~
PC1 + PC2 + PC3 +
(1|gr(phylo, cov = covariance_matrix)),
data = data,
family = binomial(),
prior = prior,
data2 = list(covariance_matrix = covariance_matrix))
Each line of my df represents one observation with a binary outcome.
Initially, I was unsure about which arguments to use in the trials() function. Since my species are non-repeated and some have the traits I'm modeling while others do not, I thought that trials(1 + 0) might be appropriate. I recall seeing a vignette that suggested this, but I can't find it now. Is this syntax correct?
Furthermore, for some reason I'm unaware of, the model is producing one estimate for each row of my predictors. As my df has 362 rows, the model summary displays a lengthy list of 362 estimates. I would prefer to have one estimate for each sampling variable instead. Although I have managed to achieve this by making the treatment effects random effects (i.e., (1|PC1) + (1|PC2) + (1|PC3)), I don't think this is the appropriate approach. I also tried bernoulli(), but had no success either. Do you have any suggestions for how I can address this issue?
EDIT:
For some reason the values of my sampling variables/principal components were being read as factors. The second part of this question was solved.

Heteroskedasticity and Autocorrelation Standard errors for Least Square Dummy Variables (LSDV)

I have a panel data set with N = 17 Spanish regions and T = 32 years, and I want to fit a fixed-effects model that controls for individual heterogeneity. However, as I have 2 time-invariant independent variables, I can't use the within estimator from plm() because it drops them. Thus, I must use the LSDV estimator, like this:
mcorr <- lm(subv ~ preelec + elec + postelec + ideo + ali + crec_pib + pob + pob16 + pob64 + factor(ccaa)-1, data = datos)
where ccaa is the name of the variable that indicates the individual (region). Of course, the coefficient estimates are the same as if I fitted the same model using plm() and the within estimator.
Nevertheless, when I use robust standard errors to correct for heteroskedasticity and autocorrelation in panel data, I get different standard errors, for example when using coeftest(mcorr, vcovHC(mcorr), method = "arellano"). When I use another alternative for the LSDV model, the command vcovHAC(), the standard errors are similar, but still not identical.
Which is the best way to account for that heteroskedasticity and autocorrelation while using the LSDV method?

How to define random effects in the linear mixed effects model?

I read a paper which applied a linear mixed-effects model for data analysis. I am confused about how to define the random effects in its equations.
First, how do I define a combined random effect, such as ε_{field-stability}, where field indicates the plot number and stability indicates some classification result?
Second, how do I include a random effect in the slope term, such as intercept + slope * (var1 + random effect) + residuals?
I do not know how to write code to represent these equations.
I am looking for an expression of these equations in code.
Like Nate mentioned, the lme4 package will do all that you'd need. Their vignette here will have the examples for your answer, particularly section 2.2.
Simple random effects can be written as (1 | group), which adds a group-specific random intercept. A random slope for a predictor x that varies by group, alongside the random intercept, can be written as (1 + x | group); x should then also appear as a fixed effect.
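As a minimal sketch of the two forms, here is the sleepstudy data that ships with lme4, with Subject playing the role of group and Days the role of x. It is a stand-in for the paper's data, which is not available here.

```r
library(lme4)

# Random intercept only: each Subject gets its own baseline
fit_int <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Correlated random intercept and random slope for Days within Subject
fit_slope <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# A combined grouping factor such as the question's field-by-stability effect
# would be written (1 | field:stability); those names are hypothetical here.

print(VarCorr(fit_slope))        # variances, correlation and residual SD
head(coef(fit_slope)$Subject)    # subject-specific intercepts and slopes
```

In fit_int the Days coefficient is the same for every subject, while in fit_slope each subject has its own intercept and its own slope.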

Mixed Modelling - Different Results between lme and lmer functions

I am currently working through Andy Field's book, Discovering Statistics Using R. Chapter 14 is on Mixed Modelling and he uses the lme function from the nlme package.
The model he creates, using speed dating data, is such:
speedDateModel <- lme(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality,
random = ~1|participant/looks/personality)
I tried to recreate a similar model using the lmer function from the lme4 package; however, my results are different. I thought I had the proper syntax, but maybe not?
speedDateModel.2 <- lmer(dateRating ~ looks + personality + gender +
looks:gender + personality:gender +
(1|participant) + (1|looks) + (1|personality),
data = speedData, REML = FALSE)
Also, when I run the coefficients of these models I notice that it only produces random intercepts for each participant. I was trying to then create a model that produces both random intercepts and slopes. I can't seem to get the syntax correct for either function to do this. Any help would be greatly appreciated.
The only difference between the lme and the corresponding lmer formula should be that the random and fixed components are aggregated into a single formula:
dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+ (1|participant/looks/personality)
Using (1|participant) + (1|looks) + (1|personality) is only equivalent if looks and personality have unique values at each nested level, i.e. no label is reused across participants.
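The labelling point can be checked on the Pastes data from lme4, used here as a stand-in for the speed dating data. In Pastes the cask labels are reused in every batch, whereas sample labels each batch/cask pair uniquely. A minimal sketch:

```r
library(lme4)

# Nested specification: expands to (1 | batch) + (1 | batch:cask)
fit_nested <- lmer(strength ~ 1 + (1 | batch/cask), data = Pastes)

# Same model written with the uniquely labelled factor
fit_unique <- lmer(strength ~ 1 + (1 | batch) + (1 | sample), data = Pastes)

# Not the same model: cask labels repeat across batches, so this treats
# cask as crossed with batch (only 3 cask levels in total)
fit_crossed <- lmer(strength ~ 1 + (1 | batch) + (1 | cask), data = Pastes)

ll <- c(nested  = as.numeric(logLik(fit_nested)),
        unique  = as.numeric(logLik(fit_unique)),
        crossed = as.numeric(logLik(fit_crossed)))
print(ll)
print(ngrps(fit_nested))
print(ngrps(fit_crossed))
```

The first two fits should agree, while the third has a different grouping structure and a different log-likelihood.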
It's not clear what continuous variable you want to define your slopes: if you have a continuous variable x and groups g, then (x|g) or equivalently (1+x|g) will give you a random-slopes model (x should also be included in the fixed-effects part of the model, i.e. the full formula should be y~x+(x|g) ...)
update: I got the data, or rather a script file that allows one to reconstruct the data, from here. Field makes a common mistake in his book, which I have made several times in the past: since there is only a single observation in the data set for each participant/looks/personality combination, the three-way interaction has one level per observation. In a linear mixed model, this means the variance at the lowest level of nesting will be confounded with the residual variance.
You can see this in two ways:
lme appears to fit the model just fine, but if you try to calculate confidence intervals via intervals(), you get
intervals(speedDateModel)
## Error in intervals.lme(speedDateModel) :
## cannot get confidence intervals on var-cov components:
## Non-positive definite approximate variance-covariance
If you try this with lmer you get:
## Error: number of levels of each grouping factor
## must be < number of observations
In both cases, this is a clue that something's wrong. (You can overcome this in lmer if you really want to: see ?lmerControl.)
If we leave out the lowest grouping level, everything works fine:
sd2 <- lmer(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+
(1|participant/looks),
data=speedData)
Compare lmer and lme fixed effects:
all.equal(fixef(sd2),fixef(speedDateModel)) ## TRUE
The starling example here gives another example and further explanation of this issue.

Calculating the standard error of parameters in nlme

I am running a non-linear mixed model in nlme, and I am having trouble calculating the standard errors of the three parameters. We have our final model here:
shortG.nlme9 <- update(shortG.nlme6,
fixed = Asym + xmid + scal ~ Treatment * Breed + Environment,
start = c(shortFix6[1:16], rep(0,2),
shortFix6[17:32], rep(0,2),
shortFix6[33:48], rep(0,2)),
control = nlmeControl(pnlsTol = 0.02, msVerbose = TRUE))
When we pass it to summary(), we get the standard errors for each treatment, breed, treatment*breed interaction, and environment. However, we want to draw growth curves for specific combinations (treatment1/breed1, treatment2/breed1, treatment3/breed1, etc.), so we need to combine the effects of treatment, breed, and environment to get each parameter value, and combine their standard errors accordingly to get the SE of the full parameter. Is there a way to get R to compute the full SE on its own, or an easy way to have R give us a covariance matrix so we can calculate the values by hand? summary(shortG.nlme9) automatically prints a correlation matrix, so is there something we could call to get a covariance matrix instead?
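One possible route, sketched here under assumptions: vcov() on an nlme fit returns the estimated covariance matrix of the fixed effects, and the standard error of a linear combination w'β is sqrt(w' V w). The model below is the Loblolly example from the nlme documentation, standing in for shortG.nlme9. The weight vector w is hypothetical; in the real model it would hold 1 (or the covariate value) for each coefficient that enters the combined parameter and 0 elsewhere.

```r
library(nlme)

# Example model from the nlme documentation, standing in for shortG.nlme9
fit <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
            data = Loblolly,
            fixed = Asym + R0 + lrc ~ 1,
            random = Asym ~ 1,
            start = c(Asym = 103, R0 = -8.5, lrc = -3.3))

V    <- vcov(fit)    # estimated covariance matrix of the fixed effects
beta <- fixef(fit)   # fixed-effect estimates, in the same order as V

# Hypothetical weights selecting the coefficients to be summed
w <- c(Asym = 1, R0 = 1, lrc = 0)

est <- sum(w * beta)                    # combined estimate
se  <- sqrt(drop(t(w) %*% V %*% w))     # SE of the combination

print(V)
print(c(estimate = est, se = se))
```

Because the off-diagonal covariances enter the calculation, this generally differs from adding the individual standard errors or their squares.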
