I know this is a kinda old question, but I just wonder if this has a solution now.
I usually perform the mixed effects model with lme4 package with lmer function. However, I know this function does not allow me to include the negative variance components in the model.
I would really want to include the negative variances in model with R. Does anyone have any suggestions that what packages I would use? Or, does the new lme4 allow it?
New lme4 doesn't allow it, nor does nlme. It looks like ASReml might do it, if you set the IGU argument as described here -- but ASReml is commercial, so you'd have to buy a license.
The more statistically sound way to deal with this situation is typically to fit a compound symmetry variance structure, which will allow negative correlations within groups. You can do this via nlme, or in several somewhat more experimental frameworks (e.g. the "flexLambda" development branch of lme4).
More discussion from the mailing list here.
Related
Unfortunately, I had convergence (and singularity) issues when calculating my GLMM analysis models in R. When I tried it in SPSS, I got no such warning message and the results are only slightly different. Does it mean I can interpret the results from SPSS without worries? Or do I have to test for singularity/convergence issues to be sure?
You have two questions. I will answer both.
First Question
Does it mean I can interpret the results from SPSS without worries?
You do not want to do this. The reason being is that mixed models have a very specific parameterization. Here is a screenshot of common lme4 syntax from the original article about lme4 from the author:
With this comes assumptions about what your model is saying. If for example you are running a model with random intercepts only, you are assuming that the slopes do not vary by any measure. If you include correlated random slopes and random intercepts, you are then assuming that there is a relationship between the slopes and intercepts that may either be positive or negative. If you present this data as-is without knowing why it produced this summary, you may fail to explain your data in an accurate way.
The reason as highlighted by one of the comments is that SPSS runs off defaults whereas R requires explicit parameters for the model. I'm not surprised that the model failed to converge in R but not SPSS given that SPSS assumes no correlation between random slopes and intercepts. This kind of model is more likely to converge compared to a correlated model because the constraints that allow data to fit a correlated model make it very difficult to converge. However, without knowing how you modeled your data, it is impossible to actually know what the differences are. Perhaps if you provide an edit to your question that can be answered more directly, but just know that SPSS and R do not calculate these models the same way.
Second Question
Or do I have to test for singularity/convergence issues to be sure?
SPSS and R both have singularity checks as a default (check this page as an example). If your model fails to converge, you should drop it and use an alternative model (usually something that has a simpler random effects structure or improved optimization).
I have a dataset in which individuals, each belonging to a particular group, repeatedly chose between multiple discrete outcomes.
subID group choice
1 Big A
1 Big B
2 Small B
2 Small B
2 Small C
3 Big A
3 Big B
. . .
. . .
I want to test how group membership influences choice, and want to account for non-independence of observations due to repeated choices being made by the same individuals. In turn, I planned to implement a mixed multinomial regression treating group as a fixed effect and subID as a random effect. It seems that there are a few options for multinomial logits in R, and I'm hoping for some guidance on which may be most easily implemented for this mixed model:
1) multinom - GLM, via nnet, allows the usage of the multinom function. This appears to be a nice, clear, straightforward option... for fixed effect models. However is there a manner to implement random effects with multinom? A previous CV post suggests that multinom is able to handle mixed-effects GLM with poisson distribution and a log link. However, I don't understand (a) why this is the case or (b) the required syntax. Can anyone clarify?
2) mlogit - A fantastic package, with incredibly helpful vignettes. However, the "mixed logit" documentation refers to models that have random effects related to alternative specific covariates (implemented via the rpar argument). My model has no alternative specific variables; I simply want to account for the random intercepts of the participants. Is this possible with mlogit? Is that variance automatically accounted for by setting subID as the id.var when shaping the data to long form with mlogit.data? EDIT: I just found an example of "tricking" mlogit to provide random coefficients for variables that vary across individuals (very bottom here), but I don't quite understand the syntax involved.
3) MCMCglmm is evidently another option. However, as a relative novice with R and someone completely unfamiliar with Bayesian stats, I'm not personally comfortable parsing example syntax of mixed logits with this package, or, even following the syntax, making guesses at priors or other needed arguments.
Any guidance toward the most straightforward approach and its syntax implementation would be thoroughly appreciated. I'm also wondering if the random effect of subID needs to be nested within group (as individuals are members of groups), but that may be a question for CV instead. In any case, many thanks for any insights.
I would recommend the Apollo package by Hess & Palma. It comes with a great documentation and a quite helpful user group.
I have recently run an ensemble classifier in MLR (R) of a multicenter data set. I noticed that the ensemble over three classifiers (that were trained on different data modalities) was worse than the best classifier.
This seemed to be unexpected to me. I was using logistic regressions (without any parameter optimization) as simple classifier and a Partial Least Squares (PLS) Discriminant Analysis as a superlearner, since the base-learner predictions ought to be correlated. I also tested different superlearners like NB, and logistic regression. The results did not change.
Here are my specific questions:
1) Do you know, whether this can in principle occur?
(I also googled a bit and found this blog that seems to indicate that it can:
https://blogs.sas.com/content/sgf/2017/03/10/are-ensemble-classifiers-always-better-than-single-classifiers/)
2) Especially, if you are as surprised as I was, do you know of any checks I could do in mlr to make sure, that there isnt a bug. I have tried to use a different cross-validation scheme (originally I used leave-center-out CV, but since some centers provided very little data, I wasnt sure, whether this might lead to weird model fits of the super learner), but it still holds. I also tried to combine different data modalities and they give me the same phenomenon.
I would be grateful to hear, whether you have experienced this and if not, whether you know what the problem could be.
Thanks in advance!
Yes, this can happen - ensembles do not always guarantee a better result. More details regarding cases where this can happen are discussed also in this cross-validate question
I'm working with a large longitudinal dataset of firm-year observations. For some time now I have been using lme4to implement crossed (non-nested) effects for year and firm-ID groups.
My goal is now to correct for the serial correlation in the firm-group dimension. Based on chl's and fabians' answers to this question (as well as Ben Bolker's comment on the latter), I've assumed this is impossible with lmer(), but is feasible with nlme::lme().
I have been able to implement crossed effects in nlme based on the discussion in Pinheiro & Bates (2000, sec. 4.2.2, pp. 163-6). In principle then, I believe I can use the correlation = AR1() speficiation in lme() to control for autocorrelation.
My strong preference, however, would be to implement such a correlation specification in lmer() because:
lme4 is much, much (much) faster
nlme requires crossed effects to be nested in some higher group -- without such a higher level grouping I'm forced to create an arbitrary dummy for groupedData to which all observations belong (e.g., here). This creates issues interpreting the relative levels of variation between the two crossed groups and the residual variance because some of the variation appears to be captured by the higher-level dummy group.
I got excited when I found the feature request #224 on GitHub, but alas it doesn't seem like there's much movement on the flexLambda front (please let me know if I'm wrong!).
lme4 v1.1-10
I've just noticed that the latest (Oct. 2015) version of lme4 contains a vcconv command that can
Convert between representation of (co-)variance structures (EXPERIMENTAL.)
Based on the source code, it seems that maybe the sdcor2cov option could allow one to specify a correlation structure such as AR(1).
So my questions are:
Is this interpretation of the vcconv function correct?
If so, does the user supply the correlation (e.g., AR(1)) parameters or are they determined internally in lmer()?
How does one implement this function properly?
How can one see in a Boosted trees classification model for machine learning (adaboost), which variables interact with each other and how much? I would like to make use of this in R gbm package if possible.
To extract the interaction between input variables, you can use any package like lm. http://www.r-bloggers.com/r-tutorial-series-regression-with-interaction-variables/
You can use ?interact.gbm. See also this cross-validated question, which directs to a vignette of a related technique from the package dismo.
In general, these interactions may not necessarily agree with the interaction terms estimated in a linear model.