Computational error with lmerTest and undefined columns selected when using mixed() - r

I want to fit a mixed effects linear regression. The dependent variable is acceptability judgments on a 4-point rating scale (Totally unacceptable to Totally acceptable). These judgments have been assigned a numeric value (1, 2, 3, 4) and that vector was centered and scaled.
I call the model with the following code:
ln1 = lmer(RatingNorm ~ Group + ProfScore + RegularityInflectedForm + RegRhyme*SimilarityReal + Tense + VerbClass + (1|SubjectID) + (1|Infinitive), data=AJT1)
Then try for p values with:
mixed(ln1, AJT1)
No error messages appear when fitting the model, but using mixed() from the afex package to get p values produces a strange error message.
Fitting one lmer() model. [DONE]
Calculating p-values.
anova from lme4 is returned
some computational error has occurred in lmerTest
Error in `[.data.frame`(anova_table, , c("NumDF", "DenDF", "F.value", :
undefined columns selected
The same error appears when I fit the same model using the lmerTest package. I have also tried simpler models with only one of the fixed effects (just Group or just Tense, which are categorical, or just ProfScore, which is continuous), as well as models with only one of the two random effects; the error always recurs. However, I am able to use anova(model) to see p values. I would like to know why I cannot use mixed() successfully in this case. I have the most recent version of R installed and have not found any posts describing similar errors in this kind of scenario.
Here are links to code and dataset:
R code
Dataset
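For what it's worth, afex::mixed() is documented to take a model formula and a data frame as its first two arguments, not an already-fitted lmer object, so one thing to try is passing the formula directly. A minimal sketch with simulated stand-in data (RatingNorm, Group, SubjectID and Infinitive are placeholders mirroring the columns in AJT1):

```r
library(afex)

# Simulated stand-in for AJT1:
set.seed(1)
d <- data.frame(
  RatingNorm = rnorm(120),
  Group      = factor(rep(c("L1", "L2"), each = 60)),
  SubjectID  = factor(rep(1:12, each = 10)),
  Infinitive = factor(rep(1:10, times = 12))
)

# Pass the formula and data to mixed() directly, rather than a
# fitted model object; method = "S" requests Satterthwaite p values.
m <- mixed(RatingNorm ~ Group + (1 | SubjectID) + (1 | Infinitive),
           data = d, method = "S")
nice(m)
```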

I am getting a similar problem when trying to use piecewiseSEM in R. It runs fine if I eliminate the mixed-model portion of the DVpopStand model (and use lm instead).
dumpster = list(
lmer(DVpopStand ~ Treatment + Site + (1|fLine/fBowl)),
lmer(centroidStand ~ Treatment + Site+ (1|fLine/fBowl)),
lmer(Crush ~ Treatment + Site + centroidStand + DVpopStand + (1|fLine/fBowl))
)
Model.result = sem.fit(dumpster, data= F1culled)
And get the error message:
summary from lme4 is returned
some computational error has occurred in lmerTest
Error in [.data.frame(ret, 3:4) : undefined columns selected

Related

Converting random effect expression from SAS to R lmer syntax

Random effects of my mixed models formula in SAS proc mixed syntax looks like this:
random intercept color size
/type = vc subject = group solution;
I converted it to R lmer syntax as follows:
((1|group) + (0 + color|group)) + ((1|group) + (0 + size|group))
Is it correct?
Can I represent the SAS random-effects formula as follows:
(1|group) + (0 + color|group) + (0 + size|group) ?
Or is it a wrong implementation in R?
I'm unfamiliar with the SAS syntax, but a quick look at the SAS documentation for the PROC MIXED arguments SUBJECT = and TYPE gives the following info:
identifies the subjects in your mixed model. Complete independence is assumed across subjects; thus, for the RANDOM statement, the SUBJECT= option produces a block-diagonal structure in G with identical blocks. The Z matrix is modified to accommodate this block diagonality. In fact, specifying a subject effect is equivalent to nesting all other effects in the RANDOM statement within the subject effect.
and
specifies standard variance components and is the default structure for both the RANDOM and REPEATED statements. In the RANDOM statement, a distinct variance component is assigned to each effect
Using Table 2 (page 7) in the lme4 vignette on fitting mixed-effects models, we are looking for the following statement:
library(lme4)
fit <- (g)lmer(outcome ~ fixed_effects + (1|subject/color) + (1|subject/size), data = data)
It has been a while, so I don't remember whether the parentheses work out in the random effects such that this can be reduced to (1|subject/(color + size)). This states that "color and size are random intercepts, nested within subject".
Please note that I filled in the entire call; you would have to change outcome, fixed_effects and data to suit your data.
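As a base-R illustration of what the nesting shorthand does (a sketch with made-up factors): g1/g2 in a formula expands to g1 + g1:g2, so (1|subject/color) means (1|subject) + (1|subject:color).

```r
# The nesting operator: subject/color expands to
# subject + subject:color, so (1|subject/color) means
# (1|subject) + (1|subject:color).
subject <- factor(rep(1:3, each = 4))
color   <- factor(rep(c("red", "blue"), times = 6))

# The nested grouping factor that the model builds internally:
nested <- interaction(subject, color, drop = TRUE)
nlevels(subject)  # 3 subject intercepts
nlevels(nested)   # 6 subject:color intercepts
```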

Running the same glm model with caret provides different accuracy and errors

As stated in the title, running the same glm model with caret returns different accuracies and different warnings: sometimes none, sometimes "glm.fit: fitted probabilities numerically 0 or 1 occurred", and sometimes "prediction from a rank-deficient fit may be misleading". If I set the seed and always run the seed and then the model, I predictably get the same accuracy and the same warning (or lack of one).
When running the same model with the glm() function, the coefficients are always the same (as with caret), but I never get any of these warnings. Should I interpret this as an artifact of resampling, or can the warnings produced by caret's glm carry important meaning even though they depend on the seed?
I've searched for this, and though I assume it has something to do with resampling, I cannot quite understand how that works and would like help understanding it. Also, since I'm trying to use the caret package for all the modelling, I would like to know whether I should instead always start by running glm() directly, as that produces the same warning straight away no matter the seed.
Data is from a client, so I'd prefer not to share it. The formula I'm using is (example) simply train(Y ~ X + Z + A, data = df, method = "glm") for the caret version and glm(Y ~ X + Z + A, data = df, family = binomial()) in the glm() function.
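One way to separate the two sources of warnings is to turn resampling off. A sketch with simulated stand-in data (Y, X, Z and A mirror the variables in the real formula):

```r
library(caret)

set.seed(42)
df <- data.frame(
  Y = factor(sample(c("yes", "no"), 200, replace = TRUE)),
  X = rnorm(200), Z = rnorm(200), A = rnorm(200)
)

# With trainControl(method = "none"), train() skips resampling and
# fits the final model exactly once, so the result is deterministic
# regardless of seed; any glm.fit warnings that still appear then
# come from the final fit itself, not from one unlucky CV/bootstrap
# split.
fit <- train(Y ~ X + Z + A, data = df, method = "glm",
             trControl = trainControl(method = "none"))
coef(fit$finalModel)
```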

Get test error in a logistic regression model in R

I'm performing some experiments with logistic regression in R with the Auto dataset included in R.
I've split the data into a training part (80%) and a test part (20%), normalizing each part individually.
I can create the model without any problem with the line:
mlr <- glm(mpg ~ displacement + horsepower + weight, data = train)
I can even predict train$mpg with the train set:
trainpred<-predict(mlr,train,type="response")
And with this calculate the sample error:
etab <- table(trainpred, train[,1])
insampleerror<-sum(diag(etab))/sum(etab)
The problem comes when I want to predict with the test set. I use the following line:
testpred <- predict(mlr, test, type = "response")
Which gives me this warning:
'newdata' had 79 rows but variables found have 313 rows
but it doesn't work, because testpred has the same length as trainpred (it should be shorter). When I try to calculate the test error using testpred with the following line:
etabtest <- table(testpred, test[,1])
I get the following error:
Error in table(testpred, test[, 1]) :
all arguments must have the same length
What am I doing wrong?
I'll answer my own question in case someone has the same problem:
The arguments I pass to glm() determine what I am modelling, namely Auto$mpg using the training rows, so my glm call must be:
attach(Auto)
mlr <- glm(mpg ~ displacement + horsepower + weight, data = Auto, subset = indexes_train)
If I now call predict, table, etc., there is no problem with structure sizes. After fixing this mistake it works for me.
As imo says:
"More importantly, you might check that this creates a logistic regression. I think it is actually OLS. You have to set the link and family arguments."
set family = binomial()
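Putting both fixes together, here is a minimal runnable sketch with a simulated stand-in for the Auto data (mpg01 is an assumed binary recoding of mpg, since logistic regression needs a binary response):

```r
# Simulated stand-in for the Auto data frame:
set.seed(1)
Auto_sim <- data.frame(
  displacement = runif(392, 70, 450),
  horsepower   = runif(392, 45, 230),
  weight       = runif(392, 1600, 5100)
)
Auto_sim$mpg01 <- rbinom(392, 1, plogis(1 - Auto_sim$weight / 2000))

idx_train <- sample(nrow(Auto_sim), round(0.8 * nrow(Auto_sim)))
test      <- Auto_sim[-idx_train, ]

# Fit on the full frame with subset=, and make it logistic with
# family = binomial:
mlr <- glm(mpg01 ~ displacement + horsepower + weight,
           data = Auto_sim, subset = idx_train, family = binomial)

# predict() now returns one probability per test row:
testpred  <- predict(mlr, newdata = test, type = "response")
testclass <- ifelse(testpred > 0.5, 1, 0)
etabtest  <- table(testclass, test$mpg01)
test_accuracy <- sum(diag(etabtest)) / sum(etabtest)
```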

Mixed Modelling - Different Results between lme and lmer functions

I am currently working through Andy Field's book, Discovering Statistics Using R. Chapter 14 is on Mixed Modelling and he uses the lme function from the nlme package.
The model he creates, using speed dating data, is such:
speedDateModel <- lme(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality,
random = ~1|participant/looks/personality)
I tried to recreate a similar model using the lmer function from the lme4 package; however, my results are different. I thought I had the proper syntax, but maybe not?
speedDateModel.2 <- lmer(dateRating ~ looks + personality + gender +
looks:gender + personality:gender +
(1|participant) + (1|looks) + (1|personality),
data = speedData, REML = FALSE)
Also, when I look at the coefficients of these models, I notice that they only produce random intercepts for each participant. I was trying to then create a model that produces both random intercepts and slopes, but I can't seem to get the syntax right in either function. Any help would be greatly appreciated.
The only difference between the lme and the corresponding lmer formula should be that the random and fixed components are aggregated into a single formula:
dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+ (1|participant/looks/personality)
using (1|participant) + (1|looks) + (1|personality) is only equivalent if looks and personality have unique values at each nested level.
It's not clear what continuous variable you want to define your slopes: if you have a continuous variable x and groups g, then (x|g) or equivalently (1+x|g) will give you a random-slopes model (x should also be included in the fixed-effects part of the model, i.e. the full formula should be y~x+(x|g) ...)
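A minimal random-slopes sketch using lme4's built-in sleepstudy data (Days plays the role of the continuous x, Subject the grouping factor g):

```r
library(lme4)

# Random intercept and random Days slope for each Subject;
# Days also appears in the fixed effects, as advised above.
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Per-subject intercepts and slopes:
head(coef(fm)$Subject)
```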
update: I got the data, or rather a script file that allows one to reconstruct the data, from here. Field makes a common mistake in his book, which I have made several times in the past: since there is only a single observation in the data set for each participant/looks/personality combination, the three-way interaction has one level per observation. In a linear mixed model, this means the variance at the lowest level of nesting will be confounded with the residual variance.
You can see this in two ways:
lme appears to fit the model just fine, but if you try to calculate confidence intervals via intervals(), you get
intervals(speedDateModel)
## Error in intervals.lme(speedDateModel) :
## cannot get confidence intervals on var-cov components:
## Non-positive definite approximate variance-covariance
If you try this with lmer you get:
## Error: number of levels of each grouping factor
## must be < number of observations
In both cases, this is a clue that something's wrong. (You can overcome this in lmer if you really want to: see ?lmerControl.)
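The escape hatch mentioned there looks roughly like this (a sketch; usually inadvisable, for exactly the confounding reason described above):

```r
library(lme4)

# Relax the checks that stop lmer when a grouping factor has as
# many levels as there are observations:
ctrl <- lmerControl(check.nobs.vs.nlev = "ignore",
                    check.nobs.vs.nRE  = "ignore")
# m <- lmer(dateRating ~ ... + (1|participant/looks/personality),
#           data = speedData, control = ctrl)
```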
If we leave out the lowest grouping level, everything works fine:
sd2 <- lmer(dateRating ~ looks + personality +
gender + looks:gender + personality:gender +
looks:personality+
(1|participant/looks),
data=speedData)
Compare lmer and lme fixed effects:
all.equal(fixef(sd2),fixef(speedDateModel)) ## TRUE
The starling example here gives another example and further explanation of this issue.

Using lme4 glmer function for unbalanced treatment comparison results in variable length error

I am using the lme4 package to run a generalized linear mixed model for proportion data using a binary response. I have unequal sample sizes for my treatments and am getting the following error, which I understand is due to the very fact that I have unequal sample sizes:
Error in model.frame.default(data = POL3, drop.unused.levels = TRUE,
formula = X2 ~ : variable lengths differ (found for 'Trtmt')
Here is the code that leads to the error:
#Excluding NA from the data set
POL3<-na.exclude(POL)
#Indicating the binary response
X2<-cbind(POL3$CHSd, POL3$TotSd-POL3$CHSd)
#Running the model
MMCHS4<-glmer(X2~Trtmt+(1|BSD)+(1|Hgt), family=binomial, data=POL3)
I have read that lme4 can deal with unbalanced samples but can't get this to work.
Impossible to say for sure without a reproducible example, but you probably need to make sure that the Trtmt variable is contained within POL3 (i.e., that there isn't another Trtmt variable lying around in your global workspace).
I would probably implement the model in this way:
glmer(CHSd/TotSd~Trtmt+(1|BSD)+(1|Hgt),
weights=TotSd,
family=binomial,
na.action=na.exclude,
data=POL)
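To see that the proportion-plus-weights form fits the same binomial model as a two-column cbind() response, here is a sketch with simulated stand-in data (Trtmt, BSD, TotSd and CHSd mimic the real columns; the Hgt term is dropped for brevity):

```r
library(lme4)

# Simulated counts: CHSd successes out of TotSd trials.
set.seed(1)
d <- data.frame(
  Trtmt = factor(rep(c("A", "B", "C"), each = 20)),
  BSD   = factor(rep(1:10, times = 6)),
  TotSd = rpois(60, 30) + 1
)
d$CHSd <- rbinom(60, d$TotSd, 0.4)

# Proportion response with weights = number of trials:
m1 <- glmer(CHSd / TotSd ~ Trtmt + (1 | BSD),
            weights = TotSd, family = binomial, data = d)
# Equivalent two-column (successes, failures) response:
m2 <- glmer(cbind(CHSd, TotSd - CHSd) ~ Trtmt + (1 | BSD),
            family = binomial, data = d)
all.equal(fixef(m1), fixef(m2))
```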
