glmer predict: warning message about outcome variable not being a factor

I have a mixed effect model with binomial outcome fitted with glmer. For plotting purposes I would like to predict population-level values for a small dataset.
Below is an example illustrating my approach:
library(lme4)
data("Orthodont", package = "nlme")  # the Orthodont data ship with nlme
silly <- glmer(Sex ~ distance + age + (1 | Subject), data = Orthodont, family = binomial)
sillypred <- expand.grid(distance = c(20, 25), age = unique(Orthodont$age))
sillypred$fitted <- predict(silly, sillypred, re.form = NA, type = "response")
I get the following warning message:
Warning message:
In model.frame.default(delete.response(Terms), newdata, na.action = na.action, :
variable 'Sex' is not a factor
However, when I check, it looks like it is:
str(Orthodont["Sex"])
The fitted column is still created and the values make sense, but I'm curious about this message. Is there something I should be concerned about? If not, what is the purpose of this message?
It might seem like a trivial question (after all, it all seems to work), in which case I apologize, but I want to make sure that I don't overlook something important.

This appears to be a harmless bug that will always occur when predicting from a model with a factor response (the problem is that we pass the xlev argument to model.frame including the levels of the response variable). I've posted this at https://github.com/lme4/lme4/issues/205. Thanks for the report!
You could either ignore the warning or coerce the response variable to a binary value (which will give identical results).
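For example, a minimal sketch of that coercion, reusing the Orthodont fit from the question (the SexBin column name is just for illustration, and it assumes Sex has levels Male/Female with Male first, as in the nlme data):
Orthodont$SexBin <- as.numeric(Orthodont$Sex == "Female")  # 1 = Female, 0 = Male
silly2 <- glmer(SexBin ~ distance + age + (1 | Subject), data = Orthodont, family = binomial)
sillypred$fitted2 <- predict(silly2, sillypred, re.form = NA, type = "response")  # same values, no warning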

Solving error for the good beginning of the day

Let's consider the following data:
library(plm)
library(pglm)
data("EmplUK", package="plm")
I will add a new column with 0s and 1s placed randomly. After that I want to fit a random-effects logit model.
df1<-EmplUK
#adding 0's and 1's
df1<-cbind(df1,'binary'=sample(0:1,1031,replace=T))
#Performing logit regression
pglm(binary~output+wage, data=df1, family=quasibinomial(link='logit'), start = NULL, model = 'random')
And the following problem occurs:
Error in maxRoutine(fn = logLik, grad = grad, hess = hess, start = start, :
argument "start" is missing, with no default
I'm not sure exactly what the reason is. I've read about this error, and it seems there are some problems when you try to estimate a 'within' model, but I get this error for every model type. Could you please give me a hand pointing out the reason for this error?
I don't think the quasibinomial family is set up in this function. Inside pglm there is a function pglm:::starting.values that looks for specific families:
"binomial"
"ordinal"
"poisson"
"negbin"
"gaussian"
"tobit"
Negative binomial allows for modelling of the variance, so that may suit your needs; otherwise binomial(link='logit') works fine if there's no evidence of overdispersion.
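If the plain binomial family is acceptable for your data, the call might look roughly like this (a sketch only; I haven't run it against your data):
# sketch: same model, but with a family that pglm's starting-value code recognises
pglm(binary ~ output + wage, data = df1,
     family = binomial(link = "logit"), model = "random")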
edit: happy to be corrected on this, I haven't worked with this package before :)

Get test error in a logistic regression model in R

I'm performing some experiments with logistic regression in R, using the Auto dataset (from the ISLR package).
I've split it into a training part (80%) and a test part (20%), normalizing each part individually.
I can create the model without any problem with the line:
mlr <- glm(mpg ~ displacement + horsepower + weight, data = train)
I can even predict train$mpg with the train set:
trainpred<-predict(mlr,train,type="response")
And with this, calculate the in-sample error:
etab <- table(trainpred, train[,1])
insampleerror<-sum(diag(etab))/sum(etab)
The problem comes when I want to predict with the test set. I use the following line:
testpred<-predict(model_rl,test,type="response")
Which gives me this warning:
'newdata' had 79 rows but variables found have 313 rows
but it doesn't work, because testpred has the same length as trainpred (it should be shorter). When I then try to calculate the test error using testpred with the following line:
etabtest <- table(testpred, test[,1])
I get the following error:
Error in table(testpred, test[, 1]) :
all arguments must have the same length
What am I doing wrong?
I'm answering my own question in case someone has the same problem:
When I pass the arguments to glm I'm telling it what I want to predict, namely the Auto$mpg values using the training data; hence, my glm call must be:
attach(Auto)
mlr <- glm(mpg ~ displacement + horsepower + weight, data = Auto, subset = indexes_train)
If I now call predict, table, etc., there isn't any problem with structure sizes. After fixing this mistake it works for me.
As imo says:
"More importantly, you might check that this creates a logistic regression. I think it is actually OLS. You have to set the link and family arguments."
Set family = 'binomial'.
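A rough sketch of what that could look like; note that mpg is continuous, so a genuine logistic regression first needs a binary outcome (the mpg01 column and the median cut-off are just for illustration, and indexes_train is assumed to be the training-row index from the answer above):
Auto$mpg01 <- as.numeric(Auto$mpg > median(Auto$mpg))  # illustrative binary outcome
mlr <- glm(mpg01 ~ displacement + horsepower + weight,
           data = Auto, subset = indexes_train, family = binomial)
testpred <- predict(mlr, newdata = Auto[-indexes_train, ], type = "response")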

maximum number of covariates in R glm package is 128?

I have a dataset with 146 covariates, and am training a logistic regression.
logit <- glm(Y ~ ., data = pred.dataset[1:1000, ], family = binomial)
The model trains very quickly, but when I then try to view the betas with
logit
the betas after the 128th variable are all NA.
I noticed this when trying to export the model as PMML and saw that it stopped listing betas after 128 predictors.
I've gone through the documentation and can't find any reference to a maximum number of covariates. I also trained on 60k rows and still see NAs after the 128th predictor.
Is this a limitation of glm or a limitation of my system? I am running R 3.1.2, 64-bit. How can I increase the number of predictors?
This is a question I actually just asked on Stack Exchange, which is where this question should be. See this link:
https://stats.stackexchange.com/questions/159316/logistic-regression-in-r-with-many-predictors?noredirect=1#comment303422_159316 and the subsequent links included in the thread. To answer your question, though: basically, that is too many predictors for logistic regression. OLS can be used in this case; even though it does not yield the best results for a binary outcome, the results are still valid and can be used.
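If it helps to see which terms were actually dropped, here is a quick diagnostic sketch on the fit from the question (NA coefficients in a glm fit indicate aliased, i.e. rank-deficient, columns):
coefs <- coef(logit)
names(coefs)[is.na(coefs)]    # predictors that came back NA (aliased)
logit$rank                    # columns the fit could actually use
ncol(model.matrix(logit))     # columns in the full design matrix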
You didn't provide reproducible data, so it's hard to tell exactly what is going on--is there an issue with how some of the variables are coded? Are variables that seem uniform not uniform at all? These would be a couple of situations that could be ruled out with a reproducible code example.
However, I'm answering because I think you may have a legitimate concern. What can you say about these other variables? What type are they? I have been trying to run some logits that seem to drop factor levels beyond 48.
What worked for me (at least to get the model to run in full) was going into the glm() function and changing
mf$drop.unused.levels <- TRUE
to
mf$drop.unused.levels <- FALSE
then saving the function under a different name and using that to run my analyses. (I was inspired by this answer.)
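A rough sketch of that copy-and-modify trick (the myglm name is arbitrary; this simply flips the one assignment described above, so use it with the same caution):
myglm <- glm                                # start from a copy of stats::glm
src <- deparse(body(myglm))                 # get its body as text
src <- sub("mf$drop.unused.levels <- TRUE",
           "mf$drop.unused.levels <- FALSE",
           src, fixed = TRUE)               # flip the single assignment
body(myglm) <- parse(text = paste(src, collapse = "\n"))[[1]]
# myglm keeps glm's environment, so it can still see the stats internals
fit <- myglm(Y ~ ., data = pred.dataset[1:1000, ], family = binomial)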
Be warned, though! It gave me some warning messages:
Warning messages:
1: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
2: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
3: In predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
prediction from a rank-deficient fit may be misleading
I know that there are frequency issues in certain groups in the data; I have to analyze these separately and I will do so. But for the time being, I can get predictions for all the levels that I wanted.
The first step would be to check your data, though. Part of why this happens with my data is almost certainly due to issues in the data itself, but this approach let me override it. This may or may not be an appropriate solution for you.

Using lme4 glmer function for unbalanced treatment comparison results in variable length error

I am using the lme4 package to run a generalized linear mixed model for proportion data using a binary response. I have unequal sample sizes for my treatments and am getting the following error, which I understand is due to the very fact that I have unequal sample sizes:
Error in model.frame.default(data = POL3, drop.unused.levels = TRUE,
formula = X2 ~ : variable lengths differ (found for 'Trtmt')
Here is the code that leads to the error:
#Excluding NA from the data set
POL3<-na.exclude(POL)
#Indicating the binary response
X2<-cbind(POL3$CHSd, POL3$TotSd-POL3$CHSd)
#Running the model
MMCHS4<-glmer(X2~Trtmt+(1|BSD)+(1|Hgt), family=binomial, data=POL3)
I have read that lme4 can deal with unbalanced samples but can't get this to work.
Impossible to say for sure without a reproducible example, but you probably need to make sure that the Trtmt variable is contained within POL3 (i.e., that there isn't another Trtmt variable lying around in your global workspace).
I would probably implement the model in this way:
glmer(CHSd / TotSd ~ Trtmt + (1 | BSD) + (1 | Hgt),
      weights = TotSd,
      family = binomial,
      na.action = na.exclude,
      data = POL)
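An equivalent way to write the binomial response, if you prefer the two-column form, keeps every variable inside data so the lengths always agree (a sketch; I can't test it without the data):
glmer(cbind(CHSd, TotSd - CHSd) ~ Trtmt + (1 | BSD) + (1 | Hgt),
      family = binomial, na.action = na.exclude, data = POL)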

Trouble getting se.fit and confidence intervals using clmm2 from ordinal package

I'm using the clmm2 function from the ordinal package in R in order to fit cumulative link mixed models to my data. It worked fine until I tried to get predicted probabilities. I can't get either SEs or confidence intervals by specifying se.fit=TRUE and interval=TRUE. It looks like this:
mod1<-clmm2(response~X0+X1+X2+X3+X4+X5+X7+X0*X2*X3+X2*X3*X4+X0:X4, random=X6,
data=df,link ="logistic", threshold ="flexible",
Hess=TRUE, nAGQ=7)
As you can see, there are a bunch of interactions there (all important). I've tried to create a dummy dataset so that my problem is reproducible, but clmm2 can't achieve convergence with a simpler dataset. So I took the wine dataset included in the ordinal package and made some changes to the formula to mimic my own (I don't think it makes much sense, though):
library(ordinal)
data(wine)
fm1 <- clmm2(rating ~ temp + contact + bottle + temp:contact:bottle +
               temp:contact + temp:bottle + bottle:contact,
             random = judge, data = wine, link = "logistic",
             threshold = "flexible", Hess = TRUE, nAGQ = 7)
head(do.call("cbind", predict(fm1, se.fit = TRUE, interval = TRUE)))
And then I get this error:
Error in head(do.call("cbind", predict(fm1, se.fit = TRUE, interval = TRUE))) :
error in evaluating the argument 'x' in selecting a method for function 'head': Error in do.call("cbind", predict(fm1, se.fit = TRUE, interval = TRUE)) : second argument must be a list
My guess is that predict doesn't even compute SEs and CIs in a case like this. Does anybody know why? Is there any way to get those values?
Thanks a lot!
The predict method for clmm2 objects does not offer standard errors; see its help page. This is in keeping with the usual practice of R package authors when dealing with mixed-effects models.
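Incidentally, the specific error you see comes from do.call() rather than from predict() itself: since se.fit and interval are silently ignored, predict() returns a plain numeric vector, and do.call()'s second argument must be a list. A minimal sketch with the wine fit above:
p <- predict(fm1)                           # plain vector of fitted probabilities
head(do.call("cbind", list(fitted = p)))    # wrapping it in a list avoids the error,
                                            # but there are still no SEs or CIs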
