lmer failing with na.pass - r

When I run an lmer model with lme4 using na.pass as the na.action, I get the following error:
R: NA/NaN/Inf in foreign function call (arg 1)
I run the model like this:
model1 <- lme4::lmer(agg_dv_singing ~ GMS.Musical.Training +
JAJ.ability + MDT.ability + MPT.ability + PDCT.ability +
PIAT.ability + agg_dv_long_note + demographics.age +
aggiv_entropy + aggiv_interval_complexity +
aggiv_rhythmic_complexity + aggiv_tonal_complexity +
log.freq + length + (1|p_id),
data = dat, na.action = na.pass)
summary(dat) indicates that there are no Inf or NaN values, although yes, there are many NA values.
Running na.pass outside of lmer on the same data set does not give an error:
na.pass(dat)
So what could be going wrong within lmer?

Comments to a previous question of yours attempted to explain that, in general, mixed model machinery cannot handle estimation from cases when there are missing values in the predictors; it just doesn't work that way. If you want to fit mixed models with missing data you need to do some form of imputation, i.e. filling in values for missing predictors (e.g. see the mice package, which is more or less the state of the art at least as far as the R ecosystem is concerned). Here is what the four different standard na.* actions do in the context of mixed models:
na.fail(): fail immediately if there are missing values in the data (predictors or response). This is frustrating, but alerts you immediately to the fact that you have missing data, and lets you decide what to do about it.
na.omit(): drop non-complete cases from the data before fitting.
na.exclude(): like na.omit(), but keep track of the locations of the excluded cases. When using predict() or residuals() (or any function that produces results per observation), reconstitute a complete data set with NA values for the non-complete cases in the original data set. (I usually find this setting to be the most useful default.)
na.pass(): do not remove NA values, but attempt to continue with the fitting procedure. As you found out, this usually doesn't work at all! It will just pass the NA values down through the code until something goes wrong. Typically one of two things happens at this point:
if the entire estimation procedure is written using R functions that can handle and propagate missing values, then you'll usually get a fitted model object with NA/NaN for all coefficients, likelihoods, etc. etc. (because the missing values contaminate the entire fitting procedure);
if some step of the estimation procedure can't handle NA/NaN values (as in this case), you get an inscrutable error from the first point in the procedure that fails.
If you look at the source code of na.pass() (by typing na.pass at the R prompt), you'll see that in fact all it does is return the same object, unchanged. To be honest, I'm not really sure why na.pass even exists, except for completeness ... (or compatibility with S)
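As a rough illustration of the four actions, here is a minimal sketch that uses lm() from base R rather than lmer(), on a made-up five-row data frame d (the data and object names are assumptions for illustration only):

```r
# Toy data (hypothetical): one missing predictor value in row 3
d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2),
                x = c(1, 2, NA, 4, 5))

# na.fail() stops as soon as it sees a missing value
failed <- inherits(try(na.fail(d), silent = TRUE), "try-error")

# na.omit() drops the incomplete row; na.pass() returns the data unchanged
n_omit <- nrow(na.omit(d))
unchanged <- identical(na.pass(d), d)

# na.omit vs. na.exclude: same fit, different padding of per-observation results
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)
res_omit <- residuals(fit_omit)  # one value per complete case
res_excl <- residuals(fit_excl)  # one value per original row, NA where dropped
```

The two fits have the same coefficients; only the length of residuals() (and similar per-observation output) differs.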
Your NA value was not in a variable that is used in a random-effects term; if it had been, you would have gotten a more interpretable error message:
library(lme4)
ss <- sleepstudy
ss[1,"Days"] <- NA
lmer(Reaction ~ Days + (Days|Subject), ss, na.action=na.pass)
Error in lme4::lFormula(formula = Reaction ~ Days + (Days | Subject), :
NA in Z (random-effects model matrix): please use "na.action='na.omit'" or "na.action='na.exclude'"
If I instead fit a model with (1|Subject), so that the NA value affects only the fixed effects,
lmer(Reaction ~ Days + (1|Subject), ss, na.action=na.pass)
then we get your error message:
Error in qr.default(X, tol = tol, LAPACK = FALSE) :
NA/NaN/Inf in foreign function call (arg 1)
traceback() tells me that this happens in the internal chkRank.drop.cols() function, where R is trying to figure out if any of your fixed-effect columns are collinear. There should probably be a check for missing values there ...
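The same error can be reproduced without lme4, which may make the failure point easier to see. This is a minimal sketch in base R, assuming a made-up data frame ss_toy; it builds a fixed-effects design matrix with na.pass and hands it to qr() directly:

```r
# Toy data (hypothetical), with one missing predictor value
ss_toy <- data.frame(Reaction = c(250, 260, 270, 300),
                     Days = c(NA, 1, 2, 3))

# na.pass keeps the NA, so it ends up in the design matrix
mf <- model.frame(Reaction ~ Days, data = ss_toy, na.action = na.pass)
X <- model.matrix(Reaction ~ Days, mf)
has_na <- anyNA(X)

# The default QR routine refuses NA input; capture the message instead of stopping
msg <- tryCatch({ qr(X); "no error" },
                error = function(e) conditionMessage(e))

# Dropping incomplete rows first lets the rank computation go through
X_ok <- X[complete.cases(X), , drop = FALSE]
rank_ok <- qr(X_ok)$rank
```

In practice, fitting with na.action = na.omit or na.exclude (or imputing beforehand) avoids reaching this step with missing values.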

Related

"Non conformable arguments" error with pgmm (plm library)

I am unsuccessfully trying to do the Arellano and Bond (1991) estimation using pgmm from the plm package. To see if the problem was in my data, I instead used the data supplied in the plm library, but got the same problem when using the "summary" command:
Error in t(y) %*% x : non-conformable arguments
The coefficients of the model can be obtained though.
My own data has T=3, N=290. As I understand it, T=3 is the minimum, but should be sufficient. When using the Arellano and Bond data, I get the same error when T=4.
data("EmplUK", package = "plm")
library(sqldf)
UK<-sqldf("select * from EmplUK where year in ('1982','1981',
'1980','1979')")
z1 <- pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) +
log(capital) + log(output) | lag(log(emp), 2),
data = UK, effect = "twoways", model = "twosteps")
summary(z1)
The way I understand the estimation method and the R formula, the left-hand term is the difference in the dependent variable, and the first right-hand term is the lagged difference. The latter term is instrumented by the level of the dependent variable in (t-2).
I have verified that the subset I use is a balanced panel with T=4. When I include more years, everything works out. So it must be the length of the panel that causes trouble.
Any help would be much appreciated.
A similar question is asked here. It is suggested that the error has to do with mtest, a serial correlation test performed by the pgmm summary method. Running the function separately seems to confirm this:
mtest(z1, order = 2)
Error in t(y) %*% x : non-conformable arguments
T=3 is enough to estimate the model, but this leaves you with an estimate for the last period only. A second-order mtest requires the residuals to contain at least 3 periods, i.e. T=5 for your model.
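The period counting can be written out as simple arithmetic. This is only a sketch under the assumption that one period is lost to first-differencing and one more to each lag of the dependent variable; residual_periods is a hypothetical helper, not a plm function:

```r
# Hypothetical helper: periods left in the residuals of a first-differenced
# model with `lags` lags of the dependent variable (assumed counting rule)
residual_periods <- function(n_periods, lags = 1) {
  n_periods - 1 - lags
}

p3 <- residual_periods(3)  # a single period: the model can still be estimated
p4 <- residual_periods(4)  # two periods: fewer than the 3 that mtest(order = 2) needs
p5 <- residual_periods(5)  # three periods: the minimum stated for mtest(order = 2)
```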

Error message: Error in fn(x, ...) : Downdated VtV is not positive definite

I'm trying to use the lmer function to create a minimum adequate model. My model is Mated ~ Size * Attempts * Status + (random factor).
as.logical(Mated)
as.numeric(Size)
as.factor(Attempts)
as.factor(Status)
(These have all worked on previous models)
So after all that I try running my model:
Model1<-lmer(Mated ~ Size*Status*Attempts + (1|FemaleID),data=mydata)
And it can be submitted without fault. It's only when I try to apply this update that it goes wrong:
Model2<-update(Model1, REML=FALSE)
Here is the error message supplied:
Error in fn(x, ...) : Downdated VtV is not positive definite
If I make a third model without the interaction and do an ANOVA between that and model one, then it says the two are significantly different.
Model3 <- update(Model1, ~ . - Size:Status:Attempts)
anova(Model1,Model3)
What am I doing wrong? Is the three way interaction really significant or have I made some mistake?
Thank you
If Mated is binary, then you should probably be using glmer with a logit or probit link function instead, something like:
model <- glmer(Mated ~ Size * Status * Attempts + (1|FemaleID),
data = mydata, family = binomial)
It would help if you could let us know what your data looks like (head(mydata) might be fine, or see here for how to make a reproducible example).
Also, I would avoid making Mated logical (see this question and answer for how it can make your life more difficult). Instead, as.factor(Mated) will explicitly make your response variable discrete.
After that, you can compare your full and reduced models with anova().
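One more thing worth checking: calls such as as.factor(Attempts) on their own return a converted copy and leave the data frame untouched, so the conversions in the question only take effect if the result is assigned back. A minimal sketch in base R, assuming a made-up stand-in for mydata:

```r
# Toy stand-in for mydata (hypothetical values)
mydata <- data.frame(Mated = c(1, 0, 1, 1),
                     Attempts = c(1, 2, 2, 3))

# Without assignment the converted copy is discarded
invisible(as.factor(mydata$Attempts))
still_numeric <- is.numeric(mydata$Attempts)

# Assign the result back to actually change the column types
mydata$Attempts <- as.factor(mydata$Attempts)
mydata$Mated <- as.factor(mydata$Mated)

str(mydata)  # both columns should now be reported as factors
```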

randomForest() machine learning in R

I am exploring the function randomForest() in R, and several articles I found all suggest using logic similar to the code below, where the response variable is column 30 and the independent variables include everything else except column 30:
dat.rf <- randomForest(dat[,-30],
dat[,30],
proximity=TRUE,
mtry=3,
importance=TRUE,
do.trace=100,
na.action = na.omit)
When I try this, I got the following error messages:
Error in randomForest.default(dat[, -30], dat[, 30], proximity = TRUE, :
NA not permitted in predictors
In addition: Warning message:
In randomForest.default(dat[, -30], dat[, 30], proximity = TRUE, :
The response has five or fewer unique values. Are you sure you want to do regression?
However, I was able to get it to work when I listed the independent variables one by one while keeping all the other parameters the same.
dat.rf <- randomForest(as.factor(Y) ~X1+ X2+ X3+ X4+ X5+ X6+ X7+ X8+ X9+ X10+......,
data=dat,
proximity=TRUE,
mtry=3,
importance=TRUE,
do.trace=100,
na.action = na.omit)
Could someone help me debug the simpler command where I don't have to list each predictor one by one?
The error message gives you a clue to two problems:
First, you need to remove any row that has an NA anywhere. The na.action argument is only honored by the formula interface, which is why na.omit had no effect in your first call. Removing NA should be easy enough and I'll leave you that one as an exercise.
Second, it looks like you need to do classification (which predicts a response which only has one of a few discrete levels), rather than regression (which predicts a continuous response). If the response is continuous, randomForest() will automatically apply regression.
So, how do you force randomForest() to use classification? As you noticed in your first try, randomForest allows you to give data as predictors and response data, not just using the formula style. To force randomForest() to apply classification, make sure that the value you are trying to predict (the response, or dat[,30]) is a factor. Remember to explicitly identify the x and y arguments. This is easy to do:
randomForest(x = dat[,-30],
y = factor(dat[,30]),
...)
This way your output can only take one of the levels given in y.
This is all buried in the description of the arguments x and y: see ?randomForest.
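Putting the two fixes together, a minimal sketch might look like the following. The data frame dat_toy is made up for illustration and has its response in column 4 rather than column 30; the randomForest call is guarded so the preparation steps run even if the package is not installed:

```r
# Toy data (hypothetical); the response is in column 4 here, not column 30
set.seed(42)
dat_toy <- data.frame(X1 = rnorm(30), X2 = rnorm(30), X3 = rnorm(30),
                      Y = rep(c(0, 1, 2), times = 10))
dat_toy$X2[c(3, 7)] <- NA

# Drop rows with any NA up front, since the x/y interface does not do it
dat_cc <- dat_toy[complete.cases(dat_toy), ]

x <- dat_cc[, -4]
y <- factor(dat_cc[, 4])  # a factor response requests classification

if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(x = x, y = y, mtry = 2, importance = TRUE)
  print(fit$type)
}
```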

Odd behavior with step()

step() and stepAIC() produce a "remove missing values error" when running the code on data with missing values.
Error in step(mod1, direction = "backward") :
number of rows in use has changed: remove missing values?
According to ?step:
The model fitting must apply the models to the same dataset. This may be
a problem if there are missing values and R's default of na.action = na.omit
is used. We suggest you remove the missing values first.
I have a data frame with one variable which has four NA values. However, when I run step on the lm object, I don't get the "missing values" error even though it has missing values. Can anyone tell me what could be going on?
> d1$Impressions
[1] NA NA NA 6924180 9313226 27888455
18213812 54557205 13495553
...
This does not produce an error message:
mod1 = lm(Leads ~ G + Con + GOO + DAY + Res + SD + ED +
ME + Impressions + Inc + Sea, data=d1)
step(mod1, direction="backward")
stepAIC(mod1)
Even with a variable which has missing values, it's not generating an error message. Any ideas on what's going on?
One reason for the stated behaviour is this. step() fits the full model and hence drops 3 (as stated) observations due to presence of NAs. As long as the variables for which there are NAs remain in the model, the lm() function will remove those observations at each step. If stepping stops before it removes a variable that would result in one of the previously removed observations remaining in the model, then no error will be raised, because the numbers of rows in the model matrix will not have changed.
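The mechanism can be seen on a small simulated example. This is a sketch with made-up data (d1_toy and its columns are assumptions, not the asker's data): the number of rows in use changes only once the variable carrying the NAs leaves the model, and subsetting to complete cases beforehand removes the problem altogether.

```r
# Simulated data (hypothetical): predictor b has three missing values
set.seed(1)
d1_toy <- data.frame(y = rnorm(20), a = rnorm(20), b = rnorm(20))
d1_toy$b[1:3] <- NA

# With b in the model, lm() silently drops the three incomplete rows
full <- lm(y ~ a + b, data = d1_toy)
n_full <- nobs(full)

# Without b, those rows come back; this change is what step() complains about
n_reduced <- nobs(update(full, . ~ . - b))

# Restricting to complete cases first keeps every candidate model on the same rows
d1_cc <- na.omit(d1_toy)
full_cc <- lm(y ~ a + b, data = d1_cc)
sel <- step(full_cc, direction = "backward", trace = 0)
n_sel <- nobs(sel)
```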
As an aside, stepwise selection like this is considered to be of somewhat dubious validity. Not least, in using it you are making a fairly bold statement that the effects of the eliminated variables are exactly equal to zero. This also biases the estimated coefficients of the variables retained in the model toward larger absolute values.
Alternatives to this stepwise selection include shrinkage methods such as the Lasso and the Elastic Net.

Error in fitting a model with gee(): NA/NaN/Inf in foreign function call (arg 3)

I'm fitting a gee model on a dataset including 13,500 observations (here students). Students are grouped into 52 different schools. I know that there is evidence that students are nested within schools (low ICC) and therefore I should adjust for this nesting effect in the variance-covariance matrix. What I'm planning to do is to first fit a gee model with an exchangeable var-cov structure. Then, on top of that, I'll run the Huber-White sandwich estimator, also known as the robust variance estimator. I wrote my own code for the robust variance estimator and it works perfectly. My gee statement doesn't work and gives the error below:
NA/NaN/Inf in foreign function call (arg 3)
Here is my code:
STMath.OneYr.C1 = gee(postCSTMath1Yr ~ TRT1Yr + preCSTMath + preCSTENG +
post1YrGradeRef + ELLBaseLine + GENDER + ECODIS + ETHNICITY.F +
as.factor(FailedInd1Yr), data = UCI.clone[UCI.clone$COHORT0809 == "C1",],
id = post1YrSchIID, corstr = "exchangeable")
Unfortunately, the code above is not reproducible for you, so it may be difficult to figure out what the issue is.
I would appreciate it if you could help me figure out how to solve the issue.
OK, this question is quite old but I ended up here, so this might help someone eventually.
Basically, this error was caused because unlike in other libraries, the id parameter is treated as a numeric vector.
Indeed, the gee function is casting id as a double, which I don't really understand. Here are the implicated lines (l. 119-120 of the function):
if (!(is.double(id)))
id <- as.double(id)
If your id column is a character, just cast it to a factor, or use some function (like dplyr::min_rank) to turn it into a numeric variable.
This should do the trick.
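To see why a character id breaks and a factor does not, here is a minimal sketch in base R; the school labels are made up for illustration:

```r
# Hypothetical character ids, one per observation
id_chr <- c("schoolA", "schoolA", "schoolB", "schoolC")

# Coercing character labels to double yields only NA (with a warning),
# which is what then reaches the compiled code
id_bad <- suppressWarnings(as.double(id_chr))

# A factor is stored as integer codes, so the same coercion gives usable ids
id_fac <- factor(id_chr)
id_ok <- as.double(id_fac)
```

Converting the column once in the data frame, before calling gee(), keeps the model call itself unchanged.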
